Sunday, November 19, 2006

XML Sucks?

I searched the Internet and found something funny about XML. As we all know, we've discussed XML before, and we said it's the ultimate solution to the problems brought by HTML. Well, some people don't think so.

Generally speaking, though XML brings us tremendous benefits over HTML, it inevitably has some defects. The summary list is below:
  • XmlIsTooComplex for what it does.
  • It's too hard for programs to parse and too verbose and unreadable for humans to write.
  • The benefits of "everyone is using XML, so we should too" are usually outweighed by the costs of time, training and mistakes involved in understanding it.
  • Because it's increasingly used for data interchange, it is promoted as a data storage model. XML is only a data encoding format.
  • or just comments wrapped around data. Too many comments and symbols.
  • , when they could just be comments instead.
  • Encourages non-relational data structures
    • i.e. data is not even in 1st normal form, let alone 5th.
  • Poor OnceAndOnlyOnce syntax factoring
  • It's a poor copy of EssExpressions
  • It is ExtremelyInterstrangled.
  • Perhaps worst of all, too many programmers don't understand the need for data description languages with broad support.
  • Transformations, even identity transforms, result in changes to format (whitespace, attribute ordering, attribute quoting, whitespace around attributes, newlines). These problems can make "diff"ing the XML source very difficult.
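The last point in the list can be seen with a small round trip. This is a minimal sketch using Python's standard xml.etree.ElementTree; the sample tag and its attributes are made up for illustration, and other parsers may normalize differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: single-quoted attributes and irregular spacing inside the tag.
source = "<item  id='1'   name = 'mouse'/>"

# An "identity transform": parse the text, then serialize it without changing anything.
root = ET.fromstring(source)
round_tripped = ET.tostring(root, encoding="unicode")

print(source)
print(round_tripped)

# The data survives the round trip, but the text does not,
# so a line-based diff reports a change where nothing meaningful changed.
assert ET.fromstring(round_tripped).attrib == root.attrib
assert round_tripped != source
```

Comparing the parsed trees instead of the raw text is one way around the problem, at the cost of needing an XML-aware diff tool.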
I picked out several points to look at in detail (arguments in favor of XML are in italics; opposing arguments are in normal type):

XML is too hard for programs to parse and too verbose and unwritable for humans.

It's not too hard for programs to parse - XML is a subset of SGML, which is well understood and well implemented, and because it's more rigorous than HTML it's easier to parse than HTML, which is a solved problem. It's not too hard for humans, by a long shot; a well-written DTD is a cakewalk to write in.

Tedious rather than hard. It takes more time and code to extract the information you want from XML than it does to have the information formatted in flat files. Parsing flat files is easier than processing DOM unless tools are provided.
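To make the "tedious rather than hard" point concrete, here is a rough sketch that reads the same two records from a flat file and from a DOM tree. The sample data, element names and function names are assumptions for illustration only; it uses Python's standard csv and xml.dom.minidom modules.

```python
import csv
import io
from xml.dom import minidom

# Hypothetical sample data: the same two records in both formats.
flat_text = "name,price\nwidget,9.99\ngadget,24.50\n"
xml_text = """<catalog>
  <product><name>widget</name><price>9.99</price></product>
  <product><name>gadget</name><price>24.50</price></product>
</catalog>"""

def prices_from_flat(text):
    """Read name -> price pairs from a CSV string with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["name"]: float(row["price"]) for row in reader}

def prices_from_dom(text):
    """Read the same pairs by walking a DOM tree node by node."""
    doc = minidom.parseString(text)
    prices = {}
    for product in doc.getElementsByTagName("product"):
        name = product.getElementsByTagName("name")[0].firstChild.data
        price = product.getElementsByTagName("price")[0].firstChild.data
        prices[name] = float(price)
    return prices

# Both routes recover the same data; the DOM route just takes more steps.
assert prices_from_flat(flat_text) == prices_from_dom(xml_text)
```

The DOM version is not difficult, but every value sits a few node hops away, which is roughly what the objection is about.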

Well, this is certainly true. You get an old argument of the virtues of (new thingy) over (old thingy). People thought HTML was silly in the light of Gopher, which was flat text, easier to write, edit and parse, and faster to transmit; over time they were shown to be incorrect (correction: over time they were shown different means serve different purposes). XML provides a mechanism for us to provide a parsable definition of document structure, which means that unlike CommaSeparatedValues or similar setups, the software doesn't have to know the document's structure ahead of time (given an XML parser; magic? Fact: XML is a document format; the use of DOM and IPC is the key to the success of XML (see SOAP)). File space requirements matter less every day (tell that to a CPU designer, and he will laugh loud), and though not trivial, XPath and XSLT are important features over and above what CSV provides. For many applications it's overkill. So is sending readme.1st files in RTF.
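The claim about not needing to know the layout ahead of time can be sketched as follows. This is only an illustration: the documents and function names are invented, and ElementTree's findall supports just a limited subset of XPath, not the full language.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: the second reorders the children and adds a new <sku> element.
v1 = "<catalog><product><name>widget</name><price>9.99</price></product></catalog>"
v2 = "<catalog><product><sku>W-1</sku><price>9.99</price><name>widget</name></product></catalog>"

def product_names(xml_text):
    """Select every product name by path, wherever it sits among its siblings."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./product/name")]

def first_field(csv_line):
    """Positional access: only correct if the reader already knows the column order."""
    return csv_line.split(",")[0]

# The path query is unaffected by the added and reordered elements.
assert product_names(v1) == product_names(v2) == ["widget"]

# The positional reader silently returns the wrong field once the layout changes.
assert first_field("widget,9.99") == "widget"
assert first_field("W-1,9.99,widget") == "W-1"
```

A CSV file with a header row narrows the gap, so this shows the argument at its strongest rather than settling it.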


The benefits of "everyone is using XML, so we should too" are usually outweighed by the costs of time, training and mistakes involved in understanding it.

What are those costs? Many people said this about HTML, but frankly it's just not that hard - commands go in angle brackets, slash means off, i for italic, hit save, you're done. Technical workers can handle that, and XML is no worse (if they need to write their own DTDs, that's worse, but give that job to qualified staff). Training: everything takes training.

Some things more than others.

Most things more than XML.


Because XML is increasingly used for data interchange, it is too easily promoted as a data storage model. XML is only a data encoding format.

It's not designed as a data storage model, although models can be built on top of it. Compared to older ASN.1 (correction: ASN.1 is really only a language for defining protocols; actually the protocol defined in ASN.1 can use XML as its data transfer format) or GIOP, such XML models suck. Inherent limitations make them unsalvageable. But many folks confuse storage and exchange. XML must be concrete enough for light-weight programs to parse; the same data may be described in many ways, and different XML representations are suitable for different tasks, in opposition to the OnceAndOnlyOnce goal. In contrast the relational model and SQL use a canonical representation not favoring a particular task. In particular, many to many relationships are problematic in XML. We have gone back to the sequential text file model at the expense of the kind of abstraction we gained when moving from COBOL to SQL. If you really want to process data sequentially, COBOL is a far better tool than XSLT applied to XML - but sensible people use SQL. XML should just be used for transport, and there should be a canonical representation (schema) of the relational model. A simple subset of SQL could be implemented to operate on this representation to allow programmers to extract data. Imagine how much simpler life would be if instead of writing XML parsers, and editing enormous, complex and verbose text files by hand, we had a simple SQL-style interface. In fact - I think I'll write one! (that will be easier than XPath and XQuery?)
-- Tim Glover (ed SkipSailors)
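The point about many-to-many relationships and a task-neutral representation can be illustrated with a small relational sketch. The tables, names and rows below are invented for the example; it uses Python's standard sqlite3 module with an in-memory database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT);
    -- The link table holds the many-to-many relationship exactly once.
    CREATE TABLE wrote  (author_id INTEGER REFERENCES author(id),
                         book_id   INTEGER REFERENCES book(id));
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO book   VALUES (1, 'XML Basics'), (2, 'SQL Basics');
    INSERT INTO wrote  VALUES (1, 1), (2, 1), (1, 2);
""")

def books_by(author_name):
    """All titles written by one author."""
    rows = conn.execute(
        "SELECT b.title FROM book b "
        "JOIN wrote w ON w.book_id = b.id "
        "JOIN author a ON a.id = w.author_id "
        "WHERE a.name = ? ORDER BY b.title", (author_name,))
    return [title for (title,) in rows]

def authors_of(book_title):
    """All authors of one book, answered from the same three tables."""
    rows = conn.execute(
        "SELECT a.name FROM author a "
        "JOIN wrote w ON w.author_id = a.id "
        "JOIN book b ON b.id = w.book_id "
        "WHERE b.title = ? ORDER BY a.name", (book_title,))
    return [name for (name,) in rows]

assert books_by("Ann") == ["SQL Basics", "XML Basics"]
assert authors_of("XML Basics") == ["Ann", "Bob"]
```

A nested XML document would have to put books under authors or authors under books (or fall back on ID references), which favors one of these two questions over the other.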

Why do people insist on complaining that XML doesn't do this or XML doesn't do that, when XML is just supposed to be a data storage and transport mechanism? And now this comparison to COBOL? COBOL?!? Oy, vay!

XML isn't a database language per se. It is a means of expressing data in a tree structure. If you need flat storage of your data for relational reasons and you don't feel like parsing out an XML file full of relational data items, then how about using something other than XML? Any data can be stored in an XML format, though; it's just a matter of designing the storage translation in and out. XML reliably transports the stored data for you.

XML is a means of storing data in a tree structure and can express relationships. The XML community tries to push it far too far. XML databases are a silly idea. XSLT is a silly idea. When you start embedding Java in XML a la Cocoon you know you've gone completely bonkers. I have another problem: XML has to be processed by a computer program eventually, be it XSLT, Java, whatever. Because XML is very concrete and highly non-canonical, it introduces a very strong coupling between the actual representation chosen and the processing program, to which I object. You cannot change your XML DTD to optimize a particular task without having to rewrite all your existing programs. I don't think this has really hit home yet - but it will. It is going to cause BIG problems. SQL solves this problem by providing an abstract interface to the data. My comparison with COBOL was with XSLT, a programming language written in XML for XML, not with XML itself. They are very similar - XML elements correspond to sequential file record types. XML attributes correspond to COBOL data division templates (conceptually at any rate; COBOL is very concrete in its layout of data attributes). COBOL has the great advantage over XSLT that it provides a very clean separation of program from data. In XSLT these are hopelessly confused, which causes much of the difficulty in reading and understanding it. Thanks for engaging in this with me - I find it a useful and constructive discussion.

-- Tim
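The coupling Tim describes can be sketched with two encodings of the same fact. The element names, attribute names and function names here are hypothetical; the sketch uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Two hypothetical encodings of the same fact: price as an attribute vs. as a child element.
as_attribute = '<product name="widget" price="9.99"/>'
as_element = "<product><name>widget</name><price>9.99</price></product>"

def price_v1(xml_text):
    """Reader written against the attribute layout."""
    return float(ET.fromstring(xml_text).get("price"))

def price_v2(xml_text):
    """Reader written against the child-element layout."""
    return float(ET.fromstring(xml_text).findtext("price"))

# Each reader works on the layout it was written for.
assert price_v1(as_attribute) == price_v2(as_element) == 9.99

# Feeding the new layout to the old reader fails:
# get("price") returns None, and float(None) raises TypeError.
try:
    price_v1(as_element)
except TypeError:
    print("old reader breaks on the new layout")
```

Nothing about the data changed between the two documents, only the representation, yet the reading code had to be rewritten.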


XML is not a good basis for developing data models. That is not a shortcoming of XML, but rather a problem of engineers picking the wrong tool for the job so often. Don't use a screwdriver as a crowbar.

_____________________________


To sum up, though XML is not a new technique, its usage is still controversial. Because of its flexibility and power, it has been applied to many areas for data transport, storage and manipulation. It shows great potential to replace techniques such as SQL and flat files. However, when we try to apply XML as a universal solution, it shows serious defects; after all, XML wasn't designed to do those things. It's so powerful that people almost forget what it was for in the first place. So when we try to apply XML in our work, we should think clearly about how well it fits. If there is a more mature, more convenient technique, don't use XML.
