Your HTML could use some X

To much fanfare, David Heinemeier Hansson has just released Rails, and this has spurred lots of meaty discussion among Rubyists about what a web application framework is for.

Although I haven't looked at Rails myself, the fact that it uses HTML + code (like eRuby, ASP, PHP, etc., etc.) is enough for me to not use it. And besides, I've got my personal weirdo experiment going on and it works okay for now.

Much of the discussion veers away from XHTML, though, which isn't much of a surprise. As I've written before, a lot of Rubyists find XML pretty distasteful. But James Britt isn't one of them, and in one of his posts he brings up a possible benefit of avoiding squiggly tags in your templates: Doing so might make it possible to validate those templates as XML.

In his words: "I'd rather not have to 'compile' a page each time I want to verify that HTML attributes are properly quoted, or that there are no syntax errors in the code."

But if you have an XML-driven templating system (like XSLT, sorta), then checking the template itself for validity isn't going to do much good, is it? I mean, you can write valid XSLT that produces complete garbage. (In the same way that your statically typed code can compile and still be incorrect.) If, like James and myself, you're producing XHTML and you're super-concerned with validity, you have to think on at least three levels.

  1. Is this well-formed XML, with the attributes being quoted and the <p> tags being closed and all that?
  2. Does this conform to XHTML or RSS or Docbook or some other specific document format? (The snippet after this list passes the first check but flunks this one.)
  3. Does this look good? Are the CSS styles correctly applied, are the right elements blockquoted, etc.?
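Just to make the gap between the first two levels concrete, here's a tiny Ruby illustration. It's not from Dauxite, and the snippet is made up: the markup is perfectly well-formed, so REXML swallows it without complaint, but the XHTML DTD forbids nesting one <p> inside another, so a DTD check would still reject the page.

    require 'rexml/document'

    # Well-formed XML: every attribute quoted, every tag closed.
    # Invalid XHTML: the DTD forbids a <p> inside another <p>.
    snippet = '<p>Outer paragraph <p>illegally nested</p></p>'

    REXML::Document.new(snippet)  # parses fine -- level 1 passes
    # xmllint --valid would still reject the page -- level 2 fails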

Not to sound like a salesman, but Dauxite more or less has this problem solved, though the result might be more cumbersome than many people want to deal with. (How much is correctness worth to you?)

  1. Documents are turned into XML as soon as possible, and then transformed into different XML documents as they go down the content stream. The transformations are done either with xsltproc (for simpler XSLT transformations) or with REXML (for the fine-grained stuff). So well-formedness is guaranteed, because both xsltproc and REXML will go kerplooey when forced to consume bad XML.
  2. Before being published, the final document is checked by xmllint, so you get validity against a DTD, which is what DTDs are for.
  3. Checking that the HTML actually looks good is much tougher, of course: You can't write a DTD for intent. I don't test much at this stage, but when I do I usually end up just writing some greps on the end product. (This is usually done to trap some bug in my XSLT.) A rough sketch of all three checks follows this list.
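For the curious, here's roughly what those three checks boil down to in Ruby. This is a sketch, not Dauxite's actual code: the out.xhtml file name and the class="nav" grep are invented for illustration.

    require 'rexml/document'

    def publish(xml_source)
      # Check 1: REXML raises REXML::ParseException on malformed input,
      # so anything that survives this line is well-formed XML.
      doc = REXML::Document.new(xml_source)
      File.open('out.xhtml', 'w') { |f| doc.write(f) }

      # Check 2: validate the final document against its DTD.
      unless system('xmllint --valid --noout out.xhtml')
        raise 'out.xhtml is well-formed but not valid XHTML'
      end

      # Check 3: no DTD can capture intent, so grep the end product
      # for things that should be there. Crude, but it traps XSLT bugs.
      html = File.read('out.xhtml')
      raise 'navigation bar went missing' unless html =~ /class="nav"/
    end

The nice thing is that every check is a hard failure: a bad page doesn't get published quietly, it blows up the build.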

The underlying lesson here is that, for all the hooty-hoo about the Semantic Web and machine-readable documents, in 2004 the main beneficiaries of XHTML are not human readers or software agents. The main beneficiaries are the programmers who produce the stuff in the first place, because when you enter XML-land you get to leverage a ton of validity-checking tools, and you didn't even have to write a single line of C++ yourself.

(And by the way, why are people talking about terrible CSS examples like <a href="foo" style="color:red">foo</a>? How exactly is that sort of nasty, crufty inline non-semantic CSS any better than the <font> tag?)
