
Justifying Dauxite

May 6, 2004


The content filter is an application of the Decorator pattern from the Gang of Four book; the same idea appears as Pipelines in a recent Pragmatic Programmers article. It's a powerful pattern, but I didn't realize just how powerful until I implemented Dauxite.

Let's look at an extended example, using the page http://fhwang.net/bio.html, which contains a short biographical description. (And a photo that's really out-of-date. Software can't do everything.) In site.xml the relevant XML is:

<site>
  <directory name="">
    <index_html name="index">
      <rss_retriever domain_class="BlogEntry" count="10" />
      <renderer />
    </index_html>
    <file_html name="bio" content_parent="index" />
  </directory>
</site>

In Ruby, bio.html is an instance of FileHtml. Each instance of FileHtml contains these content filters:

1. FileInput -> 2. PageWrapper -> 3. SiteWrapper

  1. FileInput reads a file containing XHTML and passes that file's contents downstream. This is implemented in Ruby with a simple file reading procedure.
  2. PageWrapper slaps an XHTML breadcrumb on top. It handles the work of climbing up the content hierarchy using the content_parent attribute of each node; in the case of bio.html, its parent is index.html, which has no parent. This is implemented in Ruby.
  3. SiteWrapper wraps its input in a mostly static template which contains site-wide information such as the headers, graphic at the top of the page, and nav section on the right side. This simple step is implemented in XSLT using xsltproc.
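The chain above behaves like a fold over the filters: each one transforms the text produced by the filter before it. A minimal sketch in Ruby, with made-up class names standing in for Dauxite's real ones:

```ruby
# Minimal content-filter pipeline sketch. Each filter takes the upstream
# text and returns transformed text; a Node folds its input through the
# whole chain. (Illustrative names, not Dauxite's actual classes.)
class BreadcrumbWrapper
  def initialize(trail)
    @trail = trail
  end

  # Prepend a breadcrumb line, like PageWrapper does.
  def filter(content)
    "<p class=\"breadcrumb\">#{@trail.join(' &gt; ')}</p>\n" + content
  end
end

class TemplateWrapper
  # Wrap everything in a site-wide template, like SiteWrapper does.
  def filter(content)
    "<html><body>\n#{content}\n</body></html>"
  end
end

class Node
  def initialize(*filters)
    @filters = filters
  end

  # Run the input through each filter in order.
  def generate(input)
    @filters.inject(input) { |content, f| f.filter(content) }
  end
end

node = Node.new(BreadcrumbWrapper.new(%w[index bio]), TemplateWrapper.new)
puts node.generate("<p>Short bio goes here.</p>")
```

Adding, removing, or reordering steps is just a matter of changing the filter list, which is what the site.xml configuration above is really describing.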

Looking up the content hierarchy, we see that bio's parent is http://fhwang.net/index.html, which contains the 10 most recent blog entries. When this Node is instantiated, it contains a different sequence of content filters:

1. RssRetriever -> 2. Renderer -> 3. SiteWrapper

  1. RssRetriever pulls rows out of a given MySQL table and uses them to generate an RSS file. In this case, it will pull out the 10 most recent rows in the BlogEntries table. RssRetriever is implemented in Ruby.
  2. Renderer runs the input through an XSLT file, which in this case is index.xsl. index.xsl loops through the RSS and creates one chunk of XHTML for each <item>.
  3. As described above, SiteWrapper wraps up the whole thing in site-wide information.
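The index chain can be approximated the same way. As a rough sketch, here is a stand-in for the RssRetriever step that builds a minimal RSS document from an in-memory array of rows instead of a MySQL table (the function name and row structure are illustrative, not Dauxite's actual code):

```ruby
require 'time'

# Stand-in for RssRetriever: take the `count` most recent rows and emit
# a bare-bones RSS 2.0 document. In Dauxite the rows come from MySQL;
# here they're just hashes.
def rss_from_rows(rows, count)
  items = rows.sort_by { |r| r[:date] }.reverse.first(count).map do |r|
    "  <item><title>#{r[:title]}</title>" \
      "<pubDate>#{r[:date].rfc2822}</pubDate></item>"
  end
  "<rss version=\"2.0\"><channel>\n#{items.join("\n")}\n</channel></rss>"
end

rows = [
  { title: "Older post",  date: Time.utc(2004, 4, 1) },
  { title: "Newest post", date: Time.utc(2004, 5, 1) },
]
puts rss_from_rows(rows, 10)
```

The Renderer step would then run this RSS through index.xsl (with xsltproc, say) to turn each &lt;item&gt; into a chunk of XHTML.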

This approach has some intriguing consequences:

Decomposing a problem along a new axis can shed light on the domain in ways that lead to other solutions. Halfway through programming Dauxite, I decided I needed a caching mechanism, because generating the entire tree was becoming time-consuming. The solution was to have each Node ask its content filters whether any of them depend on input that has changed since the last generation, and therefore require the Node to be regenerated. FileInput, for example, compares the mtime of its data file to the mtime of the last generated copy of the Node's contents. RssRetriever uses the timestamp of the most recently modified blog entry. SiteWrapper never triggers a refresh.
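That per-filter staleness check can be sketched in a few lines of Ruby (class and method names here are invented for illustration; Dauxite's real interface may differ):

```ruby
# Staleness-check sketch: a Node regenerates its output when any of its
# filters reports that the input it depends on is newer than the last
# generated copy. (Illustrative names, not Dauxite's actual interface.)
class FileInputFilter
  def initialize(path)
    @path = path
  end

  # Stale if the source file changed after the output was generated.
  def needs_update?(generated_at)
    File.mtime(@path) > generated_at
  end
end

class StaticWrapperFilter
  # Depends on nothing that changes, so it never forces a refresh.
  def needs_update?(_generated_at)
    false
  end
end

class CachedNode
  def initialize(filters)
    @filters = filters
  end

  # Regenerate if any single filter says its input has changed.
  def stale?(generated_at)
    @filters.any? { |f| f.needs_update?(generated_at) }
  end
end
```

Each filter answers only for its own input, and the Node just ORs the answers together, so adding a new filter type never touches the caching logic.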

This solution asks: What information does our result depend upon, and how do we know when that information has changed? Such a solution would've been nearly impossible to see if all the generation code were mashed together in one eRuby file or one Amrita method call. Decomposing the generation into a chain of individual content filters made it easy for me to see the nature of the problem, and the solution practically wrote itself.
