Justifying Dauxite
Hey man, don't bogart that content
May 6, 2004
The content filter is an application of the Decorator pattern in the Gang of Four book, also called Pipelines in a recent Pragmatic Programmers article. It's a powerful pattern, but I didn't realize just how powerful until I implemented Dauxite.
Let's look at an extended example, using the page http://fhwang.net/bio.html, which contains a short biographical description. (And a photo that's really out-of-date. Software can't do everything.) In site.xml the relevant XML is:
<site>
  <directory name="">
    <index_html name="index">
      <rss_retriever domain_class="BlogEntry" count="10" />
      <renderer />
    </index_html>
    <file_html name="bio" content_parent="index" />
  </directory>
</site>
In Ruby, bio.html is an instance of FileHtml. Each instance of FileHtml contains these content filters:
1. FileInput —> 2. PageWrapper —> 3. SiteWrapper
- FileInput reads a file containing XHTML and passes that file's contents downstream. This is implemented in Ruby with a simple file reading procedure.
- PageWrapper slaps an XHTML breadcrumb on top. It handles the work of climbing up the content hierarchy using the content_parent attribute of each node; in the case of bio.html, its parent is index.html, which has no parent. This is implemented in Ruby.
- SiteWrapper wraps its input in a mostly static template which contains site-wide information such as the headers, graphic at the top of the page, and nav section on the right side. This simple step is implemented in XSLT using xsltproc.
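To make the chain concrete, here is a minimal sketch of the three filters as decorators, each wrapping the one upstream of it. Only the class names come from Dauxite; the `output` method, the hash-based nodes, and the markup are assumptions for illustration, and the XSLT step is replaced by plain string interpolation.

```ruby
require "tempfile"

# Reads a file containing XHTML and passes its contents downstream.
class FileInput
  def initialize(path)
    @path = path
  end

  def output
    File.read(@path)
  end
end

# Puts a breadcrumb on top of its input by climbing content_parent links.
# Nodes are plain hashes here, which is an assumption made for brevity.
class PageWrapper
  def initialize(upstream, node)
    @upstream = upstream
    @node = node
  end

  def output
    trail = []
    node = @node
    while node
      trail.unshift(node[:name])
      node = node[:content_parent]
    end
    "<div class=\"breadcrumb\">#{trail.join(' &gt; ')}</div>\n" + @upstream.output
  end
end

# Wraps its input in a static site-wide template. The article does this
# step in XSLT via xsltproc; string interpolation stands in for it here.
class SiteWrapper
  def initialize(upstream)
    @upstream = upstream
  end

  def output
    "<html><body>\n<div id=\"nav\">nav</div>\n#{@upstream.output}\n</body></html>"
  end
end

if __FILE__ == $PROGRAM_NAME
  index = { name: "index", content_parent: nil }
  bio   = { name: "bio", content_parent: index }
  Tempfile.create("bio") do |file|
    file.write("<p>A short biography.</p>")
    file.flush
    page = SiteWrapper.new(PageWrapper.new(FileInput.new(file.path), bio))
    puts page.output
  end
end
```

Each filter only knows that the thing upstream responds to `output`, so the order of wrapping is the whole configuration.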
Looking up the content hierarchy, we see that bio's parent is http://fhwang.net/index.html, which contains the 10 most recent blog entries. When this Node is instantiated, it contains a different sequence of content filters:
1. RssRetriever —> 2. Renderer —> 3. SiteWrapper
- RssRetriever pulls rows out of a given MySQL table and uses them to generate an RSS file. In this case, it will pull out the 10 most recent rows in the BlogEntries table. RssRetriever is implemented in Ruby.
- Renderer runs the input through an XSLT file, which in this case is index.xsl. index.xsl loops through the RSS and creates one chunk of XHTML for each <item>.
- As described above, SiteWrapper wraps up the whole thing in site-wide information.
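The two chains differ only in their first two stages, which suggests that a node's behavior is little more than an ordered list of filter classes. The sketch below assumes a lookup table keyed by node type and stub filters that only record their own names; how Dauxite actually assembles its chains is not shown here, so `CHAINS`, `build_chain`, and the `Filter` base class are hypothetical.

```ruby
# Stub filter: a real filter would transform XML, this one only wraps
# its own class name around whatever arrives from upstream.
class Filter
  def initialize(upstream = nil)
    @upstream = upstream
  end

  def output
    input = @upstream ? @upstream.output : ""
    "#{self.class.name}(#{input})"
  end
end

class FileInput < Filter; end
class PageWrapper < Filter; end
class RssRetriever < Filter; end
class Renderer < Filter; end
class SiteWrapper < Filter; end

# Hypothetical table mapping a node type to its filters, first stage first.
CHAINS = {
  "file_html"  => [FileInput, PageWrapper, SiteWrapper],
  "index_html" => [RssRetriever, Renderer, SiteWrapper]
}.freeze

# Folds the list into a chain: each filter wraps the one before it.
def build_chain(node_type)
  CHAINS.fetch(node_type).inject(nil) { |upstream, klass| klass.new(upstream) }
end

puts build_chain("file_html").output   # SiteWrapper(PageWrapper(FileInput()))
puts build_chain("index_html").output  # SiteWrapper(Renderer(RssRetriever()))
```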
This approach has some intriguing consequences:
- You can implement each content filter using whatever approach makes the most sense for the problem at hand. Some of the content filters use Ruby and some delegate the work to XSLT. The node that owns the content filter doesn't care how the processing happens.
- Unit-testing becomes much easier because everything's broken up into small phases. All you do to test a content filter is pass it XML and test the XML that comes out at the other end.
- Reuse becomes as simple as chaining together the right sequence of content filters. Both of the pages described above, for example, have common elements such as the site nav, but index.html doesn't have a breadcrumb. So both pages use SiteWrapper (which adds the site nav) but only bio.html uses PageWrapper (which adds the breadcrumb).
- Any interesting information passed out of a content filter must be reflected in XML, since that's the only information passed between content filters. This might seem quite strict, but in practice I found that it helped to narrow the interface in a way that decreased coupling.
- It's easy to add new sources of input. Pages in Dauxite take their input initially from XHTML files, database tables, and Docbook files, all of which call for different initial content filters. There is even a MemoryInput content filter, which holds a previously processed XML document in memory. (This comes into play for large Docbook files, which are split up in memory and automatically turned into a number of pages.)
- However, when you decompose a problem this way, you end up having to do a lot more naming, and I suspect that this is a difficult task for most programmers. I'm probably better-than-average at this, since I've written so much, but Dauxite still has plenty of naming mistakes in it. "Renderer" is generic, and the "PageWrapper" and "SiteWrapper" aren't distinct enough from each other for my tastes.
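Returning to the unit-testing point in the list above: because a filter only sees what its upstream hands it, a test can feed it a canned document and compare what comes out. The sketch below uses a `MemoryInput` as the canned source. The name comes from Dauxite, but this implementation, the `PageWrapper` signature, and the `check_page_wrapper` helper are assumptions, and a real test would more likely live in a test framework.

```ruby
# Holds an already-built document in memory and passes it downstream.
class MemoryInput
  def initialize(xml)
    @xml = xml
  end

  def output
    @xml
  end
end

# Filter under test: a hypothetical breadcrumb wrapper that takes the
# trail of parent names directly instead of climbing a node hierarchy.
class PageWrapper
  def initialize(upstream, trail)
    @upstream = upstream
    @trail = trail
  end

  def output
    links = @trail.map { |name| "<a href=\"/#{name}.html\">#{name}</a>" }.join(" &gt; ")
    "<div class=\"breadcrumb\">#{links}</div>#{@upstream.output}"
  end
end

# Passes XML in one end and checks the XML that comes out the other.
def check_page_wrapper
  filter = PageWrapper.new(MemoryInput.new("<p>hello</p>"), ["index"])
  expected = '<div class="breadcrumb"><a href="/index.html">index</a></div><p>hello</p>'
  actual = filter.output
  raise "expected #{expected}, got #{actual}" unless actual == expected
  true
end

check_page_wrapper
```

No files, database, or XSLT processor are involved, which is what keeps a test like this small.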
Decomposing a problem along a new axis can shed light on the domain in ways that lead to other solutions. Halfway through programming Dauxite, I decided I needed a caching mechanism because generating the entire tree was becoming time-consuming. The solution was to have each Node ask its content filters whether any of them rely on input that has changed, in which case the Node needs to be regenerated. FileInput, for example, compares the mtime of its data file to the mtime of the last generated copy of the Node's contents. RssRetriever uses the time of the last modified blog entry. SiteWrapper never asks for an update.
This solution asks: What information does our result depend upon, and how do we know when that information has changed? Such a solution would've been nearly impossible to see if all the generation code were mashed together in one eRuby file or one Amrita method call. Decomposing the generation into a chain of individual content filters made it easy for me to see the nature of the problem, and the solution practically wrote itself.
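The staleness check described above can be sketched as one small method per filter plus a one-line query on the Node. The method name `needs_refresh?`, the constructor arguments, and the in-memory `entries` array standing in for database rows are all assumptions; only the per-filter rules come from the description.

```ruby
class FileInput
  def initialize(path)
    @path = path
  end

  # Stale if the data file changed after the page was last generated.
  def needs_refresh?(generated_at)
    File.mtime(@path) > generated_at
  end
end

class RssRetriever
  # entries is a stand-in for rows fetched from the database.
  def initialize(entries)
    @entries = entries
  end

  # Stale if the most recently modified entry is newer than the page.
  def needs_refresh?(generated_at)
    last = @entries.map { |entry| entry[:modified_at] }.max
    !last.nil? && last > generated_at
  end
end

class SiteWrapper
  # The site-wide template never forces a regeneration.
  def needs_refresh?(_generated_at)
    false
  end
end

class Node
  # generated_at is the mtime of the last generated copy, or nil if the
  # page has never been generated.
  def initialize(filters, generated_at)
    @filters = filters
    @generated_at = generated_at
  end

  def needs_refresh?
    return true if @generated_at.nil?
    @filters.any? { |filter| filter.needs_refresh?(@generated_at) }
  end
end

now = Time.now
index = Node.new([RssRetriever.new([{ modified_at: now }]), SiteWrapper.new], now - 60)
puts index.needs_refresh?  # true: an entry changed after the last generation
```

The Node never needs to know where a filter's input lives; it only collects yes-or-no answers.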