fhwang.net

Justifying Dauxite

May 6, 2004


So I wrote Dauxite. When I started I had a couple of design principles in mind:

  1. Logic and presentation should be separate in a way that pleases both my inner OO zealot and my inner metrosexual.
  2. When processing page content, Dauxite should spend most of its energy on creating and transforming XML, as opposed to manipulating in-memory data structures that get turned into XHTML at the last minute.

Since Dauxite is mostly intended for my own use, I'm allowed some major drawbacks:

  1. The program will work by pre-publishing static pages, so it can be slow.
  2. The site has no CGI, so there's no need to worry about responding to user input.

(Currently I have no plans to release Dauxite, 'cause it's still sort of messy, I have my hands full maintaining Lafcadio, and I'm not crazy about the idea of releasing unmaintained code.)

As with previous attempts, Dauxite starts by loading a single XML file which describes every resource on the site. The actual site contains about 100 individual pages, so what's below is just a sample of the file, which is called site.xml.

<site>
  <directory name="">
    <index_html name="index">
      <rss_retriever domain_class="BlogEntry" count="10" />
      <renderer />
    </index_html>
    <directory name="rss">
      <rss name="latest">
        <rss_retriever domain_class="BlogEntry" count="10" />
        <renderer name="rss_strip" />
      </rss>
    </directory>
    <file_html name="bio" content_parent="index" />
    <directory name="art">
      <file_html name="art" content_parent="index" file_name="index" />
      <file_html name="analog" content_parent="art" />
      <file_html name="a1" content_parent="analog" />
      <file_html name="a2" content_parent="analog" />
      <file_html name="a3" content_parent="analog" />
      <plain_text name="mokovaDump"><file_input /></plain_text>
    </directory>
  </directory>
</site>

You can add new pages to the site by adding elements like <rss> or <file_html> to site.xml. These elements are represented in Ruby as Nodes, which are responsible for knowing things like their file's path on the web server, and what their content parents are. (Taken together, all the content_parent attributes describe a hierarchy which is used when generating each page's breadcrumb.)
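To make the breadcrumb idea concrete, here is a minimal Ruby sketch of walking content_parent links up to the root. It is not Dauxite's actual code: the Node struct and the index_by_name and breadcrumb helpers are hypothetical names invented for illustration, and the sketch assumes page names are unique across the site.

```ruby
# Hypothetical stand-in for a Dauxite node: only the two attributes
# needed to build a breadcrumb are modeled here.
Node = Struct.new(:name, :content_parent)

# Index nodes by name so a content_parent attribute can be resolved.
# Assumes names are unique; a later duplicate would overwrite an earlier one.
def index_by_name(nodes)
  nodes.each_with_object({}) { |node, by_name| by_name[node.name] = node }
end

# Walk content_parent links from a node up to the root and return the
# page names ordered from root to leaf.
def breadcrumb(node, by_name)
  trail = []
  seen = {}
  while node
    raise ArgumentError, "content_parent cycle at #{node.name}" if seen[node.name]
    seen[node.name] = true
    trail.unshift(node.name)
    node = by_name[node.content_parent]
  end
  trail
end

# A few of the pages from the site.xml sample above.
nodes = [
  Node.new("index", nil),
  Node.new("bio", "index"),
  Node.new("art", "index"),
  Node.new("analog", "art"),
  Node.new("a1", "analog")
]
by_name = index_by_name(nodes)
puts breadcrumb(by_name["a1"], by_name).join(" > ")
```

A page with no content_parent, like index, is treated as the root, so its breadcrumb is just its own name.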

Some of the node elements contain extra elements like <rss_retriever> or <file_input>. These are represented in Ruby as ContentFilters, which are the building blocks of HTML generation in Dauxite. To render its content, a node chains its content filters end-to-end. Each content filter receives XML input from the previous content filter (except for the first, which has to start from scratch), manipulates the XML, and then passes it along.
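The chaining described above can be sketched in a few lines of Ruby. This is an illustration under assumptions, not Dauxite's real interface: the call method, the render helper, and the XML each filter emits are all made up for the example, and this FileInput takes its content as a string instead of reading a file so the sketch stays self-contained.

```ruby
# Hypothetical content filters. Each one responds to call(xml) and returns
# a new XML string; XML is kept as plain strings to avoid any dependencies.

# A first-in-chain filter: it ignores its input and starts from scratch.
class FileInput
  def initialize(body)
    @body = body
  end

  def call(_xml)
    "<content>#{@body}</content>"
  end
end

# A later filter: it wraps whatever XML it receives in a page skeleton.
class Renderer
  def call(xml)
    "<html><body>#{xml}</body></html>"
  end
end

# Chain the filters end-to-end. The first filter receives nil, and every
# other filter receives the previous filter's output.
def render(filters)
  filters.inject(nil) { |xml, filter| filter.call(xml) }
end

puts render([FileInput.new("<p>Hello</p>"), Renderer.new])
```

Because every filter has the same one-argument shape, adding a step to a page is just a matter of adding another object to the list.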

site.xml actually hides a lot of its content filter data, because many of Node's subclasses pre-define common combinations of nodes and content filters. For example,

<file_html name="a3" content_parent="analog" />

is the same as

<html name="a3" content_parent="analog"><file_input /></html>
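One way such a shorthand could work in Ruby is for a subclass to supply filters that get prepended to whatever the element declares. The class names HtmlNode and FileHtmlNode, the default_filters hook, and the use of symbols to stand in for filter objects are all assumptions made for this sketch, not Dauxite's actual design.

```ruby
# Hypothetical node for an <html> element: it only has the content
# filters that are passed to it explicitly.
class HtmlNode
  attr_reader :name, :content_parent, :filters

  def initialize(name, content_parent: nil, filters: [])
    @name = name
    @content_parent = content_parent
    # Subclass defaults come first, then any filters declared on the element.
    @filters = default_filters + filters
  end

  # Plain <html> nodes pre-define nothing.
  def default_filters
    []
  end
end

# Hypothetical node for <file_html>: the same as <html>, but with a
# file_input filter built in.
class FileHtmlNode < HtmlNode
  def default_filters
    [:file_input]
  end
end

short = FileHtmlNode.new("a3", content_parent: "analog")
long  = HtmlNode.new("a3", content_parent: "analog", filters: [:file_input])
puts short.filters == long.filters
```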

site.xml is a little messy, because it isn't screamingly obvious which XML elements are Nodes and which are ContentFilters. (Content filter elements can't contain other elements, but directories can, and directories are nodes.) If this format were designed by a standards committee it might look like

<node type="directory" name="rss">
  <node type="rss" name="latest">
    <content_filter type="rss_retriever" domain_class="BlogEntry" count="10" />
    <content_filter type="renderer" name="rss_strip" />
  </node>
</node>

but I figure, hey, we're all adults here.
