[uf-discuss] Storing Microformats

Christopher St John ckstjohn at gmail.com
Tue Sep 18 07:21:14 PDT 2007

On 9/18/07, ken <gebser at speakeasy.net> wrote:
> The problem with *not* storing the original markup is that, if there are
> changes in the "standards" (which it seems there certainly will be), you
> won't know which of your data need to be changed.

If there are changes in standards/conventions, then you'd probably
want to re-scan the originals in any case. Caching the originals may
or may not be a good optimization strategy, but the basic idea is that
the semantic content is what's important. If I were doing an experiment
along these lines (as opposed to, say, writing a whole-Internet scale
production system) I'd either:

 a) Have some fun and check out an RDF data store. Flexible, fun,
   and lots of relatively unproven (and therefore interesting) technology.

 b) Be super corporate about it and come up with a relational schema
   that matched the _semantics_ (not the markup) of each microformat,
   then publish the schema to the group for others to use. Kinda brittle
   in the face of standards changes, but it should be clear how to proceed
   and people would probably find the research useful.
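As a rough illustration of option (b), here is a minimal Python/sqlite3 sketch of what a semantics-first table layout for hCard might look like. It covers only a few hCard properties (fn, org, email, url). The table names, column names and the replace_cards helper are my own assumptions, not a published schema. Repeatable properties like email and url go in child tables. Each card keeps the URL it came from, so it can be re-extracted if the format changes.

```python
import sqlite3

# Hypothetical schema covering a small subset of hCard semantics.
# Table and column names are illustrative assumptions, not a standard.
SCHEMA = """
CREATE TABLE IF NOT EXISTS hcard (
    id         INTEGER PRIMARY KEY,
    source_url TEXT NOT NULL,  -- page the card was parsed from, kept for re-scanning
    fn         TEXT NOT NULL,  -- formatted name, required by hCard
    org        TEXT
);
CREATE TABLE IF NOT EXISTS hcard_email (
    hcard_id INTEGER NOT NULL REFERENCES hcard(id),
    email    TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS hcard_url (
    hcard_id INTEGER NOT NULL REFERENCES hcard(id),
    url      TEXT NOT NULL
);
"""


def store_card(conn, source_url, card):
    """Insert one parsed card (a dict of semantic properties) and return its row id."""
    cur = conn.execute(
        "INSERT INTO hcard (source_url, fn, org) VALUES (?, ?, ?)",
        (source_url, card["fn"], card.get("org")),
    )
    card_id = cur.lastrowid
    # email and url may appear more than once in an hCard, so they get child tables.
    conn.executemany(
        "INSERT INTO hcard_email (hcard_id, email) VALUES (?, ?)",
        [(card_id, e) for e in card.get("email", [])],
    )
    conn.executemany(
        "INSERT INTO hcard_url (hcard_id, url) VALUES (?, ?)",
        [(card_id, u) for u in card.get("url", [])],
    )
    return card_id


def replace_cards(conn, source_url, cards):
    """Hypothetical helper: drop everything extracted from source_url, store a fresh scan."""
    old_ids = "SELECT id FROM hcard WHERE source_url = ?"  # constant SQL, not user input
    conn.execute(f"DELETE FROM hcard_email WHERE hcard_id IN ({old_ids})", (source_url,))
    conn.execute(f"DELETE FROM hcard_url WHERE hcard_id IN ({old_ids})", (source_url,))
    conn.execute("DELETE FROM hcard WHERE source_url = ?", (source_url,))
    return [store_card(conn, source_url, c) for c in cards]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    replace_cards(conn, "http://example.com/about",
                  [{"fn": "Jane Doe", "org": "Example Co", "email": ["jane@example.com"]}])
    print(conn.execute("SELECT fn, org FROM hcard").fetchall())
```

The replace_cards helper reflects the re-scan idea above. When a convention changes, you re-parse the original page and overwrite everything derived from it, rather than patching stored rows.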

(a) is one fairly likely future, (b) is clearly the present, and it all sounds
like an enjoyable way to spend some free time.
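For contrast, option (a) stores facts as subject/predicate/object triples instead of fixed columns. A real RDF store adds persistence, indexing and a query language such as SPARQL. The toy Python sketch below shows only the data model. Its predicate namespace (http://example.org/hcard#) and the TripleSet class are made-up assumptions, not any RDF library's API.

```python
from collections import namedtuple

# Hypothetical predicate namespace, chosen only for illustration.
EX = "http://example.org/hcard#"

Triple = namedtuple("Triple", ["s", "p", "o"])


class TripleSet:
    """Toy in-memory triple store: no persistence, indexing, or query language."""

    def __init__(self):
        self._triples = set()

    def add(self, s, p, o):
        self._triples.add(Triple(s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t.s == s)
            and (p is None or t.p == p)
            and (o is None or t.o == o)
        )


if __name__ == "__main__":
    store = TripleSet()
    store.add("_:card1", EX + "fn", "Jane Doe")
    store.add("_:card1", EX + "email", "jane@example.com")
    # A property nobody planned for needs no schema change, just another triple.
    store.add("_:card1", EX + "nickname", "JD")
    for t in store.match(s="_:card1"):
        print(t.p, t.o)
```

With no fixed schema, a property added by a later revision of a format is just another triple. That is part of why (a) feels more flexible and (b) more brittle.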

If you're doing a real, live, highly scalable production system, then
never mind the above, you've got a whole different set of problems :-)


Christopher St. John
