[microformats-discuss] FYI: two posting about the Semantic Web, the "SynWeb", scraping and microformats

Tue Oct 25 04:23:20 PDT 2005

On 10/25/05, Ian Hickson <ian at hixie.ch> wrote:
> On Tue, 25 Oct 2005, Danny Ayers wrote:
> > >
> > > RDF is optimised for computer consumption. HTML is optimised for human
> > > consumption. Therefore we should be using HTML.
> >
> > Sorry, that doesn't work. RDF is optimised for modelling things and
> > relationships between them. HTML is optimised for rendering documents.
>
> Humans don't model things and the relationships between them, generally.
> I'm not sure what you mean by "rendering documents". HTML is not about
> rendering, it's about semantics, and at this point is no longer
> particularly limited to documents either (applications are often done in
> HTML, for instance).

Ok, I was a bit hasty in my use of "rendering". On second thoughts,
I'll accept your distinction:

> > > RDF is optimised for computer consumption. HTML is optimised for human
> > > consumption.

But whether we use RDF or HTML depends on what we're doing. HTML is
good at representing document structures, but pretty hopeless at
expressing highly interconnected data structures. With RDF it's the
other way around.

> If you want to make the data available in a more "raw" form (to enable you
> to make custom queries against the data in a way that a form wouldn't let
> you) then you're going to have to solve two problems:
>
>  1. How to present the raw data to the user in a way that is at least as
>     understandable as the current form approach, and
>
>  2. How to allow you to run these custom queries on the data in a
>     reliable, standard way that isn't site-specific and doesn't require
>     you to download the entire data set (since there could be gigabytes
>     or even terabytes of it).

Ok, this seems a reasonable request.

> Neither of these requirements says anything about the data model, format,
> or syntax; those are all quite secondary.

Ok, the specifics are secondary, but with current tools you would need
some kind of data model to be able to formulate queries. To
interact with the user, a way of managing the presentation will be
needed; all arrows point to a markup language.

> Both of these problems _must_ be
> solved before anyone will consider moving to something other than HTML
> forms, IMHO, because otherwise they won't see a compelling reason to move.
> They must have a compelling reason to move because it will be a very
> expensive process to do so.

For 1, I don't see any immediate need for moving beyond HTML forms (at
least not far beyond). But forms in a browser are the presentation
layer; where I believe current systems could be improved is behind the
scenes.

For 2, the SPARQL protocol and query language is one option. It's a
late arrival on the RDF scene, but couldn't really have appeared until
the data modelling had been figured out, to provide something to
query. But SPARQL isn't the only option. Atom seems to be set to take
a place here as well. But once again, its role isn't directly in front
of the user, it's more of a containership system.
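
As a rough illustration of what querying against a data model means
here, the sketch below matches triple patterns against a small set of
triples, which is the basic idea underneath a SPARQL graph pattern. It
is not SPARQL syntax and not any real toolkit's API: the example.org
URIs, the short property names, and the functions is_var,
match_pattern and query are all made up for the example.

```python
# Hypothetical sketch: triples are plain (subject, predicate, object) tuples.
# The example.org URIs and the property names are invented for illustration.
GRAPH = {
    ("http://example.org/danny", "knows", "http://example.org/ian"),
    ("http://example.org/danny", "name", "Danny"),
    ("http://example.org/ian", "name", "Ian"),
}


def is_var(term):
    """Terms starting with '?' are treated as variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(graph, pattern, binding):
    """Yield every extension of `binding` that makes `pattern` match a triple."""
    for triple in graph:
        new = dict(binding)
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:
            yield new


def query(graph, patterns):
    """Join several triple patterns and return a list of variable bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings
                    for b2 in match_pattern(graph, pattern, b)]
    return bindings


# "The names of everyone Danny knows", expressed as two joined patterns.
results = query(GRAPH, [
    ("http://example.org/danny", "knows", "?who"),
    ("?who", "name", "?name"),
])
names = sorted(r["?name"] for r in results)
```

The query is written against the shape of the data rather than against
a particular site's form, which is the property problem 2 asks for; a
real service would also need the protocol side so that the whole data
set need not be downloaded.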

> (If you want the data available in a form other than HTML forms for a
> reason _other_ than being able to query it more effectively, then please
> explain the use case, as I don't understand it yet.)

I think you're probably right, querying is probably *the* big reason
for putting data in anything other than document-oriented markup.
Storage and processing go hand in hand with querying. There's also
machine-machine interchange, which could also be seen as a special
case of querying.

My personal use case (generalised) is that I want to be able to do
data processing with material from a variety of sources in a uniform
fashion. I could use an RDBMS for this, I could use an object-oriented
programming language. But neither of these on its own allows me to
work uniformly on the data without having to adjust the table or object
definitions every time I want to support a new kind of data. RDF
provides flexibility, and the use of URIs means it fits nicely over
data sourced over the Web. The tools I'm currently using are actually
based on an RDBMS and OO language (Redland toolkit with MySQL backend,
coding in Python), but having the RDF model available makes life a lot
easier.
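
A minimal sketch of that flexibility, assuming nothing more than a set
of (subject, predicate, object) tuples in plain Python. It does not use
the Redland API; the example.org URIs, the property names, and the
helpers add, describe and subjects_with are hypothetical. The point is
only that a new kind of data arrives as more triples, with no table or
class definition to change.

```python
from collections import defaultdict

# Hypothetical in-memory store: one set of (subject, predicate, object) tuples.
graph = set()


def add(subject, predicate, obj):
    """Record one statement about a resource identified by a URI."""
    graph.add((subject, predicate, obj))


# Source 1: a weblog post (URIs and property names invented for the example).
add("http://example.org/post/1", "title", "SynWeb notes")
add("http://example.org/post/1", "creator", "http://example.org/danny")

# Source 2: an event, a kind of data the store has not seen before.
# Nothing is redefined; it is just more triples.
add("http://example.org/event/7", "title", "Microformats meetup")
add("http://example.org/event/7", "location", "London")


def describe(subject):
    """Collect everything known about one resource, whatever its kind."""
    props = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            props[p].add(o)
    return dict(props)


def subjects_with(predicate):
    """Find every resource that has the given property, across all sources."""
    return sorted({s for s, p, o in graph if p == predicate})


# The same code handles posts and events uniformly.
titled = subjects_with("title")
```

Here `titled` contains both the post and the event, even though the
event was never declared anywhere as a type.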

But there's no actual conflict here between what you're saying about
HTML and the approach I'm taking. The majority of the input to my
system is HTML (though some of that's wrapped in RSS/Atom), and the
primary human interface with the system is HTML-based. The base data
model is RDF, with mappings from domain-specific models.

Cheers,
Danny.

--

http://dannyayers.com