[microformats-discuss] FYI: two posting about the Semantic Web, the "SynWeb", scraping and microformats

Mon Oct 24 19:46:25 PDT 2005

On 10/25/05, Dr. Ernie Prabhakar <drernie at opendarwin.org> wrote:
> Hi Danny,
>
> On Oct 24, 2005, at 5:24 PM, Danny Ayers wrote:
> > Going back a few sentences, you talk of "human-readable web pages" as
> > if that's the only source or sink of data, with no intermediation, no
> > computer work allowed. That's the cultural difference here. A lot of
> > information can be represented in a machine-friendly fashion.
>
> Actually, I meant *precisely* the opposite. :-)  Perhaps that is the
> disconnect.
>
> I believe the *intermediate* files are the ones that need to be
> HTML.  That is, my vision of a microformatted future is one where all
> *published* data on the web is in 'salted' XHTML Basic, no matter
> where it came from or where it ends up.

Ah, ok. It'll probably take a couple more cycles before we're on the
same wavelength, but we're getting there ;-)

At this point in time I suspect there has been way too much concern
amongst RDF people about the interchange format. Which is a bit weird,
considering that the thing of value there is the logical model, not
its serialization. In their defence, there are practical issues to
consider, like if you want to pass rich data from one place to
another, how do you do it?  Salted (good word) XHTML may be suitable
for relatively simply structured data, but if you've got something
corresponding to, let's say, ten different relational database tables,
with ten fields each, lots of foreign keys scattered, things are going
to get messy. I'm sure it's possible in principle, but for
machine-machine communications, what is the advantage in using a
format that's been designed for human consumption?

> > Take a train timetable. Would you prefer 1000 human-readable HTML
> > pages detailing the journeys, or just a form with fields for start and
> > destination, a machine to do the searching for you?
>
> Oh, I'd love to have a machine do the searching -- but please, give
> me the results as XHTML table with meaningful class names, so I can
> write an Automator action to run it interactively.  Without having to
> wait for someone to create a *separate* web service, which I'd have
> to learn Yet Another Schema to use.

Right, with that I can sympathise.

> Maybe I'm ill-informed:  my understanding is that the non-microformat
> vision of the Semantic Web was predicated on encoding data in a non-
> HTML, non-human-visible format in order to enable machine parsing.
> Did you mean something else?

Yes and no. There is the issue of shifting complex data structures
around to consider. But the important bits are away from the formats
altogether. A relational model of data that uses URIs as keys is the
bottom line. Doesn't matter what that looks like as a file format, as
long as the URIs are in there somewhere.
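
To make "a relational model of data that uses URIs as keys" a bit more
concrete, here is a minimal sketch in Python. It is a toy, not how any
particular RDF store works: the example.org URIs, the TRIPLES set and the
helper names match() and names_of_friends() are all made up for
illustration. Only the FOAF namespace and its "name" and "knows" terms
are real.

```python
# Toy model: every statement is a (subject, predicate, object) row, and
# subjects and predicates are URIs. Nothing here depends on a file format.
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data; the example.org URIs are placeholders.
TRIPLES = {
    ("http://example.org/people#a", FOAF + "name", "Alice"),
    ("http://example.org/people#a", FOAF + "knows", "http://example.org/people#b"),
    ("http://example.org/people#b", FOAF + "name", "Bob"),
}


def match(s=None, p=None, o=None):
    """Return all triples that agree with the given parts (None = wildcard)."""
    return sorted(
        t for t in TRIPLES
        if s in (None, t[0]) and p in (None, t[1]) and o in (None, t[2])
    )


def names_of_friends(person_uri):
    """Follow foaf:knows, then foaf:name: a join made purely on URI keys."""
    names = []
    for _, _, friend in match(s=person_uri, p=FOAF + "knows"):
        for _, _, name in match(s=friend, p=FOAF + "name"):
            names.append(name)
    return names


print(names_of_friends("http://example.org/people#a"))
```

The join in names_of_friends() never looks at how the rows arrived. They
could have come from RDF/XML, from salted XHTML, or from anything else
that carries the URIs.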

Yep, I think there may have been paranoia about dumbing things down in the
RDF community, that expressing something so precise as *data* couldn't
be reliably done in something so un-semantic as HTML. But microformats
demonstrate that it can. In the general case I suspect a pug-ugly
format like RDF/XML probably is necessary, certainly convenient for
inter-system comms, where any number of terms might be involved.
Whatever, right now the specs say that RDF/XML is *the* interchange
format. An RDF system will be expected to be able to read and write
the stuff, if nothing else. (It is a very artificial W3C thing; the
Redland RDF kit understands 'tag soup' RSS and has near-arbitrary XML
reading as a feature.)

But to make unambiguous, machine-interpretable statements within a
fairly constrained domain (business cards, calendars, reviews)
something like HTML is absolutely viable. The mime type is a bit loose
as a clue for what the machine should do, but referring to a URI as
the profile removes enough ambiguity that the whole thing is
automatable.
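
As a rough sketch of that last point, the Python below only trusts the
class names in a page if the head element declares a known profile URI.
Assumptions, all mine: the profile URI is a placeholder on example.org
(not the real hCard profile), the page is well-formed XHTML so an XML
parser can read it, and the FIELDS set and the extract_card() name are
hypothetical. The class names fn, org and url are taken from hCard.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
PROFILE = "http://example.org/profiles/hcard"  # placeholder profile URI
FIELDS = {"fn", "org", "url"}                  # a few hCard class names

# Hypothetical page; the person and organisation are invented.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head profile="http://example.org/profiles/hcard"><title>Contact</title></head>
<body>
<div class="vcard">
  <span class="fn">Jane Example</span>,
  <span class="org">Example Org</span>,
  <a class="url" href="http://example.org/jane">home page</a>
</div>
</body></html>"""


def extract_card(doc):
    """Return a dict of card fields, or None if the profile is not declared."""
    root = ET.fromstring(doc)
    head = root.find(XHTML + "head")
    declared = (head.get("profile") or "").split() if head is not None else []
    if PROFILE not in declared:
        return None  # no profile, so the class names carry no agreed meaning
    card = {}
    for el in root.iter():
        for cls in (el.get("class") or "").split():
            if cls in FIELDS:
                # For links the value is the href; otherwise the text content.
                card[cls] = el.get("href") or "".join(el.itertext()).strip()
    return card


print(extract_card(PAGE))
```

Without the profile check, class="fn" is just a styling hook. With it,
the same markup is something a machine can act on.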

With RDF it's taken a long while to get past a fairly procedural style
of parsing and serialization, but now GRDDL (XSLT to something the
system can read) provides a neatly stylable kind of input - to the
machine, microformats mean RDF. On the output side there's SPARQL,
effectively a two-phase thing: the SQL-like query grabs the data of
interest and pumps it out in a specific XML format. But that format isn't
much immediate use, so usually you'd pass that through XSLT to get
something like, well, maybe microformat data.
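
A sketch of that second phase, with Python's ElementTree standing in for
the XSLT step. The input is a hand-written sample in the SPARQL Query
Results XML Format; the variable names, the person and the
results_to_hcards() function are assumptions for illustration, and no
real query engine is involved.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

# Hand-written sample of what a query along the lines of
#   SELECT ?name ?homepage WHERE { ?p foaf:name ?name ; foaf:homepage ?homepage }
# might return. The data is invented.
RESULTS = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="name"/><variable name="homepage"/></head>
  <results>
    <result>
      <binding name="name"><literal>Jane Example</literal></binding>
      <binding name="homepage"><uri>http://example.org/jane</uri></binding>
    </result>
  </results>
</sparql>"""


def results_to_hcards(xml_text):
    """Turn each result row into a small hCard-style fragment."""
    root = ET.fromstring(xml_text)
    container = ET.Element("div")
    for result in root.findall("sr:results/sr:result", NS):
        # Map variable name -> bound value for this row.
        row = {
            b.get("name"): "".join(b.itertext()).strip()
            for b in result.findall("sr:binding", NS)
        }
        card = ET.SubElement(container, "div", {"class": "vcard"})
        link = ET.SubElement(card, "a", {"class": "url fn", "href": row["homepage"]})
        link.text = row["name"]
    return ET.tostring(container, encoding="unicode")


print(results_to_hcards(RESULTS))
```

An XSLT stylesheet would do the same job declaratively. The point is
only that the results format is regular enough for the mapping to
microformat markup to be mechanical.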

> To me, the beauty of microformats is that there is *one* data format
> and interface usable by *both* humans and machines -- but the
> machines have to work harder than we do. :-)

Yep. That bit is very appealing. I don't know though, I sit here
typing away, doesn't really seem like the machine's doing much at
all...

Cheers,
Danny.

--

http://dannyayers.com