[microformats-discuss] FYI: two posting about the Semantic Web the "SynWeb", scraping and microformats

Ian Hickson ian at hixie.ch
Tue Oct 25 11:14:37 PDT 2005


On Tue, 25 Oct 2005, Danny Ayers wrote:
> On 10/25/05, Ian Hickson <ian at hixie.ch> wrote:
> 
> > IMHO, that's the mistake. There doesn't need to be a relational model 
> > of data using URIs as keys.
> 
> So, how will HTTP work without URIs? How will hypertext work without 
> links? The Web is already a graph-shaped data structure (hence the name) 
> with relations between the nodes.

Sure. And if you do usability studies you find that people have trouble 
with the "URI" part of that already.

What I was referring to was the "using URIs as keys" part. People don't do 
that today, not knowingly. They bookmark things, they search for things, 
they write company names followed by the magical incantation ".com".
They don't know they are using URIs; those are an implementation detail 
which could be replaced without changing the way people use the Web.


> I'm fed up with having to use a myriad of different applications with 
> limited interoperability between them. As you say, sometimes it does 
> make sense to help the computer a bit. I personally think that the 
> easiest way of doing that is to build on Web components - by which I 
> mean primarily the naming scheme, URIs.

IMHO in that case you are fixating on something that is incidental.

It's like saying "I'm fed up with sleeping in the rain. It makes sense to 
prevent the rain from falling on my living space. I think the easiest way 
of doing that is to build a roof - by which I mean primarily the adhesive,
nails." Of course you can build a roof without nails. You could use
screws. Equivalently, you could build a coherent Web without URIs. You 
could use GUIDs with a central dispatch server, for instance.
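
A minimal sketch of that alternative, assuming a hypothetical in-memory
registry standing in for the central dispatch server. The class and method
names are illustrative only and do not describe any real system:

```python
import uuid


class DispatchServer:
    """Hypothetical central registry mapping opaque GUIDs to resources."""

    def __init__(self):
        self._resources = {}

    def register(self, content):
        # Mint an opaque identifier; unlike a URI, it says nothing about
        # where the resource lives or what it is called.
        guid = uuid.uuid4()
        self._resources[guid] = content
        return guid

    def resolve(self, guid):
        # Every lookup goes through the one central server.
        return self._resources[guid]


server = DispatchServer()
timetable = server.register("Bus timetable")
# A "link" is simply another resource that stores a GUID.
gym_page = server.register({"title": "Gym", "see_also": timetable})

linked = server.resolve(server.resolve(gym_page)["see_also"])
print(linked)
```

The graph of links survives intact; only the key format and the lookup
mechanism differ from URIs over HTTP.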

I don't think the URI part (or the RDF part) is relevant to the discussion 
of trying to figure out how to make data on the Web more accessible.


On Tue, 25 Oct 2005, Danny Ayers wrote:
> > > > 
> > > > RDF is optimised for computer consumption. HTML is optimised for 
> > > > human consumption.
> 
> But whether we use RDF or HTML depends on what we're doing.  HTML is 
> good at representing document structures, but pretty hopeless at 
> expressing highly interconnected data structures. With RDF it's the 
> other way around.

I disagree that RDF is good at expressing highly interconnected data 
structures. Given a blob of RDF it takes me HOURS to determine what on 
earth it is expressing. In fact I usually have to use an RDF grapher or 
some other computer tool to do it. (Regardless of what serialisation it
uses, be it RDF/XML, N3, RDF/A, or anything else.)

That, IMHO, is why RDF has not been a huge success on the Web.


> > If you want to make the data available in a more "raw" form (to enable 
> > you to make custom queries against the data in a way that a form 
> > wouldn't let you) then you're going to have to solve two problems:
> >
> >  1. How to present the raw data to the user in a way that is at least as
> >     understandable as the current form approach, and
> >
> >  2. How to allow you to run these custom queries on the data in a
> >     reliable, standard way that isn't site-specific and doesn't require
> >     you to download the entire data set (since there could be gigabytes
> >     or even terabytes of it).
> > 
> > Neither of these requirements says anything about the data model, 
> > format, or syntax; those are all quite secondary.
> 
> Ok, the specifics are secondary, but with current tools you would need 
> some kind of data model to be able to formulate queries. To interoperate 
> with the user, a way of managing the presentation will be needed; all
> arrows point to a markup language.

I don't know what you mean.

The idea is that you can go to a site, and the UA will know how to present 
that data, without any hint from the author.

HTML does this, for instance. You can go to any well-written HTML page and 
without a stylesheet or anything, get readable output.
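
A rough sketch of what "readable output without a stylesheet" can mean,
assuming a toy text renderer with a few built-in default rules. The
DefaultRenderer class, its rule set, and the sample markup are all
hypothetical and far simpler than what a real UA does:

```python
from html.parser import HTMLParser


class DefaultRenderer(HTMLParser):
    """Toy renderer: built-in rules per element, no author-supplied styling."""

    BLOCK = {"p", "h1", "h2", "li", "div"}

    def __init__(self):
        super().__init__()
        self.lines = []
        self._buf = []
        self._prefix = ""

    def _flush(self):
        # Collapse whitespace and emit the buffered text as one line.
        text = " ".join("".join(self._buf).split())
        if text:
            self.lines.append(self._prefix + text)
        self._buf = []
        self._prefix = ""

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK:
            self._flush()
            # Default presentation chosen by the renderer, not the author.
            if tag == "h1":
                self._prefix = "# "
            elif tag == "li":
                self._prefix = "* "

    def handle_endtag(self, tag):
        if tag in self.BLOCK:
            self._flush()

    def handle_data(self, data):
        self._buf.append(data)


def render(html):
    renderer = DefaultRenderer()
    renderer.feed(html)
    renderer.close()
    renderer._flush()
    return "\n".join(renderer.lines)


page = "<h1>Buses</h1><p>To the <em>gym</em></p><ul><li>14:00</li><li>14:15</li></ul>"
print(render(page))
```

The markup only says what each piece is; every presentation decision lives
in the renderer.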


> For 1, I don't see any immediate need for moving beyond HTML forms (at 
> least not far beyond). But forms in a browser are the presentation
> layer; where I believe current systems could be improved is behind the
> scenes.

If you use forms for this, then you haven't exposed the data layer, which 
is what you were asking for. Thus forms don't solve 1 above.


> For 2, the SPARQL protocol and query language is one option. It's a late 
> arrival on the RDF scene, but couldn't really have appeared until the 
> data modelling had been figured out, to provide something to query. But 
> SPARQL isn't the only option. Atom seems to be set to take a place here 
> as well. But once again, its role isn't directly in front of the user, 
> it's more of a containership system.

The SPARQL protocol could be an option (I don't see how Atom could do it). 
But that's just half the solution -- you need to be able to present this 
to the user. SPARQL doesn't seem to have a clean way for the UA to 
determine what UI to expose, for instance.
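
For the query half, a minimal sketch of how a client might form a SPARQL
protocol request, assuming the query text travels in a "query" parameter of
an HTTP GET. The endpoint URL, the ex: vocabulary, and the
build_sparql_request helper are hypothetical, and no request is actually
sent:

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

# Hypothetical vocabulary; only the matching rows would come back,
# not the whole data set.
QUERY = """
PREFIX ex: <http://example.org/transit#>
SELECT ?departure
WHERE { ?trip ex:destination "Gym" ; ex:departure ?departure . }
LIMIT 5
"""


def build_sparql_request(endpoint, query):
    # Percent-encode the query text and attach it as the "query" parameter.
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_request(ENDPOINT, QUERY)
print(url)
```

Nothing in the request or the query tells the UA what interface to show the
user, which is the missing half.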

(Also, SPARQL seems unbelievably over-engineered.)


> My personal use case (generalised) is that I want to be able to do data 
> processing with material from a variety of sources in a uniform fashion. 
> I could use an RDBMS for this, I could use an object-oriented 
> programming language. But neither of these on its own allows me to work
> uniformly on the data without having to adjust the table or object
> definitions every time I want to support a new kind of data. RDF 
> provides flexibility, and the use of URIs means it fits nicely over data 
> sourced over the Web. The tools I'm currently using are actually based 
> on an RDBMS and OO language (Redland toolkit with MySQL backend, coding
> in Python), but having the RDF model available makes life a lot easier.

Easier for you, possibly. My mum doesn't want to learn anything about RDF 
models. She just wants to find out if her friend is online or whether the 
bus to get her to the gym leaves at 14:00 or 14:15.

Before the servers will provide arbitrary querying, you need to find a way 
for the user experience to improve over what is currently available. (What 
is currently available being specialised search forms per data set, with 
the data not directly available to query.)

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

