[microformats-discuss] FYI: two posting about the Semantic Web,
the "SynWeb", scraping and microformats
danny.ayers at gmail.com
Mon Oct 24 18:49:38 PDT 2005
On 10/25/05, Ian Hickson <ian at hixie.ch> wrote:
> On Tue, 25 Oct 2005, Danny Ayers wrote:
> > Take a train timetable. Would you prefer 1000 human-readable HTML
> > pages detailing the journeys, or just a form with fields for start and
> > destination, a machine to do the searching for you?
> The former, with computers able to determine what that data means and make
> use of it.
Right yes, and free around-the-world tickets.
> Similar question:
> Business addresses. Would you prefer 10,000,000 human-readable Web pages
> with business names, addresses, etc, or just a form with fields for
> company name and a machine to do the searching for you?
> The former is what we have today. Google Local and other search providers
> have proved that you don't need anything more to get high-quality
> geographic data out of those 10,000,000 pages. (i.e. they can make the
> second from the first.)
In many cases it is possible to turn HTML-expressed, human-readable
data into indexed, machine-readable data, sure. Hopefully initiatives
like microformats will take a lot of the guesswork out and move us from
scraping to parsing.
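
To make "parsing rather than scraping" a bit more concrete, here is a
minimal sketch, assuming a page marks up its business listings with
hCard class names (vcard, fn, org, street-address, locality,
postal-code). The HCardParser class and parse_hcards function are
made-up names for illustration, the markup is assumed to be
well-formed, and nested vcards are not handled.

```python
from html.parser import HTMLParser

# A small subset of hCard property class names, enough for this sketch.
HCARD_PROPS = {"fn", "org", "street-address", "locality", "postal-code"}
# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class HCardParser(HTMLParser):
    """Hypothetical helper: collects hCard properties by class name."""

    def __init__(self):
        super().__init__()
        self.cards = []   # one dict per class="vcard" element found
        self._stack = []  # hCard properties of each open element inside a vcard

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        if self._stack:
            # Inside a vcard: remember which properties this element names.
            self._stack.append(tuple(sorted(classes & HCARD_PROPS)))
        elif "vcard" in classes:
            # Entering a new vcard: start a fresh record.
            self.cards.append({})
            self._stack.append(())

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text is credited to the nearest enclosing element that names a property.
        for props in reversed(self._stack):
            if props:
                card = self.cards[-1]
                for prop in props:
                    joined = card.get(prop, "") + " " + data
                    card[prop] = " ".join(joined.split())
                break


def parse_hcards(html):
    """Return a list of dicts, one per vcard found in the HTML string."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


page = """
<div class="vcard">
  <span class="fn org">Example Cafe</span>,
  <span class="adr">
    <span class="street-address">1 Main St</span>,
    <span class="locality">Springfield</span>
  </span>
</div>
"""
print(parse_hcards(page))
```

No guessing at layout is involved: the class names say which text is
the name and which is the street, so the same code should work on any
page that follows the convention.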
But the point I was trying to make was that Google Local or whatever
doesn't (as far as I know) get people to look through all their
documents to find the data for every query; they use some sort of
database. Dr. Ernie was framing docs vs. data as if it was us vs. HAL.
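
The distinction can be sketched as follows, assuming the records have
already been extracted from the pages into dicts with a "name" field
(the field name and both function names are invented for this
example). The documents are read once to build an index, and each
query then consults only the index. This is just an illustration, not
a description of how Google Local actually works.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word of a record's name to the ids of matching records."""
    index = defaultdict(set)
    for rec_id, rec in enumerate(records):
        for word in rec["name"].lower().split():
            index[word].add(rec_id)
    return index


def search(index, records, query):
    """Return records whose name contains every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    # Intersect the id sets for each query word; no document is rescanned.
    ids = set.intersection(*(index.get(w, set()) for w in words))
    return [records[i] for i in sorted(ids)]


records = [
    {"name": "Example Cafe", "locality": "Springfield"},
    {"name": "Example Books", "locality": "Shelbyville"},
]
index = build_index(records)
print(search(index, records, "example cafe"))
```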
I think there's value in the notion of making data shareable across
the Web, without necessarily having that data expressed in
How are Google Maps satellite pictures stored? Pixel-by-pixel CSS classes?