[uf-discuss] Worth blogging about W3C's press release today?

Rohit Khare khare at alumni.caltech.edu
Tue Sep 11 21:56:56 PDT 2007


September 11, 2007 10:00 AM Eastern Daylight Time
W3C Completes Bridge Between HTML/Microformats and Semantic Web

GRDDL Gives Web Content Hooks to Powerful Reuse and Data Integration

http://www.w3.org/ --(BUSINESS WIRE)-- Today, the World Wide Web  
Consortium completed an important link between Semantic Web and  
microformats communities. With "Gleaning Resource Descriptions from  
Dialects of Languages", or GRDDL (pronounced "griddle"), software can  
automatically extract information from structured Web pages to make  
it part of the Semantic Web. Those accustomed to expressing  
structured data with microformats in XHTML can thus increase the  
value of their existing data by porting it to the Semantic Web, at  
very low cost.

"Sometimes one line of code can make a world of difference," said Tim  
Berners-Lee, W3C Director. "Just as stylesheets make Web pages more  
readable to people, GRDDL makes Web pages, microformat tags, XML  
documents, and data more readable to Semantic Web applications,  
opening more data to new possibilities and creative reuse."

Getting Data into and out of the Web; how is it happening today?

One aspect of recent developments some people call "Web 2.0" involves  
applications based on combining — in "mashups" — various types of  
data that are spread all around on the Web. A number of active  
communities innovating on the Web have the common goal of sharing data such  
as calendar information, contact information, and geopositioning  
information. These communities have developed diverse social  
practices and technologies that satisfy their particular needs. For  
instance, search engines have had great success using statistical  
methods while people who share photos have found it useful to tag  
their photos manually with short text labels. Much of this work can  
be captured via "microformats". Microformats refer to sets of simple,  
open data formats built upon existing and widely adopted standards,  
including HTML, CSS and XML.

This wave of activity has direct connections to the essence of the  
Semantic Web. The Semantic Web-based communities have pursued ways to  
improve the quality and availability of data on the Web, making  
possible more intensive data integration and more diverse  
applications that can scale to the size of the Web and allow even  
more powerful mashups. The Web-based set of standards that supports  
this work is known as the Semantic Web stack. The foundations of the  
Semantic Web stack meet the requirements for formality of some  
applications such as managing bank statements, or combining volumes  
of medical data.

Each approach to "getting your data out there" has its place. But why  
limit yourself to just one approach if you can benefit, at low cost,  
from more than one? As microformats users consider more uses that  
require data modelling, or validation, how can they take advantage of  
their existing data in more formal applications?

A Bridge from Flexible Web Applications to the Semantic Web

GRDDL is the bridge for turning data expressed in an XML format (such  
as XHTML) into Semantic Web data. With GRDDL, authors transform the  
data they wish to share into a format that can be used and  
transformed again for more rigorous applications.
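
To make the "one line of code" concrete: in XHTML, a GRDDL-aware agent
looks for the GRDDL profile URI on the head element and for link
elements with rel="transformation" that point at a transformation
(typically XSLT). The following is only a minimal sketch of that
discovery step, not a full GRDDL implementation. The sample page and the
http://example.org/hcal2rdf.xsl stylesheet URL are hypothetical, the
function name is my own, and the sketch ignores other places a
transformation may be declared (for example, links in the body).

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Profile URI that marks an XHTML page as carrying GRDDL transformations.
GRDDL_PROFILE = "http://www.w3.org/2003/g/data-view"

# Hypothetical page: the <link rel="transformation"> is the "one line".
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://www.w3.org/2003/g/data-view">
    <title>Team calendar</title>
    <link rel="transformation" href="http://example.org/hcal2rdf.xsl" />
  </head>
  <body><p class="vevent">...</p></body>
</html>"""


def find_transformations(xhtml_text):
    """Return the hrefs of transformations declared in the page head."""
    root = ET.fromstring(xhtml_text)
    head = root.find(f"{{{XHTML_NS}}}head")
    if head is None:
        return []
    # The profile attribute is a space-separated list of URIs.
    if GRDDL_PROFILE not in head.get("profile", "").split():
        return []
    hrefs = []
    for link in head.findall(f"{{{XHTML_NS}}}link"):
        # rel is also space-separated, e.g. rel="transformation alternate".
        if "transformation" in link.get("rel", "").split() and link.get("href"):
            hrefs.append(link.get("href"))
    return hrefs


if __name__ == "__main__":
    print(find_transformations(SAMPLE))
```

An agent would then fetch each returned stylesheet and apply it to the
page to obtain RDF; that step is left out here because the Python
standard library has no XSLT processor.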

GRDDL Use Cases provides insight into why this is useful through a  
number of real-world scenarios, including scheduling a meeting,  
comparing information from various retailers before making a  
purchase, and extracting information from wikis to facilitate  
e-learning. Once data is part of the Semantic Web, it can be merged  
with other data (for example, from a relational database, similarly  
exposed to the Semantic Web) for queries, inferences, and conversion  
to other formats.
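
As a rough illustration of why merging is cheap once everything is
expressed as triples: two sets of (subject, predicate, object)
statements can be combined with a plain set union and then queried
together. This is a toy sketch in plain Python rather than a real RDF
library, and every identifier in it (ex:meeting1, ex:attendee,
ex:email, and so on) is made up for the example.

```python
# Triples gleaned from a (hypothetical) microformatted calendar page.
calendar_triples = {
    ("ex:meeting1", "ex:attendee", "ex:alice"),
    ("ex:meeting1", "ex:date", "2007-09-12"),
}

# Triples exposed from a (hypothetical) relational contact database.
database_triples = {
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:meeting1", "ex:date", "2007-09-12"),
}


def merge(*graphs):
    """Merge graphs by set union; identical statements collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def attendee_emails(graph, meeting):
    """Join across both sources: meeting -> attendee -> email address."""
    emails = []
    for person in objects(graph, meeting, "ex:attendee"):
        emails.extend(objects(graph, person, "ex:email"))
    return emails


merged = merge(calendar_triples, database_triples)

if __name__ == "__main__":
    print(attendee_emails(merged, "ex:meeting1"))
```

Neither source alone can answer "what are the attendees' email
addresses"; the merged set can, which is the kind of query the
paragraph above has in mind.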

The Working Group has reported on implementation experience, and its  
members have come forward with statements of support and commitments  
to implement GRDDL.

GRDDL Test Cases, also published today, describes and includes test  
cases for software agents that support GRDDL. The Working  
Group has produced a GRDDL service that allows users to input a  
GRDDL'd file and extract the important data.

About the World Wide Web Consortium [W3C]

The World Wide Web Consortium (W3C) is an international consortium  
where Member organizations, a full-time staff, and the public work  
together to develop Web standards. W3C primarily pursues its mission  
through the creation of Web standards and guidelines designed to  
ensure long-term growth for the Web. Over 400 organizations are  
Members of the Consortium. W3C is jointly run by the MIT Computer  
Science and Artificial Intelligence Laboratory (MIT CSAIL) in the  
USA, the European Research Consortium for Informatics and Mathematics  
(ERCIM) headquartered in France and Keio University in Japan, and has  
additional Offices worldwide. For more information see http:// 
