[uf-discuss] Re: Perl microformat parsing

Toby A Inkster mail at tobyinkster.co.uk
Sat Mar 1 05:16:21 PST 2008


Toby A Inkster wrote:

> There are still a few things that I want to take care of before I
> release alpha3 publicly.

Took a bit longer than expected, but alpha3 is here:

	http://buzzword.org.uk/cognition/

The main microformat-related changes for this version are that data found
in microformats can now be output as RDF; rel=enclosure is now supported;
hCalendar support conforms to my own draft hCalendar 1.1 spec (see recent
thread "To-do items?" on this mailing list) including events, todo items,
freebusy and alarms, and supporting RRULE and EXRULE; and I've attempted
to support the tabular event calendar parsing rules described on the wiki
<http://microformats.org/wiki/hcalendar-brainstorming#Tabular_event_calendars>.

I'd appreciate any examples of pages where it fails to properly parse an
hCalendar, hCard, geo or adr. 

I already know that it occasionally runs into issues with character
sets and unrecognised entities, so you don't need to tell me about those.

Full change log for this version:

- Switch from XML::DOM to XML::LibXML. Should be my last big parser change!
- Restructure object to be more tuple-like.
- URLs:
   - Support for CURIEs.
   - support for geo: and tag: URIs
   - use XPointer to provide URLs for document fragments without identifiers
- RDF:
   - use <rdf:Bag> to wrap multiple tuples with the same subject and property
   - Remove duplicate values within bags
   - add support for microformats to RDF output
   - RDF subjects may have multiple URIs defined to help match up properties
     that actually belong to the same subject (e.g. some properties might be
     attached to a fragment identifier, and others to an hcard, but if we
     know that the hcard root element has an id attribute which matches the
     fragment identifier, then we can equate the subjects)
   - support "vocabularies" for RDF
   - convert document structure to RDF <http://purl.org/dc/terms/hasPart>,
     <http://purl.org/dc/terms/isPartOf>.
- Improve STRINGIFY to prevent all these leading and trailing spaces
- Recognise (X)HTML predefined link types and put them in XHTML namespace.
- More reliable support for namespaces.
- Microformats:
   - Properly parse DateTimes found in microformats.
   - support table cell header pattern
   - support hcalendar 1.1 draft
- Complete support for RDFa
- Much improved support for eRDF, support rdf:type. Any bugs?
- Improved support for XHTML role attribute
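
To illustrate the "multiple URIs per subject" item in the RDF section above,
here is a minimal sketch of the general idea. It is written in Python for
brevity and is not Cognition's actual Perl code: the function name
merge_subjects, the tuple layout and the example URIs are all assumptions
made up for this illustration. Subjects that are known to be the same thing
are joined with a small union-find, and duplicate values are dropped, in the
spirit of the bag de-duplication item.

```python
from collections import defaultdict


def merge_subjects(triples, aliases):
    """Group (subject, property, value) tuples under one subject per alias set.

    triples: iterable of (subject_uri, property, value)
    aliases: iterable of (uri_a, uri_b) pairs known to name the same subject
    Hypothetical helper for illustration only.
    """
    parent = {}

    def find(uri):
        # Walk up to the representative URI, shortening the path as we go.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    def union(a, b):
        # Make the representative of a also the representative of b.
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in aliases:
        union(a, b)

    merged = defaultdict(lambda: defaultdict(list))
    for subject, prop, value in triples:
        values = merged[find(subject)][prop]
        if value not in values:  # skip duplicate values for the same property
            values.append(value)
    return {subject: dict(props) for subject, props in merged.items()}


if __name__ == "__main__":
    # One property hangs off a fragment identifier, the others off an hCard
    # whose root element carries the matching id (hypothetical data).
    triples = [
        ("http://example.com/page#me", "fn", "Alice"),
        ("_:hcard1", "tel", "123"),
        ("_:hcard1", "fn", "Alice"),
    ]
    aliases = [("http://example.com/page#me", "_:hcard1")]
    print(merge_subjects(triples, aliases))
```

All three properties end up under a single subject, with the repeated "fn"
value kept only once.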


-- 
Toby A Inkster BSc (Hons) ARCS
[Geek of HTML/SQL/Perl/PHP/Python/Apache/Linux]
[OS: Linux 2.6.17.14-mm-desktop-9mdvsmp, up 31 days, 19:14.]

                               Bottled Water
          http://tobyinkster.co.uk/blog/2008/02/18/bottled-water/
