[uf-discuss] Re: Perl microformat parsing

Michael MD mdagn at spraci.com
Sat Feb 23 01:39:01 PST 2008


>Web::Scraper
>http://search.cpan.org/dist/Web-Scraper/


Interesting... I didn't know about that one.


I had a go at writing a Perl parser for microformats a couple of years ago:

A test version is here:
http://www.spraci.com/cgi-bin/microformats.cgi


I tried to keep dependencies down to a minimum for this.
It has its own tag-soup parser, so no HTML or XML parsing libraries are needed
(it can be used in places where people can't compile anything that isn't pure Perl).
It won't die if there is a curly quote or some other strange entity in there, and
it will even cope with the occasional unquoted attribute value. (Unclosed tags
will still cause trouble, but there IS a limit to how liberal something like
this can be!)
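To illustrate the kind of tolerance described above, here is a minimal sketch in Python (not the Perl code behind the test version, which isn't published) of a regex-based start-tag tokenizer. It accepts double-quoted, single-quoted and unquoted attribute values, and leaves unrecognised entities untouched instead of failing. The names `parse_start_tags` and `decode_entities` are hypothetical, and the sketch assumes no `>` appears inside an attribute value.

```python
import re

# One start tag: the tag name, then the raw attribute text up to the next '>'.
# Closing tags, comments and doctypes do not match because a letter must follow '<'.
TAG_RE = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^>]*)>")

# One attribute: a name, optionally followed by = and a double-quoted,
# single-quoted or unquoted value. A bare name (e.g. "checked") has no value.
ATTR_RE = re.compile(
    r"""([A-Za-z_:][-A-Za-z0-9_:.]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+)))?"""
)

# Deliberately small table; any other named entity is passed through unchanged.
KNOWN_ENTITIES = {"amp": "&", "lt": "<", "gt": ">", "quot": '"', "apos": "'"}


def decode_entities(text):
    """Decode numeric and a few named entities, leaving anything unrecognised as-is."""
    def repl(m):
        ref = m.group(1)
        if ref.startswith("#"):
            try:
                if ref[1:2] in ("x", "X"):
                    return chr(int(ref[2:], 16))
                return chr(int(ref[1:]))
            except (ValueError, OverflowError):
                return m.group(0)  # malformed or out-of-range number: keep the raw text
        return KNOWN_ENTITIES.get(ref, m.group(0))

    return re.sub(r"&(#?[A-Za-z0-9]+);", repl, text)


def parse_start_tags(html):
    """Return a list of (tag_name, {attr: value}) for every start tag in html."""
    tags = []
    for m in TAG_RE.finditer(html):
        attrs = {}
        for a in ATTR_RE.finditer(m.group(2)):
            # At most one of groups 2-4 is set; a bare attribute gets "".
            value = next((g for g in a.group(2, 3, 4) if g is not None), "")
            attrs[a.group(1).lower()] = decode_entities(value)
        tags.append((m.group(1).lower(), attrs))
    return tags
```

This only tokenizes start tags. Building the element tree (where unclosed tags cause trouble) would be a separate step, and a `>` inside a quoted value would cut the tag short.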

I tried to make it handle the include-pattern too
(it seems to work, but it probably needs more testing).
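A rough sketch of include-pattern resolution, again in Python and assuming a simple hypothetical `Node` tree rather than whatever structure the Perl parser uses. An `<a class="include" href="#id">` or `<object class="include" data="#id">` element is replaced by a copy of the element it references. An include that points back at itself or one of its ancestors, or at a missing id, is dropped so that resolution cannot loop forever.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical minimal element tree node."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""


def index_ids(node, table=None):
    """Map each id attribute in the tree to its node (first occurrence wins)."""
    if table is None:
        table = {}
    if "id" in node.attrs:
        table.setdefault(node.attrs["id"], node)
    for child in node.children:
        index_ids(child, table)
    return table


# Which attribute carries the "#fragment" reference for each include element.
INCLUDE_ATTR = {"a": "href", "object": "data"}


def include_target(node):
    """Return the id an include element points at, or None if it is not one."""
    if "include" not in node.attrs.get("class", "").split():
        return None
    attr = INCLUDE_ATTR.get(node.tag)
    ref = node.attrs.get(attr, "") if attr else ""
    return ref[1:] if ref.startswith("#") and len(ref) > 1 else None


def resolve_includes(node, ids, active=frozenset()):
    """Return a copy of node with include elements replaced by copies of their targets.

    `active` holds the ids of the current node and its ancestors, so an include
    that points back up the tree (directly or via other includes) is dropped
    instead of recursing forever. Includes to unknown ids are dropped too.
    """
    if "id" in node.attrs:
        active = active | {node.attrs["id"]}
    out = Node(node.tag, dict(node.attrs), [], node.text)
    for child in node.children:
        target = include_target(child)
        if target is None:
            out.children.append(resolve_includes(child, ids, active))
        elif target in ids and target not in active:
            out.children.append(resolve_includes(ids[target], ids, active))
    return out
```

Building a new tree instead of editing in place keeps the original document intact, so the same referenced element can be included from several places.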

It's still a work in progress, though.

The code needs cleaning up (it's still rather messy and certainly nowhere near a
suitable standard for releasing on CPAN), the parsing rules need more work,
and I still haven't fully decided on the format of its output.

The plan was, one day (when I'm at least reasonably happy with the code!),
to put the source code up somewhere for people to download.
