[uf-discuss] Scraping or parsing?
mdagn at spraci.com
Wed Mar 7 13:55:23 PST 2007
>But Michael can, of course, better clarify for himself exactly what
>he was looking for and not finding.
I just thought I might be able to use the profile idea to provide a way to
tell a parser what to look for. If profiles are not meant for that, then that
is my mistake. I just thought I might be able to use that to make it more
The "difference" with rel-tag I was talking about is splitting the URL and
returning its last part, rather than just returning the whole href attribute.
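Here is a minimal Python sketch of that splitting step (not the Perl module mentioned below). The function name rel_tag_name is made up for illustration. It assumes that the query string and fragment are ignored, that a trailing slash is dropped, and that "+" and %-escapes in the last segment decode to spaces and characters. Check these assumptions against the rel-tag spec before relying on them.

```python
from urllib.parse import urlsplit, unquote_plus


def rel_tag_name(href: str) -> str:
    """Return the tag text of a rel-tag link: the last segment of the URL path.

    Assumptions (hypothetical helper, not from any existing library):
    - query string and fragment are ignored (urlsplit separates them),
    - a trailing slash is dropped before taking the last segment,
    - '+' and %-escapes are decoded, e.g. 'Santa+Barbara' -> 'Santa Barbara'.
    """
    path = urlsplit(href).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1]
    return unquote_plus(last_segment)
```

For example, rel_tag_name("http://technorati.com/tag/tech") gives "tech", whereas returning the whole href would give the full URL.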
There is a test version of a Perl module I put together here
If you enter a URL, it should show a dump of what the parser returns
(it might also show extra stuff, as I'm still working on it).
It's not currently using the profiles for parsing rules (that was just an
idea I had at the time), and it still needs lots of work, but when I am
reasonably happy with it I'll post a link to the source on uf-dev.
(I'm no expert at writing parsers so don't expect perfection! - I learn by
trying things out)
The idea is to create something that can handle any snippet of HTML you
feed it (so that it can be used in a CMS with data created by users),
without depending on libraries that people on shared hosting environments
might not easily be able to install, and that is liberal enough to cope with
minor errors in markup (though unclosed tags or unquoted attributes will
still cause problems, of course).