[uf-discuss] Parsing XFN in PHP
gareth at morethanseven.net
Thu Apr 10 06:48:08 PDT 2008
On Tue, Apr 8, 2008 at 1:10 PM, Julian Bond <julian_bond at voidstar.com> wrote:
> I need some advice about reading rel="me" tags in arbitrary web pages using
> PHP. I'm intending to use this to help build a lifestream-style function.
> The basic intent is to cut down the amount of data entry the user has to do.
> When they give me a MyBlogLog, Friendfeed, Plaxo Pulse page that has lists
> of links to their profile pages I should be able to avoid having to ask them
> for all of them again. So:-
> - User gives me a URL for one of their profile pages
> - Use Curl to collect the source
> - Parse the source looking for links with a rel="me"
> - Extract an array of Link URL - Link Text
> - Do something useful with the array. (???? followed by Profit!)
> I've been searching this morning for a PHP library to do the parsing and
> link extraction or PHP examples or example regex to use in PREG_MATCH_ALL or
> something/anything, without success. Since the source data is probably badly
> written and broken html, I don't think I can use XML methods as all the XML
> unserialising code I've used barfs on badly formed XML. One possibility I
> suppose is to run it through HTML-Tidy first, but I run the (admittedly small)
> chance of HTML-Tidy wiping out some of the links.
> So what do people use to consume XFN with PHP?
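As a rough illustration of the parse-and-extract steps, here is a minimal sketch in Python rather than PHP (in PHP, `DOMDocument::loadHTML` with `DOMXPath` is the usual tolerant equivalent). It relies on the standard library's lenient `html.parser`, so unclosed tags don't stop it, and it treats `rel` as a space-separated, case-insensitive token list so values like `rel="me friend"` are caught. The class and function names are illustrative only, and `fetch_rel_me` assumes a plain HTTP GET is enough (no login, cookies or script-generated links).

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class RelMeParser(HTMLParser):
    """Collect (href, link text) pairs for <a> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.links = []       # (href, text) tuples in document order
        self._current = None  # [href, text_parts] while inside a rel="me" <a>

    def _flush(self):
        # Record the pending link, collapsing runs of whitespace in its text.
        if self._current is not None:
            href, parts = self._current
            self.links.append((href, " ".join("".join(parts).split())))
            self._current = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # Broken markup often forgets </a>; treat a new <a> as closing the old one.
        self._flush()
        attrs = dict(attrs)
        # rel is a space-separated token list, e.g. rel="me friend"
        tokens = (attrs.get("rel") or "").lower().split()
        if "me" in tokens and attrs.get("href"):
            self._current = [attrs["href"], []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            self._flush()

    def close(self):
        super().close()
        self._flush()  # a final rel="me" link left unclosed at end of input


def extract_rel_me(html):
    """Return [(href, text), ...] for every rel="me" link in an HTML string."""
    parser = RelMeParser()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_rel_me(url):
    """Fetch a profile page and extract its rel="me" links."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return extract_rel_me(resp.read().decode(charset, errors="replace"))
```

Matching `rel` as a token list, rather than running a regex for the literal `rel="me"`, also catches multi-valued and oddly cased attributes that a simple `preg_match_all` pattern would miss.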
Another approach is to use an external, service-based parser and simply
send it requests. It depends on your exact needs, but uFXtract might be
worth a look. It supports lots of formats plus a couple of interesting
concepts (paged datasets, some basic spidering):
Then just use your favourite HTTP request tool in PHP to make requests
of the service and parse the response (XML or JSON, as you prefer).
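A rough sketch of that request/response cycle, again in Python. Everything service-specific here is hypothetical: the endpoint `SERVICE_ENDPOINT`, the `url` and `format` query parameters, and the JSON shape (an `xfn` list of entries with `link`, `text` and `rel`) are placeholders, not uFXtract's documented API, so check the service's own documentation before relying on any of them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL endpoint; substitute the real service URL.
SERVICE_ENDPOINT = "http://parser.example.org/extract"


def build_request_url(page_url, endpoint=SERVICE_ENDPOINT):
    """Ask the service to parse page_url and answer in JSON.
    The "url" and "format" parameter names are assumptions."""
    return endpoint + "?" + urlencode({"url": page_url, "format": "json"})


def rel_me_from_json(body):
    """Pick rel="me" links out of an ASSUMED response shape:
    {"xfn": [{"link": "...", "text": "...", "rel": "me friend"}, ...]}"""
    data = json.loads(body)
    links = []
    for item in data.get("xfn", []):
        tokens = str(item.get("rel", "")).lower().split()
        if "me" in tokens and item.get("link"):
            links.append((item["link"], item.get("text", "")))
    return links


def fetch_rel_me_via_service(page_url):
    """One GET to the service, then extract the rel="me" links from its reply."""
    with urlopen(build_request_url(page_url), timeout=10) as resp:
        return rel_me_from_json(resp.read().decode("utf-8"))
```

Keeping URL building and response picking separate from the network call makes them easy to adapt once the real parameters and response format are known.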