> I need some advice about reading rel="me" tags in arbitrary web pages using
> PHP. I'm intending to use this to help build a lifestream style function.
> The basic intent is to cut down the amount of data entry the user has to do.
> When they give me a MyBlogLog, Friendfeed, Plaxo Pulse page that has lists
> of links to their profile pages I should be able to avoid having to ask them
> for all of them again. So:-
>  - User gives me a URL for one of their profile pages
>  - Use Curl to collect the source
>  - Parse the source looking for links with a rel="me"
>  - Extract an array of Link URL - Link Text
>  - Do something useful with the array. (???? followed by Profit!)
>  I've been searching this morning for a PHP library to do the parsing and
> link extraction or PHP examples or example regex to use in PREG_MATCH_ALL or
> something/anything, without success. Since the source data is probably badly
> written and broken html, I don't think I can use XML methods as all the XML
> unserialising code I've used barfs on badly formed XML. One possibility I
> suppose is to run it through HTML Tidy first, but I run the (admittedly small)
> chance of HTML Tidy wiping out some of the links.
>  So what do people use to consume XFN with PHP?
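
If you would rather do it locally, the steps you list can be sketched with
PHP's bundled curl and DOM extensions. DOMDocument::loadHTML() uses libxml's
HTML parser, which is far more forgiving of broken markup than the XML
functions. The function names fetch_page() and extract_rel_me_links() are my
own, and this is only a rough sketch assuming both extensions are installed,
not a tested library:

```php
<?php
// Hypothetical helper: fetch the raw HTML of a profile page with curl.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // profile URLs often redirect
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $body = curl_exec($ch);
    curl_close($ch);
    return $body === false ? '' : $body;
}

// Hypothetical helper: return [['url' => ..., 'text' => ...], ...] for every
// <a> element whose rel attribute contains the token "me".
function extract_rel_me_links(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world pages are rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $node) {
        // rel is a space-separated token list, e.g. rel="me nofollow"
        $rel    = strtolower(trim($node->getAttribute('rel')));
        $tokens = preg_split('/\s+/', $rel, -1, PREG_SPLIT_NO_EMPTY);
        $href   = $node->getAttribute('href');
        if (in_array('me', $tokens, true) && $href !== '') {
            $links[] = ['url' => $href, 'text' => trim($node->textContent)];
        }
    }
    return $links;
}
```

Usage would be something like extract_rel_me_links(fetch_page($profileUrl)).
Relative hrefs are returned as they appear in the page, so you would still
need to resolve them against the profile URL yourself.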

Another approach is to use an external, service-based parser and simply
send it requests. It depends on your exact needs, but uFXtract might be
worth a look. It supports lots of formats plus a couple of interesting
concepts (paged datasets, some basic spidering):


Then just use your favourite HTTP request tool in PHP to make requests
to the service and parse the response (XML or JSON, as you prefer).
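
As a rough illustration of that last step, here is a sketch of building the
request and decoding a JSON reply. Note that the endpoint is whatever the
service documents, and the "url" and "format" query parameter names below are
placeholders I made up, not the service's real API, so check its docs before
using any of this. Both function names are hypothetical too:

```php
<?php
// Build the request URL for a parsing service.
// ASSUMPTION: the "url" and "format" parameter names are placeholders.
function build_service_url(string $endpoint, string $profileUrl, string $format = 'json'): string
{
    $query     = http_build_query(['url' => $profileUrl, 'format' => $format], '', '&');
    $separator = (strpos($endpoint, '?') === false) ? '?' : '&';
    return $endpoint . $separator . $query;
}

// Decode a JSON response body into a PHP array, failing loudly on junk.
function decode_service_response(string $body): array
{
    $data = json_decode($body, true);
    if (!is_array($data)) {
        throw new RuntimeException('Service did not return a JSON object or array');
    }
    return $data;
}
```

You would fetch build_service_url(...) with curl or file_get_contents() and
pass the body to decode_service_response(). The shape of the decoded array
depends entirely on the service.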

