[uf-dev] Parsing XFN in PHP

Julian Bond julian_bond at voidstar.com
Fri Apr 11 04:12:46 PDT 2008


Continuing a thread that started on the Discuss list.

My experiments have led me to two approaches, depending on the PHP
release. First, PHP 5, with error handling left as an exercise for the
reader:

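$url = 'http://ciaranmcnulty.com/';
// Fetch the page; @ suppresses the warning if the request fails.
if($html = @file_get_contents($url)){
  $dom = new DOMDocument();
  // loadHTML() copes with tag soup; @ hides its parse warnings.
  if(@$dom->loadHTML($html)){
    if ($nodes = $dom->getElementsByTagName('a')) {
      foreach($nodes as $node){
        // rel is a space-separated list, so match 'me' as one token.
        $rels = preg_split('/\s+/', strtolower(trim($node->getAttribute('rel'))));
        if (in_array('me', $rels)) {
          echo $node->getAttribute('href'), "\n";
        }
      }
    }
  }
}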

Pretty easy, huh? Clearly this same approach could be used for other
values of rel=. It's probably not too hard to extend this approach to
find hCard and other uFs.
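
As a rough sketch of what "other values of rel=" might look like, the
loop can be wrapped in a function that takes the rel value as a
parameter. The name find_rel_links() and the sample markup are made up
for illustration and are not from any library. It takes an HTML string
rather than a URL so it can be tried without a network fetch.

```php
<?php
// Hypothetical helper: return the href of every <a> whose
// space-separated rel attribute contains the token $rel.
function find_rel_links($html, $rel) {
    $links = array();
    if ($html === '') {
        return $links; // nothing to parse
    }
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so parser warnings are suppressed.
    if (!@$dom->loadHTML($html)) {
        return $links;
    }
    $rel = strtolower($rel);
    foreach ($dom->getElementsByTagName('a') as $node) {
        // Split rel on whitespace so "nofollow me" matches 'me'
        // but "met" does not.
        $tokens = preg_split('/\s+/', strtolower(trim($node->getAttribute('rel'))));
        if (in_array($rel, $tokens, true)) {
            $links[] = $node->getAttribute('href');
        }
    }
    return $links;
}

// Made-up sample markup covering single and multi-valued rel.
$sample = '<p><a rel="me" href="http://example.com/a">A</a> '
        . '<a rel="friend met" href="http://example.com/b">B</a> '
        . '<a rel="nofollow me" href="http://example.com/c">C</a></p>';
print_r(find_rel_links($sample, 'me'));
```

With the sample above, asking for 'me' should return the first and
third links only, and asking for 'friend' should return the second.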

loadHtml() doesn't exist in php4 dom-xml. In theory it should be
possible to use HTML-Tidy tidy_repair_string to clean the html first and
then feed it to domxml_open_mem. In practice, I'm having real trouble
getting the right collection of tidy_repair_string configuration
parameters to generate clean enough XML for dom to accept it. If that
can be done, then this should work.

$url = 'http://ciaranmcnulty.com/';
if($html = @file_get_contents($url)){
  $html = @tidy_repair_string($html);
  if ($dom = @domxml_open_mem($html)) {
    if ($nodes = $dom->get_elements_by_tagname('a')) {
      foreach($nodes as $node){
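        // rel is a space-separated list, so match 'me' as one token.
        $rels = preg_split('/\s+/', strtolower(trim($node->get_attribute('rel'))));
        if (in_array('me', $rels)) {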
          echo $node->get_attribute('href');
        }
      }
    }
  }
}

Typical errors are things like:-
- Space required after the Public Identifier
- SystemLiteral " or ' expected
- xmlParseExternalID: PUBLIC, no URI in
- invalid entity nbsp
Maybe it's possible to get Tidy's output to avoid all of these, but I
haven't managed it yet. I had a look at hkit, but it makes no attempt
to configure the Tidy module, so I'd expect lots of problems when
trying to parse arbitrary web pages.
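
The errors listed above seem to fall into two groups: the first three
are complaints about an HTML DOCTYPE that is not valid XML, and the
last is a named entity that an XML parser does not know without a DTD.
One possible starting point, untested against arbitrary pages, is to
ask Tidy for XML output, drop the DOCTYPE, and use numeric entities.
The option names are standard Tidy options, but the combination is a
guess. The sketch assumes the tidy extension signature in which the
config array is the second argument; older PHP 4 builds of tidy may
need the options set another way. The name tidy_to_xml() is made up.

```php
<?php
// Hypothetical helper: run HTML through Tidy with options chosen to
// avoid the parser errors listed above. Returns false when the tidy
// extension is not loaded.
function tidy_to_xml($html) {
    if (!function_exists('tidy_repair_string')) {
        return false;
    }
    $config = array(
        'output-xml'       => true,   // emit well-formed XML, not HTML
        'doctype'          => 'omit', // drop the DOCTYPE behind the PUBLIC errors
        'numeric-entities' => true,   // avoid named entities such as nbsp
    );
    // Assumption: the input is UTF-8; change 'utf8' if it is not.
    return tidy_repair_string($html, $config, 'utf8');
}

// Made-up sample with a DOCTYPE lacking a system identifier,
// a named entity, and unclosed tags.
$sample = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'
        . '<title>t</title><p>a&nbsp;b<br>'
        . '<a rel="me" href="http://example.com/">x</a>';
$xml = tidy_to_xml($sample);
if ($xml === false) {
    echo "tidy extension not available\n";
} else {
    echo $xml, "\n";
}
```

If the result parses, it can be passed on to domxml_open_mem() as in
the php4 code above. Whether these three options are enough for
arbitrary pages is exactly the open question.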

-- 
Julian Bond  E&MSN: julian_bond at voidstar.com  M: +44 (0)77 5907 2173
Webmaster:          http://www.ecademy.com/      T: +44 (0)192 0412 433
Personal WebLog:    http://www.voidstar.com/     skype:julian.bond?chat
                           Tastes Like Milk

