[uf-dev] Fwd: (Off-list) Parsing XFN in PHP
Ryan Parman
ryan.lists.warpshare at gmail.com
Fri Apr 11 09:38:54 PDT 2008
Forwarding Geoffrey's off-list message sent to the original thread:
Begin forwarded message:
> From: Geoffrey Sneddon <foolistbar at googlemail.com>
> Date: April 11, 2008 4:45:03 AM PDT
> To: Toby A Inkster <mail at tobyinkster.co.uk>, Ryan Parman <ryan.lists.warpshare at gmail.com>
> Subject: Re: (Off-list) Parsing XFN in PHP
>
>
> On 10 Apr 2008, at 18:34, Toby A Inkster wrote:
>> Ryan Parman wrote:
>>
>>> "But we can do it in web browsers!" What do web browsers have that PHP
>>> developers don't? An HTML parser. As far as I know there are no HTML
>>> parsers written for PHP (or any other language that I'm aware of).
>>
>> http://www.php.net/manual/en/function.dom-domdocument-loadhtml.php
>
> That doesn't really work. libxml2's HTML parsing is nothing like
> what is actually needed for real-world compatibility. Just take a
> look at things like <b><i>foo</b>bar</i>, or
> <plaintext>foo</plaintext><b>bar.
>
>
> On 11 Apr 2008, at 08:33, Toby A Inkster wrote:
>> Another option is XML_HTMLSax3 from PEAR:
>> http://pear.php.net/package/XML_HTMLSax3
>
> This really seems to handle nothing more than a subset of SGML similar
> to XML, and is therefore equally useless at parsing HTML. See the above
> two examples again, as well as things like <b<i>hi</i></b> (note the
> omitted >).
>
> Real-world HTML content really does rely on specific parsing rules,
> and attempting to deviate from them will just result in issues. To
> do anything useful, you'd really need to implement your own HTML
> parser, likely starting from HTML 5. Then you run into issues with
> DOM requiring XML well-formedness, so you can't have "a@" as a
> localName (to reuse the example on public-html a few days ago, you
> need to parse <a@> <a#> </a@> correctly, despite all those tags
> having characters that you can't legally store in the DOM).
>
>
> --
> Geoffrey Sneddon
> <http://gsnedders.com/>
>
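
For anyone who wants to try the markup samples quoted above against DOMDocument::loadHTML, here is a minimal sketch. The helper name reserialize_with_libxml is made up for illustration and is not from the thread. What libxml2 emits depends on the installed version, so no expected output is shown; the idea is simply to compare the result with the tree a browser builds for the same input.

```php
<?php
// Hypothetical helper (not from the thread): parse a fragment with
// DOMDocument::loadHTML and return libxml2's serialization of the result.
function reserialize_with_libxml(string $html): string
{
    // Collect libxml2 parse warnings internally instead of raising PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $out = $doc->saveHTML();

    // Discard the collected warnings and restore the previous error mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // saveHTML() returns false on failure; normalize that to an empty string.
    return $out === false ? '' : $out;
}

// The three samples mentioned in the forwarded message.
$samples = [
    '<b><i>foo</b>bar</i>',
    '<plaintext>foo</plaintext><b>bar',
    '<b<i>hi</i></b>',
];

foreach ($samples as $html) {
    echo $html, "\n  => ", trim(reserialize_with_libxml($html)), "\n";
}
```

Running it from the command line prints each input followed by whatever libxml2 made of it, which makes any difference from browser behaviour easy to spot.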