[uf-discuss] Re: Perl microformat parsing
Tatsuhiko Miyagawa
miyagawa at gmail.com
Sat Feb 23 02:46:40 PST 2008
On 2/22/08, Takatsugu Shigeta <takatsugu.shigeta at gmail.com> wrote:
> my $url = 'http://diveintomark.org/projects/greasemonkey/hcard/tests/2-4-2-vcard.xhtml';
>
> my $fn = scraper {
> process '.vcard .fn', 'fn[]' => 'TEXT';
> process '.vcard .tel', 'tel[]' => 'TEXT';
> process '.vcard .title', 'title[]' => 'TEXT';
> result 'fn', 'tel', 'title';
> }->scrape(URI->new($url));
For better nested output, nest a scraper for each .vcard:
use strict;
use warnings;
use Web::Scraper;
use URI;
use Data::Dumper;

my $uri = URI->new("http://diveintomark.org/projects/greasemonkey/hcard/tests/2-4-2-vcard.xhtml");

# The outer scraper collects every .vcard; the inner one extracts its fields.
my $scraper = scraper {
    process ".vcard", "vcards[]" => scraper {
        process ".email", email    => '@href';
        process ".fn",    fullname => "TEXT";
        process ".tel",   tel      => "TEXT";
        process ".title", title    => "TEXT";
    };
};

my $result = $scraper->scrape($uri);
print Dumper($result);
__END__
$VAR1 = {
  'vcards' => [
    {
      'email' => bless( do{\(my $o = 'mailto:jfriday at host.com')}, 'URI::mailto' ),
      'tel' => '+1-919-555-7878',
      'fullname' => 'Joe Friday',
      'title' => 'Area Administrator, Assistant'
    },
  ]
};
Well, you get this vcard twice because it has a nested .vcard, but I guess
that's fine :)
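If the duplicate does matter, one option is to filter the result afterwards. The following is only a rough sketch, assuming the repeated entries are field-for-field identical; the sample data and the vcard_key helper are hypothetical and not part of Web::Scraper.

```perl
use strict;
use warnings;

# Hypothetical sample mirroring the structure of the scraper result, with the
# same vcard listed twice (outer .vcard and nested .vcard).
my $result = {
    vcards => [
        {
            fullname => 'Joe Friday',
            tel      => '+1-919-555-7878',
            title    => 'Area Administrator, Assistant',
        },
        {
            fullname => 'Joe Friday',
            tel      => '+1-919-555-7878',
            title    => 'Area Administrator, Assistant',
        },
    ],
};

# Hypothetical helper: build a comparison key from the sorted field names and
# their stringified values (objects such as URI stringify via overloading).
sub vcard_key {
    my ($vcard) = @_;
    return join "\0",
        map { "$_=" . ( defined $vcard->{$_} ? "$vcard->{$_}" : '' ) }
        sort keys %$vcard;
}

# Keep only the first occurrence of each distinct vcard.
my %seen;
my @unique = grep { !$seen{ vcard_key($_) }++ } @{ $result->{vcards} };
$result->{vcards} = \@unique;

printf "%d vcard(s) after de-duplication\n", scalar @unique;
```

The same grep could be applied directly to the $result returned by scrape, since it only relies on each vcard being a plain hash reference.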
Thanks,
--
Tatsuhiko Miyagawa