Revision as of 22:34, 12 June 2012
<entry-title>H2VX</entry-title>
H2VX is a production deployment of the X2V hCard and hCalendar conversion transforms.
It converts hCard contacts and hCalendar events on web pages to .vcf and .ics respectively for use in desktop and other client software applications.
documentation
To convert hCards to vCards, go to http://h2vx.com/vcf/ and enter the URL of the page with the hCards.
To convert hCalendar to iCalendar, go to http://h2vx.com/ics/ and enter the URL of the page with the hCalendar events.
URLs
Links to H2VX.com that convert a URL (like http://microformats.org/wiki/events) can be constructed as follows.
You may omit the leading "http://" from the URL to be converted for a briefer, more readable URL:
- download vCards from hCards
  - http://h2vx.com/vcf/URL
    - e.g. http://h2vx.com/vcf/microformats.org/wiki/events
- download iCalendar from hCalendar
  - http://h2vx.com/ics/URL
    - e.g. http://h2vx.com/ics/microformats.org/wiki/events
- subscribe to iCalendar from hCalendar
  - webcal://h2vx.com/ics/URL
    - e.g. webcal://h2vx.com/ics/microformats.org/wiki/events
  - http://h2vx.com/ics/sub/URL for systems that don't support auto-linking of webcal: URLs, e.g. MediaWiki, Twitter.
    - e.g. http://h2vx.com/ics/sub/microformats.org/wiki/events
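As an illustration of the URL patterns above, here is a minimal sketch that builds all four links for one page. `build_h2vx_links` is a hypothetical helper, not part of H2VX. The page only documents that a leading "http://" may be dropped, so the sketch passes any other URL (including https:// or query strings) through unchanged; whether such URLs need escaping is an open assumption.

```python
# Hypothetical helper: builds H2VX links from the documented URL patterns.


def _strip_http(url: str) -> str:
    """Drop a leading "http://", which the patterns above allow omitting."""
    prefix = "http://"
    return url[len(prefix):] if url.startswith(prefix) else url


def build_h2vx_links(page_url: str) -> dict:
    """Return download and subscription links for one page of hCards/hCalendar."""
    target = _strip_http(page_url)
    return {
        "vcf": "http://h2vx.com/vcf/" + target,          # download vCards
        "ics": "http://h2vx.com/ics/" + target,          # download iCalendar
        "webcal": "webcal://h2vx.com/ics/" + target,     # subscribe
        "ics_sub": "http://h2vx.com/ics/sub/" + target,  # subscribe where webcal: isn't auto-linked
    }


links = build_h2vx_links("http://microformats.org/wiki/events")
print(links["webcal"])  # webcal://h2vx.com/ics/microformats.org/wiki/events
```

Passing the page URL with or without "http://" yields the same links, matching the shortened examples in the list.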
user agent strings
H2VX uses two user agent strings when retrieving hCards and hCalendars respectively:
- H2VX contacts proxy (http://h2vx.com/vcf/)
- H2VX events proxy (http://h2vx.com/ics/)
You may see occurrences of these in your web server logs when users of H2VX convert hCards and hCalendar events on your pages.
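For site owners who filter requests by User-Agent (as in the Technorati-to-H2VX report under "resolved"), a sketch of recognizing the two documented strings follows. `h2vx_request_kind` is a hypothetical name, and the sketch assumes the header matches the documented strings exactly; it would need a looser match if H2VX ever changes them.

```python
# User-Agent strings documented above, mapped to the kind of conversion.
H2VX_USER_AGENTS = {
    "H2VX contacts proxy (http://h2vx.com/vcf/)": "contacts",  # hCard -> vCard
    "H2VX events proxy (http://h2vx.com/ics/)": "events",      # hCalendar -> iCalendar
}


def h2vx_request_kind(user_agent: str):
    """Return "contacts" or "events" for an H2VX fetch, or None otherwise.

    Assumes an exact match on the documented strings (after trimming
    surrounding whitespace).
    """
    return H2VX_USER_AGENTS.get(user_agent.strip())


print(h2vx_request_kind("H2VX events proxy (http://h2vx.com/ics/)"))  # events
```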
built
H2VX is built and maintained by Tantek with:
- X2V XSLTs by Brian Suda
- PHP get-contact.php and get-cal.php, originally written by Brian, updated/refactored by Tantek with various improvements.
- PHP common.php (and JavaScript common.js) by Tantek, which incorporate CASSISv0 open source from http://cassisproject.com/
- XHTML1+CSS+JS front-end design/interface by Tantek (view source of h2vx.com in your browser for more).
open source
H2VX is available on the microformats GitHub:
feedback
Have feedback on H2VX? Feel free to add to the top of this list and use ~~~~ to sign your name and date your comment. If this grows too big we can move it to h2vx-feedback
- This calendar: http://www.ustreetmusichall.com/calendar results in an .ics file that, when imported into Google Calendar, gives: Error at line 11: Unparseable date: "T220000"
- 2012-154 verified with both h2vx.com and dev.h2vx.com. The page uses the value class pattern, in particular the empty span technique, which seems valid. Need to make a test case of this to isolate and track it down. - Tantek 19:42, 2 June 2012 (UTC)
<span class="start dtstart">
<span class="value-title" title="2012-06-02T22:00:00-04:00"></span>
10:00 pm
</span>
- Have been getting an "empty document; no HTML can be found" error when converting my site. As with the comment below, there are no issues using Operator to extract. I used this service a good while ago, but since moving this page to PHP/HTML5, I'm getting this error. Antoine RJ Wright 04:17:49, 7 January 2010 (UTC)
- The last few days I've not been able to retrieve vCards from hCards on this site using H2VX yet Operator is nicely extracting them. Comments and insights appreciated. ChipD 05:13, 12 November 2010 (UTC)
- Can't seem to get #hcard-ids working with the /referrer option. Jnpcl 00:01, 14 October 2010 (UTC)
- Could you provide the URL you are having trouble with? Tantek 18:51, 29 October 2010 (UTC)
- It would be very useful to be able to POST or GET an HTML snippet to request a conversion. I have created a JavaScript button that extracts the code and sends it to H2VX.com, useful for dynamic or password-protected web pages: http://1daylater.com/H2VX_snippets.html
- As a Web page author I find the H2VX site a bit awkward to use: it's difficult to find the URLs to use in my Web page. As an end-user it's fine to have the H2VX bookmarklets in my toolbar, but as a page author I can't be sure everyone has the bookmarklets or Operator installed. Bob Jonkman 00:56, 10 November 2009 (UTC)
- "Also, a short 'about' page would be worthwhile IMO, especially for adding to the homepage." - Norm on microformats-discuss.
- ...
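In the ustreetmusichall.com report above, the rejected value "T220000" has a time but no date part. For comparison, a sketch of what a complete DTSTART could look like for that page's value-title, written as an RFC 5545 DATE-TIME in UTC. This is not H2VX's conversion code; normalizing to UTC is one choice (iCalendar also allows local time with a TZID), and `value_title_to_dtstart` is a hypothetical name that assumes the title carries a UTC offset.

```python
import re
from datetime import datetime, timezone

# RFC 5545 DATE-TIME form: YYYYMMDD "T" HHMMSS, with a trailing "Z" for UTC.
ICAL_DATE_TIME = re.compile(r"^\d{8}T\d{6}Z?$")


def value_title_to_dtstart(title: str) -> str:
    """Convert an ISO 8601 value-title with a UTC offset to a UTC DTSTART value."""
    moment = datetime.fromisoformat(title)  # parses "-04:00" into a tzinfo
    if moment.tzinfo is None:
        raise ValueError("sketch assumes the value-title includes a UTC offset")
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


dtstart = value_title_to_dtstart("2012-06-02T22:00:00-04:00")
print(dtstart)                                # 20120603T020000Z
print(bool(ICAL_DATE_TIME.match(dtstart)))    # True
print(bool(ICAL_DATE_TIME.match("T220000")))  # False: no date part
```

Note that 22:00 at UTC-4 falls on the next calendar day in UTC, so the date part matters as much as the time.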
issues
Found a problem with H2VX? Please note it here at the top of this list (consider grouping it under an existing subhead or introducing a new subhead if necessary) and use ~~~~ to sign your name and date your comment. If this grows too big we can move it to h2vx-issues
robots.txt prevents subscription in Google Reader
Google Reader won't subscribe to any h2vx hCalendar files due to robots.txt. TomMorris 15:33, 1 June 2011 (UTC)
- Google Calendar also fails because of h2vx's robots.txt, which disallows robots from fetching and therefore caching the iCal files. Jayvdb 22:16, 5 May 2012 (UTC)
I thought Google would likely have a unique user-agent specifically for calendar fetches (cf. the Feedfetcher quote below), but apparently it doesn't. It's not on the UA list, and when I tested empirically (`sudo nc -v -l -p 80`) the request used a generic UA. ;( Someone should retest Google Reader to see if it works.
Google has several other user-agents, including Feedfetcher (user-agent Feedfetcher-Google). Since Feedfetcher requests come from explicit action by human users who have added the feeds to their Google home page or to Google Reader, and not from automated crawlers, Feedfetcher does not follow robots.txt guidelines. You can prevent Feedfetcher from crawling your site by configuring your server to serve a 404, 410, or other error status message to user-agent Feedfetcher-Google. More information about Feedfetcher.
(Source: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=182072) --Jeremyb 22:34, 12 June 2012 (UTC)
- On second thought, I tested with a real web server (+tcpdump) to see if /robots.txt was fetched with a different agent than the actual feed. No such luck.
From: googlebot(at)googlebot.com
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- --Jeremyb 22:34, 12 June 2012 (UTC)
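To make the robots.txt mechanism above concrete, a sketch using Python's standard `urllib.robotparser` checks whether the Googlebot User-Agent seen in the tcpdump test may fetch an h2vx /ics/ URL. The robots.txt content is an assumption for illustration (a blanket disallow); the actual contents of h2vx.com/robots.txt are not reproduced on this page.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt for illustration only: a blanket disallow.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# User-Agent observed in the tcpdump test above.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

feed_url = "http://h2vx.com/ics/microformats.org/wiki/events"
print(parser.can_fetch(GOOGLEBOT_UA, feed_url))  # False
```

Because the calendar fetch arrives with the same Googlebot User-Agent as the crawler, a robots.txt rule cannot tell the two apart: any rule that let Google Calendar fetch /ics/ URLs would let the crawler fetch them too.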
Date incorrect when not using abbr element for dtstart
I had a date marked up like so <em class="detail dtstart" title="2010-10-20">Wednesday, October 20, 2010</em>. It was not being parsed correctly until I changed it to use <abbr>, but the element shouldn't really make any difference.
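One possible explanation, not verified against X2V: under the microformats abbr design pattern, a parser reads the machine-readable `title` attribute from `abbr` elements and uses the text content of other elements. If X2V follows that rule, the `em` markup above would yield "Wednesday, October 20, 2010" instead of "2010-10-20". A minimal sketch of that rule with Python's standard `html.parser`; `DtstartExtractor` and `dtstart_of` are hypothetical names, and the sketch handles only a single flat element (no nesting, no value class pattern).

```python
from html.parser import HTMLParser


class DtstartExtractor(HTMLParser):
    """Sketch: extract dtstart from one flat element per the abbr design pattern."""

    def __init__(self):
        super().__init__()
        self.value = None
        self._collecting = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "dtstart" in classes:
            if tag == "abbr" and attrs.get("title"):
                # abbr design pattern: the title attribute is the value
                self.value = attrs["title"]
            else:
                # any other element: fall back to its text content
                self._collecting = True

    def handle_data(self, data):
        if self._collecting:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._collecting:
            self._collecting = False
            self.value = "".join(self._text).strip()


def dtstart_of(markup: str):
    extractor = DtstartExtractor()
    extractor.feed(markup)
    extractor.close()
    return extractor.value


print(dtstart_of('<abbr class="detail dtstart" title="2010-10-20">Wednesday, October 20, 2010</abbr>'))
# 2010-10-20
```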
.vcf not formed properly
When opening the resulting .vcf files with Outlook, none of the non-ASCII characters are shown correctly, because the returned file is not encoded as UTF-8 without a BOM. This makes the files useless with Outlook, one of the most widely used e-mail clients.
Can we get a UTF-8 file returned without the BOM?
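The request above is for plain UTF-8 with no byte order mark (the bytes EF BB BF). A sketch of detecting and stripping a UTF-8 BOM using Python's standard `codecs` module; the vCard text is a made-up minimal sample, and `to_utf8_without_bom` assumes its input is already UTF-8 (it raises `UnicodeDecodeError` on ISO-8859-1 bytes, which is a different problem).

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def to_utf8_without_bom(data: bytes) -> bytes:
    """Re-encode UTF-8 data (with or without a BOM) as plain UTF-8, no BOM."""
    # The "utf-8-sig" codec accepts an optional leading BOM and drops it.
    return data.decode("utf-8-sig").encode("utf-8")


vcard = "BEGIN:VCARD\r\nVERSION:3.0\r\nFN:Testa Rossa Caffèbar\r\nEND:VCARD\r\n"
with_bom = vcard.encode("utf-8-sig")
print(has_utf8_bom(with_bom))                       # True
print(has_utf8_bom(to_utf8_without_bom(with_bom)))  # False
```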
Missing data/Wrong encoding
- We'd like to use your service for the new version of our location list (Free WiFi Hotspots in Austria), but ran into problems:
- After importing the vCard, the Mac OS X Address Book showed only the phone number (not marked as work), the URL, the zip code and the city. No name and no street address.
- The vCard itself is encoded in ISO-8859-1, although it declares “CHARSET=utf-8”. The source page is encoded in UTF-8.
- Here's the HTML code we've been using:
<div class="vcard">
<h2 class="fn org"><img class="photo" src="http://static.freewave.at/logos/testa_rossa_caffe_150.gif" alt="Testa Rossa Caffèbar Logo" />Testa Rossa Caffèbar</h2>
<div class="adr work"><span class="street-address">Mahlerstraße 4 </span><br />
<span class="postal-code">1010</span> <span class="locality">Wien</span><br />
<span class="country-name">Österreich</span></div>
<div><span class="tel work">+43 699 161 616 61</span><br />
<a class="url work" href="http://www.testarossawien.at/">http://www.testarossawien.at/</a><br />
<a class="email work" href="mailto:"></a></div>
<div class="geo"><span class="latitude">48.20275</span>,<span class="longitude">16.37079</span></div>
</div>
Thanks! --Vividvisions 17:25, 4 May 2010 (UTC)
I got the same type of problem with non-ASCII content. Don't know which part is responsible, though. --Jean-Luc Geering 2010-05-10
+1, I was coaching the dudes at http://hagreve.com/ implementing hCalendar and they hit the roadblock of accented characters being wrongly encoded in the .ics. They resorted to other ways of building an .ics. :sadface: Thanks. -- andr3
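The reports above describe two related failure modes: an ISO-8859-1 file labelled CHARSET=utf-8, and accented characters coming out garbled. The sketch below reproduces both with Python's built-in codecs on the example name from the hCard above. It illustrates the symptoms only; it does not diagnose which H2VX component is responsible. `is_valid_utf8` is a hypothetical helper.

```python
text = "Testa Rossa Caffèbar"
utf8_bytes = text.encode("utf-8")      # "è" becomes the two bytes C3 A8
latin1_bytes = text.encode("latin-1")  # "è" becomes the single byte E8


def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# UTF-8 bytes read as ISO-8859-1: each accented letter turns into two characters.
print(utf8_bytes.decode("latin-1"))  # Testa Rossa CaffÃ¨bar

# ISO-8859-1 bytes labelled CHARSET=utf-8: a strict UTF-8 reader rejects them.
print(is_valid_utf8(latin1_bytes))   # False
```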
HTML5 support
- <meta charset=utf-8> isn't recognized, so the output is double encoded. Greut 11:12, 4 January 2010 (UTC)
- New HTML5 elements (such as header, footer, section) are not supported, because they are stripped out by PHP Tidy and thus ignored. Tantek 16:13, 19 January 2010 (UTC)
possible solutions:
- 2010-09-01 UPDATE: preliminary support for new HTML5 elements and <time datetime> added to http://dev.h2vx.com/. Try it out and give feedback!
Please give feedback on the http://dev.h2vx.com/ HTML5 support here:
- Is there a timeout or throttling on requests to dev.h2vx? I've been getting inconsistent results on ics and webcal requests from the same markup. Don't know what else could be the issue. When it works, it works great though!
- Any throttling has been added manually, as necessary. What URL are you trying? Tantek 18:04, 14 July 2011 (UTC)
- ...
Previously:
- Possible options
- 1. Use a proper PHP html5lib (being coded by the HTML5 community, but not available/functional yet AFAIK) - still might do this long term.
- 2. Add a flag to the H2VX processing URL which says "I'm a crazy XML person and my markup is 100% well formed XML, please don't tidy, please break and fail to process if it's not well formed".
- In either case, new special HTML5 elements (like time) will require an update to X2V so that it properly handles/parses new semantic attributes (like datetime).
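On the first HTML5 issue above (the short <meta charset=utf-8> form not being recognized): a converter that only looks for the older http-equiv form would miss the HTML5 declaration and fall back to a default encoding, which would be consistent with the double-encoding symptom. A sketch that checks both forms with Python's standard `html.parser`; `MetaCharsetSniffer` and `sniff_charset` are hypothetical names, and this is not how H2VX or PHP Tidy actually detect the charset.

```python
from html.parser import HTMLParser


class MetaCharsetSniffer(HTMLParser):
    """Find a charset declared in either <meta> form (first one wins)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset:
            return
        attrs = {name: (value or "") for name, value in attrs}
        if "charset" in attrs:
            # HTML5 short form: <meta charset=utf-8>
            self.charset = attrs["charset"].strip().lower()
        elif attrs.get("http-equiv", "").lower() == "content-type":
            # Older form: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
            for part in attrs.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset" and value:
                    self.charset = value.strip().lower()


def sniff_charset(markup: str):
    sniffer = MetaCharsetSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return sniffer.charset


print(sniff_charset("<head><meta charset=utf-8><title>events</title></head>"))  # utf-8
```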
mouse events
- The "what are microformats?" style descriptions only appear on mouse-over of the trigger terms (those with class="term"). They do not appear at all when keyboard navigation is used, making them somewhat inaccessible. The problem is that the trigger elements should receive focus, but since they are not links they are not in the tabbing order, so the helper text never appears for keyboard users. Norm 10:39, 6 November 2009 (UTC)
- Quick fix: remove visibility:hidden from .term .info. Andr3
page semantics
- <i class="term"> should be changed to <em> for semantic reasons. ;) Andr3
not possible to use dtstart with timezone in abbr title
- Adding a timezone to dtstart using the abbr pattern leads to The Shining-style debug output repeating “Object is a string”. I tried adding a time with timezone via the value class pattern, and while the vCard downloads, the time is incorrect ~~Oli 00:53 15 February 2010 (+09:00)
- Oli, could you provide a URL to a live example/test case that you were using so we can test with it to try to see exactly what is going on? Thanks! Tantek 17:35, 15 February 2010 (UTC)
resolved
Resolved issues are moved to this section. If this grows too big we can move it to h2vx-issues-resolved
- ...
- 2009-11-11 We were using the Technorati hosted service. Surprised to see it redirected to H2VX, took a minute to realize what was going on. Thanks for picking up the service! Both the hosting provider and the new user agent are blocked by default on our side to prevent scraping. To be more transparent, maybe you could change the UA to be similar to the old one: from "Technorati contacts proxy (http://technorati.com/contacts/)" to "H2VX contacts proxy (http://h2vx.com/vcf/)" DineMonkey 15:47, 11 November 2009 (UTC)
- I've updated the user agent strings per your recommendation and documented them above as well. Tantek 18:29, 11 November 2009 (UTC)
  - H2VX contacts proxy (http://h2vx.com/vcf/)
  - H2VX events proxy (http://h2vx.com/ics/)
closed
Once a resolved issue has no further actions (and ideally is verified by the issue reporter), it can be closed and moved to this section. If this grows too big we can move it to h2vx-issues-closed
- 2009-11-04 It would be good to have the option to pass "referer" instead of a URL. Adactio 10:47, 4 November 2009 (UTC)
- http://h2vx.com/vcf/referrer and http://h2vx.com/ics/referrer are now up and running. Example of /vcf/referrer is live on http://tantek.com/ and the alternate spelling /vcf/referer is used live on http://clearleft.com/ - as such, closing this issue. Tantek 20:25, 5 November 2009 (UTC)
- ...
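The /vcf/referrer and /ics/referrer endpoints above convert whichever page linked to them. This page doesn't describe H2VX's implementation, so the following is only a sketch under the assumption that the target comes from the HTTP Referer header; `referrer_target` is a hypothetical name. One detail may bear on the #hcard-ids report under feedback: the HTTP specification (RFC 7231, section 5.5.2) forbids sending the #fragment in Referer, so the header alone cannot say which hCard id was intended.

```python
from typing import Optional
from urllib.parse import urldefrag


def referrer_target(referer_header: Optional[str]) -> Optional[str]:
    """Hypothetical: choose the page to convert for a /vcf/referrer request."""
    if not referer_header:
        # No Referer sent (typed URL, privacy settings, HTTPS page linking to HTTP).
        return None
    # Browsers omit any #fragment from Referer; strip one defensively anyway,
    # since fetching the page never needs it.
    url, _fragment = urldefrag(referer_header)
    return url


print(referrer_target("http://tantek.com/"))  # http://tantek.com/
```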
There is at least one related H2V service that uses the same X2V XSLT files as H2VX:
old
Previously Technorati hosted X2V conversion services:
- feed.technorati.com/contacts - for hCards to vCards
- feed.technorati.com/events - for hCalendar to iCalendar