hreview-parsing

From Microformats Wiki
Revision as of 18:47, 20 December 2008 by Brian (talk | contribs) (reverting spam)

hReview parsing

by Tantek Çelik

introduction

When hReview was first being developed, it was clear to me from experience with developing hCard and hCalendar that each property, value, and structure being introduced into hReview was unambiguously parsable. This holds both for detecting the existence of hReviews in arbitrary (X)HTML (and anywhere that arbitrary (X)HTML can be embedded, e.g. RSS, Atom, "generic XML"), and for parsing properties and values in general.

The purpose of this document is to capture the specifics of how to parse hReview and all its properties in order to increase interoperability of the format.

status

This document is an incomplete draft. Use the hcard-parsing document for guidance where holes exist. In fact, anyone familiar with hcard-parsing will recognize much of this as having been copied and pasted from that source. At some point (perhaps after writing hcalendar-parsing), I will abstract the common aspects of compound-microformat parsing and write a separate parsing document which will handle all general aspects such as URL handling, looking for the root class name, looking for properties, treating embedded microformats as wrappers, etc.

scope

Although this page is written specifically to explain how to parse hReview, the concepts and algorithms contained herein serve as an example of how other compound microformats are to be parsed.

URL handling

An hReview parser may begin with a URL to retrieve.

If the URL lacks a fragment identifier, then the parser should parse the entire retrieved resource for hReviews.

If the URL has a fragment identifier, then the parser should parse only the node indicated by the fragment identifier and its descendants, looking for hReviews, starting with the indicated node, which may itself be a single hReview.
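The two URL cases can be illustrated with a minimal Python sketch. It assumes the resource has already been retrieved as well-formed XHTML and that the fragment identifier names an element by its id attribute. The function name select_parse_root is hypothetical, and returning None when no element matches is an assumption of this sketch, not something this document specifies.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET


def select_parse_root(url, markup):
    """Return the element whose subtree should be searched for hReviews.

    `markup` is the already-retrieved resource as a well-formed XHTML
    string (an assumption of this sketch; real pages may need a more
    forgiving HTML parser).
    """
    # Split "http://host/page#frag" into the bare URL and the fragment.
    _, fragment = urldefrag(url)
    document = ET.fromstring(markup)

    # No fragment identifier: search the entire retrieved resource.
    if not fragment:
        return document

    # Fragment identifier: search only the indicated node and its
    # descendants. iter() visits the document element first, so the
    # document element itself can be the indicated node.
    for element in document.iter():
        if element.get("id") == fragment:
            return element

    # Assumption: nothing to parse if the fragment matches no element.
    return None
```

For example, select_parse_root("http://example.com/reviews#r1", markup) would return only the element with id "r1", while the same call without "#r1" would return the document element.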

root class name

Each compound microformat starts with a root element with a relatively unique class name. By that I mean a class name which isn't simply a common word, and is unlikely to have been used outside the context of the microformat. By choosing such a root class name the microformat avoids (for all practical purposes) colliding with existing class names that may exist within the (X)HTML context. This is essential to enabling such compound microformats to be embedded inside current, existing content, as well as future content.
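As a rough sketch of looking for the root class name, the Python below collects every element in a subtree whose class attribute contains the token "hreview", the root class name of hReview. The helper names has_class and find_hreviews are hypothetical, and the sketch assumes well-formed XHTML input. It treats the class attribute as a whitespace-separated list, so an element carrying other class names alongside the root class name still matches, while a longer name that merely contains the same letters does not.

```python
import xml.etree.ElementTree as ET

# Root class name for hReview.
ROOT_CLASS = "hreview"


def has_class(element, name):
    """True if `name` is one of the whitespace-separated class tokens."""
    return name in element.get("class", "").split()


def find_hreviews(start):
    """Return all hReview root elements in the subtree rooted at `start`.

    iter() yields `start` itself first, so the starting node may itself
    be an hReview.
    """
    return [el for el in start.iter() if has_class(el, ROOT_CLASS)]
```

Matching whole tokens rather than substrings is what lets a relatively unique root class name coexist with whatever other class names the surrounding content already uses.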

Related pages