Ideas for how to support aggregate reviews via microformats.
Common themes amongst examples (that we might want to support)
- Aggregations of reviews always contain these two elements:
- the number of reviewers
- the average rating
number of reviewers, really? unique people? what if there's more than one rating per person? Number of ratings would be simpler.
Google Rich Snippets makes the mistake of assuming that the number of ratings equals the number of reviews, but users may leave a rating without leaving a review.
These are good points -- added a separate section below to discuss ratings vs reviews vs reviewers. Note that the first comment referring to the number of "reviewers" caught a typo -- it should be number of reviews, not reviewers. -- Kavi, Nov 2 2009
- Other elements that occur in the example set include:
- the number of reviews for each rating (e.g. 10 5-star ratings, 7 4-star ratings)
- recurring themes about the entity being reviewed (e.g. "romantic restaurant" or "love the chicken mole").
- who the reviewers are (e.g. "critics" or "users"). Some sites (e.g. Rotten Tomatoes or GameSpot) have multiple sets of aggregate reviews to cover both critics and users.
- In addition, some elements already present in the hReview schema exist in aggregate reviews as well:
- review summary/description
- most recent date reviewed
Proposal discussed over IRC
What is the proposal?
- Define a new microformat for aggregate reviews (root class name "hreview-aggregate").
- The format will add only one new value (the number of reviews), marked up with a new property "count". Everything else comes from embedded hReview 0.4 (in progress) properties, which carry details such as the average review score, the summary, and a reference to the object of the review.
This proposal should be written up on a separate page as a microformats draft, e.g. hReview-aggregate 0.2.
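As a rough illustration of the proposal above, markup along these lines is what a site might publish. This is a sketch only, not spec text: the business name, numbers, and summary are invented, and the choice of elements (div, span, p) is arbitrary. Only the class names "hreview-aggregate" and "count" come from the proposal; "item", "fn", "rating", and "summary" are existing hReview property names.

```html
<!-- Sketch: one aggregate block for a hypothetical restaurant -->
<div class="hreview-aggregate">
  <!-- the object of the review, reusing hReview's "item" -->
  <span class="item"><span class="fn">Example Pizzeria</span></span>
  <!-- average score, reusing hReview's "rating" -->
  rated <span class="rating">4.5</span> out of 5,
  <!-- the one new property: number of reviews -->
  based on <span class="count">24</span> reviews.
  <p class="summary">Reviewers like the deep-dish crust.</p>
</div>
```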
Why was this proposal preferred?
- Creating a new uF rather than extending hReview doesn't require branching the spec for hReview and provides clean separation in case we want to extend the new format to include other data in the future
- Supporting only the number of reviews (rather than scores per rating, etc) is probably sufficient for 80% of sites with aggregate reviews.
Other proposals suggested
1) Do nothing. Aggregation must be done by the microformats parser
- Pros: Doesn't require any change to the existing microformats definitions
- Cons: Very difficult for parsers. Reviews for a single entity are usually not limited to a single web page (there are typically no more than 5-10 reviews per page), so aggregating this data would require the parser to figure out which pages to crawl to assemble the aggregate scores.
2) Extend existing hReview format to include "reviewcount"
- Any hReview that contains a reviewcount field (which denotes the number of reviewers) would implicitly refer to an aggregation of reviews. The rating would correspond to the average rating of all individual reviews, summary/description refer to a summary of overall sentiments from the reviews, date refers to the most recent review's date.
- Pros: very simple addition to the existing microformat
- Cons: Mild overloading of what an hReview contains -- a review can now correspond to a single user's review or an aggregation of user reviews.
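For comparison, option 2 might look roughly like the sketch below. The "reviewcount" class is the hypothetical property named in this option; it is not part of hReview. All values are invented. Note that nothing but the presence of "reviewcount" distinguishes this block from an ordinary single-author hReview, which is the overloading mentioned under Cons.

```html
<!-- Sketch: an ordinary hreview root, turned into an aggregate by "reviewcount" -->
<div class="hreview">
  <span class="item"><span class="fn">Example Pizzeria</span></span>
  <!-- would be read as the average rating because reviewcount is present -->
  <span class="rating">4.5</span> out of 5 from
  <span class="reviewcount">24</span> reviews.
  <!-- would be read as the date of the most recent review -->
  Last reviewed <abbr class="dtreviewed" title="2009-11-02">Nov 2, 2009</abbr>.
  <p class="summary">Reviewers like the deep-dish crust.</p>
</div>
```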
3) Define a new microformat type for aggregate reviews
- This type could contain the staples -- average review score and number of reviewers -- as well as some of the other sometimes-used features listed in the "common themes" section earlier.
- Pros: robust way to mark up many elements of aggregate review information
- Cons: some redundancy with hReview. Extending hReview might be sufficient
4) Do not use hreview classes in hreview-aggregate
- Reusing hreview class names in hreview-aggregate causes a collision when an hreview includes an hcard that contains review aggregates, as in the Google specification (http://www.google.com/support/webmasters/bin/answer.py?answer=146645). In that case, the hreview has its own rating plus a second rating imported via the include pattern.
Currently Yelp implements the review aggregate as an hreview-aggregate block that includes the entire hcard as well as the aggregate rating. Importing this hcard from an hreview using the include pattern imports the rating as well.
- review-aggregate can be included INSIDE the hcard block, or can surround that block.
- review-aggregate can point to a self contained hcard (include pattern):
- Without repeating any of the information in the hcard
- Without including empty links, for example:
<a class="item include" href="#my_business_hcard"></a>
- Without including links that repeat information already in the hcard, for example:
<a class="item include make_me_invisible_to_user" href="#my_business_hcard">Business Name</a>
- Without adding listing information, such as the type of listing:
- Without using non-semantic HTML, such as the object tag.
- An hreview should be able to safely import an hcard that may contain an hreview-aggregate without name collisions, especially on the rating property (using the include pattern).
- An hreview should be able to safely import an hcard that may contain nested elements of hreview-aggregate, such as count and average rating, without name collisions (using the include pattern). If this is not possible, pages will have to fall back to non-semantic HTML and markup that contains a lot of hidden content, making them less accessible.
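The collision described in this section can be sketched as follows. The example is illustrative only: the names and numbers are invented, the id reuses "my_business_hcard" from the snippets above, and the include link uses the empty-link form shown earlier purely to keep the sketch short. How a given parser actually resolves the two ratings is not something this sketch claims to know.

```html
<!-- An hcard that carries the aggregate rating for the business -->
<div class="vcard" id="my_business_hcard">
  <span class="fn org">Example Pizzeria</span>
  <span class="hreview-aggregate">
    <span class="rating">4.5</span> from
    <span class="count">24</span> reviews
  </span>
</div>

<!-- A single review that pulls in the hcard via the include pattern -->
<div class="hreview">
  <a class="item include" href="#my_business_hcard"></a>
  <!-- this review's own rating -->
  <span class="rating">3</span>
  <span class="description">Service was slow on a busy night.</span>
</div>
```

After the include is expanded, the hreview contains two elements with class "rating" (3 and 4.5) as well as a "count", which is the ambiguity option 4 is trying to avoid.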
Reviews vs ratings
Note -- this proposal has been incorporated into the hreview-aggregate draft spec version 0.2 in December 2009.
Adoption of hreview-aggregate has been strong, but one issue that has arisen is the notion of reviews vs ratings. The original hreview-aggregate spec as described below uses a single property called "count" to specify the total number of reviews for an item that contributed towards an aggregate review. This works well for a site like Yelp, where the aggregate rating is based on the number of individual user reviews.
However, many sites allow users to provide a rating without actually writing a review. So there should be some way of allowing sites to mark up the number of "ratings" (or "votes") separately from the number of "reviews."
A couple of examples of sites that have separate rating counts and review counts:
- Urbanspoon: http://www.urbanspoon.com/r/6/88896/restaurant/Patxis-Chicago-Pizza-Palo-Alto
- Download.com: http://download.cnet.com/Mayura-Chess-Board/3000-7562_4-10525218.html
- A longer list of examples is here: aggregate-review-examples
- Proposed solution: add a new property called "votes" to hreview-aggregate.
- Pros: it is simple for webmasters to understand and is consistent with the terminology that many sites actually use (for example: Urbanspoon, Download.com, IMDb). It also doesn't change the interpretation for any site that has already implemented the existing hreview-aggregate spec.
- Cons: it's not completely obvious that "count" means "number of reviews" whereas "votes" means "number of ratings."
- Alternative: add nested sub-properties to the "count" property: "reviews" and "votes"
- Pros: terminology is clearer than the proposed solution above. For backwards compatibility, if users don't specify reviews vs votes, assume they meant reviews by default.
- Cons: in practice, adding additional layers of nesting creates more chances for webmasters to make mistakes. More nesting is also more verbose.
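To make the trade-off between the two options above concrete, here is a sketch of each. The class names "votes", "count", and "reviews" come from the options as described; the business name and numbers are invented, and the surrounding markup follows the earlier hreview-aggregate proposal rather than any final spec.

```html
<!-- Option A: a flat "votes" property alongside the existing "count" -->
<div class="hreview-aggregate">
  <span class="item"><span class="fn">Example Pizzeria</span></span>
  <span class="rating">4.5</span> stars from
  <span class="votes">120</span> votes and
  <span class="count">24</span> reviews
</div>

<!-- Option B: "votes" and "reviews" nested inside "count" -->
<div class="hreview-aggregate">
  <span class="item"><span class="fn">Example Pizzeria</span></span>
  <span class="rating">4.5</span> stars from
  <span class="count">
    <span class="votes">120</span> votes and
    <span class="reviews">24</span> reviews
  </span>
</div>
```

In Option A, a page that already publishes only "count" stays valid without changes. In Option B, such a page relies on the backwards-compatibility rule that a bare "count" means reviews.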
Note that we shouldn't remove the ability for people to mark up the actual number of reviews -- for anyone who wants to read reviews, it's very useful to know how many reviews there are on a page even if there are more votes contributing towards an aggregate rating.