[uf-discuss] Microformats and keyword spamming
lists at ben-ward.co.uk
Tue Jun 2 01:42:14 PDT 2009
On 2 Jun 2009, at 00:01, Elli Albek wrote:
> 1. Non semantic HTML. The pages include a lot of repeating terms in
> wrong places. There is no way to avoid it if we want to use hreview.
Repeated content is of course undesirable, but please don't conflate
it with being “non semantic”.
> 2. The pages become less accessible, since the HTML starts deviating
> its semantic form. For example, include pattern requires:
The include pattern offers you choices to reference other content in
the page. It's not perfect, I agree. I would appreciate some clarity
on your problems below, though:
> a. object tags (this should ALWAYS be avoided at all cost!!!)
We are aware of and have documented scenarios where `object` is
problematic, but that doesn't equate to ‘ALWAYS be avoided at all
cost’. If you have a new case where the `object` version is as horrific
as you make out, please document it on http://microformats.org/wiki/include-pattern-issues
. If not, please keep your descriptions concise and free from
overreaction, as it confuses attempts to assist you.
> b. empty A tags
These are pretty much outlawed by the spec now, based on previous
> C. A tags with redundant information that is constantly repeating on
> page. This is what we currently do as the lesser of all evils.
This is a compromise, of course. Is it really evil though? Suboptimal,
sure, but evil? You'd be repeating the name of the restaurant
somewhere. It can be fit into the structure cleanly, and it can be
hidden if you want.
<h2 class='summary'><a class='include' href='#item'>The Alembic:</a>
Great food and cocktails!</h2>
Yes, it's not perfect, but it can be designed in cleanly.
> 3. Needing to repeat so much text on the page affects search engine
> Repeating terms that are almost irrelevant to the page, like
> <span class="type">business</span>
> on each and every hreview.
The review 'type' is optional. If it doesn't fit your publishing
pattern, just leave it out.
> 4. Repeating important keywords on the page TOO MANY TIMES, such as
> business name on every review:
> <span class="item">
> <a class="include" href="#review_item">Maharishi Ayurveda Health
> This added the page main keywords so many times that I suspect it
> keyword spamming in the eyes of the search engine.
Can you provide more info than that you ‘suspect’? Obviously that's a
serious issue if true, but these hyperlinks are linking to fragments
within the same page, not to other resources. To the best of my
knowledge, that has nothing to do with establishing keywords for a
page. Is there documentation to the contrary?
> 2. Use Google's cut down version of microformats. This may not
> follow the
> spec, but if we follow google most of our problems are solved. What
> I like
> about many google APIs is their practical approach. In that case I
> their view of microformats is more practical then the spec. It
> solves a lot of our problems.
Again, can you provide a link to this? Google's Rich Snippets
documentation provides some small examples of hReview, but links to
the microformats spec as the definitive reference. I'm unaware of any
‘cut down’ version of microformat specs from Google, nor can I see
anything in their examples suggesting this.
> 4. Direct feed to search engines in proprietary formats. We will still
> support hcard for the business directory, but will remove support for
> hreview since this is the major source of problems.
Since search engines don't currently accept any alternative ‘direct
feed’ format I don't quite know what option 4 is supposed to entail,
and again, precise clarity of major problems makes the issue easier to
> Advice is welcome and appreciated.
> I would really appreciate:
> ** An example that shows how to build a REAL reviews page. **
> 1. That page includes the business information once and only once on
> page, where the business name is in H1 (in hcard).
> 2. That page includes review aggregate, which does not require any
> repetition or hidden text.
> 3. A few reviews of the said business, WHICH DO NOT REQUIRE ANY
> of any item information, the type of business, etc.
> 4. Business name shows once and only once on the page in the hcard and
> nowhere else.
> 5. Listing type (business/product) shows once and only once on the
> Natural place in the hcard, but according to the spec it is not
I'm now a little confused. Up to now you've been talking about hreview,
and now you refer to hreview-aggregate.
As far as I'm aware, the `hreview-aggregate` can be created without
any repetition or hidden text. The item will be a child of the
The subsequent reviews would need to use the include pattern to
reference the original item to avoid major repetition. Currently,
there is no other way.
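
As a rough sketch of what that could look like, here is a page fragment
where the business appears once as an hCard and each review references it
with the hyperlink form of the include pattern. The `id`, the reviewer
name, the rating and the review text are made up for illustration, and the
structure is only one possible arrangement, not a recommended template:

```html
<!-- The business is marked up once: an hCard that also carries class="item". -->
<div id="item" class="item vcard">
  <h1 class="fn org">The Alembic</h1>
</div>

<!-- Each review references the item by fragment link.
     The link text is the only repetition of the business name. -->
<div class="hreview">
  <h2 class="summary"><a class="include" href="#item">The Alembic:</a>
    Great food and cocktails!</h2>
  <p class="description">Hypothetical review text goes here.</p>
  <p>Rating: <span class="rating">4</span> out of 5,
    by <span class="reviewer vcard"><span class="fn">A. Reviewer</span></span></p>
</div>
```

A parser that supports the include pattern should treat each review as if
the element with `id="item"` appeared in place of the link, so the full
item information is not restated in every hreview.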
Now, I've asked questions here because you've been quite imprecise
with some of your complaints, and I need to be clear about where
you've found an explicit, documentable problem and where you're just
expressing frustration. Please don't get me wrong, I completely agree
that the include pattern is imperfect. And I do think better
structural parsing should be pursued.
But, with regard to helping your problem right now, that means being
clear about problems you have with the current spec.
If you can follow up more precisely to the above and provide more info
where necessary, hopefully we can nail down a solution for you in the
Please provide example mark-up if you can (add it to the wiki if you
think it illustrates an issue, too).