[uf-discuss] Microformats and keyword spamming

Ben Ward lists at ben-ward.co.uk
Tue Jun 2 01:42:14 PDT 2009


Hi Elli,

On 2 Jun 2009, at 00:01, Elli Albek wrote:

> 1. Non semantic HTML. The pages include a lot of repeating terms in the
> wrong places. There is no way to avoid it if we want to use hreview.

Repeated content is of course undesirable, but please don't conflate  
it with being “non semantic”.

> 2. The pages become less accessible, since the HTML starts deviating from
> its semantic form. For example, include pattern requires:

The include pattern offers you choices to reference other content in  
the page.  It's not perfect, I agree. I would appreciate some clarity  
on your problems below, though:

>  a. object tags (this should ALWAYS be avoided at all cost!!!)

We are aware of, and have documented, scenarios where `object` is
problematic, but that doesn't equate to ‘ALWAYS be avoided at all
cost’. If you have a new case where the `object` version is as horrific
as you make out, please document it on
http://microformats.org/wiki/include-pattern-issues. If not, please
keep your descriptions concise and free from overreaction, as that
makes it harder for us to help you.

>  b. empty A tags

These are pretty much outlawed by the spec now, based on previous
accessibility research.

> c. A tags with redundant information that is constantly repeating on the
> page. This is what we currently do as the lesser of all evils.

This is a compromise, of course. Is it really evil though? Suboptimal,  
sure, but evil? You'd be repeating the name of the restaurant  
somewhere. It can be fit into the structure cleanly, and it can be  
hidden if you want.

<h2 class='summary'><a class='include' href='#item'>The Alembic:</a>  
Great food and cocktails!</h2>
<p>… etc.</p>

Yes, it's not perfect, but it can be designed in cleanly.
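For illustration, a minimal Python sketch of what a consuming parser does with that link. It assumes a hypothetical page where the restaurant's hCard already carries `id="item"`, uses only the standard library's `xml.etree.ElementTree` (so the fragment must be well-formed), and is not a real microformats parser. It shows that the link text (‘The Alembic:’) acts only as a pointer: the item data a consumer picks up comes from the referenced element, which is marked up once.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the hCard with id="item" stands in for wherever the
# restaurant is already marked up; ElementTree needs well-formed markup.
PAGE = """<div>
  <div class="vcard" id="item"><span class="fn org">The Alembic</span></div>
  <div class="hreview">
    <h2 class="summary"><a class="include" href="#item">The Alembic:</a>
      Great food and cocktails!</h2>
  </div>
</div>"""


def has_class(el, name):
    """True if the element's space-separated class attribute contains name."""
    return name in el.get("class", "").split()


def resolve_includes(root):
    """Pair each include link's text with the fn of the element it points at."""
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    pairs = []
    for link in root.iter("a"):
        href = link.get("href", "")
        if not (has_class(link, "include") and href.startswith("#")):
            continue
        target = by_id.get(href[1:])
        if target is None:
            continue  # dangling fragment: nothing to include
        fn = next((e for e in target.iter() if has_class(e, "fn")), None)
        link_text = "".join(link.itertext()).strip()
        pairs.append((link_text, "".join(fn.itertext()).strip() if fn is not None else None))
    return pairs


print(resolve_includes(ET.fromstring(PAGE)))
# [('The Alembic:', 'The Alembic')]
```

The link text and the parsed item name can differ, which is why the link can be styled or worded to suit the design.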

> 3. Needing to repeat so much text on the page affects search engine  
> results:
>
> Repeating terms that are almost irrelevant to the page, like
> <span class="type">business</span>
> on each and every hreview.

The review 'type' is optional. If it doesn't fit your publishing  
pattern, just leave it out.

> 4. Repeating important keywords on the page TOO MANY TIMES, such as the
> business name on every review:
> <span class="item">
> <a class="include" href="#review_item">Maharishi Ayurveda Health Spa</a>
> </span>
>
> This added the page main keywords so many times that I suspect it borders
> keyword spamming in the eyes of the search engine.

Can you provide anything more concrete than that you ‘suspect’ it?
Obviously that's a serious issue if true, but these hyperlinks point
to fragments within the same page, not to other resources. To the best
of my knowledge, that has nothing to do with establishing keywords for
a page. Is there documentation to the contrary?

> 2. Use Google's cut down version of microformats. This may not follow the
> spec, but if we follow google most of our problems are solved. What I like
> about many google APIs is their practical approach. In that case I think
> their view of microformats is more practical than the spec. It certainly
> solves a lot of our problems.

Again, can you provide a link to this? Google's Rich Snippets  
documentation provides some small examples of hReview, but links to  
the microformats spec as the definitive reference. I'm unaware of any  
‘cut down’ version of microformat specs from Google, nor can I see  
anything in their examples suggesting this.

> 4. Direct feed to search engines in proprietary formats. We will still
> support hcard for the business directory, but will remove support for
> hreview since this is the major source of problems.

Since search engines don't currently accept any alternative ‘direct
feed’ format, I don't quite know what option 4 is supposed to entail.
And again, a precise description of the major problems would make the
issue easier to work with.

> Advice is welcome and appreciated.
>
> I would really appreciate:
>
> ** An example that shows how to build a REAL reviews page. **
>
> 1. That page includes the business information once and only once on the
> page, where the business name is in H1 (in hcard).
> 2. That page includes review aggregate, which does not require any
> repetition or hidden text.
> 3. A few reviews of the said business, WHICH DO NOT REQUIRE ANY REPETITION
> of any item information, the type of business, etc.
> 4. Business name shows once and only once on the page in the hcard and
> nowhere else.
> 5. Listing type (business/product) shows once and only once on the page.
> Natural place in the hcard, but according to the spec it is not possible.

I'm now a little confused. Up to now you've been talking about
hreview, and now you refer to hreview-aggregate.

As far as I'm aware, an `hreview-aggregate` can be created without
any repetition or hidden text. The item will be a child of the
aggregate.

The subsequent reviews would need to use the include pattern to  
reference the original item to avoid major repetition. Currently,  
there is no other way.
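As a rough sketch of that layout (one plausible arrangement, not a canonical one), the Python below embeds a hypothetical page: the business hCard appears once, with its name in an `h1`, as the child item of the `hreview-aggregate`, and each `hreview` points back to it with an include link. The id `biz`, the reviewer names and the ratings are placeholders, and the checking helpers are illustrative only, not a real microformats parser. The checks confirm that the business `fn` is marked up exactly once, that every review's item resolves to it, and that no include link is empty. The link text still repeats the business name; that is the compromise discussed earlier in this message.

```python
import xml.etree.ElementTree as ET

# Hypothetical reviews page; ids, names and ratings are placeholders.
PAGE = """<div>
  <div class="hreview-aggregate">
    <div class="item">
      <div class="vcard" id="biz">
        <h1 class="fn org">Maharishi Ayurveda Health Spa</h1>
      </div>
    </div>
    <span class="rating"><span class="average">4.5</span> out of 5</span>
    from <span class="count">2</span> reviews
  </div>
  <div class="hreview">
    <span class="item"><a class="include" href="#biz">Maharishi Ayurveda Health Spa</a></span>
    <span class="reviewer vcard"><span class="fn">Alice</span></span>
    <span class="rating">5</span>
  </div>
  <div class="hreview">
    <span class="item"><a class="include" href="#biz">Maharishi Ayurveda Health Spa</a></span>
    <span class="reviewer vcard"><span class="fn">Bob</span></span>
    <span class="rating">4</span>
  </div>
</div>"""


def has_class(el, name):
    """True if the element's space-separated class attribute contains name."""
    return name in el.get("class", "").split()


def text_of(el):
    """Element text with whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def find_class(el, name):
    """First element (el itself or a descendant) carrying the class, or None."""
    return next((e for e in el.iter() if has_class(e, name)), None)


def resolve_include(link, by_id):
    """Follow an include link's same-page fragment href to its target element."""
    href = link.get("href", "")
    return by_id.get(href[1:]) if href.startswith("#") else None


def review_item_names(root):
    """For each hreview, the fn its item resolves to via the include pattern."""
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}
    names = []
    for review in (e for e in root.iter() if has_class(e, "hreview")):
        item = find_class(review, "item")
        links = [a for a in item.iter("a") if has_class(a, "include")] if item is not None else []
        target = resolve_include(links[0], by_id) if links else None
        fn = find_class(target, "fn") if target is not None else None
        names.append(text_of(fn) if fn is not None else None)
    return names


def fn_count(root, name):
    """How many fn elements carry exactly this name (include links are not fn)."""
    return sum(1 for e in root.iter() if has_class(e, "fn") and text_of(e) == name)


def empty_include_links(root):
    """Include links with no text, which are now effectively disallowed."""
    return [a for a in root.iter("a") if has_class(a, "include") and not text_of(a)]


root = ET.fromstring(PAGE)
print(review_item_names(root))
print(fn_count(root, "Maharishi Ayurveda Health Spa"), len(empty_include_links(root)))
```

Adding a third review only means adding another `hreview` block with the same include link; the business details themselves are not duplicated.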



Now, I've asked questions here because you've been quite imprecise  
with some of your complaints, and I need to be clear about where  
you've found an explicit, documentable problem and where you're just  
expressing frustration. Please don't get me wrong, I completely agree  
that the include pattern is imperfect. And I do think better  
structural parsing should be pursued.

But helping with your problem right now means being clear about the
problems you have with the current spec.

If you can follow up more precisely to the above and provide more info  
where necessary, hopefully we can nail down a solution for you in the  
shorter term.

Please provide example mark-up if you can (add it to the wiki if you  
think it illustrates an issue, too).

Thanks,

Ben

