microformats2-parsing-issues
This article is a stub. You can help the microformats.org wiki by expanding it.
This page is for documenting issues with the microformats2-parsing specification.
issues
e- parsing iframe srcdoc
- Proposal: addition of a new e-* parsing rule for iframe elements with srcdoc attributes. E.G.
<div class="h-entry">
<iframe class="e-content" srcdoc="<p>A paragraph of HTML with "quoted quotes" &amp; doubly quoted ampersands</p>" />
</div>
{
"items": [{
"type": ["h-entry"],
"properties": {
"content": ["<p>A paragraph of HTML with "quoted quotes" & doubly quoted ampersands</p>"]
}
}]
}
This would allow, for example, HTML comments to be sandboxed inside iframes but still parsable as microformats.
I believe the correct processing would be to leave " entities as they are but to unescape any doubly-escaped ampersands.
- Is there any use case for that? —Tom Morris 12:32, 14 September 2013 (UTC)
- +1 we need documentation of use case and existing sites publishing iframe srcdoc like this - Tantek 00:47, 15 September 2013 (UTC)
e- and p- escaping levels
- The fact that the parsed value of any element with .e-* is at a different level of escaping to the parsed values of p-*, dt-* etc. without any indication of how the property was parsed in the output is a security problem. For example:
input | output |
---|---|
<p class="h-card">
<span class="p-name"><tag></span>
</p>
|
{
"items": [
{
"type": [
"h-card"
],
"properties": {
"name": [
"<tag>"
]
}
}
]
}
|
<p class="h-card">
<span class="e-name"><tag></span>
</p>
|
{
"items": [
{
"type": [
"h-card"
],
"properties": {
"name": [
"<tag>"
]
}
}
]
}
|
- As a parser developer, the most straightforward way I can think of solving this is to add an option (enabled by default) which encodes HTML special characters on all non e-* properties, so the developer knows that all property values are going to be at the same level of escaping. --bw 20:00, 15 June 2013 (UTC)
- Your suggestion of auto-HTML-encoding p-*/u-*/dt-* property values is the most sensible I think. I would NOT make it an option, as it makes sense write consistent microformats2 consumers. - Tantek 07:18, 5 July 2013 (UTC)
- Can you think of any existing apps/consumers of microformats2 via the parser that would break? What would indieweb comments parsers do? - Tantek 07:18, 5 July 2013 (UTC)
- The only breakage which might occur would be over-encoding of non e-* properties, but I’ll release this update as v0.2.0 and warn people about the changes. The worst thing which could happen is that some comments look a bit weird, as opposed to the current worst possible scenario of easy XSS attacks --bw 12:55, 5 July 2013 (UTC)
- We should also decide exactly which characters get encoded — just angle brackets, or quotes/ampersands as well? --bw 12:55, 5 July 2013 (UTC)
- I am not sure about this, it seems more like a helper function rather than a core feature of the parser. Personally I would like to store data as text and encode only when I am going to use and I known the format it is going to be use in. --Glenn Jones 9:54, 14 July 2013 (UTC)
- After the discussion on the indiewebcamp IRC with Barnaby Walters I now understand the XSS issue that this change is trying to address. A rogue author could include HTML with scripts to execute a XSS attack. These could be masked by switch prefixes i.e. p-* to e-* on a well use property. As the consumer does not see the prefix in the JSON output they have no idea if a property will content HTML or text. I will update my two parsers and the test suite --Glenn Jones 8:02, 17 July 2013 (UTC)
- So what about an author setting a property to e-* when it would normal be p-*, dt-* or u-* i.e.
<div class="h-card"><p class="e-name"><script> alert('xss test') </script></p></div>
Should we not encode e-* as well and the consumer can decode at their own risk --Glenn Jones 18:42, 21 July 2013 (UTC)
resolved
br hr empty string
- The parsing rule 'else if br.p-x or hr.p-x, then return "" (empty string)' for p-* can cause any code consuming the API to become quite bloated. It means that you have test every array value to see if its an empty string. It is also unclear to me what the purpose of this mark-up pattern is for Glenn Jones
- Upon reconsidering this, I agree with you, this is an unlikely use case. If a publisher wants to explicitly set an empty property "p-foo" they can simply write
<span class="p-foo"></span>
which looks explicit. Whereas BR and HR tags are often just presentational, so we should both not encourage usage of them for semantics, and anyone that did use them would be subject to likely loss of semantics upon a redesign (that got rid of those particular BR and HR tags). I'm going to remove them from the parsing spec. - Tantek 15:29, 10 February 2013 (UTC)
- Upon reconsidering this, I agree with you, this is an unlikely use case. If a publisher wants to explicitly set an empty property "p-foo" they can simply write
datetime examples without T delimiter
- The examples in the wiki microformats-2 pages such h-entry and h-entry had datetime without the 'T' delimiter between date and time. ie
<time class="dt-published" datetime="2013-06-13 12:00:00">13<sup>th</sup> June 2013</time>
I have updated the pages. As far as I known this is a new pattern for dates. Was it a mistake in the examples or is it a new datetime pattern.
- The HTML5 "time" element, and "datetime" attribute allow for space " " as a separator between date and time as well as "T", thus we allow it for microformats as well. The " " separator is preferred as the date and time are more readable when separated by a space. The examples noted in those specs deliberately use this. - Tantek 18:48, 15 July 2013 (UTC)
rel-alternate absent optional attributes
- What should rel-alternate parsing do when one of the optional attributes specified (hreflang or media or both) is not there? The options seem to be:
- leave the corresponding key out of the alternate JSON object
- This one. Leave the corresponding key out.
- include the corresponding key in the alternate JSON object, but set the value to the JSON null object
- include the corresponding key in the alternate JSON object, but set the value to a blank string
- something I haven't thought of
- leave the corresponding key out of the alternate JSON object
I haven't checked the existing implementations, but Barnaby said he's not sure what the appropriate way to deal with it is either. —Tom Morris 15:41, 9 August 2013 (UTC)
rel-alternate and type attribute
- Should rel-alternate parsing also pick up the
type
attribute? It’s fairly widely used, e.g. for ATOM feeds.- Numerous existing sites/pages have various rel-alternate uses with a type attribute for feeds/APIs so that's good enough to add this for help with discovery in general. Rel parsing updated. - Tantek 00:47, 15 September 2013 (UTC)