parsing: Difference between revisions

From Microformats Wiki

Latest revision as of 16:31, 18 July 2020


This is a braindump; this page will need cleaning up, so take everything with a grain of salt at the moment.

For now, start by reading hcard-parsing, which has more detail and has been more thoroughly reviewed and implemented.

  • I've documented my own thoughts on parsing which flesh out some of the ideas and go into more detail on algorithms and stuff. TobyInk 06:33, 21 Jul 2008 (PDT)

By Element

This is a matrix of element and type. It should describe under what circumstances each value is used and where that value comes from. The list of elements has been taken from http://www.w3.org/TR/html4/index/elements.html with some HTML5 elements added, in particular those with special parsing needs.

See semantic html for a definitive list of elements.

data types

(This probably needs a better name.) There are two types in microformats: protocol types and strings. Strings could be integers, such as ratings; plain text, such as a note; or datetimes, such as dtstart. Protocol types are UIDs, URLs, email addresses, and (sometimes) telephone and fax numbers.

If a cell contains a comma-separated list, the sources are tried in order of availability. For instance, the ABBR element is @title,node-value. If the @title is present, it is used; if not, the stack is popped and node-value is looked at; if there is no node-value, the value is NULL.
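
The fallback rule can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name `resolve_value`, the use of `xml.etree.ElementTree`, and the reading of node-value as the element's concatenated, whitespace-trimmed text content are all assumptions made for the example. NULL is represented by Python's `None`.

```python
import xml.etree.ElementTree as ET


def resolve_value(element, sources):
    """Return the first available value from a comma list such as '@title,node-value'.

    Returns None (standing in for NULL) when no source yields a value.
    """
    for source in sources.split(","):
        source = source.strip()
        if source.startswith("@"):
            # Attribute source: used whenever the attribute is present.
            value = element.get(source[1:])
            if value is not None:
                return value
        elif source == "node-value":
            # Assumption: node-value is the element's concatenated text content,
            # and empty text counts as "no node-value".
            text = "".join(element.itertext()).strip()
            if text:
                return text
    return None


with_title = ET.fromstring('<abbr title="2008-07-21">21 July</abbr>')
without_title = ET.fromstring("<abbr>21 July</abbr>")
empty = ET.fromstring("<abbr></abbr>")

print(resolve_value(with_title, "@title,node-value"))     # 2008-07-21
print(resolve_value(without_title, "@title,node-value"))  # 21 July
print(resolve_value(empty, "@title,node-value"))          # None
```

The three calls walk through the three branches of the rule: the attribute wins when present, the text content is the fallback, and an element with neither yields NULL.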

element protocol string
A @href,node-value node-value
ABBR @title,node-value @title,node-value
ACRONYM @title,node-value @title,node-value
ADDRESS node-value node-value
APPLET ??? ???(node-value)
AREA @href,node-value node-value
B node-value node-value
BASE (valid?) @href
BASEFONT (valid?)
BDO (valid?)
BIG node-value node-value
BLOCKQUOTE @cite?,node-value node-value
BODY node-value node-value
BR (valid?)
BUTTON @value? @value?
CAPTION node-value node-value
CENTER node-value node-value
CITE node-value node-value
CODE node-value node-value
COL node-value node-value
COLGROUP node-value node-value
DATA @value,node-value @value,node-value
DD node-value node-value
DEL @cite,node-value node-value
DFN node-value node-value
DIR node-value node-value
DIV node-value node-value
DL node-value node-value
DT node-value node-value
EM node-value node-value
FIELDSET node-value node-value
FONT node-value node-value
FORM @action?,node-value node-value
FRAME @src?,node-value node-value
FRAMESET node-value node-value
H1 node-value node-value
H2 node-value node-value
H3 node-value node-value
H4 node-value node-value
H5 node-value node-value
H6 node-value node-value
HEAD (valid?) node-value node-value
HR (valid?) node-value node-value
HTML (valid?) node-value node-value
I node-value node-value
IFRAME @src? node-value
IMG @src @alt
INPUT @value? @value?
INS @cite,node-value node-value
ISINDEX (valid?)
KBD node-value node-value
LABEL node-value node-value
LEGEND node-value node-value
LI node-value node-value
LINK (valid?)
MAP node-value node-value
MENU (valid?)
META (valid?)
NOFRAMES node-value node-value
NOSCRIPT node-value node-value
OBJECT @data,node-value node-value
OL node-value node-value
OPTGROUP (valid?) node-value node-value
OPTION node-value node-value
P node-value node-value
PARAM (?) node-value node-value
PRE node-value node-value
Q node-value node-value
S node-value node-value
SAMP node-value node-value
SCRIPT node-value node-value
SELECT (valid?) node-value node-value
SMALL node-value node-value
SPAN node-value node-value
STRIKE node-value node-value
STRONG node-value node-value
STYLE (valid?) node-value node-value
SUB node-value node-value
SUP node-value node-value
TABLE (valid?) node-value node-value
TBODY node-value node-value
TD node-value node-value
TEXTAREA node-value node-value
TFOOT node-value node-value
TH node-value node-value
THEAD node-value node-value
TIME @datetime,node-value @datetime,node-value
TITLE node-value node-value
TR node-value node-value
TT node-value node-value
U node-value node-value
UL node-value node-value
VAR node-value node-value
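
As a rough illustration of how an implementation might consult the matrix, the sketch below encodes a handful of rows as a Python dictionary and looks up the ordered sources for an element and data type. The names `MATRIX` and `sources_for` are hypothetical, only rows without question marks are included, and returning `None` for unlisted elements is an assumption of this example rather than something the page specifies.

```python
# A few rows copied from the matrix above: element -> (protocol, string).
MATRIX = {
    "A": ("@href,node-value", "node-value"),
    "ABBR": ("@title,node-value", "@title,node-value"),
    "DATA": ("@value,node-value", "@value,node-value"),
    "IMG": ("@src", "@alt"),
    "OBJECT": ("@data,node-value", "node-value"),
    "SPAN": ("node-value", "node-value"),
    "TIME": ("@datetime,node-value", "@datetime,node-value"),
}


def sources_for(element_name, data_type):
    """Return the ordered list of value sources, or None if the element is not listed."""
    if data_type not in ("protocol", "string"):
        raise ValueError("data_type must be 'protocol' or 'string'")
    row = MATRIX.get(element_name.upper())
    if row is None:
        return None
    cell = row[0] if data_type == "protocol" else row[1]
    return cell.split(",")


print(sources_for("a", "protocol"))   # ['@href', 'node-value']
print(sources_for("a", "string"))     # ['node-value']
print(sources_for("img", "string"))   # ['@alt']
```

Keeping the cells as the same comma lists used on this page makes it easy to compare the dictionary against the matrix by eye.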

New Elements

When a new semantic HTML element is introduced, follow these steps to update microformats to handle the new element.

  1. add the element to SemanticHTML MediaWiki extension, which enables creating wiki live markup examples
  2. update parsing rules accordingly (on parsing and hcard-parsing wiki pages)
  3. create/iterate actual live markup examples on wiki with real world content examples
  4. implement experimental new parsing support, test on wiki examples. optionally deploy for broader testing (e.g. dev.)
  5. if results are as expected/predicted, create a test case from the example markup with the expected results. if not, re-assess how parsing should work and go to step 2.
  6. add parsing support to additional implementations
  7. have individual implementations test/deploy broadly as they see fit
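
Step 5 can be pictured with a small Python sketch that pairs example markup with the results the parsing rules predict. Everything here is hypothetical: `extract` is a stand-in for a real parser that only applies the TIME rule (@datetime,node-value) from this page, and the markup and expected values are invented for illustration rather than taken from an actual test suite.

```python
import xml.etree.ElementTree as ET


def extract(markup):
    """Stand-in parser: apply the TIME rule '@datetime,node-value' to one element."""
    element = ET.fromstring(markup)
    value = element.get("datetime")
    if value is not None:
        return value
    # Assumption: node-value is the trimmed text content; empty text means NULL.
    text = "".join(element.itertext()).strip()
    return text or None


# Each test case pairs example markup with the result the parsing rules predict.
TEST_CASES = [
    ('<time class="dtstart" datetime="2020-07-18">18 July</time>', "2020-07-18"),
    ('<time class="dtstart">2020-07-18</time>', "2020-07-18"),
    ('<time class="dtstart"></time>', None),
]


def run_tests():
    """Return a list of (markup, expected, actual) for every failing case."""
    failures = []
    for markup, expected in TEST_CASES:
        actual = extract(markup)
        if actual != expected:
            failures.append((markup, expected, actual))
    return failures


failures = run_tests()
print("all passed" if not failures else failures)
```

A non-empty failure list corresponds to the "if not" branch of step 5: the parsing rules are re-assessed and the process returns to step 2.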

Links