Parsing
Latest revision as of 16:31, 18 July 2020
This is a braindump. This page needs cleaning up, so take everything with a grain of salt for now.
For now, start by reading hcard-parsing, as it has more detail and has been more thoroughly reviewed and implemented.
- I've documented my own thoughts on parsing which flesh out some of the ideas and go into more detail on algorithms and stuff. TobyInk 06:33, 21 Jul 2008 (PDT)
By Element
This is a matrix of element and type. It should describe under what circumstances each value is used and where that value comes from. The list of elements has been taken from http://www.w3.org/TR/html4/index/elements.html with some HTML5 elements added, in particular those with special parsing needs.
See semantic html for a definitive list of elements.
Data Types
(This probably needs a better name.) There are two types in microformats: protocol types and strings. Strings may hold integers (such as ratings), free text (such as a note), or datetimes (such as dtstart). Protocol types are UIDs, URLs and email addresses (and sometimes telephone and fax numbers).
Where a cell contains a comma-separated list, the entries are tried in order and the first one available is used. For instance, the ABBR element is @title,node-value: if @title is present, it is used; if not, parsing falls back to the next entry and looks at node-value; if there is no node-value either, the value is NULL.
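The fallback rule can be sketched in Python. This is a minimal illustration, not code from any existing parser: the function name `resolve`, its parameters, and the choice to ignore a trailing `?` (which the table uses to mark uncertain entries) are assumptions. It also assumes that an attribute which is present counts as available even if it is empty.

```python
from typing import Mapping, Optional


def resolve(rule: str, attrs: Mapping[str, str], node_value: Optional[str]) -> Optional[str]:
    """Return the first available value named by a comma-separated rule.

    rule       -- a table cell such as "@title,node-value"
    attrs      -- the element's attributes, e.g. {"title": "Doctor"}
    node_value -- the element's text content, or None if it has none
    """
    for source in rule.split(","):
        # Assumption: a trailing "?" (an uncertain table entry) is ignored.
        source = source.strip().rstrip("?")
        if source.startswith("@"):
            value = attrs.get(source[1:])  # None if the attribute is absent
        elif source == "node-value":
            value = node_value
        else:
            raise ValueError(f"unknown source in rule: {source!r}")
        if value is not None:  # available, so use it and stop looking
            return value
    return None  # nothing available: the value is NULL
```

For example, `resolve("@title,node-value", {"title": "Doctor"}, "Dr.")` returns `"Doctor"`, while `resolve("@title,node-value", {}, "Dr.")` falls back to `"Dr."`.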
Element | protocol | string |
A | @href,node-value | node-value |
ABBR | @title,node-value | @title,node-value |
ACRONYM | @title,node-value | @title,node-value |
ADDRESS | node-value | node-value |
APPLET | ??? | ???(node-value) |
AREA | @href,node-value | node-value |
B | node-value | node-value |
BASE (valid?) | @href | |
BASEFONT (valid?) | | |
BDO (valid?) | | |
BIG | node-value | node-value |
BLOCKQUOTE | @cite?,node-value | node-value |
BODY | node-value | node-value |
BR (valid?) | | |
BUTTON | @value? | @value? |
CAPTION | node-value | node-value |
CENTER | node-value | node-value |
CITE | node-value | node-value |
CODE | node-value | node-value |
COL | node-value | node-value |
COLGROUP | node-value | node-value |
DATA | @value,node-value | @value,node-value |
DD | node-value | node-value |
DEL | @cite,node-value | node-value |
DFN | node-value | node-value |
DIR | node-value | node-value |
DIV | node-value | node-value |
DL | node-value | node-value |
DT | node-value | node-value |
EM | node-value | node-value |
FIELDSET | node-value | node-value |
FONT | node-value | node-value |
FORM | @action?,node-value | node-value |
FRAME | @src?,node-value | node-value |
FRAMESET | node-value | node-value |
H1 | node-value | node-value |
H2 | node-value | node-value |
H3 | node-value | node-value |
H4 | node-value | node-value |
H5 | node-value | node-value |
H6 | node-value | node-value |
HEAD (valid?) | node-value | node-value |
HR (valid?) | node-value | node-value |
HTML (valid?) | node-value | node-value |
I | node-value | node-value |
IFRAME | @src? | node-value |
IMG | @src | @alt |
INPUT | @value? | @value? |
INS | @cite,node-value | node-value |
ISINDEX (valid?) | | |
KBD | node-value | node-value |
LABEL | node-value | node-value |
LEGEND | node-value | node-value |
LI | node-value | node-value |
LINK (valid?) | | |
MAP | node-value | node-value |
MENU (valid?) | | |
META (valid?) | | |
NOFRAMES | node-value | node-value |
NOSCRIPT | node-value | node-value |
OBJECT | @data,node-value | node-value |
OL | node-value | node-value |
OPTGROUP (valid?) | node-value | node-value |
OPTION | node-value | node-value |
P | node-value | node-value |
PARAM (?) | node-value | node-value |
PRE | node-value | node-value |
Q | node-value | node-value |
S | node-value | node-value |
SAMP | node-value | node-value |
SCRIPT | node-value | node-value |
SELECT (valid?) | node-value | node-value |
SMALL | node-value | node-value |
SPAN | node-value | node-value |
STRIKE | node-value | node-value |
STRONG | node-value | node-value |
STYLE (valid?) | node-value | node-value |
SUB | node-value | node-value |
SUP | node-value | node-value |
TABLE (valid?) | node-value | node-value |
TBODY | node-value | node-value |
TD | node-value | node-value |
TEXTAREA | node-value | node-value |
TFOOT | node-value | node-value |
TH | node-value | node-value |
THEAD | node-value | node-value |
TIME | @datetime,node-value | @datetime,node-value |
TITLE | node-value | node-value |
TR | node-value | node-value |
TT | node-value | node-value |
U | node-value | node-value |
UL | node-value | node-value |
VAR | node-value | node-value |
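One way to use the table is to store its rows as data and pick the protocol or string column depending on the property's type. The sketch below is hypothetical: it copies only a handful of rows, assumes the fragment contains a single element, and treats node-value as the element's text with surrounding whitespace stripped. It uses Python's standard `html.parser` module; `RULES`, `_Fragment` and `extract` are illustrative names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# A handful of rows copied from the table above: tag -> (protocol rule, string rule).
RULES: Dict[str, Tuple[str, str]] = {
    "a": ("@href,node-value", "node-value"),
    "abbr": ("@title,node-value", "@title,node-value"),
    "img": ("@src", "@alt"),
    "time": ("@datetime,node-value", "@datetime,node-value"),
    "span": ("node-value", "node-value"),
}


class _Fragment(HTMLParser):
    """Record the outermost element's tag and attributes, plus all text."""

    def __init__(self) -> None:
        super().__init__()
        self.tag: Optional[str] = None
        self.attrs: Dict[str, str] = {}
        self.text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if self.tag is None:  # only the first (outermost) element matters
            self.tag = tag
            self.attrs = {name: value or "" for name, value in attrs}

    def handle_data(self, data):
        self.text.append(data)


def extract(fragment: str, kind: str) -> Optional[str]:
    """Extract a value of the given kind ("protocol" or "string") from one element."""
    parser = _Fragment()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    protocol_rule, string_rule = RULES[parser.tag]  # KeyError for unlisted tags
    rule = protocol_rule if kind == "protocol" else string_rule
    # Assumption: node-value is the stripped text; whitespace-only text counts as none.
    node_value = "".join(parser.text).strip() or None
    for source in rule.split(","):
        if source.startswith("@"):
            if source[1:] in parser.attrs:
                return parser.attrs[source[1:]]
        elif node_value is not None:  # the only other source here is node-value
            return node_value
    return None  # nothing available: NULL
```

For example, `extract('<abbr title="Doctor">Dr.</abbr>', "string")` returns `"Doctor"`, and `extract('<img src="photo.jpg" alt="Jane Doe">', "protocol")` returns `"photo.jpg"`. Keeping the rules as data means that supporting a new element is a matter of adding one row.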
New Elements
When a new semantic HTML element is introduced, follow these steps to update microformats to handle the new element.
1. Add the element to the SemanticHTML MediaWiki extension, which enables creating live markup examples on the wiki.
2. Update the parsing rules accordingly (on the parsing and hcard-parsing wiki pages).
3. Create and iterate on live markup examples on the wiki, using real-world content.
4. Implement experimental parsing support for the new element and test it on the wiki examples. Optionally, deploy it for broader testing (e.g. dev.).
5. If the results are as expected, create a test case from the example markup with the expected results. If not, re-assess how parsing should work and go back to step 2.
6. Add parsing support to additional implementations.
7. Have individual implementations test and deploy broadly as they see fit.
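The test-case step can be made concrete with a small table-driven check. This is a hypothetical sketch, not part of any existing microformats test suite: the `Case` record, the `failing_cases` helper and the `parse(markup, kind)` signature are assumptions. The expected values follow the DATA row of the By Element table (`@value,node-value`).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical signature: parse(markup, kind) -> value or None,
# where kind is "protocol" or "string" as in the By Element table.
Parser = Callable[[str, str], Optional[str]]


@dataclass(frozen=True)
class Case:
    markup: str              # example markup taken from a wiki live example
    kind: str                # "protocol" or "string"
    expected: Optional[str]  # value predicted by the updated parsing rules


# Expected results for the DATA element, per its row: @value,node-value
DATA_CASES = [
    Case('<data value="42">forty-two</data>', "string", "42"),
    Case('<data>forty-two</data>', "string", "forty-two"),
]


def failing_cases(parse: Parser, cases: List[Case]) -> List[Case]:
    """Return every case whose actual result differs from the expected one.

    An empty list means the implementation matches the predicted results;
    otherwise the parsing rules need to be re-assessed (back to step 2).
    """
    return [case for case in cases if parse(case.markup, case.kind) != case.expected]
```

An implementation would pass its own parse function; a parser that ignores node-value and always returns the `value` attribute would fail the second case.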