Revision as of 11:04, 12 September 2007
Parsing
This is a braindump: the page needs cleaning up, so take everything here with a grain of salt for now.
By Element
This is a matrix of HTML elements against data types. For each element, it should describe under what circumstances each value is used and where that value comes from. The list of elements was taken from http://www.w3.org/TR/html4/index/elements.html
Data Types
(This probably needs a better name.) Microformats use two data types: protocol types and strings. A string may hold an integer (such as a rating), free text (such as a note) or a datetime (such as dtstart). Protocol types are UIDs, URLs, email addresses and, in some cases, telephone and fax numbers.
Where a cell holds a comma-separated list, the sources are listed in order of preference. For instance, the ABBR entry is @title,node-value: if the @title attribute is present, it is used; if not, that entry is popped off the list and the node-value is tried; if there is no node-value either, the value is NULL.
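The fallback rule above can be sketched in Python. This is a minimal, hypothetical helper (`resolve_value` is not taken from any existing parser), assuming that `@name` means the attribute `name`, that `node-value` means the element's text content, and that an empty value counts as absent. The page does not define the trailing `?` seen in some cells, so the sketch simply ignores it.

```python
from typing import Mapping, Optional


def resolve_value(rule: str, attrs: Mapping[str, str], node_value: str) -> Optional[str]:
    """Try each source in a comma list such as "@title,node-value", in order.

    Returns the first non-empty value, or None (standing in for NULL).
    """
    for source in rule.split(","):
        # The page does not define the trailing "?" in cells like "@value?";
        # this sketch simply ignores it.
        source = source.strip().rstrip("?")
        if source.startswith("@"):
            value = attrs.get(source[1:])  # attribute source, e.g. @title
        elif source == "node-value":
            value = node_value  # the element's text content (assumed)
        else:
            raise ValueError(f"unknown source: {source!r}")
        if value:  # present and non-empty: use it and stop looking
            return value
    return None  # nothing left to try: the value is NULL


# ABBR uses "@title,node-value" for both data types.
print(resolve_value("@title,node-value", {"title": "HyperText Markup Language"}, "HTML"))
# HyperText Markup Language
print(resolve_value("@title,node-value", {}, "HTML"))  # HTML
print(resolve_value("@title,node-value", {}, ""))      # None
```

Returning `None` stands in for NULL; a real parser would also have to decide how whitespace in the node value is trimmed.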
| element | protocol | string |
| A | @href,node-value | node-value |
| ABBR | @title,node-value | @title,node-value |
| ACRONYM | @title,node-value | @title,node-value |
| ADDRESS | node-value | node-value |
| APPLET | ??? | ???(node-value) |
| AREA | @href,node-value | node-value |
| B | node-value | node-value |
| BASE (valid?) | @href | |
| BASEFONT (valid?) | | |
| BDO (valid?) | | |
| BIG | node-value | node-value |
| BLOCKQUOTE | @cite?,node-value | node-value |
| BODY | node-value | node-value |
| BR (valid?) | | |
| BUTTON | @value? | @value? |
| CAPTION | node-value | node-value |
| CENTER | node-value | node-value |
| CITE | node-value | node-value |
| CODE | node-value | node-value |
| COL | node-value | node-value |
| COLGROUP | node-value | node-value |
| DD | node-value | node-value |
| DEL | @cite,node-value | node-value |
| DFN | node-value | node-value |
| DIR | node-value | node-value |
| DIV | node-value | node-value |
| DL | node-value | node-value |
| DT | node-value | node-value |
| EM | node-value | node-value |
| FIELDSET | node-value | node-value |
| FONT | node-value | node-value |
| FORM | @action?,node-value | node-value |
| FRAME | @src?,node-value | node-value |
| FRAMESET | node-value | node-value |
| H1 | node-value | node-value |
| H2 | node-value | node-value |
| H3 | node-value | node-value |
| H4 | node-value | node-value |
| H5 | node-value | node-value |
| H6 | node-value | node-value |
| HEAD (valid?) | node-value | node-value |
| HR (valid?) | node-value | node-value |
| HTML (valid?) | node-value | node-value |
| I | node-value | node-value |
| IFRAME | @src? | node-value |
| IMG | @src | @alt |
| INPUT | @value? | @value? |
| INS | @cite,node-value | node-value |
| ISINDEX (valid?) | | |
| KBD | node-value | node-value |
| LABEL | node-value | node-value |
| LEGEND | node-value | node-value |
| LI | node-value | node-value |
| LINK (valid?) | | |
| MAP | node-value | node-value |
| MENU (valid?) | | |
| META (valid?) | | |
| NOFRAMES | node-value | node-value |
| NOSCRIPT | node-value | node-value |
| OBJECT | @data,node-value | node-value |
| OL | node-value | node-value |
| OPTGROUP (valid?) | node-value | node-value |
| OPTION | node-value | node-value |
| P | node-value | node-value |
| PARAM (?) | node-value | node-value |
| PRE | node-value | node-value |
| Q | node-value | node-value |
| S | node-value | node-value |
| SAMP | node-value | node-value |
| SCRIPT | node-value | node-value |
| SELECT (valid?) | node-value | node-value |
| SMALL | node-value | node-value |
| SPAN | node-value | node-value |
| STRIKE | node-value | node-value |
| STRONG | node-value | node-value |
| STYLE (valid?) | node-value | node-value |
| SUB | node-value | node-value |
| SUP | node-value | node-value |
| TABLE (valid?) | node-value | node-value |
| TBODY | node-value | node-value |
| TD | node-value | node-value |
| TEXTAREA | node-value | node-value |
| TFOOT | node-value | node-value |
| TH | node-value | node-value |
| THEAD | node-value | node-value |
| TITLE | node-value | node-value |
| TR | node-value | node-value |
| TT | node-value | node-value |
| U | node-value | node-value |
| UL | node-value | node-value |
| VAR | node-value | node-value |
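As a sketch of how the matrix might be consulted, the Python below stores a handful of its rows in a dictionary keyed by lower-case tag name and picks the protocol or string rule for a given element. The names `MATRIX` and `element_value` are hypothetical, node-value is assumed to be the element's concatenated, whitespace-trimmed text, and the markup is parsed with the standard library's `xml.etree.ElementTree`, so the sketch expects well-formed markup rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# A few rows copied from the matrix: tag -> (protocol rule, string rule).
# Rows marked "(valid?)", "???" or with a trailing "?" are deliberately left out.
MATRIX = {
    "a": ("@href,node-value", "node-value"),
    "abbr": ("@title,node-value", "@title,node-value"),
    "del": ("@cite,node-value", "node-value"),
    "img": ("@src", "@alt"),
    "span": ("node-value", "node-value"),
}


def element_value(el: ET.Element, data_type: str) -> Optional[str]:
    """Return el's value for data_type "protocol" or "string" (None means NULL)."""
    if data_type not in ("protocol", "string"):
        raise ValueError(f"unknown data type: {data_type!r}")
    protocol_rule, string_rule = MATRIX[el.tag.lower()]
    rule = protocol_rule if data_type == "protocol" else string_rule
    # Assumption: node-value is all text inside the element, trimmed.
    node_value = "".join(el.itertext()).strip()
    for source in rule.split(","):
        # "@name" reads an attribute; the only other source used here is node-value.
        value = el.get(source[1:]) if source.startswith("@") else node_value
        if value:  # first non-empty source wins
            return value
    return None


doc = ET.fromstring(
    '<div><a href="http://example.com/">Example</a>'
    '<img src="photo.jpg" alt="A photo"/></div>'
)
link, img = doc  # iterating an Element yields its children
print(element_value(link, "protocol"))  # http://example.com/
print(element_value(link, "string"))    # Example
print(element_value(img, "protocol"))   # photo.jpg
print(element_value(img, "string"))     # A photo
```

Keying on data type as well as element mirrors the two columns of the table; IMG shows why both are needed, since its protocol value (@src) and its string value (@alt) come from different attributes.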