book-brainstorming: Difference between revisions

 
(37 intermediate revisions by 7 users not shown)
Line 1: Line 1:
= Book Brainstorming =
= Book Brainstorming =
== Contributors ==
* [http://people.opera.com/howcome Håkon Wium Lie]
* [http://tantek.com/log/ Tantek Çelik]
* Bert Bos


== Introduction ==
== Introduction ==


Given analysis and research done on [[book-examples]] and [[book-formats]], this page documents various thoughts and strawman proposals for a book microformat.
Given analysis and research done on [[book-examples]] and [[book-formats]], this page documents various thoughts and strawman proposals for a book microformat. Many of the class names proposed are suitable for other forms of extended prose, such as articles and longer notes.


== Table of Contents ==
== Table of Contents ==
Line 18: Line 12:


HTML is a general-purpose markup language used for electronic documents, mostly for onscreen reading.  
HTML is a general-purpose markup language used for electronic documents, mostly for onscreen reading.  
Some content, however, is more suitable for other kinds of presentation and being able to reuse the same content for different media types has been a design goal or HTML and CSS.
Some content, however, is more suitable for other kinds of presentation and being able to reuse the same content for different media types has been a design goal of HTML and CSS.
 
It has been shown possible to use HTML as a [http://www.alistapart.com/articles/boom format for book publishing]. In the authoring process, it was helpful to use a set of class name on HTML element to further classify content. The classes, along with their associated structural elements, mostly served as hooks for the associated style sheet. In particular, the class names helped separate the content into different sections of a book.


It has been shown possible to use HTML as a [http://www.alistapart.com/boom format for book publishing]. In the authoring process, it was helpful to use a set of class name on HTML element to further classify content. The classes, along with their associated structural elements, mostly served as hooks for the associated style sheet. In particular, the class names helped separate the content into different sections of a book.
The main motivation for developing a microformat for books is to encourage reuse of content for different media types. By offering people a sample HTML file and an associated style sheet, HTML can become a compelling format to use for book production. As such, the class names described in a book microformat are primarily hooks for style sheets to use, and secondarily machine-readable semantics.
 
The scope of the term "book" has no clear limits when applied to texts made public on the Internet. Many of the class names here can therefore be used for any presentation of extended prose, including prose that will never appear in print. In particular, articles can have tables of contents, lists of figures, appendices, glossaries, references, and bibliographies, among other shared parts. The continued shift in publishing to online forms will likely blur the distinctions reinforced by physical manifestations, so this microformat encourages reuse in all suitable situations.


== Parts of a book ==
== Parts of a book ==


The user interface of books is fairly standarized. There is typically a front cover that includes the title of the book and the name of the author(s). Inside the cover, one will find a table of contents, chapters, and index and so forth. The table below lists commonly used sections.
The user interface of books is fairly standardized. There is typically a front cover that includes the title of the book and the name of the author(s). Inside the cover, one will find a table of contents, chapters, an index, and so forth. The table below lists commonly used section types.


<table>
<table border>
<tr><th>Section&nbsp;name<th>Description
<tr><th>Section&nbsp;type<th>Description
<tr><td>frontcover  <td>The front cover
<tr><td>frontcover  <td>The front cover
<tr><td>halftitlepage<td>The halftitle page is simple with only the title of the book, and perhaps the name of the authors
<tr><td>halftitlepage<td>The halftitle page is simple with only the title of the book, and perhaps the name of the authors
Line 33: Line 31:
<tr><td>imprint      <td>The imprint page typically starts with a copyright statement and also contains information about where the book is printed, its ISBN number etc.  
<tr><td>imprint      <td>The imprint page typically starts with a copyright statement and also contains information about where the book is printed, its ISBN number etc.  
<tr><td>dedication  <td>The dedication page is where you find "for mom"
<tr><td>dedication  <td>The dedication page is where you find "for mom"
<tr><td>inspiration  <td>Many books contain inspirational quotes by other authors
<tr><td>inspiration  <td>Many books contain inspirational quotes by other authors.
<tr><td>foreword    <td>Many books contain a foreword written by someone other than the authors
<tr><td>foreword    <td>Many books contain a foreword written by someone other than the authors
<tr><td>preface     <td>The preface is written by the authors and often contains an acknowledgement of other contributors
<tr><td>preface     <td>The preface is written by the authors and often contains an acknowledgement of other contributors
<tr><td>toc          <td>Table of Contents
<tr><td>toc          <td>Table of Contents [[User:Sfsheath|Sfsheath]] 20:33, 13 September 2010 (UTC) why is this abbreviated. 'tableofcontents' is more clear. [[User:HOWCOME|HOWCOME]] 08:57, 21 February 2018 (UTC) 'TOC' in an established, if cryptic, term in the publishing world
<tr><td>lot          <td>List of Tables
<tr><td>lot          <td>List of Tables [[User:Sfsheath|Sfsheath]] 20:33, 13 September 2010 (UTC) 'listoftables' is more clear.
<tr><td>lof          <td>List of Figures
<tr><td>lof          <td>List of Figures [[User:Sfsheath|Sfsheath]] 20:33, 13 September 2010 (UTC) 'listoffigures' is more clear.
<tr><td>chapter      <td>The content itself content is typically organized in numbered chapters.
<tr><td>introduction <td>An introductory chapter
<tr><td>uchapter    <td>Many books contain unnumbered chapters, e.g., an introduction.
<tr><td>chapter      <td>The content itself is typically organized in numbered chapters
<tr><td>part        <td>Some books organize sets of chapters into parts
<tr><td>part        <td>Some books organize sets of chapters into parts
<tr><td>catalog    <td>The section of a book listing discrete, similarly structured descriptive entries.
<tr><td>afterword    <td>An additional, often unnumbered chapter at the end of the book
<tr><td>afterword    <td>An additional, often unnumbered chapter at the end of the book
<tr><td>bibliography <td>The bibliography lists other books and sources for further reading
<tr><td>references  <td>References from the text of the book are often listed in a separate section
<tr><td>references  <td>References from the text of the book are often listed in a separate section
<tr><td>appendix    <td>Additional information can be organized into appendices
<tr><td>appendix    <td>Additional information can be organized into appendices
<tr><td>biblio      <td>The bibliography lists other books and sources for further reading
<tr><td>glossary    <td>The glossary defines terms used in the book
<tr><td>glossary    <td>The glossary defines terms used in the book
<tr><td>index        <td>The index is a list of keyword with page references
<tr><td>index        <td>The index is a list of keyword with page references
Line 53: Line 52:
</table>
</table>


In [[boom]], the section names are used as class names on the <code><nowiki><div></nowiki></code> element:
In boom, the section names are used as class names on the <code><nowiki><div></nowiki></code> element:
 
:<code><nowiki><div class="halftitlepage"><h1>Title</h1></div></nowiki></code>
 
Not all books have all sections. A typical novel will have instances of around 10 sections. (My copy of Robert M. Pirsig's "Zen and the Art of Motorcycle Maintenance" uses these sections: frontcover, inspiration, praise, promotion, titlepage, imprint, preface, inspiration, part, chapter, afterword.) Non-fiction books often use more sections. (My randomly chosen title from O'Reilly uses 16 sections: frontcover, halftitlepage, titlepage, imprint, toc, lof, foreword, preface, part, chapter, appendix, index, bio, colophon, promotion, backcover.)
 
=== Are there too many section types? ===
 
It may be argued that the list of possible section types is too long for a "microformat". While one should always strive for simplicity, a few things should be kept in mind:
 
* the section names only affect one attribute on one element (namely, the class attribute on the div element)
 
* publishing is an established industry and paper-based books are not likely to change. As such, the format describes something that already exists.
 
Nonetheless, some of the proposed sections could be combined. For example, the foreword and the preface are often formatted in the same manner and there is no need to distinguish between the two in the style sheet. Another similar example is the list of tables and the list of figures. And having a colophon isn't that common, is it? However, all the proposed section types are in common use and the cost of listing one more type is small compared to the extra cost of differentiating between sections through other means than standardized class names.
 
 
=== Are there enough sections? ===
 
The list of possible section types is seemingly endless. For example, one could have a separate "acknowledgements" section instead of using the "preface" section for this. Also, one could have different types of sections for different types of promotional material. The postcard, which is often included in books, is formatted very differently from the list of other books in the same series. Thus, having several promotional elements would make sense.
 
However, in the interest of simplicity it is important to keep the number of section types at a manageable level.
 
In the end, determining the list of section types for a microformat is a judgement call.
 
== Figures ==
 
Figures are often used in books. From a typesetting perspective, figures are troublesome as they form blobs that cannot be split over several pages. By classifying figures into different categories, typesetting can be made easier. The following baseline markup is proposed:
 
:<code><nowiki><div class="figure">...<p class="caption">...</div></nowiki></code>
 
In addition, figures can be given additional class names:
 
<table border>
<tr><th>Class name<th>Description
 
<tr><td>wide<td>The figure is wide and may need to intrude into margins
<tr><td>flex<td>The figure is anchored at a certain position, but the presentation of the figure may occur in a nearby place. For example, the figure may be floated to the top of the page. Using this class can make typesetting easier and is recommended unless the figure needs to be placed exactly where it appears in the markup.
</table>
 
== Other features of a book ==


::<code><nowiki><div class="halftitlepage"><h1>Title</h1></div></nowiki></code>
Section types provide a vocabulary for classifying the different parts, or pages, of a book. Book authors will also need to classify smaller elements, e.g.:


:Not all books has all sections. A typical novel will have instances of around 10 sections. (My copy of Robert M. Pirsig's "Zen and the art of Motorcycle maintenance" uses these sections: frontcover, inspiration, praise, promotion, titlepage, imprint, preface, inspiration, part, chapter, afterword.) Non-fiction books often use more sections. (My randomly chosen title from O'Reilly uses 16 sections: frontcover, halftitlepage, titlepage, imprint, toc, lof, foreword, preface, part, chapter, appendix, index, bio, colophon, promotion, backcover.)
* sidenotes
* footnotes
* different kinds of tables: small, multi-page ...
* table captions


=== Complexity ===
HTML has defined the semantics of table captions through the "caption" element. Alas, the quality of deployed browsers is variable and this makes it hard to use the "caption" element in practice. The boom microformat proposes class names for this to work around widely deployed bugs.
 
== Comparison with DocBook ==
 
[http://www.docbook.org/ DocBook] is an SGML/XML vocabulary developed for "books and papers about computer hardware and software", but it can also be used for other kinds of books. DocBook is a complex specification; it contains around [http://www.docbook.org/tdg5/en/html/pt02.html 400 different elements]. Some of DocBook's elements are similar to the section types in the table above:
 
<table border>
<tr><th>Section&nbsp;type<th>DocBook element
<tr><td>frontcover  <td>not defined
<tr><td>halftitlepage<td>not defined
<tr><td>titlepage    <td>not defined
<tr><td>imprint      <td>not defined
<tr><td>dedication  <td>dedication
<tr><td>inspiration  <td>not defined
<tr><td>foreword    <td>not defined, "preface" is recommended
<tr><td>preface   <td>preface
<tr><td>toc          <td>toc
<tr><td>lot          <td>lot
<tr><td>lof          <td>not defined, "lot" is recommended
<tr><td>introduction <td>not defined
<tr><td>chapter      <td>chapter
<tr><td>part        <td>part
<tr><td>afterword    <td>not defined
<tr><td>references  <td>reference (note the singular form)
<tr><td>appendix    <td>appendix
<tr><td>bibliography <td>bibliography
<tr><td>glossary    <td>glossary
<tr><td>index        <td>index
<tr><td>colophon    <td>colophon
<tr><td>promotion    <td>not defined
<tr><td>backcover    <td>not defined
</table>


It may be argued that the list of possible section names is too long for a "microformat". While one should always strive for simplicity, a few things should be kept in mind
Although DocBook doesn't have elements for all section types, it is still possible for these sections to appear in the resulting publication. For example, an XSLT processor can add a title page in the printed output based on information in DocBook's "author" element.


- the section names only affect on attribute on one element (namely, the class attribute on the div element)
This underlines a difference between HTML and some other SGML/XML formats: in HTML, content is presented roughly in the same order as it appears in the structure. Other formats, e.g. DocBook, often require a transformation stage where content is moved from abstract elements (e.g., "info") to more concrete elements (e.g., the front and back covers).


- publishing is an established industry and paper-based books are not likely to change. As such, the format describes something that already exists.
HTML does not have the more abstract elements (although "meta" could possibly be used) and subclassing "div" elements in the order of presentation is therefore a pragmatic approach.


== Proposals ==
== Proposals ==
Line 75: Line 148:
* [[book-examples]]
* [[book-examples]]
* [[book-formats]]
* [[book-formats]]
* [http://ocoins.info/ OCoins]</nowiki>

Latest revision as of 08:57, 21 February 2018

Book Brainstorming

Introduction

Given analysis and research done on book-examples and book-formats, this page documents various thoughts and strawman proposals for a book microformat. Many of the class names proposed are suitable for other forms of extended prose, such as articles and longer notes.

Table of Contents

Background

HTML is a general-purpose markup language used for electronic documents, mostly for onscreen reading. Some content, however, is more suitable for other kinds of presentation, and being able to reuse the same content for different media types has been a design goal of HTML and CSS.

It has been shown possible to use HTML as a format for book publishing. In the authoring process, it was helpful to use a set of class names on HTML elements to further classify content. The classes, along with their associated structural elements, mostly served as hooks for the associated style sheet. In particular, the class names helped separate the content into different sections of a book.

The main motivation for developing a microformat for books is to encourage reuse of content for different media types. By offering people a sample HTML file and an associated style sheet, HTML can become a compelling format to use for book production. As such, the class names described in a book microformat are primarily hooks for style sheets to use, and secondarily machine-readable semantics.

The scope of the term "book" has no clear limits when applied to texts made public on the Internet. Many of the class names here can therefore be used for any presentation of extended prose, including prose that will never appear in print. In particular, articles can have tables of contents, lists of figures, appendices, glossaries, references, and bibliographies, among other shared parts. The continued shift in publishing to online forms will likely blur the distinctions reinforced by physical manifestations, so this microformat encourages reuse in all suitable situations.

Parts of a book

The user interface of books is fairly standardized. There is typically a front cover that includes the title of the book and the name of the author(s). Inside the cover, one will find a table of contents, chapters, an index, and so forth. The table below lists commonly used section types.

Section type   Description
frontcover     The front cover
halftitlepage  The halftitle page is simple, with only the title of the book and perhaps the names of the authors
titlepage      The title page contains (at least) the book title, the name of the author and the name of the publisher
imprint        The imprint page typically starts with a copyright statement and also contains information about where the book is printed, its ISBN, etc.
dedication     The dedication page is where you find "for mom"
inspiration    Many books contain inspirational quotes by other authors.
foreword       Many books contain a foreword written by someone other than the authors
preface        The preface is written by the authors and often contains an acknowledgement of other contributors
toc            Table of Contents. Sfsheath 20:33, 13 September 2010 (UTC) why is this abbreviated. 'tableofcontents' is more clear. HOWCOME 08:57, 21 February 2018 (UTC) 'TOC' in an established, if cryptic, term in the publishing world
lot            List of Tables. Sfsheath 20:33, 13 September 2010 (UTC) 'listoftables' is more clear.
lof            List of Figures. Sfsheath 20:33, 13 September 2010 (UTC) 'listoffigures' is more clear.
introduction   An introductory chapter
chapter        The content itself is typically organized in numbered chapters
part           Some books organize sets of chapters into parts
catalog        The section of a book listing discrete, similarly structured descriptive entries.
afterword      An additional, often unnumbered chapter at the end of the book
bibliography   The bibliography lists other books and sources for further reading
references     References from the text of the book are often listed in a separate section
appendix       Additional information can be organized into appendices
glossary       The glossary defines terms used in the book
index          The index is a list of keywords with page references
colophon       The colophon page contains information about the production of the book
promotion      Promotional material from the publisher, e.g., a list of other titles in the same series
backcover      The back cover

In boom, the section names are used as class names on the <div> element:

<div class="halftitlepage"><h1>Title</h1></div>

Not all books have all sections. A typical novel will have instances of around 10 sections. (My copy of Robert M. Pirsig's "Zen and the Art of Motorcycle Maintenance" uses these sections: frontcover, inspiration, praise, promotion, titlepage, imprint, preface, inspiration, part, chapter, afterword.) Non-fiction books often use more sections. (My randomly chosen title from O'Reilly uses 16 sections: frontcover, halftitlepage, titlepage, imprint, toc, lof, foreword, preface, part, chapter, appendix, index, bio, colophon, promotion, backcover.)
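
As a rough illustration, the skeleton of a short book could be marked up as below. This is only a sketch: the class names come from the table above, while the title, author, publisher, and text are placeholders, and the choice of heading and paragraph elements inside each section is an assumption rather than something boom prescribes.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Example Book</title>
</head>
<body>
  <!-- Sections appear in presentation order; each is a div
       carrying one section type as its class name. -->
  <div class="frontcover">
    <h1>Example Book</h1>
    <p>A. N. Author</p>
  </div>
  <div class="titlepage">
    <h1>Example Book</h1>
    <p>A. N. Author</p>
    <p>Example Press</p>
  </div>
  <div class="imprint">
    <p>Copyright statement and printing information go here.</p>
  </div>
  <div class="dedication">
    <p>For mom</p>
  </div>
  <div class="toc">
    <h1>Table of Contents</h1>
    <ul>
      <li>The First Chapter</li>
    </ul>
  </div>
  <div class="chapter">
    <h1>The First Chapter</h1>
    <p>Chapter text goes here.</p>
  </div>
  <div class="afterword">
    <h1>Afterword</h1>
    <p>Afterword text goes here.</p>
  </div>
  <div class="backcover">
    <p>Back cover text goes here.</p>
  </div>
</body>
</html>
```

Sections a given book does not need are simply left out; nothing in the markup requires every section type to be present.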

Are there too many section types?

It may be argued that the list of possible section types is too long for a "microformat". While one should always strive for simplicity, a few things should be kept in mind:

  • the section names only affect one attribute on one element (namely, the class attribute on the div element)
  • publishing is an established industry and paper-based books are not likely to change. As such, the format describes something that already exists.

Nonetheless, some of the proposed sections could be combined. For example, the foreword and the preface are often formatted in the same manner and there is no need to distinguish between the two in the style sheet. Another similar example is the list of tables and the list of figures. And having a colophon isn't that common, is it? However, all the proposed section types are in common use and the cost of listing one more type is small compared to the extra cost of differentiating between sections through other means than standardized class names.
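
Keeping the class names distinct costs little in the style sheet, because selectors can be grouped. The sketch below assumes the sections are div elements as described above; the declarations themselves are arbitrary examples, not taken from any boom style sheet.

```html
<style>
  /* Foreword and preface share one rule; the markup still
     records which is which. */
  div.foreword, div.preface {
    font-style: italic;
    page-break-before: always;
  }

  /* The list of tables and the list of figures are also
     formatted alike. */
  div.lot, div.lof {
    page-break-before: always;
  }
</style>
```

If a later design needs the two to differ, a rule for one class can be added without touching the content.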


Are there enough sections?

The list of possible section types is seemingly endless. For example, one could have a separate "acknowledgements" section instead of using the "preface" section for this. Also, one could have different types of sections for different types of promotional material. The postcard, which is often included in books, is formatted very differently from the list of other books in the same series. Thus, having several promotional elements would make sense.

However, in the interest of simplicity it is important to keep the number of section types at a manageable level.

In the end, determining the list of section types for a microformat is a judgement call.

Figures

Figures are often used in books. From a typesetting perspective, figures are troublesome as they form blobs that cannot be split over several pages. By classifying figures into different categories, typesetting can be made easier. The following baseline markup is proposed:

<div class="figure">...<p class="caption">...</div>

In addition, figures can be given additional class names:

Class name  Description
wide        The figure is wide and may need to intrude into margins
flex        The figure is anchored at a certain position, but the presentation of the figure may occur in a nearby place. For example, the figure may be floated to the top of the page. Using this class can make typesetting easier and is recommended unless the figure needs to be placed exactly where it appears in the markup.
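
A minimal sketch of how the figure classes might be combined and styled follows. The image file name, caption text, and margin values are placeholders, and the style rules are assumptions for illustration. No rule is given for "flex" because moving a figure to a nearby place depends on formatter-specific features that this page does not specify.

```html
<style>
  /* A figure cannot be split, so ask the formatter to keep
     it on one page. */
  div.figure {
    page-break-inside: avoid;
    margin: 1em 0;
  }
  div.figure p.caption {
    font-style: italic;
    text-align: center;
  }

  /* "wide": let the figure intrude into the side margins
     (the 2em value is arbitrary). */
  div.figure.wide {
    margin-left: -2em;
    margin-right: -2em;
  }
</style>

<!-- "flex" marks the figure as movable; it is left unstyled here. -->
<div class="figure wide flex">
  <img src="example-figure.png" alt="Placeholder figure">
  <p class="caption">Placeholder caption for the figure.</p>
</div>
```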

Other features of a book

Section types provide a vocabulary for classifying the different parts, or pages, of a book. Book authors will also need to classify smaller elements, e.g.:

  • sidenotes
  • footnotes
  • different kinds of tables: small, multi-page ...
  • table captions

HTML has defined the semantics of table captions through the "caption" element. Alas, the quality of deployed browsers is variable and this makes it hard to use the "caption" element in practice. The boom microformat proposes class names for this to work around widely deployed bugs.
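
This page does not list the class names boom uses for table captions. The sketch below therefore assumes a wrapper div with a hypothetical class "table" and reuses the "caption" class from the figure markup, so that the caption is an ordinary paragraph rather than a "caption" element.

```html
<!-- Hypothetical class names: "table" for the wrapper,
     "caption" for the caption paragraph. -->
<div class="table">
  <p class="caption">Placeholder caption for the table.</p>
  <table>
    <tr><th>Section type</th><th>Description</th></tr>
    <tr><td>frontcover</td><td>The front cover</td></tr>
    <tr><td>backcover</td><td>The back cover</td></tr>
  </table>
</div>
```

The trade-off is that the caption is no longer tied to the table by an HTML element, only by the shared wrapper and the class name.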

Comparison with DocBook

DocBook is an SGML/XML vocabulary developed for "books and papers about computer hardware and software", but it can also be used for other kinds of books. DocBook is a complex specification; it contains around 400 different elements. Some of DocBook's elements are similar to the section types in the table above:

Section type   DocBook element
frontcover     not defined
halftitlepage  not defined
titlepage      not defined
imprint        not defined
dedication     dedication
inspiration    not defined
foreword       not defined, "preface" is recommended
preface        preface
toc            toc
lot            lot
lof            not defined, "lot" is recommended
introduction   not defined
chapter        chapter
part           part
afterword      not defined
references     reference (note the singular form)
appendix       appendix
bibliography   bibliography
glossary       glossary
index          index
colophon       colophon
promotion      not defined
backcover      not defined

Although DocBook doesn't have elements for all section types, it is still possible for these sections to appear in the resulting publication. For example, an XSLT processor can add a title page in the printed output based on information in DocBook's "author" element.

This underlines a difference between HTML and some other SGML/XML formats: in HTML, content is presented roughly in the same order as it appears in the structure. Other formats, e.g. DocBook, often require a transformation stage where content is moved from abstract elements (e.g., "info") to more concrete elements (e.g., the front and back covers).

HTML does not have the more abstract elements (although "meta" could possibly be used) and subclassing "div" elements in the order of presentation is therefore a pragmatic approach.

Proposals

  • boom - the Book Microformat

See Also