[uf-new] an equation/MathML/TeX microformat?

Paul Topping pault at dessci.com
Sat Oct 27 11:01:13 PDT 2007


Yes, that's a good idea. In fact, our products already do embed this
kind of data in the equation images they produce. However, to make it a
useful standard, I have to convince the makers of many other products
and websites to do the same. Somehow it seems like an easier task to
convince them to change their HTML but perhaps it amounts to the same
thing. Is it practical for client-side script code to extract MathML or
TeX embedded in an image this way? 
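
For what it's worth, the extraction step itself looks mechanical, at
least outside the browser. Below is a minimal sketch in Python (not
browser script) that walks the chunks of a PNG and collects any tEXt
entries, following the chunk layout in the PNG spec James linked. The
function names and the "TeX" keyword are assumptions made up for
illustration, not an existing convention, and make_demo_png exists only
to produce a tiny test image. A browser script would first have to get
at the raw image bytes, which this sketch does not address.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    found = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            # tEXt payload: Latin-1 keyword, a NUL separator, Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"IEND":
            break
        pos += 8 + length + 4  # skip header, data, and CRC
    return found


def _chunk(ctype: bytes, body: bytes) -> bytes:
    """Serialize one PNG chunk, including its CRC over type + data."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


def make_demo_png(keyword: str, text: str) -> bytes:
    """Build a 1x1 grayscale PNG carrying a single tEXt chunk (demo only)."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte + one gray pixel
    text_body = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    return (PNG_SIGNATURE + _chunk(b"IHDR", ihdr) + _chunk(b"tEXt", text_body)
            + _chunk(b"IDAT", idat) + _chunk(b"IEND", b""))


# "TeX" is a hypothetical keyword chosen for this example.
png = make_demo_png("TeX", "x^2 + y^2 = z^2")
print(read_png_text(png))
```

The sketch does not verify CRCs and ignores the compressed and
international text chunk types, so treat it as an illustration of the
chunk walk rather than a complete reader.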

Paul

> -----Original Message-----
> From: microformats-new-bounces at microformats.org
> [mailto:microformats-new-bounces at microformats.org] On Behalf Of James Howison
> Sent: Saturday, October 27, 2007 10:47 AM
> To: For discussion of new microformats.
> Subject: Re: [uf-new] an equation/MathML/TeX microformat?
> 
> Since you are focusing on image files always being present ("Equation
> images are 100% reliable"), would another option be to encode the
> MathML or TeX into the image file, then use a class name to indicate
> that this image should be parsed to look for the data?
> 
> e.g., stick it into the Description or Comment field (or make your own
> field keyword) in a PNG file:
> 
> http://www.libpng.org/pub/png/spec/1.2/PNG-Chunks.html#C.Anc-text
> 
> That could be added to some of the commonly used MathML->PNG or
> TeX->PNG tools, e.g.:
> 
> http://redsymbol.net/software/l2p/dist/l2p-doc.html
> 
> Then you could use something like:
> 
> <img class="parse-image-for-mathml" alt="natural language description
> of formula" src="image-with-embedded-MathML.png" />
> 
> Of course this would need a tool to read the text out from the image
> file, but those are widely available. Client side applications that
> supported MathML could even read the MathML out of the image and
> replace the PNG with the preferred MathML content.
> 
> Yeah, it wouldn't be in the HTML file, but you'd almost always also
> have to download the image representation, no?
> 
> It would have the advantage of being able to be easily emailed too :)
> 
> --J
> 
> On Oct 27, 2007, at 11:46 AM, Paul Topping wrote:
> 
> > For an explanation of why I'm looking to embed MathML and/or TeX in
> > HTML, I refer you to my original post for the details. The short
> > answer is that MathML is not a universal enough solution. I am aware
> > of most of what you mention below (especially the stuff about my own
> > company's products -- thanks!) and agree with it. Perhaps I should
> > come at this from a slightly different angle.
> >
> > Instead of addressing the successes and failures of MathML, let's
> > look at the many "solutions" to the problem of displaying equations
> > in a web page. There are many websites that represent equations as
> > images. They do this because HTML with equation images is compatible
> > with every browser. MathML is not a solution, as it is not close to
> > being universally supported in browsers. This is a big issue in
> > education, which is usually not in a position to dictate browsers
> > and, perhaps more importantly, doesn't want to embrace any solution
> > that might require the user to download plugins and/or fonts.
> > Equation images are 100% reliable.
> >
> > Pretty much all the equation images are produced by rendering TeX or
> > MathML using a variety of tools. In the case of TeX at least, this
> > textual representation is also potentially useful in the browser, so
> > some math websites embed the TeX in alt text or as a comment. Many
> > don't expose this textual representation at all.
> >
> > If math websites embedded the TeX or MathML used to produce each
> > equation image in a standardized manner, it would enable client-side
> > software to provide math accessibility and interoperability. And, if
> > it could be done in a way that didn't interfere with other HTML
> > features, such as commandeering alt text away from its intended
> > purpose, that would be a win. Finally, it needs to be done in a
> > rigorous way. For example, if a certain version of LaTeX is used for
> > the representation, that should be declared so that the software
> > processing the notation does not have to sniff and guess.
> >
> > Paul Topping
> > Design Science, Inc.
> > www.dessci.com
> >
> >> -----Original Message-----
> >> From: microformats-new-bounces at microformats.org
> >> [mailto:microformats-new-bounces at microformats.org] On Behalf Of Benjamin Hawkes-Lewis
> >> Sent: Saturday, October 27, 2007 6:25 AM
> >> To: For discussion of new microformats.
> >> Subject: RE: [uf-new] an equation/MathML/TeX microformat?
> >>
> >>
> >> On Thu, 2007-10-25 at 15:39 -0700, Paul Topping wrote:
> >>
> >>> I'm aware of the effort to support MathML in HTML5 but this effort
> >>> seems unlikely to bear fruit. Besides, I'm looking to create
> >>> something that will work in "plain old HTML" which, as I understand
> >>> it, is part of the microformat philosophy. As I stated, MathML was
> >>> originally intended to be implemented in browsers but the actuality
> >>> leaves something to be desired. With HTML5, I would simply be
> >>> waiting for yet another thing to be implemented in browsers.
> >>> Exactly what I want to avoid.
> >>
> >> Personally, I believe wanting to use MathML is one of the rare
> >> use-cases that justifies choosing XHTML over HTML 4.01, in which
> >> case you can inline MathML directly into your markup.
> >>
> >> Can you provide some more details about the problem you're trying
> >> to solve, in particular:
> >>
> >> 1. What are the deficiencies of existing techniques for using
> >> mathematical markup in conjunction with HTML that you are
> >> realistically expecting to overcome?
> >>
> >> 2. What are the key existing consuming agents you'd like the
> >> solution to those deficiencies to work with? e.g. What platforms,
> >> browsers, and assistive technology?
> >>
> >> Currently, your company's MathPlayer ActiveX plugin can already
> >> speak maths and has MSAA support for assistive technology:
> >>
> >> http://www.dessci.com/en/products/mathplayer/tech/accessibility.htm
> >>
> >> Gecko has built-in support for a subset of MathML; the FireVox
> >> extension turns Firefox into a self-voicing browser on all three
> >> major platforms and can read MathML using Abraham Nemeth's
> >> Mathspeak rules:
> >>
> >> http://www.firevox.clcworld.net/features.html
> >>
> >> The next version of Opera will include support for a subset of
> >> MathML and restore (some level of) support for assistive
> >> technologies on Windows and Mac:
> >>
> >> http://dev.opera.com/articles/view/can-kestrels-do-math-mathml-support-in/
> >>
> >> http://my.opera.com/desktopteam/blog/2007/08/31/focus-areas-during-kestrel-development
> >>
> >> The only major engine missing support for MathML at all is WebKit:
> >>
> >> http://bugs.webkit.org/show_bug.cgi?id=3251
> >>
> >> This situation could be a /lot/ better.
> >>
> >> Your problem seems very similar to the problem of inlining RDF in
> >> HTML:
> >>
> >> http://infomesh.net/2002/rdfinhtml/
> >>
> >> http://esw.w3.org/topic/EmbeddingRDFinHTML
> >>
> >> The suggested approach of dumping XML markup inside comments was
> >> actually used by Creative Commons and Movable Type's Trackback
> >> system:
> >>
> >> http://www.xml.com/pub/a/2003/01/15/creative.html
> >>
> >> But I can't really see how embedding MathML markup inside HTML
> >> comments is supposed to improve things with existing (or even
> >> future) consuming agents. You did say this:
> >>
> >>> Also, I want to put the MathML or TeX in the page, not in separate
> >>> documents. Typical pages with math in them might have dozens of
> >>> equations. Having their representation in separate files is
> >>> inefficient but perhaps the biggest problem is that it makes
> >>> authoring a lot more tedious as lots of small files have to be
> >>> managed.
> >>
> >> The web stack tends to work by stitching together lots of different
> >> resources to make up the final presentation the user experiences
> >> when she visits a URI: multiple HTML documents, scripts,
> >> stylesheets, images, Flash movies, and so on all get bound together
> >> by a master document. I'm generally sceptical about attempts to
> >> fight against this model in the name of efficiency without changing
> >> the HTML specification itself.
> >>
> >> There's also an issue of efficiency for consuming agents that don't
> >> support MathML and would have to waste bandwidth on MathML embedded
> >> in comments.
> >>
> >> Having said that, additional requests are more expensive than
> >> additional bytes. But even if you have to externalize MathML
> >> outside the main HTML document, you could (I guess) cut down the
> >> additional requests to just one by using fragment identifiers to
> >> pinpoint individual equations in an XML document containing a
> >> series of equations. You could even make the referenced document an
> >> XHTML version of the HTML document, and use a
> >> <link rel="alternate" type="application/xhtml+xml"> element to
> >> point to the XHTML version from the HEAD of the HTML version, and
> >> use the HTML HREF and LONGDESC attributes to point to fragment
> >> identifiers for particular equations. HTTP content negotiation
> >> could perhaps be used to send supporting consuming agents to the
> >> XHTML version in the first place.
> >>
> >> For very short equations, you might be able to do away with the
> >> external document altogether by using data URIs instead of
> >> referencing an actual external document:
> >>
> >> http://www.w3.org/TR/html401/struct/objects.html#adef-data
> >>
> >> http://en.wikipedia.org/wiki/Data:_URI_scheme
> >>
> >> If authoring tools make creating master documents hard, I think
> >> that highlights a flaw in the authoring tools in question, and the
> >> authoring tools are likely to be a little easier to "fix" than
> >> consuming agents. OpenDocuments consist of an archive of multiple
> >> resources, but the user experience of editing an OpenDocument in
> >> OpenOffice.org is of editing a seamless document.
> >>
> >> Using HTML comments to hide stuff is generally a bad idea since
> >> comments should be disposable and sometimes are discarded. There
> >> are cases where that doesn't matter. For example, conditional
> >> comments used to serve different styles to IE are okay because, if
> >> content has been kept separate from presentation, it should be
> >> possible to use the browser's default CSS to achieve a usable
> >> presentation of the document. Now there could be cases where using
> >> comments is worth the risk even for data that is important to
> >> understanding a document, but I don't really see sufficient
> >> advantages in this particular case to offset the costs.
> >>
> >> --
> >> Benjamin Hawkes-Lewis
> >>
> >> _______________________________________________
> >> microformats-new mailing list
> >> microformats-new at microformats.org
> >> http://microformats.org/mailman/listinfo/microformats-new


