[uf-discuss] Live Clipboard and uF escaping (was Fwd: 0.91 Spec comment: escaped markup is harmful)

Danny Ayers danny.ayers at gmail.com
Thu Apr 6 03:25:59 PDT 2006

Dear uFers,

The post below is from the MS Live Clipboard list,


in relation to:


I'm not sure I understand why Matt suggests XML data might have to be
delivered as both XML and escaped as well, but he gets into
browser/DOM territory, a place presumably well-known around this list
- thoughts appreciated.

---------- Forwarded message ----------
From: Matt Augustine <matta at microsoft.com>
Date: Apr 6, 2006 5:23 AM
Subject: Re: 0.91 Spec comment: escaped markup is harmful
To: LIVE-CLIP at discuss.microsoft.com

Thanks for all the information.  Escaping only non-XML data formats and
leaving XML data formats as part of the Live Clipboard document seems
like a reasonable compromise, but I have a few reservations:

Since the items in the LiveClipboardContent object might contain either
XML or escaped, non-XML data, we would most likely have to add a second
property, named something like xmlData, to hold an XML node in addition
to the existing data property.  Applications would have to know which
property to use based on the contenttype of the format.
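
A minimal sketch of the dispatch described above, for illustration only.
Only contenttype and data come from the thread; xmlData is the property
proposed here, and FormatItem, Payload, isXmlContentType and selectPayload
are hypothetical names. The rule for what counts as an XML content type is
an assumption, not something the spec is known to define.

```typescript
// Hypothetical item shape: `contenttype` and `data` come from the thread;
// `xmlData` is the proposed addition. XmlNode stands in for a DOM node type.
interface FormatItem<XmlNode> {
  contenttype: string;
  data: string;
  xmlData?: XmlNode;
}

type Payload<XmlNode> =
  | { kind: "xml"; node: XmlNode }
  | { kind: "string"; text: string };

// Assumption: text/xml, application/xml and any "+xml" subtype count as XML.
function isXmlContentType(contenttype: string): boolean {
  const base = contenttype.split(";")[0].trim().toLowerCase();
  return base === "text/xml" || base === "application/xml" || /\+xml$/.test(base);
}

// Return the parsed node for XML formats when one is present,
// otherwise fall back to the string held in `data`.
function selectPayload<XmlNode>(item: FormatItem<XmlNode>): Payload<XmlNode> {
  if (isXmlContentType(item.contenttype) && item.xmlData !== undefined) {
    return { kind: "xml", node: item.xmlData };
  }
  return { kind: "string", text: item.data };
}
```

A consumer would branch on the returned kind, which keeps the
"which property do I read" decision in one place.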

In the microformat case, treating the data as a string rather than as
parsed XML makes it easy to display the data by setting it as the value
of the innerHTML property of an element on the page.  In order to avoid
forcing applications that use this technique to first serialize the
xmlData, we would probably want to always provide this serialized data
in the existing item data property.  If there is a way to directly add
the parsed XHTML of the microformat into an HTML element we could avoid
this, but as far as I know this isn't possible.
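
One candidate for adding parsed nodes directly, sketched under the
assumption that the browser implements DOM Level 2 Core, is importNode
followed by appendChild. NodeLike, DocLike and ElementLike are stand-in
interfaces so the sketch is self-contained, and appendParsedXhtml is a
hypothetical helper name. Browser support varies, and the imported
elements generally need to be in the XHTML namespace to behave as HTML.

```typescript
// Minimal structural stand-ins for the DOM types used below.
type NodeLike = object;
interface DocLike {
  importNode(node: NodeLike, deep: boolean): NodeLike;
}
interface ElementLike {
  ownerDocument: DocLike;
  appendChild(child: NodeLike): NodeLike;
}

// Copy a node parsed in another document into the target's document,
// then attach the copy under the target element.
function appendParsedXhtml(target: ElementLike, parsed: NodeLike): NodeLike {
  const imported = target.ownerDocument.importNode(parsed, true); // deep copy
  return target.appendChild(imported);
}
```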

IE6 provides an easy way to get the serialized string value of all the
xml content of a particular node via the "xml" property.  I'm not aware
of a similar property when using the DOMParser in Mozilla or other
browsers.  Is there an easy way to get access to this serialized data in
order to satisfy the previous requirement?
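
For reference, Mozilla-family browsers expose XMLSerializer, whose
serializeToString method returns the markup for a node. A minimal
cross-browser sketch, assuming a fallback to the IE "xml" property when
XMLSerializer is absent; serializeXml and SerializerCtor are hypothetical
names:

```typescript
type SerializerCtor = new () => { serializeToString(node: object): string };

// Serialize an XML node to a string, preferring XMLSerializer and
// falling back to the IE-specific `xml` property on the node.
function serializeXml(node: object): string {
  const Ctor = (globalThis as unknown as { XMLSerializer?: SerializerCtor }).XMLSerializer;
  if (typeof Ctor === "function") {
    return new Ctor().serializeToString(node);
  }
  const ieXml = (node as { xml?: unknown }).xml;
  if (typeof ieXml === "string") {
    return ieXml;
  }
  throw new Error("No XML serialization mechanism available");
}
```

The returned string could then be stored in the existing data property so
that callers relying on innerHTML keep working.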

As for David's comments about the ugly XML parsing / serialization code,
we tried to contain all of that error-prone stuff within our script to
make interfacing with Live Clipboard data as simple as possible for the
developer and to make sure the emitted XML put on the clipboard is
compliant with the spec.  We're certainly open to alternative
lightweight implementations of the control though, as long as they are
compatible with the clipboard XML format as specified and present the
same user experience for copy/paste.


Matt Augustine
Software Design Engineer
CTO Concept Development Team
Microsoft Corporation

-----Original Message-----
From: Live Clipboard discussion and feedback
[mailto:LIVE-CLIP at DISCUSS.MICROSOFT.COM] On Behalf Of M. David Peterson
Sent: Tuesday, April 04, 2006 7:41 PM
Subject: Re: 0.91 Spec comment: escaped markup is harmful

On Tue, 04 Apr 2006 16:24:51 -0600, Danny Ayers <danny.ayers at GMAIL.COM>
wrote:

> On 4/4/06, Seth Russell <russell.seth at gmail.com> wrote:
>> In many applications the programmer is not in control of what kind of
>> markup will be processed through the system.
> But to be able to process it at all, some indication of what it is
> will be needed - and the current spec has the contenttype attribute.
> If it's an XML format, it should be indicated here by the producer.
> This is independent of whether the data appears directly as XML or is
> escaped/encoded.

Exactly.  To be able to convert it from one format to another, an
important piece in all of this is knowing what type of format you are
dealing with in the first place.  To think that the oddball case in
which we have absolutely no clue what's on the clipboard should be the
driving force behind using a politically-"I know no type I do not
like"-correct processor solution is quite simply (and being quite
frank) -- stupid.

Don't mean to offend, just don't want to spend the next 10 years of my
life dealing with code that makes me itch just thinking about it when I
know there are a VAST number of better solutions that can and should
be/have been implemented.

>  I don't think that there is a way to
>> always convert content from the wild it into well formed XML.  If
>> there is, please point me to a easy to deploy PHP library.
> What source format? HTML Tidy can output XHTML, I'm sure there's a
> binding for PHP. I don't know of any pure-PHP options, but I'd be
> surprised if there weren't any.

Yep.  In fact there is both a tidy and, if not mistaken (though need to
check to be certain) a Tag Soup solution for PHP.  In fact, being one
who has been working quite intimately and directly with both Saxon and
Dr. [...] to help bring the Saxon.NET project over to Saxonica, to then
work on various details with the new implementation, I can tell you
with solid assurance that there is a simple and straightforward way of
invoking a Tag Soup transformation of any given file, or folder of
files/file folders, with one extension function built directly into the
Saxon engine.  Given the ability to invoke Java-based solutions and,
with Phalanger, .NET-based solutions from within your PHP code base,
then it can easily be stated that "yeah, there's a solution that will
work with [...]

>  If Live Clipboard is to
>> be useful between domains in the wild, i think the CDATA "black hole"
>> is the only way to go.
> I'm not suggesting that every producer should produce XML. Only that
> *when they do* it's transported as such and not obfuscated.

Yep.  What Danny said :)

> Cheers,
> Danny.
> --
> http://dannyayers.com
