[uf-discuss] class="tag"
Duncan Cragg
uf-discuss at cilux.org
Tue Jul 1 13:49:33 PDT 2008
Ciaran McNulty wrote:
> On Sun, Jun 29, 2008 at 3:07 PM, Duncan Cragg <uf-discuss at cilux.org> wrote:
>
>> Those of us who favour opaque URLs (actually for practical reasons such as
>> clean separation of concerns, maintainability, etc.) are unhappy with being
>> forced into a semantic URL schema when using rel-tag.
>>
> Can you go into a bit more detail, or point to a resource explaining
> the benefits of opaque URLs? It's something I've not come across
> before and I'd be intrigued to see the reasons behind it.
>
I'll do both. Here's a resource explaining it - I addressed the subject
in this blog post:
http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/
That is a very transparent URL (see: I'm not obsessive about it!).
The trouble with my URL is that it mixes three concerns:
1. making a connection to my server and kicking off HTTP
2. identifying a resource (with a completely opaque string) within HTTP
3. kicking off some Python code with an argument string
It's concerns 1. and 3. that I'm talking about; URLs are already opaque to HTTP.
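To make that split concrete, here's a rough Python sketch (purely
illustrative) of where each concern lives in that URL:

  from urllib.parse import urlsplit

  parts = urlsplit('http://duncan-cragg.org/blog/post/content-types-and-uris-rest-dialogues/')
  # 1. making a connection: scheme and host say where to open the socket
  parts.scheme, parts.netloc   # ('http', 'duncan-cragg.org')
  # 2. identifying a resource: to HTTP itself the path is just an opaque key
  parts.path                   # '/blog/post/content-types-and-uris-rest-dialogues/'
  # 3. application dispatch: my blog engine re-parses that same path to decide
  #    which Python code to run and which post to render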
As soon as you allow syntax or a schema into URLs - as soon as you start
using anything other than long random numbers - you've got a problem of
namespace allocation and schema standardisation. I refer to "Zooko's
Triangle" on my blog's right rail, which discusses the trade-off between
global uniqueness, security and memorability.
_________________________________________
On 1.: Unless you're running fancy P2P algorithms, it's hard to argue
against putting a big hint in the URL to say where to go to find the
resource. But don't forget that you needn't go to that server - you
could ask an intermediary proxy - which is kind of a simplistic P2P
algorithm...
However, there is a case for arguing that DNS has been a failure: it
isn't any easier to type a URL when you know you have to be so
precise to avoid scam sites. And it isn't any easier to use it to
identify a site when you have to avoid the likes of
www.yahoo.com.baddies.com or www.google.randomtld . You may as well just
use IP addresses; they're as hard to type and as useless to read. Most
programs come with a copy-paste function to save some typing...
Add to this lack of security (and other security holes) the absurd
scramble for domain-name real estate, bad behaviour such as domain
squatting, and so on, and it's looking like a system that only system
admins and crooks benefit from.
Most people (including myself) would type 'acme' into Google instead of
'acme.com' into the URL bar, to give an extra level of intelligence,
familiarity, trust and user interface consistency.
_________________________________________
But really it's 3. that bothers me most: using URLs to pass
human-readable strings to an application 'above' HTTP.
A transparent URL string is always a query string (whether it has a '?'
or not) - in other words, it can be ambiguous and return, not definitely
one result, but zero or many. We probably get zero results when we
'hack' a URL or when the site gets reorganised. We gloss over the
many-results case by returning a single page that we call 'query
results'. But by letting in zero or many resources so easily, we've
loosened the Web by removing the definite 1-to-1 mapping of URL to
resource.
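To put that in database terms (a toy sketch with made-up data, not
anyone's real code): a transparent URL behaves like a query, an opaque
one like a key lookup:

  all_posts = [{'id': 'a1b2', 'tags': ['semweb']},
               {'id': 'c3d4', 'tags': ['rest', 'semweb']}]
  posts_by_id = {p['id']: p for p in all_posts}

  # transparent URL such as /tag/semweb: really a query - zero, one or many matches
  matches = [p for p in all_posts if 'semweb' in p['tags']]

  # opaque URL such as /a1b2: a key lookup - exactly one resource, or a definite 404
  resource = posts_by_id.get('a1b2')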
Hackable URLs should not be part of a self-respecting website's user
interface. We would give a better user experience if we took the URL bar
away and replaced it with a 'jump to first clipboard web link' button,
for those copy-paste situations. Such a button would intelligently parse
the text on the clipboard for URLs and jump to the first location
discovered. A good information architecture and user interaction design
makes hackable URLs irrelevant.
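Such a button needs nothing cleverer than this sort of thing (a toy
sketch; the real one would live in the browser, not in a script):

  import re

  def first_clipboard_link(clipboard_text):
      # find the first thing that looks like an http(s) URL and jump to it
      match = re.search(r'https?://\S+', clipboard_text)
      return match.group(0) if match else None

  first_clipboard_link('latest semweb posts: http://tagbeat.com/3720a-993117b')
  # -> 'http://tagbeat.com/3720a-993117b'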
Another problem is when people start using their knowledge of the URL
structure to generate new URLs - it may be acceptable or encouraged
(even prescribed in an HTML GET form), but each time it happens, we're
creating a unique mini-contract - another non-standard schema. The Web
thrives on URL proliferation, not on schema proliferation!
The need for URLs to be reliable - to always return what they are
expected to return each time they're used - means that whatever URL
schema or namespace you come up with is something you're stuck with -
people or even programs may depend on it. But there's no standards body
or namespace body looking after the bigger picture for you. Your
mistakes may haunt you for a long time.
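A sketch of the difference (the /tag/<name> layout here is invented,
just to show the shape of the contract):

  # the 'mini-contract': clients that build URLs like this are now wedded to the
  # /tag/<name> layout - the site can never reorganise without breaking them
  def tag_url(tag):
      return 'http://example.org/tag/' + tag

  # versus simply following an opaque link the server handed out in its content:
  latest_semweb = 'http://tagbeat.com/3720a-993117b'   # no schema knowledge needed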
Also, query URLs are inherently /not/ reliable - the resource they
return is /expected/ to change, which again makes their (re)-use less
desirable.
Clearly, the W3C's unfortunate 'httpRange-14' issue would never have
occurred with opaque URLs. In other words, opaque, semantics-free HTTP
URIs are /always/ dereferenceable to 'information resources' and /never/
refer to cars! Strings that are part of a car domain model belong inside
/content/, not in links to content - they belong above HTTP. I'm not
fully conversant in the Semantic Web domain, but I suspect there are
issues in there caused by mixing up the globally unique identifier
strings used to build information structures with strings that are
semantically meaningful over those structures and that can dereference
to sets.
So my main objection to transparent URLs is the way they mix up the
mechanism for linking up the Web with a mechanism for querying it. The
Web works fine using HTTP and opaque URLs. We have POST and Content-Type
and OpenSearch schemas to query the Web.
_________________________________________
Practical examples...
You can return opaque links to time-ordered collections listing the
latest documents to be tagged 'semweb':
<a class="tag" href="http://tagbeat.com/3720a-993117b">semweb</a>
Keep your URLs opaque (like GUIDs in databases) and put your application
data and queries in the content (like SQL queries and result sets in
databases). Give your query content resources a first-class schema - see
OpenSearch - and even their own URLs. POST these queries to opaque
collection URLs. Make your result sets transient (returned in the POST
response, thus no-cache by default). Result sets should only be
'grounded' (thus linkable and cacheable) if explicitly asked for in the
query, when you should redirect to a new resource in the POST response.
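For instance, something along these lines (only a sketch - the query
fields and headers are invented, not any particular spec):

  import http.client

  # POST a query document to the opaque collection URL; the result set comes back
  # in the response body and is transient (no-cache), unless the query asked to be
  # 'grounded', in which case the server redirects to a freshly minted opaque URL
  query = 'tag=semweb&ground=false'
  conn = http.client.HTTPConnection('tagbeat.com')
  conn.request('POST', '/3720a-993117b', body=query,
               headers={'Content-Type': 'application/x-www-form-urlencoded'})
  response = conn.getresponse()
  if response.status == 303:
      print('grounded result set at', response.getheader('Location'))
  else:
      print(response.read())   # transient result set, not for linking or caching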
Of course, you can still surround the UUID/GUID part of your opaque URLs
with human-readable string decorations, as long as they're never used to
dereference the resource but serve only as mnemonics or for
search-engine optimisation.
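So a URL like http://tagbeat.com/3720a-993117b/latest-semweb-posts is
fine, as long as the server only ever looks at the opaque part (again, a
hypothetical sketch):

  import re

  def resource_key(path):
      # only the leading opaque token identifies the resource; anything after it
      # is decoration for humans and search engines, and is ignored here
      return re.match(r'/([0-9a-f-]+)', path).group(1)

  resource_key('/3720a-993117b/latest-semweb-posts')   # -> '3720a-993117b'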
_________________________________________
I've gone on at length (again!), but I hope you've had the patience to
take in my point of view. =0)
Cheers!
Duncan Cragg
PS I work at the Financial Times over the river from you - but I was a
URL opacitist /before/ having to wrangle with the FT CMS...!