hash-brainstorming

From Microformats Wiki

Hash brainstorming

The Problem

Checksums (MD5 and SHA-1 hashes) are published alongside files so that users can verify the files haven't been corrupted or tampered with, and so that the files can be uniquely identified. They are very useful, but not used as much as they could be. The current method is manual: the user hashes the downloaded file (with programs that are not installed by default on every operating system) and then compares the result with the published value. An easy, automatic way to use them would be preferable to the present method.
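The manual check described above boils down to hashing the file and comparing the result with the published value. A minimal sketch of automating it, assuming Python's standard hashlib module; the function names file_digest and matches_listed_checksum are hypothetical and not taken from any existing tool:

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, read in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_listed_checksum(path, listed, algorithm="md5"):
    """Compare a file's digest with the hex value published next to the download.

    Published values are sometimes upper-case or padded with whitespace,
    so the listed value is normalized before comparing.
    """
    return file_digest(path, algorithm) == listed.strip().lower()
```

A browser extension or download manager would only need to supply the path of the finished download and the value read from the page.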

Participants

Proposal

A microformat for MD5 and SHA-1 hashes could make them more usable. The MD Hash Tool extension, other browser extensions, or download managers could then be modified to find and check them automatically.

<span class="download">
     <a rel="bookmark" href="...">Download OpenOffice.org
     <span class="checksum md5">e0d123e5f316bef78bfdf5a008837577</span>
     </a>
</span>
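A download manager or extension would first need to pull the checksum out of this markup. The sketch below assumes Python's standard html.parser module; ChecksumParser, extract_checksums and the KNOWN_ALGORITHMS set are hypothetical names chosen for illustration, and the algorithm is read from the extra class name next to "checksum", as in the example above:

```python
from html.parser import HTMLParser

# Hypothetical set of algorithm class names; the proposal above only shows "md5".
KNOWN_ALGORITHMS = {"md5", "sha1"}
VOID_TAGS = {"br", "img", "hr", "input", "wbr"}


class ChecksumParser(HTMLParser):
    """Collect (href, algorithm, digest) tuples from class="checksum <algorithm>" markup."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href_stack = []  # href of each open <a>, innermost last
        self._current = None   # [href, algorithm, text parts] while inside a checksum element
        self._depth = 0        # elements opened inside the current checksum element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            self._href_stack.append(attrs.get("href"))
        if self._current is not None:
            if tag not in VOID_TAGS:
                self._depth += 1
            return
        classes = (attrs.get("class") or "").split()
        if "checksum" in classes:
            algorithms = [c for c in classes if c in KNOWN_ALGORITHMS]
            href = self._href_stack[-1] if self._href_stack else None
            self._current = [href, algorithms[0] if algorithms else None, []]
            self._depth = 0

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # void elements never open a nesting level
        if self._current is not None:
            if self._depth == 0:
                href, algorithm, parts = self._current
                self.results.append((href, algorithm, "".join(parts).strip()))
                self._current = None
            else:
                self._depth -= 1
        if tag == "a" and self._href_stack:
            self._href_stack.pop()

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)


def extract_checksums(html):
    parser = ChecksumParser()
    parser.feed(html)
    parser.close()
    return parser.results
```

For the markup above this yields one tuple with the algorithm "md5" and the digest e0d123e5f316bef78bfdf5a008837577, paired with the link's href, which is everything a tool needs to fetch the file and compare.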

Use with hAtom

RSS and Atom feeds have something called an "enclosure". An enclosure tells you that the file it points to is "attached" to the item (and that you might want to go and download it).

Checksum information (such as an MD5 checksum) could be very useful here, especially in the context of hAtom (the microformat variant of Atom) together with the rel-enclosure microformat.

Combining this semantic HTML for downloads with rel-enclosure might give something like this:

<span class="download">
     <a rel="bookmark enclosure" href="...">Download OpenOffice.org
     <span class="checksum md5">e0d123e5f316bef78bfdf5a008837577</span>
     </a>
</span>

Note that I've added "enclosure" to the "rel" attribute of the <a> element. The same pattern could be used in other microformats and semantic HTML too.
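Because rel holds a space-separated list of link types, a consumer looking for enclosures has to test for the "enclosure" token rather than compare the whole attribute value. A minimal sketch, again assuming Python's html.parser; EnclosureFinder and find_enclosures are hypothetical names:

```python
from html.parser import HTMLParser


class EnclosureFinder(HTMLParser):
    """Collect the href of every <a> whose rel attribute contains the "enclosure" token."""

    def __init__(self):
        super().__init__()
        self.enclosures = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        # rel is a set of space-separated, case-insensitive tokens, so
        # rel="bookmark enclosure" must match even though it is not equal to "enclosure".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "enclosure" in rel_tokens and attrs.get("href"):
            self.enclosures.append(attrs["href"])


def find_enclosures(html):
    finder = EnclosureFinder()
    finder.feed(html)
    finder.close()
    return finder.enclosures
```

Token matching also means existing values such as "bookmark" can stay in place when "enclosure" is added.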

Issues

The proposal above has a few issues.

  • Visibility of metadata: The type of checksum (MD5, SHA1, …) should not be inside an attribute, since it is metadata that should be visible.
  • Checksum type attribute: There would need to be a massive number of class names for all possible checksum types: md2, md4, md5, sha1, sha256, sha384, sha512, tiger, ripemd128, ripemd160, etc.

The "Use with hAtom" sub-section also ties the proposal to hAtom, even though rel-enclosure is not specific to hAtom.

Proposal #2

Based on the two issues mentioned above, I propose a hash format like the one in this example:

<span class="checksum">The <span class="type">MD5</span> checksum of this download is <span class="value">e0d123e5f316bef78bfdf5a008837577</span>.</span>

This introduces a type property for the kind of checksum (MD5, SHA-1, RIPEMD-160, etc.) and a value property for the actual checksum, both expressed as class names.
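A consumer of this format would read the visible type and value text, then map the type label onto an algorithm it can run. A minimal sketch, assuming Python's html.parser and hashlib; the function names and the normalization rule (lower-case, drop hyphens and spaces) are assumptions, not part of the proposal:

```python
import hashlib
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "wbr"}


class TypedChecksumParser(HTMLParser):
    """Collect {"type": ..., "value": ...} dicts from class="checksum" markup."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._stack = []  # class tokens of each open element, innermost last

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self._stack.append(classes)
        if "checksum" in classes:
            self.results.append({"type": "", "value": ""})

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Only elements opened inside the innermost open checksum element count.
        for depth in range(len(self._stack) - 1, -1, -1):
            if "checksum" in self._stack[depth]:
                inner = self._stack[depth + 1:]
                break
        else:
            return  # text outside any checksum element
        for prop in ("type", "value"):
            if any(prop in classes for classes in inner):
                self.results[-1][prop] += data


def parse_typed_checksums(html):
    parser = TypedChecksumParser()
    parser.feed(html)
    parser.close()
    return [{k: v.strip() for k, v in r.items()} for r in parser.results]


def hashlib_name(type_label):
    """Map a visible label such as "SHA-1" or "RIPEMD-160" to a hashlib algorithm name.

    Returns None if this Python build offers no algorithm by that name.
    """
    name = type_label.strip().lower().replace("-", "").replace(" ", "")
    return name if name in hashlib.algorithms_available else None
```

Because the type is free text, labels such as "SHA-1" or "RIPEMD-160" can be normalized instead of needing a dedicated class name per algorithm. Whether RIPEMD-160 is actually usable depends on the OpenSSL build behind hashlib.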

Hashes and UIDs

Hashes and UIDs (unique identifiers) have quite a bit in common. Hashes are usually unique as well, except in rare edge cases (collisions) where two different items have the same MD5 sum. Also see uid and related pages.

I think the difference between hashes and UIDs is too big for the two concepts to be merged. Hashes are, in my experience, not used to reference items. For example, you wouldn't say "give me the MP3 file with MD5 sum d2c550efaecc2bed137f7e40255177fe", but you could say "give me the book with ISBN 0596002815". DenisDefreyne 16:02, 5 Aug 2007 (PDT)