hash-examples: Difference between revisions

From Microformats Wiki
(Adding proposal #2)
== Related Pages ==
* [[hash-brainstorming]]

Revision as of 13:55, 3 August 2007

Hash examples

A microformat for cryptographic hashes, such as MD5 and SHA-1.

The Problem

Checksums (MD5 and SHA-1 hashes) are offered for files to prove they haven't been tampered with and to uniquely identify them. They are very useful, but they are not used as much as they could be. The current method is manual: the user hashes the downloaded file (with programs that are not installed by default on all operating systems) and then compares the result to the listed value. An easy, automatic way to use them would be preferable to present methods.
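
To make the manual process concrete, here is a minimal sketch of the two steps (hash the downloaded file, then compare against the listed value) in Python. The function names file_digest and matches_listed_checksum are hypothetical and only serve this illustration; any tool that automates verification would do roughly the same thing.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of the file at `path`.

    `algorithm` is any name accepted by hashlib.new(), e.g. "md5" or "sha1".
    The file is read in chunks so large downloads do not need to fit in memory.
    """
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_listed_checksum(path, listed, algorithm="md5"):
    """Compare the file's digest with the value listed by the publisher.

    The listed value is normalised (whitespace stripped, lower-cased) because
    checksums are often copied by hand from a web page or email.
    """
    return file_digest(path, algorithm) == listed.strip().lower()
```

A caller would pass the path of the downloaded file and the checksum copied from the download page; a False result means the file differs from what the publisher hashed.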

Participants

Real-World Examples

Currently, MD5 and SHA-1 checksums are either listed on a web page or in an email (see Example #1) or stored in a separate file such as filename.ext.md5 or filename.ext.sha1 (see Example #2). There is no standard or automatic way to use them. Verifying a file once you have the hash is not complex, but it is more than the average user is used to doing (see OpenOffice.org: Using MD5 sums). An MD5 checksum is written as 32 hexadecimal digits, a SHA-1 checksum as 40, and a SHA-256 checksum as 64.
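
The digit counts above suggest a simple heuristic for guessing which algorithm produced an unlabeled checksum. The sketch below is only that, a heuristic: other algorithms share these lengths (MD4 is also 32 hex digits, RIPEMD-160 is also 40), so an explicit type label remains more reliable. The names HEX_LENGTH_TO_ALGORITHM and guess_algorithm are made up for this example.

```python
import re

# Hex-digit lengths taken from the paragraph above.
HEX_LENGTH_TO_ALGORITHM = {32: "md5", 40: "sha1", 64: "sha256"}


def guess_algorithm(checksum):
    """Guess the hash algorithm from the length of a hex checksum.

    Returns "md5", "sha1", "sha256", or None when the text is not purely
    hexadecimal or has an unrecognised length.
    """
    text = checksum.strip()
    if not re.fullmatch(r"[0-9a-fA-F]+", text):
        return None
    return HEX_LENGTH_TO_ALGORITHM.get(len(text))
```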

Who offers MD5/SHA-1 checksums with software

This is only a small sampling.

Example #1: OpenOffice.org MD5 sums

English Application Binaries

e0d123e5f316bef78bfdf5a008837577  OOo_2.0.1_LinuxIntel_install.tar.gz
35d91262b3c3ec8841b54169588c97f7  OOo_2.0.1_LinuxIntel_install_wJRE.tar.gz
cc273fe9d442850fa18c31c88c823e07  OOo_2.0.1_SolarisSparc_install.tar.gz
ff6626c69507a6f511cc398998905670  OOo_2.0.1_SolarisSparc_install_wJRE.tar.gz
ce099d7e208dc921e259b48aadef36c1  OOo_2.0.1_Solarisx86_install.tar.gz
4fb319211b2e85cace04e8936100f024  OOo_2.0.1_Solarisx86_install_wJRE.tar.gz
66bd00e43ff8b932c14140472c4b8cc6  OOo_2.0.1_Win32Intel_install.exe
2d86c4246f3c0eb516628bf324d6b9a3  OOo_2.0.1_Win32Intel_install_wJRE.exe

Example #2: Knoppix MD5 and SHA-1 sums in separate files

KNOPPIX_V4.0.2CD-2005-09-23-EN.iso.md5:

1188f67d48c9f11afb8572977ef74c5e *KNOPPIX_V4.0.2CD-2005-09-23-EN.iso

KNOPPIX_V4.0.2CD-2005-09-23-EN.iso.sha1:

56857cfc709d3996f057252c16ec4656f5292802 *KNOPPIX_V4.0.2CD-2005-09-23-EN.iso

Note: This directory also contains filename.ext.md5.asc and filename.ext.sha1.asc files containing the same checksums and PGP signatures in one file.
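
Both examples use the same line layout: the hex checksum, one space, a one-character marker (a second space in Example #1, an asterisk in Example #2), and then the file name. As far as I know this is the layout written by the common md5sum and sha1sum tools, where the asterisk marks a file read in binary mode. The following sketch parses one such line; parse_checksum_line is a hypothetical helper name, and escaped file names are not handled.

```python
def parse_checksum_line(line):
    """Split one checksum-listing line into (digest, filename, binary_flag).

    Expected layout: <hex digest><space><marker><filename>, where the marker
    is a space (text mode) or "*" (binary mode).
    """
    line = line.rstrip("\r\n")
    digest, separator, rest = line.partition(" ")
    if not separator or len(rest) < 2:
        raise ValueError("not a checksum line: %r" % line)
    marker, filename = rest[0], rest[1:]
    if marker not in (" ", "*"):
        raise ValueError("unexpected mode marker: %r" % marker)
    return digest.lower(), filename, marker == "*"
```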

Existing Practices

As described above, I believe almost all solutions are manual (see OpenOffice.org: Using MD5 sums): an 8-step process on Windows and 3 steps on Linux. Link Fingerprints, which are used by MD Hash Tool, a Firefox extension, are one exception. Here is a Link Fingerprint example:

http://example.org/OOo_2.0.1_LinuxIntel_install.tar.gz#!md5!e0d123e5f316bef78bfdf5a008837577

A Link Fingerprint begins with a traditional URL, then #!md5!, then the MD5 hash.
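
As an illustration, a client could split such a URL into its plain URL, hash type, and hash value as sketched below. This follows only the pattern described here (URL, then #!type!hash); the actual Link Fingerprints draft may define more rules, and split_link_fingerprint is a hypothetical name.

```python
from urllib.parse import urldefrag


def split_link_fingerprint(url):
    """Return (plain_url, hash_type, hash_value) for a Link Fingerprint URL.

    hash_type and hash_value are None when the fragment does not have the
    "!type!value" shape described above.
    """
    plain_url, fragment = urldefrag(url)
    parts = fragment.split("!")
    # "!md5!abc" splits into ["", "md5", "abc"].
    if len(parts) == 3 and parts[0] == "" and parts[1] and parts[2]:
        return plain_url, parts[1].lower(), parts[2].lower()
    return plain_url, None, None
```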

Metalink is an XML file format (.metalink) that contains mirror and checksum information for downloading files. Metalinks are used by download programs/managers and mainly by open source projects. After a download finishes, the checksum is automatically verified.

Brad Fitzpatrick also suggested referring to "files/patches/changesets" by their unique digest.

Some HTTP server applications compute a hash over the response body to serve as an effective ETag. The server must still compute the body but can benefit from reduced network utilization and reduced downstream cache thrashing. Such applications must be willing to risk a hash collision, albeit scoped to a single URL.
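
A rough sketch of the ETag idea, assuming SHA-1 as the hash (the text above does not name one) and a simplified If-None-Match comparison that ignores weak validators. make_etag and is_not_modified are hypothetical names, not part of any particular server framework.

```python
import hashlib


def make_etag(body):
    """Build a quoted ETag value from a hash of the response body (bytes)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def is_not_modified(if_none_match, body):
    """Return True when the client's If-None-Match header matches the body.

    The server still has to produce `body` to hash it; the saving is that a
    match lets it answer 304 Not Modified without sending the body again.
    """
    if if_none_match is None:
        return False
    candidates = [value.strip() for value in if_none_match.split(",")]
    return "*" in candidates or make_etag(body) in candidates
```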

Including a hash in a URL can make a resource highly cacheable: the content behind such a URL never changes, so the TTL can likely be set to an effectively infinite value. Such URLs are often referred to as versioned URLs.
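
One possible way to build such a URL is to put a truncated content hash into the file name, as sketched below. The naming scheme (name.digest.extension), the use of SHA-256, and the 12-character truncation are all assumptions made for this example; versioned_path is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_path(path, content, digest_chars=12):
    """Insert a truncated content hash before the file extension.

    For example, "/static/app.js" becomes "/static/app.<digest>.js". The path
    changes whenever the content (bytes) changes, so old copies can be cached
    indefinitely.
    """
    original = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    return str(original.with_name(original.stem + "." + digest + original.suffix))
```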

Proposal

A microformat for MD5 and SHA-1 hashes could make them more usable. MD Hash Tool, another extension, or download managers could be modified to use them automatically.

<span class="download">
     <a rel="bookmark" href="...">Download OpenOffice.org
     <span class="checksum md5">e0d123e5f316bef78bfdf5a008837577</span>
     </a>
</span>
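
To show how an extension or download manager might consume this markup, here is a minimal sketch using Python's standard html.parser module. It assumes the checksum span sits inside the link, as in the example above, and that the class name next to "checksum" names the hash type. DownloadChecksumParser is a hypothetical class written for this page, not an existing tool.

```python
from html.parser import HTMLParser


class DownloadChecksumParser(HTMLParser):
    """Collect {"href", "algorithm", "value"} for each checksum span."""

    def __init__(self):
        super().__init__()
        self.downloads = []
        self._href = None          # href of the link currently open, if any
        self._in_checksum = False  # True while inside a class="checksum ..." span
        self._algorithm = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a":
            self._href = attributes.get("href")
        elif tag == "span" and "checksum" in classes:
            self._in_checksum = True
            # Treat the first class other than "checksum" as the hash type.
            kinds = [name for name in classes if name != "checksum"]
            self._algorithm = kinds[0] if kinds else None
            self._text = []

    def handle_data(self, data):
        if self._in_checksum:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._in_checksum:
            self.downloads.append({
                "href": self._href,
                "algorithm": self._algorithm,
                "value": "".join(self._text).strip(),
            })
            self._in_checksum = False
        elif tag == "a":
            self._href = None
```

After calling feed() with the page source, the downloads list holds one entry per checksum found, which could then be compared against a digest of the downloaded file.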

Use with hAtom

For example, with RSS and Atom feeds, we have something called an "enclosure". With an "enclosure" you are being told that this file (that the enclosure points to) is "attached" to this item. (And that you might want to go and download it.)

Now, having checksum information (like an MD5 checksum) could be very useful for this, especially in the context of hAtom (the microformat variation of Atom) and the rel-enclosure microformat.

So, if we combined the two -- combined this semantic HTML for "downloading" and rel-enclosure -- then we might get something like this:

<span class="download">
     <a rel="bookmark enclosure" href="...">Download OpenOffice.org
     <span class="checksum md5">e0d123e5f316bef78bfdf5a008837577</span>
     </a>
</span>

Note that I've added "enclosure" to the "rel" attribute of the <a> element. This could be used in other Microformats and semantic HTML too.

Issues

The proposal above has a few issues.

  • Visibility of metadata: The type of checksum (MD5, SHA1, …) should not be inside an attribute, since it is metadata that should be visible.
  • Checksum type attribute: There would need to be a massive number of class names for all possible checksum types: md2, md4, md5, sha1, sha256, sha384, sha512, tiger, ripemd128, ripemd160, etc.

The second sub-section also mentions use with hAtom, even though rel-enclosure is not tied to hAtom.

Proposal #2

Based on the two issues mentioned above, I propose a hash format similar to the one in this example:

<span class="checksum">The <span class="type">MD5</span> checksum of this download is <span class="value">e0d123e5f316bef78bfdf5a008837577</span>.</span>

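This introduces a type property for the kind of checksum (MD5, SHA1, RIPEMD-160, etc.) and a value property for the actual checksum. Both are expressed as class names on elements nested inside the checksum element, so the type stays visible in the page text.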
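
A parser for this second format could look like the sketch below, again using Python's standard html.parser module. It assumes the type and value spans are nested inside the checksum span, as in the example, and it only looks at span elements. ChecksumParser is a hypothetical name for this illustration.

```python
from html.parser import HTMLParser


class ChecksumParser(HTMLParser):
    """Collect {"type", "value"} pairs from class="checksum" spans."""

    def __init__(self):
        super().__init__()
        self.checksums = []
        self._roles = []      # role of each open span: "checksum", "type", "value" or None
        self._current = None  # checksum being assembled, if inside one

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        role = None
        if self._current is None and "checksum" in classes:
            role = "checksum"
            self._current = {"type": "", "value": ""}
        elif self._current is not None:
            for name in ("type", "value"):
                if name in classes:
                    role = name
        self._roles.append(role)

    def handle_data(self, data):
        # Only text directly inside a type or value span is recorded.
        if self._current is not None and self._roles and self._roles[-1] in ("type", "value"):
            self._current[self._roles[-1]] += data

    def handle_endtag(self, tag):
        if tag != "span" or not self._roles:
            return
        if self._roles.pop() == "checksum":
            self.checksums.append({key: text.strip() for key, text in self._current.items()})
            self._current = None
```

Because the hash type is read from the text of the type span rather than from a class name, new hash types need no new class names, which addresses the second issue listed above.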

Related Pages