using-utf-8

From Microformats Wiki

Using UTF-8

Many folks using and authoring microformats have found that consistent use of UTF-8 throughout the toolchain helps ensure that microformatted international content (i.e., content with non-ASCII characters) is preserved from publication to indexing to aggregation and addition to desktop applications. (You could say I personally have some incentive to get this to all work properly, or rather, that I end up being a good test case ;) Tantek Çelik

Tips

HTML

  • Use valid (X)HTML. Preferably XHTML 1.0 Strict.
  • Specify the character-set explicitly as UTF-8, e.g. with
<meta http-equiv="content-type" content="text/html; charset=utf-8" />

For example, here is a complete, valid XHTML 1.0 Strict UTF-8 document:

 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
         "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
     <meta http-equiv="content-type" content="text/html; charset=utf-8" />
     <title>Valid XHTML 1.0 UTF-8 document</title>
 </head>
 <body>
 
 </body>
 </html>

Sidenote: this (meta http-equiv) is perhaps the *only* meta tag worth using in an (X)HTML document.
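As a quick way to confirm that a page actually carries the declaration shown above, here is a minimal Python sketch that reads the charset out of a meta http-equiv tag. The names MetaCharsetFinder and declared_charset are made up for this illustration, and the parsing is deliberately simple; treat it as a sanity check, not a full HTML encoding sniffer.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Records the charset from the first meta http-equiv="content-type" tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # content looks like "text/html; charset=utf-8"
        for part in attributes.get("content", "").split(";")[1:]:
            name, _, value = part.strip().partition("=")
            if name.strip().lower() == "charset":
                self.charset = value.strip("\"' ").lower()


def declared_charset(html_text):
    """Return the charset declared in the document's meta tag, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html_text)
    return finder.charset
```

Calling declared_charset on the example document above should give "utf-8"; a result of None means no such meta tag was found.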

  • AVOID the ?xml prolog for sending the page as XHTML or XML.
    • It is undesirable because it causes IE6/Windows to go into quirks mode.
    • It is also unnecessary.
    • Thus delete this if you see it at the top of your document: <?xml version="1.0" encoding="UTF-8"?>
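If you have many documents to clean up, the deletion described above can be scripted. The following is a rough Python sketch, assuming the document has already been read into a string; the function name strip_xml_prolog is hypothetical, and the pattern only handles a declaration at the very start of the text.

```python
import re

# Matches an XML declaration at the very start of the text, plus any
# whitespace (such as the newline) that follows it.
XML_PROLOG = re.compile(r"\A<\?xml\s[^>]*\?>\s*")


def strip_xml_prolog(document_text):
    """Return the text without a leading <?xml ...?> declaration."""
    return XML_PROLOG.sub("", document_text, count=1)
```

Documents that do not start with the prolog are returned unchanged, so the function is safe to run over a whole site.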

Web Server

Make sure that you have configured the web server to also send the character set as UTF-8 for HTML documents. E.g. for Apache, you can put this in your .htaccess file:

 AddType 'text/html; charset=UTF-8' .html
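To check what the server really sends, you can inspect the Content-Type response header. Here is a small Python sketch using only the standard library; header_charset and is_served_as_utf8 are illustrative names, and the header value is assumed to have been obtained separately (from your browser's developer tools or an HTTP client, for instance).

```python
from email.message import Message


def header_charset(content_type_value):
    """Extract the charset parameter from a Content-Type header value."""
    header = Message()
    header["Content-Type"] = content_type_value
    # Returns the charset lowercased, or None if no charset is given.
    return header.get_content_charset()


def is_served_as_utf8(content_type_value):
    """True if the header value declares UTF-8 as the character set."""
    return header_charset(content_type_value) in ("utf-8", "utf8")
```

With the Apache line above in place, an .html response would be expected to carry "text/html; charset=UTF-8", for which is_served_as_utf8 returns True. A bare "text/html" returns False, which suggests the server configuration did not take effect.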

Middleware

  • Make sure that your middleware languages, tools, and frameworks (e.g. PHP, Python, Perl, XSLT, Tidy) are all using UTF-8 aware string handling functions.
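To illustrate why UTF-8 aware string handling matters, here is a short Python sketch. The helper names are invented for this example. It shows that character counts and byte counts differ for non-ASCII text, and that decoding UTF-8 bytes with the wrong encoding silently corrupts the content.

```python
def read_utf8(raw_bytes):
    """Decode bytes as UTF-8; invalid input raises UnicodeDecodeError."""
    return raw_bytes.decode("utf-8")


def write_utf8(text):
    """Encode text as UTF-8 bytes for storage or transmission."""
    return text.encode("utf-8")


name = "Tantek \u00c7elik"   # \u00c7 is the letter C with cedilla
encoded = write_utf8(name)

# The cedilla letter takes two bytes in UTF-8, so the byte length is one
# more than the character length.
character_count = len(name)
byte_count = len(encoded)

# Decoding the same bytes as Latin-1 does not fail, but yields mojibake.
misread = encoded.decode("latin-1")
```

Here read_utf8(encoded) gives back the original name, while misread does not. That silent corruption is exactly what a single non-UTF-8 step in the toolchain can introduce.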

Database

  • Use UTF-8 string fields in your databases.
  • Configure your database accordingly.
    • For Postgres, use: client_encoding = 'UTF8' (UTF8 is PostgreSQL's name for UTF-8)