Latest revision as of 19:19, 3 January 2009
Using UTF-8
Many folks using and authoring microformats have found that consistent use of UTF-8 throughout the toolchain helps ensure that microformatted international content (i.e. content with non-ASCII characters) is preserved from publication to indexing to aggregation and addition to desktop applications. (You could say I personally have some incentive to get this all to work properly, or rather, that I end up being a good test case ;) Tantek Çelik
Tips
HTML
- Use valid (X)HTML. Preferably XHTML 1.0 Strict.
- Specify the character-set explicitly as UTF-8, e.g. with
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
e.g. here is a complete valid XHTML 1.0 Strict UTF-8 document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8" />
<title>Valid XHTML 1.0 UTF-8 document</title>
</head>
<body>
</body>
</html>
Sidenote: this (meta http-equiv) is perhaps the *only* meta tag worth using in an (X)HTML document.
- AVOID the ?xml prolog for sending the page as XHTML or XML.
  - It is undesirable because it causes IE6/Windows to go into quirks mode.
  - It is also unnecessary.
  - Thus delete this if you see it at the top of your document:
<?xml version="1.0" encoding="UTF-8"?>
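The HTML tips above can be checked mechanically. The following is a minimal sketch, assuming Python's standard `html.parser` module is acceptable for the job; the names `MetaCharsetFinder` and `check_document` are hypothetical and not part of any microformats tool. It only looks for the meta http-equiv declaration and the ?xml prolog and does not validate the document.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Collects the charset declared by a meta http-equiv content-type tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Self-closing XHTML tags such as <meta ... /> are routed here too.
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("http-equiv", "").lower() != "content-type":
            return
        # The content value looks like "text/html; charset=utf-8".
        for part in attributes.get("content", "").split(";"):
            key, _, value = part.strip().partition("=")
            if key.lower() == "charset":
                self.charset = value.strip().lower()


def check_document(markup):
    """Return a list of problems found; an empty list means both tips are met."""
    problems = []
    if markup.lstrip().startswith("<?xml"):
        problems.append("document starts with an ?xml prolog")
    finder = MetaCharsetFinder()
    finder.feed(markup)
    if finder.charset != "utf-8":
        problems.append("no meta tag declaring charset=utf-8")
    return problems
```

Calling `check_document` on the complete XHTML 1.0 Strict example above should return an empty list, while a page that opens with the ?xml prolog gets at least one entry.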
Web Server
Make sure that you have configured the web server to also send the character set as UTF-8 for HTML documents. E.g. for Apache, you can put this in your .htaccess file:
AddType 'text/html; charset=UTF-8' .html
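To confirm what the server actually sends, you can inspect the Content-Type header value of a response. The sketch below assumes you already have that header value as a string (fetching it is left out), and the function names `declared_charset` and `sends_utf8` are hypothetical. It relies on the standard library's `email.message.Message` to parse the header parameters.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def sends_utf8(content_type):
    """True when the header value explicitly declares UTF-8."""
    return declared_charset(content_type) == "utf-8"
```

With the Apache setting above in place, the header value for an .html file would be expected to look like `text/html; charset=UTF-8`, for which `sends_utf8` returns `True`.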
Middleware
- Make sure that your middleware languages, tools, and frameworks (e.g. PHP, Python, Perl, XSLT, Tidy) are all using UTF-8 aware string handling functions.
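As an illustration of what "UTF-8 aware" means in practice, here is a small Python 3 sketch. It assumes the text arrives as UTF-8 bytes and is decoded once at the boundary; the variable names are made up for this example. The point is that byte counts and character counts differ as soon as a non-ASCII character such as "Ç" appears.

```python
# Bytes as they might arrive from a file or socket ("\u00c7" is the letter Ç).
raw = "Tantek \u00c7elik".encode("utf-8")

# Decode once at the boundary so the rest of the code works on characters.
name = raw.decode("utf-8")

byte_count = len(raw)    # counts UTF-8 bytes
char_count = len(name)   # counts characters

# Slicing the decoded string keeps "Ç" whole; slicing raw bytes could split it.
surname = name[7:]

# Encode again only when writing the text back out.
round_tripped = name.encode("utf-8")
```

Code that measures or truncates `raw` instead of `name` is the kind of byte-oriented handling that tends to mangle international content.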
Database
- Use UTF-8 string fields in your databases.
- Configure your database accordingly
- For Postgres, use:
client_encoding = "UTF-8"
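Whatever database you use, a quick round-trip test shows whether non-ASCII text survives storage. The sketch below is only a stand-in: it uses Python's built-in `sqlite3` module with an in-memory database rather than Postgres, and the table name `people` is hypothetical. The same store-then-read-back check would apply to a Postgres connection configured as above.

```python
import sqlite3

connection = sqlite3.connect(":memory:")

# A fresh SQLite database stores text as UTF-8 by default.
encoding = connection.execute("PRAGMA encoding").fetchone()[0]

# Store a name containing a non-ASCII character, then read it back.
connection.execute("CREATE TABLE people (name TEXT)")
connection.execute("INSERT INTO people (name) VALUES (?)", ("Tantek \u00c7elik",))
stored = connection.execute("SELECT name FROM people").fetchone()[0]
connection.close()
```

If `stored` does not equal the original string, some layer between your code and the database is not handling UTF-8.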