Character Encodings

A user agent is normally told which character encoding a web document uses by the server, through the "charset" parameter of the "Content-Type" HTTP header field. If the server cannot be configured to send this parameter, a "meta" element can declare the encoding inside the document itself. For example: <meta http-equiv="Content-Type" content="text/html; charset=utf-8">. This "meta" element should appear as early as possible in the "head" element. Without either declaration, the user agent has to deduce the encoding by other means.
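The lookup order described above (HTTP header first, then the "meta" element) can be sketched in Python. This is only an illustration, not how any particular browser works: the function names, the regular expression, and the 1024-byte scan limit are assumptions made for the example, and real user agents follow a considerably more detailed algorithm.

```python
import re
from email.message import Message

# Simplified pattern (an assumption for this sketch): finds a charset value
# inside a <meta ...> tag, covering both charset="..." and content="...; charset=...".
META_CHARSET_RE = re.compile(
    rb'<meta[^>]*?charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased name, or None if absent


def charset_from_meta(body, scan_limit=1024):
    """Look for a meta declaration near the start of the raw document bytes."""
    match = META_CHARSET_RE.search(body[:scan_limit])
    return match.group(1).decode("ascii").lower() if match else None


def declared_charset(content_type, body):
    """Prefer the HTTP header; fall back to the meta element; else None."""
    return charset_from_header(content_type) or charset_from_meta(body)


html = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head></html>'
print(declared_charset("text/html; charset=ISO-8859-1", html))  # header wins
print(declared_charset("text/html", html))                      # meta is used
print(declared_charset("text/html", b"<html></html>"))          # nothing declared
```

When the last call returns None, the caller is in the situation the paragraph describes: the encoding has to be guessed from the bytes themselves.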

When deciding which encoding to use, it is usually best to choose a character encoding that is "universal". UTF-8 is the recommended encoding for web documents. Avoid OS-specific or system-dependent character encodings, such as those whose names contain "windows", for example "windows-1252".
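One way to see the difference is to try encoding text in several scripts with both UTF-8 and "windows-1252". The sketch below uses Python's built-in codecs; the sample strings and the helper name can_encode are chosen only for this illustration.

```python
# Sample strings written with escapes so the source file itself stays ASCII.
samples = {
    "French": "caf\u00e9",                                # cafe with e-acute
    "Russian": "\u041f\u0440\u0438\u0432\u0435\u0442",    # "Privet" in Cyrillic
    "Japanese": "\u65e5\u672c\u8a9e",                     # "Nihongo" in kanji
}


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for label, text in samples.items():
    print(label,
          "utf-8:", can_encode(text, "utf-8"),
          "windows-1252:", can_encode(text, "windows-1252"))
```

UTF-8 accepts all three samples, while "windows-1252" can represent only the Western European one.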

The W3C HTML 4.01 Recommendation says that commonly used character encodings on the Web include ISO-8859-1 (also referred to as "Latin-1"; usable for most Western European languages), ISO-8859-5 (which supports Cyrillic), SHIFT_JIS (a Japanese encoding), EUC-JP (another Japanese encoding), and UTF-8 (an encoding of ISO 10646 using a different number of bytes for different characters). Names for character encodings are case-insensitive, so that, for example, "SHIFT_JIS", "Shift_JIS", and "shift_jis" are equivalent.
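Both points in the paragraph above, case-insensitive names and UTF-8's variable byte length, can be checked with a short Python sketch. Note that this uses Python's own codec registry as a stand-in; it is not the lookup a browser performs, and the helper name utf8_length is made up for the example.

```python
import codecs

# The three spellings should resolve to one and the same codec.
names = ["SHIFT_JIS", "Shift_JIS", "shift_jis"]
canonical = {codecs.lookup(name).name for name in names}
print(canonical)  # a set with a single canonical name


def utf8_length(ch):
    """Number of bytes UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# Characters from different ranges take 1, 2, 3, and 4 bytes respectively.
for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", utf8_length(ch))
```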