Multilingual Web Pages

Character Encoding for Multilingual Web Pages

A character encoding is the scheme that maps characters to the bytes in which a text file is stored. Computers process data as binary digits (0s and 1s), so every character, such as a letter, digit or symbol, must be assigned a binary representation, and the character encoding defines that assignment. Several encodings have been used over the years, including:

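ASCII
One of the earliest widely adopted encodings. It defines 128 characters: the unaccented English letters, digits, punctuation and a set of control codes.
ANSI
The informal name for 8-bit Windows code pages such as Windows-1252. They extend ASCII to as many as 256 characters, which is enough for one group of languages at a time but not for mixing arbitrary languages.
UTF-8
The default encoding in many modern editors. It is an encoding of Unicode, so it can represent the characters and symbols of virtually every written language, using one to four bytes per character. ASCII text is stored unchanged.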
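
The difference between these encodings can be seen by encoding the same characters with each one. The following is a minimal Python sketch, not part of the original page. It assumes that "ANSI" means Windows-1252 (which Python calls cp1252), and the helper name encode_or_none is made up for illustration.

```python
# Compare how three encodings store the same characters.
# Assumption: "ANSI" is taken to mean Windows-1252, named "cp1252" in Python.
samples = ["A", "é", "வ"]
encodings = ["ascii", "cp1252", "utf-8"]


def encode_or_none(char, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent char."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None


for char in samples:
    for encoding in encodings:
        data = encode_or_none(char, encoding)
        if data is None:
            print(f"{char!r} in {encoding}: not representable")
        else:
            # Show each byte as eight binary digits.
            bits = " ".join(f"{byte:08b}" for byte in data)
            print(f"{char!r} in {encoding}: {len(data)} byte(s) -> {bits}")
```

The letter A is stored as the same single byte in all three encodings. The accented é cannot be stored in ASCII, and the Tamil letter வ can be stored only in UTF-8.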

If your document contains more than one language, make sure it is saved with UTF-8 encoding.

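The character encoding is declared with the charset attribute on a meta element. Place it at the top of the head element, because browsers look for the declaration only within the first 1024 bytes of the document.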

<meta charset="utf-8">

The lang Attribute

Declaring the character encoding lets content typed in other languages display correctly, but it does not tell other programs, such as screen readers, search engines and translation tools, which language the page is written in. The language of the page is specified with the lang attribute on the html element.

<html lang="en-US">
</html>

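The lang attribute accepts a language code, optionally followed by a region code when the language has regional variants (for example, en-US for English as used in the United States and en-GB for English as used in the United Kingdom). This sets the main language of the page, but sometimes a single page contains more than one language. In that case, set the lang attribute on the specific element that contains the other language. For example, a Tamil word inside an English page can be marked with the code ta: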

<span lang="ta">வணக்கம்</span>
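
To illustrate how another program can use this information, here is a minimal Python sketch that reads the lang attributes from a page using the standard library's html.parser module. The class name LangCollector and the sample page are made up for illustration. Real tools such as screen readers use their own, more complete logic.

```python
from html.parser import HTMLParser


class LangCollector(HTMLParser):
    """Collect (tag, lang) pairs for every start tag that has a lang attribute."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        for name, value in attrs:
            if name == "lang":
                self.found.append((tag, value))


# Hypothetical sample page: English overall, with one Tamil word.
page = (
    '<html lang="en-US"><body>'
    '<p>Hello <span lang="ta">வணக்கம்</span></p>'
    "</body></html>"
)

collector = LangCollector()
collector.feed(page)
collector.close()
print(collector.found)  # [('html', 'en-US'), ('span', 'ta')]
```

The program finds both the page-level language and the span-level override without having to guess the language from the text itself.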

If the text does not display correctly, the file was probably saved in an encoding other than the one declared. Save the file again and select UTF-8 as the encoding when saving.
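
When an editor offers no encoding option, the same re-save can be done with a short script. The following Python sketch assumes you know the file's current encoding. The cp1252 value, the file name page.html and the function name resave_as_utf8 are illustrative assumptions. Passing the wrong source encoding would corrupt the text, so keep a backup of the original file.

```python
from pathlib import Path
import tempfile


def resave_as_utf8(path, source_encoding):
    """Rewrite the file at path as UTF-8, assuming it is now in source_encoding."""
    text = Path(path).read_text(encoding=source_encoding)
    Path(path).write_text(text, encoding="utf-8")


# Demonstration on a throwaway file saved in a legacy encoding.
with tempfile.TemporaryDirectory() as folder:
    page = Path(folder) / "page.html"
    page.write_text('<p lang="fr">café</p>', encoding="cp1252")

    # The legacy bytes are not valid UTF-8, which is why a browser that
    # expects UTF-8 shows the accented letter incorrectly.
    try:
        page.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        print("file is not valid UTF-8 yet")

    resave_as_utf8(page, "cp1252")
    print(page.read_text(encoding="utf-8"))
```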