Building a complex website can be a daunting task, but when the site requires multiple languages, the project can quickly swell into a world of impossibilities. Let's break the big pieces down into little bites.
In this article we will look at:
- Multi-Language Webpage Content
- HTML Tag Language Attributes for Search Engines, Screen Readers and Braille Devices.
- Page Encodings and Fonts
- Content Storage Approaches
- In-line Token Replacement
- Full Site
- Alternative Records for Each Entry
Multi-Language Webpage Content
HTML Tag Language Attributes for Search Engines, Screen Readers and Braille Devices.
When a service or device consumes content on a page, it looks for specific clues in the HTML tags that describe the page. This includes language. The International Organization for Standardization (ISO) has defined two-character codes for each language (ISO 639-1). The web uses these codes to identify the language of the content on a page.
Each well-formed webpage has a top-level <html> tag. This tag should identify the default human language of the page, for example <html lang="en">. Even if a single page uses multiple languages, a default language should still be set.
But what happens if the page contains content in a language other than the default?
We can have a page in English with some of the content in another language, and still identify that to the content reader. The same lang attribute can be applied to any markup tag.
<a lang="es" title="Spanish" href="qa-html-language-declarations.es" xml:lang="es">Español</a>
If the destination document is in a different language than the current page's default, you can also specify this with the hreflang attribute in the anchor tag.
<a href="file.html" hreflang="fr" lang="fr" xml:lang="fr">Français</a>
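When links like these are generated on the server rather than written by hand, a small helper can keep the attributes consistent. The following is a minimal Python sketch, not a prescribed method: the function name language_link and its parameters are assumptions made for illustration, and it assumes the link label is written in the same language as the destination page.

```python
from html import escape


def language_link(href: str, label: str, target_lang: str) -> str:
    """Build an anchor whose label and destination are both in target_lang."""
    lang = escape(target_lang, quote=True)
    return (
        f'<a href="{escape(href, quote=True)}" '
        f'hreflang="{lang}" lang="{lang}">{escape(label)}</a>'
    )


print(language_link("file.html", "Français", "fr"))
# <a href="file.html" hreflang="fr" lang="fr">Français</a>
```

Escaping every value keeps a stray quote or angle bracket in translated text from breaking the markup.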
But what happens when there isn't a tag to hang your attribute on? Adding an inline span is the way to go:
<p>You'd say that in Chinese as <span lang="zh">中国科学院文献情报中心</span>.</p>
CSS Switches for Multiple Languages in a Site
The CSS 2 standard defines a :lang() pseudo-class for selecting content by its declared language. This gives you the flexibility to change the font, or any other style property, for a given language; for example, the selector :lang(it) matches any element whose language is Italian. It is not supported in older versions of Safari or in Internet Explorer 7 and below.
<p lang="it">Ciao a tutti!</p>
Page Encodings and Fonts
When your browser loads a webpage, it does its best to present the content consistently with the target language of the page. This includes the character set defined and how the letters are encoded, which fonts to use and how they must be displayed.
Your browser surmises this from several sources:
- The content header delivered by the web server. Apache will deliver a character encoding header that looks similar to this:
Content-Type: text/html; charset=ISO-8859-4
- The character encoding declaration in the HTML document itself. If the web server configuration or server-side scripting language cannot add the content type header, the encoding can be declared in the document itself:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-4">
Or in HTML5 the same declaration can be made like this:
<meta charset="ISO-8859-4">
- Content sniffing - the browser reads through clues in the content itself and determines which character encoding should be employed.
If the browser cannot determine the encoding, it will make its best guess. That guess often produces unexpected and undesirable results.
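The lookup order above can be approximated in a few lines of Python. This is only a rough sketch under stated assumptions, not how any particular browser is implemented: the function name declared_charset is hypothetical, the regular expressions are a simplification of real header and markup parsing, and content sniffing is left out entirely (the function returns None where a browser would start guessing). The header is checked first because an encoding sent in the HTTP header takes precedence over one declared inside the document.

```python
import re
from typing import Optional

# Matches charset=... in a Content-Type header value.
_HEADER_CHARSET = re.compile(r'charset\s*=\s*"?([\w.:-]+)"?', re.IGNORECASE)
# Matches <meta charset="..."> as well as the older http-equiv form.
_META_CHARSET = re.compile(rb'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', re.IGNORECASE)


def declared_charset(content_type: Optional[str], body: bytes) -> Optional[str]:
    """Return the declared encoding, or None if the caller would have to guess."""
    if content_type:
        match = _HEADER_CHARSET.search(content_type)
        if match:
            return match.group(1).lower()
    # Only scan the start of the document, where the declaration belongs.
    match = _META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return None


print(declared_charset("text/html; charset=ISO-8859-4", b""))  # iso-8859-4
print(declared_charset("text/html", b'<meta charset="utf-8">'))  # utf-8
print(declared_charset(None, b"<p>hello</p>"))  # None
```

A None result is the situation to avoid: it means neither the server nor the page declared an encoding.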
Content Storage Approaches
Multiple languages present a challenge to content management. Depending on the type of content, the type of website, the size of the data set, and the staff resources, there are different approaches to take in storing and presenting your site content. Most multiple-language sites use a combination of the following techniques:
In-line Token Replacement
One of the simplest approaches to storing language for use in a multiple-language site is dictionary token replacement. For example, the navigation may include a "home" link that is replaced with "inicio" in Spanish. This token replacement method is usually the easiest to employ, but is generally not flexible or scalable enough to handle content-rich websites. These tokens are very useful for simple words or phrases that are common throughout the site, such as navigation, footer info, filters, or other user experience link text. Generally, this type of information is stored in a text file (such as a language override file in ExpressionEngine or Magento), in a simple table with one row per token and one column per language, or, if the number of languages is large, as an entity-attribute-value (EAV) model data set.
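As an illustration of the idea, here is a minimal Python sketch of dictionary token replacement. Everything in it is an assumption made for the example: the TOKENS dictionary stands in for whatever file or table holds the translations, the {{token}} placeholder syntax is invented, and the fallback to a default language is one possible policy among several.

```python
import re

# Hypothetical dictionary: one entry per token, one translation per language.
TOKENS = {
    "home": {"en": "Home", "es": "Inicio"},
    "contact": {"en": "Contact", "es": "Contacto"},
}
DEFAULT_LANG = "en"
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, lang: str) -> str:
    """Replace each {{token}} with its translation, falling back to the default language."""
    def lookup(match: re.Match) -> str:
        translations = TOKENS.get(match.group(1))
        if translations is None:
            return match.group(0)  # unknown token: leave the placeholder visible
        return translations.get(lang, translations[DEFAULT_LANG])

    return _PLACEHOLDER.sub(lookup, template)


print(render('<a href="/">{{home}}</a>', "es"))  # <a href="/">Inicio</a>
```

Leaving unknown tokens visible in the output makes missing dictionary entries easy to spot during testing.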
Full Site
Many times, the target language will require very different navigation, images and text content. An approach to storing the data in this situation may be to re-create the entire data structure of the site (reusing any templates, of course) and load the content based on a different domain name, a subdomain, or a segment in the directory path of the URL (for example, Trenitalia in English: http://www.trenitalia.com/tcom-en and Trenitalia in Italian: http://www.trenitalia.com/tcom). This gives the most freedom in the target language site's construction, SEO, navigation and content, but requires the most content management. This approach works very well if there are separate content management teams for each language.
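Choosing which language's site to serve from the URL could be sketched as follows in Python. The sketch rests on assumptions that are not taken from any real site: a hypothetical example.com domain, a made-up SUPPORTED language set, and a simple convention in which the language code is either the first subdomain label or the first path segment. A site with its own naming scheme would need its own matching rules.

```python
from urllib.parse import urlsplit

SUPPORTED = {"en", "it", "es"}  # hypothetical list of site languages
DEFAULT_LANG = "en"


def language_from_url(url: str) -> str:
    """Pick a language from the subdomain first, then the first path segment."""
    parts = urlsplit(url)
    host_labels = (parts.hostname or "").split(".")
    if host_labels[0] in SUPPORTED:
        return host_labels[0]
    segments = [s for s in parts.path.split("/") if s]
    if segments and segments[0] in SUPPORTED:
        return segments[0]
    return DEFAULT_LANG


print(language_from_url("https://it.example.com/orari"))  # it
print(language_from_url("https://www.example.com/es/contacto"))  # es
print(language_from_url("https://www.example.com/about"))  # en
```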
Alternative Records for Each Entry
For a small or medium-sized brochure website with a single marketing department managing one set of content presented in multiple languages, the content exists once, with alternative fields or records available for the target languages. This keeps the overall management of the content centralized, with the language localization managed per record. When a new article, blog post or static page is added, it is inserted in the site's default language as the parent record. Translations in each available language are then inserted and associated with the original record. When the front end renders the page, the parent record URL is loaded, and if the session (or a URL parameter) dictates a language switch, the alternate translation is displayed in its stead.
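One way to picture the parent record and its translations is the following Python sketch. It is an in-memory illustration only: the Entry class, its fields, and the sample content are all hypothetical, and a real site would keep these records in its content management system or database. It also assumes that a missing translation falls back to the parent record.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Entry:
    """A parent record plus its translations, keyed by language code."""
    slug: str
    lang: str
    title: str
    body: str
    translations: Dict[str, "Entry"] = field(default_factory=dict)

    def localized(self, requested_lang: str) -> "Entry":
        """Return the translation for requested_lang, or the parent record if none exists."""
        if requested_lang == self.lang:
            return self
        return self.translations.get(requested_lang, self)


post = Entry("welcome", "en", "Welcome", "Thanks for visiting.")
post.translations["es"] = Entry("welcome", "es", "Bienvenido", "Gracias por su visita.")

print(post.localized("es").title)  # Bienvenido
print(post.localized("fr").title)  # Welcome
```

Because every translation hangs off the same parent, the URL stays the same and only the displayed record changes.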
Always help your visitors out with proper headers and encoding meta information to ensure the browser renders the target language exactly as desired. Proper content strategy and planning can save dozens of hours of heartache and make the experience smooth for everyone. To see an example of a site we've built that supports multiple languages for translated content, as well as country-specific pricing, check out Portland English Language Academy.