Technical notes

Adding translated content to a website

To maximise accessibility and discoverability, translated content should be published as HTML.

Print-friendly MS Word versions are preferable to PDF and can be provided alongside HTML content. While PDFs are widely used for translated content, the format is often unsuitable because the files may not contain searchable text. As a result, they may not appear in search results and can be very difficult to find in some languages.

If PDF files are still required in addition to MS Word, PDF/UA should be used. PDF accessibility requirements are documented in PDF techniques for WCAG 2.0 and ISO 14289-1:2014. Some community languages have additional requirements. Appendix 2 documents some aspects of the accessibility of HTML and PDF files in relation to community languages.

Key points for displaying translated content:

  • Use characters rather than escaped characters. An escaped character is an alternative way of representing a character, used in some programming languages
  • Indicate the language of each document and any change in language, using the lang attribute on relevant HTML elements
  • Use style sheets for consistent page presentation
  • Use appropriate encoding on forms and servers that support Australian formats for names, addresses, dates and time
  • Keep text separate from graphics. The space taken up by a translation will often differ from the space taken up by the English version
  • Include a clearly visible navigation system to localised content on each page, using the target language (see section on logo indicating translated material)
  • For writing systems that are rendered from right-to-left, such as Arabic, clearly indicate the base text direction (right-to-left) of the document and indicate changes in text direction when the language of the content changes
  • Check and validate work before publishing it.

Content Management Systems

The themes and templates for a website may need to be updated to support community languages appropriately.

Thought should also be given to how the editing interfaces can be optimised to support editing and markup of community language content. The editing interface should be able to handle all the languages being translated.

The following features should also be available:

  • Control of the overall directionality of content in the editing interface
  • Mark-up to control the directionality of block level and inline elements
  • Mark-up to indicate a change in language on block level and inline elements
  • Display of translations in fonts appropriate to the language within the editing interface.

Not all Content Management Systems in use across Victorian Government websites support Unicode. This may present challenges at the editing interface.

Encoding

Character encoding refers to the way a character (such as a letter or number) is represented in binary data by a computer. ASCII and Unicode are the most common systems of character encoding, and Unicode is best for multilingual content as it supports a larger set of characters from different alphabets and scripts.

All translated content should be provided in Unicode. HTML content should use the UTF-8 character encoding.

Specifying page encoding

It is essential to declare the encoding of the documents.

  • The character encoding of a document can be specified in the web server’s HTTP Response Header, or the information can be included in the actual web page
  • If the character encoding is declared in the HTTP Response Header, it should also be declared within the web page
  • The value in the HTTP Response Header must match the value declared in the web page.
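The matching rule in the list above can be checked automatically. The following is a minimal sketch in Python, not a definitive implementation: the function names are hypothetical, and it assumes the Content-Type header value and the page markup are already available as strings. It uses simple regular expressions rather than a full HTML parser, so it is only suitable for a quick check.

```python
import re
from typing import Optional


def charset_from_header(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    match = re.search(r'charset\s*=\s*"?([\w.:-]+)', content_type, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_html(markup: str) -> Optional[str]:
    """Return the charset declared in a meta element (HTML4 or HTML5 form)."""
    match = re.search(
        r'<meta[^>]*charset\s*=\s*["\']?([\w.:-]+)', markup, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def declarations_match(content_type: str, markup: str) -> bool:
    """True only when both declarations exist and name the same encoding."""
    header = charset_from_header(content_type)
    page = charset_from_html(markup)
    return header is not None and header == page
```

The comparison is case-insensitive, so a header value of UTF-8 and a page value of utf-8 are treated as the same encoding.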

Depending on the document type, there are different ways to declare the encoding. The table below indicates the declarations required for UTF-8 encoded HTML4, HTML5 and XML documents.

Document type | Encoding declaration | Notes
HTML4 | <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> | Declared in a meta element within the head element
HTML5 | <meta charset="utf-8"> | Declared in a meta element within the head element
XML | <?xml version="1.0" encoding="utf-8"?> | Declared before the XML root element

What to do when you are not using Unicode

When your CMS is using a legacy encoding, it is possible to convert Unicode content into a format that can be used in a non-Unicode CMS.

It is possible to convert the characters in the HTML content into Numeric Character References (NCRs). An NCR identifies a particular character by its Unicode code point, written as a decimal or hexadecimal number. Browsers will substitute the correct character or letter. For instance, the lowercase Greek letter alpha (U+03B1) can be represented as a decimal character reference, &#945;, or in hexadecimal notation, &#x3b1;.
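As an illustration of the conversion described above, the sketch below uses Python's standard library to replace every non-ASCII character with an NCR. The function names are hypothetical. The sketch assumes the input is already valid HTML, so markup characters such as < and & are left untouched.

```python
import html


def to_decimal_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal NCR such as &#945;."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def to_hex_ncr(text: str) -> str:
    """Replace each non-ASCII character with a hexadecimal NCR such as &#x3b1;."""
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):x};" for ch in text)


def round_trips(text: str) -> bool:
    """Check that decoding the NCRs gives back the original text."""
    return html.unescape(to_decimal_ncr(text)) == text
```

The output contains only ASCII characters, so it can be stored in a CMS that uses a legacy encoding, and a browser will render the original characters.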

Indicating languages

It is essential to indicate the language of a web page in order to enhance accessibility, enable language-specific searching within search engines, and allow browsers to select the appropriate fonts.

There is a distinction between the primary language of a document and the text processing language. The text processing language is the language in which the text of the document is written, processed, displayed or read by a screen reader. The lang and xml:lang attributes are used to indicate the text processing language.

It is necessary to declare the default text processing language for the whole document. Declaring a text processing language in the html element specifies the default language for the whole document. Do not declare the language of a document in the body element.
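This rule lends itself to an automated check before publishing. The sketch below is one possible approach using Python's built-in html.parser module. The class and function names are hypothetical. It reports the lang value found on the html element and whether a lang attribute was placed on the body element.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class LangAudit(HTMLParser):
    """Record where a page declares its text processing language."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None
        self.body_has_lang = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before this call.
        attr_map = dict(attrs)
        if tag == "html":
            self.html_lang = attr_map.get("lang")
        elif tag == "body" and "lang" in attr_map:
            self.body_has_lang = True


def audit_language(markup: str) -> Tuple[Optional[str], bool]:
    """Return (lang on the html element, whether body carries lang)."""
    parser = LangAudit()
    parser.feed(markup)
    parser.close()
    return parser.html_lang, parser.body_has_lang
```

A page passes the check when the first value is a language tag and the second value is False.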

If the document has multiple main languages, it will be necessary to decide whether to declare one of the languages as the text processing language in the html element, or to leave the default text processing language undefined.

For Victorian Government websites, the language of the page is best set to “en” (English) or “en-AU” (Australian English), even when the unique content is not in English.

Document type | Language declaration | Notes
HTML4 and HTML5 | <html lang="am"> | Declare the primary language of the document in the lang attribute of the html element
XML | <html xml:lang="am" xmlns="http://www.w3.org/1999/xhtml"> | Declare the primary document language in the xml:lang attribute of the root element

Indicating change of language

It is necessary to declare any language changes within a document. Use the lang or xml:lang attributes around any changes in language within a document. If there is no appropriate element to add the language declaration to, use the div element for a block change and use a span element for an inline change. For example:

<p>The Chinese title is <span lang="zh-Hant">哮喘病簡介</span></p>

The specification of a text processing language applies not only to the content of the element but also to the content of attributes on the same element. If the attribute values and the element content are in different languages, consider using a nested approach.

Use nested tags as follows:

<li lang="en-AU" title="Emergency Relief and Recovery – help is available">
  <a lang="din" href="/">Akuny wëi kë cï tuöl ku bën-pïïr – kuony aluthïn</a>
</li>

Instead of the following code:

<li>
  <a lang="din" title="Emergency Relief and Recovery – help is available" href="/">Akuny wëi kë cï tuöl ku bën-pïïr – kuony aluthïn</a>
</li>

If there are multiple main languages within the document, the web developer should divide the document into blocks at the highest possible level. The appropriate text processing language should be declared for each of these blocks.

When using Unicode it is important to declare the language of text written in Chinese and Japanese. These languages share Unicode characters, but the glyphs may differ between traditional Chinese, simplified Chinese and Japanese.

If the languages are declared in the mark-up, web browsers can use appropriate default fonts for each language/writing script.

For most government sites deploying translated content, the overall language of the site templates will be English. Therefore, it is good practice to wrap the translated content in a div element, or other block level element, with the appropriate lang attribute:

<div id="translationContent" lang="hi"></div>

Key resources:

Appendix 1 contains a list of languages used on Victorian Government websites and the preferred language tag for each.

Text direction

Bi-directional text (known as bidi) contains text that runs in both directions, left-to-right and right-to-left. It generally arises when a document mixes scripts that are read right-to-left with scripts that are read left-to-right.

The design of templates or themes needs to accommodate both RTL (right-to-left) and LTR (left-to-right) languages. It is important to handle bidirectional text with care. In HTML documents, it is possible to add the dir attribute to an HTML element to indicate the directionality of text within that element.

For a web page written in a right-to-left script, the overall document direction should be indicated in the html element. For example:

<html lang="ar" dir="rtl">

Do not add dir="rtl" to the body element. The default direction of a web page is LTR.

For web pages written in languages using LTR scripts, it is not necessary to indicate the primary direction of a web page.

Government website templates will be in English, so a more practical approach is to wrap the translated content in an appropriate block level element and apply lang and dir attributes to that block level element.

<div id="translationContent" lang="prs" dir="rtl"></div>
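Where templates generate this wrapper programmatically, the logic might look like the Python sketch below. It is an illustration only. The function name and the RTL_LANGUAGES set are assumptions, and the set is not an exhaustive list of right-to-left languages. The sketch decides direction from the primary language subtag alone, so a tag whose script subtag changes the direction would need separate handling.

```python
from html import escape

# Illustrative only: primary language subtags usually written right-to-left.
RTL_LANGUAGES = {"ar", "fa", "he", "prs", "ps", "ur"}


def wrap_translation(inner_html: str, lang: str,
                     element_id: str = "translationContent") -> str:
    """Wrap already-safe translated markup in a div with lang and dir."""
    primary = lang.split("-")[0].lower()
    # LTR is the default direction, so dir is only added for RTL languages.
    dir_attr = ' dir="rtl"' if primary in RTL_LANGUAGES else ""
    return (
        f'<div id="{escape(element_id)}" lang="{escape(lang)}"{dir_attr}>'
        f"{inner_html}</div>"
    )
```

Calling wrap_translation("", "prs") reproduces the wrapper shown above, while a left-to-right language such as "hi" receives only the lang attribute.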

Key resources

The authoring techniques for handling bi-directional text recommend that web developers:

  • Do not use Cascading Style Sheets (CSS) to control directionality. Mark-up should be used instead;
  • Only add bi-directional mark-up to a document when it is needed. The Unicode bi-directional algorithm should be sufficient in most cases; and
  • Add the dir attribute to a block level element to change its direction. The content of all nested block elements will inherit that directionality.

It is important to take care with bidirectional nesting. It is common in translations to leave some text in English or include the common English equivalent when the term is translated into the target language. Examples include government department names. Care should be taken to ensure that nested English content within a language written in a Right-to-Left (RTL) script renders correctly.

  • Double check all punctuation is located correctly, especially mirrored punctuation like brackets and parentheses;
  • Phone numbers should be treated explicitly as Left-to-Right (LTR) text; and
  • Background images, and images for list markers should be checked to ensure appropriate placement and orientation within RTL text.
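One way to treat phone numbers explicitly as LTR text is to wrap each number in a span element with dir="ltr". The Python sketch below shows the idea under stated assumptions: the function name and the PHONE_PATTERN regular expression are hypothetical, the pattern is deliberately loose and will need tuning for real content, and the input is assumed to be plain text rather than markup. The phone number in the usage note is a placeholder, not a real service number.

```python
import re

# Hypothetical pattern: optional "+", optional "(", then at least eight
# characters made up of digits, spaces, brackets and hyphens, ending in a digit.
PHONE_PATTERN = re.compile(r"\+?\(?\d[\d\s()\-]{6,}\d")


def mark_phone_numbers_ltr(text: str) -> str:
    """Wrap each phone-number-like run in a span with dir="ltr"."""
    return PHONE_PATTERN.sub(
        lambda match: f'<span dir="ltr">{match.group(0)}</span>', text
    )
```

For example, mark_phone_numbers_ltr("Call 1300 000 000 now") wraps only the number, leaving the surrounding text unchanged. The result should still be checked visually within RTL content, as the points above recommend.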
