Appendix 2: Internationalisation and Accessibility

Victorian Government websites must meet WCAG 2.0 (Level AA) requirements, and content added in community languages must meet them too. The obvious accessibility requirements relate to identifying the language of the content and any changes of language within it, but there are a number of other stumbling blocks in providing accessible content in community languages.

Other core internationalisation best practices that affect the readability and comprehension of text, such as correctly selecting and identifying the character encoding, or applying appropriate bidirectional markup and control characters, are assumed but not articulated in WCAG 2.0.

Legacy and pseudo-Unicode encodings (HTML, MS Word and PDF)

WCAG 2.0 makes an important distinction between text and non-text content. Text is a string of characters in a human language that can be programmatically determined.

For accessible community language content, it is necessary to select and correctly identify the character encoding used within a document. For HTML, it must be an encoding supported by web browsers. The HTML5 Encoding specification identifies which encodings a user agent can support.

If the character encodings are unsupported, or misidentified, the content should be treated as non-text content when assessing the accessibility of web resources.

What this means in practical terms is that translated content, regardless of file formats, should be sourced from language service providers as Unicode text. HTML documents must be in the UTF-8 character encoding.
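The two HTML requirements above (the bytes really are UTF-8, and the document says so) can be checked mechanically. The following is a minimal sketch in Python, not a complete validator. The function name check_html_encoding and the shape of its result are assumptions made for illustration. It only looks for a meta charset declaration and ignores HTTP headers and byte order marks.

```python
import re

# Hypothetical helper for illustration; not part of any standard tool.
# Matches <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def check_html_encoding(raw: bytes) -> dict:
    """Report whether raw HTML bytes are valid UTF-8 and declare UTF-8."""
    try:
        raw.decode("utf-8", errors="strict")
        valid_utf8 = True
    except UnicodeDecodeError:
        valid_utf8 = False

    # HTML requires the declaration to sit within the first 1024 bytes.
    match = META_CHARSET.search(raw[:1024])
    declared = match.group(1).decode("ascii").lower() if match else None

    return {
        "valid_utf8": valid_utf8,
        "declared": declared,
        "ok": valid_utf8 and declared == "utf-8",
    }


if __name__ == "__main__":
    page = '<!DOCTYPE html><html lang="my"><head><meta charset="utf-8"></head></html>'
    print(check_html_encoding(page.encode("utf-8")))
```

A result with "ok" set to False does not say what the real encoding is. It only signals that the file should go back to the language service provider or be converted before publication.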

It is common to receive translated content in certain languages in a non-Unicode character encoding.

For instance, Burmese content is often supplied in the Zawgyi pseudo-Unicode encoding, while Sgaw Karen is often supplied in an unsupported 8-bit legacy encoding.

Using non-Unicode content (either legacy or pseudo-Unicode encodings) will often require additional steps to make the content accessible.
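One of those additional steps is recognising pseudo-Unicode text in the first place, because Zawgyi reuses Myanmar-block codepoints and so passes a plain "is it Unicode?" check. The sketch below is a deliberately crude heuristic for Burmese only, and the function name looks_like_zawgyi is an assumption made for illustration. It relies on one ordering difference: standard Unicode stores the vowel sign E (U+1031) after its consonant or medial sign, whereas Zawgyi stores it before the consonant, in visual order.

```python
import re

# U+1031 (vowel sign E) that is NOT preceded by a Burmese consonant
# (U+1000..U+1021), a medial sign (U+103B..U+103E) or great sa (U+103F).
# In standard Unicode order this should not occur in Burmese text.
MISPLACED_VOWEL_E = re.compile("(?<![\u1000-\u1021\u103B-\u103F])\u1031")


def looks_like_zawgyi(text: str) -> bool:
    """Heuristic only: True when vowel sign E appears in visual (Zawgyi) order."""
    return MISPLACED_VOWEL_E.search(text) is not None


if __name__ == "__main__":
    unicode_order = "\u1014\u1031"  # consonant NA, then vowel sign E
    zawgyi_order = "\u1031\u1014"   # vowel sign E typed first
    print(looks_like_zawgyi(unicode_order), looks_like_zawgyi(zawgyi_order))
```

A False result does not prove the text is standard Unicode, since Zawgyi text with no word-initial vowel sign E will pass. The check is also not suitable for other languages that share the Myanmar block, such as Sgaw Karen. Treat it as a prompt for manual review rather than a decision.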

MS Word specific considerations

Care needs to be taken with language identification in Microsoft Word documents as some community languages are not supported by Microsoft Office. It may not be possible to correctly tag all translations, thus impacting on the accessibility of the document.

You can use the document properties dialog to set a metadata value identifying the document’s language.

MS Word will automatically assign the default editing language as the document language. If the document is opened on another computer where the MS Word default editing language setting is different, the document language will be changed when the file is saved.

When English content is included in a translation, it is necessary to change the proofing language appropriately. For translations written in scripts that are read from the right to the left of a page, it is necessary to set the direction, not just for paragraphs, but also for sections, columns, tables and text boxes. It is not sufficient to only use text alignment.

PDF specific considerations

ISO 14289-1:2014 (PDF/UA-1) and the PDF Techniques for WCAG 2.0 set out the requirements and techniques for creating accessible PDF files.

For a PDF file to contain accessible text, the textual content of the PDF must resolve to Unicode. Software that accesses or displays PDF files uses the file’s ToUnicode mappings for each font to resolve glyphs to Unicode codepoints. The ability to correctly resolve text in a PDF to a valid sequence of Unicode characters depends on the font, its internal mapping of glyphs to codepoints, and the nature of the writing system (script) the language is written in. Fonts designed for complex scripts may reorder glyphs and use alternative glyphs in ways that cannot be adequately represented in the ToUnicode mappings.

When the text in the PDF cannot be resolved to a meaningful Unicode sequence the user can understand, first try alternative fonts to see if they provide a better result. Otherwise, it is necessary to treat the content as non-text content and add ActualText attributes to each of the relevant tags.
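A rough first pass at spotting text that has not resolved to usable Unicode can be automated. The sketch below assumes the text has already been extracted from the PDF by some other tool and is supplied as a Python string. The function name unresolved_share is hypothetical. It counts replacement characters (U+FFFD), Private Use Area codepoints and unassigned codepoints, all of which commonly indicate a missing or incomplete ToUnicode mapping.

```python
import unicodedata


def unresolved_share(extracted: str) -> float:
    """Fraction of non-whitespace characters that look unresolved."""
    chars = [c for c in extracted if not c.isspace()]
    if not chars:
        return 0.0
    bad = sum(
        1
        for c in chars
        # "Co" = private use, "Cn" = unassigned in Python's Unicode database.
        if c == "\ufffd" or unicodedata.category(c) in ("Co", "Cn")
    )
    return bad / len(chars)


if __name__ == "__main__":
    print(unresolved_share("\u1014\u1031"))      # ordinary Burmese letters
    print(unresolved_share("\ue000\ue001 ab"))  # half private-use glyphs
```

A share of zero does not mean the text is readable. Glyphs that were reordered or substituted by the font can still map to valid characters in the wrong sequence, so a reader of the language should still check the extracted text.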

Website search tools need to use the content of the ActualText attributes for indexing and searching these PDF files, in order to make the content discoverable.


| I18n | HTML5 | WCAG 2.0 | Recommendation |
| --- | --- | --- | --- |
| Declare character encoding | charset attribute on meta element | Refer to the definition of text vs non-text content | Use Unicode for all text. For HTML documents use the UTF-8 encoding. For PDF files, use ActualText attributes of tags for languages that require it. |
| Declare language of document | lang attribute on root element | 3.1.1 Language of page | Use a valid and correct BCP-47 language tag to identify the primary language of a document. For MS Word documents ensure the default editing language is set correctly. |
| Declare change of language | lang attribute of relevant element | 3.1.2 Language of parts | Use valid and correct BCP-47 language tags to identify change of language within a document. For MS Word documents select the appropriate proofing languages for content. |
| Bidirectional support | dir attribute on relevant HTML elements | - | For HTML5, use markup rather than CSS to handle bidirectional text. Use control characters as required. For other file formats use appropriate techniques available when editing content. |
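
The lang and dir recommendations for HTML can be spot-checked with Python's standard html.parser module. This is a minimal sketch, and the class name LangDirAudit and function name audit are assumptions made for illustration. The language tag pattern is a simplification that accepts only language, optional script and optional region subtags. Full BCP-47 permits more, and being well formed does not make a tag the correct one for the content.

```python
import re
from html.parser import HTMLParser

# Simplified pattern: language[-Script][-REGION], e.g. "my", "zh-Hant", "es-419".
# Real BCP-47 tags may also carry variants, extensions and private-use subtags.
SIMPLE_TAG = re.compile(r"^[A-Za-z]{2,3}(-[A-Za-z]{4})?(-([A-Za-z]{2}|[0-9]{3}))?$")
VALID_DIR = {"ltr", "rtl", "auto"}


class LangDirAudit(HTMLParser):
    """Collects lang/dir problems while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.root_lang = None
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        lang = attributes.get("lang")
        if tag == "html":
            self.root_lang = lang
        # An empty lang is left alone; only non-empty malformed tags are flagged.
        if lang and not SIMPLE_TAG.match(lang):
            self.problems.append(f"<{tag}> has suspicious lang={lang!r}")
        if "dir" in attributes:
            direction = (attributes["dir"] or "").lower()
            if direction not in VALID_DIR:
                self.problems.append(f"<{tag}> has invalid dir={attributes['dir']!r}")


def audit(html_text: str) -> list:
    """Return a list of human-readable problems; empty means none found."""
    parser = LangDirAudit()
    parser.feed(html_text)
    parser.close()
    if not parser.root_lang:
        parser.problems.insert(0, "root <html> element has no lang attribute")
    return parser.problems


if __name__ == "__main__":
    sample = '<html lang="en"><body><p lang="ar" dir="rtl">...</p></body></html>'
    print(audit(sample))
```

The check deliberately reports only on markup. It cannot tell whether a paragraph tagged as one language is actually written in that language, nor whether right-to-left content is missing a dir attribute altogether.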