Victorian Government websites must meet WCAG 2.0 (Level AA) requirements, and content published in community languages must meet the same accessibility requirements. The most obvious requirements relate to identifying the language of a page and any changes of language within it, but there are a number of stumbling blocks to providing accessible content in community languages.
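Identifying the page language and changes of language corresponds to WCAG 2.0 success criteria 3.1.1 (Language of Page) and 3.1.2 (Language of Parts), expressed in HTML via lang attributes. A minimal sketch (the page fragment and its languages are illustrative, not from any real site) that collects the declared languages from markup:

```python
from html.parser import HTMLParser

# Hypothetical fragment: the page-level language is declared on <html>,
# and each change of language is marked with its own lang attribute.
fragment = (
    '<html lang="en">'
    '<body><p>Welcome.</p>'
    '<p lang="vi">Chào mừng.</p>'
    '</body></html>'
)

class LangCollector(HTMLParser):
    """Collect every lang attribute encountered in the markup."""
    def __init__(self):
        super().__init__()
        self.langs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "lang":
                self.langs.append((tag, value))

collector = LangCollector()
collector.feed(fragment)
print(collector.langs)  # [('html', 'en'), ('p', 'vi')]
```

A check like this can be run over published pages to confirm that translated passages carry their own language identification rather than inheriting the page default.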
Other core internationalisation best practices, such as correctly selecting and identifying the character encoding of text, and applying appropriate bidirectional markup and control characters, affect the readability and comprehension of the text but are assumed rather than articulated in WCAG 2.0.
Legacy and pseudo-Unicode encodings (HTML, MS Word and PDF)
For accessible community language content, it is necessary to select and correctly identify the character encoding used within a document. For HTML, it must be an encoding supported by web browsers. The HTML5 specification identifies which encodings a user agent can support.
If the character encoding is unsupported or misidentified, the content should be treated as non-text content when assessing the accessibility of web resources.
What this means in practical terms is that translated content, regardless of file format, should be sourced from language service providers as Unicode text. HTML documents must use the UTF-8 character encoding.
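A simple automated check can confirm both halves of that requirement for an HTML document: the bytes actually decode as UTF-8, and the encoding is declared early in the file (HTML5 requires the meta charset declaration to appear within the first 1024 bytes). A minimal sketch, assuming the raw bytes of the document are already in hand:

```python
def check_utf8_html(raw: bytes) -> bool:
    """Return True if the document decodes as UTF-8 and declares it."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # HTML5 requires the charset declaration within the first 1024 bytes.
    head = text[:1024].lower()
    return 'charset="utf-8"' in head or "charset=utf-8" in head

sample = (
    b'<!doctype html><html lang="my"><head><meta charset="utf-8">'
    b"</head><body></body></html>"
)
print(check_utf8_html(sample))  # True
```

This only catches the declaration patterns shown; a production check would also need to handle HTTP Content-Type headers and byte-order marks.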
It is common to receive translated content in certain languages in a non-Unicode character encoding.
For instance, Burmese content is often supplied in the Zawgyi pseudo-Unicode encoding, while Sgaw Karen is often supplied in an unsupported eight bit legacy encoding.
Using non-Unicode content (either legacy or pseudo-Unicode encodings) will often require additional steps to make the content accessible.
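A first-pass screening of supplied text can catch the eight-bit legacy case: bytes that fail to decode as UTF-8 cannot be valid Unicode text. Note the limits of this check: Zawgyi text occupies the same byte values as Unicode Myanmar text and will decode cleanly, so detecting it requires a specialised detector (Google's open-source myanmar-tools project provides one). A hedged sketch of the simple check only:

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """Rule out eight-bit legacy encodings; does NOT detect Zawgyi,
    which reuses the Unicode Myanmar block and decodes cleanly."""
    try:
        raw.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False

print(decodes_as_utf8("မြန်မာ".encode("utf-8")))  # True
print(decodes_as_utf8(b"\xd9\xf1\xca"))           # False: legacy 8-bit bytes
```

Content that fails this check should be returned to the language service provider with a request for Unicode text, rather than converted in-house without verification.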
MS Word specific considerations
Care needs to be taken with language identification in Microsoft Word documents, as some community languages are not supported by Microsoft Office. It may not be possible to correctly tag all translations, which affects the accessibility of the document.
You can use the document properties dialog to set a metadata value identifying the document’s language.
MS Word automatically assigns the default editing language as the document language. If the document is opened on another computer where the default editing language setting is different, the document language will change when the file is saved.
When English content is included in a translation, it is necessary to change the proofing language appropriately. For translations written in scripts that read right to left, it is necessary to set the text direction, not just for paragraphs, but also for sections, columns, tables and text boxes. It is not sufficient to rely on text alignment alone.
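Where no paragraph- or object-level direction setting exists (for example, plain-text strings destined for templates or user interface labels), the Unicode directional isolate controls mentioned in relation to bidirectional text can keep mixed-direction runs in order. A minimal sketch, using an illustrative Persian label (the example string is hypothetical):

```python
# First Strong Isolate / Pop Directional Isolate (Unicode bidi controls).
FSI, PDI = "\u2068", "\u2069"

def isolate(run: str) -> str:
    """Wrap a run of text so its direction is resolved independently
    of the surrounding text, per the Unicode bidirectional algorithm."""
    return f"{FSI}{run}{PDI}"

# Without isolation, the trailing number can render on the wrong side
# of the right-to-left word.
label = "Phone: " + isolate("تلفن") + " 1300 650 172"
```

This is complementary to, not a substitute for, the direction settings described above: in Word and PDF documents, structural direction settings remain necessary.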
PDF specific considerations
For a PDF file to contain accessible text, the textual content of the PDF must resolve to Unicode. Software that accesses or displays PDF files uses the file’s ToUnicode mappings for each font to resolve glyphs to Unicode codepoints. The ability to correctly resolve text in a PDF to a valid sequence of Unicode characters depends on the font, its internal mapping of glyphs to codepoints, and the nature of the writing system (script) the language is written in. Fonts designed for complex scripts may reorder glyphs and use alternative glyphs in ways that cannot be adequately represented in the ToUnicode mappings.
When the text in the PDF cannot be resolved to a meaningful Unicode sequence the user can understand, first try alternative fonts to see if they provide a better result. Otherwise, it is necessary to treat the content as non-text content and add ActualText attributes to each of the relevant tags.
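One symptom of missing or unusable ToUnicode mappings is extracted text containing replacement characters or Private Use Area codepoints. How the text is extracted (which PDF library or tool) is outside this sketch; assuming the extracted string is already available, a hedged heuristic check might look like:

```python
def looks_unresolved(extracted: str) -> bool:
    """Flag extracted PDF text that has likely not resolved to
    meaningful Unicode: U+FFFD replacement characters and Private
    Use Area codepoints are common symptoms of broken ToUnicode maps."""
    for ch in extracted:
        cp = ord(ch)
        if ch == "\ufffd" or 0xE000 <= cp <= 0xF8FF:
            return True
    return False

print(looks_unresolved("ordinary extracted text"))  # False
print(looks_unresolved("\ue001\ue002\ue003"))       # True: PUA codepoints
```

Text that passes this check is not guaranteed to be correct (glyph reordering in complex scripts can produce valid but scrambled Unicode), so review by a reader of the language remains essential.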
Website search tools need to use the content of the ActualText attributes for indexing and searching these PDF files, in order to make the content discoverable.
Reviewed 19 August 2019