Help:Multilingual support

From Wikimedia Commons, the free media repository

Content on Wikimedia Commons may contain words or text written in many different languages and scripts. To view and edit these pages correctly, you need the appropriate fonts installed and your operating system and browser configured correctly. This guide will help you do so.

Overview

Unicode

Pages on Commons are encoded in Unicode (specifically UTF-8)[1], an industry standard designed to allow text and symbols from all of the world's writing systems to be consistently represented and manipulated by computers. Because UTF-8 is backwards compatible with ASCII, and most modern browsers have at least basic Unicode support, most users will experience little difficulty reading and editing Commons.
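
As a rough illustration of the ASCII compatibility mentioned above, the following Python sketch compares the bytes produced by the two encodings. It is not part of MediaWiki, and the sample strings are arbitrary choices made for this example.

```python
# Illustrative only: the sample strings are arbitrary choices.
ascii_text = "Commons"
ethiopic_syllable = "\u1234"  # the Ethiopic syllable at U+1234

# Pure ASCII text produces identical bytes in ASCII and in UTF-8.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# A character outside ASCII is encoded as several bytes, each with the
# high bit set, so none of them can be mistaken for an ASCII character.
utf8_bytes = ethiopic_syllable.encode("utf-8")

print(same_bytes)        # True
print(utf8_bytes.hex())  # e188b4
```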

For older browsers, MediaWiki serves the wikitext in a safe mode upon editing. Characters that cannot be represented in ASCII are temporarily converted to hexadecimal character references, which look like &#x1234;. Existing hexadecimal character references get an additional leading zero so that they are not converted to actual characters when the page is saved; they look like &#x01234;. Likewise, to create a hexadecimal character reference in safe mode, rather than the character itself, add a leading zero. You can check whether safe mode is in use by editing this section: if the reference &#x4D; (the letter M) appears in the edit box as &#x04D; rather than &#x4D;, safe mode is in use.
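
The safe-mode round trip described above can be sketched in Python as follows. This is a minimal illustration of the two rules (non-ASCII characters become hexadecimal references; existing references gain a leading zero), not MediaWiki's actual implementation. The function names `to_safe_mode` and `from_safe_mode` are hypothetical.

```python
import re

# Matches a hexadecimal character reference such as &#x1234;
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]+);")

def to_safe_mode(text):
    """Hypothetical helper: prepare stored wikitext for an ASCII-only edit box."""
    # Rule 1: protect references already in the text with a leading zero.
    protected = HEX_REF.sub(lambda m: "&#x0" + m.group(1) + ";", text)
    # Rule 2: replace every non-ASCII character with a hexadecimal reference.
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch)) for ch in protected
    )

def from_safe_mode(text):
    """Hypothetical helper: turn edit-box text back into stored wikitext."""
    def restore(match):
        digits = match.group(1)
        if len(digits) > 1 and digits.startswith("0"):
            # A leading zero marks a literal reference: drop one zero, keep the text.
            return "&#x" + digits[1:] + ";"
        code_point = int(digits, 16)
        if code_point > 0x10FFFF:
            return match.group(0)  # not a valid code point, leave untouched
        return chr(code_point)
    return HEX_REF.sub(restore, text)

sample = "M \u1234 &#x4D;"
print(to_safe_mode(sample))
print(from_safe_mode(to_safe_mode(sample)) == sample)
```

In this sketch a reference typed without a leading zero is converted to the character on save, while one typed with a leading zero is stored as a reference, which mirrors the advice given above.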

Font

Most computers with Microsoft Windows or Microsoft Office already have several fonts installed that support Latin, Greek, Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean and the International Phonetic Alphabet. However, several historic and accented characters (used in the transliteration of foreign scripts) are missing from these fonts.

Font Product Scripts
Arial Unicode MS
  • Office 2000 (0.84)
  • Office 2003 (1.01)
  • Standalone US$99 from Ascender
Lucida Sans Unicode
Tahoma
Microsoft Sans Serif
Arial Unicode MS
supports a wide range of scripts, but is of slightly lower quality than Arial because it lacks kerning and is not smoothed. It contains a small bug that causes double-wide diacritics to be placed on the wrong characters.
Lucida Sans Unicode
has a slightly smaller character repertoire than that of Arial Unicode MS, but is more legible.
Tahoma
has a slightly smaller character repertoire than that of Arial Unicode MS, but is more legible.
Microsoft Sans Serif
has better support for historical and accented Latin characters. (Note that this is a different font than MS Sans Serif, a bitmapped font that shipped with older versions of Windows.)

Other available Unicode fonts

Font | Typeface | License | Format | Encoding
Code2001 0.919 | | Freeware (must not be altered) | | Unicode
Code2000 1.171 | sans-serif | Shareware | TrueType | Unicode
Everson Mono 3.2b4 | monospace | Shareware | TrueType | Unicode
TITUS Cyberbit Basic | serif | Non-commercial | | Unicode 4.0
Font | License | Format | Encoding
DejaVu | Bitstream Vera Fonts Copyright, Arev Fonts Copyright, Public Domain | TrueType | Unicode
FreeFonts | GPLv3 | OpenType | Unicode

Browsers

Internet Explorer
supports Latin (though not all extended sets), Greek, Cyrillic, Arabic and Hebrew. East Asian and some Indic scripts are available if support for them has been installed in Windows. Because Internet Explorer uses only the default font for other scripts, those scripts are usually not displayed unless the default font covers them.
Firefox
tries to render any character using all the fonts available on the system, so multilingual support is generally good. The default rendering engine does not support complex script rendering, however. Some Linux distributions ship with a Pango-based rendering engine that does, although this may currently cause some display glitches with justified text.
Opera
tries to render any character using all the fonts available on the system, so multilingual support is also good.[2] Opera relies on the operating system for contextual glyph selection, ligature forming, character stacking, combining character support and other character shaping tasks.[3]
Chrome
renders many, but not all, characters. It does not render the Oriya (Odia), Sinhala and Tibetan examples below, whereas Firefox fails to render only the Sinhala example.

Scripts

East Asian

Main article: Help:Multilingual support (East Asian)
Script Correct rendering Your computer
Traditional Chinese
人人生來自由,
在尊嚴和權利上一律平等。
他們有理性和良心,
請以手足關係的精神相對待。
Simplified Chinese
人人生来自由,
在尊严和权利上一律平等。
他们有理性和良心,
请以手足关系的精神相对待。
Japanese
すべての人間は、生まれながらにして自由であり、
かつ、尊厳と権利とについて平等である。
人間は、理性と良心とを授けられており、
互いに同胞の精神をもって行動しなければならない。
Korean
모든 인간은 태어날 때부터
자유로우며 그 존엄과 권리에
있어 동등하다. 인간은 천부적으로
이성과 양심을 부여받았으며 서로
형제애의 정신으로 행동하여야 한다.
Vietnamese Nôm
畧畑䀡傳西銘
𡄎唭𠄩𡦂人情喓𠻗
埃匕𠳺匕麻𦖑
𡨺噒役畧苓𠽮身𡢐

Ethiopic

Main article: Help:Multilingual support (Ethiopic)

The Ethiopic syllabary is used in central east Africa for Amharic, Bilen, Oromo, Tigré, Tigrinya, and other languages. It evolved from the script for classical Ge'ez, which is now strictly a liturgical language.

Font | License | Format | Encoding
Abyssinica SIL | OFL | OpenType, AAT and Graphite | Unicode 4.1 + SIL PUA
Code2000 1.16 | Shareware | TrueType | Unicode
Ethiopia Jiret | GPL2 | | Unicode 3.0
Everson Mono | Shareware | TrueType | Unicode
GF Zemen Unicode | GPL2 | TrueType | Unicode
TITUS Cyberbit | Non-commercial | | Unicode 4.0
GNU FreeFont | GPLv3 | TrueType and OpenType | Unicode

Indic

Main article: Help:Multilingual support (Indic)

The following table compares how a correctly configured computer would render each script with how your computer renders it:

Script Correct rendering Your computer
Bengali
ক + ি
কি
Devanāgarī
क + ि
कि
Gujarati
ક + િ
કિ
Gurmukhī
ਕ + ਿ
ਕਿ
Kannada
ಕ + ಿ
ಕಿ
Malayalam
ക + െ
കെ
Odia
କ + େ
କେ
Sinhala
ඵ + ේ
ඵේ
Tibetan
ར + ྐ + ྱ
རྐྱ
Tamil
க + ே
கே
Telugu
య + ీ
యీ
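
Each row above pairs a base consonant with a dependent vowel sign. As a rough sketch of what is stored, the following Python snippet lists the code points of such a pair. The helper name `describe` is hypothetical; it uses only the standard `unicodedata` module and shows what is stored, not how a browser shapes it.

```python
import unicodedata

def describe(cluster):
    """Hypothetical helper: list (code point, Unicode name) for each character."""
    return [
        ("U+{:04X}".format(ord(ch)), unicodedata.name(ch, "<unnamed>"))
        for ch in cluster
    ]

# The Bengali example from the table: consonant KA followed by vowel sign I.
bengali_ki = "\u0995\u09bf"
for code_point, name in describe(bengali_ki):
    print(code_point, name)
```

The text stores the consonant first and the vowel sign second. Placing the sign correctly relative to the consonant is left to the rendering engine, which is why complex script support in the browser and operating system matters for these scripts.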

Burmese

Main article: Help:Multilingual support (Burmese)

Fonts

Font License Unicode OpenType AAT Graphite
Padauk 2.4 OFL × × ×
Parabaik OFL, GPL × ×
Parabaik Sans OFL, GPL × ×
Myanmar3 LGPL × ×
Myanmar2 LGPL × ×

Coptic

Special cases

Esperanto

In edit box | In database and output
S | S
Sx | Ŝ
Sxx | Sx
Sxxx | Ŝx
Sxxxx | Sxx
Sxxxxx | Ŝxx

MediaWiki installations configured for Esperanto use UTF-8 for storage and display. However, when a page is edited, the text is converted to a form designed to be easier to type on a standard keyboard.

The characters to which this applies are Ĉ, Ĝ, Ĥ, Ĵ, Ŝ, Ŭ, ĉ, ĝ, ĥ, ĵ, ŝ and ŭ. You may enter these directly in the edit box if you have the facilities to do so. However, when you edit the page again, you will see each of them encoded as the unaccented letter followed by x (for example, Ŝ appears as Sx). This form is referred to as "x-sistemo" or "x-kodo". To preserve round-trip capability when one or more x's follow these characters or their unaccented forms (C, G, H, J, S, U, c, g, h, j, s, u), the number of x's in the edit box is double the number in the stored article text.

For example, the interlanguage link [[w:en:Luxury car|en:Luxury car]] to :en:Luxury car has to be entered in the edit box as [[w:en:Luxxury car|en:Luxxury car]] on :eo:. This has caused problems with interwiki update bots in the past.
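
The conversion between stored text and edit-box text can be sketched in Python as follows. This is a minimal illustration of the doubling rule described above, not MediaWiki's own code. The function names are hypothetical, only a lowercase x is handled, and accented letters typed directly into the edit box are passed through unchanged.

```python
import re

# Unaccented letters and their accented counterparts, written as escapes:
# U+0108 U+011C U+0124 U+0134 U+015C U+016C and their lowercase forms.
PLAIN = "CGHJSUcghjsu"
ACCENTED = "\u0108\u011c\u0124\u0134\u015c\u016c\u0109\u011d\u0125\u0135\u015d\u016d"
TO_ACCENTED = dict(zip(PLAIN, ACCENTED))
TO_PLAIN = dict(zip(ACCENTED, PLAIN))

STORED_RUN = re.compile("([" + PLAIN + ACCENTED + "])(x*)")
EDIT_RUN = re.compile("([" + PLAIN + "])(x+)")

def stored_to_edit_box(text):
    """Hypothetical helper: stored UTF-8 text -> x-sistemo form for editing."""
    def encode(match):
        letter, xs = match.group(1), match.group(2)
        doubled = "x" * (2 * len(xs))  # every following x is doubled
        if letter in TO_PLAIN:
            # An accented letter becomes its plain form plus one marker x.
            return TO_PLAIN[letter] + "x" + doubled
        return letter + doubled
    return STORED_RUN.sub(encode, text)

def edit_box_to_stored(text):
    """Hypothetical helper: x-sistemo form from the edit box -> stored text."""
    def decode(match):
        letter, xs = match.group(1), match.group(2)
        halved = "x" * (len(xs) // 2)
        if len(xs) % 2 == 1:
            # An odd count means the first x marks the accent.
            return TO_ACCENTED[letter] + halved
        return letter + halved
    return EDIT_RUN.sub(decode, text)

print(stored_to_edit_box("Luxury car"))   # Luxxury car
print(edit_box_to_stored("Luxxury car"))  # Luxury car
```

Under these assumptions the sketch reproduces each row of the table in this section, and it shows why a bot that writes "Luxury" straight into the edit box ends up storing a different title.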

Romanian

The Romanian alphabet contains an S-comma (Ș ș) and T-comma (Ț ț). These characters were added to Unicode 3.0 at the request of the Romanian standardization institute. Font support for these characters is poor, so the Romanian Wikipedia represents these letters with an S-cedilla (Ş ş) and T-cedilla (Ţ ţ) instead.[4]
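
The comma-below and cedilla letters are distinct code points, which the following Python sketch makes visible. It uses only the standard `unicodedata` module. The substitution table and the helper name `to_cedilla` are assumptions made for illustration, not the Romanian Wikipedia's actual tooling.

```python
import unicodedata

# Comma-below letters mapped to the cedilla letters used in their place.
COMMA_TO_CEDILLA = str.maketrans({
    "\u0218": "\u015e",  # capital S
    "\u0219": "\u015f",  # small s
    "\u021a": "\u0162",  # capital T
    "\u021b": "\u0163",  # small t
})

def to_cedilla(text):
    """Hypothetical helper: replace comma-below letters with cedilla letters."""
    return text.translate(COMMA_TO_CEDILLA)

for ch in "\u0218\u015e\u021a\u0162":
    print("U+{:04X}".format(ord(ch)), unicodedata.name(ch))
```

Because the two forms are different characters, text written with one form will not match a search for the other unless one of them is normalised first, for example with a mapping like the one above.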

Notes

  1. Until June 2005, when MediaWiki 1.5 came into use on the Wikimedia projects, articles on the English Wikipedia were encoded using ISO/IEC 8859-1 (although the additional characters from the Windows-1252 character set were used in practice). All characters from the ISO/IEC 10646 Universal Character Set could be accessed through numeric character references, as specified by the HTML 4.01 specification. Since then, nearly all pages have been converted to use Unicode directly.
  2. http://www.opera.com/support/kb/view/435/
  3. http://www.opera.com/docs/specs/#text
  4. See also :ro:Wikipedia:Diacritice