39 Facts About Unicode

1.

Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

2.

Unicode text can be stored using several different encodings, which translate code points into sequences of bytes.
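
As a rough illustration, the Python sketch below encodes one short string with three Unicode encodings and checks that each decodes back to the same text. The helper name `encoded_lengths` and the sample string are illustrative choices, not part of any standard.

```python
def encoded_lengths(text: str) -> dict[str, int]:
    """Encode `text` with several Unicode encodings and return the byte count for each."""
    lengths = {}
    for encoding in ("utf-8", "utf-16-le", "utf-32-le"):
        data = text.encode(encoding)
        # Decoding must give back exactly the original string.
        assert data.decode(encoding) == text
        lengths[encoding] = len(data)
    return lengths


sample = "A\u00e9\u20ac\U0001F600"  # A, e with acute, euro sign, grinning face
print(encoded_lengths(sample))  # {'utf-8': 10, 'utf-16-le': 10, 'utf-32-le': 16}
```

The little-endian variants are used so that Python does not prepend a byte order mark, which its plain "utf-16" and "utf-32" codecs do when encoding.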

3.

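The Unicode Standard defines three encodings (UTF-8, UTF-16, and UTF-32), and several other encodings exist. In practice all of them are variable-width: even UTF-32, which always uses four bytes per code point, may need several code points for a single user-perceived character.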
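
To make the width differences concrete, here is a small sketch of how many bytes UTF-8 and how many 16-bit units UTF-16 spend on a single code point, plus the UTF-32 case where one visible character spans two code points. The function names are hypothetical helpers for this example.

```python
def utf8_length(cp: int) -> int:
    """Bytes UTF-8 needs for one code point (1 to 4)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def utf16_length(cp: int) -> int:
    """16-bit code units UTF-16 needs: 1, or 2 (a surrogate pair) above U+FFFF."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return 1 if cp < 0x10000 else 2


# UTF-32 uses exactly 4 bytes per code point, yet "e" followed by
# COMBINING ACUTE ACCENT is one visible character made of two code points.
decomposed_e_acute = "e\u0301"
print(len(decomposed_e_acute.encode("utf-32-le")))  # 8
```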

4.

In text processing, Unicode's role is to provide a unique code point (a number, not a glyph) for each character.
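
In Python, for instance, `ord()` returns a character's code point and `chr()` goes the other way. The sketch below prints code points in the conventional U+XXXX notation; `code_points` is a hypothetical helper name.

```python
def code_points(text: str) -> list[str]:
    """Return the U+XXXX label of each code point in `text`."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(code_points("A\u20ac"))  # ['U+0041', 'U+20AC']
print(chr(0x20AC))             # the euro sign, rebuilt from its number
```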

5.

In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor.

6.

In his 1988 draft proposal for Unicode, Joe Becker explained that "the name 'Unicode' is intended to suggest a unique, unified, universal encoding".

7.

Unicode is intended to address the need for a workable, reliable world text encoding.

8.

Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities.

9.

In early 1989, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of Metaphor, Karen Smith-Yoshimura and Joan Aliprand of RLG, and Glenn Wright of Sun Microsystems. In 1990, Michel Suignard and Asmus Freytag of Microsoft and Rick McGowan of NeXT joined the group.

10.

The Unicode Consortium was incorporated in California on 3 January 1991, and in October 1991 the first volume of the Unicode standard was published.

11.

The Unicode Consortium is a nonprofit organization that coordinates Unicode's development.

12.

Unicode currently covers most major writing systems in use today.

13.

However, the Unicode versions differ from their ISO/IEC 10646 equivalents in two significant ways.

14.

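Because Unicode specifies not only a character map but also the rules, algorithms, and properties needed for interoperability between platforms and languages, The Unicode Standard includes more information than ISO/IEC 10646, covering in depth such topics as bitwise encoding, collation, and rendering.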

15.

Previously, The Unicode Standard was sold as a print volume containing the complete core specification, standard annexes, and code charts.

16.

The Unicode Consortium has released expanded versions of the standard regularly, usually once a year. Occasionally more than one version has been released in a calendar year, and in rare cases a scheduled release has had to be postponed.

17.

The Unicode codespace is divided into seventeen planes, numbered 0 to 16, each holding 65,536 code points.
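
Because each plane holds exactly 65,536 (2^16) code points, a code point's plane is simply its value divided by 0x10000. A minimal sketch, with `plane_of` as an illustrative helper name:

```python
PLANE_SIZE = 0x10000       # 65,536 code points per plane
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16


def plane_of(cp: int) -> int:
    """Return the plane number (0 to 16) that contains code point `cp`."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"{cp:#x} is outside the Unicode codespace")
    return cp // PLANE_SIZE


# 17 planes x 65,536 code points = 1,114,112 code points in total.
assert MAX_CODE_POINT + 1 == 17 * PLANE_SIZE
print(plane_of(0x41), plane_of(0x1F600), plane_of(0x10FFFF))  # 0 1 16
```

Plane 0 is the Basic Multilingual Plane, which holds most characters in everyday use.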

18.

Graphic characters are characters defined by Unicode to have particular semantics, and either have a visible glyph shape or represent a visible space.
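
The Unicode Standard ties this definition to general categories: graphic characters are letters (L), marks (M), numbers (N), punctuation (P), symbols (S), and space separators (Zs). The sketch below checks a character against that list using Python's `unicodedata` module; `is_graphic` is a hypothetical helper, and it assumes that category mapping.

```python
import unicodedata

# General-category prefixes counted as graphic, plus the Zs (space separator) category.
GRAPHIC_PREFIXES = ("L", "M", "N", "P", "S")


def is_graphic(ch: str) -> bool:
    """True if `ch` has a graphic general category (L, M, N, P, S, or Zs)."""
    category = unicodedata.category(ch)
    return category.startswith(GRAPHIC_PREFIXES) or category == "Zs"


print(is_graphic("A"), is_graphic(" "), is_graphic("\n"))  # True True False
```

Note that line and paragraph separators (categories Zl and Zp) and control characters (Cc) are excluded.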

19.

Unicode maintains a list of uniquely named character sequences for abstract characters that are not directly encoded in Unicode.
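
Named sequences are published in the Unicode data file NamedSequences.txt, and Python's `unicodedata.lookup()` accepts them (since Python 3.3). The sketch below looks up one such sequence, a Latin "r" with tilde that has no single code point of its own.

```python
import unicodedata

# A named sequence resolves to several code points, not one character.
r_tilde = unicodedata.lookup("LATIN SMALL LETTER R WITH TILDE")

print([f"U+{ord(ch):04X}" for ch in r_tilde])  # ['U+0072', 'U+0303']
```

Python's `"\N{...}"` string escape, by contrast, only accepts names of single characters, so the named sequence has to go through `lookup()`.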

20.

Unicode includes a mechanism for modifying characters, combining marks such as diacritics that attach to a preceding base character, which greatly extends the supported glyph repertoire.
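
A combining mark is simply appended after its base character, and several marks can stack. The sketch below builds "q" with a dot above and a dot below. `attach_marks` is a hypothetical helper that only checks each mark's general category (Mn, Mc, or Me) before concatenating.

```python
import unicodedata


def attach_marks(base: str, *marks: str) -> str:
    """Append combining marks to a single base character."""
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    for mark in marks:
        # General categories Mn, Mc and Me are the combining marks.
        if not unicodedata.category(mark).startswith("M"):
            raise ValueError(f"U+{ord(mark):04X} is not a combining mark")
    return base + "".join(marks)


cluster = attach_marks("q", "\u0307", "\u0323")  # q + dot above + dot below
print(len(cluster))  # 3 code points, rendered as one character
# Canonical combining classes: 230 (above) and 220 (below).
print(unicodedata.combining("\u0307"), unicodedata.combining("\u0323"))  # 230 220
```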

21.

Unicode provides a mechanism for composing Hangul syllables with their individual subcomponents, known as Hangul Jamo.
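
Hangul syllable composition is arithmetic: the standard's algorithm maps a leading consonant, vowel, and optional trailing consonant jamo to a precomposed syllable starting at U+AC00. Below is a minimal Python sketch of that formula, with the result cross-checked against NFC normalization. The function name `compose_hangul` is illustrative.

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28


def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose leading consonant + vowel (+ optional trailing consonant) jamo."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0  # 0 means "no trailing consonant"
    if not (0 <= l_index < L_COUNT and 0 <= v_index < V_COUNT):
        raise ValueError("expected a leading consonant and a vowel jamo")
    if trail and not 0 < t_index < T_COUNT:
        raise ValueError("expected a trailing consonant jamo")
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


han = compose_hangul("\u1112", "\u1161", "\u11AB")  # hieuh + a + nieun
print(f"U+{ord(han):04X}")  # U+D55C
```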

22.

Several subsets of Unicode are standardized, for example the subset that Microsoft Windows has supported since Windows NT 4.0.

23.

Unicode defines two mapping methods: the Unicode Transformation Format (UTF) encodings and the Universal Coded Character Set (UCS) encodings.
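
As one example of a UTF mapping, UTF-16 stores code points up to U+FFFF as a single 16-bit unit. It splits anything above that into a high and a low surrogate. The older UCS-2 form has no such mechanism and covers only the Basic Multilingual Plane. The sketch below implements the surrogate arithmetic and can be checked against Python's own codec. `utf16_code_units` is a hypothetical name.

```python
def utf16_code_units(cp: int) -> list[int]:
    """Encode one code point as UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x10000:
        return [cp]                   # BMP: a single 16-bit unit
    offset = cp - 0x10000             # 20-bit offset into planes 1 to 16
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low surrogate
    return [high, low]


print([hex(unit) for unit in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```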

24.

The Unicode Standard allows that the byte order mark (BOM) "can serve as signature for UTF-8 encoded text where the character set is unmarked".
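
In UTF-8 the BOM (U+FEFF) becomes the three bytes EF BB BF. The sketch below detects and strips that signature by hand. It also shows Python's built-in "utf-8-sig" codec, which does the same when decoding. `sniff_utf8_signature` is an illustrative helper, not a standard API.

```python
import codecs


def sniff_utf8_signature(data: bytes) -> tuple[bool, bytes]:
    """Return (has_signature, payload) for bytes that may start with a UTF-8 BOM."""
    if data.startswith(codecs.BOM_UTF8):  # b"\xef\xbb\xbf"
        return True, data[len(codecs.BOM_UTF8):]
    return False, data


raw = "\ufeffhello".encode("utf-8")
print(sniff_utf8_signature(raw))  # (True, b'hello')
print(raw.decode("utf-8-sig"))    # hello
```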

25.

Unicode has become the dominant scheme for internal processing and storage of text.

26.

Partial support for Unicode can be installed on Windows 9x through the Microsoft Layer for Unicode.

27.

UTF-8 is the most common Unicode encoding used in HTML documents on the World Wide Web.

28.

Multilingual text-rendering engines which use Unicode include Uniscribe and DirectWrite for Microsoft Windows, ATSUI and Core Text for macOS, and Pango for GTK+ and the GNOME desktop.

29.

Unicode is not in principle concerned with fonts per se, seeing them as implementation choices.

30.

Rather than covering the whole repertoire, Unicode-based fonts typically support only basic ASCII plus particular scripts or sets of characters or symbols.

31.

Unicode defines a large number of characters that conforming applications should recognize as line terminators.
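
A minimal line splitter can treat all of these as terminators. The sketch below assumes the set CR, LF, CR LF, VT, FF, NEL (U+0085), LINE SEPARATOR (U+2028), and PARAGRAPH SEPARATOR (U+2029). CR LF is matched first so the pair counts as one break. `split_lines` is a hypothetical helper.

```python
import re

# CR LF first, so the two-character sequence is a single terminator;
# then LF, VT, FF, CR, NEL, LINE SEPARATOR, PARAGRAPH SEPARATOR.
_LINE_TERMINATOR = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")


def split_lines(text: str) -> list[str]:
    """Split `text` on any of the Unicode line terminators listed above."""
    return _LINE_TERMINATOR.split(text)


print(split_lines("a\r\nb\nc\u2028d\x85e"))  # ['a', 'b', 'c', 'd', 'e']
```

Python's built-in `str.splitlines()` also recognizes these terminators, along with a few extra separators (`\x1c` to `\x1e`).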

32.

By introducing LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029), Unicode provides a way around the historical platform-dependent solutions.

33.

Nonetheless, few if any Unicode solutions have adopted these Unicode line and paragraph separators as the sole canonical line ending characters.

34.

Unicode has been criticized for failing to separately encode older and alternative forms of kanji which, critics argue, complicates the processing of ancient Japanese and uncommon Japanese names.

35.

Unicode was designed to provide code-point-by-code-point round-trip conversion to and from any preexisting character encoding, so that a text file in an older character set can be converted to Unicode and back again, yielding the same file, without employing context-dependent interpretation.
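
A quick way to observe the round trip for a particular file is to decode its legacy bytes to a Unicode string, re-encode them, and compare. The sketch below uses Python's built-in codecs. `round_trips` is an illustrative name, and the check covers only the given data, not the design guarantee in general.

```python
def round_trips(data: bytes, legacy_encoding: str) -> bool:
    """Decode legacy bytes to Unicode, re-encode them, and compare with the original."""
    text = data.decode(legacy_encoding)           # legacy bytes -> Unicode string
    return text.encode(legacy_encoding) == data   # Unicode string -> legacy bytes


print(round_trips(bytes(range(256)), "latin-1"))  # True
# "\u65e5\u672c\u8a9e" is the word for "Japanese" written in kanji.
print(round_trips("\u65e5\u672c\u8a9e".encode("shift_jis"), "shift_jis"))  # True
```

Note that some Python codecs raise `UnicodeDecodeError` for byte values the legacy character set leaves undefined, so real converters need an error-handling policy.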

36.

That has meant that inconsistent legacy architectures, such as combining diacritics and precomposed characters, both exist in Unicode, giving more than one method of representing some text.
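
Because "é" can be stored as one precomposed code point or as "e" plus a combining accent, comparing text reliably means normalizing it first. The sketch below uses Python's `unicodedata.normalize` with NFC. `canonically_equivalent` is a hypothetical helper.

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT


def canonically_equivalent(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)                        # False
print(canonically_equivalent(precomposed, decomposed))  # True
```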

37.

Injective mappings must be provided between characters in existing legacy character sets and characters in Unicode to facilitate conversion to Unicode and allow interoperability with legacy software.

38.

Unicode will not encode any new ligatures, in part because the set of ligatures is font-dependent and Unicode is an encoding independent of font variations.
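
The few ligatures already in Unicode, such as U+FB01, have only compatibility decompositions. NFKC normalization therefore maps them back to plain letters, while NFC leaves them alone. A short check with Python's `unicodedata` module:

```python
import unicodedata

fi = "\ufb01"  # an existing ligature character, kept for legacy compatibility

print(unicodedata.name(fi))                    # LATIN SMALL LIGATURE FI
print(unicodedata.normalize("NFKC", fi))       # fi (two ordinary letters)
print(unicodedata.normalize("NFC", fi) == fi)  # True: no canonical decomposition
```

Forming such ligatures at display time is then left to the font and the text-rendering engine.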

39.

Unicode has a large number of homoglyphs, many of which look very similar or identical to ASCII letters.
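
A crude first line of defense is to flag every non-ASCII character in an identifier that should be plain ASCII. The sketch below reports position, code point, and name. `non_ascii_report` is a hypothetical helper. It is far simpler than the confusable-detection data published in Unicode Technical Standard #39.

```python
import unicodedata


def non_ascii_report(text: str) -> list[tuple[int, str, str]]:
    """List (index, U+XXXX, character name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if ord(ch) > 0x7F
    ]


spoof = "p\u0430ypal"  # looks like "paypal", but the second letter is Cyrillic
print(spoof == "paypal")        # False
print(non_ascii_report(spoof))  # [(1, 'U+0430', 'CYRILLIC SMALL LETTER A')]
```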