39 Facts About Unicode


Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.

FactSnippet No. 518,294

Unicode can be stored using several different encodings, which translate the character codes into sequences of bytes.

FactSnippet No. 518,295

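The Unicode Standard defines three encodings (UTF-8, UTF-16, and UTF-32), and several other encodings exist. UTF-8 and UTF-16 are variable-width; UTF-32 uses a fixed four bytes per code point, although a single user-perceived character may still span several code points.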

FactSnippet No. 518,296
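
A minimal sketch of how the three standard encodings turn the same characters into different numbers of bytes. The helper name `byte_widths` is hypothetical, and the little-endian codec variants are chosen only so that no byte order mark is added to the output.

```python
def byte_widths(text: str, codec: str) -> list[int]:
    """Return the encoded size in bytes of each character, taken on its own."""
    return [len(ch.encode(codec)) for ch in text]


# U+0061 (a), U+00E9 (e with acute), U+20AC (euro sign), U+1F600 (an emoji)
sample = "a\u00e9\u20ac\U0001F600"

for codec in ("utf-8", "utf-16-le", "utf-32-le"):
    print(f"{codec:10} {byte_widths(sample, codec)}")
```

UTF-8 and UTF-16 widths vary with the code point, while every UTF-32 entry is four bytes.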

In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character.

FactSnippet No. 518,297
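
As a rough illustration of "a number, not a glyph", Python's built-in `ord` and `chr` convert between a character and its code point. The helper `code_point_label` is a hypothetical name used only for this sketch.

```python
def code_point_label(ch: str) -> str:
    """Format a single character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"


print(code_point_label("A"))           # Latin capital letter A
print(code_point_label("\u20ac"))      # euro sign
print(code_point_label("\U0001F600"))  # a code point outside plane 0

# chr() goes the other way: from the number back to the character.
print(chr(0x41) == "A")
```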

In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor.

FactSnippet No. 518,298

Joe Becker, in his 1988 proposal for the encoding, explained that "the name 'Unicode' is intended to suggest a unique, unified, universal encoding".

FactSnippet No. 518,299

Unicode is intended to address the need for a workable, reliable world text encoding.

FactSnippet No. 518,300

Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities.

FactSnippet No. 518,301

In early 1989, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of Metaphor, Karen Smith-Yoshimura and Joan Aliprand of RLG, and Glenn Wright of Sun Microsystems, and in 1990, Michel Suignard and Asmus Freytag from Microsoft and Rick McGowan of NeXT joined the group.

FactSnippet No. 518,302

The Unicode Consortium was incorporated in California on 3 January 1991, and in October 1991, the first volume of the Unicode standard was published.

FactSnippet No. 518,303

The Unicode Consortium is a nonprofit organization that coordinates Unicode's development.

FactSnippet No. 518,304

Unicode currently covers most major writing systems in use today.

FactSnippet No. 518,305

The versions of The Unicode Standard differ from their ISO/IEC 10646 equivalents in two significant ways.

FactSnippet No. 518,306

The Unicode Standard includes more information than its ISO counterpart, covering in depth topics such as bitwise encoding, collation, and rendering.

FactSnippet No. 518,307

Previously, The Unicode Standard was sold as a print volume containing the complete core specification, standard annexes, and code charts.

FactSnippet No. 518,308

The Unicode Consortium has regularly released expanded versions of The Unicode Standard each year, occasionally with more than one version in a calendar year and, in rare cases, with a scheduled release postponed.

FactSnippet No. 518,309

The Unicode codespace is divided into seventeen planes, numbered 0 to 16, each containing 65,536 code points.

FactSnippet No. 518,310
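
Because each plane holds 65,536 (0x10000) code points, the plane number is simply the code point shifted right by 16 bits. The following is a small sketch; `plane_of`, `PLANE_SIZE`, and `PLANE_COUNT` are illustrative names, not part of any standard API.

```python
PLANE_SIZE = 0x10000  # 65,536 code points per plane
PLANE_COUNT = 17      # planes 0 through 16


def plane_of(ch: str) -> int:
    """Return the plane number (0 to 16) that a character's code point falls in."""
    return ord(ch) >> 16


print(plane_of("A"))               # plane 0
print(plane_of("\U0001F600"))      # an emoji, outside plane 0
print(plane_of(chr(0x10FFFF)))     # the last code point in the codespace
print(PLANE_SIZE * PLANE_COUNT)    # total size of the codespace
```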

Graphic characters are characters defined by Unicode to have particular semantics, and either have a visible glyph shape or represent a visible space.

FactSnippet No. 518,311

Unicode maintains a list of uniquely named character sequences for abstract characters that are not directly encoded in Unicode.

FactSnippet No. 518,312

Unicode includes a mechanism for modifying characters that greatly extends the supported glyph repertoire.

FactSnippet No. 518,313
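
Assuming the mechanism meant here is combining marks, the sketch below builds an accented letter from a base character followed by a combining diacritic and uses the standard library's `unicodedata.combining` to tell the two apart. The helper name `is_combining` is hypothetical.

```python
import unicodedata


def is_combining(ch: str) -> bool:
    """True if the character has a non-zero canonical combining class."""
    return unicodedata.combining(ch) != 0


base = "e"
mark = "\u0301"  # COMBINING ACUTE ACCENT
sequence = base + mark  # rendered as one accented letter, stored as two code points

print(len(sequence))
print(unicodedata.name(mark))
print(is_combining(base), is_combining(mark))
```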

Unicode provides a mechanism for composing Hangul syllables with their individual subcomponents, known as Hangul Jamo.

FactSnippet No. 518,314
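
The composition of a Hangul syllable from its jamo is arithmetic on code points. The sketch below assumes the modern conjoining jamo ranges (leading consonants from U+1100, vowels from U+1161, trailing consonants from U+11A8) and cross-checks the result against NFC normalization. The function name `compose_hangul` and the constant names are chosen for this example only.

```python
import unicodedata

S_BASE = 0xAC00  # first precomposed Hangul syllable
L_BASE = 0x1100  # first leading consonant
V_BASE = 0x1161  # first vowel
T_BASE = 0x11A7  # one before the first trailing consonant (index 0 = no trail)
V_COUNT = 21
T_COUNT = 28


def compose_hangul(lead: str, vowel: str, trail: str = "") -> str:
    """Compose one syllable from a leading consonant, a vowel, and an optional trail."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(trail) - T_BASE if trail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


jamo = "\u1112\u1161\u11ab"  # three conjoining jamo
syllable = compose_hangul(*jamo)
print(f"U+{ord(syllable):04X}")
print(syllable == unicodedata.normalize("NFC", jamo))
```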

Several subsets of Unicode are standardized; for example, Microsoft Windows has supported the WGL-4 subset since Windows NT 4.0.

FactSnippet No. 518,315

Unicode defines two mapping methods: the Unicode Transformation Format (UTF) encodings and the Universal Coded Character Set (UCS) encodings.

FactSnippet No. 518,316

The Unicode Standard allows that the byte order mark (BOM) "can serve as signature for UTF-8 encoded text where the character set is unmarked".

FactSnippet No. 518,317
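
A short sketch of what the UTF-8 signature looks like in practice, using Python's `utf-8-sig` codec and the `codecs.BOM_UTF8` constant. The helper `strip_utf8_bom` is a hypothetical name for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = "hello".encode("utf-8-sig")  # the codec prepends the three-byte signature

print(with_bom)
print(strip_utf8_bom(with_bom))
print(with_bom.decode("utf-8-sig"))     # the signature is dropped while decoding
print(repr(with_bom.decode("utf-8")))   # plain UTF-8 keeps it as U+FEFF
```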

Unicode has become the dominant scheme for internal processing and storage of text.

FactSnippet No. 518,318

Partial support for Unicode can be installed on Windows 9x through the Microsoft Layer for Unicode.

FactSnippet No. 518,319

UTF-8 is the most common Unicode encoding used in HTML documents on the World Wide Web.

FactSnippet No. 518,320

Multilingual text-rendering engines which use Unicode include Uniscribe and DirectWrite for Microsoft Windows, ATSUI and Core Text for macOS, and Pango for GTK+ and the GNOME desktop.

FactSnippet No. 518,321

Unicode is not in principle concerned with fonts per se, seeing them as implementation choices.

FactSnippet No. 518,322

Rather than covering the whole repertoire, Unicode-based fonts typically focus on supporting only basic ASCII and particular scripts or sets of characters or symbols.

FactSnippet No. 518,323

Unicode defines a large number of characters that conforming applications should recognize as line terminators.

FactSnippet No. 518,324
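
As one illustration, Python's `str.splitlines` treats several of these characters as line boundaries, including LF, CR LF, NEL (U+0085), LINE SEPARATOR (U+2028), and PARAGRAPH SEPARATOR (U+2029). Note that `splitlines` also splits on a few control characters beyond that set, so it is only an approximation of what a conforming application would do.

```python
# One string mixing five different line terminators.
text = "one\ntwo\r\nthree\u0085four\u2028five\u2029six"

lines = text.splitlines()
print(lines)
```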

By introducing its own line and paragraph separator characters (U+2028 and U+2029), Unicode provides a way around the historical platform-dependent solutions.

FactSnippet No. 518,325

Nonetheless, few if any Unicode solutions have adopted these Unicode line and paragraph separators as the sole canonical line ending characters.

FactSnippet No. 518,326

Unicode has been criticized for failing to separately encode older and alternative forms of kanji which, critics argue, complicates the processing of ancient Japanese and uncommon Japanese names.

FactSnippet No. 518,327

Unicode was designed to provide code-point-by-code-point round-trip format conversion to and from any preexisting character encoding, so that a text file in an older character set can be converted to Unicode and back again to yield the same file, without employing context-dependent interpretation.

FactSnippet No. 518,328
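
A minimal round-trip check, using ISO 8859-1 (Python codec name `latin-1`) as the legacy character set because every one of its 256 byte values maps to exactly one code point. Other legacy encodings may leave some bytes undefined in a given codec implementation, so this sketch does not generalize automatically.

```python
# Every possible byte value, treated as text in a legacy single-byte encoding.
legacy_bytes = bytes(range(256))

text = legacy_bytes.decode("latin-1")   # legacy -> Unicode
round_tripped = text.encode("latin-1")  # Unicode -> legacy

print(len(text))
print(round_tripped == legacy_bytes)
```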

As a result, inconsistent legacy architectures, such as combining diacritics and precomposed characters, both exist in Unicode, giving more than one way to represent some text.

FactSnippet No. 518,329
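
The two representations compare as different strings unless they are normalized first. The sketch below uses the standard library's `unicodedata.normalize`; the helper name `same_text` is hypothetical, and choosing NFC for the comparison is an assumption made for this example.

```python
import unicodedata

precomposed = "\u00e9"  # a single precomposed code point
decomposed = "e\u0301"  # base letter followed by a combining acute accent


def same_text(a: str, b: str) -> bool:
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)            # raw comparison of code points
print(same_text(precomposed, decomposed))   # comparison after normalization
print(unicodedata.normalize("NFD", precomposed) == decomposed)
```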

Injective mappings must be provided between characters in existing legacy character sets and characters in Unicode to facilitate conversion to Unicode and allow interoperability with legacy software.

FactSnippet No. 518,330

Encoding of any new ligatures in Unicode will not happen, in part because the set of ligatures is font-dependent, and Unicode is an encoding independent of font variations.

FactSnippet No. 518,331

Unicode has a large number of homoglyphs, many of which look very similar or identical to ASCII letters.

FactSnippet No. 518,332
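
To make the homoglyph problem concrete, the sketch below compares Latin "a" with Cyrillic "а" (U+0430), which are distinct code points that most fonts draw identically. The helper `name_prefixes` is a hypothetical, deliberately crude heuristic: it takes the first word of each character name as a hint about the script and is not a substitute for the Unicode Script property.

```python
import unicodedata

latin_a = "a"
cyrillic_a = "\u0430"  # CYRILLIC SMALL LETTER A


def name_prefixes(word: str) -> set[str]:
    """Collect the first word of each character's Unicode name (a rough script hint)."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in word}


print(latin_a == cyrillic_a)
print(unicodedata.name(cyrillic_a))
print(name_prefixes("paypal"))
print(name_prefixes("p\u0430ypal"))  # second letter swapped for the Cyrillic one
```

A word whose characters yield more than one prefix mixes look-alike letters from different scripts, which is one sign that it may have been spoofed.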