Unicode, formally The Unicode Standard, is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems.
FactSnippet No. 518,294 |
Unicode can be stored using several different encodings, which translate the character codes into sequences of bytes.
FactSnippet No. 518,295 |
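The Unicode Standard defines three encodings, UTF-8, UTF-16, and UTF-32, and several other encodings exist. UTF-8 and UTF-16 are variable-width; UTF-32 uses a fixed four bytes per code point, although a single user-perceived character may still span several code points.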
FactSnippet No. 518,296 |
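As a rough illustration of how encodings turn character codes into bytes, the sketch below counts the bytes each code point occupies in UTF-8, UTF-16, and UTF-32. The helper name bytes_per_char and the sample string are made up for this example; the little-endian codec variants are used only so that Python does not prepend a byte order mark.

```python
# Sample string with code points U+0061, U+00E9, U+20AC, U+1F600.
text = "a\u00e9\u20ac\U0001F600"

def bytes_per_char(s: str, encoding: str) -> list[int]:
    """Return the number of bytes each code point occupies in the given encoding."""
    return [len(ch.encode(encoding)) for ch in s]

utf8_widths = bytes_per_char(text, "utf-8")       # 1 to 4 bytes per code point
utf16_widths = bytes_per_char(text, "utf-16-le")  # 2 or 4 bytes per code point
utf32_widths = bytes_per_char(text, "utf-32-le")  # always 4 bytes per code point

print(utf8_widths, utf16_widths, utf32_widths)
```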
In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character.
FactSnippet No. 518,297 |
In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor.
FactSnippet No. 518,298 |
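The distinction between a code point and its rendering can be seen from a short Python sketch. It uses the standard unicodedata module; the helper name describe is hypothetical. The output is a number and a formal character name, and nothing about how the glyph is drawn.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return the code point (as U+XXXX) and the formal name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

# A Latin letter, an accented letter, and a CJK ideograph.
for ch in ("A", "\u00e9", "\u4e2d"):
    print(describe(ch))

# The mapping also runs the other way: a number identifies the character.
assert chr(0x4E2D) == "\u4e2d"
```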
Joe Becker, in his 1988 draft proposal, explained that "the name 'Unicode' is intended to suggest a unique, unified, universal encoding".
FactSnippet No. 518,299 |
Unicode is intended to address the need for a workable, reliable world text encoding.
FactSnippet No. 518,300 |
Unicode gives higher priority to ensuring utility for the future than to preserving past antiquities.
FactSnippet No. 518,301 |
In early 1989, the Unicode working group expanded to include Ken Whistler and Mike Kernaghan of Metaphor, Karen Smith-Yoshimura and Joan Aliprand of RLG, and Glenn Wright of Sun Microsystems, and in 1990, Michel Suignard and Asmus Freytag from Microsoft and Rick McGowan of NeXT joined the group.
FactSnippet No. 518,302 |
The Unicode Consortium was incorporated in California on 3 January 1991, and in October 1991, the first volume of The Unicode Standard was published.
FactSnippet No. 518,303 |
The Unicode Consortium is a nonprofit organization that coordinates Unicode's development.
FactSnippet No. 518,304 |
Unicode covers most major writing systems in use today.
FactSnippet No. 518,305 |
The Unicode versions differ from their ISO/IEC 10646 equivalents in two significant ways.
FactSnippet No. 518,306 |
The Unicode Standard includes more information than ISO/IEC 10646, covering in depth topics such as bitwise encoding, collation, and rendering.
FactSnippet No. 518,307 |
Previously, The Unicode Standard was sold as a print volume containing the complete core specification, standard annexes, and code charts.
FactSnippet No. 518,308 |
Expanded versions of The Unicode Standard have been released roughly annually, occasionally with more than one version in a calendar year and, in rare cases, with a scheduled release postponed.
FactSnippet No. 518,309 |
The Unicode codespace is divided into seventeen planes, numbered 0 to 16.
FactSnippet No. 518,310 |
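Because each plane holds 65,536 code points and the codespace ends at U+10FFFF, the plane number is simply the code point shifted right by 16 bits. A minimal sketch (the function name plane_of is chosen for this example):

```python
def plane_of(code_point: int) -> int:
    """Return the plane (0 to 16) that contains the given code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("outside the Unicode codespace")
    # Each plane spans 0x10000 code points, so the plane is the high bits.
    return code_point >> 16

print(plane_of(ord("A")))   # plane 0
print(plane_of(0x1F600))    # plane 1
print(plane_of(0x10FFFF))   # plane 16, the last one
```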
Graphic characters are characters defined by Unicode to have particular semantics, and either have a visible glyph shape or represent a visible space.
FactSnippet No. 518,311 |
Unicode maintains a list of uniquely named character sequences for abstract characters that are not directly encoded in Unicode.
FactSnippet No. 518,312 |
Unicode includes a mechanism for modifying characters that greatly extends the supported glyph repertoire.
FactSnippet No. 518,313 |
Unicode provides a mechanism for composing Hangul syllables with their individual subcomponents, known as Hangul Jamo.
FactSnippet No. 518,314 |
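The Hangul composition can be sketched as plain arithmetic on code points. The constants below are the usual base values for the conjoining jamo and precomposed syllable blocks; the function name compose_hangul is hypothetical, and the sketch assumes its arguments are valid conjoining jamo (it does no range checking). The standard library's NFC normalization performs the same composition and is used here as a cross-check.

```python
import unicodedata

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28  # number of vowels; number of trailing slots (incl. "none")

def compose_hangul(lead: str, vowel: str, tail: str = "") -> str:
    """Combine a leading consonant, a vowel, and an optional trailing consonant."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(tail) - T_BASE if tail else 0  # index 0 means no trailing consonant
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)

jamo = "\u1112\u1161\u11ab"          # HIEUH, A, NIEUN
syllable = compose_hangul(*jamo)     # the single syllable U+D55C
assert syllable == unicodedata.normalize("NFC", jamo)
```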
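Several subsets of Unicode are standardized: Microsoft Windows since Windows NT 4.0 supports WGL-4, a subset covering European languages written in the Latin, Greek, or Cyrillic scripts.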
FactSnippet No. 518,315 |
Unicode defines two mapping methods: the Unicode Transformation Format (UTF) encodings, and the Universal Coded Character Set (UCS) encodings.
FactSnippet No. 518,316 |
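As one concrete instance of a UTF mapping, UTF-16 represents code points above U+FFFF as a pair of 16-bit surrogate code units. The sketch below shows that arithmetic; the function name to_surrogates is made up for this example, and Python's own utf-16-be codec is used only to cross-check the result.

```python
def to_surrogates(code_point: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = code_point - 0x10000      # 20 significant bits remain
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

high, low = to_surrogates(0x1F600)
# Cross-check against the built-in big-endian UTF-16 codec.
assert "\U0001F600".encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```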
The Unicode Standard allows that the byte order mark (BOM) "can serve as signature for UTF-8 encoded text where the character set is unmarked".
FactSnippet No. 518,317 |
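In practice the UTF-8 signature is the three bytes EF BB BF at the start of the data. The following sketch, with hypothetical helper names, detects the signature and decodes with Python's utf-8-sig codec, which drops the signature when present and otherwise behaves like plain UTF-8.

```python
import codecs

def has_utf8_signature(data: bytes) -> bool:
    """Report whether the data begins with the UTF-8 encoded BOM."""
    return data.startswith(codecs.BOM_UTF8)

def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, discarding a leading signature if there is one."""
    return data.decode("utf-8-sig")

raw = codecs.BOM_UTF8 + "hello".encode("utf-8")
print(has_utf8_signature(raw), decode_utf8(raw))

# A plain utf-8 decode keeps the BOM as a leading U+FEFF character.
assert raw.decode("utf-8")[0] == "\ufeff"
```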
Unicode has become the dominant scheme for internal processing and storage of text.
FactSnippet No. 518,318 |
Partial support for Unicode can be installed on Windows 9x through the Microsoft Layer for Unicode.
FactSnippet No. 518,319 |
UTF-8 is the most common Unicode encoding used in HTML documents on the World Wide Web.
FactSnippet No. 518,320 |
Multilingual text-rendering engines which use Unicode include Uniscribe and DirectWrite for Microsoft Windows, ATSUI and Core Text for macOS, and Pango for GTK+ and the GNOME desktop.
FactSnippet No. 518,321 |
Unicode is not in principle concerned with fonts per se, seeing them as implementation choices.
FactSnippet No. 518,322 |
Rather than attempting to cover the entire character repertoire, Unicode-based fonts typically focus on supporting only basic ASCII and particular scripts or sets of characters or symbols.
FactSnippet No. 518,323 |
Unicode defines a large number of characters that conforming applications should recognize as line terminators.
FactSnippet No. 518,324 |
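Python's str.splitlines is one example of a routine that treats several of these characters as line boundaries, including LINE SEPARATOR (U+2028), PARAGRAPH SEPARATOR (U+2029), and NEXT LINE (U+0085) alongside the familiar LF, CR, and CR LF. The sample string is invented for illustration, and the set Python recognizes is its own choice rather than a definition taken from the standard.

```python
# One sample string mixing platform-specific and Unicode-specific terminators.
sample = "one\ntwo\r\nthree\u2028four\u2029five\x85six"

lines = sample.splitlines()
print(lines)

# By contrast, splitting on "\n" alone leaves the other terminators embedded.
naive = sample.split("\n")
print(len(lines), len(naive))
```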
By defining these line terminators, Unicode provides a way around the historical platform-dependent solutions.
FactSnippet No. 518,325 |
Nonetheless, few if any Unicode solutions have adopted these Unicode line and paragraph separators as the sole canonical line ending characters.
FactSnippet No. 518,326 |
Unicode has been criticized for failing to separately encode older and alternative forms of kanji which, critics argue, complicates the processing of ancient Japanese and uncommon Japanese names.
FactSnippet No. 518,327 |
Unicode was designed to provide code-point-by-code-point round-trip format conversion to and from any preexisting character encodings, so that text files in older character sets can be converted to Unicode and back again to yield the same file, without employing context-dependent interpretation.
FactSnippet No. 518,328 |
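The round-trip property can be checked for a given legacy codec with a few lines of Python. This is only a sketch: the function name round_trips is hypothetical, and ISO 8859-1 (latin-1) and Shift JIS are picked as sample legacy encodings, not taken from the text above.

```python
def round_trips(data: bytes, encoding: str) -> bool:
    """Decode legacy bytes to Unicode, re-encode, and compare with the input."""
    return data.decode(encoding).encode(encoding) == data

# Every possible byte value in ISO 8859-1 survives the trip through Unicode.
all_bytes = bytes(range(256))
print(round_trips(all_bytes, "latin-1"))

# A short Japanese sample stored in Shift JIS.
sjis_sample = "\u65e5\u672c\u8a9e".encode("shift_jis")
print(round_trips(sjis_sample, "shift_jis"))
```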
As a result, inconsistent legacy architectures, such as combining diacritics and precomposed characters, both exist in Unicode, giving more than one way to represent some text.
FactSnippet No. 518,329 |
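For instance, the letter e with an acute accent can be written either as the single precomposed code point U+00E9 or as a plain e followed by the combining acute accent U+0301. The sketch below shows that the two spellings compare unequal as raw strings and become equal after normalization with the standard unicodedata module.

```python
import unicodedata

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
combining = "e\u0301"    # "e" followed by COMBINING ACUTE ACCENT

# Same text to a reader, different code point sequences to a program.
print(precomposed == combining, len(precomposed), len(combining))

# Normalizing both to one form (NFC composes, NFD decomposes) makes them comparable.
nfc_equal = unicodedata.normalize("NFC", combining) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == combining
print(nfc_equal, nfd_equal)
```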
Injective mappings must be provided between characters in existing legacy character sets and characters in Unicode to facilitate conversion to Unicode and allow interoperability with legacy software.
FactSnippet No. 518,330 |
No new ligatures will be encoded in Unicode, in part because the set of ligatures is font-dependent, and Unicode is an encoding independent of font variations.
FactSnippet No. 518,331 |
Unicode has a large number of homoglyphs, many of which look very similar or identical to ASCII letters.
FactSnippet No. 518,332 |
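A small example makes the homoglyph problem concrete: Latin "a" (U+0061) and Cyrillic "а" (U+0430) usually render identically but are different code points. The helper script_hints below is hypothetical and deliberately crude; it uses the first word of each character's formal name as a stand-in for its script, which is a heuristic for illustration and not a real script-detection method.

```python
import unicodedata

def script_hints(s: str) -> set[str]:
    """Collect the first word of each character's name as a rough script hint."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in s}

genuine = "paypal"
spoofed = "p\u0430yp\u0430l"   # both "a" letters replaced by CYRILLIC SMALL LETTER A

print(genuine == spoofed)       # the strings differ despite looking alike
print(script_hints(genuine))
print(script_hints(spoofed))    # mixing scripts is a warning sign
```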