Wednesday, March 11, 2020

ICU 66 Released

[ICU logo] Unicode® ICU 66 has been released. It updates to Unicode 13, including new characters, scripts, emoji, and corresponding API constants. It also updates to CLDR 36.1 with Unicode 13 updates and bug fixes.

These new, extra Q1 releases are for integration by vendors that could not otherwise release their products with the newest version of Unicode. They are low-impact releases with no other significant feature additions or implementation changes. The next feature releases will be CLDR 37 and ICU 67, scheduled for April 2020.

For details please see http://site.icu-project.org/download/66.

Tuesday, March 10, 2020

Announcing The Unicode® Standard, Version 13.0

[chart image] Version 13.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 5,930 characters, for a total of 143,859 characters. These additions include four new scripts, for a total of 154 scripts, as well as 55 new emoji characters.

The new scripts and characters in Version 13.0 add support for modern language groups in Africa, Pakistan, South Asia, and China:
  • Arabic script additions to write Hausa, Wolof, and other languages in Africa, and other additions to write Hindko and Punjabi in Pakistan
  • A character for Syloti Nagri in South Asia
  • Bopomofo additions for Cantonese
Support for scholarly work was extended worldwide, including:
  • Yezidi, historically used in Iraq and Georgia for liturgical purposes, with some modern revival of usage
  • Chorasmian, historically used in Central Asia across Uzbekistan, Kazakhstan, and Turkmenistan to write an extinct Eastern Iranian language
  • Dives Akuru, historically used in the Maldives until the 20th century
  • Khitan Small Script, historically used in northern China
Popular symbol additions include:
  • 55 emoji characters, including several new emoji for smileys, gender neutral people, animals, and the potted plant. For the full list of new emoji characters, see emoji additions for Unicode 13.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.
  • Six Creative Commons license symbols that are used to describe functions, permissions, and concepts related to intellectual property that have widespread use on the web
  • Two Vietnamese reading marks that mark ideographs as having a distinct, colloquial reading
  • 214 graphic characters that provide compatibility with various home computers from the mid-1970s to the mid-1980s and with early teletext broadcasting standards
Support for Chinese, Japanese, and Korean (CJK) unified ideographs was enhanced in Version 13.0 by the addition of 4,939 characters in Extension G, which is the first block to be encoded in Plane 3, as well as by significant corrections and improvements to the Unihan database. Changes to Unihan include updated regular expressions for many properties, the addition of several new properties, and the removal of three obsolete provisional properties. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.
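
The Plane 3 claim and the Extension G count can be checked with a little arithmetic. The sketch below is only illustrative: the code point range U+30000..U+3134A is an assumption taken from the Unicode 13.0 data files rather than from this announcement, and `plane_of` is a hypothetical helper name.

```python
# Assumption: Extension G ideographs occupy U+30000..U+3134A in Unicode 13.0.
# This range comes from the Unicode data files, not from the announcement text.
EXT_G_FIRST = 0x30000
EXT_G_LAST = 0x3134A


def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0 to 16) that contains a code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16  # each plane holds 0x10000 code points


# Number of code points in the inclusive range.
ext_g_count = EXT_G_LAST - EXT_G_FIRST + 1

print(plane_of(EXT_G_FIRST), plane_of(EXT_G_LAST))  # 3 3
print(ext_g_count)  # 4939
```

Under that assumed range, both ends of the extension fall in Plane 3 and the count matches the 4,939 characters stated above.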

Important chart font updates include:
  • An update to the code charts for the Adlam script, now using the Ebrima font. That font has an improved design and has gained widespread acceptance in the user community.
  • A completely updated font for the CJK Radicals Supplement and the Kangxi Radicals blocks. This font is also used to show the radicals in the CJK unified ideographs code charts, as well as in the radical-stroke indexes.
Support for lesser-used languages and scholarly work was also extended, including:
  • A character used in Sinhala to write Sanskrit
Unicode properties and specifications determine the behavior of text on computers and phones. Changes in Version 13.0 include the following Unicode Standard Annexes and Technical Standards that have notable modifications:

Five important Unicode annexes updated for Version 13.0:
Three important Unicode specifications updated for Version 13.0:
The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Friday, March 6, 2020

Unicode Locale Data v37α available for testing

The alpha version of Unicode CLDR version 37 is now available for testing. The beta v37 will contain updates to the LDML spec and is planned for March 25, and the release of v37 is planned for April 22.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

v37 is an update release with focus on units and annotations (emoji and symbol names and search keywords).

Expanded locale preferences for units of measurement. The new unit preference and conversion data allow formatting functions to pick the right measurement units for the locale and usage, and to convert input measurements into those units. See additional details in Specification Changes.
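
As a rough illustration of the idea, the sketch below picks a preferred unit for a region and usage, then converts the input into it. Everything here is hypothetical: the tables `UNIT_PREFERENCES` and `TO_METERS`, the function names, and the sample entries are stand-ins for illustration and do not reflect the actual CLDR data format or any ICU API.

```python
# Hypothetical miniature of unit preference data: (usage, region) -> unit.
# Real CLDR data is much larger and is structured differently.
UNIT_PREFERENCES = {
    ("road", "US"): "mile",
    ("road", "001"): "kilometer",  # "001" (world) serves as the fallback
}

# Hypothetical conversion table: factor from each unit to the base unit (meter).
TO_METERS = {"meter": 1.0, "kilometer": 1000.0, "mile": 1609.344}


def preferred_unit(usage: str, region: str) -> str:
    """Look up the unit for a usage and region, falling back to the world entry."""
    return UNIT_PREFERENCES.get((usage, region), UNIT_PREFERENCES[(usage, "001")])


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert by going through the base unit (meter)."""
    return value * TO_METERS[from_unit] / TO_METERS[to_unit]


def localize_distance(value: float, unit: str, region: str, usage: str = "road"):
    """Return (converted value, unit name) suited to the region and usage."""
    target = preferred_unit(usage, region)
    return convert(value, unit, target), target


amount, unit = localize_distance(10, "kilometer", "US")
print(f"{amount:.1f} {unit}")  # 6.2 mile
print(localize_distance(10, "kilometer", "DE"))  # (10.0, 'kilometer')
```

Routing every conversion through one base unit keeps the table linear in the number of units, at the cost of not handling offsets such as temperature scales.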

Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added. The collation sequences are updated for Unicode 13.0 and for emoji.

Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

9 New locales added. Caddo [cad], Hindi in Latin script [hi_Latn], Kashmiri in Devanagari script [ks_Deva], Maithili [mai], Manipuri (Meitei Mayek) [mni_Mtei], Nigerian Pidgin [pcm], Santali [sat], Santali (Devanagari) [sat_Deva], and Sindhi (Devanagari) [sd_Deva]. See Locale Coverage Data for the coverage per locale, for both new and old locales.

Grammatical features added. Grammatical features are added for many languages, a first step toward allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").

Updates to code sets. In particular, the EU is updated (removing GB).

For more details and important notes for smoothly migrating implementations, see the draft release note Unicode CLDR Version 37. For access to the data, see the GitHub tag: release-37-alpha2.


Over 130,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Tuesday, February 18, 2020

Unicode Consortium Announces Version 13.0 Cover Design

The Unicode Consortium is pleased to announce the new design selected for the cover of the forthcoming print-on-demand publication of The Unicode Standard, Version 13.0. The Unicode Consortium issued an open call for artists and designers to submit cover design proposals. All submitted designs were reviewed by an independent panel.

[Unicode 13.0 Book Cover Concept]
The selected cover artwork is an original design by Huijun Shan, an award-winning senior UI designer, who has a B.S. in Communication Engineering from Nanjing University. The design was inspired by building blocks for children using letters put together with a scientific color scheme.

Two runner-up designs by Du Lilyu and Saagar Setu were also selected. Lilyu's design cleverly incorporates the Unicode logo into the version number, while Setu's design signifies the endless running for the next Unicode release. Du Lilyu is a graphic designer in China, while Saagar Setu is a Unicode enthusiast based in Ahmedabad, India.

Du Lilyu:
[art by Du Lilyu]
Saagar Setu:
[art by Saagar Setu]


Wednesday, January 29, 2020

Unicode Emoji 13.0 — Now final for 2020

Emoji 13.0 is now final, with 62 new emoji such as:

  • Smiling face with tear
  • Polar bear
  • Bubble tea
  • Pickup truck
  • Fondue
  • Teapot
  • Piñata
  • Transgender flag
There are also 55 gender and skin-tone variants, including new gender-inclusive emoji. See the seven cases in boxes below:
[gender inclusive images]
The new emoji are listed in Emoji Recently Added v13.0, with sample images. These images are just samples: vendors for mobile phones, PCs, and web platforms will typically use different images. In particular, the Emoji Ordering v13.0 chart shows how the new emoji sort compared to the others, with new emoji marked with rounded rectangles. The other Emoji Charts for Version 13.0 have been updated to show the emoji.

The new emoji typically start showing up on mobile phones in September/October — some platforms may release them earlier. The new emoji will soon be available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

For implementers:
  1. The Emoji 13.0 test file (emoji-test.txt) provides data for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 13.0, scheduled for March 10.
  2. The emoji specification (UTS #51) will have additional guidelines on gender and skin tone, and other clarifications. The definitions in UTS #51 and data files have been enhanced to be more consistent and useful. The final text will be available on March 10.
  3. The CLDR names and search keywords for the new emoji in over 80 languages, and the sort order for emoji, will be finalized by mid-April with the release of CLDR v37.
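
For implementers who want to start reading the test file, a minimal parsing sketch follows. It assumes data lines of the form `code points ; status # comment`, with `#` at the start of a line marking comments and group headers. The `SAMPLE` text is a hand-written, abbreviated stand-in rather than an excerpt from the real file, and `parse_emoji_test` is a hypothetical helper name.

```python
# Hand-written stand-in for a few lines of emoji-test.txt (abbreviated layout).
SAMPLE = """\
# group: Smileys & Emotion
1F972 ; fully-qualified # smiling face with tear

# group: Animals & Nature
1F43B 200D 2744 FE0F ; fully-qualified # polar bear
1F43B 200D 2744 ; minimally-qualified # polar bear
"""


def parse_emoji_test(lines):
    """Yield (code_points, status, comment) for each data line."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and group headers
        data, _, comment = line.partition("#")
        fields = data.split(";")
        if len(fields) != 2:
            continue  # not a "code points ; status" line
        code_points = tuple(int(cp, 16) for cp in fields[0].split())
        yield code_points, fields[1].strip(), comment.strip()


entries = list(parse_emoji_test(SAMPLE.splitlines()))

# Build the strings a font or keyboard would need for fully-qualified entries.
fully_qualified = [
    "".join(chr(cp) for cp in cps)
    for cps, status, _ in entries
    if status == "fully-qualified"
]

print(len(entries), len(fully_qualified))  # 3 2
```

The comment field is kept as raw text here, since its internal layout may differ between versions of the file.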
