Friday, April 24, 2020

ICU 67 Released

Unicode® ICU 67 has just been released. ICU 67 updates to CLDR 37 locale data with many additions and corrections. This release also includes the updates to Unicode 13, subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes many bug fixes for date and number formatting, including enhanced support for user preferences in the locale identifier. The LocaleMatcher code and data are improved, and number skeletons have a new “concise” form that can be used in MessageFormat strings.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see

Thursday, April 23, 2020

Unicode Locale Data v37 released!

The final version of Unicode CLDR version 37 is now available. It focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and accurately convert input measurement into those units.
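As a rough illustration of the idea, the sketch below picks a unit by usage and region and then converts an input given in meters. The preference table, the names `PREFERENCES` and `METERS_PER_UNIT`, and the helper functions are hypothetical stand-ins written for this example; they are not CLDR data or an ICU API. Only the conversion factor for the international mile (1609.344 m) is a fixed definition.

```python
# Hypothetical, hand-written stand-in for unit preference data.
# Keys are (usage, region); "001" is used here as the world default.
PREFERENCES = {
    ("road", "US"): "mile",
    ("road", "GB"): "mile",
    ("road", "001"): "kilometer",
}

# Conversion factors from each unit to the base unit (meter).
METERS_PER_UNIT = {
    "kilometer": 1000.0,
    "mile": 1609.344,  # international mile, exact by definition
}


def preferred_unit(usage: str, region: str) -> str:
    """Return the unit for this usage and region, falling back to the world default."""
    return PREFERENCES.get((usage, region), PREFERENCES[(usage, "001")])


def format_distance(meters: float, usage: str, region: str) -> str:
    """Convert a distance in meters to the preferred unit and format it."""
    unit = preferred_unit(usage, region)
    value = meters / METERS_PER_UNIT[unit]
    return f"{value:.1f} {unit}"
```

With this toy table, `format_distance(16093.44, "road", "US")` yields a value in miles, while a region with no entry of its own, such as `"DE"`, falls back to kilometers. A real implementation would read both tables from the CLDR data instead of hard-coding them.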

Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added. The collation sequences are updated for new Unicode 13.0, and for emoji.

Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

New locales. New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese. New languages at Modern coverage: Nigerian Pidgin. See Locale Coverage Data for the coverage per locale, for both new and old locales.

Grammatical features added. Grammatical features are added for many languages, a first step to allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").

Updates to code sets. In particular, the EU is updated (removing GB).

For more details, access to the data and charts, and important notes for smoothly migrating implementations, see Unicode CLDR Version 37.

Friday, April 10, 2020

Technical Alert: Unicode Technical Website Down

TECHNICAL ALERT: the Unicode Consortium's technical website is hosted in a data center that has experienced a catastrophic failure. We are working to get back online, but this may take a couple of weeks. We apologize for the inconvenience. BTW: this failure occurred after we announced we are delaying the release of Unicode 14.0.

Wednesday, April 8, 2020

Unicode 14.0 Delayed for 6 Months

Due to COVID-19, the Unicode Consortium has decided to postpone the release of version 14.0 of the Unicode Standard by 6 months, from March to September of 2021. This delay will also impact related specifications and data, such as new emoji characters.

The Unicode Consortium relies heavily on the efforts of volunteers. “Under the current circumstances we’ve heard that our contributors have a lot on their plates at the moment and decided it was in the best interests of our volunteers and the organizations that depend on the standard to push out our release date,” said Mark Davis, President of the Consortium. “This year we simply can’t commit to the same schedule we’ve adhered to in the past.”

ICU and CLDR to stay on schedule

The two other main Unicode projects, ICU and CLDR, are maintaining their 6-month cycles for releases in the spring and fall, although the feature sets this year may be lighter. The CLDR project supplies language- and locale-specific data and specifications, while the ICU project supplies internationalization code libraries that allow operating systems and applications to use Unicode and CLDR data and specifications. These projects are impacted less by current conditions since they have always operated via virtual meetings and are more compartmentalized, meaning that it is easier to withhold a particular feature if it falls behind schedule without jeopardizing the whole release. Sub-projects of CLDR and ICU, such as the CLDR Message Formatting project, will also be little affected.


This announcement does not affect the new emoji included in Unicode Standard version 13.0 announced on March 10, 2020.

Because of the lead time for developers to incorporate emoji into mobile phones, emoji that are finalized in January don’t appear on phones until the following September or so. For example, the emoji that were included in Release 13.0 in March 2020 won’t generally be on phones until the fall of 2020. With the delay of the release of Unicode 14.0, the deadline for submission of new emoji character proposals for Emoji 14.0 is also being postponed until September 2020.

The Consortium is considering whether it is feasible to release emoji sequences in an Emoji 13.1 release. These sequences make use of existing characters. An example from Emoji 13.0 is the black cat, which is internally a combination of the cat emoji and the black large square emoji, joined by a zero width joiner (ZWJ). Since sequences rely only on combinations of existing characters in the Unicode Standard, they can be implemented on a separate schedule, and don’t require a new version of Unicode or the encoding of new characters. Such an Emoji 13.1 release would be in time for release on mobile phones in 2021.
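To make the black cat example concrete, here is a minimal sketch that builds the sequence from its code points and lists what it contains. The variable names and the `describe` helper are made up for this illustration. The sequence itself is U+1F408 CAT, U+200D ZERO WIDTH JOINER, U+2B1B BLACK LARGE SQUARE.

```python
import unicodedata

CAT = "\U0001F408"             # U+1F408 CAT
ZWJ = "\u200D"                 # U+200D ZERO WIDTH JOINER
BLACK_LARGE_SQUARE = "\u2B1B"  # U+2B1B BLACK LARGE SQUARE

# The black cat emoji is a sequence of three existing characters,
# not a newly encoded character.
black_cat = CAT + ZWJ + BLACK_LARGE_SQUARE


def describe(sequence: str) -> list[str]:
    """List each code point in the sequence with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


for line in describe(black_cat):
    print(line)
```

Whether `black_cat` renders as a single glyph depends on the font and platform. Systems without support for the sequence typically show the cat and the square as separate glyphs, which is why such sequences degrade gracefully on older software.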

The Emoji Subcommittee will be accepting new emoji character proposals for Emoji 14.0 from June 15, 2020 until September 1, 2020. Any new emoji characters incorporated into Emoji 14.0 would appear on phones and other devices in 2022.

Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages