Thursday, April 23, 2020

Unicode Locale Data v37 released!

The final version of Unicode CLDR version 37 is now available. It focuses on adding new locales, enhancing support for units of measurement, adding annotations (names and search keywords) for symbols, and adding annotations for Emoji v13.


Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and accurately convert input measurement into those units.

Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added. The collation sequences are updated for new Unicode 13.0, and for emoji.

Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

New locales. New languages at Basic coverage: Fulah (Adlam), Maithili, Manipuri, Santali, Sindhi (Devanagari), Sundanese. New languages at Modern coverage: Nigerian Pidgin. See Locale Coverage Data for the coverage per locale, for both new and old locales.

Grammatical features added. Grammatical features are added for many languages, a first step to allowing programmers to format units according to grammatical context (eg, the dative version of "3 kilometers").

Updates to code sets. In particular, the EU is updated (removing GB).

For more details, access to the data and charts, and important notes for smoothly migrating implementations, see Unicode CLDR Version 37.