Friday, March 6, 2020

Unicode Locale Data v37α available for testing

The alpha version of Unicode CLDR version 37 is now available for testing. The beta v37 will contain updates to the LDML spec and is planned for March 25, and the release of v37 is planned for April 22.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

v37 is an update release with a focus on units and annotations (emoji and symbol names and search keywords).

Expanded locale preferences for units of measurement. The new unit preference and conversion data allows formatting functions to pick the right measurement units for the locale and usage, and to convert input measurements into those units. See additional details in Specification Changes.
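As a rough illustration of how a formatting function might use such data, here is a minimal sketch. The tables `TO_METERS` and `PREFERENCES` and the helper names are hypothetical and hand-written for this example; they are not the actual CLDR v37 data or any library's API, and the real preference data is considerably richer.

```python
# Hypothetical sample data, written by hand for illustration only.
# The real CLDR data covers many more units, usages, and regions.
TO_METERS = {"meter": 1.0, "kilometer": 1000.0, "mile": 1609.344}

# (usage, region) -> preferred unit; "001" stands in for the world default.
PREFERENCES = {
    ("road", "US"): "mile",
    ("road", "GB"): "mile",
    ("road", "001"): "kilometer",
}


def preferred_unit(usage, region):
    """Look up the preferred unit for a usage and region, falling back to the world default."""
    return PREFERENCES.get((usage, region), PREFERENCES[(usage, "001")])


def convert_for_locale(value, unit, usage, region):
    """Convert an input measurement into the unit preferred for the usage and region."""
    target = preferred_unit(usage, region)
    meters = value * TO_METERS[unit]          # normalize to the base unit
    return meters / TO_METERS[target], target  # then express in the target unit


if __name__ == "__main__":
    print(convert_for_locale(5000, "meter", "road", "DE"))
    print(convert_for_locale(5000, "meter", "road", "US"))
```

The lookup is separate from the conversion so that a formatter could choose a unit first and only then convert and format the number.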

Emoji 13.0. The emoji annotations (names and search keywords) for the new Unicode 13.0 emoji are added. The collation sequences are updated for Unicode 13.0 and for emoji.

Annotations (names and keywords) expanded to cover more than emoji. This release includes a small set of Unicode symbols (arrow, math, punctuation, currency, alphanum, and geometric) with more to be added in future releases. For example, see v37/annotations/romance.html.

Nine new locales added. Caddo [cad], Hindi in Latin script [hi_Latn], Kashmiri in Devanagari script [ks_Deva], Maithili [mai], Manipuri (Meitei Mayek) [mni_Mtei], Nigerian Pidgin [pcm], Santali [sat], Santali (Devanagari) [sat_Deva], and Sindhi (Devanagari) [sd_Deva]. See Locale Coverage Data for the coverage per locale, for both new and old locales.

Grammatical features added. Grammatical features are added for many languages, a first step toward allowing programmers to format units according to grammatical context (e.g., the dative version of "3 kilometers").
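To make the goal concrete, the following sketch shows what case-aware unit formatting could look like once such data is available. The `UNIT_PATTERNS` table and the `format_unit` function are hypothetical; the German forms are hand-written for illustration and are not taken from the CLDR v37 data, which in this release lists only the grammatical features themselves.

```python
# Hypothetical patterns, keyed by (locale, unit, grammatical case).
# Only the plural form is covered here, to keep the sketch short.
UNIT_PATTERNS = {
    ("de", "kilometer", "nominative"): "{0} Kilometer",
    ("de", "kilometer", "dative"): "{0} Kilometern",
}


def format_unit(value, unit, locale, case="nominative"):
    """Format a value with the unit form matching the requested grammatical case."""
    # Fall back to the nominative pattern when no case-specific form is known.
    pattern = UNIT_PATTERNS.get(
        (locale, unit, case), UNIT_PATTERNS[(locale, unit, "nominative")]
    )
    return pattern.format(value)


if __name__ == "__main__":
    # "nach 3 Kilometern" ("after 3 kilometers") requires the dative form.
    print("nach " + format_unit(3, "kilometer", "de", case="dative"))
```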

Updates to code sets. In particular, the EU is updated (removing GB).

For more details and important notes for smoothly migrating implementations, see the draft release note Unicode CLDR Version 37. For access to the data, see the GitHub tag: release-37-alpha2.
