Wednesday, October 28, 2020

Unicode CLDR Language Data v38 released

The final release of Unicode CLDR version 38 is now available. Unicode CLDR provides an update to the key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 focused on enhancing support for existing locales: support for units of measurement in inflected languages (phase 1), and annotations (names and search keywords) for many more non-emoji symbols (~400) as well as for Emoji v13.1. This version also has substantially higher coverage for the following languages (in order of completeness): Norwegian Nynorsk, Hausa, Igbo, Breton, Quechua, Yoruba, Fulah (Adlam script), Chakma, Asturian, Sanskrit, and Dogri.

The Survey Tool has improved performance and now offers structured forum requests to improve coordination among translators. We would like to thank the 393 language experts who contributed to this release.

There are some changes that affect existing specifications and data. For example, the plural rules for French changed to add a new category. The specification for using aliases is more rigorous, and some alias data has changed, as has the specification for handling locale identifier canonicalization. For more information, see Migration.
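To make the French plural change concrete, here is a minimal sketch of how software might pick a plural category for a whole number. It assumes the new category is "many" and that it applies to non-zero exact multiples of one million; both details are assumptions not stated in this post, so check them against the Migration notes. The function name is hypothetical, and fractions and compact notation are deliberately left out.

```python
def french_plural_category(n: int) -> str:
    """Return an assumed CLDR-style plural category for a non-negative integer in French.

    Simplified sketch: handles whole numbers only (no fractions, no compact
    notation). The "many" rule below is an assumption about the v38 change.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    if n in (0, 1):
        # French treats both 0 and 1 as singular ("one").
        return "one"
    if n % 1_000_000 == 0:
        # Assumed new category: exact multiples of one million.
        return "many"
    return "other"


if __name__ == "__main__":
    for value in (0, 1, 2, 1_000_000, 1_500_000):
        print(value, french_plural_category(value))
```

Code that previously supplied only "one" and "other" message variants for French would need a fallback (typically to "other") when a third category is returned.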

The overall changes to the data items were:
Added      Deleted    Changed
155,131    33,805     45,895

See additional details in the CLDR v38 Release note.

Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.