Wednesday, February 23, 2022

Unicode CLDR v41 Alpha available for testing

The Unicode CLDR v41 Alpha is now available for testing. The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:
  • Mar 09 — Beta (data)
  • Mar 23 — Beta2 (spec)
  • Apr 06 — Release

CLDR v41 is a limited-submission release. Most of the work was on tooling, and the only data updates were the specified ones, namely Phase 3 of the grammatical units of measurement project. As a result, the grammar data required for the Modern coverage level increased: 40 locales added an average of 4% new data each, and Ukrainian grew the most, by 15.6%.

The tooling changes are targeted at the v42 general-submission release. They include a number of new features and improvements, such as progress meter widgets in the Survey Tool.

Finally, the Basic level has been modified to make it easier both to onboard new languages and for implementations to filter locale data based on coverage levels.

The following table shows the number of Languages/Locales in this version. (See the v41 Locale Coverage table for more information.)

Level      Languages  Locales  Notes
Modern            89      361  Suitable for full UI internationalization.
Moderate          13       32  Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Basic             22       21  Suitable for locale selection, such as choice of language in mobile phone settings.
Total            124      414  Total of all languages/locales with ≥ Basic coverage.

Beyond the member organizations of the Unicode Consortium, many dedicated communities and individuals regularly contribute to updating their locales, including:
  • Modern: Cherokee, Cantonese, Scottish Gaelic, Sorbian (Lower), Sorbian (Upper)
  • Moderate: Asturian [nearly Modern], Breton, Faroese, Fulah (Adlam), Kaingang, Nheengatu, Quechua, Sardinian
  • Basic: Bosnian (Cyrillic), Interlingua, Kabuverdianu, Māori, Romansh, Tajik, Tatar, Tongan, Uzbek (Cyrillic), Wolof

Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.
