Thursday, October 28, 2021

Unicode CLDR v40 now available!

Unicode CLDR version 40 is now available, with approximately 140,000 new or modified data fields.

In this release, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without them, a formatted phrase can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case.
  • Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv) for all units of measurement.
  • Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
  • Phase 3 (v41) will further expand the units.
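
As a rough illustration of why case matters for units, the sketch below picks a unit pattern by plural category and grammatical case. The table `UNIT_PATTERNS`, the functions `format_unit` and `in_duration`, and the simplified plural rule are all hypothetical and hand-written for this example; they are not the CLDR data format or any library's API.

```python
# Hypothetical pattern table, written by hand for illustration only.
# It is NOT copied from CLDR data files and does not mirror their structure.
UNIT_PATTERNS = {
    ("de", "duration-day"): {
        ("one", "nominative"): "{0} Tag",
        ("other", "nominative"): "{0} Tage",
        ("one", "dative"): "{0} Tag",
        ("other", "dative"): "{0} Tagen",
    }
}


def plural_category(n):
    # Simplified assumption: integers only, "one" for 1, otherwise "other".
    return "one" if n == 1 else "other"


def format_unit(locale, unit, n, case="nominative"):
    """Format a quantity, choosing the pattern by plural category and case."""
    patterns = UNIT_PATTERNS[(locale, unit)]
    category = plural_category(n)
    # Fall back to the nominative pattern when the requested case is missing.
    pattern = patterns.get((category, case), patterns[(category, "nominative")])
    return pattern.format(n)


def in_duration(locale, unit, n):
    # Assumption for this sketch: the German preposition "in" with a time
    # span takes the dative case, so the unit must be inflected accordingly.
    return "in " + format_unit(locale, unit, n, case="dative")


print(format_unit("de", "duration-day", 3))  # nominative form
print(in_duration("de", "duration-day", 3))  # dative form after "in"
```

Without the case-specific pattern, the message would reuse the nominative form "3 Tage" after "in", which is the kind of ungrammatical output the new data is meant to prevent.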

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.
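
A keyboard could build type-ahead on such data roughly as sketched below. The dictionary `EMOJI_ANNOTATIONS` and the function `suggest` are hypothetical, and the keywords shown are illustrative stand-ins rather than the actual CLDR annotations.

```python
# Hypothetical annotation table for three Emoji v14 characters.
# The keywords are illustrative stand-ins, not the actual CLDR data.
EMOJI_ANNOTATIONS = {
    "\U0001FAE0": {"name": "melting face",
                   "keywords": ["disappear", "dissolve", "liquid", "melt"]},
    "\U0001FAE1": {"name": "saluting face",
                   "keywords": ["ok", "salute", "troops", "yes"]},
    "\U0001FAE7": {"name": "bubbles",
                   "keywords": ["burp", "clean", "soap", "underwater"]},
}


def suggest(prefix):
    """Return emoji whose name words or keywords start with the typed prefix."""
    typed = prefix.casefold()
    if not typed:
        return []
    hits = []
    for emoji, entry in EMOJI_ANNOTATIONS.items():
        words = entry["name"].split() + entry["keywords"]
        if any(word.casefold().startswith(typed) for word in words):
            hits.append(emoji)
    return hits


print(suggest("mel"))
print(suggest("s"))
```

A linear scan is enough for a sketch; a real keyboard would more likely precompute a prefix index over the full annotation set.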

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.

Please see the CLDR v40 Release Note for details.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.