Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for internationalization and localization, adapting software to the conventions of different languages.
CLDR v38 includes:
- Enhancements to existing locale data: adding support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for Unicode symbols that are non-emoji (~400), and annotations for Emoji v13.1.
- Survey Tool upgrades: substantial performance improvements, plus structured forum entries to improve coordination among translators.
- To make the canonicalization of locale identifiers clear and unambiguous, substantially restructured the specification for it. (This was done in concert with fixes to the alias data so that the data works better with the specification.)
- To support inflected units of measurement:
  - minimalPairs adds new elements caseMinimalPairs and genderMinimalPairs
  - unit adds a new element gender
  - grammaticalData adds new elements grammaticalDerivations, deriveCompound, and deriveComponent
  - unitPattern adds a new attribute case
  - grammaticalCase, grammaticalGender, grammaticalDefiniteness add a new attribute scope
  - compoundUnitPattern1 adds new attributes case and gender
  - compoundUnitPattern adds a new attribute case
- To allow for overriding dictionary-based segmentation breaks, added the Unicode Dictionary Break Exclusion Identifier, with the new key “dx”.
- For picking the correct units of measurement for locales, defined the userPreferences skeleton more precisely.
- For accurate plural categories in compact numbers, added the 'c' operand to plural rules to provide formatting for languages such as French.
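As a rough illustration of the 'c' operand item above, the following Python sketch shows how a compact exponent can change the plural category chosen for a number. The function names `plural_operands` and `french_like_category` are hypothetical, and the rule is a simplified paraphrase modeled on the French "many" category, not text copied from the CLDR data. It also assumes that the integer and fraction operands are derived from the expanded value (for example, "1.2" with exponent 6 is treated as 1,200,000).

```python
from decimal import Decimal


def plural_operands(number: str, c: int = 0) -> dict:
    """Compute a few plural operands for a decimal string (hypothetical helper).

    `c` is the compact decimal exponent: for a value shown as "1.2M" it is 6.
    Assumption: the other operands come from the expanded value, 1.2 * 10**6.
    """
    value = abs(Decimal(number)).scaleb(c)
    exponent = value.as_tuple().exponent
    return {
        "i": int(value),         # integer part of the expanded value
        "v": max(-exponent, 0),  # number of visible fraction digits
        "c": c,                  # compact decimal exponent
    }


def french_like_category(number: str, c: int = 0) -> str:
    """Pick a plural category with an illustrative, French-like rule set."""
    ops = plural_operands(number, c)
    i, v = ops["i"], ops["v"]
    if i in (0, 1):
        return "one"
    # "many": exact multiples of a million written in full,
    # or any compact form whose exponent is outside 0..5.
    if (c == 0 and i != 0 and i % 1_000_000 == 0 and v == 0) or not (0 <= c <= 5):
        return "many"
    return "other"


if __name__ == "__main__":
    for number, c in [("1", 0), ("2", 0), ("1.2", 3), ("1.2", 6), ("1000000", 0)]:
        print(number, c, french_like_category(number, c))
```

In this sketch, a value formatted compactly in millions lands in a different category than the same digits formatted in thousands, which is the distinction the new operand is meant to expose to plural rules.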
The overall changes to the data items were:
| Added | Deleted | Changed | Total |
| 155,131 | 33,805 | 45,895 | 2,175,821 |
Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.