Wednesday, March 27, 2019

Unicode CLDR Version 35 Language/Locale Data Released

Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.

Data: 70,000+ new data fields, 13,400+ revised data fields
Basic coverage: New languages at Basic coverage: Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo)
Modern coverage: Languages Somali (so) and Javanese (jv) increased coverage from Moderate to Modern
Emoji 12.0: Names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names & keywords
Collation: Collation updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation
Measurement units: 23 additional units
Date formats: Two additional flexible formats, and 20 new interval formats
Japanese calendar: In Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (which include 年), and to consistently use narrow eras in numeric date formats such as “H31/3/27”.
Region names: Many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ).
Segmentation: Enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva.
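
The Japanese calendar change in the summary above can be illustrated with a small sketch. It is not CLDR's or ICU's implementation: the `ERAS` table and the `format_japanese` function are hypothetical names introduced here, the table lists only two eras, and the output patterns are simplified stand-ins for the real locale patterns. It only shows the rule described above: non-numeric formats (those containing 年) write the first year of an era as 元年, while numeric formats use a narrow era letter and a plain number.

```python
from datetime import date

# Hypothetical, simplified era table: (start date, full era name, narrow era name).
# Real CLDR data covers many more eras; only two are listed for illustration.
ERAS = [
    (date(1989, 1, 8), "平成", "H"),    # Heisei
    (date(1926, 12, 25), "昭和", "S"),  # Showa
]


def _era_for(d):
    """Return the most recent era in ERAS that had started on or before d."""
    for start, name, narrow in ERAS:
        if d >= start:
            return start, name, narrow
    raise ValueError("date precedes the eras covered by this sketch")


def format_japanese(d, numeric=False):
    """Format a Gregorian date with Japanese era year numbering (sketch only)."""
    start, name, narrow = _era_for(d)
    era_year = d.year - start.year + 1
    if numeric:
        # Numeric formats: narrow era plus plain numbers, e.g. "H31/3/27".
        return f"{narrow}{era_year}/{d.month}/{d.day}"
    # Non-numeric formats (containing 年): year 1 is written as Gannen (元年).
    year_text = "元" if era_year == 1 else str(era_year)
    return f"{name}{year_text}年{d.month}月{d.day}日"
```

Under these assumptions, `format_japanese(date(2019, 3, 27), numeric=True)` gives `H31/3/27`, and `format_japanese(date(1989, 1, 8))` gives `平成元年1月8日`. Production code should rely on a CLDR-backed library rather than a hand-written era table.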

A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.

For details, see Detailed Specification Changes, Detailed Structure Changes, and Detailed Data Changes.
