CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.
Data | 70,000+ new data fields, 13,400+ revised data fields |
Basic coverage | New languages at Basic coverage: Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo) |
Modern coverage | Somali (so) and Javanese (jv) increased from Moderate to Modern coverage |
Emoji 12.0 | Names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names and keywords |
Collation | Collation updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation |
Measurement units | 23 additional units |
Date formats | Two additional flexible formats, and 20 new interval formats |
Japanese calendar | In the Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (those that include 年), and to consistently use narrow eras in numeric date formats such as “H31/3/27”. |
Region Names | Many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ). |
Segmentation | Enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva. |
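The Japanese calendar change can be illustrated with a minimal Python sketch. It does not use CLDR data or any CLDR/ICU API: the `ERAS` table and the helper names `era_year`, `format_long`, and `format_numeric` are hypothetical, and the output patterns are simplified assumptions rather than the actual CLDR patterns. It only shows the rule described above: the first year of an era is written 元年 in non-numeric formats, while numeric formats keep a digit and use the narrow era letter.

```python
from datetime import date

# Hypothetical era table for illustration only (not read from CLDR):
# (Gregorian start date, full era name, narrow era abbreviation), newest first.
ERAS = [
    (date(1989, 1, 8), "平成", "H"),
    (date(1926, 12, 25), "昭和", "S"),
]


def era_year(d):
    """Return (era name, narrow era, year within era) for a Gregorian date."""
    for start, name, narrow in ERAS:
        if d >= start:
            return name, narrow, d.year - start.year + 1
    raise ValueError("date precedes the eras covered by this sketch")


def format_long(d):
    """Non-numeric format (contains 年): year 1 is written as Gannen (元年)."""
    name, _, year = era_year(d)
    year_text = "元" if year == 1 else str(year)
    return f"{name}{year_text}年{d.month}月{d.day}日"


def format_numeric(d):
    """Numeric format: narrow era letter plus a numeric year, even for year 1."""
    _, narrow, year = era_year(d)
    return f"{narrow}{year}/{d.month}/{d.day}"


if __name__ == "__main__":
    print(format_long(date(1989, 1, 8)))      # first year of Heisei
    print(format_long(date(2019, 3, 27)))
    print(format_numeric(date(2019, 3, 27)))
```

In practice these strings come from the locale data through a CLDR-based library; the sketch only makes the year-numbering rule concrete.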
A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.
For details, see Detailed Specification Changes, Detailed Structure Changes, and Detailed Data Changes.