Tuesday, December 13, 2011

CLDR v21 Milestone 2 available for testing

Milestone releases of CLDR provide an opportunity to test a snapshot of the next version of CLDR; they are not intended for use in production. CLDR v21 is not a data submission release; instead, the CLDR group is focused on improving tools and making specific changes to the data.

Note that the CLDR v21 release is intended to support Unicode 6.1, and depends on some new Unicode 6.1 property values for grapheme break and line break. This Milestone 2 release depends on values from the beta versions of Unicode 6.1 data files.

New additions in this Milestone 2 release include:
  • Changes to the segmentation data to match Unicode 6.1. The behaviors associated with the former "th" (Thai) grapheme break tailoring and "he" (Hebrew) line break tailoring have been moved into the root behavior, so those tailorings are no longer necessary and have been deleted.
  • Two new calendar element structures needed for support of the Chinese lunar calendar (and other calendars such as the Hindu lunar calendars); for more information see http://cldr.unicode.org/development/development-process/design-proposals/chinese-calendar-support:
    • Addition of the <monthPatterns> element structure to indicate how to modify standard month names to mark intercalary leap months, as well as (for some calendars) months adjacent to leap months and combined months. This is supported via the standard month pattern characters 'M' and 'L', so the pattern character 'l' (SMALL LETTER L) formerly provided as a way to mark leap months has been deprecated (it was never supported by underlying data).
    • Addition of the <cyclicNameSets> element structure to support cyclic names for years (and other calendar entities in some calendars).
  • A new "ar_001" locale for Modern Standard Arabic as the default content for "ar". This will permit the "ar_EG" locale (formerly the default content for "ar") to use some Egypt-specific names.
  • Addition of codes for South Sudan.
  • Other specific data fixes, such as those for Ukrainian collation, Ewe day periods, various metazones, and some specific translation errors.
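
To make the two new calendar structures more concrete, here is a minimal Python sketch of how a formatter might consume them. It is an illustration only, not CLDR tooling: the month names, the "{0}bis" leap pattern, the "{0}" placeholder convention, the 60-entry year name list, and the helper names format_month and cyclic_year_name are all assumptions made up for this example. Real values come from the <monthPatterns> and <cyclicNameSets> data for each locale and calendar.

```python
# Illustrative sketch only: all data below is hypothetical, not copied from CLDR.

# Standard month names for an imaginary lunar calendar (assumed placeholders).
MONTH_NAMES = [f"Month{i}" for i in range(1, 13)]

# Assumed month pattern: "{0}" stands for the standard month name and the
# surrounding text marks the month as an intercalary leap month.
MONTH_PATTERNS = {"leap": "{0}bis"}

# Assumed cyclic name set for years: 60 placeholder names that repeat.
CYCLIC_YEAR_NAMES = [f"Year{i}" for i in range(1, 61)]


def format_month(month_number: int, is_leap: bool = False) -> str:
    """Return the month name, wrapped in the leap pattern when is_leap is set."""
    name = MONTH_NAMES[month_number - 1]
    if is_leap:
        return MONTH_PATTERNS["leap"].replace("{0}", name)
    return name


def cyclic_year_name(year_number: int) -> str:
    """Map a 1-based year count onto the repeating cycle of year names."""
    index = (year_number - 1) % len(CYCLIC_YEAR_NAMES)
    return CYCLIC_YEAR_NAMES[index]


if __name__ == "__main__":
    print(format_month(4))                 # ordinary fourth month
    print(format_month(4, is_leap=True))   # leap month that follows it
    print(cyclic_year_name(61))            # wraps around to the first name
```

The point of the sketch is that a leap month needs no separate name of its own: the pattern modifies the standard month name that the 'M' or 'L' pattern character already selects, which is why a dedicated 'l' pattern character is no longer needed.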

Highlights in the Milestone 1 release (Sept. 29) included:
  • Work in support of pending -t- extension in BCP47
  • Deprecation of 'commonlyUsed' element in timezone names
  • Removal of "whole-locale" aliases (the data for constructing them is in supplementaldata.xml)
  • First cut at incorporating European Ordering Rules (EOR)

The data is available from SVN under "tag/release-21-d02" as described in
The full list of changes in this milestone is
The current draft LDML specification is