Monday, February 20, 2017

Unicode Locale Data v31α available for testing

The Alpha version of Unicode CLDR version 31 is available for testing. The v31 beta will contain updates to the LDML specification and should be available on March 1, with the release of v31 planned for March 15.

CLDR 31 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Aside from the regular updates to codes and data, some of the more noticeable changes are:
  • Canonical codes
    • The subdivision codes were changed to consistently use the BCP 47 format.
    • The locales in the language-territory population data and the exemplars directory were regularized (dropping likely script subtags).
    • The timezone ID for GMT has been split from UTC.
    • There is a new mechanism for identifying hybrid locales, such as Hinglish.
  • Subdivisions
    • Names for Scotland, Wales, and England have been added in many languages.
  • Emoji 5.0
    • Short names and keywords have been updated for English.
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
  • Transforms
    • The Zawgyi→Unicode transform has been improved.
    • Tamil can now be transcribed to the International Phonetic Alphabet (IPA).
This release did not have a data-submission cycle, so the changes reflect cleanup and bug fixes. For more details, and important notes for smoothly migrating implementations, see Unicode CLDR Version 31. If you find a problem, please file a ticket.

Wednesday, January 11, 2017

New Unicode Character Property EquivalentUnifiedIdeograph

A new character property, EquivalentUnifiedIdeograph, is proposed for addition to Unicode 10.0. This property associates (where possible) the 365 characters in the CJK Radicals Supplement, Kangxi Radicals, and CJK Strokes blocks with an appropriate CJK Unified Ideograph.

For details of the proposal, a link to the proposed data, and information about how to provide feedback, please see Public Review Issue #344.

Wednesday, December 14, 2016

Adopt-A-Character Grant to Support Indic Scripts

The Adopt-a-Character program has awarded a grant to support further development of the following four Indic scripts in the Unicode Standard:
  • Hanifi Rohingya, a script in current use in Myanmar and Bangladesh
  • Nandinagari, a Brahmi-based historic script formerly used in South India
  • Old Sogdian, a group of historic scripts formerly used in Kazakhstan, Pakistan, and Western China
  • Sogdian, derived from Old Sogdian, a group of historic scripts formerly used in Central Asia
The goal of this grant is to enable the development of encoding proposals that can be included in the Unicode Standard. The work will be done by Anshuman Pandey under the direction of Deborah Anderson (SEI, UC Berkeley) and Rick McGowan (Unicode Consortium).

Friday, December 9, 2016

Proposed Update UTR #51, Unicode Emoji (Version 5.0)

A proposed update of UTR #51, Unicode Emoji (Version 5.0), is available for public review and feedback. This new version adds a mechanism to support regional flags, such as Scotland or California, though the choice of which of these flags to support is left to vendors.
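As a rough illustration of how such regional flags can be represented, the Python sketch below builds an emoji tag sequence, assuming the encoding as it was finalized in Emoji 5.0: U+1F3F4 WAVING BLACK FLAG, followed by tag characters (from the block starting at U+E0000) that spell out the lowercase CLDR subdivision code, terminated by U+E007F CANCEL TAG. The codes "gbsct" (Scotland) and "usca" (California) follow the CLDR subdivision IDs; the helper name `subdivision_flag` is hypothetical, and whether a given sequence is displayed as a flag remains up to each vendor.

```python
# Hypothetical helper: builds an emoji tag sequence for a subdivision flag,
# assuming the Emoji 5.0 tag-sequence mechanism (base flag + tags + cancel tag).

WAVING_BLACK_FLAG = "\U0001F3F4"  # base character of the sequence
CANCEL_TAG = "\U000E007F"         # terminates the tag specification
TAG_OFFSET = 0xE0000              # e.g. TAG LATIN SMALL LETTER A = U+E0061 = 0xE0000 + ord("a")


def subdivision_flag(code: str) -> str:
    """Return the tag sequence for a subdivision code such as 'gbsct'."""
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"expected an ASCII alphanumeric subdivision code, got {code!r}")
    # Map each ASCII letter or digit to its TAG counterpart.
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return WAVING_BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    for code in ("gbsct", "usca"):
        seq = subdivision_flag(code)
        # Print the code points rather than the glyph, since rendering varies.
        print(code, " ".join(f"U+{ord(c):04X}" for c in seq))
```

For "gbsct" this prints U+1F3F4 U+E0067 U+E0062 U+E0073 U+E0063 U+E0074 U+E007F. The sketch only constructs the sequence; it does not check whether a subdivision code is valid in CLDR.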

Associated charts and data files are available. This proposed update also has a separate data file for the valid emoji presentation sequences, and reflects a small change in the ordering of SELFIE. The charts also add the newest Apple and Facebook emoji.

At this time, the proposed update does not add any additional recommended emoji zwj sequences, nor reclassify any existing Unicode 9.0 characters as emoji. There are proposals for doing so that will be reviewed in the next Unicode Technical Committee meeting.

The review period for the proposed update ends on January 16, 2017. For further information and instructions on how to provide feedback, please see Public Review Issue #343.

This holiday season you can give a unique gift by adopting any emoji, letter, or symbol, and help support the Unicode Consortium’s mission to enable all languages to be used on computers. You can now adopt Unicode 9.0 characters and the Emoji 4.0 emoji sequences (such as woman astronaut or rockstar). See the Adopt-a-Character Page.

Thursday, December 1, 2016

Support Unicode with an Adopt-a-Character Gift this Holiday Season!

This holiday season you can give a unique gift by adopting any emoji, letter, or symbol, and help support the Unicode Consortium’s mission to enable all languages to be used on computers. Three levels of sponsorship are available, starting at $100. With over 128,000 characters to choose from, you are certain to find an appropriate character for even the most demanding recipient. All sponsors will receive a custom digital badge featuring the adopted character for use on the web and elsewhere. Sponsors at the two highest levels will receive a special thank-you gift engraved with the name you supply and the adopted character.

The program funds work on “digitally disadvantaged” languages, both modern and historic. In 2016 the program awarded a grant to support work on a proposal for the Hanifi Rohingya script. The program has also funded work on Egyptian hieroglyphs and Mayan hieroglyphs.

In its first year, the Adopt-a-Character program has had nearly 400 sponsors. Be part of the next wave, with a worthwhile gift!

For more information on the program, or to adopt a character, see the Adopt-a-Character Page.