Thursday, February 23, 2017

Proposed update of UTS #46, for Unicode domain names

UTS #46 “Unicode IDNA Compatibility Processing” is used by many applications to support internationalized domain names with non-English characters. The proposed update to Version 10.0 regenerates the UTS #46 data files based on new additions to the Unicode repertoire, and adds three new parameters for processing: CheckHyphens, CheckBidi, and CheckJoiners. These parameters allow implementations to reflect current practice in browsers. The note about the use of IDNA2008 now includes the number of “missing” IDNA2008 characters (26,568), and is reworded for clarity.
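As a rough illustration of what a parameter such as CheckHyphens controls, the sketch below applies the usual hyphen restrictions to a single label: no hyphen at the start or end, and no hyphens in both the third and fourth positions. The function name `check_hyphens` is hypothetical. The sketch assumes the check runs on the decoded Unicode form of the label, after any Punycode decoding, so an ASCII `xn--` prefix never reaches it. The proposed UTS #46 text remains the authority on the exact rules.

```python
def check_hyphens(label: str) -> bool:
    """Return True if a (decoded) label passes the hyphen restrictions.

    Hypothetical helper for illustration only, not the UTS #46 reference
    algorithm.
    """
    # A label must not begin or end with U+002D HYPHEN-MINUS.
    if label.startswith("-") or label.endswith("-"):
        return False
    # A label must not have hyphens in both its third and fourth positions.
    if label[2:4] == "--":
        return False
    return True


if __name__ == "__main__":
    for candidate in ("example", "-abc", "abc-", "ab--cd", "a-b-c"):
        print(candidate, check_hyphens(candidate))
```

An implementation that wants to mirror more lenient browser behavior could simply skip this check when the parameter is off, which is the kind of flexibility the new parameters are meant to provide.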

There are two review notes requesting feedback on the use of Joiner characters.

For details and information about how to provide feedback, please see Public Review Issue #347.

Monday, February 20, 2017

Unicode Locale Data v31α available for testing

The Alpha version of Unicode CLDR version 31 is available for testing. The beta v31 will contain updates to the LDML spec and should be available on March 1, with the release of v31 planned for March 15.

CLDR 31 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Aside from the regular updates to codes and data, some of the more noticeable changes are:
  • Canonical codes
    • The subdivision codes were changed to consistently use the BCP 47 format.
    • The locales in the language-territory population data and the exemplars directory were regularized (dropping likely script subtags).
    • The timezone ID for GMT has been split from UTC.
    • There is a new mechanism for identifying hybrid locales, such as Hinglish.
  • Subdivisions
    • Names for Scotland, Wales, and England have been added in many languages.
  • Emoji 5.0
    • Short names and keywords have been updated for English.
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
  • Transforms
    • The Zawgyi→Unicode transform has been improved.
    • Tamil can now be transcribed to the International Phonetic Alphabet (IPA).
This release did not have a data-submission cycle, so the changes reflect cleanup and bug fixes. For more details, and important notes for smoothly migrating implementations, see Unicode CLDR Version 31. If you find a problem, please file a ticket.
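For implementations migrating subdivision codes, the change to the BCP 47 format can be sketched as follows. This assumes the older identifiers look like ISO 3166-2 codes (for example `GB-SCT`) and that the BCP 47 form is the same code lowercased with the hyphen removed. The function name `to_bcp47_subdivision` is hypothetical, and the CLDR 31 migration notes should be checked for the authoritative mapping.

```python
def to_bcp47_subdivision(iso_code: str) -> str:
    """Convert an ISO 3166-2 style code such as 'GB-SCT' to 'gbsct'.

    Hypothetical helper: assumes the input is '<region>-<suffix>' and that
    both parts are ASCII letters or digits.
    """
    region, sep, suffix = iso_code.partition("-")
    parts_ok = all(p.isascii() and p.isalnum() for p in (region, suffix))
    if not sep or not parts_ok:
        raise ValueError(f"not an ISO 3166-2 style code: {iso_code!r}")
    # Lowercase and join the region code and subdivision suffix.
    return (region + suffix).lower()


if __name__ == "__main__":
    for code in ("GB-SCT", "GB-WLS", "GB-ENG", "US-CA"):
        print(code, "->", to_bcp47_subdivision(code))
```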

Wednesday, January 11, 2017

New Unicode Character Property EquivalentUnifiedIdeograph

A new character property EquivalentUnifiedIdeograph is proposed for addition to Unicode 10.0. This property associates (where possible) the 365 characters in the CJK Radicals Supplement, Kangxi Radicals, and CJK Strokes blocks to an appropriate CJK Unified Ideograph.
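To give a feel for the kind of radical-to-ideograph association involved, the sketch below uses a mapping that already exists in the standard: characters in the Kangxi Radicals block carry compatibility decompositions to unified ideographs, which Python's `unicodedata` module exposes through NFKC normalization. This is not the proposed EquivalentUnifiedIdeograph property, which also covers the CJK Radicals Supplement and CJK Strokes blocks and is defined by its own data file. The helper name `kangxi_to_unified` is hypothetical.

```python
import unicodedata


def kangxi_to_unified(ch: str) -> str:
    """Map a Kangxi radical (U+2F00..U+2FD5) to a unified ideograph via NFKC.

    Illustration only: relies on existing compatibility decompositions,
    not on the proposed EquivalentUnifiedIdeograph data.
    """
    if len(ch) != 1 or not "\u2f00" <= ch <= "\u2fd5":
        raise ValueError("expected a single Kangxi Radicals character")
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    for radical in ("\u2f00", "\u2f08"):
        unified = kangxi_to_unified(radical)
        print(f"U+{ord(radical):04X} -> U+{ord(unified):04X}")
```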

For details of the proposal, a link to the proposed data, and information about how to provide feedback, please see Public Review Issue #344.

Wednesday, December 14, 2016

Adopt-A-Character Grant to Support Indic Scripts

The Adopt-a-Character program has awarded a grant to support further development of the following four Indic scripts in the Unicode Standard:
  • Hanifi Rohingya, a script in current use in Myanmar and Bangladesh
  • Nandinagari, a Brahmi-based historic script formerly used in South India
  • Old Sogdian, a group of historic scripts formerly used in Kazakhstan, Pakistan, and Western China
  • Sogdian, derived from Old Sogdian, a group of historic scripts formerly used in Central Asia
The goal of this grant is to enable the development of encoding proposals that can be included in the Unicode Standard. The work will be done by Anshuman Pandey under the direction of Deborah Anderson (SEI, UC Berkeley) and Rick McGowan (Unicode Consortium).

Friday, December 9, 2016

Proposed Update UTR #51, Unicode Emoji (Version 5.0)

A proposed update of UTR #51, Unicode Emoji (Version 5.0) is available for public review and feedback. This new version adds a mechanism to support regional flags, such as Scotland or California, though the choice of which of these flags to support is left to vendors.
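The regional flag mechanism can be sketched as an emoji tag sequence: U+1F3F4 BLACK FLAG, followed by tag characters that spell out a subdivision code, followed by U+E007F CANCEL TAG. The sketch below assumes that structure and assumes lowercase, unhyphenated subdivision codes such as `gbsct` (Scotland) or `usca` (California). The function name `subdivision_flag` is hypothetical, and the proposed UTR #51 text remains the authority on which sequences are valid.

```python
BLACK_FLAG = "\U0001F3F4"   # base character of the sequence
CANCEL_TAG = "\U000E007F"   # terminates the tag sequence
TAG_OFFSET = 0xE0000        # tag characters mirror ASCII at this offset


def subdivision_flag(code: str) -> str:
    """Build a flag tag sequence for a subdivision code such as 'gbsct'.

    Hypothetical helper: it only assembles the code points and does not
    check that the code names a real subdivision or that any vendor
    supports the resulting flag.
    """
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"unexpected subdivision code: {code!r}")
    # Shift each ASCII letter or digit into the tag character range.
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in code)
    return BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    sequence = subdivision_flag("gbsct")
    print(" ".join(f"U+{ord(c):04X}" for c in sequence))
```

On a platform that does not support a given flag, the tag characters are generally not displayed, so the sequence would be expected to fall back to a plain black flag.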

Associated charts and data files are also available. This proposed update also has a separate data file for the valid emoji presentation sequences, and reflects a small change in the ordering of SELFIE. The charts also add the newest Apple and Facebook emoji.

At this time, the proposed update does not add any additional recommended emoji ZWJ sequences, nor does it reclassify any existing Unicode 9.0 characters as emoji. There are proposals for doing so that will be reviewed in the next Unicode Technical Committee meeting.

The review period for the proposed update ends on January 16, 2017. For further information and instructions on how to provide feedback, please see Public Review Issue #343.

This holiday season you can give a unique gift by adopting any emoji, letter, or symbol, and help support the Unicode Consortium’s mission to enable all languages to be used on computers. You can now adopt Unicode 9.0 characters and the Emoji 4.0 emoji sequences (such as woman astronaut or rockstar). See the Adopt-a-Character Page.