The Unicode Blog: March 2015

Wednesday, March 25, 2015

Emoji Glyph and Annotation Recommendations

The Unicode Technical Committee has released a list of recommendations for changes in Unicode chart glyphs and/or annotations for many emoji characters, to promote better interchange across platforms. Feedback either for or against these changes is welcome. For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Thursday, March 19, 2015

CLDR Version 27 Released

Unicode CLDR 27 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

There was no Survey Tool data collection phase for CLDR 27. Instead, the release focused primarily on stability—cleaning up data inheritance and making specific fixes—as well as improvements to the JSON format of the data. Changes include the following:

Cleanup of region locales: A major cleanup effort was undertaken to resolve gratuitous differences between region-specific locales and the parent from which they inherit. Where a parent value was an acceptable replacement for a child-specific value in a regional locale, the child value was removed, giving more consistent behavior across the region locales. A special effort was made to clean up country names in certain locales.
Changes to English inheritance: As an outcome of the cleanup effort above, the inheritance model for English locales is now simplified. All en_XX locales inherit either from “en” directly (for current or former U.S. territories) or from the British-influenced “en_001 - World English”. This is also reflected in some changes for measurement systems.
Emoji: Data for emoji annotations and an emoji collation were added, to accompany Unicode Technical Report #51, Unicode Emoji.
Collation: There are new sort orders for emoji (as noted above), and an Austrian phonebook sort order. Scripts can be reordered individually, rather than only in specific groups. Fractional tertiary weights lower than the common weight are now used, allowing shorter sort keys for normal Hiragana letters.
Specification: The LDML specification has descriptions of new or modified structure, plus a number of fixes and clarifications. See Modifications for a list of changes.

Improved documentation of locale inheritance and matching, bundle versus item lookup, and parent locale information.
Extensive clarifications to the intended use of the language matching data.
Explicit new definitions of Unicode identifiers, such as Unicode Calendar Identifier, for use in citations.

Charts: The navigation within charts has been improved, and new ones added:

Delta charts, showing detailed changes from v26 to v27.
Day periods, showing the new day period selectors.
Emoji Annotations, showing provisional annotations in English and 21 other locales.
Territory Information, split out of Language Territory Information for easier viewing.

JSON on GitHub: The JSON form of the data is now available on GitHub, rather than being found through the Data link.

Details are provided at http://cldr.unicode.org/index/downloads/cldr-27, which also includes a detailed Migration section.
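The simplified English inheritance described above can be illustrated with a minimal Python sketch. The `EXPLICIT_PARENTS` table below is a small hand-written assumption for illustration only, not the actual CLDR data (the real parent-locale information is published with the CLDR supplemental data), and the function names `parent_locale` and `inheritance_chain` are hypothetical.

```python
# Illustrative, hand-written subset of explicit parent overrides.
# Assumption: the real table comes from CLDR's parent locale data
# and is considerably longer.
EXPLICIT_PARENTS = {
    "en_001": "en",
    "en_GB": "en_001",
    "en_AU": "en_001",
    "en_IN": "en_001",
}


def parent_locale(locale):
    """Return the parent of `locale`, or None once root is reached."""
    if locale == "root":
        return None
    if locale in EXPLICIT_PARENTS:
        return EXPLICIT_PARENTS[locale]
    # Default rule: drop the last subtag ("en_PR" -> "en").
    # A bare language code falls back to root.
    head, sep, _ = locale.rpartition("_")
    return head if sep else "root"


def inheritance_chain(locale):
    """List the locale followed by each ancestor, ending at root."""
    chain = [locale]
    parent = parent_locale(locale)
    while parent is not None:
        chain.append(parent)
        parent = parent_locale(parent)
    return chain


print(inheritance_chain("en_GB"))  # goes through en_001 before en
print(inheritance_chain("en_PR"))  # inherits from en directly
```

In this sketch a locale with no explicit entry simply truncates to its language, which is how a U.S. territory locale such as en_PR ends up inheriting from “en” directly, while the listed locales pass through “en_001” first.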

Tuesday, March 10, 2015

Unicode 8.0 Beta Review

Mountain View, CA, USA – The Unicode® Consortium today announced the start of the beta review for the forthcoming Unicode 8.0.0, which is scheduled for release in June 2015. All beta feedback must be submitted by April 27, 2015.
Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML, ...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard.

Unicode 8.0.0 comprises several changes which require careful migration in implementations, including the conversion of Cherokee to a bicameral script, a different encoding model for New Tai Lue, and additional character repertoire. Implementers need to change code and check assumptions regarding case mappings, New Tai Lue syllables, Han character ranges, and confusables. Character additions in Unicode 8.0.0 include emoji symbol modifiers for implementing skin tone diversity, other emoji symbols, a large collection of CJK unified ideographs, a new currency sign for the Georgian lari, and six new scripts. For more information on emoji in Unicode 8.0.0, see the associated draft Unicode Emoji report.
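Two of the changes above lend themselves to quick checks in code. The following Python sketch is an illustration under stated assumptions, not an official test. It assumes the five emoji modifiers occupy U+1F3FB through U+1F3FF and that the existing Cherokee letter U+13A0 gains the lowercase counterpart U+AB70. The helper names are hypothetical, and the code does not verify that a given emoji is a valid base for a modifier.

```python
import unicodedata

# Emoji modifiers for skin tone: U+1F3FB..U+1F3FF (five characters).
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def with_skin_tone(base, tone_index):
    """Append one of the five modifiers (index 0-4) to a base emoji.

    Assumption: the caller has already checked that `base` accepts
    a modifier; this sketch performs no such validation.
    """
    return base + SKIN_TONE_MODIFIERS[tone_index]


def strip_skin_tones(text):
    """Remove every skin tone modifier from `text`."""
    return "".join(ch for ch in text if ch not in SKIN_TONE_MODIFIERS)


def cherokee_is_bicameral():
    """True if this runtime lowercases U+13A0 to U+AB70."""
    return "\u13A0".lower() == "\uAB70"


waving = with_skin_tone("\U0001F44B", 2)  # WAVING HAND SIGN + modifier
print(len(waving))  # the sequence is two code points long
print(unicodedata.unidata_version, cherokee_is_bicameral())
```

Code that assumes one code point per emoji, or that Cherokee text is unaffected by case conversion, is the kind of assumption the beta review asks implementers to recheck. The second printed line depends on the Unicode version bundled with the Python runtime.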

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 27, 2015. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-8.0.0.html for more information about testing the 8.0.0 beta.
See http://unicode.org/versions/Unicode8.0.0/ for the current draft summary of Unicode 8.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.

For more information, please contact the Unicode Consortium at http://www.unicode.org/contacts.html.