Monday, March 20, 2017

CLDR Version 31 Released

Unicode CLDR 31 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Some of the improvements in the release are:
  • Canonical codes
    • The subdivision codes have all been changed to the BCP 47 format.
    • The locales in the language-territory population data are in canonical format.
    • The timezone ID for GMT has been split from UTC.
    • There is a mechanism for identifying hybrid locales, such as Hinglish.
  • Emoji 5.0
    • Short names and keywords have been updated for English. (Data for other languages will be gathered in the next cycle.)
    • Collation (sorting) adds the new Emoji 5.0 characters and sequences, and includes some fixes for Emoji 4.0 characters and sequences.
    • For Emoji usage, subdivision names for Scotland, Wales, and England have been added for 65 languages.
For further details and links to documentation, see the CLDR Release Notes.
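
As an illustration of the BCP 47 style subdivision format mentioned in the list above, the following minimal Python sketch converts an ISO 3166-2 style code such as "GB-SCT" into the lowercase, separator-free form "gbsct". The function name to_bcp47_subdivision is hypothetical and is not part of CLDR or any library; the sketch only reshapes the string and does not check that the code names a real subdivision.

```python
def to_bcp47_subdivision(iso_code: str) -> str:
    """Reshape an ISO 3166-2 style code ('GB-SCT') into the
    BCP 47 style form ('gbsct'): lowercase, no hyphen.

    Hypothetical helper for illustration only; it does not
    validate the code against CLDR's subdivision data.
    """
    region, sep, subdivision = iso_code.partition("-")
    if not sep or not region or not subdivision:
        raise ValueError(f"expected REGION-SUBDIVISION, got {iso_code!r}")
    return (region + subdivision).lower()


if __name__ == "__main__":
    for code in ("GB-SCT", "GB-WLS", "GB-ENG"):
        print(code, "->", to_bcp47_subdivision(code))
```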

Thursday, March 9, 2017

Unicode 10.0 Beta Review

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smartphones, as well as the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, EmojiXpress, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, Rajya Marathi Vikas Sanstha, SAP, Symantec, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.

Friday, March 3, 2017

UTS #51, Unicode Emoji proposed update available

Proposed Update Version 5.0 of UTS #51, Unicode Emoji is available for public review and feedback. This new version is slated to be a Unicode Technical Standard, and thus adds a conformance section and related definitions.

This new version adds a mechanism to support regional flags, such as Scotland or California, although the choice of which of these flags to support is left to vendors beyond a recommended set of three. UTS #51 will have a separate data file for the valid emoji presentation sequences. It also reflects some changes to the recommended sort order that will be released soon in CLDR v31. For more details, see the Modifications section of the document.
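
The regional flag mechanism described above is based on emoji tag sequences: U+1F3F4 BLACK FLAG, followed by tag characters spelling a lowercase subdivision code, and terminated by U+E007F CANCEL TAG. The Python sketch below builds such a sequence. The function name subdivision_flag is hypothetical, the sketch does not check whether a code is valid or recommended, and whether the result renders as a flag depends on vendor support.

```python
BLACK_FLAG = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG
TAG_BASE = 0xE0000          # tag characters mirror ASCII at this offset


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence for a subdivision code such as 'gbsct'.

    Hypothetical helper: it only assembles the code points and does
    not verify that the code is a valid or supported subdivision.
    """
    code = code.lower()
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"expected ASCII letters/digits, got {code!r}")
    tags = "".join(chr(TAG_BASE + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    for code in ("gbeng", "gbsct", "gbwls"):
        seq = subdivision_flag(code)
        print(code, " ".join(f"U+{ord(ch):04X}" for ch in seq))
```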

Thursday, March 2, 2017

PRI #349: Registration of additional sequences in the Adobe-Japan1 collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #349: A submission for the "Registration of additional sequences in the Adobe-Japan1 collection" has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-06-02. Please see the submission page for details and instructions on how to review this issue and provide comments: http://www.unicode.org/ivd/pri/pri349/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for CJK Unified Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.
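
In plain text, an ideographic variation sequence is a base ideograph followed by a variation selector in the range U+E0100 to U+E01EF (VS17 to VS256). The Python sketch below scans a string for such pairs. The function name find_ivs is hypothetical, the sketch does not check that the base character is a CJK ideograph or that the sequence is registered in the IVD, and the sample pair is used purely for illustration.

```python
VS17 = 0xE0100   # first variation selector used for IVS
VS256 = 0xE01EF  # last variation selector used for IVS


def find_ivs(text: str) -> list[tuple[str, int]]:
    """Return (base character, selector code point) pairs found in text.

    Hypothetical helper: it pairs each VS17..VS256 selector with the
    character immediately before it, without consulting the IVD.
    """
    pairs = []
    for prev, cur in zip(text, text[1:]):
        if VS17 <= ord(cur) <= VS256:
            pairs.append((prev, ord(cur)))
    return pairs


if __name__ == "__main__":
    sample = "\u845B\U000E0100"  # illustrative base + selector pair
    for base, selector in find_ivs(sample):
        print(f"U+{ord(base):04X} U+{selector:04X}")
```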

For further information on Public Review Issues, please see: http://www.unicode.org/review/

Tuesday, February 28, 2017

Netflix Upgrades to Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Netflix has upgraded from associate member to full corporate member.

Netflix is the world’s leading Internet television network with over 93 million members in over 190 countries enjoying more than 125 million hours of TV shows and movies per day, including original series, documentaries and feature films.

We look forward to their contributions to the Unicode Standard, ICU, and the Common Locale Data Repository (CLDR) project, and are grateful for their financial support of the Consortium’s work. Full members of the consortium have a vote in all technical committees and in the governance of the consortium.