Wednesday, April 26, 2017

Last Call on Unicode 10.0 Beta Review

The beta review period for Unicode 10.0 and related technical standards will close on May 1, 2017. This is the last opportunity for technical comments before version 10.0 is released in Q2 2017. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments soon.

In addition to the Unicode Standard proper, three other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 10.0.0. Review of that text and data is also encouraged during the beta review period.

UTS #10, Unicode Collation Algorithm (data files)
UTS #39, Unicode Security Mechanisms (data files)
UTS #46, Unicode IDNA Compatibility Processing (data files)

Additional documents are available for public review and will be discussed at the May UTC meeting, such as the final Emoji 5.0 text, and a proposed Unicode character property. For more information, see the open public review issues and the UTC document registry.

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, EmojiXpress, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, Rajya Marathi Vikas Sanstha, SAP, Symantec, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.

Monday, April 17, 2017

ICU 59 Released

Unicode® ICU 59 has just been released! ICU is the main avenue for many software products and libraries to support the world's languages, implementing the latest versions of both the Unicode encoding standard and the Unicode locale data (CLDR).

ICU 59 upgrades to CLDR 31 and to emoji 5.0 data, together with segmentation and bidi updates from Unicode 10 beta. The Java code for number formatting has been completely rewritten for reliability and performance. There is also a new case mapping API for styled text, and a technology preview of enhanced language matching.

There are major changes for ICU4C that will make ICU easier to use but require changes in projects using ICU: C++11, char16_t, UTF-8 source files.

For details please see http://site.icu-project.org/download/59

Thursday, April 13, 2017

Call for Unicode 10.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Version 10.0 of The Unicode Standard.

The cover design will appear on the Unicode Standard 10.0 web page, in the print-on-demand publication, and in associated promotional literature on the Unicode website. The chosen artist will receive full credit in the colophon of the publication, and wherever else the design appears, and receive $700. The two runner-up artists will receive $150 apiece.

Please see the announcement web page for requirements and more details.

Friday, April 7, 2017

PRI #351: Combined registration of the KRName collection and of sequences in that collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #351: A submission for the “Combined registration of the KRName collection and of sequences in that collection” has been received by the IVD Registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-07-07. Please see the submission page for details and instructions on how to review this issue and provide comments:

http://www.unicode.org/ivd/pri/pri351/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.

Monday, March 27, 2017

Unicode Emoji 5.0 characters now final


Fifty-six new emoji characters are in the just released Emoji 5.0 data, including such characters as:

shushing face, mage
flying saucer, pie
T-Rex, broccoli*
* for healthy eaters!

The new Emoji 5.0 set is fixed, and available for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 10.0, scheduled for June 2017.

Most of the new emoji characters are in Smileys & People (34), followed by Food & Drink (13), Animals & Nature (6), and a few others.

There are an additional 180 emoji sequences for gender and skin-tone in Smileys & People — such as woman in lotus position: medium skin tone — and new regional flags for England, Scotland, and Wales. This makes a total of 239 new emoji (characters and sequences). For a full list, see Emoji Recently Added.
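
As a rough illustration of how such a sequence is composed, the Python sketch below builds woman in lotus position: medium skin tone from its code points and rechecks the total quoted above. The code point values are the standard ones for these characters; the function and variable names are illustrative only.

```python
# Code points of the characters involved (names here are illustrative).
PERSON_IN_LOTUS_POSITION = "\U0001F9D8"  # new in Emoji 5.0
MEDIUM_SKIN_TONE = "\U0001F3FD"          # EMOJI MODIFIER FITZPATRICK TYPE-4
ZERO_WIDTH_JOINER = "\u200D"
FEMALE_SIGN = "\u2640"
VARIATION_SELECTOR_16 = "\uFE0F"         # requests emoji presentation


def woman_in_lotus_position(skin_tone: str = "") -> str:
    """Base character, optional skin-tone modifier, then the gender part."""
    return (PERSON_IN_LOTUS_POSITION + skin_tone + ZERO_WIDTH_JOINER
            + FEMALE_SIGN + VARIATION_SELECTOR_16)


def code_points(text: str) -> str:
    """Render a string as space-separated U+XXXX values."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# The total from the announcement: characters + gender/skin-tone
# sequences + the three regional flags.
NEW_CHARACTERS = 56
NEW_GENDER_AND_SKIN_TONE_SEQUENCES = 180
NEW_REGIONAL_FLAGS = 3
TOTAL_NEW_EMOJI = (NEW_CHARACTERS + NEW_GENDER_AND_SKIN_TONE_SEQUENCES
                   + NEW_REGIONAL_FLAGS)

if __name__ == "__main__":
    print(code_points(woman_in_lotus_position(MEDIUM_SKIN_TONE)))
    print(TOTAL_NEW_EMOJI)
```

Whether the result displays as a single glyph depends on the font and platform; the string itself is five code points.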

The emoji charts have been updated to show the new characters and sequences. The draft Emoji 5.0 specification will be finalized in the May UTC meeting, and is still available for comment.
The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Monday, March 20, 2017

CLDR Version 31 Released

Unicode CLDR 31 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Some of the improvements in the release are:
  • Canonical codes
    • The subdivision codes have all been changed to the BCP 47 format.
    • The locales in the language-territory population data are in canonical format.
    • The timezone ID for GMT has been split from UTC.
    • There is a mechanism for identifying hybrid locales, such as Hinglish.
  • Emoji 5.0
    • Short names and keywords have been updated for English. (Data for other languages will be gathered in the next cycle.)
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
    • For Emoji usage, subdivision names for Scotland, Wales, and England have been added for 65 languages.
For further details and links to documentation, see the CLDR Release Notes.
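
To make the subdivision-code change concrete, here is a minimal Python sketch. It assumes the older form was an ISO 3166-2 style code such as "GB-SCT" and that the BCP 47 form is the same letters, lowercased, without the hyphen ("gbsct"). The function name is hypothetical and not part of CLDR; the release notes are the authority on the exact format.

```python
def to_bcp47_subdivision(iso_style_code: str) -> str:
    """Convert an ISO 3166-2 style code ("GB-SCT") to the compact
    lowercase form ("gbsct"). Assumption: region and suffix are simply
    concatenated and lowercased."""
    region, separator, suffix = iso_style_code.partition("-")
    if not separator or not region or not suffix:
        raise ValueError(f"expected REGION-SUFFIX, got {iso_style_code!r}")
    return (region + suffix).lower()


if __name__ == "__main__":
    for code in ("GB-SCT", "GB-WLS", "GB-ENG"):
        print(code, "->", to_bcp47_subdivision(code))
```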

Thursday, March 9, 2017

Unicode 10.0 Beta Review

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.

Friday, March 3, 2017

UTS #51, Unicode Emoji proposed update available

Proposed Update Version 5.0 of UTS #51, Unicode Emoji is available for public review and feedback. This new version is slated to be a Unicode Technical Standard, and thus adds a conformance section and related definitions.

This new version adds a mechanism to support regional flags, such as Scotland or California, although the choice of which of these flags to support is left to vendors beyond a recommended set of three. UTS #51 will have a separate data file for the valid emoji presentation sequences. It also reflects some changes to the recommended sort order that will be released soon in CLDR v31. For more details, see the Modifications section of the document.
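
The regional flag mechanism can be sketched in Python as follows. The sketch assumes the tag-sequence form: WAVING BLACK FLAG (U+1F3F4), then one tag character per letter of the lowercase subdivision identifier, then CANCEL TAG (U+E007F). The function name and the example identifiers are illustrative, and the proposed update itself should be checked for the exact definition.

```python
import string

WAVING_BLACK_FLAG = "\U0001F3F4"
CANCEL_TAG = "\U000E007F"
TAG_OFFSET = 0xE0000  # tag characters mirror ASCII at this offset
ALLOWED = set(string.ascii_lowercase + string.digits)


def subdivision_flag(subdivision_id: str) -> str:
    """Build a flag tag sequence for an identifier such as "gbsct"."""
    identifier = subdivision_id.lower()
    if not identifier or not set(identifier) <= ALLOWED:
        raise ValueError(f"unexpected subdivision id: {subdivision_id!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in identifier)
    return WAVING_BLACK_FLAG + tags + CANCEL_TAG


if __name__ == "__main__":
    # "gbsct" (Scotland) and "usca" (California) are example identifiers.
    for identifier in ("gbsct", "usca"):
        sequence = subdivision_flag(identifier)
        print(identifier, " ".join(f"U+{ord(ch):04X}" for ch in sequence))
```

A well-formed sequence is not necessarily a displayed flag: as noted above, vendors choose which regional flags to support.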

Thursday, March 2, 2017

PRI #349: Registration of additional sequences in the Adobe-Japan1 collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #349: A submission for the "Registration of additional sequences in the Adobe-Japan1 collection" has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-06-02. Please see the submission page for details and instructions on how to review this issue and provide comments: http://www.unicode.org/ivd/pri/pri349/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for CJK Unified Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.

For further information on Public Review Issues, please see: http://www.unicode.org/review/

Tuesday, February 28, 2017

Netflix Upgrades to Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Netflix has upgraded from an associate member to a full corporate member.

Netflix is the world’s leading Internet television network with over 93 million members in over 190 countries enjoying more than 125 million hours of TV shows and movies per day, including original series, documentaries and feature films.

We look forward to their contributions to the Unicode Standard, ICU, and the Common Locale Data Repository (CLDR) project, and are grateful for their financial support of the Consortium’s work. Full members of the consortium have a vote in all technical committees, and in the governance of the consortium.

Monday, February 27, 2017

Be a Part of IUC 41! Call for Participation

The Internationalization and Unicode Conference® (IUC) is the annual conference of the Unicode Consortium where experts and industry leaders gather to map the future of internationalization, ignite new ideas and present the latest in technologies and best practices for creation, management, and testing of global, web, and multilingual software solutions.

Join in with other industry leaders to present your ideas and solutions at the 41st Internationalization & Unicode Conference (IUC 41) in Santa Clara, California, October 16-18, 2017.

Please submit your proposals for presentations or tutorials by Friday, March 24, 2017. Topics can include case studies, best practices, innovative technology, or evolving standards.

Full details and information about how to submit an abstract can be found on the IUC 41 Call for Participation page.

Thursday, February 23, 2017

Proposed update of UTS #46, for Unicode domain names

UTS #46 “Unicode IDNA Compatibility Processing” is used by many applications to support internationalized domain names with non-English characters. The proposed update to Version 10.0 regenerates the UTS #46 data files based on new additions to the Unicode repertoire, and adds three new parameters for processing: CheckHyphens, CheckBidi, and CheckJoiners. These parameters allow implementations to reflect current practice in browsers. The note about the use of IDNA2008 now includes the number of “missing” IDNA2008 characters (26,568), and is reworded for clarity.
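
As a minimal sketch of what one of these parameters controls, the Python function below applies the hyphen restrictions commonly associated with CheckHyphens: a label must not begin or end with a hyphen, and must not have hyphens in both the third and fourth positions. The function name is hypothetical, the input is assumed to be a single label that has already been mapped and decoded, and CheckBidi and CheckJoiners are not covered. The proposed update is the authority on the exact rules.

```python
def passes_check_hyphens(label: str) -> bool:
    """Return True if the label satisfies the hyphen restrictions.

    Assumption: `label` is one domain label (no dots), already mapped
    and, where applicable, Punycode-decoded.
    """
    if label.startswith("-") or label.endswith("-"):
        return False
    # Positions 3 and 4 (1-based) correspond to the slice [2:4].
    if label[2:4] == "--":
        return False
    return True


if __name__ == "__main__":
    for candidate in ("example", "a-b", "-leading", "trailing-", "ab--cd"):
        print(f"{candidate!r}: {passes_check_hyphens(candidate)}")
```

An implementation that sets CheckHyphens to false would skip this test, which is one way the parameters let implementations reflect current practice in browsers.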

There are two review notes requesting feedback on the use of Joiner characters.

For details and information about how to provide feedback, please see Public Review Issue #347.

Monday, February 20, 2017

Unicode Locale Data v31α available for testing

The Alpha version of Unicode CLDR version 31 is available for testing. The beta v31 will contain updates to the LDML spec and should be available on March 1, with the release of v31 planned for March 15.

CLDR 31 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Aside from the regular updates to codes and data, some of the more noticeable changes are:
  • Canonical codes
    • The subdivision codes were changed to consistently use the BCP 47 format.
    • The locales in the language-territory population data and the exemplars directory were regularized (dropping likely script subtags).
    • The timezone ID for GMT has been split from UTC.
    • There is a new mechanism for identifying hybrid locales, such as Hinglish.
  • Subdivisions
    • Names for Scotland, Wales, and England have been added in many languages.
  • Emoji 5.0
    • Short names and keywords have been updated for English.
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
  • Transforms
    • The Zawgyi→Unicode transform has been improved.
    • Tamil can now be transcribed to the International Phonetic Alphabet (IPA).
This release did not have a data-submission cycle, so the changes reflect cleanup and bug fixes. For more details, and important notes for smoothly migrating implementations, see Unicode CLDR Version 31. If you find a problem, please file a ticket.

Wednesday, January 11, 2017

New Unicode Character Property EquivalentUnifiedIdeograph

A new character property EquivalentUnifiedIdeograph is proposed for addition to Unicode 10.0. This property associates (where possible) the 365 characters in the CJK Radicals Supplement, Kangxi Radicals, and CJK Strokes blocks to an appropriate CJK Unified Ideograph.

For details of the proposal, a link to the proposed data, and information about how to provide feedback, please see Public Review Issue #344.
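
For implementers who want to experiment with the property, here is a small Python sketch of a loader. It assumes the proposed data uses the common Unicode Character Database layout of "code point or range ; value # comment". The sample lines are illustrative and are not copied from the proposed file, and the function name is hypothetical.

```python
# Illustrative sample in the assumed "source ; target # comment" layout.
SAMPLE_DATA = """\
# Sample only, not the proposed data file.
2F00 ; 4E00 # KANGXI RADICAL ONE
2F01 ; 4E28 # KANGXI RADICAL LINE
"""


def parse_equivalent_unified_ideograph(text: str) -> dict:
    """Map each radical or stroke character to its unified ideograph."""
    mapping = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        source, target = (field.strip() for field in line.split(";"))
        # A source field may be one code point or a "first..last" range.
        first, _, last = source.partition("..")
        last = last or first
        for code_point in range(int(first, 16), int(last, 16) + 1):
            mapping[chr(code_point)] = chr(int(target, 16))
    return mapping


if __name__ == "__main__":
    table = parse_equivalent_unified_ideograph(SAMPLE_DATA)
    for radical, ideograph in table.items():
        print(f"U+{ord(radical):04X} -> U+{ord(ideograph):04X}")
```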