Wednesday, March 16, 2016

CLDR Version 29 Released

 [CLDR Coverage] Unicode CLDR 29 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

New BCP47 extension keys have been added for specifying transliteration and emoji presentation, and for customizing locales with region-specific settings. Many new transforms are provided, the rule format has been simplified, and BCP47 IDs have been added for all transforms. Region data now includes appropriate preferences for day periods such as “6:00 in the morning” and “7:00 in the evening”, and there is new structure for choosing appropriate units based on region and usage. A Cantonese locale has been added. The emoji ordering has been improved, and annotations are provided for more emoji and in more locales. The JSON-format data has been extended to include number spellout (RBNF) and script metadata.
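As a rough illustration of how such extension keys appear in a locale identifier, the sketch below pulls the key/value pairs out of the "u" extension of a BCP47 tag. The function name parse_unicode_extension and the sample tag are hypothetical, and the keys "rg" (region override) and "em" (emoji presentation) are assumed here to be among the keys this release refers to; the release notes remain the authoritative list. This is a simplified reading of the syntax, not a full BCP47 parser.

```python
def parse_unicode_extension(tag: str) -> dict:
    """Return the key/value pairs found in the 'u' extension of a locale tag.

    Simplified sketch: 2-letter subtags after the 'u' singleton are treated
    as keys, longer subtags as (parts of) the value of the preceding key.
    """
    subtags = tag.lower().replace("_", "-").split("-")
    try:
        # Skip index 0 so a language subtag is never mistaken for a singleton.
        start = subtags.index("u", 1)
    except ValueError:
        return {}

    result = {}
    key = None
    for sub in subtags[start + 1:]:
        if len(sub) == 1:
            # Another singleton (for example 't' or 'x') ends the 'u' extension.
            break
        if len(sub) == 2:
            key = sub
            result[key] = ""
        elif key is not None:
            # Multi-part values are joined back together with hyphens.
            result[key] = f"{result[key]}-{sub}" if result[key] else sub
        # Attributes that precede the first key are ignored in this sketch.
    return result


# Hypothetical tag: British English with US regional settings and text-style emoji.
settings = parse_unicode_extension("en-GB-u-rg-uszzzz-em-text")
```

A production system would normally rely on a CLDR-aware library such as ICU rather than hand-parsing tags; the sketch only shows where the new keys sit in an identifier.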

The specification and charts have also been updated.

For further details and links to documentation, see the CLDR Release Notes.

Tuesday, March 15, 2016

Be a Part of Our 40th Conference!

Call for Participation Now Open

 [IUC 40 Banner]

For twenty-five years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. The 40th conference will be held this year on November 1-3, 2016 in Santa Clara, California.

Two Key Themes for This Year

Breaking All Barriers: Explore how software providers can meet the globalization challenges of supporting the burgeoning diversity of communication platforms around the world, including mobile, tablets, social media, video, and voice. Examine how online social platforms are supporting multilingual text and rich content in hundreds of languages. Often the task is not just to publish in multiple languages, but to accept input in alternative forms, analyze it for meaning and sentiment, look for patterns in big data, or automate its routing or translation. This theme also includes the latest advances in relevant standards, and emerging and historic scripts.

Trained, Tested, Trusted: Understand best practices in process and among teams reliably delivering high quality global products. Examine how developers build, test, and deploy great global products. Explore technologies for design, localization, multilingual testing, workflow management, and content management.

This is the conference where you can promote your ideas and experience working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

We welcome your proposals for papers and tutorials. View examples of content from past conferences on the IUC 40 website.

Thursday, March 10, 2016

Unicode 9.0 Beta Review

 [Adlam Sample Image] Mountain View, CA, USA – The Unicode® Consortium today announced the start of the beta review for the forthcoming Unicode 9.0.0, which is scheduled for release in June 2016. All beta feedback must be submitted by May 2, 2016.

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones – plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). Thus it is important to ensure a smooth transition to each new version of the Unicode Standard.

Unicode 9.0.0 comprises several additions and changes which require careful migration in implementations. These include asymmetric case mappings, numerous variation sequences, new fractional numeric values, and changes to property values, especially East_Asian_Width values. The line breaking and text segmentation algorithms handle character sequences that represent emoji as indivisible units via the addition of new property values and rules. Implementers need to modify code and check assumptions for all affected processes to support these additions and changes.
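One place where East_Asian_Width changes can surface is in code that estimates how many columns a string occupies in a fixed-width display. The sketch below is a minimal example of such a check using Python's standard unicodedata module. The function name display_width is hypothetical, and the rule of counting Wide and Fullwidth characters as two columns is a common terminal convention assumed here, not something this announcement specifies. The result depends on the Unicode version bundled with the Python runtime, which is exactly the kind of assumption worth re-testing against the beta data.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate the column width of text in a fixed-width display.

    Assumption: characters whose East_Asian_Width is W (Wide) or F (Fullwidth)
    take two columns, combining marks take none, everything else takes one.
    """
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks attach to the previous character.
            continue
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width
```

Because the property values come from the runtime's own Unicode tables, the same string may measure differently before and after an upgrade to 9.0.0 data.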

The new character repertoire includes 74 emoji symbols, 19 symbols used in Japanese TV broadcasting, and multiple additions to existing scripts. There are six new scripts, of which three are in modern use (Adlam, Osage, and Newa) and three are historic (Bhaiksuki, Marchen, and Tangut). Adlam and Osage have case pairs and require data updates for casing functions. Tangut is a large ideographic script whose addition incurred changes to the Unicode Collation Algorithm (used as the basis for sorting text in all languages).
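A quick way to see whether a runtime's casing functions already know about the new case pairs is to round-trip one pair from each script. The sketch below does this in Python; the helper names runtime_unicode_version and supports_case_pair are hypothetical, and the code points used (OSAGE CAPITAL LETTER A at U+104B0 with its small form at U+104D8, ADLAM CAPITAL LETTER ALIF at U+1E900 with its small form at U+1E922) are a small sample, not complete coverage of either script.

```python
import unicodedata


def runtime_unicode_version() -> tuple:
    """Return the Unicode version of the runtime's character database as a tuple."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def supports_case_pair(upper_cp: int, lower_cp: int) -> bool:
    """Check that the runtime maps the two code points to each other when casing."""
    upper, lower = chr(upper_cp), chr(lower_cp)
    return upper.lower() == lower and lower.upper() == upper


# Sample case pairs from the two new bicameral scripts.
OSAGE_CAPITAL_A, OSAGE_SMALL_A = 0x104B0, 0x104D8
ADLAM_CAPITAL_ALIF, ADLAM_SMALL_ALIF = 0x1E900, 0x1E922

osage_ok = supports_case_pair(OSAGE_CAPITAL_A, OSAGE_SMALL_A)
adlam_ok = supports_case_pair(ADLAM_CAPITAL_ALIF, ADLAM_SMALL_ALIF)
```

On a runtime whose data predates Unicode 9.0.0, both checks are expected to return False, since the characters are unassigned there and casing leaves them unchanged.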

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 2, 2016. Feedback instructions are on the beta page.

See the beta page for more information about testing the 9.0.0 beta.

See for the current draft summary of Unicode 9.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emoji One, EmojiXpress, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium.