The Unicode Blog: February 2012

Friday, February 17, 2012

Localization World Unicode workshop, June 2012, Paris

We are pleased to announce that Localization World is organizing a one-day workshop on Unicode, including an introduction by Richard Ishida and three additional sessions. The workshop will take place on the preconference day, June 4, 2012, in Paris. Richard is an experienced presenter at Unicode conferences and is well known for his clear and effective presentations.

The Unicode Consortium’s goal is to enable people around the world to use computers in any language. The Consortium is involved in core internationalization specifications at the heart of all modern software, such as the Unicode Standard for character encoding. The Consortium’s involvement in localization is a key extension of this work. The Unicode Consortium maintains and extends the Common Locale Data Repository (CLDR), and in 2011 established the Unicode Localization Interoperability Technical Committee to improve the interoperability of localization data interchange.

For more information, including the program of the June Localization World Conference, please see http://www.localizationworld.com/lwparis2012/program.php.

Helena Chapman, chair, Unicode Localization Interoperability Technical Committee
Ulrich Henes, Donna Parrish and Daniel Goldschmidt, chair, vice-chairs, Localization World Conference Program Committee

Friday, February 10, 2012

Unicode Releases Common Locale Data Repository, Version 21.0

Mountain View, CA, February 10, 2012 - The Unicode® Consortium announced today the release of a new version of the Unicode Common Locale Data Repository (Unicode CLDR 21.0), providing key building blocks for software to support the world's languages.

Unicode CLDR 21.0 contains data for 193 languages and 170 territories: 528 locales in all. This release did not include a public data submission phase, and focused on improvements to the LDML structure and tools, and consistency of data.

Main features include updates for Unicode 6.1; a major cleanup of timezone names, date format data, and delimiters (“…” vs „…“ vs „…” vs …); the new BCP47 -t- extension; the addition of ordinal categories (1st, 2nd,…), collation reordering (e.g., Cyrillic before Latin), multiple numbering systems for a locale, and abbreviated numbers (e.g., “1.2 B”); and a restructuring of Chinese calendar data. For more information on other changes since the 2.0.1 release, see the CLDR 21 Release Note.
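To illustrate what an ordinal category is, here is a minimal Python sketch. It assumes the English ordinal rules as they are commonly stated (categories "one", "two", "few", "other"); the function names and the suffix table are hypothetical and are not taken from the CLDR 21 data files, which should be consulted for the authoritative rules.

```python
# Illustrative only: hand-written English ordinal rules, not read from CLDR data.

def english_ordinal_category(n: int) -> str:
    """Return an ordinal category name for a non-negative integer."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1st, 21st, 101st
    if n % 10 == 2 and n % 100 != 12:
        return "two"    # 2nd, 22nd
    if n % 10 == 3 and n % 100 != 13:
        return "few"    # 3rd, 23rd
    return "other"      # 4th, 11th, 12th, 13th

# Hypothetical per-category suffixes; a real application would get
# these from its localized message catalog.
SUFFIX = {"one": "st", "two": "nd", "few": "rd", "other": "th"}

def format_ordinal(n: int) -> str:
    """Attach the suffix selected by the ordinal category."""
    return f"{n}{SUFFIX[english_ordinal_category(n)]}"

print([format_ordinal(n) for n in (1, 2, 3, 4, 11, 21)])
```

The point of the category layer is that code selects a message by category name, so the same logic can serve languages whose ordinal forms follow different rules.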

Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; transliterating different alphabets; and many others. Unicode CLDR 21 is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML: http://unicode.org/reports/tr35/). LDML is an XML format used for general interchange of locale data, such as in Microsoft's .NET.
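As a rough idea of how LDML data can be consumed, the following Python sketch reads quotation delimiters from a small hand-written XML fragment. The fragment is an assumption made for illustration (it is not copied from the CLDR 21 files), and the helper names are hypothetical; real locale files are far larger and use inheritance between locales, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of LDML; not an actual CLDR file.
LDML_FRAGMENT = """<ldml>
  <identity><language type="de"/></identity>
  <delimiters>
    <quotationStart>„</quotationStart>
    <quotationEnd>“</quotationEnd>
  </delimiters>
</ldml>"""

def read_quotes(ldml_text: str) -> tuple:
    """Return the (start, end) quotation delimiters found in the fragment."""
    root = ET.fromstring(ldml_text)
    start = root.findtext("delimiters/quotationStart")
    end = root.findtext("delimiters/quotationEnd")
    return start, end

def quote(text: str, ldml_text: str = LDML_FRAGMENT) -> str:
    """Wrap text in the locale's quotation delimiters."""
    start, end = read_quotes(ldml_text)
    return f"{start}{text}{end}"

print(quote("Hallo"))
```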

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts. For more information about the Unicode CLDR project (including charts) see http://cldr.unicode.org/.

Thursday, February 2, 2012

UTS #10, Unicode Collation Algorithm, Version 6.1 Released

Mountain View, CA, USA – February 2, 2012 – The new version of Unicode Technical Standard #10, Unicode Collation Algorithm, has been released, updating to Unicode Version 6.1.

This new version adds a number of features:

The collation ordering for the 732 new Unicode characters.
A major revision to the ordering of "variable" characters into groups, separating punctuation and symbols. This change may present migration issues for some implementations.
Options added for ignoring spaces and punctuation (but not symbols), and for reordering groupings of characters, such as putting Latin characters before Greek (for Greek users), or digits after letters.
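The two options in the last item can be pictured with a toy sort key in Python. This is only a sketch of the idea and not the Unicode Collation Algorithm: it orders characters by code point within a group, guesses the group from the character name, and uses hypothetical names (`script_group`, `make_sort_key`, and the group labels) that do not come from UTS #10.

```python
import unicodedata

def script_group(ch: str) -> str:
    """Crude group detection based on the Unicode character name."""
    if ch.isdigit():
        return "digit"
    name = unicodedata.name(ch, "")
    for prefix, group in (("GREEK", "Grek"), ("LATIN", "Latn"), ("CYRILLIC", "Cyrl")):
        if name.startswith(prefix):
            return group
    return "other"

def make_sort_key(group_order, ignore_punctuation=False):
    """Build a key function that ranks groups in the requested order."""
    rank = {group: i for i, group in enumerate(group_order)}

    def key(text: str):
        elements = []
        for ch in text:
            category = unicodedata.category(ch)
            # Skip spaces (Z*) and punctuation (P*), but keep symbols (S*).
            if ignore_punctuation and category[0] in ("P", "Z"):
                continue
            # Groups not listed sort after all listed groups.
            elements.append((rank.get(script_group(ch), len(rank)), ord(ch)))
        return tuple(elements)

    return key

words = ["alpha", "αλφα", "123"]
greek_first = make_sort_key(["Grek", "Latn", "digit"])
print(sorted(words, key=greek_first))  # Greek, then Latin, then digits
```

With `ignore_punctuation=True`, "co-op" and "coop" produce the same key, while a string containing a symbol such as "$" still differs from the same string without it.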

There are also important improvements in documentation:

A new section on asymmetric search (where a query of the base character 'e' matches é, è,…, but a query of the more specific é doesn't match other accented versions or the base character).
Important restructuring and clarifications of other sections.
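The asymmetric matching rule described above can be sketched in a few lines of Python. This is a simplified, single-character illustration that uses canonical decomposition and combining-mark stripping as a stand-in for collation-level comparison; it is not the procedure specified in UTS #10, and the function names are hypothetical.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Decompose to NFD and drop combining marks, leaving base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def asymmetric_match(query: str, target: str) -> bool:
    """An unaccented query matches any accented form; an accented query is exact."""
    q = unicodedata.normalize("NFD", query)
    t = unicodedata.normalize("NFD", target)
    query_has_marks = any(unicodedata.combining(ch) for ch in q)
    if query_has_marks:
        return q == t              # specific query: exact match only
    return strip_marks(t) == q     # base query: ignore the target's accents

print(asymmetric_match("e", "é"))  # base query against accented text
print(asymmetric_match("é", "e"))  # accented query against base text
```

The sketch is case-sensitive and compares whole strings; a real search would also handle case, substrings, and contractions.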

Wednesday, February 1, 2012

UTS #46, Unicode IDNA Compatibility Processing, Version 6.1 Released

Mountain View, CA, USA – February 1, 2012 – The new version of Unicode Technical Standard #46, Unicode IDNA Compatibility Processing, has been released, updating to Unicode Version 6.1. It adds support for 528 additional characters in internationalized domain names (IDN).

The specification provides two main features for use with the internationalized domain names specification released in August 2010 (IDNA2008):

A comprehensive mapping to reflect user expectations for casing and other variants of domain names. This mapping is allowed by IDNA2008, and follows the same principles as in the previous version of that specification (IDNA2003). It thus provides users consistency between old and new versions.
A compatibility mechanism that supports internationalized domain names valid under the IDNA2003 specification and the IDNA2008 specification. This second feature allows browsers, search engines, and other clients to handle both old and new domain names during the transitional period until registries update their rules to follow IDNA2008.
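To make the old-versus-new difference concrete, the Python sketch below contrasts two treatments of the label "faß". It relies on the standard library's `idna` codec, which implements IDNA2003 (not UTS #46), and on the `punycode` codec to show the form a label takes when ß is kept rather than mapped. The helper names are hypothetical, and the second helper skips all validity checks that a real implementation would perform.

```python
def to_ascii_idna2003(domain: str) -> bytes:
    """Convert with Python's built-in IDNA2003 codec (maps ß to ss)."""
    return domain.encode("idna")

def label_to_ascii_unmapped(label: str) -> bytes:
    """Punycode-encode a label as-is, with no mapping or validation."""
    if label.isascii():
        return label.encode("ascii")
    return b"xn--" + label.encode("punycode")

print(to_ascii_idna2003("faß.de"))     # b'fass.de'
print(label_to_ascii_unmapped("faß"))  # b'xn--fa-hia'
```

The same visible name can therefore resolve to two different ASCII forms depending on which rules a client applies, which is the kind of transitional ambiguity the compatibility mechanism addresses.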

UTS #46 supplies normative data tables that are synchronized with the latest version of the Unicode Standard, allowing implementations to update without recalculation. This new version also provides an "NV8" flag in the data files, making it easier for implementations to disable the compatibility mechanism.