Friday, December 14, 2012

Unicode Stability Policies Updated

The Unicode Character Encoding Stability Policies ensure that developers know what they can depend on between successive releases of the Unicode Standard.
Recent changes to these policies include new guarantees:
  • Property aliases will not be reused later for different properties.
  • Property value aliases will not be reused later for different property values.
  • Characters with the General_Category of Number are guaranteed to have a corresponding Numeric_Type value.
Additionally, the wording of two earlier guarantees about General_Category and Bidi_Class has been clarified:
  • No new General_Category property values will ever be added.
  • New Bidi_Class property values can only be added for a tightly constrained class of new character additions.
For the exact wording of these new and updated guarantees, see Unicode Character Encoding Stability Policies.
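The guarantee linking General_Category and Numeric_Type can be spot-checked against whatever Unicode data a runtime ships. The following is a minimal sketch in Python, under two assumptions: the standard `unicodedata` module does not expose Numeric_Type directly, so the presence of a numeric value is used as a stand-in for "Numeric_Type is not None", and the helper name `numbers_without_numeric_value` is hypothetical.

```python
import unicodedata

def numbers_without_numeric_value(chars):
    """Return the characters in `chars` whose General_Category is a Number
    category (Nd, Nl, No) but for which unicodedata reports no numeric value.

    Assumption: "has a numeric value" approximates "Numeric_Type != None".
    """
    missing = []
    for ch in chars:
        if unicodedata.category(ch).startswith("N"):
            # numeric() returns the supplied default when no value is defined
            if unicodedata.numeric(ch, None) is None:
                missing.append(ch)
    return missing

# One Nd (ASCII 5), one Nd (Arabic-Indic 3), one Nl (Roman numeral VIII),
# one No (vulgar fraction one half), and one letter that is skipped.
sample = "5\u0663\u2167\u00bdA"
print(numbers_without_numeric_value(sample))  # prints []
```

To scan the whole code space, the same function could be called with `map(chr, range(0x110000))`; the result then depends on the Unicode version bundled with the interpreter.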

Wednesday, December 12, 2012

Feedback requested for Unicode 6.3


Unicode 6.3 is slated for release in the third quarter of 2013. Now is your opportunity to comment on the contents of this release.

The text of the Unicode Standard Annexes (segmentation, normalization, identifiers, etc.) is open for comments and feedback, with proposed update versions posted at UAX Proposed Updates. Initially, the contents of these documents are unchanged; the one exception is UAX #9 (BIDI), which has major revisions in PRI #232. Changes to the text will be rolled in over the next few months, with the more significant changes being announced. Feedback is especially useful on the changes in the proposed updates, and should be submitted by mid-January for consideration at the Unicode Technical Committee meeting at the end of January.

A later announcement will be sent when the beta versions of the Unicode character properties for 6.3 are available for comment. The only characters planned for this release are a small number of bidi control characters connected with the changes to UAX #9.

Monday, December 10, 2012

Unicode Collation Proposed Update

The Unicode Collation Algorithm (UCA) data is being modified to make all digits with the same numeric value sort the same, whether they are European (ASCII), Arabic, Devanagari, or others. In addition, the format of the main data table has changed to omit the (unused) 4th level weight, and some data tables are moved to the Unicode CLDR project.

These and other changes are in the new proposed update: see PRI #235. For the exact list of modifications, see Modifications.
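The intended effect of the digit change can be illustrated with a small sort key. This is only a sketch in Python, not the Unicode Collation Algorithm itself: it folds every decimal digit to the ASCII digit with the same value before comparing, which mimics digits of equal numeric value sorting together. The function name `digit_folded_key` is hypothetical.

```python
import unicodedata

def digit_folded_key(text):
    """Sort key that replaces each decimal digit, in any script,
    with the ASCII digit of the same numeric value."""
    out = []
    for ch in text:
        # decimal() returns the supplied default for non-digit characters
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)

# Devanagari three, ASCII two, Arabic-Indic one
items = ["item \u0969", "item 2", "item \u0661"]
print(sorted(items, key=digit_folded_key))
```

With this key the three items come out in numeric order (1, 2, 3) regardless of the script each digit is written in.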

Friday, November 16, 2012

Unicode 6.2 core specification now available

The Unicode 6.2 core specification is now available. The text has been updated to align it with changes to Unicode algorithms and properties that were released in September, including the addition of the newly adopted Turkish lira sign. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 6.2.

For more details, see http://www.unicode.org/versions/Unicode6.2.0.

Friday, October 26, 2012

CLDR Version 22.1 Released

Oct 26, 2012 — Unicode CLDR 22.1 has been released, providing an update to the key building blocks for software supporting the world's languages.

Unicode CLDR 22.1 contains data for 215 languages and 227 territories—654 locales in all. Version 22.1 is an update release, with several important fixes to CLDR 22.0, such as addition of the new Turkish currency symbol, and simpler patterns for fallback timezone formatting (“Los Angeles Time” instead of “United States Time (Los Angeles)”). For details, see CLDR-22.1.

CLDR is by far the largest and most extensive standard repository of locale data, used by a wide spectrum of companies for their software internationalization and localization. It is widely deployed via International Components for Unicode (ICU), and also accessed directly by companies such as Apple, Google, IBM, Twitter, and many others. CLDR is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML)—an XML format used for general interchange of locale data, such as in Microsoft's .NET.

See the Charts pages for views of the CLDR data, organized in various ways. For more information about the Unicode CLDR project see cldr.unicode.org.

About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Tamil Virtual University, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.

Tuesday, October 16, 2012

Two New Public Review Issues, UAX #9 and UTR #20

The Unicode Technical Committee has posted two new issues for public review and comment.

    http://www.unicode.org/review/

Review periods for the new items close on January 21, 2013.
Please see the page for links to discussion and relevant documents.

Briefly, the new issues are:

PRI #232, Proposed Update UAX #9, Unicode Bidirectional Algorithm

UAX #9 will be updated for Unicode 6.2.1. This proposed update involves a substantial extension of the Unicode Bidirectional Algorithm to allow for the implementation of isolate runs. It also introduces a new X_Bidi_Class property in support of that extension. See the modifications section of the proposed update for information on specific changes to sections in the document.
http://www.unicode.org/review/pri232/
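For readers who want to see what isolates look like in practice, here is a minimal sketch in Python. It rests on assumptions that go beyond this announcement: the code points U+2066..U+2069 and the class names LRI, RLI, FSI, and PDI come from the Unicode Standard as later published, and the interpreter's `unicodedata` must be based on Unicode 6.3 or later. The helper name `isolate` is hypothetical.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

def isolate(text):
    """Wrap `text` so that its direction is resolved from its own first
    strong character, without affecting the surrounding text."""
    return f"{FSI}{text}{PDI}"

# List the isolate controls and the Bidi_Class value reported for each.
for cp in range(0x2066, 0x206A):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))
```

A typical use would be inserting user-supplied text of unknown direction into a message template, for example `"Posted by " + isolate(username)`.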

PRI #233, Proposed Update UTR #20, Unicode in XML and other Markup Languages

This Unicode Technical Report will have its references corrected and various other small editorial changes made to bring it up-to-date with Unicode 6.2.
http://www.unicode.org/review/pri233/

To supply feedback on these issues, see http://www.unicode.org/review/#feedback .

Wednesday, September 26, 2012

Announcing The Unicode Standard, Version 6.2

Version 6.2 of the Unicode Standard is now available. This version adds only a single character, the newly adopted Turkish Lira sign; however, the properties and behaviors for many other characters have been adjusted. Emoji and pictographic symbols now have significantly improved line-breaking, word-breaking and grapheme cluster behaviors. The script categorizations for some characters are improved and better documented.

The Unicode Collation Algorithm has been greatly enhanced for Version 6.2, with a major overhaul of its documentation. There have also been significant changes to the collation weight tables, including improved handling of tertiary weights for characters with decompositions, and changed weights for some pictographic symbols.

The newly encoded Turkish Lira sign, like other currency symbols, is expected to be heavily used in its target environment. The Unicode Consortium accelerated the release of Unicode 6.2 to accommodate the urgent need for this character.

For more details of this release, see http://www.unicode.org/versions/Unicode6.2.0/.

Monday, September 10, 2012

CLDR Version 22 Released


Mountain View, CA, Sept. 10, 2012 - The Unicode® Consortium announced today the release of a new version of the Unicode Common Locale Data Repository (Unicode CLDR 22.0), providing key building blocks for software to support the world's languages.

Unicode CLDR 22.0 contains data for 215 languages and 227 territories, for 654 locales in all. The main focus for this release is to flesh out data items in major languages and locales, yielding an increase of over 100% in the total number of data fields. Other major features include the addition of keyboard mapping data for different platforms, the new Zhuyin (Bopomofo) sort order for Chinese, and script metadata. There are also enhancements to compact decimals (such as formatting 1,000,000 as “1 million” or “1M”) for different languages and to rule-based number formats (such as writing 423 as “four hundred and twenty-three”). For more details, see the CLDR 22.0 Release Note.

CLDR is used to adapt software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; and transliterating different alphabets.
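The compact decimal formats mentioned above can be pictured with a toy formatter. The sketch below is in Python and makes several assumptions: the suffix tables are hard-coded English examples rather than CLDR data, the function name `compact` is hypothetical, and rounding near a threshold boundary (for example 999,999) is not handled. A real implementation would read the per-locale patterns from CLDR, typically through a library such as ICU.

```python
# Hard-coded English patterns, for illustration only (not CLDR data).
SHORT = [(10**12, "T"), (10**9, "B"), (10**6, "M"), (10**3, "K")]
LONG = [(10**12, " trillion"), (10**9, " billion"),
        (10**6, " million"), (10**3, " thousand")]

def compact(n, style="short"):
    """Format `n` with at most one fractional digit and a magnitude suffix."""
    table = SHORT if style == "short" else LONG
    for threshold, suffix in table:
        if abs(n) >= threshold:
            value = n / threshold
            # Drop a trailing ".0" so 1000000 becomes "1" rather than "1.0"
            text = f"{value:.1f}".rstrip("0").rstrip(".")
            return text + suffix
    return str(n)

print(compact(1_000_000))          # prints 1M
print(compact(1_000_000, "long"))  # prints 1 million
print(compact(1_500))              # prints 1.5K
```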

It is by far the largest and most extensive standard repository of locale data, used by a wide spectrum of companies for their software internationalization and localization. It is widely deployed via International Components for Unicode (ICU), and also accessed directly by companies such as Apple, Google, IBM, Twitter, and many others. 

CLDR is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML)—an XML format used for general interchange of locale data, such as in Microsoft's .NET. See the charts pages for views of the CLDR data, organized in various ways. For more information about the Unicode CLDR project see cldr.unicode.org.
