Tuesday, January 31, 2012

Announcing the Unicode Standard, Version 6.1

Mountain View, January 31, 2012. The Unicode Consortium announces the release of Version 6.1 of the Unicode Standard, continuing Unicode's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added. For full details, see http://www.unicode.org/versions/Unicode6.1.0/.

This version of the Standard also brings technical improvements to support implementers. Changes to property values and their aliases mean that properties now have easy-to-specify labels. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and easier to validate.

Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish between the preferred text and emoji display styles. For example:

  26FA FE0E    TENT, text style
  26FA FE0F    TENT, emoji style
  26FD FE0E    FUEL PUMP, text style
  26FD FE0F    FUEL PUMP, emoji style
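
As a rough illustration of how the sequences listed above are formed, the following Python sketch appends a variation selector to a base character. The helper names with_style and hex_sequence are hypothetical, introduced only for this example. Whether a given font or renderer actually honors the selector is outside the sketch's control.

```python
import unicodedata

# Variation selectors used by the emoji Standardized Variants.
TEXT_VS = "\uFE0E"   # VARIATION SELECTOR-15: request text style
EMOJI_VS = "\uFE0F"  # VARIATION SELECTOR-16: request emoji style


def with_style(base: str, emoji: bool) -> str:
    """Return the base character followed by the chosen variation selector.

    Hypothetical helper: it does not check that the resulting sequence is
    actually listed in StandardizedVariants.txt.
    """
    return base + (EMOJI_VS if emoji else TEXT_VS)


def hex_sequence(seq: str) -> str:
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in seq)


if __name__ == "__main__":
    for base in ("\u26FA", "\u26FD"):  # TENT, FUEL PUMP
        for emoji in (False, True):
            seq = with_style(base, emoji)
            style = "emoji style" if emoji else "text style"
            print(hex_sequence(seq), unicodedata.name(base), style)
```

The sequence is simply two code points in a row, so an implementation that does not support the variant can still fall back to displaying the base character on its own.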

Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:
  • UTS #10, Unicode Collation Algorithm
  • UTS #46, Unicode IDNA Compatibility Processing

Friday, January 6, 2012

Release candidate for Unicode 6.1 character data

Because Unicode is at the foundation of all modern software using text, it is important to verify that problems are not introduced with new versions. If your implementation uses Unicode data, please download and test the final release candidate of the Unicode 6.1 data (UCD) with your implementation now. Please note that the Unicode Collation Algorithm (UCA) and the Unicode IDNA Compatibility Processing are correlated with version 6.1; if you have an implementation of them, please check the data below as well.

That data can be found in:
  1. Unicode
    1. http://unicode.org/Public/6.1.0/ucd/ (data, semicolon-delimited)
    2. http://unicode.org/Public/6.1.0/ucdxml/ (data, xml)
    3. http://www.unicode.org/reports/tr44/proposed.html (documentation)
  2. UCA
    1. http://unicode.org/Public/UCA/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr10/proposed.html (documentation)
  3. IDNA compatibility
    1. http://unicode.org/Public/idna/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr46/proposed.html (documentation)
For more information, see http://unicode.org/versions/beta.html.
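
For implementers who want to smoke-test the semicolon-delimited data files, a minimal Python reader might look like the sketch below. It assumes the common property-file layout (a code point or a start..end range, then semicolon-separated fields, with # starting a comment). It does not handle the special range convention used in UnicodeData.txt. The function name parse_ucd_lines and the sample lines are illustrative only and are not copied from a release file.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, List[str]]]:
    """Yield (start, end, fields) for each data line.

    Assumes the layout 'XXXX[..YYYY] ; field ; ... # comment'.
    Blank lines and comment-only lines are skipped.
    """
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop trailing comment
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        first, _, last = fields[0].partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start  # single code point
        yield start, end, fields[1:]


# Illustrative sample in the style of a script property file.
SAMPLE = [
    "# sample data for the sketch",
    "0041..005A    ; Latin # LATIN CAPITAL LETTER A..LATIN CAPITAL LETTER Z",
    "",
    "05D0..05EA    ; Hebrew # HEBREW LETTER ALEF..HEBREW LETTER TAV",
    "00AA          ; Latin # FEMININE ORDINAL INDICATOR",
]

if __name__ == "__main__":
    for start, end, fields in parse_ucd_lines(SAMPLE):
        print(f"{start:04X}..{end:04X}", fields)
```

Running the same reader over the previous and the candidate versions of a file, then comparing the results, is one simple way to spot unexpected changes before the release is final.
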
Note that at this point in the process, no substantive changes can be made unless:
  1. a problem is found in carrying out the actions directed by the Unicode Technical Committee for the release, or
  2. an editorial problem is found in the data comments or documentation.
The Unicode Consortium is planning to move up the release date of Unicode 6.1 (UCD and UAXes) to January instead of February, so any final comments should be made by January 6th. You can send your comments using the Contact Form (http://www.unicode.org/reporting.html).

The draft code charts for Unicode 6.1 have also been updated. We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.1 and to ensure that there are no regressions in glyph shapes for previously encoded characters. For links to the charts, see http://unicode.org/versions/beta.html.