Monday, September 30, 2013

Announcing The Unicode Standard, Version 6.3

The Unicode Consortium announces Version 6.3 of the Unicode Standard and, with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout, and it provides a mechanism for isolating runs of text.

Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many other languages. The display and positioning of parentheses will better match the behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements also bring greater interoperability and make it easier to insert text and assemble user interface elements.
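As a rough illustration of the isolation mechanism (a sketch, not code from the announcement), the Python below wraps each inserted value in U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE before placing it into a message. The helper names `isolate` and `build_message` are hypothetical, and correct display still depends on a renderer that implements the Version 6.3 Bidirectional Algorithm.

```python
# Two of the isolate format characters added in Unicode 6.3.
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: ends the isolated run


def isolate(text: str) -> str:
    """Wrap text so its direction cannot affect the surrounding message."""
    return f"{FSI}{text}{PDI}"


def build_message(template: str, **fields: str) -> str:
    """Fill a str.format template, isolating every inserted field (hypothetical helper)."""
    return template.format(**{key: isolate(value) for key, value in fields.items()})


# A Hebrew user name inserted into an English template.
message = build_message("{user} posted {count} photos",
                        user="\u05d3\u05e0\u05d9", count="3")
```

Using FSI rather than a fixed left-to-right or right-to-left isolate is a design choice for the case where the direction of the inserted text is not known in advance.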

The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.

In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue with the CJK compatibility ideographs: their appearance could change whenever a process normalized the text. Using the new standardized variation sequences, authors can write text that preserves the specific required shapes of these CJK ideographs, even under Unicode normalization.
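The normalization issue can be observed with Python's standard `unicodedata` module. In this sketch, U+F900 is a CJK compatibility ideograph whose canonical decomposition is U+8C48; pairing U+8C48 with VARIATION SELECTOR-1 (U+FE00) as the matching standardized variation sequence is an assumption made for illustration, and the authoritative list is the Unicode Character Database file StandardizedVariants.txt.

```python
import unicodedata

compat = "\uF900"             # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"            # its canonical decomposition, a unified ideograph
with_vs = unified + "\uFE00"  # assumed variation sequence (see StandardizedVariants.txt)


def survives_nfc(text: str) -> bool:
    """Return True if NFC normalization leaves the text unchanged."""
    return unicodedata.normalize("NFC", text) == text


# The compatibility ideograph is replaced by the unified ideograph ...
lost = not survives_nfc(compat)
# ... while the base character plus variation selector passes through intact.
kept = survives_nfc(with_vs)
```

Because the variation selector has no decomposition of its own, the sequence carries the shape distinction through normalization instead of losing it.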

Version 6.3 includes other improvements as well:
  • Improved Unihan data to better align with ISO/IEC 10646
  • Better support for Hebrew word break behavior and for ideographic space in line breaking

Get started with Unicode 6.3 today!

Wednesday, September 18, 2013

CLDR Version 24 released

Unicode CLDR 24 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Unicode CLDR 24 focused on additional structure for formatting units, dates, and times, and on improving data coverage. This version contains data for 238 languages and 259 territories, 740 locales in all. Ten languages were added to the 100%-modern-coverage list, for a total of 70 languages. Between the new languages and the new structure, more data was entered than in any previous release.

The new structure focused primarily on formatting of units and improvements to date and time formatting.
  • fractional plural forms. major extension to handle fractions (e.g., some languages use the equivalent of “1.2 teaspoons” but “2.1 teaspoon”)
  • measurement units. many additional unit types (“10.3 kg”), in up to 6 plural forms per language
  • compound units. video length: "23 hrs, 7 mins", or "23:07"
  • dates/times. new relative fields such as "last Sunday" and "now"; 12-hour time formats that omit "am/pm"; neutral eras ("405 BCE"); additional timezone fallback regional patterns ("{city} Daylight Time")
  • number formatting. exponential notation (1.42×10²³), at-least ("99+"), ranges ("3.5-4.5 kg"), narrow currency symbols (both "US$12.23" and "$12.23").
  • collation. major simplification of rule syntax, updated root files to Unicode 6.3; preliminary version of European Ordering Rules; documentation of the CLDR Collation Algorithm (extending UCA)
  • JSON. improved support, including new structure and data.

In addition, the data already present from CLDR v23 was reviewed for the supported languages, and many improvements were made.
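To make the fractional plural forms item more concrete, here is a minimal sketch of how a plural category can depend on the visible fraction digits of a formatted number. The operand names `n`, `i`, `v`, and `f` follow the CLDR plural rules specification; the function names and the single hard-coded English rule are assumptions for illustration only, and real software should read the rules from the CLDR data rather than hard-coding them.

```python
from decimal import Decimal


def plural_operands(source: str) -> dict:
    """Derive plural operands from a plain decimal string such as "1.20"."""
    int_part, _, frac = source.lstrip("+-").partition(".")
    return {
        "n": abs(Decimal(source)),  # absolute value of the number
        "i": int(int_part or "0"),  # integer digits
        "v": len(frac),             # count of visible fraction digits
        "f": int(frac or "0"),      # visible fraction digits as an integer
    }


def english_category(source: str) -> str:
    """Apply the English rule "one: i = 1 and v = 0"; everything else is "other"."""
    op = plural_operands(source)
    return "one" if op["i"] == 1 and op["v"] == 0 else "other"
```

Under this rule "1" selects the "one" form, while "1.0" and "1.2" select "other", which is why the operands are taken from the formatted string rather than from a floating-point value.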

Details of coverage improvements and new features are provided in, along with a detailed Migration section.

About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium:

Monday, September 16, 2013

Henry Luce Foundation Grant to Unicode in Support of Encoding Tangut

The Consortium is very pleased to announce a generous grant from the Henry Luce Foundation to support progress on encoding Tangut. The Luce Foundation has made a one-time grant to the Unicode Consortium to support a December 2013 meeting that will advance work on the Tangut script toward its eventual incorporation into the Unicode Standard and the associated ISO/IEC 10646 International Standard. The meeting will bring together scholars of Tangut and experts in the character encoding process to agree on the character repertoire for this large and complex script. Work on this grant is directed by Dr. Deborah Anderson, Technical Director of the Consortium and Project Leader of UC Berkeley's Script Encoding Initiative.

Wednesday, September 4, 2013

UTR #50, Unicode Vertical Text Layout, now published

The Unicode Consortium is pleased to announce the first published version of UTR #50, Unicode Vertical Text Layout. Until now, vertical text layout has been challenging for implementers because character orientation is somewhat ambiguous, and implementations have often relied on information buried deep within font formats. This Unicode Technical Report lays the groundwork for an unambiguous vertical text layout model that can serve a broad range of environments. Details can be found at

For general information about Unicode Technical Reports, please see