Friday, March 15, 2013

CLDR Version 23 Released

Unicode CLDR 23 has been released, providing an update to the key building blocks for software supporting the world's languages.

Unicode CLDR 23.0 contains data for 215 languages and 227 territories, 654 locales in all. This release focused primarily on improvements to the LDML structure and tools, and on consistency of data. It includes substantially improved support for non-Gregorian calendars (such as the Japanese Imperial calendar used extensively in Japan). The data and structure have also been modified to make it easy to switch between 12-hour and 24-hour formats, and between 2-digit and 4-digit years. The new Unicode character is now used for the Turkish Lira, and information is provided for currencies that round to 5 cents (or other subunits) in cash transactions. For most languages that use non-Latin scripts, characters in the language’s script now collate before those in other scripts (including A-Z). Language-specific letter-casing changes (Lower, Upper, Title) have been added for Azerbaijani, Greek, Lithuanian, and Turkish. Keyboard data has also been updated for Android. Also, as of this release, the LDML specification is split into multiple parts, each focusing on a particular area.
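As an illustration of the cash-rounding information mentioned above, here is a minimal sketch of rounding an amount to a 5-cent increment. The function name `round_cash`, the default increment, and the half-up rounding mode are assumptions made for this example. They are not taken from the CLDR data or the LDML specification, which should be consulted for the actual per-currency values.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_cash(amount, increment=Decimal("0.05")):
    """Round `amount` to the nearest multiple of `increment`.

    Hypothetical helper: the increment would normally come from
    per-currency data; half-up rounding is an assumption here.
    """
    # Count how many increments fit into the amount, rounded to a whole number.
    steps = (amount / increment).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    # Scale back to a currency amount.
    return steps * increment


if __name__ == "__main__":
    for raw in ("12.37", "12.38", "1.025"):
        print(raw, "->", round_cash(Decimal(raw)))
```

Using `Decimal` rather than binary floating point keeps amounts such as 0.05 exact, which matters when the rounding step itself is the point of the calculation.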

The release had a short cycle so that we could move to the new regular semi-annual schedule. It thus included only a limited data submission phase, for four languages: Armenian (hy), Georgian (ka), Mongolian (mn), and Welsh (cy). For those languages, the data increased by over 100%.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium.

Tuesday, March 12, 2013

Unicode 6.3 Beta Review

The Unicode® Consortium today announced the start of beta review for the forthcoming Unicode 6.3.0. All beta feedback must be submitted by April 29, 2013.

The main features of Unicode 6.3 are an update to the Unicode Bidirectional Algorithm and five newly encoded bidirectional format control characters: U+061C ARABIC LETTER MARK and the isolate span controls U+2066..U+2069. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database.
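For developers who want to look at the five new format controls, the following sketch lists them using Python's standard `unicodedata` module. It assumes a Python build whose bundled Unicode Character Database is version 6.3 or later. The helper names `describe` and `isolate` are made up for this example.

```python
import unicodedata

# U+061C plus the isolate span controls U+2066..U+2069.
BIDI_CONTROLS_6_3 = [0x061C] + list(range(0x2066, 0x206A))


def describe(cp):
    """Return (code point label, character name, Bidi_Class) for `cp`."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>"),
            unicodedata.bidirectional(ch))


def isolate(text):
    """Wrap `text` in FIRST STRONG ISOLATE ... POP DIRECTIONAL ISOLATE.

    Hypothetical helper showing one way the isolate controls can be used
    to keep embedded text from affecting the surrounding bidi ordering.
    """
    return "\u2068" + text + "\u2069"


if __name__ == "__main__":
    for cp in BIDI_CONTROLS_6_3:
        print(*describe(cp))
```

Whether the isolates take visual effect depends on the rendering system implementing the updated algorithm. The code above only inspects and inserts the characters.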

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML, ...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.3.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.3.0 in June 2013.

• See, for information about testing the 6.3.0 beta.
• See for the current draft summary of Unicode 6.3.0.

Wednesday, March 6, 2013

In Memoriam page for Unicode contributors

Unicode is a project that has been built by hundreds of people over many decades. Some people involved in this project are no longer with us, and we wish to remember their contributions:

Tuesday, March 5, 2013

Specifying Optional Conjuncts in Malayalam

The UTC has posted a new Public Review Issue regarding a proposal to specify optional conjuncts in Malayalam.

In Malayalam there are two prevailing orthographies, traditional and reformed. Both are written using the same Malayalam character set. The difference between them is typically manifested only by the font. Traditional orthography accommodates more full conjuncts, while the reformed orthography would use visible virama (Chandrakkala) separated sequences for many of those full conjuncts.

This proposal specifies the further use of ZWJ and ZWNJ in sequences in the Malayalam script to indicate preferences for optional display of conjuncts. Such sequences are intended to indicate the preferences, both for rendering systems that support the reformed Malayalam orthography and for systems that support the traditional Malayalam orthography.
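To make the kind of sequence under discussion concrete, the sketch below builds a consonant + virama + consonant sequence with and without a joiner and lists the code points. The choice of MALAYALAM LETTER KA, the placement of the joiner directly after the virama, and the helper name `code_points` are assumptions for illustration only. The exact sequences and their intended display are defined in the proposal itself and are not reproduced here.

```python
import unicodedata

KA = "\u0D15"      # MALAYALAM LETTER KA (arbitrary example consonant)
VIRAMA = "\u0D4D"  # MALAYALAM SIGN VIRAMA (Chandrakkala)
ZWNJ = "\u200C"    # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"     # ZERO WIDTH JOINER


def code_points(text):
    """List each character of `text` as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# Joiner placement after the virama is an assumption for this example.
plain = KA + VIRAMA + KA
with_zwnj = KA + VIRAMA + ZWNJ + KA
with_zwj = KA + VIRAMA + ZWJ + KA

if __name__ == "__main__":
    for label, seq in (("plain", plain), ("ZWNJ", with_zwnj), ("ZWJ", with_zwj)):
        print(label, code_points(seq))
```

How each sequence is displayed depends on the font and rendering system. The code only shows how the underlying text differs.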

The UTC is seeking feedback on this proposal, regarding its advisability and potential impacts on implementations, as well as any suggestions for alternative approaches to the issues raised in the background document.

Friday, March 1, 2013

New FAQ on Private-use Characters, Noncharacters and Sentinels

A new FAQ page devoted to the topic of private-use characters, noncharacters, and sentinels has been posted on the Unicode web site. This FAQ aims to clear up confusion about whether noncharacters are permitted in Unicode text, and how they differ from ordinary private-use characters. The recently published Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permitted even in interchange, and the new FAQ page addresses some of the fine points about their usage and about differences from other types of Unicode code points. The brief mentions of noncharacters in other FAQ pages have also been updated accordingly.

Are you unclear about what Unicode “noncharacters” even are? The new FAQ page also answers basic questions about noncharacters and private-use characters, and provides a bit of history about how they came to be part of the Unicode Standard.
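As a quick illustration of the distinction, the sketch below classifies code points as noncharacters or private-use characters. The ranges follow the Unicode Standard: the noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, and the private-use code points are U+E000..U+F8FF plus planes 15 and 16 minus their final two code points. The function names are made up for this example.

```python
MAX_CODE_POINT = 0x10FFFF


def is_noncharacter(cp):
    """True for the 66 noncharacter code points."""
    if not 0 <= cp <= MAX_CODE_POINT:
        return False
    # U+FDD0..U+FDEF, or the last two code points of any plane (U+nFFFE, U+nFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_private_use(cp):
    """True for code points in the three private-use areas."""
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


if __name__ == "__main__":
    total = sum(is_noncharacter(cp) for cp in range(MAX_CODE_POINT + 1))
    print("noncharacters:", total)
    # Python's UTF-8 codec accepts noncharacters and round-trips them unchanged.
    print("\uFDD0".encode("utf-8").decode("utf-8") == "\uFDD0")
```

The two sets do not overlap: the final two code points of planes 15 and 16 are noncharacters, not private-use characters.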