Wednesday, November 1, 2017

CLDR Version 32 Released

Unicode CLDR 32 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for internationalization and localization: adapting software to the conventions of different languages for common software tasks.

Some of the improvements in the release are:
  • More complete data
    • Major contributions of main locale data for Chakma (ccp), Sindhi (sd), Odia (or), Kabyle (kab), Pashto (ps), Turkmen (tk), Norwegian Nynorsk (nn), Assamese (as), and others.
    • Rule-based number formats for Indian English, Akan, Hindi (oblique), Cherokee; revisions to some others.
    • Import of draft subdivision names and language groups from Wikidata.
  • New data types
    • Numeric exemplars. For example, in zh: [\- , . % ‰ + 0 1 2 3 4 5 6 7 8 9 〇 一 七 三 九 二 五 八 六 四]
    • “Disjunctive” list style (e.g., “a, b, or c”)
    • AvailableFormats items for day periods (skeleton “Bhm” → pattern “h:mm B” → “1:30 in the afternoon”)
  • Major additions for Emoji
    • Emoji name and keyword updates for Unicode 10 and Emoji 5.0 (minor updates for English, full data collection for other languages). Keywords now in sorted order.
    • Adjustments to emoji collation
For further details and links to documentation, see the CLDR Release Notes.
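
As a rough illustration of the “disjunctive” list style mentioned above, the sketch below composes a list from CLDR-style two/start/middle/end patterns. The function name `format_list` and the English pattern strings are assumptions chosen to reproduce the “a, b, or c” example; the authoritative patterns are in the CLDR locale data, and real applications would normally rely on a library such as ICU rather than code like this.

```python
# Illustrative English "or" patterns in the CLDR list-pattern style.
# These strings are assumptions for this sketch, not copied from CLDR data.
OR_PATTERNS_EN = {
    "2": "{0} or {1}",
    "start": "{0}, {1}",
    "middle": "{0}, {1}",
    "end": "{0}, or {1}",
}


def format_list(items, patterns=OR_PATTERNS_EN):
    """Join items using two/start/middle/end list patterns (hypothetical helper)."""
    items = list(items)
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return patterns["2"].format(items[0], items[1])
    # Build from the right: "end" joins the last two items, "middle"
    # prepends each inner item, and "start" prepends the first item.
    result = patterns["end"].format(items[-2], items[-1])
    for item in reversed(items[1:-2]):
        result = patterns["middle"].format(item, result)
    return patterns["start"].format(items[0], result)


if __name__ == "__main__":
    print(format_list(["a", "b", "c"]))
```

Keeping the patterns in a dictionary means another locale or list style (for example a conjunctive “and” list) can be supplied as a different set of four strings without changing the joining logic.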

ICU 60 Released

Unicode® ICU 60 has just been released! ICU is a software library widely used by products and other libraries to support the world's languages, implementing the latest versions of both the Unicode encoding standard and the Unicode locale data (CLDR).

ICU 60 upgrades to Unicode 10 and CLDR 32, and ICU4J has been tested with Java 9. ICU 60 includes a new API for number formatting. There are many more features and bug fixes.

For details please see

Tuesday, October 31, 2017

Graphemics in the 21st Century Conference

The Unicode Consortium is pleased to announce a conference that may be of interest to our user community.

/gʁafematik/ 2018 is the first conference bringing together disciplines concerned with writing systems and their representation in written communication. The conference aims to reflect on the current state of research in the area, and on the role that writing and writing systems play in neighboring disciplines like computer science and information technology, communication, typography, psychology, and pedagogy. In particular it aims to study the effect of the growing importance of Unicode with regard to the future of reading and writing in human societies. Reflecting the richness of perspectives on writing systems, /gʁafematik/ is actively interdisciplinary, and welcomes proposals from researchers from the fields of computer science and information technology, linguistics, communication, pedagogy, psychology, history, and the social sciences.

/gʁafematik/ aims to create a space for the discussion of the range of approaches to writing systems, and specifically to bridge approaches in linguistics, informatics, and other fields. It will provide a forum for explorations in terminology, methodology, and theoretical approaches relating to the delineation of an emerging interdisciplinary area of research that intersects with intense activity in practical implementations of writing systems.

The conference will be held at IMT Atlantique (formerly Télécom Bretagne) in Brest, France, on June 14-16, 2018.

Topics will include:
  • Epistemology of graphemics: history, onomastics, topics, interaction with other disciplines
  • Foundations of graphemics
  • History and typology of writing systems, comparative graphemics
  • Semiotics of writing and of writing systems
  • Computational/formal graphemics
  • Graphemic theory of Unicode encoding
For more information, please consult:

Friday, October 13, 2017

New Gold Sponsor comprigo

The Unicode Consortium is pleased to announce that comprigo is now a gold sponsor for:
comprigo's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. Adopt-a-Character (AAC) donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.
As a product- and price comparison website, comprigo redefines the assets of online shopping by helping our customers make the best possible purchase decision. With our permanent adoption we want to make a statement and support the unique visual semantics of Unicode Consortium in a world where visual communication becomes more and more important. As a globally active software company, we want to support Unicode by not just preserving linguistic heritage, but also by enabling intercultural communication. The comprigo Moneybag is a perfect representation of what users gain from using our services: purchase the best product for the best price.  — comprigo
The Unicode Consortium thanks comprigo for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption; see Adopt a Character.

Monday, September 25, 2017

Proposed Draft UTR #53, Unicode Arabic Mark Ordering Algorithm Now Available for Public Review

The Unicode Consortium has released Proposed Draft Unicode Technical Report #53, Unicode Arabic Mark Ordering Algorithm. This UTR describes an algorithm for determining correct rendering of Arabic combining mark sequences.

The combining classes of Arabic combining characters in Unicode are a mixture of special classes for specific marks plus two more generalized classes for all the other marks. For many years this has resulted in inconsistent rendering for sequences with multiple combining marks such as:

The algorithm described in this UTR provides a method to reorder Arabic combining marks in order to accomplish the following goals:
  • Ensure that the inside-out rendering rule displays combining marks in the expected visual order.
  • Ensure identical display of canonically equivalent sequences.
  • Provide a mechanism for overriding the display order in exceptional cases.
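
To make the background on combining classes and canonical equivalence more concrete, the following sketch uses Python's standard `unicodedata` module to inspect a few Arabic marks and to show that two typing orders of shadda and fatha are canonically equivalent. It is not the algorithm from the UTR; the helper names `combining_classes` and `canonically_equivalent` are hypothetical, and the chosen characters are only examples.

```python
import unicodedata

# Example characters (chosen for illustration only).
BEH = "\u0628"          # ARABIC LETTER BEH, a base letter
FATHA = "\u064E"        # ARABIC FATHA
SHADDA = "\u0651"       # ARABIC SHADDA
HAMZA_ABOVE = "\u0654"  # ARABIC HAMZA ABOVE
HAMZA_BELOW = "\u0655"  # ARABIC HAMZA BELOW


def combining_classes(text):
    """Return (code point, canonical combining class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    # Fatha and shadda have their own mark-specific classes, while the
    # two hamza marks fall into generalized below/above classes.
    print(combining_classes(FATHA + SHADDA + HAMZA_ABOVE + HAMZA_BELOW))

    typed_one = BEH + SHADDA + FATHA
    typed_two = BEH + FATHA + SHADDA
    # Normalization sorts marks by combining class, so both typing
    # orders collapse to the same stored sequence.
    print(canonically_equivalent(typed_one, typed_two))
```

Because normalization can silently reorder such marks, a renderer that draws them strictly in stored order may stack them differently from what the author typed, which is the kind of inconsistency the goals above are meant to address.
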
The document is in “Proposed Draft” state and is available for public review and comment. Information about this type of document can be found on the About Unicode Technical Reports page.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the PRI #359 page.