Monday, October 24, 2016

ICU 58 Released

Unicode® ICU version 58 has just been released! ICU is the main avenue for many software products and libraries to support the world's languages, implementing both the latest version of the Unicode encoding standard and of the Unicode locale data (CLDR).

ICU 58 provides full support for the recent Unicode 9.0 release with 7,500 new characters and many property improvements. It covers the Unicode 9.0 emoji characters — plus the latest draft version of Emoji 4.0 — for a total of 2,444 emoji characters and sequences, including the new ZWJ sequences for gendered professions; ICU word & line breaking is updated for Emoji 4.0. ICU 58 incorporates the latest version 30 of Unicode CLDR locale data with a significant increase in data coverage.
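
As a rough illustration of the ZWJ sequences for gendered professions mentioned above, the following Python sketch joins a person emoji and an object emoji with U+200D ZERO WIDTH JOINER. It uses only the Python standard library rather than ICU, and the helper names `profession` and `describe` are made up for this example. Whether a sequence renders as a single glyph depends on the fonts and platform in use.

```python
import unicodedata

ZWJ = "\u200d"    # ZERO WIDTH JOINER, glues the parts of a sequence together
VS16 = "\ufe0f"   # VARIATION SELECTOR-16, requests emoji presentation

WOMAN = "\U0001F469"
STAFF_OF_AESCULAPIUS = "\u2695"   # object used for "health worker"
MICROSCOPE = "\U0001F52C"         # object used for "scientist"


def profession(person, obj, needs_vs16=False):
    """Hypothetical helper: build person + ZWJ + object (+ VS16 if requested)."""
    seq = person + ZWJ + obj
    if needs_vs16:
        seq += VS16
    return seq


def describe(seq):
    """Hypothetical helper: list each code point with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in seq]


woman_scientist = profession(WOMAN, MICROSCOPE)
# U+2695 is text-style by default, so the sequence carries VS16 after it.
woman_health_worker = profession(WOMAN, STAFF_OF_AESCULAPIUS, needs_vs16=True)

for line in describe(woman_health_worker):
    print(line)
```

Each sequence is still several code points in storage (three for the scientist, four for the health worker), which is why word and line breaking need to know not to split them.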

There are a number of new APIs, including ones for measurement system unit display names (such as “acre” or “Hektar” in 80 languages), and improvements in performance and robustness. For Java, the unit tests have been converted to JUnit, for easier and faster integration into test suites.

For details please see http://site.icu-project.org/download/58

Wednesday, October 5, 2016

CLDR Version 30 Released

Unicode CLDR 30 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

  • Unicode support is updated to 9.0, including updated Unihan readings for the pinyin collation and Han-Latin transforms, and support for new script codes and number systems.
  • The set of language codes for translation has been updated, with a significant increase in the total number of translated language names.
  • Substantial new data has been added for likely subtags (e.g., to get the main script for each language).
  • New data items have been added to support relative times such as “3 Fridays ago” or “this hour”.
  • New draft format and preference structure has been added to support week designations such as “the week of August 10” or “week 3 of March”.
  • New <characterlabels> data can be used to generate labels for groups of related characters in character pickers.
  • The structure for emoji annotations has been revised, and the data has been significantly updated. The emoji collation has been updated, and data is added for improved segmentation behavior. Added a specification for synthesizing ZWJ sequence names.
  • The CLDR 30 Survey Tool data collection resulted in a net increase in data items of about 9.2%, with an additional 5.9% of items changed.
For further details and links to documentation, see the CLDR Release Notes.
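
To illustrate the likely subtags item above, here is a minimal Python sketch of how such data can be used to find the main script for a language. The table is a tiny hand-copied excerpt in the style of the CLDR likely subtags data, not the full data set, and `LIKELY_SUBTAGS` and `main_script` are names invented for this example. Real applications would normally go through a library such as ICU instead of a hard-coded table.

```python
# Small illustrative excerpt: language -> language_Script_REGION.
# The real CLDR data covers far more languages and more complex inputs.
LIKELY_SUBTAGS = {
    "en": "en_Latn_US",
    "ru": "ru_Cyrl_RU",
    "ja": "ja_Jpan_JP",
    "hi": "hi_Deva_IN",
    "zh": "zh_Hans_CN",
}


def main_script(language):
    """Return the likely script code for a bare language code, or None if unknown."""
    full = LIKELY_SUBTAGS.get(language)
    if full is None:
        return None
    _language, script, _region = full.split("_")
    return script


for code in ("en", "ru", "zh", "xx"):
    print(code, main_script(code))
```

Returning `None` for an unknown language keeps the sketch simple; a fuller implementation would also handle inputs that already carry a script or region subtag.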