Thursday, December 1, 2016

Support Unicode with an Adopt-a-Character Gift this Holiday Season!

This holiday season you can give a unique gift by adopting any emoji, letter, or symbol, and help support the Unicode Consortium’s mission to enable all languages to be used on computers. Three levels of sponsorship are available, starting at $100. With over 128,000 characters to choose from, you are certain to find an appropriate character for even the most demanding recipient. All sponsors will receive a custom digital badge featuring the adopted character for use on the web and elsewhere. Sponsors at the two highest levels will receive a special thank-you gift engraved with the name you supply and the adopted character.

The program funds work on “digitally disadvantaged” languages, both modern and historic. In 2016 the program awarded a grant to support work on a proposal for the Hanifi Rohingya script. The program has also funded work on Egyptian hieroglyphs and Mayan hieroglyphs.

In its first year, the Adopt-a-Character program has had nearly 400 sponsors. Be part of the next wave, with a worthwhile gift!

For more information on the program, or to adopt a character, see the Adopt-a-Character Page.

Monday, November 28, 2016

113 New Unicode Emoji (plus skin tones)

113 new emoji are now available in UTR #51 Unicode Emoji, Version 4.0. The main focus of this 4.0 release is further enhancing gender representation and professions. These new emoji are already appearing on smartphones and other devices and platforms that support emoji. See the full list in Emoji Recently Added.

The new emoji will soon be available for adoption, helping fund projects to improve language support.

Unlike the 72 emoji characters added to Unicode 9.0 in June, these are not new Unicode characters. Most of these new emoji are sequences of existing emoji, “glued together” with a special invisible character so that they appear and behave like a single character. This glue character is called a ZWJ (zero width joiner, U+200D), pronounced “zwidge” or /zwɪdʒ/. Three existing Unicode 9.0 characters (gender and medical symbols) were changed to qualify as emoji, for use in those ZWJ sequences.
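
As a rough illustration of the “glue” idea, the Python sketch below joins two existing emoji with U+200D to form the woman astronaut sequence. The helper names `zwj_sequence` and `code_points` are made up for this example, and whether the result displays as a single glyph depends on the fonts and platform in use.

```python
# Sketch: gluing existing emoji together with ZWJ (U+200D).
# Helper names are illustrative, not part of any Unicode or ICU API.
ZWJ = "\u200D"  # ZERO WIDTH JOINER, the invisible "glue" character


def zwj_sequence(*parts: str) -> str:
    """Join existing emoji into a single ZWJ sequence."""
    return ZWJ.join(parts)


def code_points(text: str) -> list:
    """List the code points in text as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in text]


WOMAN = "\U0001F469"   # existing emoji: woman
ROCKET = "\U0001F680"  # existing emoji: rocket

woman_astronaut = zwj_sequence(WOMAN, ROCKET)

# Three code points underneath, even where it renders as one emoji.
print(code_points(woman_astronaut))
```

A platform without support for the sequence would typically fall back to showing the woman and the rocket side by side.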

Two of the new sequences are flags, 10 are family groupings (such as mother with daughter), 32 are new professions/roles (such as man or woman astronaut), and 66 are explicit-gendered variants (such as man or woman running). 99 of these sequences, plus 5 other characters (such as snowboarder), can also now have the 5 skin tone modifiers.
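
The skin tone mechanism can be sketched in a few lines of Python. This is a minimal sketch that assumes the five modifiers are the code points U+1F3FB through U+1F3FF and that a modifier directly follows the person character it applies to; the function names are hypothetical.

```python
# Sketch: applying the five skin tone modifiers (U+1F3FB..U+1F3FF).
# Function names are illustrative only.
ZWJ = "\u200D"
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F3FF + 1)]


def with_skin_tones(base: str) -> list:
    """Return the five skin tone variants of a single emoji character."""
    return [base + modifier for modifier in SKIN_TONE_MODIFIERS]


def zwj_with_skin_tone(person: str, modifier: str, *rest: str) -> str:
    """Build a ZWJ sequence whose first (person) element carries a modifier."""
    return ZWJ.join([person + modifier, *rest])


SNOWBOARDER = "\U0001F3C2"
snowboarder_variants = with_skin_tones(SNOWBOARDER)

# Woman astronaut with the third (medium) modifier: the modifier sits
# right after the woman character, before the ZWJ.
astronaut_medium = zwj_with_skin_tone("\U0001F469", SKIN_TONE_MODIFIERS[2], "\U0001F680")
print([f"U+{ord(ch):04X}" for ch in astronaut_medium])
```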

The technical documentation has also been updated, with additional guidelines for implementers and the new versions of the emoji data files for use in programs.

Wednesday, November 16, 2016

Proposed Update UTS #37, Unicode Ideographic Variation Database

The Unicode Consortium has posted a new issue for public review and comment.

UTS #37, Unicode Ideographic Variation Database, is being updated to broaden the scope of base characters, from characters with the Unified_Ideograph property to characters with the Ideographic property, excluding characters that canonically or compatibly decompose. The substantive changes can be found in Section 2, Description. This proposed update is currently under review with a closing date of 2017-01-16. For more information, please see Public Review Issue #337.
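
For readers unfamiliar with the shape of an ideographic variation sequence, here is a minimal Python sketch. It assumes a sequence is one base character followed by one variation selector in the range U+E0100..U+E01EF, and it checks only the selector: Python's standard library does not expose the Unified_Ideograph or Ideographic properties, so the base-character test that this update is about is deliberately left out. The function name `split_ivs` is hypothetical.

```python
# Sketch: recognizing the *shape* of an ideographic variation sequence.
# The base-character property check from UTS #37 is NOT implemented here.
IVS_SELECTOR_FIRST = 0xE0100  # VARIATION SELECTOR-17
IVS_SELECTOR_LAST = 0xE01EF   # VARIATION SELECTOR-256


def split_ivs(text: str):
    """Return (base, selector) if text looks like an IVS, else None."""
    if len(text) != 2:
        return None
    base, selector = text[0], text[1]
    if IVS_SELECTOR_FIRST <= ord(selector) <= IVS_SELECTOR_LAST:
        return base, selector
    return None


# U+845B followed by VARIATION SELECTOR-17, used here purely as an example.
candidate = "\u845B\U000E0100"
print(split_ivs(candidate) is not None)
```

A real validator would also confirm that the sequence is registered in the database and that the base character meets the property requirement described above.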

Monday, October 24, 2016

ICU 58 Released

Unicode® ICU version 58 has just been released! ICU is the main avenue for many software products and libraries to support the world's languages, implementing the latest versions of both the Unicode encoding standard and the Unicode locale data (CLDR).

ICU 58 provides full support for the recent Unicode 9.0 release with 7,500 new characters and many property improvements. It covers the Unicode 9.0 emoji characters — plus the latest draft version of Emoji 4.0 — for a total of 2,444 emoji characters and sequences, including the new ZWJ sequences for gendered professions; ICU word & line breaking is updated for Emoji 4.0. ICU 58 incorporates the latest version 30 of Unicode CLDR locale data with a significant increase in data coverage.

There are a number of new APIs, including ones for measurement system unit display names (such as “acre” or “Hektar” in 80 languages), and improvements in performance and robustness. For Java, the unit tests have been converted to JUnit, for easier and faster integration into test suites.

For details please see

Wednesday, October 5, 2016

CLDR Version 30 Released

Unicode CLDR 30 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

  • Unicode support is updated to 9.0, including updated Unihan readings for the pinyin collation and Han-Latin transforms, and support for new script codes and number systems.
  • The set of language codes for translation has been updated, with a significant increase in the total number of translated language names.
  • Substantial new data has been added for likely subtags (e.g., to get the main script for each language).
  • New data items have been added to support relative times such as “3 Fridays ago” or “this hour”.
  • New draft format and preference structure has been added to support week designations such as “the week of August 10” or “week 3 of March”.
  • New <characterlabels> data can be used to generate labels for groups of related characters in character pickers.
  • The structure for emoji annotations has been revised, and the data has been significantly updated. The emoji collation has been updated, and data has been added for improved segmentation behavior. A specification for synthesizing ZWJ sequence names has been added.
  • The CLDR 30 Survey Tool data collection resulted in a net increase in data items of about 9.2%, with an additional 5.9% of items changed.
For further details and links to documentation, see the CLDR Release Notes.
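
The likely-subtags item in the list above (getting the main script for a language) can be illustrated with a small Python sketch. The table here is a tiny hand-written stand-in with only four entries, not the CLDR data itself, and `main_script` is a hypothetical helper; real code would load the full likely-subtags data from CLDR or use a library such as ICU.

```python
# Sketch: looking up the main script for a language via likely subtags.
# LIKELY_SUBTAGS is a small illustrative subset, not the full CLDR table.
LIKELY_SUBTAGS = {
    "en": "en_Latn_US",
    "ru": "ru_Cyrl_RU",
    "zh": "zh_Hans_CN",
    "ja": "ja_Jpan_JP",
}


def main_script(language: str):
    """Return the likely script code for a language, or None if unknown."""
    full_tag = LIKELY_SUBTAGS.get(language)
    if full_tag is None:
        return None
    # The expanded tag has the form language_Script_REGION.
    _language, script, _region = full_tag.split("_")
    return script


for lang in ("en", "ru", "zh"):
    print(lang, main_script(lang))
```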