The Unicode Blog: September 2015

Wednesday, September 30, 2015

UAX #29, Unicode Text Segmentation, update to improve Mongolian word segmentation

Unicode Standard Annex #29, Unicode Text Segmentation, will be updated for Unicode 9.0. A draft of the proposed update is available for general public review and comment.

The Word_Break classification of U+202F NARROW NO-BREAK SPACE (NNBSP) is revised to correct the text segmentation behavior of U+202F for Mongolian usage. For further background on this issue and possible ways to address it, see PRI #308, Property Change for U+202F NARROW NO-BREAK SPACE (NNBSP).
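To illustrate why the classification matters, here is a minimal Python sketch. It is a toy splitter and not the UAX #29 algorithm: the function name toy_words, its joiners parameter, and the sample string are hypothetical and exist only for illustration. The sketch assumes that NNBSP sits between a Mongolian stem and a suffix, and shows how the result changes depending on whether the character is allowed to remain inside a word.

```python
import unicodedata

NNBSP = "\u202f"  # U+202F NARROW NO-BREAK SPACE

# Hypothetical sample: a run of Mongolian letters, NNBSP, then a short suffix.
STEM = "\u182e\u1823\u1829\u182d\u1823\u182f"
SUFFIX = "\u1824\u1828"
SAMPLE = STEM + NNBSP + SUFFIX


def toy_words(text, joiners=frozenset()):
    """Split text into runs of letters.

    Characters listed in `joiners` do not end a run when they follow a letter.
    This is a simplified illustration, NOT the UAX #29 word boundary rules.
    """
    strip_chars = "".join(joiners)
    words, current = [], ""
    for ch in text:
        is_letter = unicodedata.category(ch).startswith("L")
        if is_letter or (ch in joiners and current):
            current += ch
        else:
            if current:
                # Drop any joiner left dangling at the end of a run.
                words.append(current.rstrip(strip_chars))
            current = ""
    if current:
        words.append(current.rstrip(strip_chars))
    return words


print(unicodedata.name(NNBSP), unicodedata.category(NNBSP))
print(len(toy_words(SAMPLE)))                   # NNBSP treated as a break
print(len(toy_words(SAMPLE, joiners={NNBSP})))  # NNBSP kept inside the word
```

With the default settings the sample splits into two pieces at the NNBSP; when NNBSP is passed as a joiner, the stem and suffix stay together as one word. A real implementation would rely on the Word_Break property values in the UCD data files rather than on a hand-written set.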

In this revision, the formerly empty Prepend class of the Grapheme_Cluster_Break property is redefined to consist of all prefixed format control characters and a few other characters with certain Indic_Syllabic_Category property values.

The corresponding property value changes will be incorporated in the UCD data files for Unicode 9.0.

Thursday, September 17, 2015

CLDR Version 28 Released

Unicode CLDR 28 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

General locale data. Overall, about 5% of the data items in this release are new (see Growth), while about 8% have corrections. Notable changes include a major review of and improvement to Spanish locales for Latin America; the addition of two new “modern-coverage” locales (Belarusian and Irish); and moving certain data from en_GB to en_001 for improved quality and reduced data size in locales that use en_GB conventions.
Formatting. There are a number of new units and types of formats. The day period rules, which many languages prefer over AM/PM (“10:30 at night”), have had a major revision, with accompanying localizations. Compact formatting has been added for currencies (“€10M”, “€10 million”). More unit measures have been added as well, including 7 new general units (such as duration-century), 21 new per-unit types, 4 new units for measuring personal age (needed for some languages), and new coordinate units for formatting latitude and longitude across languages (“10°N”).
Identifiers. The new features extend the ability to specify subregions of countries, validate identifiers, and customize locales. They include subdivisions of countries, such as Scotland and California (localized names are not yet present, except for English); validity data for currency codes, measurement units, and locale identifier elements (allowing validation of Unicode language and locale identifiers without requiring BCP47 data); seven -u- extension keys and corresponding types to allow customization of locales (such as “cf” for specifying standard vs accounting currency formats); and a clarified specification of identifiers, especially for validity testing.
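As a rough illustration of how a -u- extension key such as “cf” appears in a locale identifier, the following Python sketch pulls the key/type pairs out of an identifier string. The function name unicode_extension_keywords and the sample identifiers are hypothetical, and the parsing is deliberately simplified: it assumes well-formed input, ignores attributes that precede the first key, and does not validate keys or types against the CLDR validity data.

```python
def unicode_extension_keywords(locale_id):
    """Return the -u- extension keywords of a locale identifier as a dict.

    Simplified sketch: two-character subtags after "u" are treated as keys,
    longer subtags as the type value of the most recent key.
    """
    subtags = locale_id.replace("_", "-").lower().split("-")
    if "u" not in subtags:
        return {}
    start = subtags.index("u") + 1
    keywords, key = {}, None
    for subtag in subtags[start:]:
        if len(subtag) == 1:
            # Another singleton starts a different extension; stop here.
            break
        if len(subtag) == 2:
            key = subtag
            keywords[key] = []
        elif key is not None:
            keywords[key].append(subtag)
    return {k: "-".join(v) for k, v in keywords.items()}


print(unicode_extension_keywords("en-US-u-cf-account"))
print(unicode_extension_keywords("de-DE-u-cf-standard-nu-latn"))
```

In the first call the result maps "cf" to "account", which requests the accounting currency format; the second shows "cf" combined with another key. Production code would normally use an internationalization library's locale parser instead of string splitting.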

The specification and charts have also been updated.

Tuesday, September 15, 2015

Facebook Joins as Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Facebook has joined as a full member.

Facebook was founded in 2004. Its mission is to give people the power to share and make the world more open and connected. People use Facebook to stay connected with friends and family, to discover what’s going on in the world, and to share and express what matters to them.

We look forward to their contributions to Unicode projects and are grateful for their financial support of the consortium’s work. Full members of the consortium have a vote in all technical committees and in the governance of the consortium. See the complete list of members.

Monday, September 14, 2015

Emoji One Joins the Unicode Consortium

The Unicode Consortium is pleased to announce that Emoji One has joined as a supporting member. Emoji One is a small, independent group of emoji developers providing an open source emoji set for digital and non-digital use worldwide.

Emoji One is very motivated to support emoji standards, creativity, and innovation to the best of their abilities. Rick Moby, Founder, has said, “We’re honored to be welcomed and included with this unique group of individuals responsible for the emoji and internationalization standards that are so vital to the community.” For more, see Emoji One’s announcement.

We look forward to their contributions to Unicode projects and are grateful for their financial support of the consortium’s work. Supporting members of the consortium have a half vote in all technical committees. See the complete list of members.