Friday, October 9, 2015

New Unicode Pages on Emoji

New information about emoji is available on The Unicode Consortium website, including the following:

Emoji Candidates — The comprehensive list of all 67 emoji characters that have been accepted by the UTC (Unicode Technical Committee) as candidates but have not yet been added to the Unicode Standard.

Emoji Resources — External resources with useful information about Emoji.

In addition, the Emoji charts have been refreshed with new emoji images and with two reformatted pages from UTR #51, listed below. Most of the new images are from Apple's September 2015 releases (OS X 10.11 and iOS 9.0), mainly additional flag emoji.

Emoji Recently Added — the emoji characters most recently added to the Unicode Standard.

Emoji ZWJ Sequences — a catalog of emoji ZWJ (zero width joiner) sequences that are supported on at least one commonly available platform.
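As a rough illustration of what an emoji ZWJ sequence is at the code point level, the following Python sketch joins individual emoji with U+200D ZERO WIDTH JOINER. The helper names are hypothetical, and the man, woman, girl combination is only an example; whether a given sequence is displayed as a single image depends on the platform.

```python
ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER


def zwj_sequence(*emoji):
    """Join individual emoji characters into one ZWJ sequence (hypothetical helper)."""
    return ZWJ.join(emoji)


def split_zwj_sequence(text):
    """Split a ZWJ sequence back into its component emoji (hypothetical helper)."""
    return text.split(ZWJ)


# Example only: U+1F468 MAN, U+1F469 WOMAN, U+1F467 GIRL
family = zwj_sequence("\U0001F468", "\U0001F469", "\U0001F467")

# List the code points that make up the sequence.
code_points = ["U+{:04X}".format(ord(ch)) for ch in family]
print(code_points)
```

On a platform without support for a particular sequence, the component emoji would typically be shown one after another instead of as a single image.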

Media Articles on Emoji has also been updated.

The UTC will be meeting the first week of November, and on the agenda will be additional emoji recommendations from the Emoji Subcommittee.

Monday, October 5, 2015

Proposed Update UAX #31, Unicode Identifier and Pattern Syntax

Unicode Standard Annex #31, Unicode Identifier and Pattern Syntax, will be updated for Unicode 9.0. The proposed update is now available for general public review and comment.

A major change in the proposed update is the addition of a new section with recommended syntax for Unicode hashtags, also including emoji characters.

The draft also makes it clearer that the XID_Start/Continue properties are preferred over ID_Start/Continue, and modifies the syntax of the definition to make customization cleaner and to allow for medial-only characters in identifiers.
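To make the idea of medial-only characters concrete, here is a minimal Python sketch of one plausible reading: an identifier is a start character, then continue characters, where a medial character is accepted only between two continue characters. It leans on Python's own str.isidentifier(), which is based on XID_Start and XID_Continue but also accepts "_" as a first character. The MEDIAL set and the function names are assumptions chosen for illustration, not values taken from the draft.

```python
# Hypothetical medial set, for illustration only.
MEDIAL = {"-", "'"}


def is_start(ch):
    # Python identifiers begin with an XID_Start character
    # (Python additionally allows "_").
    return ch.isidentifier()


def is_continue(ch):
    # A character is treated as a continue character if it may follow
    # a valid start character in a Python identifier.
    return ("a" + ch).isidentifier()


def is_identifier(text, medial=MEDIAL):
    """Check text against: Start Continue* (Medial Continue+)*"""
    if not text or not is_start(text[0]):
        return False
    previous_was_medial = False
    for ch in text[1:]:
        if ch in medial:
            if previous_was_medial:
                return False  # two medial characters in a row
            previous_was_medial = True
        elif is_continue(ch):
            previous_was_medial = False
        else:
            return False
    # A medial character may not end the identifier.
    return not previous_was_medial


print(is_identifier("foo-bar"), is_identifier("foo-"), is_identifier("-foo"))
```

Under these assumptions a medial character such as "-" is accepted inside an identifier but rejected at either end or when doubled.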

Friday, October 2, 2015

EmojiXpress Joins the Unicode Consortium

The Unicode Consortium is pleased to announce that EmojiXpress has joined as a Supporting member. EmojiXpress is one of the most popular iOS Emoji keyboards worldwide, focused on providing the best Emoji and Sticker messaging experience.

EmojiXpress is looking forward to contributing their data and user feedback to emoji-related discussions. By joining the Unicode Consortium, EmojiXpress is demonstrating the importance of supporting the world’s languages on mobile communication devices, and joining other members to craft common solutions.

We look forward to their contributions to Unicode projects, and are grateful for their financial support of the consortium’s work. Supporting members of the consortium have a half vote in all technical committees. See the complete list of members.

Wednesday, September 30, 2015

UAX #29, Unicode Text Segmentation, update to improve Mongolian word segmentation

Unicode Standard Annex #29, Unicode Text Segmentation, will be updated for Unicode 9.0. A draft of the proposed update is available for general public review and comment.

The Word_Break classification of U+202F NARROW NO-BREAK SPACE (NNBSP) is revised to correct the text segmentation behavior of U+202F for Mongolian usage. For further background on this issue and possible ways to address it, see PRI #308, Property Change for U+202F NARROW NO-BREAK SPACE (NNBSP).
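The kind of problem involved can be sketched in Python. The example below is not the UAX #29 algorithm and does not reproduce the proposed property values; it only contrasts a naive whitespace split, which breaks at U+202F, with a hypothetical tokenizer that keeps text joined by NNBSP together. STEM and SUFFIX are illustrative Mongolian-script strings, and all names are assumptions.

```python
import re

NNBSP = "\u202F"  # U+202F NARROW NO-BREAK SPACE

# Illustrative Mongolian-script strings (assumptions for this sketch).
STEM = "\u182E\u1823\u1829\u182D\u1823\u182F"
SUFFIX = "\u1824\u1828"

text = STEM + NNBSP + SUFFIX + " " + STEM

# Naive approach: str.split() treats U+202F as whitespace, so the
# suffix is separated from the stem it belongs to.
naive_words = text.split()

# Hypothetical alternative: runs of non-space characters joined by
# NNBSP are kept together as one word.
_WORD = re.compile(r"\S+(?:\u202F\S+)*")


def words_keeping_nnbsp(value):
    return _WORD.findall(value)


print(len(naive_words), len(words_keeping_nnbsp(text)))
```

A real implementation would apply the Word_Break property values from the UCD data files instead of a regular expression.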

In this revision, the formerly empty Prepend class of the Grapheme_Cluster_Break property is redefined to consist of all prefixed format control characters and a few other characters with certain Indic_Syllabic_Category property values.

The corresponding property value changes will be incorporated in the UCD data files for Unicode 9.0.

Thursday, September 17, 2015

CLDR Version 28 Released

Unicode CLDR 28 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.
  • General locale data. Overall, about 5% of the data items in this release are new (see Growth), while about 8% have corrections. Notable changes include a major review of and improvement to Spanish locales for Latin America; the addition of two new “modern-coverage” locales (Belarusian and Irish); and moving certain data from en_GB to en_001 for improved quality and reduced data size in locales that use en_GB conventions.
  • Formatting. There are a number of new units and types of formats. These include a major revision to the day period rules, with localizations (day periods such as “10:30 at night” are preferred over AM/PM in many languages); the addition of compact formatting for currencies (“€10M”, “€10 million”); and the addition of more units of measure, including 7 new general units (such as duration-century), 21 new per-unit types, 4 new units for measuring personal age (needed for some languages), and new coordinate units for formatting latitude and longitude across languages (“10°N”).
  • Identifiers. The new features extend the ability to specify subregions of countries, validate identifiers, and customize locales. They include the addition of subdivisions of countries, such as Scotland and California (localized names are not yet present, except for English); the addition of validity data for currency codes, measurement units, and locale identifier elements (allowing validation of Unicode language and locale identifiers without requiring BCP47 data); the addition of seven -u- extension keys and corresponding types to allow customization of locales (such as “cf” for specifying standard vs accounting currency formats); and the clarification of the specification of identifiers, especially for validity testing.
The specification and charts have also been updated.
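
As a small illustration of the -u- extension keys mentioned under Identifiers, the Python sketch below pulls key and type pairs out of a locale identifier. It is a simplified reading of the syntax (two-character keys followed by longer type subtags, ending at the next single-character subtag), performs no validation against the CLDR validity data, and does not handle private-use subtags. The function name and the sample identifiers are assumptions for illustration.

```python
def parse_u_extension(locale_id):
    """Return the -u- extension keywords of a locale identifier as a dict.

    Simplified sketch: no validation, no private-use handling.
    """
    subtags = locale_id.lower().replace("_", "-").split("-")
    if "u" not in subtags:
        return {}
    keywords = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break  # another singleton ends the -u- extension
        if len(subtag) == 2:
            key = subtag  # two-character subtags are keys
            keywords[key] = ""
        elif key is not None:
            # Longer subtags after a key form its type, joined by "-".
            keywords[key] = (keywords[key] + "-" + subtag).lstrip("-")
        # Subtags before the first key are attributes; ignored here.
    return keywords


print(parse_u_extension("en-US-u-cf-account"))
```

In this sketch, an identifier carrying the “cf” key yields a dictionary entry whose value names the requested currency format style.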