Monday, December 21, 2015

Unicode Updates Emoji Charts and Expands List of Candidates

The Unicode Consortium announced today that it is updating the Unicode emoji charts with the following major changes:

The Full Emoji Data chart adds the latest updates for Google, Twitter, Windows, and EmojiOne emoji images. It also now includes all the emoji sequences, for a total of 1,624 emoji.

The Emoji Candidates chart is updated to add the 7 characters recently approved as candidates, for a total of 74 emoji candidates. There are refreshed images courtesy of Adobe and EmojiXpress.

The Emoji Ordering chart now shows the diverse family emoji, and adds subcategories to make the organization clearer.

And the Emoji Style chart has been extended to list the 1,624 emoji characters and sequences as text with variation selectors and fonts, for testing with browsers.
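As a rough illustration of what "text with variation selectors" means, the sketch below appends a variation selector to a base character: U+FE0E requests text presentation and U+FE0F requests emoji presentation. The helper name `with_presentation` is made up for this example, and whether a given browser or font honors the request is outside what the code can show.

```python
# Variation selectors that request a presentation style for the preceding character.
TEXT_VS = "\uFE0E"   # VARIATION SELECTOR-15: text presentation
EMOJI_VS = "\uFE0F"  # VARIATION SELECTOR-16: emoji presentation

def with_presentation(base: str, emoji: bool) -> str:
    """Hypothetical helper: append the selector for the requested style."""
    return base + (EMOJI_VS if emoji else TEXT_VS)

# U+2764 HEAVY BLACK HEART is one character that can be shown either way.
heart_text = with_presentation("\u2764", emoji=False)
heart_emoji = with_presentation("\u2764", emoji=True)

# Show the code points that make up each sequence.
for label, seq in (("text", heart_text), ("emoji", heart_emoji)):
    print(label, " ".join(f"U+{ord(c):04X}" for c in seq))
```

Both strings contain two code points; only the trailing selector differs, which is what makes such sequences useful for testing how a browser picks a font.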

Wednesday, December 16, 2015

Unicode Launches Adopt-a-Character Campaign to Support the World’s “Digitally Disadvantaged” Living Languages

Non-profit consortium invites public to adopt any emoji, letter or symbol as fun, meaningful gifts that fund research and coding needed to support minority languages

MOUNTAIN VIEW, Calif.—(BUSINESS WIRE)—Unicode Consortium, the 501(c)(3) non-profit that standardizes the way computers represent text in all languages – including emoji characters – today announced its Adopt-a-Character campaign. The new program is an opportunity to adopt and dedicate an emoji, letter or any symbol on the keyboard to help Unicode’s important work of supporting the world’s languages in digital form. Adoption options are available at $100, $1,000 and $5,000 levels and make meaningful and fun gifts for the holidays or any occasion. Adoption donations are tax deductible in the U.S.

Funds raised will be used to support Unicode’s core mission of developing and extending the necessary standards, data and software to support the world’s living languages. Unicode works with linguists, experts, cultural leaders and technologists to create coding standards to support minority languages in digital form.

“Beyond our work standardizing emoji, Unicode is tackling some big challenges that might surprise many people,” said Mark Davis, co-founder and president of the Unicode Consortium and an internationalization expert at Google. “The vast majority of the world’s living languages, close to 98 percent, are ‘digitally disadvantaged’ – meaning they are not supported on the most popular devices, operating systems, browsers and mobile applications. For example, only a handful of African languages have adequate digital support. The funds from our new Adopt-a-Character campaign will help us continue the important standardization work that is best done by a neutral organization like Unicode.”

Ensuring Digital Vitality, from Cherokee to N’Ko

So far, Unicode’s resources have been focused on the most-prominent scripts and languages of the world. Gathering information for less-prominent scripts and languages – such as Berber, Balinese, Cherokee, Javanese, N’Ko, Pahawh Hmong and Kashmiri – is often more difficult, requiring travel, research, engineering resources and software tooling.

Just 15 years ago, Cherokee was not available digitally and now as a result of Unicode’s work it can be found on computers, mobile devices such as the iPhone and iPad, and on Gmail. Because of Unicode’s work standardizing N’Ko – a script used to write a number of the West African Mande languages, with a population of over 20 million people – publishers are now able to modernize their operations, print in multiple locations and reach a broader audience.

“The Internet has made us all more acutely aware of how small our world is and how rich the creations of its inhabitants are,” said Greg Welch, a Unicode board member and Senior Director, Strategic Marketing, Mobile Client Platforms at Intel. “As we become a more connected and paperless global society, we cannot leave minority and digitally disadvantaged languages behind. It’s vital to ensure that the text on which a culture’s propagation depends makes it across the digital divide.”

How to Adopt-a-Character

More information about Adopt-a-Character can be found at

About Unicode Consortium

The Unicode Consortium’s mission is to lay a solid foundation for digital support of the world’s languages. If you've used any computer or smartphone, then you're using Unicode and have benefited from the consortium’s work. The consortium – whose members include companies such as Adobe, Apple, Facebook, Google, IBM, Microsoft and more – is a 501(c)(3) non-profit that emerged from the technology industry’s effort to standardize the way computers represent text (including emoji) in all languages – from English to Chinese to Zulu – across different devices and operating systems. The group operates largely as a volunteer organization that is funded by membership fees and donations. A full list of members is on

Monday, November 30, 2015

Unicode Cherokee Chart Font Updated

The Unicode Consortium has recently updated the current code charts for the Cherokee script specifically to provide improved reference glyphs for the lowercase letters introduced in Version 8.0. The new font is Phoreus Cherokee, a modern digital design by Mark Jamra of TypeCulture® LLC.

The new charts can be viewed through the current Charts page.

Wednesday, November 25, 2015

New Character Property for Prepended Concatenation Marks

The Unicode Technical Committee is seeking feedback on a proposal to define a new character property for the class of prepended concatenation marks, also referred to as prefixed format control characters or, more generically, as subtending marks. Characters in that class include U+0600 ARABIC NUMBER SIGN and U+06DD ARABIC END OF AYAH. The new property, named Prepended_Concatenation_Mark and targeted for Unicode 9.0, would provide a mechanism to handle subtending marks collectively via properties rather than by hardcoded enumeration. A detailed description of the issue and how to provide feedback are given in Public Review Issue #310.
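To make "hardcoded enumeration" concrete, here is a minimal sketch of the approach the proposed property is meant to replace. Python's standard `unicodedata` module does not expose a Prepended_Concatenation_Mark property, so the set below is an assumption for illustration and lists only the two characters named above; `is_subtending_mark` is a hypothetical helper.

```python
import unicodedata

# Hardcoded enumeration: only the two characters named in the text above.
# A real list would be longer; this set is illustrative, not complete.
KNOWN_SUBTENDING_MARKS = {"\u0600", "\u06DD"}

def is_subtending_mark(ch: str) -> bool:
    """Hypothetical helper: membership test against the hardcoded set."""
    return ch in KNOWN_SUBTENDING_MARKS

# Both characters are format controls (General_Category Cf), which is why
# they are also described as prefixed format control characters.
for ch in sorted(KNOWN_SUBTENDING_MARKS):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

With a dedicated property, an implementation could presumably swap the hand-maintained set for a property lookup and pick up new members whenever the character database is updated.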

Thursday, November 19, 2015

Wednesday, November 18, 2015

Unicode 8.0 Paperback Available

The Unicode 8.0 core specification is now available in paperback book form.

Responding to continued interest, the editorial committee has created a pair of modestly priced print-on-demand volumes that contain the complete text of the core specification of Version 8.0 of the Unicode Standard.

This edition is the 6×9 inch US trade paperback size, making the two volumes compact. The two volumes may be purchased separately or together. The cost for the pair is US $16.49, plus postage and applicable taxes. Please visit the description page to order.

Note that these volumes do not include the Version 8.0 code charts, nor do they include the Version 8.0 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website.

Purchase The Unicode Standard, Version 8.0 - Core Specification

Tuesday, November 17, 2015

Unicode Board of Directors Election Results

The Unicode Consortium announces the election of four Directors for a three-year term beginning January 2016:
  • Bob Jung of Google
  • Alolita Sharma of Twitter
  • Greg Welch of Intel
  • Dachuan Zhang of Microsoft
Alolita Sharma joins the Consortium Board of Directors for the first time. Bob Jung, Greg Welch, and Dachuan Zhang have been re-elected to continue their service as Directors.

Thursday, November 12, 2015

Unicode 9.0 Candidate Characters

The Unicode Consortium has accepted 7 new emoji characters as candidates for Unicode 9.0, scheduled for release in mid-2016. This makes a total of 74 emoji candidates. These join thousands of non-emoji candidate characters for Unicode 9.0.

At this point, the characters for Unicode 9.0 are candidates—not yet finalized—so some may be removed from the candidate list, and others may be added. Names, images, and code points may also change, so these candidates are not yet ready for use in production systems. The additions of emoji characters to Unicode are based on the emoji selection factors. Other prospective emoji characters are still being assessed and could be approved as candidates in the future.

There is also a new version of UTR #51, Unicode Emoji, which provides design guidelines and data for improving emoji interoperability across platforms, and gives background information about emoji symbols. Aside from general clarifications in the text, several annexes are moved to separate pages to allow for faster updates, the level distinction among emoji is removed, and certain characters no longer allow for emoji modifiers for skin-tone. These changes are also reflected in new machine-readable emoji data files for implementations.

The emoji charts have also been updated. These include a full listing of emoji characters (with images from various vendors), the default ordering of emoji, annotations, when various emoji were added to Unicode, and more.

Tuesday, October 20, 2015

Proposed Update UTR #50, Unicode Vertical Text Layout

A Proposed Update of this UTR is now available for public review and comment. The UTR is being reissued with a set of data updated to the character repertoire of Unicode Version 8.0. In this revision, four characters are added to the arrows tailoring set. For details on the proposed changes in the data, please refer to the Modifications section in the UTR.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the PRI #309 page.

Friday, October 9, 2015

New Unicode Pages on Emoji

New information about emoji is available on the Unicode Consortium website, including the following:

Emoji Candidates — The comprehensive list of all 67 emoji characters that have been accepted by the UTC (Unicode Technical Committee) as candidates but have not yet been added to the Unicode Standard.

Emoji Resources — External resources with useful information about Emoji.

In addition, the Emoji charts have been refreshed with new emoji images and two reformatted pages from UTR#51. Most of the new images are from Apple's September 2015 releases (OS X 10.11 and iOS 9.0), mainly additional flag emoji.

Emoji Recently Added — the emoji characters most recently added to the Unicode Standard.

Emoji ZWJ Sequences — a catalog of emoji ZWJ sequences that are supported on at least one commonly available platform.
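For readers unfamiliar with the term, a ZWJ sequence joins several emoji characters with U+200D ZERO WIDTH JOINER so that a supporting platform can render them as a single image. The sketch below builds one such sequence; the helper name `zwj_sequence` is an assumption for this example, and platforms without support will typically show the component emoji separately.

```python
# U+200D ZERO WIDTH JOINER glues emoji into one sequence.
ZWJ = "\u200D"

def zwj_sequence(*parts: str) -> str:
    """Hypothetical helper: join emoji characters with ZWJ between them."""
    return ZWJ.join(parts)

MAN = "\U0001F468"    # U+1F468 MAN
WOMAN = "\U0001F469"  # U+1F469 WOMAN
GIRL = "\U0001F467"   # U+1F467 GIRL

# A family sequence: three emoji characters separated by two joiners.
family = zwj_sequence(MAN, WOMAN, GIRL)
print(" ".join(f"U+{ord(c):04X}" for c in family))
```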

Media Articles on Emoji has also been updated.

The UTC will be meeting the first week of November, and on the agenda will be additional emoji recommendations from the Emoji Subcommittee.

Monday, October 5, 2015

Proposed Update UAX #31, Unicode Identifier and Pattern Syntax

Unicode Standard Annex #31, Unicode Identifier and Pattern Syntax, will be updated for Unicode 9.0. The proposed update is now available for general public review and comment.

A major change in the proposed update is the addition of a new section with recommended syntax for Unicode hashtags, also including emoji characters.

The draft also makes it clearer that the XID_Start/Continue properties are preferred over ID_Start/Continue, and modifies the syntax of the definition to make customization cleaner and to allow for medial-only characters in identifiers.
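One place these properties surface in practice is Python, whose identifier rules are built on XID_Start and XID_Continue (with a few language-specific additions such as the underscore). The sketch below uses the built-in `str.isidentifier()` as a stand-in; it reflects Python's own definition, not the UAX #31 default syntax verbatim, and the sample strings are chosen only for illustration.

```python
# str.isidentifier() applies Python's identifier rules, which are based on
# the XID_Start / XID_Continue properties plus Python-specific additions.
samples = ["naïve", "größe", "1st", "a-b"]

results = {s: s.isidentifier() for s in samples}

for text, ok in results.items():
    # Letters with diacritics may start and continue an identifier;
    # a leading digit or a hyphen makes the string invalid.
    print(f"{text!r}: {'valid' if ok else 'invalid'}")
```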

Friday, October 2, 2015

EmojiXpress Joins the Unicode Consortium

The Unicode Consortium is pleased to announce that EmojiXpress has joined as a Supporting member. EmojiXpress is one of the most popular iOS Emoji keyboards worldwide, focused on providing the best Emoji and Sticker messaging experience.

EmojiXpress is looking forward to contributing their data and user feedback to Emoji related discussions. By joining the Unicode Consortium, EmojiXpress is demonstrating the importance of supporting the world’s languages on mobile communication devices, and joining other members to craft common solutions.

We look forward to their contributions to Unicode projects, and are grateful for their financial support of the consortium’s work. Supporting members of the consortium have a half vote in all technical committees. See the complete list of members.

Wednesday, September 30, 2015

UAX #29, Unicode Text Segmentation, update to improve Mongolian word segmentation

Unicode Standard Annex #29, Unicode Text Segmentation, will be updated for Unicode 9.0. A draft of the proposed update is available for general public review and comment.

The Word_Break classification of U+202F NARROW NO-BREAK SPACE (NNBSP) is revised to correct the text segmentation behavior of U+202F for Mongolian usage. For further background on this issue and possible ways to address it, see PRI #308, Property Change for U+202F NARROW NO-BREAK SPACE (NNBSP).

In this revision, the formerly empty Prepend class of the Grapheme_Cluster_Break property is redefined to consist of all prefixed format control characters and a few other characters with certain Indic_Syllabic_Category property values.

The corresponding property value changes will be incorporated in the UCD data files for Unicode 9.0.

Thursday, September 17, 2015

CLDR Version 28 Released

Unicode CLDR 28 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.
  • General locale data. Overall, about 5% of the data items in this release are new (see Growth), while about 8% have corrections. Notable changes include a major review of and improvement to Spanish locales for Latin America; the addition of two new “modern-coverage” locales (Belarusian and Irish); and moving certain data from en_GB to en_001 for improved quality and reduced data size in locales that use en_GB conventions.
  • Formatting. There are a number of new units and types of formats, with a major revision to the day period rules—preferred for many languages instead of AM/PM (“10:30 at night”)—with localizations; the addition of compact formatting for currencies (“€10M”, “€10 million”), and the addition of more unit measures, including 7 new general units (duration-century), 21 new per-unit types, 4 new units for measuring personal age (needed for some languages), and new coordinate units for formatting latitude and longitude across languages (“10°N”).
  • Identifiers. The new features extend the ability to specify subregions of countries, validate identifiers, and customize locales, including the addition of subdivisions of countries, such as Scotland and California (localized names are not yet present, except for English); the addition of validity data for currency codes, measurement units, and locale identifier elements (allowing validation of Unicode language and locale identifiers without requiring BCP47 data); the addition of seven -u- extension keys and corresponding types to allow customization of locales (“cf” for specifying standard vs accounting currency formats), and the clarification of the specification of identifiers, especially for validity testing.
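As an illustration of the -u- extension keys mentioned in the list above, the following sketch pulls key/type pairs out of a locale identifier such as "en-US-u-cf-account". The function `parse_u_extension` is hypothetical and deliberately simplified: it assumes two-character keys, treats any other single-character subtag as the start of a different extension, and ignores attributes and private-use subtags. It is not the parsing algorithm from the CLDR specification.

```python
def parse_u_extension(locale_id: str) -> dict[str, str]:
    """Hypothetical, simplified reader for the -u- extension of a locale ID."""
    subtags = locale_id.lower().replace("_", "-").split("-")
    if "u" not in subtags:
        return {}
    result: dict[str, str] = {}
    key = None
    for subtag in subtags[subtags.index("u") + 1:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if len(subtag) == 2:
            key = subtag  # two-character subtags are keys
            result[key] = ""
        elif key is not None:
            # longer subtags are type values for the current key
            result[key] = f"{result[key]}-{subtag}" if result[key] else subtag
    return result

print(parse_u_extension("en-US-u-cf-account"))
print(parse_u_extension("de-u-co-phonebk-cf-standard"))
```

Here "cf" with the type "account" selects the accounting currency format, matching the example given in the Identifiers item.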
The specification and charts have also been updated.

Tuesday, September 15, 2015

Facebook Joins as Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Facebook has joined as a full member.

Founded in 2004, Facebook’s mission is to give people the power to share and make the world more open and connected. People use Facebook to stay connected with friends and family, to discover what’s going on in the world, and to share and express what matters to them.

We look forward to their contributions to Unicode projects and are grateful for their financial support of the consortium’s work. Full members of the consortium have a vote in all technical committees, and in the governance of the consortium. See the complete list of members.

Monday, September 14, 2015

Emoji One Joins the Unicode Consortium

The Unicode Consortium is pleased to announce that Emoji One has joined as a supporting member. Emoji One is a small, independent group of emoji developers providing an open source emoji set for digital and non-digital use worldwide.

Emoji One is very motivated to support emoji standards, creativity, and innovation to the best of their abilities. Rick Moby, Founder, has said, “We’re honored to be welcomed and included with this unique group of individuals responsible for the emoji and internationalization standards that are so vital to the community.” For more, see Emoji One’s announcement.

We look forward to their contributions to Unicode projects, and are grateful for their financial support of the consortium’s work. Supporting members of the consortium have a half vote in all technical committees. See the complete list of members.

Monday, August 31, 2015

Henry Luce Foundation Grant to the Unicode Consortium

The Henry Luce Foundation has made a grant to the Unicode Consortium in support of three meetings between Unicode specialists, experts, and user communities in Mongolia and China. The meetings, which will take place from 2015 to 2017, will discuss encoding issues relating to specific scripts in the region, such as Mongolian Square and Soyombo. The goal of the meetings is to move the scripts forward in the encoding process, so scholars and the relevant user communities will eventually be able to create, send, and search materials in these scripts electronically. The project is headed by Dr. Deborah Anderson, Technical Director of the Consortium, and Project Leader of the UC Berkeley Script Encoding Initiative.

For information about previous grant support by the Henry Luce Foundation to the Unicode Consortium, see Foundation Grants.

Thursday, August 20, 2015

Unicode Version 8.0 - Complete Text of the Core Specification Published

The core specification for Version 8.0 of the Unicode Standard is now available, containing significant updates and improvements:
  • A rewritten description of casing to account for the addition of a set of lowercase Cherokee syllables
  • A substantial revision to the documentation on emoji symbols, including descriptions of the new symbol modifiers for implementing skin tone diversity
  • An update to New Tai Lue to describe the change of model from logical to visual
  • Descriptions for five new scripts and Sutton SignWriting
  • Improvements to existing script descriptions, including Bengali, Devanagari, Malayalam, and to the description of tag characters.
In Version 8.0, the standard grew by 7,716 characters. This version continues the Unicode Consortium’s long-standing commitment to support the full diversity of languages around the world by adding new scripts and other characters that support additional languages of Africa and India, such as Ik, Kulango, and Tai Ahom. The text of the latest version also documents the newly adopted Georgian lari currency symbol.

All other components of Unicode 8.0 were released on June 17, 2015 to allow vendors to update their implementations of Unicode 8.0 as early as possible. These components include the Unicode Standard Annexes, code charts, and the Unicode Character Database. The publication of the core specification completes the definitive documentation of the Unicode Standard, Version 8.0. A print-on-demand (POD) version for Unicode 8.0 is planned for later publication.

For more information, see Unicode 8.0.0.

Wednesday, August 19, 2015

Keynote Speaker Announced for IUC 39

Babel Rousers: The 900 Year Quest to Build a Better Language

Arika Okrent
Linguist and Author

After a Monday full of tutorials for new attendees and those requiring a refresher, join us Tuesday morning for a keynote presentation by Arika Okrent, linguist and author of In the Land of Invented Languages. Arika will be illustrating the history of approaches to language invention, both ingenious and foolhardy, by looking at particular words from these languages.

About IUC 39, October 26-28, 2015: The Internationalization and Unicode® Conference (IUC) is the premier event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world. Read more.

Thursday, July 23, 2015

Feedback on repertoire additions for ISO/IEC 10646 4th and 5th editions

The Unicode Technical Committee is soliciting feedback on pending additions to the draft repertoire of characters, to help discover any errors in character names, incorrect glyphs, or other problems. There is a short window of opportunity to review and comment on the repertoire additions in two documents.

The Unicode Standard is developed in synchrony with ISO/IEC 10646. After ISO balloting is completed on any repertoire additions, no further changes or corrections will be possible. (See for additional information on the stages in ISO standards development.) Advance feedback on these repertoire additions will help inform the UTC discussions about its own contribution to the ISO balloting process.

Please see the individual Public Review Issue pages for further details:

Tuesday, June 30, 2015

SwiftKey joins the Unicode Consortium

The Unicode® Consortium is pleased to announce that SwiftKey is joining the Unicode Consortium as an associate member. We look forward to their contributions to the Unicode Standard and other consortium work, which will involve helping to make data-driven decisions about which emoji ultimately make it to people's phones and other devices. As part of this decision-making process, SwiftKey will be providing the Consortium with aggregate, anonymized emoji usage data from its SwiftKey Cloud services.

For the full list of Unicode Consortium members, see

SwiftKey Keyboard is the keyboard app for iPhone and Android known for learning and predicting favorite words, phrases, and emoji. Founded in London in 2008, SwiftKey’s technology is now found on more than 250M devices worldwide.

For more, see SwiftKey's announcement on joining the Unicode Consortium.

Representing Additional Types of Flags

The UTC is considering a proposal to extend the types of flags which can be reliably represented by certain sequences of Unicode characters. In addition to the current mechanism using pairs of regional indicator symbols—already widely implemented—the proposal would use sequences of the TAG characters in the range U+E0030..U+E005A to represent other types of flags. The proposal also provides guidelines to specify valid sequences of TAG characters and how to interpret them. Full details of the proposal are provided in the background document.
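The existing mechanism can be sketched in a few lines: each letter of a two-letter region code maps to a regional indicator symbol, starting at U+1F1E6 for "A". The second function shows only the arithmetic that relates ASCII characters to TAG characters in the range U+E0030..U+E005A; it does not attempt to build a valid tag sequence, since the full syntax is defined in the proposal's background document and not reproduced here. Both function names are assumptions for this example.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
TAG_OFFSET = 0xE0000            # TAG characters mirror ASCII at this offset

def flag_from_region(code: str) -> str:
    """Hypothetical helper: two-letter region code to a regional indicator pair."""
    code = code.upper()
    if len(code) != 2 or not code.isascii() or not code.isalpha():
        raise ValueError("expected a two-letter ASCII region code")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)

def to_tag_characters(text: str) -> str:
    """Hypothetical helper: map ASCII '0'..'Z' to U+E0030..U+E005A."""
    for c in text:
        if not 0x30 <= ord(c) <= 0x5A:
            raise ValueError("character outside the range used by the proposal")
    return "".join(chr(TAG_OFFSET + ord(c)) for c in text)

france = flag_from_region("FR")
print(" ".join(f"U+{ord(c):04X}" for c in france))
print(" ".join(f"U+{ord(c):04X}" for c in to_tag_characters("GB")))
```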

The UTC welcomes feedback on this proposed new mechanism. Feedback could consist of an indication of support or opposition to the proposal, with reasons why, or could consist of suggestions for improvement of the proposal.

For further information, please see the Public Review Issues page.

Wednesday, June 17, 2015

Announcing The Unicode® Standard, Version 8.0

Version 8.0 of the Unicode Standard is now available. It includes 41 new emoji characters (including five modifiers for diversity), 5,771 new ideographs for Chinese, Japanese, and Korean, the new Georgian lari currency symbol, and 86 lowercase Cherokee syllables. It also adds letters to existing scripts to support Arwi (the Tamil language written in the Arabic script), the Ik language in Uganda, Kulango in the Côte d’Ivoire, and other languages of Africa. In total, this version adds 7,716 new characters and six new scripts.

The first version of Unicode Technical Report #51, Unicode Emoji is being released at the same time. That document describes the new emoji characters. It provides design guidelines and data for improving emoji interoperability across platforms, gives background information about emoji symbols, and describes how they are selected for inclusion in the Unicode Standard. The data is used to support emoji characters in implementations, specifying which symbols are commonly displayed as emoji, how the new skin-tone modifiers work, and how composite emoji can be formed with joiners. The Unicode website now supplies charts of emoji characters, showing vendor variations and providing other useful information.
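A minimal sketch of how the skin-tone modifiers work at the code point level: one of the five modifier characters (U+1F3FB through U+1F3FF) is placed directly after a base emoji. The dictionary keys and the `apply_modifier` helper are naming choices made for this example; which base characters accept modifiers is specified in the UTR #51 data and is not checked here.

```python
# The five emoji modifiers added in Unicode 8.0.
SKIN_TONE_MODIFIERS = {
    "type-1-2": "\U0001F3FB",  # EMOJI MODIFIER FITZPATRICK TYPE-1-2
    "type-3": "\U0001F3FC",    # EMOJI MODIFIER FITZPATRICK TYPE-3
    "type-4": "\U0001F3FD",    # EMOJI MODIFIER FITZPATRICK TYPE-4
    "type-5": "\U0001F3FE",    # EMOJI MODIFIER FITZPATRICK TYPE-5
    "type-6": "\U0001F3FF",    # EMOJI MODIFIER FITZPATRICK TYPE-6
}

def apply_modifier(base: str, tone: str) -> str:
    """Hypothetical helper: append the modifier immediately after the base."""
    return base + SKIN_TONE_MODIFIERS[tone]

WAVING_HAND = "\U0001F44B"  # U+1F44B WAVING HAND SIGN
waving = apply_modifier(WAVING_HAND, "type-4")
print(" ".join(f"U+{ord(c):04X}" for c in waving))
```

On platforms that support the modifiers the pair is drawn as a single emoji; elsewhere the modifier may appear as a separate swatch after the base image.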

The 41 new emoji in Unicode 8.0 include the following:

five emoji modifiers
Faces and Hands

(For the full list, including images, see emoji additions for Unicode 8.0.)

Phones and computers often need operating system updates to support new emoji, which may take some time. It is also now clear which existing characters, such as the often requested SHOPPING BAGS, can be used as emoji. Once phones and computers support these characters, people will be able to see colorful images such as the BOTTLE WITH POPPING CORK above.

Three other important Unicode specifications are updated for Version 8.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing
Some of the changes in Version 8.0 and associated Unicode technical standards may require modifications in implementations. For more information, see Unicode 8.0 Migration and the migration sections of UTS #10, UTS #39, and UTS #46. For full details on Version 8.0, see Unicode 8.0.

Monday, June 1, 2015

Join us in Santa Clara for IUC 39 (October 26-28)

The conference program has just been announced for this year's Internationalization and Unicode® Conference (IUC), October 26-28 in Santa Clara, California.

This is the premier annual event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. The program focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include web globalization, programming practices, endangered languages and unencoded scripts, integrating with social networking software, implementing mobile apps, and handling emoji. This year's conference will also highlight new features in Unicode and other relevant standards.

In addition, please join us in welcoming over 20 first-time speakers to the program! This is just another reason to attend: fresh talks, fresh faces, and fresh ideas!

Friday, May 22, 2015

Unicode 9.0 Candidate Emoji

The Unicode Consortium has accepted 38 emoji characters as candidates for Unicode 9.0, scheduled for release in mid-2016. At this point, these emoji are candidates—not yet finalized—so some may be removed from the candidate list, and others may be added. Names, images, and code points may also change, so these candidates are not yet ready for use in production systems.

These emoji have been accepted as candidates for Unicode 9.0 for a variety of reasons. They may be needed for compatibility with emoji characters in existing systems. For example, the FACE WITH COWBOY HAT was accepted for compatibility with the emoji used in Yahoo Messenger. Some are chosen based on expected high frequency of use or because they are highly popular requests from online communities. Others fill gaps in the existing set of Unicode emoji, as by completing a gender pair.

Many other prospective emoji characters are still being assessed and could be approved in the future. For more information about selection criteria, see Selection Factors in UTR #51, Unicode Emoji.

The images shown below are draft black and white versions for the Unicode 9.0 charts. Once the emoji candidates have been finalized, vendors that support emoji will provide colorful and better-designed displays for each of these. For example, the emoji for shrug might appear as shown on the right.

Some of these new emoji would take the new emoji modifiers as discussed in Diversity. Some emoji may also get annotations to help guide design and usage. For example, the cucumber emoji could also be used to represent a pickle.

Candidate Unicode Name