The Unicode Blog: 2016

Wednesday, December 14, 2016

Adopt-A-Character Grant to Support Indic Scripts

The Adopt-a-Character program has awarded a grant to support further development of the following four Indic scripts in the Unicode Standard:

Hanifi Rohingya, a script in current use in Myanmar and Bangladesh
Nandinagari, a Brahmi-based historic script formerly used in South India
Old Sogdian, a group of historic scripts formerly used in Kazakhstan, Pakistan, and Western China
Sogdian, derived from Old Sogdian, a group of historic scripts formerly used in Central Asia

The goal of this grant is to enable the development of encoding proposals that can be included in the Unicode Standard. The work will be done by Anshuman Pandey under the direction of Deborah Anderson (SEI, UC Berkeley) and Rick McGowan (Unicode Consortium).

Friday, December 9, 2016

Proposed Update UTR #51, Unicode Emoji (Version 5.0)

A proposed update of UTR #51, Unicode Emoji (Version 5.0) is available for public review and feedback. This new version adds a mechanism to support regional flags, such as Scotland or California, though the choice of which of these flags to support is left to vendors.
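For implementers, a regional flag can be pictured as a short character sequence. The sketch below is only an illustration and assumes the tag-sequence form found in the published Emoji 5.0 data: a WAVING BLACK FLAG base, one tag character per letter of the subdivision code, and a closing CANCEL TAG. The function name `subdivision_flag` is made up for this example.

```python
# Sketch: build a subdivision-flag sequence (assumed tag-sequence form).
BLACK_FLAG = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG, the base character
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG, terminates the sequence
TAG_OFFSET = 0xE0000        # tag characters mirror ASCII at U+E0020..U+E007E


def subdivision_flag(code):
    """Return the sequence for a subdivision code such as 'GB-SCT'."""
    code = code.lower().replace("-", "")
    if not (code.isascii() and code.isalnum()):
        raise ValueError(f"unexpected subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


scotland = subdivision_flag("GB-SCT")
california = subdivision_flag("us-ca")
print([f"U+{ord(ch):04X}" for ch in scotland])
```

Whether such a sequence displays as a flag depends on the vendor, since the choice of which flags to support is left to them.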

Associated charts are available at http://www.unicode.org/emoji/charts-beta/index.html, and associated data files are available at http://www.unicode.org/Public/emoji/5.0/. This proposed update also has a separate data file for the valid emoji presentation sequences, and reflects a small change in the ordering of SELFIE. The charts also add the newest Apple and Facebook emoji.

At this time, the proposed update does not add any additional recommended emoji zwj sequences, nor reclassify any existing Unicode 9.0 characters as emoji. There are proposals for doing so that will be reviewed in the next Unicode Technical Committee meeting.

The review period for the proposed update ends on January 16, 2017. For further information and instructions on how to provide feedback, please see Public Review Issue #343.

This holiday season you can give a unique gift by adopting any emoji, letter, or symbol — and help support the Unicode Consortium’s mission to enable all languages to be used on computers. You can now adopt Unicode 9.0 characters and the Emoji 4.0 emoji sequences (such as woman astronaut or rockstar). See the Adopt-a-Character Page.

Thursday, December 1, 2016

Support Unicode with an Adopt-a-Character Gift this Holiday Season!

This holiday season you can give a unique gift by adopting any emoji, letter, or symbol — and help support the Unicode Consortium’s mission to enable all languages to be used on computers. Three levels of sponsorship are available, starting at $100. With over 128,000 characters to choose from, you are certain to find an appropriate character, for even the most demanding recipient. All sponsors will receive a custom digital badge featuring the adopted character for use on the web and elsewhere. Sponsors at the two highest levels will receive a special thank-you gift engraved with the name you supply and the adopted character.

The program funds work on “digitally disadvantaged” languages, both modern and historic. In 2016 the program awarded a grant to support work on a proposal for the Hanifi Rohingya script. The program has also funded work on Egyptian hieroglyphs and Mayan hieroglyphs.

In its first year, the Adopt-a-Character program has had nearly 400 sponsors. Be part of the next wave, with a worthwhile gift!

For more information on the program, or to adopt a character, see the Adopt-a-Character Page.

Monday, November 28, 2016

113 New Unicode Emoji (plus skin tones)

113 new emoji are now available in UTR #51 Unicode Emoji, Version 4.0. The main focus of this 4.0 release is further enhancing gender representation and professions. These new emoji are already appearing on smart phones and other devices and platforms that support emoji. See the full list in Emoji Recently Added.

The new emoji will soon be available for adoption, helping fund projects to improve language support.

Unlike the 72 emoji characters added to Unicode 9.0 in June, these are not new Unicode characters. Most of these new emoji are sequences of existing emoji, “glued together” with a special invisible character so that they appear and behave like a single character. This glue character is called a ZWJ, pronounced “zwidge” or /zwɪdʒ/. Three existing Unicode 9.0 characters (gender and medical symbols) were changed to qualify as emoji, for use in those ZWJ sequences.
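To make the "glue" concrete, here is a minimal sketch that assembles two such sequences by hand. The helper name `zwj_sequence` is invented for this example. The code points are those of the woman astronaut and woman running sequences; whether they display as a single image depends on the platform's fonts.

```python
import unicodedata

ZWJ = "\u200D"  # U+200D ZERO WIDTH JOINER, the invisible "glue" character


def zwj_sequence(*parts):
    """Join existing emoji with ZWJ to form one emoji zwj sequence."""
    return ZWJ.join(parts)


# WOMAN + ZWJ + ROCKET
woman_astronaut = zwj_sequence("\U0001F469", "\U0001F680")
# RUNNER + ZWJ + FEMALE SIGN (followed by the emoji variation selector U+FE0F)
woman_running = zwj_sequence("\U0001F3C3", "\u2640\uFE0F")

for ch in woman_astronaut:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

A platform without a glyph for the sequence typically falls back to showing the individual emoji side by side.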

Two of the new sequences are flags, 10 are family groupings (such as mother with daughter), 32 are new professions/roles (such as man or woman astronaut), and 66 are explicit-gendered variants (such as man or woman running). 99 of these sequences, plus 5 other characters (such as snowboarder), can also now have the 5 skin tone modifiers.

The technical documentation has also been updated, with additional guidelines for implementers and the new versions of the emoji data files for use in programs.

Wednesday, November 16, 2016

Proposed Update UTS #37, Unicode Ideographic Variation Database

The Unicode Consortium has posted a new issue for public review and comment.

UTS #37, Unicode Ideographic Variation Database, is being updated to broaden the scope of base characters, from characters with the Unified_Ideograph property to characters with the Ideographic property, excluding characters that canonically or compatibly decompose. The substantive changes can be found in Section 2, Description. This proposed update is currently under review with a closing date of 2017-01-16. For more information, please see Public Review Issue #337.

Monday, October 24, 2016

ICU 58 Released

Unicode® ICU version 58 has just been released! ICU is the main avenue for many software products and libraries to support the world's languages, implementing the latest versions of both the Unicode encoding standard and the Unicode locale data (CLDR).

ICU 58 provides full support for the recent Unicode 9.0 release with 7,500 new characters and many property improvements. It covers the Unicode 9.0 emoji characters — plus the latest draft version of Emoji 4.0 — for a total of 2,444 emoji characters and sequences, including the new ZWJ sequences for gendered professions; ICU word & line breaking is updated for Emoji 4.0. ICU 58 incorporates the latest version 30 of Unicode CLDR locale data with a significant increase in data coverage.

There are a number of new APIs, including ones for measurement system unit display names (such as “acre” or “Hektar” in 80 languages), and improvements in performance and robustness. For Java, the unit tests have been converted to JUnit for easier and faster integration into test suites.

For details please see http://site.icu-project.org/download/58

Wednesday, October 5, 2016

CLDR Version 30 Released

Unicode CLDR 30 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

Unicode support is updated to 9.0, including updated Unihan readings for the pinyin collation and Han-Latin transforms, and support for new script codes and number systems.
The set of language codes for translation has been updated, with a significant increase in the total number of translated language names.
Substantial new data has been added for likely subtags (e.g., to get the main script for each language).
New data items have been added to support relative times such as “3 Fridays ago” or “this hour”.
New draft format and preference structure has been added to support week designations such as “the week of August 10” or “week 3 of March”.
New <characterlabels> data can be used to generate labels for groups of related characters in character pickers.
The structure for emoji annotations has been revised, and the data has been significantly updated. The emoji collation has been updated, and data has been added for improved segmentation behavior. A specification for synthesizing ZWJ sequence names has also been added.
The CLDR 30 Survey Tool data collection resulted in a net increase in data items of about 9.2%, with an additional 5.9% of items changed.
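As a rough illustration of what the likely-subtags data is for, the sketch below looks up the main script for a language. The table holds only a few hand-copied sample entries, and `main_script` is a name invented here. A real implementation would load the full data shipped with CLDR or call a library such as ICU.

```python
# Hand-copied sample entries (an assumption for this sketch); real code would
# load the complete likely-subtags data from CLDR.
LIKELY_SUBTAGS = {
    "en": "en_Latn_US",
    "ru": "ru_Cyrl_RU",
    "sr": "sr_Cyrl_RS",
    "zh": "zh_Hans_CN",
}


def main_script(language):
    """Return the likely script code for a language, or None if unknown."""
    maximized = LIKELY_SUBTAGS.get(language.lower())
    if maximized is None:
        return None
    _language, script, _region = maximized.split("_")
    return script


print(main_script("sr"))
print(main_script("zh"))
```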

For further details and links to documentation, see the CLDR Release Notes

Wednesday, September 21, 2016

Emoji Deadline

Reminder: Emoji proposals must be submitted by October 1 to be considered for Unicode 10 (2017). See Process and Timeline.

Also, see the latest emoji charts. Both the v3.0 and the v4.0 beta have been regenerated with updated images, and with updated sorting order, short names, and keywords (annotations) from the alpha Unicode CLDR v30 release.

Tuesday, September 20, 2016

Unicode 9.0 Paperback Available

The Unicode 9.0 core specification is now available in paperback book form with a new, original cover design. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 9.0 of the Unicode Standard.

Each of the two volumes is in a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. The cost for the pair is US $16.75, plus postage and applicable taxes. Please visit the description page to order.

Note that these volumes do not include the Version 9.0 code charts, nor do they include the Version 9.0 Standard Annexes and Unicode Character Database, which are freely available on the Unicode website.

Purchase The Unicode Standard, Version 9.0 - Core Specification

Tuesday, September 13, 2016

New FAQ on Myanmar Scripts and Languages

A new FAQ on Myanmar Scripts and Languages has been posted on the Unicode website. This FAQ discusses the use of the Myanmar script in Unicode, and covers the challenges of encoding, display, and interoperating with existing non-Unicode encodings such as Zawgyi.

http://www.unicode.org/faq/myanmar.html

Wednesday, August 31, 2016

Keynote Speaker Announced for IUC 40

My Life as a Higher Level Protocol

John Hudson

After sitting in on a full day of in-depth tutorials, join us Wednesday morning as we kick off our 25th year with a keynote presentation by John Hudson, Co-Founder, Tiro Typeworks. John has spent two decades working at the messy interface between text encoding and typography, much of it making fonts for complex scripts. In his keynote presentation, he reflects on some of the messiest aspects of this work, and why, after twenty years, he's convinced that a holistic overview of text is necessary.

About IUC 40, November 1-3, 2016: For twenty-five years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. Please join us for our 40th conference! This year's event is being held on November 1-3, 2016 in Santa Clara, California. Read more.

Thursday, August 11, 2016

Proposed Update UTR #51, Unicode Emoji (Version 4.0)

A proposed update of UTR #51, Unicode Emoji (Version 4.0) is available for public review and feedback. This new version covers a total of 2,243 emoji, an increase from the 1,788 in Version 3.0.

There are several important changes in the proposed update. Three existing symbols have been newly classified as emoji: U+2640 FEMALE SIGN, U+2642 MALE SIGN, and U+2695 STAFF OF AESCULAPIUS. These are used in sequences to represent additional professions and to make gender distinctions among emoji. Many new emoji zwj sequences are cataloged, including professions and roles, gender distinctions, and new family groupings. Two new flag emoji have been added, one as an emoji zwj sequence and one as a regional indicator pair. Ten additional emoji characters are newly classified as emoji modifier bases. This results in 50 new emoji modifier sequences, displaying skin tone diversity. For example, see the emoji data for U+1F93C WRESTLERS.
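The two sequence types mentioned here are easy to build by hand. The following sketch uses invented helper names (`flag`, `modifier_sequences`) and shows how a regional indicator pair is derived from a two-letter code, and how one modifier base yields five emoji modifier sequences, which is where 10 new bases giving 50 new sequences comes from. The "UN" code is used purely as an example input.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
# The five emoji modifiers, U+1F3FB..U+1F3FF
SKIN_TONE_MODIFIERS = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def flag(region):
    """Map a two-letter code such as 'UN' to a regional indicator pair."""
    region = region.upper()
    if len(region) != 2 or not all("A" <= ch <= "Z" for ch in region):
        raise ValueError(f"expected two ASCII letters, got {region!r}")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(ch) - ord("A")) for ch in region)


def modifier_sequences(base):
    """Return the base followed by each of the five skin tone modifiers."""
    return [base + modifier for modifier in SKIN_TONE_MODIFIERS]


wrestlers = modifier_sequences("\U0001F93C")  # U+1F93C WRESTLERS
print(len(wrestlers), "sequences per base;", 10 * len(wrestlers), "for ten bases")
print([f"U+{ord(ch):04X}" for ch in flag("UN")])
```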

Associated charts are available at http://unicode.org/emoji/charts-beta/index.html, and associated data files are available at http://unicode.org/Public/emoji/4.0/.

The review period for the proposed update ends on October 24, 2016. Feedback can be submitted through the online reporting form.

Thursday, July 21, 2016

Unicode Consortium Announces Cover Design

The Unicode Consortium is pleased to announce the new design selected for the cover of the forthcoming print-on-demand publication of The Unicode Standard, Version 9.0. This is the first time the Unicode Consortium issued an open call for artists and designers to submit cover design proposals. All submitted designs were reviewed by an independent panel.

[cover art by Gabee Ayres]

The selected artwork is an original design by Gabee Ayres, a student and teaching assistant at the University of Pennsylvania with a background in fine arts, design and logic. Of her design, Ms. Ayres says, “I wanted to create a cover that reflected the technology inherent in Unicode without looking impassive or unwelcoming.”

Two runner-up designs by Jiachen Hu and Laura von Husen were also selected. Jiachen Hu is a computer science student at the University of California, Berkeley. Laura von Husen earned a Master’s degree in graphic design and illustration, and currently lives in Hamburg, Germany.

Jiachen Hu:
[Jiachen Hu]

Laura von Husen:
[Laura von Husen]

Tuesday, July 19, 2016

Unicode Version 9.0 - Complete Text of the Core Specification Published

The core specification for Version 9.0 of the Unicode Standard is now available, containing significant updates and improvements, including descriptions for six new scripts, 72 new emoji characters, and 19 symbols for the new 4K TV standard.

In Version 9.0, the standard added precisely 7,500 characters. This version continues the Unicode Consortium’s firm commitment to support the full diversity of languages around the world by adding support for lesser-used writing systems of additional languages, including Osage, Nepal Bhasa, Fulani, and the Bravanese dialect of Swahili. Characters are also added to support the Warsh orthography for Arabic in West Africa and for the historic Tangut script of China.

All other components of Unicode 9.0 were released on June 21, 2016 to allow vendors to update their implementations of Unicode 9.0 as early as possible. Those components include the Unicode Standard Annexes, code charts, and the Unicode Character Database. The publication of the core specification completes the definitive documentation of the Unicode Standard, Version 9.0. A print-on-demand (POD) version for Unicode 9.0 is planned for later publication, with new cover art created by Gabee Ayres.

For more information, see Unicode 9.0.0.

Wednesday, July 6, 2016

Adopt-A-Character Grant to Support Egyptian Hieroglyphs

The Adopt-a-Character program has awarded a grant to support further development of Egyptian hieroglyphs in the Unicode Standard. The initial grant allows a Unicode encoding expert to participate in a meeting at the University of Cambridge on Egyptian hieroglyphs. One meeting goal is to progress the representation of Unicode Egyptian hieroglyphs, including extending the repertoire. The meeting is hosted by the working group “Informatique et Egyptologie” of the International Association of Egyptologists, and will take place on 11-12 July 2016.

Egyptian hieroglyphs date from the end of the fourth millennium BCE, and were used for more than 3,000 years. They represent a significant milestone in the world’s written legacy, capturing important literary, historical, and religious works. Egyptian hieroglyphs are studied by academics and also attract interest from the general public, young and old.

In 2009, a core set of Egyptian hieroglyphs was published in Unicode 5.2. In January 2016, three new format control characters, which will aid in the layout of Egyptian hieroglyphs, were approved by the Unicode Technical Committee. The three new format characters, as well as a large preliminary proposal for additional Egyptian hieroglyphs, will be discussed at the Cambridge meeting. The Cambridge meeting is a further step in the process of improving the support of Unicode Egyptian hieroglyphs.

Tuesday, June 21, 2016

Announcing The Unicode® Standard, Version 9.0

Version 9.0 of the Unicode Standard is now available. Version 9.0 adds exactly 7,500 characters, for a total of 128,172 characters. These additions include six new scripts and 72 new emoji characters.

The new scripts and characters in Version 9.0 add support for lesser-used languages worldwide, including:

Osage, a Native American language
Nepal Bhasa, a language of Nepal
Fulani and other African languages
The Bravanese dialect of Swahili, used in Somalia
The Warsh orthography for Arabic, used in North and West Africa
Tangut, a major historic script of China

Important symbol additions include:

19 symbols for the new 4K TV standard
72 emoji characters such as the following

Smileys & people		ROLLING ON THE FLOOR LAUGHING
Smileys & people		FACE PALM
Hand gestures		HAND WITH INDEX AND MIDDLE FINGERS CROSSED
Animals		BUTTERFLY
Food		AVOCADO
Food		SHALLOW PAN OF FOOD
Drink		CLINKING GLASSES
Travel		MOTOR SCOOTER
Sports		PERSON DOING CARTWHEEL

For the full list, see emoji additions for Unicode 9.0. For a detailed description of support for emoji characters by the Unicode Standard, see UTR #51, Unicode Emoji.

Three other important Unicode specifications have been updated for Version 9.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs

Some of the changes in Version 9.0 and associated Unicode technical standards and reports may require modifications in implementations. For more information, see Unicode 9.0 Migration and the migration sections of UTS #10, UTS #39, UTS #46, and UTR #51. For full details on Version 9.0, see http://unicode.org/versions/Unicode9.0.0/

Thursday, June 16, 2016

Unicode 9.0 Emoji Available for Adoption

The Unicode Consortium’s Adopt-a-Character program is an opportunity to permanently adopt and dedicate an emoji, letter or any symbol on the keyboard. The new Unicode 9.0 emoji are now available for adoption, including shrug, face palm, crossed fingers, bacon, and 68 others. The funds help the consortium’s work of supporting the world’s languages in digital form.

We welcome sponsors of the new characters to join existing sponsors like Elastic in helping to further the work of the Unicode Consortium.

The emoji charts have also been updated with these new emoji, and with new images from Messenger, EmojiOne, EmojiXpress, and others. Soon after Unicode 9.0 is released, the other new Unicode 9.0 characters will be available for adoption.

Monday, June 6, 2016

72 New Emoji Characters

The 72 new emoji characters for Unicode 9.0 are now final, and listed in Emoji Recently Added. They include 7 faces, 7 people, 7 hand gestures, 14 plants/animals, 18 food emoji, 12 sports emoji, and a few others. The corresponding documentation in UTR #51 Unicode Emoji, Version 3.0 has also been updated, with additional guidelines for implementers and the new versions of the emoji data files. These should appear on smart phones and other devices that support emoji once vendors have a chance to update them.

Four of the new emoji are added to complete gender pairs. Work has already begun on Version 4.0 of Unicode Emoji, which focuses on further enhancing gender representation and is targeted to appear in the near future.

The new emoji characters will soon be available for adoption, helping support projects to improve language support.

Friday, June 3, 2016

Encoding the Mayan Script: your Adopt-a-Character sponsorships at work

The first grant of funds from Unicode’s Adopt-a-Character program has been awarded to UC Berkeley’s Script Encoding Initiative (SEI), for the first two phases of a project to include Mayan hieroglyphs as Unicode characters.

Thanks go to our sponsors for providing funds to support this grant. Adopting a character helps the Unicode Consortium in its goal to support the world’s languages.

Mayan hieroglyphs were used from 250 BCE until the 1500s. Mayan textual records include historical, literary, religious, and mythological information, as well as a sophisticated mathematical system on par with that of the Romans. Mayan astronomical records continue to capture the attention of astronomers today. Including Mayan hieroglyphs as Unicode characters will allow them to be used on computers around the world. See more about Mayan.

Mayan is a complex script, requiring special support in layout and presentation. The first phase is a catalog and analysis of the Dresden codex, resulting in a draft set of Unicode atomic signs and composition mechanisms needed for full Mayan text. The second phase is based on that analysis: preparation of a proposal for layout and presentation mechanisms in Unicode text, using those atomic elements. These two phases are to be completed in 2017.

Wednesday, May 18, 2016

ICU joins the Unicode Consortium

Today we are welcoming the ICU project into the Unicode Consortium.

Every smartphone and laptop uses the Unicode encoding and Unicode CLDR data for language support: from Arabic to Japanese to Zulu — and even plain English. The Unicode Consortium provides the data, but has not provided software to directly use that data, until now.

The ICU (International Components for Unicode) project has long provided software that implements the Unicode data and algorithms. ICU is a mature, very widely deployed set of C/C++ and Java software libraries, open-sourced since 1999 under the stewardship of IBM. When you see a date or number written in your language on your smartphone, for example, or a list of sorted names, the formatting and sorting are done with ICU.

There has long been a close working relationship between the various Unicode Consortium committees and the ICU team, with many people working on Unicode projects as well as ICU. That has ensured that Unicode data and algorithms can be effectively and quickly implemented.

IBM made the decision to transfer ICU to the Unicode Consortium so that ICU could benefit from the formal and open governance that the Unicode Consortium offers. “IBM has a long history in our commitment to open standards as a driver of innovation for our customers worldwide,” said Helena Chapman, IBM Globalization Executive. Moving ICU under the Unicode Consortium provides a cross-industry, open source collaboration that will drive greater consistency and interoperability across computing platforms, to the benefit of technology users worldwide. IBM has been an active member of the Unicode Consortium since its inception, and is pleased to see this further consolidation of foundational open source globalization standards.

The ICU team has become a new Consortium technical committee, along with the other Unicode committees. ICU will be released under the Unicode open-source license (similar to the previous license), just like the Unicode Character Database and the CLDR data. For users of ICU, we’ll try to make this transition as smooth as possible.

The Unicode Consortium and the ICU team would like to thank IBM for many years of project stewardship, as well as for major past and ongoing contributions to the project.

For more information, see http://site.icu-project.org/

Monday, May 16, 2016

PRI #326: Combined registration of the MSARG collection sequences

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #326: A submission for the “Combined registration of the MSARG collection and of sequences in that collection” has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2016-08-12. Please see the submission page for details and instructions on how to review this issue and provide comments:

http://www.unicode.org/ivd/pri/pri326/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for CJK Unified Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.
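In plain text, a variation sequence of this kind is simply a base ideograph followed by one variation selector from the supplementary range U+E0100..U+E01EF. The sketch below checks only that shape. The function name is invented, the base test is a rough approximation based on the character name rather than the actual Unicode property, and it does not consult the IVD to see whether the sequence is registered.

```python
import unicodedata

IVS_FIRST, IVS_LAST = 0xE0100, 0xE01EF  # VARIATION SELECTOR-17 .. -256


def looks_like_ivs(text):
    """Shape check only: one CJK unified ideograph plus one selector."""
    if len(text) != 2:
        return False
    base, selector = text
    # Approximation: rely on the character name instead of the
    # Unified_Ideograph property, which unicodedata does not expose.
    is_ideograph = unicodedata.name(base, "").startswith("CJK UNIFIED IDEOGRAPH-")
    return is_ideograph and IVS_FIRST <= ord(selector) <= IVS_LAST


print(looks_like_ivs("\u845B\U000E0100"))  # ideograph U+845B + VS17
print(looks_like_ivs("A\U000E0100"))       # base is not an ideograph
```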

Wednesday, May 4, 2016

Not Just Emoji

Every programmer knows about Unicode. Most other people have no idea what it is, even though they use Unicode every day. Every character you type on your smartphone or laptop — and every character you read — is defined by the Unicode Consortium.

The awareness of the Unicode Consortium has grown recently, with the spread of emoji. But from the news articles, it’s easy to get the impression that emoji is the only thing we do. In reality, there are over 120,000 characters defined, and only a small fraction of them are emoji.

For example, this June we’ll be adding 7,500 characters, and fewer than 1% of those new characters are emoji. The majority of the characters are from 6 new scripts: some in modern use, and some historic.
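The "fewer than 1%" figure is easy to check. The short calculation below takes the 72 new emoji characters and the total of 128,172 characters from the Version 9.0 announcement; the variable names are just for illustration.

```python
new_characters = 7_500      # characters added in Unicode 9.0
new_emoji = 72              # of which are emoji characters
total_after = 128_172       # total characters in Unicode 9.0

emoji_share = new_emoji / new_characters * 100
total_before = total_after - new_characters

print(f"emoji share of new characters: {emoji_share:.2f}%")
print(f"characters before 9.0: {total_before:,}")
```

The result is 0.96%, and the earlier total of 120,672 matches the "over 120,000 characters" mentioned above.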

CLDR is the other main project for the Unicode Consortium. It provides the building blocks for supporting a variety of different languages. We’ve just released CLDR v29, and are about to start data submission for v30. Especially if you are a native speaker of a “digitally disadvantaged” language, we encourage you to join the other contributors to CLDR to help with this effort.

The Unicode Consortium is a volunteer-driven 501(c)(3) non-profit organization. Some people may work on emoji, while others work on ancient scripts, or Chinese ideographs. Others work on the language support in CLDR, or other projects.

Sponsors

You can help fund the work of the consortium — even if you don’t contribute technically — by adopting your favorite character through the Adopt A Character program.

— Mark Davis, President

Friday, April 15, 2016

Call for Unicode 9.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Version 9.0 of The Unicode Standard. This is the first time Unicode is extending this invitation.

The cover design would appear on the Unicode Standard 9.0 web page, in the print-on-demand publication, and in associated promotional literature on the Unicode website. The chosen artist will receive full credit in the colophon of the publication, and wherever else the design appears, and receive $700. The two runner-up artists will receive $150 apiece.

Everyone in the world uses Unicode every time they read or type any character on any laptop, tablet, or smart phone. This is the opportunity to be on the cover of the standard for those characters.

Please see the announcement web page for requirements and more details.

Wednesday, March 16, 2016

CLDR Version 29 Released

Unicode CLDR 29 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks. The following summarizes the main improvements in the release.

New BCP47 extension keys have been added for specifying transliteration and emoji presentation, and for customizing locales with region-specific settings. Many new transforms are provided, the rule format has been simplified, and BCP47 IDs have been added for all transforms. Region data now includes appropriate preferences for day periods such as “6:00 in the morning” and “7:00 in the evening”, and there is new structure for choosing appropriate units based on region and usage. A Cantonese locale has been added. The emoji ordering has been improved, and annotations are provided for more emoji and in more locales. The JSON-format data has been extended to include number spellout (RBNF) and script metadata.

The specification and charts have also been updated.

For further details and links to documentation, see the CLDR Release Notes

Tuesday, March 15, 2016

Be a Part of Our 40th Conference!

Call for Participation Now Open

For twenty-five years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. The 40th conference will be held this year on November 1-3, 2016 in Santa Clara, California.

Two Key Themes for This Year

Breaking All Barriers: Explore how software providers can meet the globalization challenges of supporting the burgeoning diversity of communication platforms around the world, including mobile, tablets, social media, video, and voice. Examine how online social platforms are supporting multilingual text and rich content in hundreds of languages. Often the task is not just to publish in multiple languages, but to accept input in alternative forms, analyze it for meaning and sentiment, look for patterns in big data, or automate its routing or translation. This theme also includes the latest advances in relevant standards, and emerging and historic scripts.

Trained, Tested, Trusted: Understand best practices in process and among teams reliably delivering high quality global products. Examine how developers build, test, and deploy great global products. Explore technologies for design, localization, multilingual testing, workflow management, and content management.

This is the conference where you can promote your ideas and experience working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

We welcome your proposals for papers and tutorials. View examples of content from past conferences on the IUC 40 website.

Thursday, March 10, 2016

Unicode 9.0 Beta Review

Mountain View, CA, USA – The Unicode® Consortium today announced the start of the beta review for the forthcoming Unicode 9.0.0, which is scheduled for release in June, 2016. All beta feedback must be submitted by May 2, 2016.

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones – plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). Thus it is important to ensure a smooth transition to each new version of the Unicode Standard.

Unicode 9.0.0 comprises several additions and changes which require careful migration in implementations. These include asymmetric case mappings, numerous variation sequences, new fractional numeric values, and changes to property values, especially East_Asian_Width values. The line breaking and text segmentation algorithms handle character sequences that represent emoji as indivisible units via the addition of new property values and rules. Implementers need to modify code and check assumptions for all affected processes to support these additions and changes.

The new character repertoire includes 74 emoji symbols, 19 symbols used in Japanese TV broadcasting, and multiple additions to existing scripts. There are six new scripts, of which three are in modern use (Adlam, Osage, and Newa) and three are historic (Bhaiksuki, Marchen, and Tangut). Adlam and Osage have case pairs and require data updates for casing functions. Tangut is a large ideographic script whose addition incurred changes to the Unicode Collation Algorithm (used as the basis for sorting text in all languages).
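One quick way to see whether a runtime has picked up the new case pairs is to lowercase a capital letter from each script. This is a minimal sketch and assumes a Python build whose Unicode data is at version 9.0 or later; on older builds the letters come back unchanged. The helper name `case_pair` is invented for this example.

```python
import unicodedata

OSAGE_CAPITAL_A = "\U000104B0"      # OSAGE CAPITAL LETTER A
ADLAM_CAPITAL_ALIF = "\U0001E900"   # ADLAM CAPITAL LETTER ALIF


def case_pair(upper):
    """Return (upper, lower, round_trips) for a single capital letter."""
    lower = upper.lower()
    return upper, lower, lower.upper() == upper


print("Unicode data version:", unicodedata.unidata_version)
for letter in (OSAGE_CAPITAL_A, ADLAM_CAPITAL_ALIF):
    upper, lower, round_trips = case_pair(letter)
    print(f"U+{ord(upper):04X} -> U+{ord(lower):04X}, round trip: {round_trips}")
```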

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 2, 2016. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-9.0.0.html for more information about testing the 9.0.0 beta.

See http://unicode.org/versions/Unicode9.0.0/ for the current draft summary of Unicode 9.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emoji One, EmojiXpress, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium at http://www.unicode.org/contacts.html.

Monday, February 29, 2016

Draft Unicode Emoji Enhancements

Unicode emoji characters are specified by UTR #51, Unicode Emoji and its related data files. Now available for public review and comment are a proposed update of UTR #51, plus a draft of a related new document, UTS #52, Unicode Emoji Mechanisms.

UTS #52, Unicode Emoji Mechanisms provides a new way of representing customizations of Unicode emoji characters. The first specified customizations provide for flags for subdivisions of countries (such as Scotland or California), gender variants (such as female runners or males raising a hand), hair color variants (a red-haired dancer), and directional variants (pointing a hand or bicyclist to the right). Currently this is only a draft, but feedback is being solicited on a number of topics. From users of emoji, feedback would be useful on which variants are the highest priority, and whether any characters should be added to or removed from the lists of characters that qualify for each variant. From implementers, feedback is needed on whether there are any technical problems in the customization mechanism itself, and whether that mechanism is sufficiently extensible for future types of customizations.

The proposed update UTR #51, Unicode Emoji describes two new mechanisms for controlling whether emoji characters appear as text (black and white) or with a colorful rendition, and clarifies some of the previous text. There is also a proposed narrowing of the definition of the sequences used for family groupings.

Feedback must be submitted through the associated Public Review Issues by May 1 for consideration at the 2016Q2 Unicode Technical Committee meeting.

PRI #319: UTR #51, Unicode Emoji
PRI #321: UTS #52, Unicode Emoji Mechanisms

Tuesday, February 9, 2016

Unicode Candidate Emoji

The Unicode Consortium has accepted 5 new emoji characters as candidates for Unicode 10.0, scheduled for release in mid-2017. These 5 new emoji candidates are listed on the Emoji Candidates page, together with the 74 candidates for Unicode 9.0. These join thousands of non-emoji candidate characters for Unicode 10.0.

Candidate characters for Unicode are not yet finalized, so some may be removed from the candidate list, and others may be added. Names, images, and code points may also change, so these candidates are not yet ready for use in production systems. Other prospective emoji characters are still being assessed and could be approved as candidates in the future.

Proposals for new emoji characters can be submitted at Submitting Emoji Character Proposals, which also explains the selection factors used to assess new emoji proposals, the process, and the timeline.

Show your support of Unicode, and adopt a character!

Thursday, January 21, 2016

Proposal to Remove Some Hira/Kata From Script_Extensions

The Script_Extensions property values for some characters contain Hiragana, Katakana, or Bopomofo, when they should contain only Han. The Unicode Technical Committee is considering removing the Hiragana, Katakana, or Bopomofo values in these cases, and would like feedback on any that should not be changed, and on any others that should be. Public Review Issue #316 contains details of a proposal to remove these items from Script_Extensions.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Thursday, January 14, 2016

Proposed Update UAX #9, Unicode Bidirectional Algorithm

A new proposed update of UAX #9, Unicode Bidirectional Algorithm for the Unicode 9.0 release is now available for public review and comment.

The table in Section 2.7, Markup and Formatting, has been updated to reflect changes to isolates in HTML5 and CSS.

For further information and instructions on how to leave feedback, please see Public Review Issue #315.

Proposed Update UAX #45, U-Source Ideographs

A new proposed update of UAX #45, U-Source Ideographs for the Unicode 9.0 release is now available for public review and comment.

Many updates and additions have been made to the USourceData.txt and the accompanying list of glyphs for all the U-Source ideographs, USourceGlyphs.pdf. For the latest versions of the source data and glyph files for review, see the versioned files posted in the Unicode 9.0 UCD data file review directory.

For further information and instructions on how to leave feedback, please see Public Review Issue #314.

Wednesday, January 13, 2016

Proposed Update UTS #39, Unicode Security Mechanisms

A new proposed update of UTS #39, Unicode Security Mechanisms is now available for public review and comment.

The proposed update of this Unicode Technical Standard includes new material for email security profiles and text about the use of Script_Extensions. The data file confusablesWholeScript.txt has been withdrawn, because in practice the process of derivation of whole script confusables depends on the particular set of characters supported by an application. The use of a data file is replaced by a logical process of deriving the whole-script confusables data based on the set of supported characters.
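To give a feel for why the result depends on the supported character set, here is a toy sketch in Python. It is not the derivation specified in UTS #39: the mapping is a tiny hand-picked subset of Latin and Cyrillic look-alikes chosen for illustration rather than taken from the confusables data files, and the function name is hypothetical.

```python
from typing import Optional, Set

# Illustrative subset only: Latin lowercase letters and Cyrillic letters
# that look alike in many fonts. A real implementation would start from
# the confusables data published with UTS #39.
LATIN_TO_CYRILLIC = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}


def cyrillic_lookalike(word: str, supported: Set[str]) -> Optional[str]:
    """Return an all-Cyrillic look-alike of a Latin word, or None.

    A look-alike exists only if every character has a Cyrillic twin in
    the toy table AND that twin is in the application's supported set.
    """
    twins = []
    for ch in word:
        twin = LATIN_TO_CYRILLIC.get(ch)
        if twin is None or twin not in supported:
            return None
        twins.append(twin)
    return "".join(twins)


if __name__ == "__main__":
    everything = set(LATIN_TO_CYRILLIC.values())
    without_es = everything - {"\u0441"}
    # Same word, different supported sets, different answers.
    print(cyrillic_lookalike("peace", everything) is not None)
    print(cyrillic_lookalike("peace", without_es) is not None)
```

The same input word is a whole-script confusable under one supported set and not under another, which is the situation a fixed data file cannot capture.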

For further information and instructions on how to leave feedback, please see Public Review Issue #313.

Wednesday, January 6, 2016

Unicode Tutorial Workshop in Oman (Feb 14-16, 2016)

This tutorial workshop, sponsored by the Unicode Consortium and organized by the German University of Technology in Oman, is a three-day event designed to familiarize the audience with the Unicode Standard and the concepts of internationalization. It is the first-ever Unicode event to be held in the Middle East.

The workshop program includes an introduction to Writing Systems & Unicode, plus presentations on Arabic Typography, web best practices, mobile internationalization, and more.

The workshop website provides full information about the event.

Tuesday, January 5, 2016

Feedback on Draft additional repertoire for ISO/IEC 10646:2016 (5th edition) CD2

The Unicode Technical Committee is soliciting feedback on pending additions to the draft repertoire of characters, to help discover any errors in character names, incorrect glyphs, or other problems. There is a short window of opportunity to review and comment on the repertoire additions noted below.

The following additional repertoire from ISO/IEC 10646:2016 (5th Edition), which is in committee ballot, is under review. See the associated repertoire in: Draft additional repertoire for ISO/IEC 10646:2016 (5th edition) CD2.

The Unicode Standard is developed in synchrony with ISO/IEC 10646. After ISO balloting is completed on any repertoire additions, no further changes or corrections will be possible. (See the FAQ Standards Developing Organizations for additional information on the stages in ISO standards development.) Advance feedback on these repertoire additions will help inform the UTC discussions about its own contribution to the ISO balloting process.

Documents referenced in the draft repertoire with numbers such as L2/15-088 are available in the UTC Document Registry.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.