Friday, November 22, 2019

Unicode solicits feedback on open emoji definition: QID Emoji

The QID Emoji Tag Sequences (or QID emoji, for short) have been proposed to further open up the process of defining new emoji.

The proposal is intended to provide a well-defined mechanism for implementations to support additional valid emoji that are not representable by Unicode characters or standard emoji sequences. This proposed new mechanism would allow for the interchange of emoji whose meaning is discoverable, and which should be correctly parsed by all conformant implementations (although only displayed by implementations that support it). The meaning of each of these valid emoji would be established by reference to a Wikidata QID.
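The interchange mechanism can be sketched in Python. Emoji tag sequences, as already used for subdivision flags, consist of a base character, tag characters that mirror ASCII at an offset of U+E0000, and a terminating U+E007F CANCEL TAG. The sketch below assumes, without confirmation from the text above, that a QID emoji uses U+1F194 as its base and carries only the digits of the QID. The function names and the sample QID are hypothetical; the proposal itself is the authority for the real syntax.

```python
TAG_OFFSET = 0xE0000            # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"       # terminates every emoji tag sequence
ASSUMED_TAG_BASE = "\U0001F194" # assumption: base character for QID emoji


def encode_qid_sequence(qid, tag_base=ASSUMED_TAG_BASE):
    """Turn a Wikidata QID such as 'Q1234' into a tag sequence (sketch)."""
    digits = qid[1:]
    if not (qid.startswith("Q") and digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a QID: {qid!r}")
    tags = "".join(chr(TAG_OFFSET + ord(d)) for d in digits)
    return tag_base + tags + CANCEL_TAG


def decode_qid_sequence(sequence, tag_base=ASSUMED_TAG_BASE):
    """Recover the QID from a sequence built by encode_qid_sequence."""
    if not (sequence.startswith(tag_base) and sequence.endswith(CANCEL_TAG)):
        raise ValueError("missing tag base or cancel tag")
    tags = sequence[len(tag_base):-len(CANCEL_TAG)]
    # Only TAG DIGIT ZERO..TAG DIGIT NINE are accepted in this sketch.
    if not tags or not all(0xE0030 <= ord(t) <= 0xE0039 for t in tags):
        raise ValueError("tag specification must be tag digits only")
    return "Q" + "".join(chr(ord(t) - TAG_OFFSET) for t in tags)
```

An implementation that does not support a given QID can still parse the sequence as a single unit and recover the identifier, which is what makes the meaning discoverable even when no image is available.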

The Unicode Consortium would appreciate feedback on this proposal.

Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Thursday, November 21, 2019

Call for feedback on UTS #18: Unicode Regular Expressions

Regular expressions are a powerful tool for using patterns to search and modify text. They are a key component of many programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. A proposed update of that specification is now available for public review and comment. The following are the main modifications in this draft:
  • Broadened the scope of properties to allow for properties of strings (as well as properties of code points).
  • Added 11 Emoji properties including RGI sets as Full Properties in Level 2.
  • Added other new properties as Full Properties in Level 2: Equivalent_Unified_Ideograph, Vertical_Orientation, Regional_Indicator, Indic_Positional_Category, Indic_Syllabic_Category.
  • Provided a draft data file with property metadata for matching and validating non-UCD properties and their values for syntax such as \p{pname=pvalue}, so that such properties can be used in the same way as UCD properties. See Annex D.
There are a number of review notes requesting feedback on these and other possible changes. In particular, the Unicode Technical Committee would appreciate feedback on the discussion of and syntax for properties of strings, and on the recommended properties to be supported at Level 2.
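To illustrate the \p{pname=pvalue} syntax mentioned above, here is a minimal Python sketch. Python's built-in re module does not support \p{...}, so the sketch parses the expression itself and tests one character at a time using the standard unicodedata module. It handles only General_Category with short value aliases (such as Lu or L), and the helper names are hypothetical. It is not an implementation of UTS #18.

```python
import re
import unicodedata

# Matches expressions of the form \p{pname=pvalue}.
PROPERTY_SYNTAX = re.compile(r"\\p\{\s*([^=}]+?)\s*=\s*([^}]+?)\s*\}")


def _loose(name):
    """Loose matching: ignore case, whitespace, underscores, and hyphens."""
    return re.sub(r"[\s_\-]", "", name).lower()


def parse_property(expression):
    """Split '\\p{pname=pvalue}' into loosely normalized (pname, pvalue)."""
    match = PROPERTY_SYNTAX.fullmatch(expression)
    if match is None:
        raise ValueError(f"not a property expression: {expression!r}")
    pname, pvalue = _loose(match.group(1)), _loose(match.group(2))
    if not pname or not pvalue:
        raise ValueError("empty property name or value")
    return pname, pvalue


def char_matches(expression, char):
    """Return True if char has the property value named in expression."""
    pname, pvalue = parse_property(expression)
    if pname in ("gc", "generalcategory"):
        # 'L' matches Lu, Ll, Lt, ...; 'Lu' matches only Lu.
        return unicodedata.category(char).lower().startswith(pvalue)
    raise NotImplementedError(f"property not supported in this sketch: {pname}")
```

For example, char_matches(r"\p{gc=Lu}", "A") is true, while the same expression applied to "a" is false. A property of strings, as discussed in the draft, would need a matcher that accepts multi-character strings rather than single code points.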

The review period closes on 2020-01-06. For more information on reviewing and supplying feedback, see Proposed Update UTS #18, Unicode Regular Expressions.

Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many other new emoji, including the transgender flag and polar bear.
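For implementers checking whether their code handles plane 3, the plane of a code point is simply its value divided by 65,536. A minimal Python sketch follows; the function name is hypothetical.

```python
def plane_of(code_point):
    """Return the Unicode plane (0 to 16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {code_point:#x}")
    # Each plane holds 0x10000 code points, so the plane is the high bits.
    return code_point >> 16
```

Here plane_of(0x30000) returns 3, the first code point of the Tertiary Ideographic Plane. Code that assumes ideographs only occur in planes 0 and 2 may need adjusting.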

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.

See for more information about testing the 13.0.0 beta.

See for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to



Monday, October 28, 2019

Unicode 2019 Bulldog Awards

The Unicode Consortium announces the 2019 Bulldog Award recipients: Andy Heninger and Norbert Lindenberg.

Andy Heninger is recognized for many years of contributions to the work of the Consortium, including providing crucial implementations of segmentation and regular expression support in International Components for Unicode (ICU). Prior to having these functions in ICU, support for them in Unicode implementations was very limited. Both contributions are key to robust text support. For example, correct segmentation is what keeps family emoji from splitting apart!
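To see why segmentation matters for family emoji, consider that a family emoji is typically several code points joined by U+200D ZERO WIDTH JOINER. The Python sketch below is a deliberately simplified illustration that only keeps ZWJ-joined characters together. It is not the segmentation algorithm used in ICU, and it ignores variation selectors, skin tone modifiers, combining marks, and every other rule a real implementation needs. The function name is hypothetical.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER


def naive_zwj_clusters(text):
    """Group code points so that ZWJ-joined runs stay in one cluster."""
    clusters = []
    join_next = False
    for char in text:
        if clusters and (join_next or char == ZWJ):
            # A ZWJ, or the character right after one, extends the cluster.
            clusters[-1] += char
        else:
            clusters.append(char)
        join_next = (char == ZWJ)
    return clusters


# Man + ZWJ + woman + ZWJ + girl: five code points, one visible emoji.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
```

Iterating over family one code point at a time yields five pieces, whereas naive_zwj_clusters(family) returns a single cluster. Software that deletes or wraps text per code point would break the family apart.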

Norbert Lindenberg has made significant contributions over the years to internationalizing the Web and has brought deep script expertise to the Unicode Script Ad Hoc group. He has contributed to the models of many of the Unicode Standard’s complex scripts, including Thai, Myanmar, Khmer, Javanese, and Tamil. His work has been used by organizations such as Mozilla, Yahoo!, Sun Microsystems, and Apple.

For many years, both have been bulldogs for robust Unicode text support.

More details of their many contributions can be found on the Unicode Bulldog Award page.



Monday, October 21, 2019

Emoji 12.1 release: 168 Emoji added

Emoji 12.1, with 168 new emoji, has been released. There are 138 new gender-neutral forms, so you will soon be able to text about people without specifying their gender. Thirty new combinations of people holding hands with various skin tones were also added.

The new emoji are listed in Emoji Recently Added v12.1, along with sample images. These images are merely samples: vendors for mobile phones, PCs, and web platforms will typically design their own fonts for emoji. The Emoji Ordering v12.1 chart shows how the new emoji should be sorted within the order of existing emoji, with new emoji marked with rounded rectangles. The other Emoji Charts for Version 12.1 have been updated to show the new emoji.

Initial names and search keywords, such as health worker (doctor, nurse, ...), are available in different languages in Unicode CLDR 36. They will be refined during this quarter.


The new Emoji 12.1 data is available for vendors to use for their emoji fonts and code. These new emoji should start showing up on mobile phones this quarter and next. The new emoji will soon be available for adoption to help the Unicode Consortium’s work to support digitally disadvantaged languages.
