Friday, November 22, 2019

Unicode solicits feedback on open emoji definition: QID Emoji

The QID Emoji Tag Sequences (or QID emoji, for short) have been proposed to further open up the process of defining new emoji.

The proposal is intended to provide a well-defined mechanism for implementations to support additional valid emoji that are not representable by Unicode characters or standard emoji sequences. This proposed new mechanism would allow for the interchange of emoji whose meaning is discoverable and which should be correctly parsed by all conformant implementations (although displayed only by implementations that support them). The meaning of each of these valid emoji would be established by reference to a Wikidata QID.
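
As a rough illustration of how such a sequence could be assembled, the sketch below follows the general emoji tag sequence shape from UTS #51: a base character, one or more tag characters (U+E0020..U+E007E, which mirror printable ASCII), and a terminating CANCEL TAG (U+E007F). The base character used here, the way the QID is spelled in tag characters, and the helper names are all assumptions made for illustration; the proposal itself defines the normative form.

```python
import re

TAG_OFFSET = 0xE0000        # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG ends a tag sequence


def encode_qid_tags(qid: str, base: str = "\U0001F194") -> str:
    """Return base + tag characters spelling `qid` + CANCEL TAG.

    The default base (U+1F194) is a placeholder chosen for this sketch,
    not something stated in the announcement.
    """
    if not re.fullmatch(r"Q[0-9]+", qid):
        raise ValueError(f"not a QID of the form Q<digits>: {qid!r}")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in qid)
    return base + tags + CANCEL_TAG


def decode_qid_tags(sequence: str) -> str:
    """Recover the QID text from a sequence built by encode_qid_tags."""
    if len(sequence) < 3 or sequence[-1] != CANCEL_TAG:
        raise ValueError("sequence is not terminated by CANCEL TAG")
    chars = []
    for ch in sequence[1:-1]:          # skip the base and the terminator
        cp = ord(ch)
        if not 0xE0020 <= cp <= 0xE007E:
            raise ValueError(f"U+{cp:04X} is not a tag character")
        chars.append(chr(cp - TAG_OFFSET))
    return "".join(chars)
```

Under these assumptions, `decode_qid_tags(encode_qid_tags("Q42"))` returns `"Q42"`. An implementation that does not support the sequence can still parse it as a unit, because the tag characters and CANCEL TAG delimit it unambiguously.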

The Unicode Consortium would appreciate feedback on this proposal.

Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Thursday, November 21, 2019

Call for feedback on UTS #18: Unicode Regular Expressions

Regular expressions are a powerful tool for using patterns to search and modify text. They are a key component of many programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. A proposed update of that specification is now available for public review and comment. The following are the main modifications in this draft:
  • Broadened the scope of properties to allow for properties of strings (as well as properties of code points).
  • Added 11 Emoji properties including RGI sets as Full Properties in Level 2.
  • Added other new properties as Full Properties in Level 2: Equivalent_Unified_Ideograph, Vertical_Orientation, Regional_Indicator, Indic_Positional_Category, Indic_Syllabic_Category.
  • Provided a draft data file with property metadata for matching and validating non-UCD properties and their values for syntax such as \p{pname=pvalue}, so that such properties can be used in the same way as UCD properties. See Annex D.
There are a number of review notes requesting feedback on these and other possible changes. In particular, the Unicode Technical Committee would appreciate feedback on the discussion of and syntax for properties of strings, and on the recommended properties to be supported at Level 2.
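
To make the distinction between properties of code points and properties of strings more concrete, here is a minimal sketch using only Python's standard library. Python's built-in `re` module does not support `\p{...}` syntax, so the code-point property is emulated with `unicodedata.category`, and the string property is modeled as a plain set of strings compiled into an alternation. The sample set and the helper names are hypothetical and are not taken from the UTS #18 draft or its data files.

```python
import re
import unicodedata


def match_code_point_property(text, category):
    """Return the characters of `text` whose General_Category is `category`."""
    return [ch for ch in text if unicodedata.category(ch) == category]


def compile_string_property(strings):
    """Compile a set of strings into a pattern matching any member.

    Members are sorted longest first so that a multi-code-point member
    is preferred over any shorter member that is a prefix of it.
    """
    alternatives = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in alternatives))


# Hypothetical stand-in for a property of strings: two flags (each a pair
# of regional indicator code points) and one single-code-point emoji.
SAMPLE_STRINGS = {
    "\U0001F1EB\U0001F1F7",  # regional indicators F + R
    "\U0001F1E9\U0001F1EA",  # regional indicators D + E
    "\U0001F600",            # a single code point
}
```

With this sketch, `match_code_point_property("Unicode 13", "Lu")` tests each code point on its own, while `compile_string_property(SAMPLE_STRINGS)` matches a whole flag as one unit and does not match a lone regional indicator. That difference is what a property of strings has to capture.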

The review period closes on 2020-01-06. For more information on reviewing and supplying feedback, see Proposed Update UTS #18, Unicode Regular Expressions.

Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many other new emoji, including the transgender flag and polar bear.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.
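
One small check that may help when adjusting code is to see which Unicode version a runtime bundles and whether it handles code points outside the planes it has seen so far. The sketch below is one possible approach in Python, using only the standard `unicodedata` module; the helper names are hypothetical, and the results of the version and assignment checks depend on the Python build you run it on.

```python
import unicodedata


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains `code_point`."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16    # each plane spans 0x10000 code points


def runtime_unicode_version():
    """Return the bundled Unicode Character Database version as a tuple of ints."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def is_assigned(ch: str) -> bool:
    """True if this runtime's character database knows `ch` (category is not Cn)."""
    return unicodedata.category(ch) != "Cn"
```

For example, `plane_of(0x30000)` returns 3, the Tertiary Ideographic Plane. Comparing `runtime_unicode_version()` against `(13, 0, 0)` and calling `is_assigned("\U00030000")` indicates whether the runtime's data already covers plane 3. Code that assumes ideographs stop at plane 2 is the kind of thing worth testing during the beta.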

See for more information about testing the 13.0.0 beta.

See for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to
