Friday, January 10, 2020

New Unicode Working Group: Message Formatting

One of the challenges in adapting programs to work with different languages is message formatting. This is the process of formatting and inserting data values into messages in the user’s language. For example, “The package will arrive at {time} on {date}” could be translated into German as “Das Paket wird am {date} um {time} geliefert”, and the particular {time} and {date} variables would be automatically formatted for German, and inserted in the right places.
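The package example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MESSAGES` catalog, the `FORMATTERS` table, and `format_arrival` are hypothetical names, and the hand-written formatters stand in for real locale data such as CLDR.

```python
from datetime import datetime

# Hypothetical message catalog: one template per locale, same placeholders.
MESSAGES = {
    "en": "The package will arrive at {time} on {date}",
    "de": "Das Paket wird am {date} um {time} geliefert",
}

def _time_en(t: datetime) -> str:
    # 12-hour clock with AM/PM, written by hand for illustration.
    hour12 = t.hour % 12 or 12
    suffix = "AM" if t.hour < 12 else "PM"
    return f"{hour12}:{t.minute:02d} {suffix}"

def _date_en(t: datetime) -> str:
    return f"{t.month}/{t.day}/{t.year}"

def _time_de(t: datetime) -> str:
    # 24-hour clock.
    return f"{t.hour:02d}:{t.minute:02d}"

def _date_de(t: datetime) -> str:
    return f"{t.day:02d}.{t.month:02d}.{t.year}"

# Assumed per-locale formatters; a real system would derive these from locale data.
FORMATTERS = {
    "en": {"time": _time_en, "date": _date_en},
    "de": {"time": _time_de, "date": _date_de},
}

def format_arrival(locale: str, when: datetime) -> str:
    """Format the values for the locale, then insert them into its template."""
    fmt = FORMATTERS[locale]
    return MESSAGES[locale].format(time=fmt["time"](when), date=fmt["date"](when))

when = datetime(2020, 1, 10, 15, 30)
print(format_arrival("en", when))  # The package will arrive at 3:30 PM on 1/10/2020
print(format_arrival("de", when))  # Das Paket wird am 10.01.2020 um 15:30 geliefert
```

Note that the calling code never concatenates strings: the translator controls where each placeholder lands, which is what lets the German template put the date before the time.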

The Unicode Consortium has provided message formatting for some time via the ICU programming libraries and CLDR locale data repository. But until now we have not had a syntax for localizable message strings standardized by Unicode. Furthermore, the current ICU MessageFormat is relatively complex for existing operations, such as plural forms, and it does not scale well to other language properties, such as gender and inflections.
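To give a sense of why plural forms add complexity, here is a minimal sketch of plural selection. It is not the ICU MessageFormat syntax or API; `plural_category`, `PLURAL_MESSAGES`, and `format_count` are hypothetical names, and the rules are simplified, integer-only versions of the English and Polish plural rules.

```python
def plural_category(locale: str, n: int) -> str:
    """Pick a plural category for a non-negative integer (simplified rules)."""
    if locale == "en":
        return "one" if n == 1 else "other"
    if locale == "pl":
        if n == 1:
            return "one"
        if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
            return "few"
        return "many"
    raise ValueError(f"no plural rules for locale: {locale}")

# Hypothetical catalog: each locale supplies one variant per category it uses.
PLURAL_MESSAGES = {
    "en": {"one": "{n} package", "other": "{n} packages"},
    "pl": {"one": "{n} paczka", "few": "{n} paczki", "many": "{n} paczek"},
}

def format_count(locale: str, n: int) -> str:
    variant = PLURAL_MESSAGES[locale][plural_category(locale, n)]
    return variant.format(n=n)

print(format_count("en", 1))   # 1 package
print(format_count("pl", 22))  # 22 paczki
print(format_count("pl", 12))  # 12 paczek
```

Even in this reduced form, each message needs a different number of variants per language, and the selection logic differs by locale. Properties such as gender or inflection would multiply the variants further.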

The Unicode CLDR Technical Committee is formalizing a new working group to develop a technical specification for message format that addresses these issues. That working group is called the Message Format Working Group and is chaired by Romulo Cintra from CaixaBank. Other participants currently represented are Amazon, Dropbox, Facebook, Google, IBM, Mozilla, OpenJSF, and PayPal.

For information on how to get involved, visit the working group’s GitHub page:

Open discussions will take place on GitHub, and written notes will be posted after every meeting.

Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Thursday, January 9, 2020

New Unicode Technical Director

The Unicode Consortium would like to welcome a new Technical Director, Roozbeh Pournader.

Roozbeh Pournader has been working on internationalization, standardization, open source software, and digital typography since 1994, when he was in high school. There he also participated in scientific olympiads and received several medals, including the Gold Medal at the 1996 International Olympiad in Informatics.

He started his internationalization career by adding Persian support to Donald Knuth’s typesetting system, TeX. Later, while studying Software Engineering at Sharif University of Technology, he founded the FarsiWeb Project that introduced and evangelized internationalization, Unicode, and open source in Iran. At FarsiWeb, Roozbeh led the development of two national Iranian standards, on information interchange (ISIRI 6219) and keyboard input (ISIRI 9147), which helped transition Persian users from old character sets to Unicode. To this day, FarsiWeb alumni, trained by Roozbeh, continue to work in the internationalization field at major tech companies.

Roozbeh founded the Persian Wikipedia in 2003 and received the Unicode Bulldog award in 2009 for his contributions to Unicode and CLDR’s support for complex scripts. Since moving to the United States, he has worked as an Internationalization Engineer at HighTech Passport, Google (working on Noto fonts, bidirectional support, Android internationalization, and Google Fonts), and Facebook. He has been WhatsApp’s Internationalization Lead at Facebook since early 2018.

Roozbeh has formally represented various organizations to the Unicode Consortium, including the High Council of Informatics (2000–2008), HighTech Passport (2009–2011), Google (2011–2018), and Facebook (2018–present). He is also the Vice Chair of the Unicode Script Ad Hoc Group.

For the listing of current directors and officers of the Consortium, please see Unicode Directors, Officers and Staff.

Friday, November 22, 2019

Unicode solicits feedback on open emoji definition: QID Emoji

The QID Emoji Tag Sequences (or QID emoji, for short) have been proposed to further open up the process of defining new emoji.

The proposal is intended to provide a well-defined mechanism for implementations to support additional valid emoji that are not representable by Unicode characters or standard emoji sequences. This proposed new mechanism would allow for the interchange of emoji whose meaning is discoverable, and which should be correctly parsed by all conformant implementations (although only displayed by implementations that support it). The meaning of each of these valid emoji would be established by reference to a Wikidata QID.

The Unicode Consortium would appreciate feedback on this proposal.



Thursday, November 21, 2019

Call for feedback on UTS #18: Unicode Regular Expressions

Regular expressions are a powerful tool for using patterns to search and modify text. They are a key component of many programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. A proposed update of that specification is now available for public review and comment. The following are the main modifications in this draft:
  • Broadened the scope of properties to allow for properties of strings (as well as properties of code points).
  • Added 11 Emoji properties including RGI sets as Full Properties in Level 2.
  • Added other new properties as Full Properties in Level 2: Equivalent_Unified_Ideograph, Vertical_Orientation, Regional_Indicator, Indic_Positional_Category, Indic_Syllabic_Category.
  • Provided a draft data file with property metadata for matching and validating non-UCD properties and their values for syntax such as \p{pname=pvalue}, so that such properties can be used in the same way as UCD properties. See Annex D.
There are a number of review notes requesting feedback on these and other possible changes. In particular, the Unicode Technical Committee would appreciate feedback on the discussion of and syntax for properties of strings, and on the recommended properties to be supported at Level 2.
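As a rough illustration of the \p{pname=pvalue} syntax mentioned above, the following sketch resolves such an expression against a single code point. It is not a conformant UTS #18 implementation: `matches_property` is a hypothetical helper, it handles only General_Category (via Python's standard `unicodedata` module), and it does no loose matching of property names or values.

```python
import re
import unicodedata

# Matches expressions of the form \p{name=value}.
PROPERTY_EXPR = re.compile(r"\\p\{(\w+)=(\w+)\}")

def matches_property(expr: str, ch: str) -> bool:
    """Return True if the single character `ch` has the property in `expr`."""
    m = PROPERTY_EXPR.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a \\p{{name=value}} expression: {expr!r}")
    name, value = m.group(1), m.group(2)
    if name in ("gc", "General_Category"):
        # A one-letter value such as L covers its subcategories (Lu, Ll, ...).
        return unicodedata.category(ch).startswith(value)
    raise ValueError(f"unsupported property in this sketch: {name}")

print(matches_property(r"\p{gc=Lu}", "A"))  # True
print(matches_property(r"\p{gc=Nd}", "A"))  # False
```

A property of strings, as discussed in the draft, would need a lookup keyed on whole sequences rather than on one code point at a time, which this sketch does not attempt.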

The review period closes on 2020-01-06. For more information on reviewing and supplying feedback, see Proposed Update UTS #18, Unicode Regular Expressions.

Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many other new emoji, including the transgender flag and polar bear.
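For readers adjusting code for the new plane, a code point's plane is simply its value divided by 0x10000. The helper below is a minimal sketch; `plane_of` is a hypothetical name, not part of any Unicode library.

```python
def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) of a Unicode code point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Each plane holds 0x10000 code points, so the plane is the value above bit 15.
    return code_point >> 16

print(plane_of(0x0041))   # 0
print(plane_of(0x30000))  # 3
```

Code that assumes ideographs only occur in planes 0 and 2 may need a similar check extended to plane 3.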

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.

See for more information about testing the 13.0.0 beta.

See for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to
