The Unicode Blog

Thursday, January 9, 2020

New Unicode Technical Director

The Unicode Consortium would like to welcome a new Technical Director, Roozbeh Pournader.

Roozbeh Pournader has been working on internationalization, standardization, open source software, and digital typography since 1994, when he was in high school. There he also participated in scientific olympiads and received several medals, including the Gold Medal at the 1996 International Olympiad in Informatics.

He started his internationalization career by adding Persian support to Donald Knuth’s typesetting system, TeX. Later, while studying Software Engineering at Sharif University of Technology, he founded the FarsiWeb Project that introduced and evangelized internationalization, Unicode, and open source in Iran. At FarsiWeb, Roozbeh led the development of two national Iranian standards, on information interchange (ISIRI 6219) and keyboard input (ISIRI 9147), which helped transition Persian users from old character sets to Unicode. To this day, FarsiWeb alumni, trained by Roozbeh, continue to work in the internationalization field at major tech companies.

Roozbeh founded the Persian Wikipedia in 2003 and received the Unicode Bulldog award in 2009 for his contributions to Unicode and CLDR’s support for complex scripts. Since moving to the United States, he has worked as an Internationalization Engineer at HighTech Passport, Google (working on Noto fonts, bidirectional support, Android internationalization, and Google Fonts), and Facebook. He has been WhatsApp’s Internationalization Lead at Facebook since early 2018.

Roozbeh has formally represented various organizations to the Unicode Consortium, including the High Council of Informatics (2000–2008), HighTech Passport (2009–2011), Google (2011–2018), and Facebook (2018–present). He is also the Vice Chair of the Unicode Script Ad Hoc Group.

For the listing of current directors and officers of the Consortium, please see Unicode Directors, Officers and Staff.

Friday, November 22, 2019

Unicode solicits feedback on open emoji definition: QID Emoji

The QID Emoji Tag Sequences (or QID emoji, for short) have been proposed to further open up the process of defining new emoji.

The proposal is intended to provide a well-defined mechanism for implementations to support additional valid emoji that are not representable by Unicode characters or standard emoji sequences. This proposed new mechanism would allow for the interchange of emoji whose meaning is discoverable, and which should be correctly parsed by all conformant implementations (although only displayed by implementations that support it). The meaning of each of these valid emoji would be established by reference to a Wikidata QID.
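As a rough illustration of how a Wikidata QID could travel inside a tag sequence, the sketch below spells a QID with Unicode tag characters (U+E0020..U+E007E, which mirror printable ASCII) and closes it with CANCEL TAG (U+E007F), the same pattern existing emoji tag sequences use. The exact structure and base character are defined by the proposal itself and are not reproduced here: `PLACEHOLDER_BASE`, `to_tag_sequence`, and `qid_from_tag_sequence` are hypothetical names chosen for this example only.

```python
TAG_OFFSET = 0xE0000          # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"     # CANCEL TAG terminates a tag sequence
PLACEHOLDER_BASE = "\u25A1"   # hypothetical stand-in, NOT the base the proposal specifies


def to_tag_sequence(base: str, qid: str) -> str:
    """Spell a QID such as 'Q42' in tag characters after a single base code point."""
    digits = qid[1:]
    if not (qid.startswith("Q") and digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a QID: {qid!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in qid)
    return base + tags + CANCEL_TAG


def qid_from_tag_sequence(sequence: str) -> str:
    """Recover the QID; assumes one base code point and a trailing CANCEL TAG."""
    if not sequence.endswith(CANCEL_TAG):
        raise ValueError("sequence is not terminated by CANCEL TAG")
    return "".join(chr(ord(c) - TAG_OFFSET) for c in sequence[1:-1])


sequence = to_tag_sequence(PLACEHOLDER_BASE, "Q42")
print([f"U+{ord(c):04X}" for c in sequence])
print(qid_from_tag_sequence(sequence))
```

An implementation that does not display the resulting emoji could still parse the sequence and look up the QID, which is the sense in which the meaning stays discoverable.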

The Unicode Consortium would appreciate feedback on this proposal.

Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Thursday, November 21, 2019

Call for feedback on UTS #18: Unicode Regular Expressions

Regular expressions are a powerful tool for using patterns to search and modify text. They are a key component of many programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. A proposed update of that specification is now available for public review and comment. The following are the main modifications in this draft:

Broadened the scope of properties to allow for properties of strings (as well as properties of code points).
Added 11 Emoji properties including RGI sets as Full Properties in Level 2.
Added other new properties as Full Properties in Level 2: Equivalent_Unified_Ideograph, Vertical_Orientation, Regional_Indicator, Indic_Positional_Category, Indic_Syllabic_Category.
Provided a draft data file with property metadata for matching and validating non-UCD properties and their values for syntax such as \p{pname=pvalue}, so that such properties can be used in the same way as UCD properties. See Annex D.
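To make the `\p{pname=pvalue}` syntax concrete, here is a minimal sketch that parses such an expression and tests a single code point against it. It is not an implementation of UTS #18: it handles only the General_Category property with short value aliases such as `Lu`, using Python's standard `unicodedata` module, and the helper name `matches_property` is an assumption made for this example.

```python
import re
import unicodedata

# Matches expressions of the form \p{name=value}
PROPERTY_EXPR = re.compile(r"\\p\{(\w+)=(\w+)\}")


def matches_property(expr: str, ch: str) -> bool:
    """Return True if the single character ch has the property value named in expr."""
    m = PROPERTY_EXPR.fullmatch(expr)
    if m is None:
        raise ValueError(f"unsupported expression: {expr!r}")
    name, value = m.groups()
    if name in ("gc", "General_Category"):
        # unicodedata.category returns short aliases such as 'Lu' or 'Nd'
        return unicodedata.category(ch) == value
    raise ValueError(f"unsupported property: {name!r}")


print(matches_property(r"\p{gc=Lu}", "A"))        # uppercase letter
print(matches_property(r"\p{gc=Lu}", "a"))        # lowercase letter
print(matches_property(r"\p{gc=Nd}", "\u0665"))   # ARABIC-INDIC DIGIT FIVE
```

A property of strings, as discussed in the draft, would need a different test than this per-code-point check, because its values can span several code points.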

There are a number of review notes requesting feedback on these and other possible changes. In particular, the Unicode Technical Committee would appreciate feedback on the discussion of and syntax for properties of strings, and on the recommended properties to be supported at Level 2.

The review period closes on 2020-01-06. For more information on reviewing and supplying feedback, see Proposed Update UTS #18, Unicode Regular Expressions.

Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many other new emoji, including the transgender flag and polar bear.
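For code that has so far assumed ideographs live only in planes 0 and 2, a quick check of which plane a code point belongs to can be useful when testing the beta data. The sketch below relies only on the fact that each plane spans 0x10000 code points; the function name `plane_of` is an assumption for this example.

```python
def plane_of(code_point: int) -> int:
    """Return the Unicode plane (0..16) that contains code_point."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return code_point >> 16  # each plane holds 0x10000 code points


print(plane_of(ord("A")))    # Basic Multilingual Plane
print(plane_of(0x1F602))     # an emoji in the Supplementary Multilingual Plane
print(plane_of(0x30000))     # first code point of the Tertiary Ideographic Plane
```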

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-13.0.0.html for more information about testing the 13.0.0 beta.

See http://unicode.org/versions/Unicode13.0.0/ for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/.

Monday, October 28, 2019

Unicode 2019 Bulldog Awards

Image of Heninger (left) and Lindenberg (right)

The Unicode Consortium announces the 2019 Bulldog Award recipients: Andy Heninger and Norbert Lindenberg.

Andy Heninger is recognized for many years of contributions to the work of the Consortium, including providing crucial implementations of segmentation and regular expression support in International Components for Unicode (ICU). Prior to having these functions in ICU, support for them in Unicode implementations was very limited. Both contributions are key to robust text support. For example, correct segmentation is what keeps family emoji from splitting apart!
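The family emoji remark can be illustrated with a deliberately simplified sketch. A family emoji is several person emoji joined by ZERO WIDTH JOINER (U+200D), so code that splits text per code point breaks it apart. The function below keeps such sequences together by applying just two rules (attach across a ZWJ, attach nonspacing and enclosing marks); it is not the full segmentation algorithm that ICU implements, and `simple_clusters` is a hypothetical name.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def simple_clusters(text: str) -> list[str]:
    """Group code points into user-visible units using two simplified rules."""
    clusters: list[str] = []
    for ch in text:
        joins_previous = bool(clusters) and (
            ch == ZWJ                                   # joiner attaches to what precedes it
            or clusters[-1][-1] == ZWJ                  # code point after a joiner stays attached
            or unicodedata.category(ch) in ("Mn", "Me") # marks and variation selectors
        )
        if joins_previous:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man, ZWJ, woman, ZWJ, girl
print(len(family))                               # number of code points
print(len(simple_clusters("a" + family + "b")))  # number of clusters
```

Production code should use a library that implements the standard's segmentation rules rather than a shortcut like this one.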

Norbert Lindenberg has made significant contributions over the years to internationalizing the Web and has brought deep script expertise to the Unicode Script Ad Hoc group. He has contributed to the models of many of the Unicode Standard’s complex scripts, including Thai, Myanmar, Khmer, Javanese, and Tamil. His work has been used by organizations such as Mozilla, Yahoo!, Sun Microsystems, and Apple.

For many years, both have been bulldogs for robust Unicode text support.

More details of their many contributions can be found on the Unicode Bulldog Award page.

Monday, October 21, 2019

Emoji 12.1 release: 168 Emoji added

Emoji 12.1, with 168 new emoji, has been released. There are 138 new gender-neutral forms, so you will soon be able to text about people without specifying their gender. Thirty new combinations of people holding hands with various skin tones were also added.

The new emoji are listed in Emoji Recently Added v12.1, along with sample images. These images are merely samples: vendors for mobile phones, PCs, and web platforms will typically design their own fonts for emoji. In particular, the Emoji Ordering v12.1 chart shows how the new emoji should be sorted within the order of existing emoji, with new emoji marked with rounded rectangles. The other Emoji Charts for Version 12.1 have been updated to show the emoji.

Initial names and search keywords are available in different languages in Unicode CLDR 36, such as health worker (doctor, nurse, ...). Those will be refined during this quarter.

The new Emoji 12.1 data is available for vendors to use for their emoji fonts and code. These new emoji should start showing up on mobile phones this quarter and next. The new emoji will soon be available for adoption to help the Unicode Consortium’s work to support digitally disadvantaged languages.

Wednesday, October 9, 2019

The Most Frequent Emoji

How does the Unicode Consortium choose which new emoji to add? One important factor is data about how frequently the current emoji are used. Patterns of usage help to inform decisions about future emoji. The Consortium has been working to assemble this information and make it available to the public.

And the two most frequently used emoji in the world are...

😂 and ❤️

The new Unicode Emoji Frequency page shows a list of the Unicode v12.0 emoji ranked in order of how frequently they are used.

“The forecasted frequency of use is a key factor in determining whether to encode new emoji, and for that it is important to know the frequency of use of existing emoji,” said Mark Davis, President of the Unicode Consortium. “Understanding how frequently emoji are used helps prioritize which categories to focus on and which emoji to add to the Standard.”

Friday, October 4, 2019

ICU 65 Released

Unicode® ICU 65 has just been released. It updates to CLDR 36 locale data with many additions and corrections, and some new measurement units. The Java LocaleMatcher API is improved, and ported to C++. For building ICU data, there are new filtering options, and new tracing support for data loading in ICU4C.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/65.
