The Unicode Blog: 2021

Thursday, December 2, 2021

The Most Frequently Used Emoji of 2021

The Unicode Emoji Mirror Project

92% of the world’s online population use emoji — but which emoji are we using? The Unicode Consortium, the not-for-profit organization responsible for digitizing the world’s languages, gathers information about how frequently emoji are used. Looking at patterns of usage helps to determine what new emoji should be added to the Unicode Standard. As part of this effort, we are making that data available to the public.

The new Unicode Emoji Frequency page lists the Unicode v12.0 emoji ranked in order of how frequently they were used in 2021 and what has changed since 2019. Check it out for more analysis, insights and patterns that illustrate our collective experience during a global pandemic.

#UnicodeEmojiMirror

Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

Wednesday, November 17, 2021

Unicode Emoji 15.0 Provisional Candidates

The Unicode Technical Committee has approved the list of provisional candidates for Emoji 15.0. They are slated for release in September 2022 together with Unicode 15.0. These candidates were identified by the Unicode Emoji Subcommittee after reviewing proposals ranked according to previously-determined selection factors.

The list of provisional emoji candidates can be found here. Note that they have not yet been assigned code points or properties. For comments on these candidates, please reference PRI #435 in your feedback.

How to Provide Feedback: For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Feedback is reviewed by the relevant committee according to their meeting schedule.

Wednesday, November 10, 2021

ICU4X 0.4 Released

Unicode® ICU4X 0.4 has just been released. This revision brings an implementation of Unicode Properties, major performance and memory improvements for DateTimeFormat, and extends the data provider's data-loading models with BlobDataProvider.

ICU4X 0.4 also adds initial time zone support in DateTimeFormat, week of month/year, iteration APIs in Segmenter and experimental ListFormatter.

The ICU4X team is shifting to work on the 0.5 release in accordance with the roadmap and a product requirements document setting sights on a stable 1.0 release in Q2 2022.

ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments, portable across programming languages.

Multiple early adopters use ICU4X in pre-release software in Rust, C, C++, and WebAssembly. The team is ready to onboard additional early adopters to refine the APIs, build processes, and feature sets before the 1.0 release. The team is also looking for contributors to write code generation for additional target programming languages. For more information, please open a discussion on the ICU4X GitHub.

For details, please see the changelog.

Thursday, October 28, 2021

Unicode CLDR v40 now available!

Unicode CLDR version 40 is now available, with approximately 140,000 new or modified data fields.

In this release, the focus is on:

Grammatical features (gender and case)

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours". The overall goal for CLDR is to supply building blocks so that implementations of advanced message formatting can handle gender and case.

Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv) for all units of measurement.
Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.
Phase 3 (v41) will further expand the units.
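
The building-block idea can be sketched as a lookup keyed by unit, grammatical case, and plural category. What follows is a minimal illustration only: the table holds hand-written German sample forms, and the names UNIT_PATTERNS, plural_category, format_unit, and in_duration are hypothetical. It is not CLDR's data format or an ICU API.

```python
# Illustrative only: hand-written German forms for "day", not actual CLDR data.
UNIT_PATTERNS = {
    ("day", "nominative", "one"): "{0} Tag",
    ("day", "nominative", "other"): "{0} Tage",
    ("day", "dative", "one"): "{0} Tag",
    ("day", "dative", "other"): "{0} Tagen",
}


def plural_category(n):
    """Simplified German rule for integers: 1 is 'one', anything else is 'other'."""
    return "one" if n == 1 else "other"


def format_unit(n, unit, case):
    """Pick the pattern for the requested case and plural category, then fill it in."""
    pattern = UNIT_PATTERNS[(unit, case, plural_category(n))]
    return pattern.format(n)


def in_duration(n, unit):
    """German 'in' (a point in the future) governs the dative case."""
    return "in " + format_unit(n, unit, "dative")


if __name__ == "__main__":
    print(format_unit(3, "day", "nominative"))
    print(in_duration(3, "day"))
```

In this sketch the caller, not the data, decides which case a phrase needs; the data only has to offer the inflected forms to choose from.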

Emoji v14 names and search keywords

CLDR supplies short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.
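
As a rough sketch of how a keyboard might build type-ahead on top of such data, the example below indexes short names and keywords and matches them by prefix. The ANNOTATIONS entries are illustrative samples typed in by hand, and build_index and suggest are hypothetical helper names; a real implementation would load the names and keywords from the CLDR annotation files.

```python
# Illustrative sample entries; real names and keywords come from CLDR data.
ANNOTATIONS = {
    "\U0001FAE0": ("melting face", ["disappear", "dissolve", "liquid", "melt"]),
    "\U0001FAE1": ("saluting face", ["ok", "salute", "sunny", "troops", "yes"]),
    "\U0001FAB8": ("coral", ["ocean", "reef"]),
}


def build_index(annotations):
    """Map every search term (full name, name words, keywords) to a set of emoji."""
    index = {}
    for emoji, (name, keywords) in annotations.items():
        for term in [name, *name.split(), *keywords]:
            index.setdefault(term.lower(), set()).add(emoji)
    return index


def suggest(index, prefix):
    """Return the emoji whose terms start with the typed prefix, in code point order."""
    prefix = prefix.lower()
    hits = set()
    for term, emojis in index.items():
        if term.startswith(prefix):
            hits |= emojis
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(ANNOTATIONS)
    print(suggest(index, "mel"))
    print(suggest(index, "face"))
```

A linear scan is enough for a sketch; a production keyboard would more likely use a trie or a sorted term list to keep lookups fast on every keystroke.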

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure was modernized to make it easier to add enhancements (such as the split-screen dashboard) and to fix bugs.

Specification Improvements

The LDML specification has some important fixes and clarifications for Locale Identifiers, Dates, and Units of Measurement.

Please see the CLDR v40 Release Note for details.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

ICU 70 Released

Unicode® ICU 70 has just been released. ICU 70 incorporates updates to Unicode 14, including new characters, scripts, emoji, and corresponding API constants. ICU 70 adds support for emoji properties of strings. It also updates to CLDR 40 locale data with many additions and corrections. ICU 70 also includes many other bug fixes and enhancements, especially for measurement unit formatting, and it can now be built and used with C++20 compilers.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see https://icu.unicode.org/download/70.

Note: Our website has moved. Please adjust your bookmarks.

Wednesday, October 6, 2021

Unicode CLDR v40 Beta available for testing

The Unicode CLDR v40 Beta is now available for testing. The beta has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

Beta means that the main data, charts, and specification are available for review, but the JSON data is not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:

Oct 27 — Release

In CLDR v40, the focus is on:

Grammatical features (gender and case) for units of measurement in additional locales

In many languages, forming grammatical phrases requires dealing with grammatical gender and case. Without that, it can sound as bad as "on top of 3 hours" instead of "in 3 hours".
Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv).
Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a more restricted number of units.

Emoji v14 names and search keywords

These supply short names and search keywords for the new emoji, so that implementations can build on them to provide, for example, type-ahead in keyboards.

Modernized Survey Tool front end

The Survey Tool is used to gather all the data for locales. The outmoded JavaScript infrastructure, which made it very difficult to add enhancements or even fix bugs, was modernized.

Specification Improvements

Notably in the areas of Locale Identifiers, Dates, and Units of Measurement.

There are many other changes: to find out more, see the draft CLDR v40 release page, which has information on accessing the data, reviewing charts of the changes, and necessary migration changes.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Tuesday, September 14, 2021

Announcing The Unicode® Standard, Version 14.0

Version 14.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 838 characters, for a total of 144,697 characters. These additions include five new scripts, for a total of 159 scripts, as well as 37 new emoji characters.
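
Whether a given runtime already knows the Version 14.0 repertoire depends on the Unicode data it ships with. As a rough sketch using Python's standard unicodedata module, the helpers below (unidata_at_least and char_name are hypothetical names) report the bundled data version and look up two Version 14.0 additions, U+1FAE0 MELTING FACE and U+10570 VITHKUQI CAPITAL LETTER A.

```python
import unicodedata


def unidata_at_least(major, minor=0):
    """True if the interpreter's bundled Unicode data is at least major.minor."""
    parts = [int(p) for p in unicodedata.unidata_version.split(".")]
    return tuple(parts[:2]) >= (major, minor)


def char_name(ch):
    """Character name, or None if the code point is unassigned in the bundled data."""
    return unicodedata.name(ch, None)


if __name__ == "__main__":
    print("Bundled Unicode data:", unicodedata.unidata_version)
    print("Knows Unicode 14.0:", unidata_at_least(14))
    # Both code points were added in Unicode 14.0; older data yields None.
    for ch in ("\U0001FAE0", "\U00010570"):
        print(f"U+{ord(ch):04X}", char_name(ch))
```

On a runtime with older data the lookups simply return None, which is a reasonable signal that text using the new characters may not be handled as intended there.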

The new scripts and characters in Version 14.0 add support for modern language groups in Bosnia, India, Indonesia, Iran, Java, Malaysia, Mongolia, Myanmar, Pakistan, and the Philippines, plus other languages in Africa and North America, including:

Arabic script additions that include honorifics and additions for Quranic use, and characters used to write languages across Africa, the Balkans, and South and Southeast Asia
The Vithkuqi script historically used to write Albanian and currently undergoing a modern revival
The Tangsa script used to write the Tangsa language, spoken in India and Myanmar
The Toto script used to write the Toto language in northeast India
Many Latin script additions for extended IPA

Popular symbol additions include:

37 emoji characters, including several new emoji for emotion and hand gestures (smileys, hands, animals and nature, food and drink, transport, and activities). For the full list of new emoji characters, see emoji additions for Unicode 14.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

Other symbol and notational additions include:

The som currency sign used in the Kyrgyz Republic
Znamenny musical notation developed in Russia

Support for other modern languages and scholarly work extends worldwide, including:

Cypro-Minoan, historically used primarily on the island of Cyprus
Old Uyghur, historically used in Central Asia and elsewhere to write Turkic, Chinese, Mongolian, Tibetan, and Arabic languages
Ahom, Balinese, Brahmi, Canadian aboriginal languages, Glagolitic, Kaithi, Kannada, Mongolian, Tagalog, Takri, and Telugu
Arabic support for Hausa, Wolof, Hindko, and Punjabi, and Ethiopic support for Gurage

Important chart font updates, including:

Significant updates to the CJK auxiliary blocks and enclosed alphanumerics

Unicode properties and specifications determine the behavior of text on computers and phones. Changes in Version 14.0 include the following Unicode Standard Annexes and Technical Standards that have notable modifications:

Five important Unicode annexes updated for Version 14.0:

Three important Unicode specifications updated for Version 14.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Thursday, September 9, 2021

Unicode CLDR v40 Alpha available for testing

The Unicode CLDR v40 Alpha is now available for testing. The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:

Sep 21 — Beta (data)
Oct 06 — Beta2 (spec)
Oct 27 — Release

In CLDR v40, the main focus is on:

Grammatical features (gender and case) for units of measurement in additional locales

Phase 1 (v39) of grammatical features included just 12 locales (da, de, es, fr, hi, it, nl, no, pl, pt, ru, sv).
Phase 2 (v40) has expanded the number of locales by 29 (am, ar, bn, ca, cs, el, fi, gu, he, hr, hu, hy, is, kn, lt, lv, ml, mr, nb, pa, ro, si, sk, sl, sr, ta, te, uk, ur), but for a narrower set of units.

Emoji v14 names and search keywords

Modernized Survey Tool front end

There are many other changes: to find out more, see the draft CLDR v40 release page, which has information on accessing the data, reviewing charts of the changes, and necessary migration changes.

Unicode CLDR provides key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

Tuesday, September 7, 2021

Unicode Consortium Announces Version 14.0 Cover Design

The Unicode Consortium is pleased to announce the new design selected for the cover of the forthcoming print-on-demand publication of The Unicode Standard, Version 14.0. The Unicode Consortium issued an open call for artists and designers to submit cover design proposals. All submitted designs were reviewed by an independent panel.

The selected cover artwork for Version 14.0 is an original design by Sophia Tai, an MA student in Typeface Design at the University of Reading. Her cover art represents type in boxes, which shares a visual language with the arrangement of metal type, as well as the Unicode code charts. She selected a global mix of characters to present a variety of writing systems, using neon colors to create liveliness. The neutral background represents a sense of being down to earth, as well as the longevity and preservation of writing systems.

Two runner-up designs were also selected. One is a contemporary design by Beatriz de Paula Mattos, a graphic design student at the University of Vale do Itajaí, Brazil. The other runner-up design was created by Jesús Barrientos Mora, a professor with a degree in Type Design, who also leads the Talavera Type Workshop foundry in Puebla, Mexico.

Beatriz de Paula Mattos:

Jesús Barrientos Mora:

Tuesday, August 3, 2021

Announcing Internationalization & Unicode Conference #45 Keynote Speaker Gretchen McCulloch

Taking Playfulness Seriously—When character sets are used in unexpected ways

HEEEEELLLLLLOOOO friends of Unicode! HaVE yoU HEard? 🚨🚨This year’s keynote speaker is Gretchen McCulloch, internet linguist and bestselling author of the 2019 book, Because Internet: Understanding the New Rules of Language. ✍️ You ✍️ may ✍️ also ✍️ know ✍️ Gretchen ✍️ from ✍️ her ✍️ column ✍️ about ✍️ internet ✍️ language ✍️ in ✍️ WIRED. If you aren’t familiar with Gretchen’s book, it includes great insights into how language and technology evolve! Don’t miss this unique opportunity to hear from her in person. (♥ω♥*)

Her talk, “Taking Playfulness Seriously—When character sets are used in unexpected ways,” explores those trailblazing language disrupters, who aren’t checking out the Oxford English Dictionary or asking themselves, “What would my college English prof do about this comma?” You know who you are! 👀 She’ll discuss all the creative ways real-life netizens playfully create ASCII art out of text or combine emoji to convey new meanings, as well as the problems that arise when these kinds of creative uses clash with technical tools behind the scenes that aren’t expecting the unexpected, and what some solutions might look like.

In addition to Gretchen McCulloch, the conference offers Unicode tutorials, talks and panels on internationalization, web design, emoji, indigenous languages, historical scripts, and more. Of course, the conference also includes plenty of networking opportunities, as well as a special celebration of the Unicode Consortium’s 30th anniversary!

See What’s Happening at IUC 45

For thirty years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. Join us in Santa Clara to promote your ideas and experiences working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

Join expert practitioners and industry leaders as they present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

Join us for the Internationalization & Unicode Conference 45, October 13-15, 2021, Santa Clara, California. To register and learn more, please visit the Internationalization & Unicode Conference website. Object Management Group® (OMG®) organizes the Internationalization and Unicode Conferences around the world under an exclusive license granted by the Unicode Consortium.

Tuesday, July 20, 2021

The Unicode Consortium Welcomes Toral Cowieson as Executive Director & COO

Since its founding, the Unicode Consortium has grown and expanded its charter and scope. We’re embarking on a new chapter in the evolution of the Consortium and are pleased to announce the appointment of Toral Cowieson in the newly-created position of Executive Director & COO.

“We are thrilled to have Toral joining the team,” said Mark Davis, President and cofounder of the Consortium. “She brings a wealth of experience in leadership across non-profits, corporations, and board service. Her recent time at the Internet Society, including as head of Strategy and Impact Measurement, puts Unicode in good stead for this next stage of growth.”

In this senior executive position reporting to the Chair of the Board of Directors, Ms. Cowieson will collaborate with the Board, officers, and team to extend the technical mission and impact, set the future agenda and program priorities, and ensure the long-term health and sustainability of the organization.

“Unicode standards are at the heart of how users seamlessly receive and share information across the nearly 22 billion devices around the world. I’m honored and excited to be joining the Consortium at this juncture, and look forward to working with the Board, staff, and the extended Unicode community to advance the mission and have an even greater impact in the years to come,” commented Ms. Cowieson.

In addition to Ms. Cowieson joining as Executive Director, the Consortium is also pleased to announce the following changes:

Board and Other Leadership Updates

Iris Orriss, who joined the Unicode Board in 2019, has been elected as the Treasurer of the Consortium. She is VP of Internationalization, Product Quality, and Product Experience Analytics at Facebook. Ms. Orriss is also Chair of the Board’s Finance and Funding Committee.

Greg Welch, member of the Board since 2013, has been elected as the Secretary of the Consortium and carries forward the excellent work done in this office by Michel Suignard for more than a decade. Mr. Welch is also Chair of the Board’s Governance & Nominating Committee.

Markus Scherer, the Chair of the ICU Technical Committee, has been appointed a Vice President. He is a member of the Google software internationalization team, focusing on the effective use of Unicode and on the development and deployment of cross-product internationalization libraries.

Announcing Unicode Fellows

The Consortium has recently created a new category for distinguished contributors, whose deep, long-term knowledge of internationalization and dedication to work on standards has greatly benefited the Consortium for many years. The Consortium is pleased to announce its two inaugural Unicode Fellows.

Peter Edberg has been named a Unicode Fellow. He has worked on internationalization, text and language support at Apple since 1988. He has been Apple’s representative to the Consortium for many years, and has been actively involved since 2008 with the Unicode CLDR and ICU projects.

Michel Suignard has been named a Unicode Fellow after serving as Secretary for the Unicode Consortium from 2007 to 2020. He worked for more than twenty-five years at Microsoft, where he held various positions in the development and sales divisions, many involving the development of the Unicode Standard. He is currently an independent consultant working on character encoding related matters, such as Internationalized Domain Names (IDN) and typography. Michel is the code chart editor for the Unicode Standard and is also the project editor of ISO/IEC 10646, which is the ISO standard aligned with the Unicode Standard.

Monday, July 12, 2021

Adopt a Character to Celebrate World Emoji Day

This week, the Unicode Consortium is excited to celebrate the calendar emoji, 📅, commonly displayed with July 17th. People are the power driving the popularity of emoji through their innovative use of them to share joy, activities, sports, individuality, and so much more.

Celebrate a favorite emoji or character this week by adopting a character! While many characters have been adopted since the program launched in December 2015, hundreds of emoji haven’t been adopted by anyone at any level, including fantastic ones like clapping hands (👏twelfth👏most👏used👏emoji👏), check box (for all your to-do list dreams), and the loudly crying emoji (I’m so proud of you! 😭). Imagine the possible messages you could send with a gift adoption! For example:

Congratulations!!!!! 🥂
Love You! 🖤
Kisses 💋
Did you see this👇🏽
Yes, I adopted this face in your name 🥴
My bad 😳
Happy Birthday! To 100 more! 🎂

When you celebrate World Emoji Day this week by sponsoring your favorite emoji or another character for yourself or as a gift, your donation helps the non-profit Unicode Consortium support the world’s many languages and make the digital world more inclusive. The Consortium is funded by membership fees and donations from individuals, corporations, and other organizations. Your donations help support the vital work of the Consortium, making modern software and computing systems support the widest range of human languages. The Consortium will use your donation to improve language support and to preserve digital heritage. For more details, see How Donations are Used.

Tuesday, June 8, 2021

Unicode 14.0 Beta Review

The beta review period for Unicode 14.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 14.0 includes a number of changes and 838 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 14.0. Five new scripts have been added in Unicode 14.0. There are also additional emoji characters.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 13, 2021. This will be a slightly shorter review period of only five weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-14.0.0.html for more information about testing the 14.0.0 beta.

See https://www.unicode.org/versions/Unicode14.0.0/ for the current draft summary of Unicode 14.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Microsoft, Netflix, Sultanate of Oman MARA, Salesforce, SAP, Tamil Virtual Academy, The University of California (Berkeley), Yat Labs, plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/

For more information, please contact the Unicode Consortium at https://www.unicode.org/contacts.html.

Wednesday, May 19, 2021

Program Announced for IUC 45!

For over 30 years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. As we navigate the new normal, we invite you to join us in Santa Clara, CA to promote your ideas and experiences working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

Trained, Tested, Trusted: Understand best practices in process and among teams reliably delivering high quality global products. Examine how developers build, test, and deploy great global products. Explore technologies for design, localization, multilingual testing, workflow management, and content management.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency in supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

Track and Session Topics to Include:

Architecture
Case Studies
Fonts/Emojis
ICU/CLDR
Internationalization
Language Sustainability
Localization
Scripts

Register Today!

About The Unicode Consortium

The Unicode® Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Microsoft, Netflix, Sultanate of Oman MARA, Salesforce, SAP, Tamil Virtual Academy, The University of California (Berkeley), Yat Labs, plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/. For more information, please contact the Unicode Consortium.

About the Event Producer

OMG® is the Event Producer for the Internationalization & Unicode Conferences. OMG is an international, open membership, not-for-profit computer industry standards consortium. OMG Task Forces develop enterprise integration standards for a wide range of technologies and an even wider range of industries. OMG's modeling standards, including the Unified Modeling Language™ (UML®) and Model Driven Architecture® (MDA®), enable powerful visual design, execution and maintenance of software and other processes, including IT Systems Modeling and Business Process Management. OMG's middleware standards and profiles are based on the Common Object Request Broker Architecture (CORBA®) and support a wide variety of industries. OMG has offices at 109 Highland Avenue, Needham, MA 02494 USA.

For more information about OMG, visit us online at https://go.omgprograms.org/e/658223/2021-05-19/4hvqrv/283005991?h=Oj2eGlxYpYR7gx1lmU8Rxbrb1HmYWLHAiDyImxZoBI4.

Thursday, May 6, 2021

ICU4X 0.2 Released

Unicode® ICU4X 0.2 has just been released. This revision improves completeness of the components in ICU4X 0.1 and introduces a number of lower-level utilities.

ICU4X 0.2 adds minimal decimal formatting, time zone formatting, datetime skeleton resolution, and locale canonicalization.

This release comes with new low-level utilities for fixed decimal operations, ICU patterns, and foundational components allowing use of ICU4X from other ecosystems via Foreign Function Interfaces.

Additionally, the ICU4X team released a roadmap and a product requirements document setting sights on a stable 1.0 release.

ICU4X aims to develop a highly modular set of internationalization components for resource-constrained environments.

For details, please see the changelog.

Monday, April 19, 2021

Call for Unicode 14.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Version 14.0 of The Unicode Standard, scheduled for publication in September 2021.

The selected cover design will appear on the Unicode Standard 14.0 web pages, in the print-on-demand publications, and in associated promotional literature on the Unicode website. The artist whose design is selected for the cover will receive full credit in the colophon of the publication for which the art is used, and wherever else the design appears, and will receive $700. Two selected runner-up artists will receive $150 apiece.

Please see the announcement web page for requirements and more details.

Thursday, April 15, 2021

Now Accepting Unicode Emoji Proposals 🎉

When you last heard from the Unicode Emoji Subcommittee in April of 2020, the Unicode Consortium had just announced a 6-month delay to Unicode Version 14.0 due to COVID-19. Despite all of this :waves at the world: we’ve been busy.

What’s new? Great question!

During this pause in proposal submissions, the Unicode Emoji Subcommittee consulted with experts, developing a process that more completely reflects our criteria for inclusion in an effort to prioritize globally relevant emoji. We’ve looked for new ways to reconcile the rapid, transient nature of modern communication with the formal, methodical process required by a standards body like the Unicode Consortium.

Moving forward, the proposal review season will be open each year from April 15 to August 31. To submit a proposal, first read these Guidelines and then fill out this form.

Thanks to all our Unicode Emoji Subcommittee volunteers who made these improvements possible. The world would be without emoji if it weren’t for you!

Looking forward to 2021!
The Unicode Emoji Subcommittee

Friday, April 9, 2021

ICU 69 Released

Unicode® ICU 69 has just been released. ICU 69 incorporates updates to CLDR 39 locale data with its many additions and corrections. ICU 69 also includes significant improvements to formatting for measurement units and numbers, as well as many other bug fixes and enhancements.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see http://site.icu-project.org/download/69.

Thursday, April 8, 2021

Unicode CLDR Version 39 now available

Unicode CLDR version 39 is now available. Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

The scope of the data changes is small in this cycle, because there was no data submission phase. Instead the focus was on modernizing the Survey Tool software and preparing for data submission in the next release (v40). The data fixes in the release were confined to some global changes that are difficult to do during a submission cycle, and various other fixes.

However, there were some changes that could require implementations to adapt their code:

There was a major change in how Norwegian is handled, in order to align the way that the language identifiers no, nb, and nn are used.
The unit support from the last release was integrated into ICU, and some fixes resulting from that process were made to the measurement unit data.
Quite a number of fixes were made to the specification, to clarify text or fix problems in keyboards, measurement units, locale identifiers, and a few other areas.

To find out more, see the CLDR 39 Release Note, which has details on accessing the data, charts of the changes, and necessary migration changes.

Thursday, March 25, 2021

CLDR v39 Beta 2

The CLDR v39 beta has reached specification freeze, so no further changes will be made to the CLDR specification (aka LDML) except for showstoppers. For more details please see the release page.

The CLDR v39 release is planned for 2021-Apr-07.

Thursday, March 11, 2021

CLDR v39 Beta

CLDR v39 beta has reached data freeze, so no further changes will be made to the CLDR data except for showstoppers. For more details please see the release page.

The planned date for the LDML specification freeze is March 24, 2021.

Wednesday, March 3, 2021

Emoji — There's more than meets the 👁️

A lot more goes into selecting and designing an emoji than you might expect. For some in-depth glimpses into the factors designers weigh when expanding the set of emoji characters, check out these videos on our Unicode Consortium YouTube channel:

When a Merperson is a Merman: Using Gender-Inclusive Design for Codepoints Which Don't Specify Gender

Race is Not a Skin Tone. Gender is Not a Haircut.

Hanmoji: Analyzing Chinese Radicals to Determine Semantic Gaps in Emoji

Monday, March 1, 2021

Unicode CLDR v39 Alpha available for testing

The Unicode CLDR v39 Alpha is now available for testing. The alpha has already been integrated into the development version of ICU. While the scope of the changes is small in this cycle, there are some significant migration issues, so we would especially appreciate feedback from non-ICU consumers of CLDR data. Feedback can be filed at CLDR Tickets.

Unicode CLDR provides key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v39 had no submission phase. Instead, the focus was on modernizing the Survey Tool software and preparing for data submission in the next release (v40). The data fixes in the release were confined to some global changes that are too difficult to do during a submission cycle, and various other fixes. There was a major change in how Norwegian is handled, in order to align the way that the locale identifiers no, nb, and nn are used. The CLDR GitHub repo is renaming its “master” branch to “main”. The unit support from the last release was integrated into ICU, and some fixes resulting from that process were made to the measurement unit data. Quite a number of fixes were made to the specification, to clarify text or fix problems in keyboards, measurement units, locale identifiers, and a few other areas.

The public beta (data and specification) is planned for 2021-Mar-24, with the release following on 2021-Apr-07.

To find out more, see the draft CLDR 39 Release Note, which has information on accessing the data, reviewing charts of the changes, and necessary migration changes.

Friday, February 26, 2021

Unicode 14.0 Alpha Review

The repertoire for Unicode 14.0 is now open for early review and comment. During alpha review the repertoire is reasonably mature and stable, but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be entertained.

This early review is provided so that reviewers may consider the character repertoire issues prior to the start of beta review (currently scheduled to start in June 2021). Once beta review begins, the repertoire, code points, and character names will all be locked down and will no longer be subject to change.

Feedback for the alpha review should be reported under PRI #428 using the Unicode contact form by April 12, 2021.

Wednesday, February 24, 2021

Enhancements to Unicode Regular Expressions

A Proposed Update of UTS #18, Unicode Regular Expressions, is now available for review and feedback.

Regular expressions are a key tool in software development. Back in 2000, few regular expression engines supported Unicode, even at a basic level. UTS #18 set out to raise the bar, describing how regular expression engines could be adapted to deal with Unicode correctly and completely. Since that time, major programming languages and libraries have adopted level 1 features (supporting all Unicode literals, basic character properties, subtraction, intersection, ...), and some also adopted some level 2 features (full character properties, grapheme clusters, ...).

A major enhancement to UTS #18 in 2020 focused on the addition of Character Classes with strings. The initial impetus for this was to handle emoji effectively in browsers, as most emoji consist of more than one code point. Supporting strings directly in character classes frees programs from having to download large amounts of data or handle complicated syntax. Using a property like RGI_Emoji allows a regular expression to match both individual code points such as "😁" and multi-code-point strings such as "🇫🇷". This extension to strings is also important for internationalization. For example, the alphabets used by many languages contain multi-code-point strings, so this extension allows them to be handled easily.
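To make the idea of a character class with strings concrete, here is a minimal Python sketch. It is an illustration only: Python's built-in re module has no RGI_Emoji property, so the two-entry SAMPLE_EMOJI list and the build_string_class helper are hypothetical stand-ins, and the longest-first alternation merely approximates what an engine with native support would do.

```python
import re

# Hypothetical stand-in for a string-valued property such as RGI_Emoji.
# This is a tiny hand-picked sample, not the real Unicode data.
SAMPLE_EMOJI = [
    "\U0001F601",            # 😁 a single code point
    "\U0001F1EB\U0001F1F7",  # 🇫🇷 two regional-indicator code points
]


def build_string_class(strings):
    """Emulate a character class that contains strings.

    Alternatives are sorted longest first so that a multi-code-point
    string is preferred over any shorter entry that is a prefix of it.
    """
    ordered = sorted(strings, key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered))


EMOJI_RE = build_string_class(SAMPLE_EMOJI)

text = "Bonjour \U0001F1EB\U0001F1F7 \U0001F601!"
matches = EMOJI_RE.findall(text)

# Each match is one user-perceived emoji, whatever its code point count.
code_point_counts = [len(m) for m in matches]
print(matches, code_point_counts)
```

The flag comes back as a single match even though it spans two code points, which is the behavior the string-valued property is meant to provide without a hand-built list.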

Additional enhancements are in progress this year, based on working with members of the ECMAScript committee, including more clarifications, better guidance on implementation, and addressing some tricky issues dealing with complementing (inverting) Character Classes. The end goal of all of these enhancements in 2020 and 2021 is to significantly raise the level of Unicode support in programming languages and libraries.

For more information, see https://www.unicode.org/review/pri427/.

Tuesday, February 2, 2021

Unicode Consortium looking to hire an Executive Director

Since its founding, the Unicode Consortium has grown and expanded its charter and scope. We’re embarking on a new chapter in the evolution of the Consortium by initiating the search for a leader with proven executive talents to fill the newly-created position of Executive Director. Learn more: https://www.unicode.org/consortium/edappinfo.html

Thursday, January 28, 2021

Unicode Consortium Elects New Directors to its Board

The Consortium is pleased to announce the following Board of Directors election results from its annual Members’ meeting:

Elected to new 3-year board terms:

Brent Getlin, Director of Product Development and General Manager, Fonts and Type, Adobe, Inc.
Brent is the Director of Product Development and General Manager for Adobe Fonts and Type at Adobe. Previously, Brent managed Adobe's mobile gaming engineering and Macromedia Flash video encoder. Brent holds a Bachelor of Science degree in Computer Engineering from Southern Methodist University.

Teresa Marshall, VP of Globalization and Localization, Salesforce, Inc.
As VP of Globalization and Localization, Teresa drives globalization efforts across Salesforce, including internationalization, international product management and localization. She started her career as a German linguist and has held program and operational management positions at a number of Silicon Valley companies as well as academic positions in the field of language translation. Teresa holds an MA in Translation and Interpreting from the Monterey Institute of International Studies.

Re-elected to another 3-year term on the board:

David Singer, Apple, Inc.
David Singer is the senior engineer who coordinates standards activity for software engineering at Apple. In this role, he serves directly in both technical roles (multimedia systems at MPEG and 3GPP) and strategic roles (Advisory Committee and Advisory Board at the W3C, past Blu-ray Director), and indirectly oversees Apple’s involvement in a wide range of standards bodies and consortia, including ITU-T and ITU-R, SMPTE, and INCITS. David holds a BA and PhD from the University of Cambridge, England.

Newly elected to a 2-year term:

Dr. Mark Davis, Google, Inc.
Dr. Mark Davis co-founded the Unicode project and has been the president of the Unicode Consortium since its incorporation in 1991. Having held positions at IBM and Apple, Mark joined Google in 2006 where he has been working on software internationalization focusing on effective and secure use of Unicode (especially in the index and search pipeline), the software internationalization libraries (including ICU), and stable international identifiers.

“We also wish to thank retiring directors Marypat Meuli and James Robertson for their combined many years of service to the Consortium as board members,” said Davis.
