The Unicode Blog

Wednesday, July 30, 2025

Highlights from UTC Meeting #184

The Unicode Technical Committee (UTC) meeting #184 was held last week, July 22 – 24, in Redmond, Washington, hosted by Microsoft. Here are some highlights.

Finalizing Unicode 17.0

The top priority was to finalize technical decisions for Unicode 17.0 in preparation for a September 9 release. Beta feedback and a small number of new proposals were considered, and various decisions affecting Unicode 17.0 were taken.

The most significant change from the Unicode 17.0 Beta is the removal of 44 characters, based on feedback requesting more time to review these characters and the associated proposals:

09FF BENGALI LETTER SANSKRIT BA
0B53 ORIYA SIGN DOT ABOVE
0B54 ORIYA SIGN DOUBLE DOT ABOVE
1FADD APPLE CORE
40 Chisoi script characters and the Chisoi block at 16D80..16DAF

These characters have been postponed to Unicode 18.0. With this change, the total number of new characters for Unicode 17.0 will be 4,803, including CJK Extension J and four new scripts.

Glyph changes were also approved for 21 characters, all of which were encoded in earlier versions.

Certain character property changes were also approved. These include a change to the Word_Break property for 00B8 CEDILLA to accommodate orthographic usage for SENĆOŦEN, an indigenous language spoken in Western Canada. In relation to identifiers and security, the seven scripts added in Unicode 16.0 (Garay, Gurung Khema, Kirat Rai, Ol Onal, Sunuwar, Todhri, and Tulu-Tigalari) will be classified in UAX #31 as Excluded Scripts (Table 4), which means that these will not be included in the General Security Profile for secure identifiers.

First characters approved for Unicode 18.0

The tentative plan for new characters to be added in the next Unicode version is usually decided at the fall UTC meeting. The first approvals for Unicode 18.0, however, were decided last week at UTC #184. These include the 44 characters postponed from Unicode 17.0, mentioned above, as well as u+20CE UAE DIRHAM SIGN and 16 geometric symbols used in the manuscripts of the 17th-century polymath Gottfried Wilhelm Leibniz.

As typically happens at each UTC meeting, several code points were provisionally assigned for other new characters that will be candidates for future versions.

For characters approved for 18 or provisionally assigned for future versions, see https://www.unicode.org/alloc/Pipeline.html#future.

Text Terminal Working Group progress

A temporary working group was created at UTC #175 to work on improved support for Unicode text in text-only terminal environments, particularly for scripts requiring advanced layout. Due to changes in availability of key participants early on, progress was hindered, but the working group is now meeting regularly.

To scope the project, they will prioritize scripts classified in UAX #31 as Recommended. These include a number of scripts for which examples of fixed-width text have not been readily available, and the working group would welcome contributions from anyone with knowledge of prior art for fixed-width Indic text.

For complete details from UTC #184, see the draft minutes.

About the Unicode Standard

The world relies on digital communications. The Unicode Standard is one of the building blocks for global digital communications, providing the encoding for more than 155,000 characters used by thousands of languages and scripts throughout the world.

Each character—letter, diacritic, symbol, emoji, etc.—is represented by a unique numeric code, and has defined properties data that define how characters behave in several text processing algorithms.

With this combination, The Unicode Standard provides the foundation for implementations to support the world's writing systems, enabling billions of people across the globe to seamlessly communicate with one another across platforms and devices. The Standard is also the foundation for the suite of code, libraries, data, and products that the Unicode Consortium delivers for robust language support.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Wednesday, July 16, 2025

🥳 Say Hello to the New Emoji Coming in Unicode 17.0 This Fall! ✨

From 🥹 to 🦖 to 🎸, emoji have become the world's favorite way to say anything—without saying a word. Whether you're texting your best friend, posting on social media, or cheering someone up with a perfectly-timed 😭, emoji help us connect across languages, cultures, and continents.

But have you ever wondered how a new emoji makes its way to your keyboard?

That’s where the Unicode Consortium comes in. We're the nonprofit behind the Unicode Standard—the foundation that, in short, ensures your text (and emoji!) work across all your devices, around the world. Every year, new characters including emoji are added to the Unicode Standard and after a lot of paperwork are added to your device of choice :)

And today, on World Emoji Day, we’re thrilled to share some of the new emoji that will debut as part of the Unicode 17.0 release. Say hello to…

Trombone
Treasure Chest
Distorted Face
Hairy Creature
Fight Cloud
Apple Core
Orca
Ballet Dancer
Landslide

These new emoji have long standing symbolic meanings, are visually distinctive, and contain multitudes of expression. [Update - Additional characters in the repertoire of an upcoming release are subject to final approval by the Unicode Technical Committee.]

🎨 Got the next great emoji idea?

You can submit an emoji proposal by following our guidelines and tips. And if you want to support the future of digital communication, check out our Adopt a Character program. It’s a fun way to sponsor your favorite emoji, letter, or symbol while helping fund our mission to preserve and promote global languages in the digital age.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Thursday, June 12, 2025

Registration for Unicode Technology Workshop 2025 is now open!

Join us at Microsoft’s Silicon Valley campus in Mountain View, California for one, two, or all three days of community building around the Unicode technology that makes software work for billions of people.

🗓️ November 11 - Tutorial Day on Unicode Technologies
🗓️ November 12-13 - Unicode Technology Workshop 2025

Expect workshops, seminars, free-form discussions, and lightning talks centered around i18n libraries, locale data frameworks, globalization tooling, localization pipelines, input methods, and text rendering. Network with the developers and users to help shape the future of Unicode technology.

You will come away with deeper knowledge on how to solve tough problems in the i18n and l10n space and how to engineer products that work better for global users. To encourage maximum collaboration amongst the attendees, this is an in-person-only event.

💲Discounts

An early bird registration code for a discounted price on your registration is available. The code is: UTW2025Early. ⏳

Special rates are available to current Unicode members, students, and academics. Students and academics also receive one year of Unicode individual membership with their registration. The early bird discount code is not required for students or academics.

Learn more and complete your registration here or use the QR code below.

🫶 Sponsorship Opportunities

Sponsorship opportunities are available at various levels. Sponsorship benefits include complimentary registrations, opportunities to lead a session or workshop, recognition on the event website, program and event materials, visibility on social media, and much more. Specific offerings vary by sponsorship level.

If you want to demonstrate your industry leadership, enhance your brand, share your knowledge, promote your products and services, and foster community building, contact events@unicode.org today to learn more. Sponsorship discounts are available to Unicode Full and Supporting Members.

📣 Call for Submissions is also open

Unicode is also accepting session proposals for workshops, seminars, case studies, and tutorials that center around:
- Unicode i18n libraries
- Locale data frameworks
- Globalization tooling
- Localization pipelines
- Input methods
- Character encoding
- Text rendering
…and more!

Tutorial topics might include: font design and Unicode properties, introduction to Software Internationalization (i18n), and how to best support bidirectional text.

Share your knowledge and experience with other Unicode users, and help us envision the future of Unicode technology. You will also leave with deeper knowledge on how to solve tough problems in the i18n and l10n space and how to engineer products that work better for global users. Program and product managers who work with engineering teams are also strongly encouraged to join and propose sessions.

Deadline for submissions is July 31, 2025 by 5:00PM PT. Proposals will be reviewed in July and session hosts will be notified late July.

🗓️ Mark Your Calendars for Key Dates!

July 31 - Call for Submissions Closes - All Proposals, including Tutorials, Due
August 18 - Program Committee Notifications Go Out
August 31 - Early Bird Registration for Tutorials and UTW 2025 Closes
September 1 - Regular Registration for Tutorials and UTW 2025 Opens

Please visit https://www.unicode.org/events/utw/2025/ for updates. If you have any questions, please contact us at UTW2025@unicode.org. We look forward to seeing you in November!

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Thursday, May 29, 2025

ICU4X 2.0 released!

At the intersection of human and computer languages, internationalization (i18n) continues to play a pivotal role in modern software. Evolving i18n libraries means better quality experiences, improved performance, and support for digitally disadvantaged languages.

ICU4X is Unicode's modern, lightweight, portable, and secure i18n library. Built from the ground up, its binary size and memory usage footprint is 50-90% smaller than ICU4C. It is memory-safe, written in Rust with interfaces into C++, JavaScript, and TypeScript — and Python, Dart, and Kotlin are in the pipeline. Mozilla Firefox, Google Pixel Watch, core Android, numerous Flutter apps, and more clients are already using ICU4X.

After 6 months of iterating on beta releases and a soft launch earlier this month, the ICU4X Technical Committee is happy to announce ICU4X 2.0. This release brings a new paradigm for locale objects, a rewritten DateTime component, overhauled C++/C/JS interfaces, the latest locale data, and much more.

Date, Time, and Time Zone Formatting

ICU4X 2.0 implements the new semantic datetime skeletons specification in UTS 35. An evolution from previous datetime APIs, the ICU4X DateTime component is designed from decades of experience understanding what developers need from datetime formatting.

With ICU4X 2.0, users pick a "field set" and fine-tune it with "options". There are a fixed number of field sets, which represent all valid combinations of fields.

Users of ICU and JavaScript are familiar with "classical" datetime skeletons and components bags, respectively. The following table illustrates the correlation with semantic datetime skeletons:

ICU Classical Skeleton	ECMA-402 Components Bag	ICU4X 2.0 Rust Code
yMMMd	{ year: "numeric", month: "abbreviated", day: "numeric" }	fieldsets::YMD::medium()
MdEjm	{ month: "numeric", day: "numeric", weekday: "short", hour: "numeric", minute: "numeric" }	fieldsets::MDE::short() .time_hm()
jmsV	{ hour: "numeric", minute: "numeric", second: "numeric", timeZoneName: "generic" }	fieldsets::T::hms() .zone(zone::GenericShort)

Semantic datetime skeletons, called "field sets with options" in ICU4X, have numerous advantages:

Easier to understand and harder to make mistakes. For example, a common error in ICU skeletons is to write an incorrect skeleton string such as "YMd" or "ymd" instead of the correct "yMd".
Enables new formatting options not possible with components bags or skeletons:

Year style: the era, such as "BCE", can be automatically inserted

Time precision: the minute can be hidden if it is zero

Prevents nonsensical combinations of fields and options. For example, the ICU4X API prevents "month with minute" (“December 10” for December 5 at 7:10).
Well-suited for data slicing, allowing for minimal data overhead. For example, apps won’t carry weekday names if they are formatting with only a year/month/day or time field set.

Locale Preferences

ICU4X 2.0 introduces Preferences objects, a new paradigm for locale and user preference resolution in component constructors.

The new structures enable richer, type-safe management of user preferences coming from different sources, including locales and other preferences objects. String-based locales are still supported as well.

Locale Identifier String	ICU4X 2.0 Rust Code*
en-US-u-hc-h23	let mut p = Preferences::from(LanguageIdentifier { language: language!("en"), region: region!("US"), ..Default::default() }) p.hour_cycle = HourCycle::H23;
zh-Hant-TW-u-ca-roc	let mut p = Preferences::from(LanguageIdentifier { language: language!("zh"), script: Some(script!("Hant")), region: Some(region!("TW")), ..Default::default() }) p.calendar_algorithm = CalendarAlgorithm::Roc;
ar-EG-u-nu-latn-fw-sun	let mut p = Preferences::from(LanguageIdentifier { language: language!("ar"), region: region!("EG"), ..Default::default() }) p.numbering_system = value!("latn").try_into().unwrap(); p.first_day = FirstDay::Sun;

* The type name "Preferences" is a placeholder for the formatter-specific preferences object, such as DecimalFormatterPreferences, a structured object containing all the pieces of a locale required for number formatting: information on the language, script, region, variant, and numbering system preference, but not irrelevant pieces like calendar system.

Cross Programming Language Improvements

The foreign function interface (FFI) has been overhauled with major ergonomic improvements. Key changes include:

Separate constructors in FFI for built-in compiled data and data from an explicit data provider, enabling better dead-code elimination for non-Rust clients.
C/C++

Namespacing: ICU4X types are exported in a namespace, allowing for including "icu4x::DateTimeFormatter" instead of "ICU4XDateTimeFormatter".
Smart pointers: ICU4X types are returned within std::unique_ptr instead of internally containing an allocation; allowing more flexible usage with other reference strategies.
Versioned ABI: structs that are #[non_exhaustive] in Rust (and methods that use them) are now versioned on both the ABI and in headers, allowing them to evolve safely in future versions

JavaScript

Enums: enum representation changed from strings to classes. Strings can still be used in the constructor
Structs: objects can now be used wherever structs (such as options bags) are required
Special methods: constructors, iterator, getters and setters are now exposed idiomatically
Documentation: typedoc-generated documentation is a lot more readable now (check it out)
ICU4X is now published as an NPM package: https://www.npmjs.com/package/icu

Other Cross-Cutting Changes

Additional changes you may encounter when upgrading from 1.5 to 2.0:

Many Rust types have gained separate owned and borrowed variants; for example, there are now both "Collator" and "CollatorBorrowed". The borrowed variant is slightly more efficient; it can be created statically from compiled data or derived from the owned variant.
Our internal data storage type has a more efficient binary representation (see the zerovec crate). This means that postcard data generated with ICU4X 1.5 will not work with 2.0.
The icu_locid and icu_locid_transform crates were re-organized into icu_locale and icu_locale_core. This means that icu_locid and icu_locid_transform will be forever at 1.5. If you currently depend directly on icu_locid or icu_locid_transform, you need to switch to icu_locale or icu_locale_core.
The icu_calendar crate now focuses only on calendrical calculations, and a new crate, icu_time, contains pieces from icu_calendar and icu_timezone. The icu_timezone crate will be forever at 1.5. If you currently depend directly on icu_timezone, you need to switch to icu_time.
The icu_datagen crate was split into several sub-crates. If you currently depend directly on icu_datagen, you need to switch to icu_provider_source, icu_provider_export, and/or the icu4x-datagen binary crate.
Performance improvements in multiple components. For example, the normalizer got a data rearrangement that benefits non-NFD normalizations, and the collator now has an identical prefix optimization.
Input types for formatters are now re-exported from the formatter crate to reduce the number of explicit Cargo.toml dependencies.
All crates are updated to the latest CLDR (47) and Unicode (16) versions.

Get started with ICU4X 2.0

ICU4X's new website, icu4x.unicode.org, now hosts tutorials, documentation, and more. The website reflects the current release, with previous releases also available.

Check out our quickstart tutorial, interactive demo, or C++, TypeScript, and (experimental) Dart documentation.

As before, the Rust crate is available at crates.io, with documentation at docs.rs.

Please post any questions via GitHub Discussions.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Tuesday, May 20, 2025

Unicode 17.0 Beta Review Open

The beta review period for Unicode® 17.0 has started and is open until July 1, 2025.

The beta is intended primarily for review of character property data and changes to algorithm specifications (Unicode Standard Annexes and certain Unicode Technical Standards that are synchronized with the Unicode Standard). Also, a complete draft of the core specification text is available for review during the beta period.

At this phase of a release, the character repertoire is considered stable. No new characters will be added. Characters could still be removed, and character names or code points could be changed, but such changes would require strong justification.

For this release, 4,847 new characters have been added, bringing the total number of encoded characters in Unicode 17.0 to 159,845. The largest set of added characters is in the new CJK Unified Ideographs Extension J block, with 4,298 new CJK unified ideographs, which increases the number of CJK unified ideographs to over 100,000. The new additions also include characters for the following five new scripts:

Beria Erfe is a modern-use script used in central Africa.
Chisoi is a modern-use script used in northeast India.
Tolong Siki is a modern-use script used in northeast India.
Tai Yo is the traditional script of Tai Yo communities in northern Vietnam.
Sidetic is an historic script used in ancient Anatolia.

In addition to new CJK unified ideographs, nearly 2,500 already-encoded CJK ideographs were horizontally extended, adding source references and glyphs reflecting use of those ideographs in China and Korea.

Another notable character addition is the SAUDI RIYAL SIGN, recently created by the Saudi Central Bank for its riyal currency.

See The Pipeline and the delta code charts for details on all of the new characters.

In addition to new characters, there are some significant character property and algorithm changes, including the following:

A proposed update to UAX #14, Unicode Line Breaking Algorithm, introduces a new Line_Break property value and associated rule changes.
In a proposed update to UTS #39, Unicode Security Mechanisms, a large number of Identifier_Type property value changes are made. These values are used for defining a default recommended set of characters for identifiers used in secure contexts.

Also note that locations of data files for synchronized UTSes have been changed. See the Unicode 17.0 Beta landing page for other noteworthy property and algorithm changes. For full details regarding the Beta, see Public Review Issue #526. Feedback should be reported under PRI #526 using the Unicode Contact Form by July 1, 2025.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Wednesday, July 30, 2025

Highlights from UTC Meeting #184

Adopt a Character and Support Unicode’s Mission

Wednesday, July 16, 2025

🥳 Say Hello to the New Emoji Coming in Unicode 17.0 This Fall! ✨

Adopt a Character and Support Unicode’s Mission

Thursday, June 12, 2025

Registration for Unicode Technology Workshop 2025 is now open!

💲Discounts

🫶 Sponsorship Opportunities

📣 Call for Submissions is also open

🗓️ Mark Your Calendars for Key Dates!

Adopt a Character and Support Unicode’s Mission

Thursday, May 29, 2025

ICU4X 2.0 released!

Adopt a Character and Support Unicode’s Mission

Tuesday, May 20, 2025

Unicode 17.0 Beta Review Open

Adopt a Character and Support Unicode’s Mission

Links of Interest

Blog Archive

Labels

Followers

Wednesday, July 30, 2025

Highlights from UTC Meeting #184

Adopt a Character and Support Unicode’s Mission

Wednesday, July 16, 2025

🥳 Say Hello to the New Emoji Coming in Unicode 17.0 This Fall! ✨

Adopt a Character and Support Unicode’s Mission

Thursday, June 12, 2025

Registration for Unicode Technology Workshop 2025 is now open!

💲Discounts

🫶 Sponsorship Opportunities

📣 Call for Submissions is also open

🗓️ Mark Your Calendars for Key Dates!

Adopt a Character and Support Unicode’s Mission

Thursday, May 29, 2025

ICU4X 2.0 released!

Adopt a Character and Support Unicode’s Mission

Tuesday, May 20, 2025

Unicode 17.0 Beta Review Open

Adopt a Character and Support Unicode’s Mission

Links of Interest

Blog Archive

Labels

Followers

Subscribe to this blog