Thursday, May 29, 2025

ICU4X 2.0 released!

At the intersection of human and computer languages, internationalization (i18n) continues to play a pivotal role in modern software. Evolving i18n libraries means better quality experiences, improved performance, and support for digitally disadvantaged languages.


ICU4X is Unicode's modern, lightweight, portable, and secure i18n library. It was built from the ground up, and its binary size and memory footprint are 50-90% smaller than those of ICU4C. It is memory-safe, written in Rust with interfaces into C++, JavaScript, and TypeScript; Python, Dart, and Kotlin are in the pipeline. Mozilla Firefox, Google Pixel Watch, core Android, numerous Flutter apps, and other clients are already using ICU4X.


After 6 months of iterating on beta releases and a soft launch earlier this month, the ICU4X Technical Committee is happy to announce ICU4X 2.0. This release brings a new paradigm for locale objects, a rewritten DateTime component, overhauled C++/C/JS interfaces, the latest locale data, and much more.

Date, Time, and Time Zone Formatting

ICU4X 2.0 implements the new semantic datetime skeletons specification in UTS 35. An evolution of previous datetime APIs, the ICU4X DateTime component draws on decades of experience with what developers need from datetime formatting.


With ICU4X 2.0, users pick a "field set" and fine-tune it with "options". There is a fixed number of field sets, which together represent all valid combinations of fields.


Users of ICU and JavaScript are familiar with "classical" datetime skeletons and components bags, respectively. The following table illustrates the correlation with semantic datetime skeletons:


| ICU Classical Skeleton | ECMA-402 Components Bag | ICU4X 2.0 Rust Code |
|---|---|---|
| yMMMd | { year: "numeric", month: "short", day: "numeric" } | fieldsets::YMD::medium() |
| MdEjm | { month: "numeric", day: "numeric", weekday: "short", hour: "numeric", minute: "numeric" } | fieldsets::MDE::short().with_time_hm() |
| jmsv | { hour: "numeric", minute: "numeric", second: "numeric", timeZoneName: "shortGeneric" } | fieldsets::T::hms().with_zone(zone::GenericShort) |


Semantic datetime skeletons, called "field sets with options" in ICU4X, have numerous advantages:


  1. Easier to understand and harder to make mistakes. For example, a common error in ICU skeletons is to write an incorrect skeleton string such as "YMd" or "ymd" instead of the correct "yMd".
  2. Enables new formatting options not possible with components bags or skeletons:
    • Year style: the era, such as "BCE", can be automatically inserted
    • Time precision: the minute can be hidden if it is zero
  3. Prevents nonsensical combinations of fields and options. For example, the ICU4X API prevents "month with minute" (“December 10” for December 5 at 7:10).
  4. Well-suited for data slicing, allowing for minimal data overhead. For example, apps won’t carry weekday names if they are formatting with only a year/month/day or time field set.
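
To make advantages 2 and 3 concrete, here is a minimal sketch of how a typed field set can rule out invalid combinations and carry an option such as time precision. It is a simplified model, not the icu::datetime API: the names MonthDay, MonthDayTime, TimePrecision, and the fixed English-like output pattern are all assumptions made for illustration.

```rust
// Simplified, hypothetical model of "field sets with options".
// These types only illustrate the idea; they are not the icu::datetime API.

/// How much detail the time portion shows.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimePrecision {
    /// Always show hour and minute.
    Minute,
    /// Show the minute only when it is nonzero.
    MinuteOptional,
}

/// Date-only field set: month and day.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDay;

/// Month, day, and time of day. It can only be reached from `MonthDay`,
/// so a "month with minute" set that skips the day and hour has no
/// representation in this model.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MonthDayTime {
    pub precision: TimePrecision,
}

impl MonthDay {
    /// Adds a time of day, yielding a different but still valid field set.
    pub fn with_time(self, precision: TimePrecision) -> MonthDayTime {
        MonthDayTime { precision }
    }
}

impl MonthDayTime {
    /// Renders a fixed toy pattern; real formatting is driven by locale data.
    pub fn format(&self, month: &str, day: u8, hour: u8, minute: u8) -> String {
        let hide_minute =
            self.precision == TimePrecision::MinuteOptional && minute == 0;
        if hide_minute {
            format!("{month} {day} at {hour}")
        } else {
            format!("{month} {day} at {hour}:{minute:02}")
        }
    }
}
```

In this model, MonthDay.with_time(TimePrecision::MinuteOptional) drops the minute for 7:00 but keeps it for 7:10, and there is simply no type that pairs a month with a minute alone.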

Locale Preferences

ICU4X 2.0 introduces Preferences objects, a new paradigm for locale and user preference resolution in component constructors.


The new structures enable richer, type-safe management of user preferences coming from different sources, including locales and other preferences objects. String-based locales are still supported as well.


Locale Identifier String

ICU4X 2.0 Rust Code*

en-US-u-hc-h23

let mut p = Preferences::from(LanguageIdentifier {
    language: language!("en"),
    region: Some(region!("US")),
    ..Default::default()
});
p.hour_cycle = Some(HourCycle::H23);

zh-Hant-TW-u-ca-roc

let mut p = Preferences::from(LanguageIdentifier {
    language: language!("zh"),
    script: Some(script!("Hant")),
    region: Some(region!("TW")),
    ..Default::default()
});
p.calendar_algorithm = Some(CalendarAlgorithm::Roc);

ar-EG-u-nu-latn-fw-sun

let mut p = Preferences::from(LanguageIdentifier {
    language: language!("ar"),
    region: Some(region!("EG")),
    ..Default::default()
});
p.numbering_system = Some(value!("latn").try_into().unwrap());
p.first_day = Some(FirstDay::Sun);


* The type name "Preferences" is a placeholder for the formatter-specific preferences object, such as DecimalFormatterPreferences, a structured object containing all the pieces of a locale required for number formatting: information on the language, script, region, variant, and numbering system preference, but not irrelevant pieces like calendar system.
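
The resolution idea can be sketched without the library. The following is a simplified model, not the ICU4X preferences types: TimePrefs, from_locale_str, and extend are names assumed for illustration, and the parser treats everything after "-u-" as plain key/value pairs, which real locale identifiers do not guarantee.

```rust
// Simplified, hypothetical model of preference resolution.
// Names are illustrative and do not match the ICU4X preferences types.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HourCycle {
    H12,
    H23,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FirstDay {
    Sun,
    Mon,
}

/// Only the preferences one hypothetical formatter cares about.
/// `None` means "no explicit preference; fall back to locale data".
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub struct TimePrefs {
    pub hour_cycle: Option<HourCycle>,
    pub first_day: Option<FirstDay>,
}

impl TimePrefs {
    /// Reads the `-u-` extension keywords of a locale identifier string.
    /// Unknown keywords and values are ignored in this sketch.
    pub fn from_locale_str(locale: &str) -> Self {
        let mut prefs = Self::default();
        let Some((_, ext)) = locale.split_once("-u-") else {
            return prefs;
        };
        // Assumption: the extension is a flat list of key/value subtag pairs.
        let mut subtags = ext.split('-');
        while let (Some(key), Some(value)) = (subtags.next(), subtags.next()) {
            match (key, value) {
                ("hc", "h12") => prefs.hour_cycle = Some(HourCycle::H12),
                ("hc", "h23") => prefs.hour_cycle = Some(HourCycle::H23),
                ("fw", "sun") => prefs.first_day = Some(FirstDay::Sun),
                ("fw", "mon") => prefs.first_day = Some(FirstDay::Mon),
                _ => {} // Irrelevant to this formatter, e.g. "nu" or "ca".
            }
        }
        prefs
    }

    /// Overlays `other` on `self`: fields set in `other` win,
    /// and unset fields keep their current value.
    pub fn extend(&mut self, other: TimePrefs) {
        self.hour_cycle = other.hour_cycle.or(self.hour_cycle);
        self.first_day = other.first_day.or(self.first_day);
    }
}
```

Starting from "ar-EG-u-nu-latn-fw-sun" and then extending with a preferences value that sets only the hour cycle leaves the first day of the week at Sunday and adds the new hour cycle, which is the kind of layering across sources that the typed approach makes explicit.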

Cross Programming Language Improvements

The foreign function interface (FFI) has been overhauled with major ergonomic improvements. Key changes include:


  • Separate constructors in FFI for built-in compiled data and data from an explicit data provider, enabling better dead-code elimination for non-Rust clients.

  • C/C++

    • Namespacing: ICU4X types are exported in a namespace, so you can write "icu4x::DateTimeFormatter" instead of "ICU4XDateTimeFormatter".

    • Smart pointers: ICU4X types are returned within std::unique_ptr instead of internally containing an allocation, allowing more flexible usage with other reference strategies.

    • Versioned ABI: structs that are #[non_exhaustive] in Rust (and methods that use them) are now versioned both in the ABI and in headers, allowing them to evolve safely in future versions.

  • JavaScript

    • Enums: enum representation changed from strings to classes. Strings can still be used in the constructor.

    • Structs: objects can now be used wherever structs (such as options bags) are required

    • Special methods: constructors, iterator, getters and setters are now exposed idiomatically

    • Documentation: the typedoc-generated documentation is now much more readable.

    • ICU4X is now published as an NPM package: https://www.npmjs.com/package/icu

Other Cross-Cutting Changes

Additional changes you may encounter when upgrading from 1.5 to 2.0:


  1. Many Rust types have gained separate owned and borrowed variants; for example, there are now both "Collator" and "CollatorBorrowed". The borrowed variant is slightly more efficient; it can be created statically from compiled data or derived from the owned variant.
  2. Our internal data storage type has a more efficient binary representation (see the zerovec crate). This means that postcard data generated with ICU4X 1.5 will not work with 2.0.
  3. The icu_locid and icu_locid_transform crates were re-organized into icu_locale and icu_locale_core. This means that icu_locid and icu_locid_transform will be forever at 1.5. If you currently depend directly on icu_locid or icu_locid_transform, you need to switch to icu_locale or icu_locale_core.
  4. The icu_calendar crate now focuses only on calendrical calculations, and a new crate, icu_time, contains pieces from icu_calendar and icu_timezone. The icu_timezone crate will be forever at 1.5. If you currently depend directly on icu_timezone, you need to switch to icu_time.
  5. The icu_datagen crate was split into several sub-crates. If you currently depend directly on icu_datagen, you need to switch to icu_provider_source, icu_provider_export, and/or the icu4x-datagen binary crate.
  6. Performance improvements in multiple components. For example, the normalizer got a data rearrangement that benefits non-NFD normalizations, and the collator now has an identical-prefix optimization.
  7. Input types for formatters are now re-exported from the formatter crate to reduce the number of explicit Cargo.toml dependencies.
  8. All crates are updated to the latest CLDR (47) and Unicode (16) versions.
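
The owned/borrowed split from item 1 follows a general Rust pattern that can be sketched in a few lines. This is a simplified model under stated assumptions: Sorter, SorterBorrowed, and the toy COMPILED_ORDER table are illustrative stand-ins, not ICU4X types or data.

```rust
// Simplified, hypothetical model of the owned/borrowed split.
// `Sorter` and `SorterBorrowed` are illustrative stand-ins, not ICU4X types.

use std::cmp::Ordering;

/// Stand-in for compiled data: letters in their (toy) sort order.
const COMPILED_ORDER: &str = "aäbcdefghijklmnoöpqrstuüvwxyz";

/// Owned variant: holds its own copy of the data, e.g. loaded at runtime.
pub struct Sorter {
    order: String,
}

/// Borrowed variant: a cheap, copyable view over data that lives elsewhere.
#[derive(Clone, Copy)]
pub struct SorterBorrowed<'a> {
    order: &'a str,
}

impl Sorter {
    pub fn from_runtime_data(order: String) -> Self {
        Sorter { order }
    }

    /// Derives the borrowed variant from the owned one.
    pub fn as_borrowed(&self) -> SorterBorrowed<'_> {
        SorterBorrowed { order: &self.order }
    }
}

impl SorterBorrowed<'static> {
    /// Built directly from compiled-in data; `const`, so no allocation.
    pub const fn new() -> Self {
        SorterBorrowed { order: COMPILED_ORDER }
    }
}

impl SorterBorrowed<'_> {
    /// Rank of a character in the table; unknown characters sort last.
    fn rank(&self, c: char) -> usize {
        self.order.chars().position(|x| x == c).unwrap_or(usize::MAX)
    }

    /// Compares two strings character by character using the table.
    pub fn compare(&self, a: &str, b: &str) -> Ordering {
        a.chars()
            .map(|c| self.rank(c))
            .cmp(b.chars().map(|c| self.rank(c)))
    }
}
```

All comparison logic lives on the borrowed type, so the static path never allocates, while the owned type remains available when data arrives at runtime and only needs as_borrowed() to reach the same methods.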

Get started with ICU4X 2.0

ICU4X's new website, icu4x.unicode.org, now hosts tutorials, documentation, and more. The website reflects the current release, with previous releases also available.


Check out our quickstart tutorial, interactive demo, or C++, TypeScript, and (experimental) Dart documentation.


As before, the Rust crate is available at crates.io, with documentation at docs.rs.


Please post any questions via GitHub Discussions.
