At the intersection of human and computer languages, internationalization (i18n) continues to play a pivotal role in modern software. Evolving i18n libraries means higher-quality experiences, improved performance, and support for digitally disadvantaged languages.
ICU4X is Unicode's modern, lightweight, portable, and secure i18n library. Built from the ground up, it has a binary size and memory footprint 50-90% smaller than ICU4C's. It is memory-safe and written in Rust, with interfaces into C++, JavaScript, and TypeScript; Python, Dart, and Kotlin are in the pipeline. Mozilla Firefox, Google Pixel Watch, core Android, numerous Flutter apps, and other clients already use ICU4X.
After 6 months of iterating on beta releases and a soft launch earlier this month, the ICU4X Technical Committee is happy to announce ICU4X 2.0. This release brings a new paradigm for locale objects, a rewritten DateTime component, overhauled C++/C/JS interfaces, the latest locale data, and much more.
Date, Time, and Time Zone Formatting
ICU4X 2.0 implements the new semantic datetime skeletons specification in UTS 35. An evolution of previous datetime APIs, the ICU4X DateTime component draws on decades of experience with what developers need from datetime formatting.
With ICU4X 2.0, users pick a "field set" and fine-tune it with "options". There is a fixed number of field sets, which together cover all valid combinations of fields.
Users of ICU and JavaScript are familiar with "classical" datetime skeletons and components bags, respectively. The following table illustrates the correlation with semantic datetime skeletons:
ICU Classical Skeleton | ECMA-402 Components Bag | ICU4X 2.0 Rust Code |
yMMMd | { year: "numeric", month: "short", day: "numeric" } | fieldsets::YMD::medium() |
MdEjm | { month: "numeric", day: "numeric", weekday: "short", hour: "numeric", minute: "numeric" } | fieldsets::MDE::short().time_hm() |
jmsV | { hour: "numeric", minute: "numeric", second: "numeric", timeZoneName: "generic" } | fieldsets::T::hms().zone(zone::GenericShort) |
Semantic datetime skeletons, called "field sets with options" in ICU4X, have numerous advantages:
- Easier to understand and harder to get wrong. For example, a common error with ICU skeletons is writing "YMd" or "ymd" instead of the correct "yMd".
- Enables new formatting options not possible with components bags or skeletons:
  - Year style: the era, such as "BCE", can be automatically inserted
  - Time precision: the minute can be hidden if it is zero
- Prevents nonsensical combinations of fields and options. For example, the ICU4X API prevents a "month with minute" field set, which would render December 5 at 7:10 as "December 10".
- Well-suited for data slicing, allowing for minimal data overhead. For example, apps won’t carry weekday names if they are formatting with only a year/month/day or time field set.
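The "fixed number of field sets" idea can be illustrated with a toy model in plain Rust. This is a sketch for illustration only, not the ICU4X API: the names `FieldSet`, `TimePrecision`, `Input`, and `render` are hypothetical, and the output is a locale-independent placeholder format rather than real localized text. It shows how a closed enum leaves no way to ask for "month with minute", and how a time-precision option can hide a zero minute.

```rust
/// How much of the time of day to show (hypothetical, not the ICU4X type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimePrecision {
    /// Hour only.
    Hour,
    /// Hour and minute, always shown.
    Minute,
    /// Hour, plus the minute only when it is nonzero.
    MinuteOptional,
}

/// A closed list of field sets (hypothetical). Every variant is a valid
/// combination, so "month with minute" cannot be expressed at all.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FieldSet {
    /// Month and day.
    MonthDay,
    /// Time of day at the given precision.
    Time(TimePrecision),
    /// Month and day followed by a time of day.
    MonthDayTime(TimePrecision),
}

/// Toy input: month, day, hour (0-23), and minute.
#[derive(Debug, Clone, Copy)]
pub struct Input {
    pub month: u8,
    pub day: u8,
    pub hour: u8,
    pub minute: u8,
}

/// Renders the time portion in a placeholder, locale-independent format.
fn render_time(input: Input, precision: TimePrecision) -> String {
    let hour_only = format!("{}h", input.hour);
    let hour_minute = format!("{}:{:02}", input.hour, input.minute);
    match precision {
        TimePrecision::Hour => hour_only,
        TimePrecision::Minute => hour_minute,
        TimePrecision::MinuteOptional if input.minute == 0 => hour_only,
        TimePrecision::MinuteOptional => hour_minute,
    }
}

/// Renders the input according to the chosen field set and its option.
pub fn render(field_set: FieldSet, input: Input) -> String {
    let month_day = format!("{:02}-{:02}", input.month, input.day);
    match field_set {
        FieldSet::MonthDay => month_day,
        FieldSet::Time(p) => render_time(input, p),
        FieldSet::MonthDayTime(p) => format!("{} {}", month_day, render_time(input, p)),
    }
}
```

Because the minute is reachable only through a time precision, which always carries the hour with it, a caller cannot build the "December 10" mistake by accident. A data-slicing build could likewise inspect which variants are used and ship only the data they need.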
Locale Preferences
ICU4X 2.0 introduces Preferences objects, a new paradigm for locale and user preference resolution in component constructors.
The new structures enable richer, type-safe management of user preferences coming from different sources, including locales and other preferences objects. String-based locales are still supported as well.
Locale Identifier String | ICU4X 2.0 Rust Code* |
en-US-u-hc-h23 | let mut p = Preferences::from(LanguageIdentifier { language: language!("en"), region: Some(region!("US")), ..Default::default() }); p.hour_cycle = HourCycle::H23; |
zh-Hant-TW-u-ca-roc | let mut p = Preferences::from(LanguageIdentifier { language: language!("zh"), script: Some(script!("Hant")), region: Some(region!("TW")), ..Default::default() }); p.calendar_algorithm = CalendarAlgorithm::Roc; |
ar-EG-u-nu-latn-fw-sun | let mut p = Preferences::from(LanguageIdentifier { language: language!("ar"), region: Some(region!("EG")), ..Default::default() }); p.numbering_system = value!("latn").try_into().unwrap(); p.first_day = FirstDay::Sun; |
* The type name "Preferences" is a placeholder for the formatter-specific preferences object, such as DecimalFormatterPreferences, a structured object containing all the pieces of a locale required for number formatting: information on the language, script, region, variant, and numbering system preference, but not irrelevant pieces like calendar system.
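To make the idea of combining preferences from several sources concrete, here is a minimal standalone sketch in plain Rust. It is not ICU4X code: `ToyPreferences`, its fields, and its `extend` method are hypothetical stand-ins, and the merge rule (a value set in the incoming object wins, an unset value leaves the existing one alone) is an assumption chosen for illustration.

```rust
/// Hypothetical hour-cycle preference (not the ICU4X type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HourCycle {
    H12,
    H23,
}

/// Hypothetical formatter-specific preferences object. Every field is
/// optional so that "not specified" is distinct from any real value.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct ToyPreferences {
    pub language: Option<&'static str>,
    pub region: Option<&'static str>,
    pub hour_cycle: Option<HourCycle>,
    pub numbering_system: Option<&'static str>,
}

impl ToyPreferences {
    /// Overlays `other` on top of `self`: fields set in `other` replace
    /// the current value, fields left as `None` keep the current value.
    pub fn extend(&mut self, other: ToyPreferences) {
        self.language = other.language.or(self.language);
        self.region = other.region.or(self.region);
        self.hour_cycle = other.hour_cycle.or(self.hour_cycle);
        self.numbering_system = other.numbering_system.or(self.numbering_system);
    }
}
```

In this model, an application could start from the preferences implied by a locale such as "en-US" and then overlay a user setting that only specifies the hour cycle, without re-serializing anything into a locale string.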
Cross Programming Language Improvements
The foreign function interface (FFI) has been overhauled with major ergonomic improvements. Key changes include:
Other Cross-Cutting Changes
Additional changes you may encounter when upgrading from 1.5 to 2.0:
- Many Rust types have gained separate owned and borrowed variants; for example, there are now both "Collator" and "CollatorBorrowed". The borrowed variant is slightly more efficient; it can be created statically from compiled data or derived from the owned variant.
- Our internal data storage type has a more efficient binary representation (see the zerovec crate). This means that postcard data generated with ICU4X 1.5 will not work with 2.0.
- The icu_locid and icu_locid_transform crates were reorganized into icu_locale and icu_locale_core. As a result, icu_locid and icu_locid_transform will stay at 1.5 permanently. If you currently depend directly on icu_locid or icu_locid_transform, you need to switch to icu_locale or icu_locale_core.
- The icu_calendar crate now focuses only on calendrical calculations, and a new crate, icu_time, contains pieces from icu_calendar and icu_timezone. The icu_timezone crate will stay at 1.5 permanently. If you currently depend directly on icu_timezone, you need to switch to icu_time.
- The icu_datagen crate was split into several sub-crates. If you currently depend directly on icu_datagen, you need to switch to icu_provider_source, icu_provider_export, and/or the icu4x-datagen binary crate.
- Performance improvements in multiple components. For example, the normalizer got a data rearrangement that benefits non-NFD normalizations, and the collator now has an identical prefix optimization.
- Input types for formatters are now re-exported from the formatter crate to reduce the number of explicit Cargo.toml dependencies.
- All crates are updated to the latest CLDR (47) and Unicode (16) versions.
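The owned and borrowed split mentioned in the first item of this list can be pictured with a small standalone sketch. Everything here is hypothetical and greatly simplified: `ToyCollator`, `ToyCollatorBorrowed`, `COMPILED_WEIGHTS`, and the per-character weight table are stand-ins invented for illustration, not the ICU4X collator or its data format.

```rust
use std::cmp::Ordering;

/// Stand-in for data compiled into the binary (hypothetical weights).
const COMPILED_WEIGHTS: &[(char, u32)] = &[('a', 1), ('A', 2), ('b', 3), ('B', 4)];

/// Borrowed variant: holds only a reference to the weight table.
#[derive(Clone, Copy)]
pub struct ToyCollatorBorrowed<'a> {
    weights: &'a [(char, u32)],
}

impl ToyCollatorBorrowed<'static> {
    /// Created statically from compiled data, with no allocation.
    pub const fn new() -> Self {
        Self { weights: COMPILED_WEIGHTS }
    }
}

impl<'a> ToyCollatorBorrowed<'a> {
    /// Looks up a character's weight; unknown characters sort last.
    fn weight(&self, c: char) -> u32 {
        self.weights
            .iter()
            .find(|(k, _)| *k == c)
            .map(|(_, w)| *w)
            .unwrap_or(1000 + c as u32)
    }

    /// Compares two strings by their sequences of weights.
    pub fn compare(&self, left: &str, right: &str) -> Ordering {
        left.chars()
            .map(|c| self.weight(c))
            .cmp(right.chars().map(|c| self.weight(c)))
    }
}

/// Owned variant: owns its table, for example one loaded at runtime.
pub struct ToyCollator {
    weights: Vec<(char, u32)>,
}

impl ToyCollator {
    pub fn from_runtime_data(weights: Vec<(char, u32)>) -> Self {
        Self { weights }
    }

    /// Derives the borrowed variant, which does the actual comparing.
    pub fn as_borrowed(&self) -> ToyCollatorBorrowed<'_> {
        ToyCollatorBorrowed { weights: &self.weights }
    }
}
```

In this sketch the comparison logic lives on the borrowed type only, so the statically constructed value and the one derived from runtime data share a single code path.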
Get started with ICU4X 2.0
ICU4X's new website, icu4x.unicode.org, now hosts tutorials, documentation, and more. The website reflects the current release, with previous releases also available.
Check out our quickstart tutorial, interactive demo, or C++, TypeScript, and (experimental) Dart documentation.
As before, the Rust crate is available at crates.io, with documentation at docs.rs.
Please post any questions via GitHub Discussions.