Thursday, May 14, 2026

Unicode CLDR 49: Submission open through June 2026


The Unicode® CLDR Survey Tool is open for submission for version 49 through June (see detailed schedule below). CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)


Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.


The new areas in CLDR 49 are focused on:

  • Unicode 18 additions: new emoji, script names, …

  • Improvements to the formatting of dates, times, and locale display names

  • New languages available for submission in Survey Tool: Adyghe [ady], Brahui [brh], Hunsrik [hrx], Interslavic [isv], Kabardian [kbd], Kaitag [xdq], Mara [mrh], and Susu [sus]


General Submission for TC locales* opened recently and is slated to finish on June 10, 2026. The Survey Tool then enters a vetting phase, where contributors select the best data for each field. That vetting phase is slated to finish on June 29. The draft data will be available in a public alpha in early August, and the final release is targeted for mid-October.


Other locales, managed by the DDL Working Group, have a longer submission period to allow smaller organizations to submit data on a more flexible timeline. The Survey Tool opened earlier for these locales, and will stay in Extended Submission until the end of June, so that these organizations can contribute data for the current release.


Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the following submission cycle. Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name of that language is also added for translation for all languages at Modern coverage. Locales that reach a higher level of coverage (Moderate or Modern) are suitable for general-purpose support in applications and operating systems.
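
The coverage progression described above can be pictured as an ordered scale. The following is only a rough sketch: the level names come from the text, but the `Coverage` enum and the two helper functions are hypothetical names invented for illustration and are not part of CLDR or the Survey Tool.

```python
from enum import IntEnum


class Coverage(IntEnum):
    """Coverage levels named in the text, ordered from least to most complete."""
    CORE = 1
    BASIC = 2
    MODERATE = 3
    MODERN = 4


def usable_for_language_selection(level: Coverage) -> bool:
    # Basic is described as the minimum for use in language selection.
    return level >= Coverage.BASIC


def suitable_for_general_purpose(level: Coverage) -> bool:
    # Moderate and Modern are described as suitable for general-purpose support.
    return level >= Coverage.MODERATE
```

A locale that has only Core data fails both checks, while a locale at Basic passes the first but not the second.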


If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.


* TC Locales are ones for which major organizations commit to adding data in concert over a short span of time each year.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

Wednesday, May 6, 2026

Unicode Technology Workshop 2026: Call for Sessions Now Open!

For those interested in participating in and contributing to Unicode Technology Workshop 2026: Unicode in the World, the call for submissions of session and tutorial proposals is now open. If you work on Unicode internationalization technologies or use Unicode internationalization technologies in your work, we want to hear from you. You can register your interest in contributing using the following link: Call for Submissions

Monday, April 13, 2026

ICU4X 2.2 released!

The ICU4X Technical Committee is happy to announce ICU4X 2.2, an update to our modular, portable, and secure i18n library.

ICU4X is Unicode's modern, lightweight, portable, and secure i18n library. Built from the ground up, it has a binary size and memory footprint 50-90% smaller than ICU4C. It is memory-safe, written in Rust with interfaces into C++, JavaScript, Dart, TypeScript, and Kotlin, with other languages planned. Mozilla Firefox, Google Chrome, Google Pixel Watch, core Android, numerous Flutter apps, and more clients are already using ICU4X.

Important changes in ICU4X 2.2 include:

1. Latest i18n data: This release includes an update to CLDR 48.2 and support for TZDB 2026a.
2. New and improved icu_calendar: This release adds new APIs to icu_calendar and changes some of its behavior; see the migration notes on GitHub.
   a. Datetime arithmetic: It is now possible to add and subtract dates.
   b. More flexible date construction: Build dates from all kinds of constituent data: extended years, era years, ordinal months, month codes, etc., with support for different kinds of overflow handling.
   c. Typed months: The new Month type replaces month codes in a type-safe way.
   d. Experimental third-party crate integration: We now support converting and formatting types from the jiff, chrono, and time crates. See icu_datetime::input::third_party. We’re not yet sure if these integrations should live in ICU4X, in the third-party crates, or in some adapter crate. We welcome your feedback!
   e. Changes to Japanese and Hijri calendars: We no longer support pre-Meiji eras because CLDR removed them, and we now always use Umm al-Qura data for simulated Hijri. See the migration notes on GitHub for more details.
3. Experimental Kotlin Bindings: We now have Kotlin bindings for ICU4X (found under ffi/mvn), with the same set of supported APIs as our other cross-language bindings.
4. Experimental features:
   a. Display names: Adds a new internal data layout exposed via the RegionDisplayName and ScriptDisplayName APIs. The old data layout, optimized for loading multiple names at once, is moved into the multi module. Please share feedback on our tracking issue.
   b. Compact decimal formatter: Please share feedback in preparation for stabilization in a future release.
   c. ML segmentation: Initial code for a RAdaBoost word segmenter for Chinese and a CNN word segmenter for Thai.
5. Better hour cycles: Adds support for Clock12 and Clock24 in datetime formatting.
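
To make the idea of overflow handling in item 2b a little more concrete, here is a small sketch written in Python rather than Rust. It is not the ICU4X API: the function name `make_date` and the mode names "constrain" and "reject" are assumptions chosen for illustration, and the sketch only covers the Gregorian calendar through the standard library.

```python
import calendar
from datetime import date


def make_date(year: int, month: int, day: int, overflow: str = "constrain") -> date:
    """Build a Gregorian date, handling an out-of-range day according to `overflow`.

    "constrain" clamps the day into the valid range for the month;
    "reject" raises ValueError instead. Both mode names are illustrative.
    """
    if overflow not in ("constrain", "reject"):
        raise ValueError(f"unknown overflow mode: {overflow!r}")
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    days_in_month = calendar.monthrange(year, month)[1]
    if 1 <= day <= days_in_month:
        return date(year, month, day)
    if overflow == "reject":
        raise ValueError(f"day {day} is out of range for {year}-{month:02d}")
    # Clamp the day to the nearest valid value for this month.
    return date(year, month, min(max(day, 1), days_in_month))
```

With the default mode, asking for February 31 yields the last day of February for that year; with "reject", the same request raises an error.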


Check out our quickstart tutorial, interactive demo, or C++, TypeScript, and Dart documentation.

As before, the Rust crate is available at crates.io, with documentation at docs.rs

Please post any questions via GitHub Discussions.


Tuesday, March 31, 2026

Unicode ICU 78.3 and CLDR 48.2 released



Unicode® CLDR is the most widely used provider of locale data. It provides the essential building blocks that allow software to display dates, times, and currencies correctly in every language and region. Unicode® ICU provides widely used C/C++/Java internationalization (i18n) libraries and APIs.

We have just published new maintenance releases of ICU and CLDR, with some small but significant changes. To find out more and to download these releases, go to: 



CLDR and ICU have each published a maintenance release in March instead of a major release. The next major releases, CLDR 49 and ICU 79, are planned for October and will include the data from the next CLDR general submission period, planned to start in early Q2 2026, as well as Unicode 18.


The following issues are fixed in the CLDR 48.2 and ICU 78.3 maintenance releases:


  • Several important locale data bug fixes including:

    • Group separator for number formatting was updated to ' in fr_CH for consistency with other Swiss locales.

    • Fixes to date and time formats, including updating the Hv available formats to match the behavior in CLDR 47. The previous change caused web compatibility issues related to current JS capabilities.

    • Fixes for emoji annotation issues, such as collisions between emoji short names.

    • Updated abbreviated and narrow AM/PM for ko and ps for consistency with how the wide forms are localized.

    • The full list of changes is available in Δ48.2

  • ICU 78.3 includes the CLDR 48.2 changes

  • ICU also fixes a C++ code point iterator bug

  • Updates for timezone data 2026a
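
As a rough illustration of what the fr_CH grouping change in the list above means for formatted output, the sketch below groups digits with a configurable separator. The function `format_grouped` is a hypothetical helper, not a CLDR or ICU API, and the default separator is simply the character as it appears in the text; real applications should take the exact separator from CLDR data through an i18n library.

```python
def format_grouped(value: int, separator: str = "'") -> str:
    """Format an integer with groups of three digits joined by `separator`."""
    # Python's "," format spec groups digits by thousands; swap in the separator.
    return f"{value:,}".replace(",", separator)
```

For example, `format_grouped(1234567)` gives `1'234'567`.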



Tuesday, March 3, 2026

UTS #18: More Unicode Properties in Regular Expressions

Regular Expressions, or “Regex”, are the invisible workhorses of the digital world. Regex allows apps and computer systems to find, validate, and change text based on patterns rather than specific words. Unicode properties play a vital role in this. Rather than an application using a fixed list of characters like a-z, A-Z — and failing badly for all but English — Unicode properties take on the burden of supplying meaningful sets of characters, like letters, Greek characters, or Emoji. Properties can be combined, such as Greek letters with an expression like [\p{script=greek}&\p{letter}].
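
The difference between a fixed character list and a property-based test can be sketched with Python's standard library. This is only an approximation: the built-in `re` module does not support `\p{...}` syntax and `unicodedata` does not expose the Script property, so the helper `is_greek_approx` (a hypothetical name) uses character names as a rough stand-in for script=greek.

```python
import re
import unicodedata

# A fixed ASCII list, as in the a-z, A-Z example from the text.
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")


def is_letter(ch: str) -> bool:
    # General_Category values beginning with "L" (Lu, Ll, Lt, Lm, Lo) are letters.
    return unicodedata.category(ch).startswith("L")


def is_greek_approx(ch: str) -> bool:
    # ASSUMPTION: the character name is used as a rough stand-in for the
    # Script property, which the standard library does not provide.
    return unicodedata.name(ch, "").startswith("GREEK")


def greek_letters(text: str) -> list[str]:
    # Intersection of the two tests, analogous to combining properties in a regex.
    return [ch for ch in text if is_letter(ch) and is_greek_approx(ch)]
```

The ASCII pattern rejects a word such as Ελληνικά outright, while the property-based test accepts every character in it. Regex engines with property support, such as the third-party `regex` package for Python, let you write the properties directly in the pattern instead.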

This specification has been updated and now covers over 100 different properties. The following are the most important changes, with others found in the modification section.

  • Section 2.7 Full Properties lists the full set of properties recommended for support. This version adds: IDS_Unary_Operator, NFKC_Simple_Casefold, ID_Compat_Math_Start, ID_Compat_Math_Continue, Indic_Conjunct_Break, and RGI_Emoji_Qualification
  • Special rules called “matching rules” are used when looking up properties and their values by name. This version recommends the matching rules from Section 5.9 Matching Rules of UAX #44.

By expanding and refining property support in UTS #18, this update strengthens the foundation for global text processing.



Thursday, February 26, 2026

From Central Bank to Code Point: A Roadmap for Currency Symbol Implementation


In the past year, several new currency symbols have been proposed for encoding in the Unicode Standard:

  • February 2025: The Saudi Central Bank announced the creation of a new symbol for the Saudi riyal.

  • March 2025: The Central Bank of the U.A.E. announced creation of a new symbol for the UAE Dirham (cf. Dirham Currency Symbol Guideline).

  • May 2025: A proposal was submitted to encode the symbol for the Maldivian Rufiyaa. (The symbol was created by the Maldives Monetary Authority in 2022.)

  • November 2025: The Central Bank of Oman announced the creation of a new symbol for the Omani Rial.

The Saudi riyal sign was proposed for encoding just barely in time for it to be included in version 17.0 of the Unicode Standard, released in September 2025. Proposals for the other currency symbols were submitted too late for version 17.0, so the symbols will be encoded in version 18.0, which will be released in September 2026.

Recent currency symbol trend

Distinct currency symbols are not essential for local or international financial transactions, and most currencies are denoted with their written name or an abbreviation; e.g. “kr” for krone. However, in recent years, since the creation of the euro currency and its distinct symbol, several monetary authorities have created distinct symbols to denote their currency. A currency symbol could potentially be created only for private use by the monetary authority, such as printing on bills or embossing on coins. Usually, however, currency symbols are intended for public use: to appear on shop signs, online retail sites, or anywhere that currency amounts are presented.

Such public usage leads to a need for the symbol to be encoded in the Unicode Standard and supported in commercial software and services. Standardization of a new character and subsequent support by vendors takes time: typically, at least one year, and often longer. All too often, however, monetary authorities announce creation of a new currency symbol anticipating immediate public adoption, then later discover there will be an unavoidable delay before the new symbol is widely supported in products and services.

By contrast, Bulgaria's transition from its local lev currency to the euro in January 2026 was formally decided and announced in July 2025, which gave vendors several months to prepare for the change.

Implementing support for the new currency symbols

Vendor support for a new currency symbol can involve many different things, such as the following:

  • Updates to fonts

  • Updates to software keyboard layouts or new designs for physical keyboards

  • Updating locale data and programming interfaces for formatting currency values

  • Updating software used for generation of financial statements and reports

  • Updates to applications, online services or devices for commercial transactions

However, all of these require development time, and development can only begin after the new symbol is encoded in the Unicode Standard. People wishing to start using a new currency symbol in applications and services should anticipate that, from the time the symbol is proposed for encoding, it could take many months or even years before vendors have distributed product updates.

Because there is unavoidable delay from when a new currency symbol is proposed to when it can be supported by vendors, monetary authorities are strongly encouraged to engage with the Unicode Consortium at least one year in advance of when a new currency symbol is expected to go into public usage.

Note regarding support on devices

Many vendors do not routinely provide updates for their devices, including some mobile phones, or stop providing updates for older devices. For this reason, users should not be surprised if a new currency symbol is still not supported natively on a device years after the symbol was introduced. Applications or online services accessed on those devices can have a different update policy, however, so the experience on such devices could reflect partial support.

Recommendations for implementation of Unicode 18.0 currency symbols

The following three new currency symbols have been approved for encoding in Unicode version 18.0, which will be published in September 2026:

  • U+20C2 RUFIYAA SIGN

  • U+20C3 UAE DIRHAM SIGN

  • U+20C4 OMANI RIAL SIGN

Complete details for these characters are included in the Unicode 18.0 Alpha preview release. The technical details — character names, code points, property data — are unlikely to change before Unicode 18.0 is released, but these details are not completely stable until the Unicode Technical Committee has made the final technical decisions for Unicode 18.0. For this reason, vendors can choose to start working on implementations once the Alpha preview is available, but vendors should not distribute product updates until after Unicode version 18.0 is released in September 2026.
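
One way an implementer might check whether a given runtime already knows these code points is sketched below, using Python's standard `unicodedata` module. The code points and names are the ones listed in the text; the helper names `is_assigned` and `report` are hypothetical. The result depends entirely on the Unicode version bundled with the interpreter (see `unicodedata.unidata_version`), so the new signs are expected to show up as unassigned on runtimes that predate Unicode 18.0.

```python
import unicodedata

# Code points and names as listed in the text for Unicode 18.0.
NEW_CURRENCY_SIGNS = {
    0x20C2: "RUFIYAA SIGN",
    0x20C3: "UAE DIRHAM SIGN",
    0x20C4: "OMANI RIAL SIGN",
}


def is_assigned(code_point: int) -> bool:
    # "Cn" is the General_Category reported for code points that are
    # unassigned in this runtime's copy of the Unicode data.
    return unicodedata.category(chr(code_point)) != "Cn"


def report() -> dict[str, bool]:
    # Map each new sign to whether this runtime's Unicode data includes it.
    return {
        f"U+{cp:04X} {name}": is_assigned(cp)
        for cp, name in NEW_CURRENCY_SIGNS.items()
    }
```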

Extending support with CLDR

Many implementations use Unicode CLDR data for currency formatting, so incorporating the new symbols is an important step for widespread support. A CLDR release will follow not long after release of Unicode version 18.0, and will contain the new currency symbols for applicable currencies and locales. 


However, the symbols will initially be listed as “alternative” symbols for the respective currencies. The reason for a symbol being an alternative, rather than the default, is to avoid the symbol being displayed in contexts in which available fonts might not yet support it, causing users to see a missing glyph instead of their currency symbol.


Later, when there is confidence that the symbols are more widely supported in platforms and fonts, a future CLDR version can update details to list the new currency symbol as the default, rather than as an alternative.
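
The default-versus-alternative arrangement can be sketched as a simple fallback. Everything here is an assumption made for illustration: the `CURRENCY_SYMBOLS` table is a simplified, hypothetical structure rather than CLDR's actual data format, the "AED" default is a placeholder, and `font_supports` stands in for whatever font-coverage check a platform provides.

```python
from typing import Callable

# Hypothetical, simplified data; real CLDR data is structured differently.
CURRENCY_SYMBOLS = {
    "AED": {"default": "AED", "alternative": "\u20C3"},  # U+20C3 UAE DIRHAM SIGN
}


def pick_symbol(
    currency: str,
    font_supports: Callable[[str], bool],
    prefer_alternative: bool = False,
) -> str:
    """Return the alternative symbol only when requested and renderable."""
    entry = CURRENCY_SYMBOLS[currency]
    alternative = entry.get("alternative")
    if prefer_alternative and alternative and font_supports(alternative):
        return alternative
    # Otherwise fall back to the default so users never see a missing glyph.
    return entry["default"]
```

An application that opts in gets the new sign where fonts can render it; everything else keeps showing the default.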

Working together to support local monetary authorities

When monetary authorities introduce a new symbol for their currency, it marks a significant milestone for financial and commercial activity in their domain. The Unicode Consortium is honored to work with monetary authorities, and would like to help make the launch of a new symbol as smooth as possible. With that in mind, we invite monetary authorities planning creation of a new currency symbol to engage with us well in advance of a planned launch.




Tuesday, February 24, 2026

UTS #58: Making URLs Readable for Humans: From %E0%A4%AE… to महात्मा

People around the world need to use their writing systems in URLs. This is important: in writing their native languages, the majority of humanity uses characters outside of A-Z, and they expect those characters to also work seamlessly.

Browsers and other programs generally handle Unicode in domain names well, though not all of them do, and many make the rest of the URL unreadable. For example, consider the common practice of providing user handles such as the following two:

x.com/rihanna

www.youtube.com/@핑크퐁

The first of these works well in practice because it is all ASCII. Copying from the address bar and pasting into text provides a readable result. However, in the second example, in many browsers and other programs, copying the address bar gives an unreadable string:

www.youtube.com/@핑크퐁
youtube.com/@%ED%95%91%ED%81%AC%ED%90%81

The names also expand in size and turn into very long, unreadable strings, such as:

hi.wikipedia.org/wiki/महात्मा_गांधी
hi.wikipedia.org/wiki/%E0%A4%AE%E0%A4%B9%E0%A4%BE%E0%A4%A4%E0%A5%8D%E0%A4%AE%E0%A4%BE_%E0%A4%97%E0%A4%BE%E0%A4%82%E0%A4%A7%E0%A5%80
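
The relationship between the readable and percent-encoded forms above can be reproduced with Python's standard `urllib.parse` module. This is a minimal sketch, not the formatting procedure from UTS #58: the helper names `readable_path` and `wire_path` are hypothetical, and blindly decoding every escape is not safe for arbitrary URLs, since escapes such as %2F or %20 stand for characters that change the URL's meaning when left bare.

```python
from urllib.parse import quote, unquote


def readable_path(encoded: str) -> str:
    """Decode %XX escapes, interpreted as UTF-8 bytes, into readable characters."""
    return unquote(encoded, encoding="utf-8", errors="strict")


def wire_path(readable: str) -> str:
    """Percent-encode non-ASCII characters as UTF-8 bytes, keeping "/" and "@"."""
    return quote(readable, safe="/@")
```

Passing the encoded YouTube string above to `readable_path` gives back the Hangul handle, and `wire_path` turns it into the escaped form again. Each Hangul or Devanagari character becomes three bytes and therefore nine ASCII characters, which is why the encoded strings grow so long.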

The other side of the coin is making sure that programs add links to URLs in a predictable way, linkifying the entire URL without extending the link to include sentence punctuation. For example, many programs don’t add links properly to:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


A commonly used email program, for example, stops midway through:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


Others may include the sentence period, question mark, surrounding parentheses, etc.:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


Users often insert spaces to prevent this. It should be automatic:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


The new UTS #58: Unicode Link Detection and Formatting: URLs and Email Addresses specifies how to format and linkify URLs and email addresses in readable, predictable, user-friendly ways. The data files cover all of the 159,000+ characters in Unicode.
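
To illustrate the kind of problem the specification addresses, here is a deliberately naive sketch that finds URLs in text and trims trailing sentence punctuation. It is not the UTS #58 algorithm and does not use its data files: the regular expression, the punctuation set, and the function names `trim_link` and `find_links` are all assumptions made for this example.

```python
import re

# Naive candidate: a scheme followed by any run of non-whitespace characters.
_CANDIDATE = re.compile(r"https?://\S+")
# Characters treated as sentence punctuation when they end a candidate.
_TRAILING = ".,;:!?"
# Closing brackets are trimmed only when they have no opener inside the URL.
_PAIRS = {")": "(", "]": "[", "}": "{"}


def trim_link(candidate: str) -> str:
    """Strip trailing sentence punctuation and unmatched closing brackets."""
    while candidate:
        last = candidate[-1]
        if last in _TRAILING:
            candidate = candidate[:-1]
        elif last in _PAIRS and candidate.count(last) > candidate.count(_PAIRS[last]):
            candidate = candidate[:-1]
        else:
            break
    return candidate


def find_links(text: str) -> list[str]:
    """Return every URL-like span in `text`, trimmed of trailing punctuation."""
    return [trim_link(m.group()) for m in _CANDIDATE.finditer(text)]
```

Applied to the Greek example above, the sketch keeps the path, query, and fragment but leaves the final period out of the link. A heuristic like this has obvious blind spots, such as URLs that legitimately end in punctuation.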

We encourage implementers to adopt this specification for a consistent experience for users worldwide.
