Tuesday, March 31, 2026

Unicode ICU 78.3 and CLDR 48.2 released

Unicode® CLDR is the most widely used provider of locale data. It provides the essential building blocks that allow software to display dates, times, and currencies correctly in every language and region. Unicode® ICU provides widely used C/C++/Java internationalization (i18n) libraries and APIs.

We have just published new maintenance releases of ICU and CLDR, with some small but significant changes. To find out more and to download these releases, go to:

CLDR and ICU have each published a maintenance release in March instead of a major release. The next major releases, CLDR 49 and ICU 79, are planned for October and will include the data from the next CLDR general submission period, planned to start in early Q2 2026, as well as Unicode 18.

The following issues are fixed in the CLDR 48.2 and ICU 78.3 maintenance releases:

Several important locale data bug fixes including:

Group separator for number formatting was updated to ' in fr_CH for consistency with other Swiss locales.
Fixes to date and time formats, including: the Hv available formats were updated to match the behavior in CLDR 47, because the previous change caused web compatibility issues related to current JS capabilities.
Fixes for emoji annotation issues, such as collisions between emoji short names.
Updated abbreviated and narrow AM/PM for ko and ps for consistency with how the wide forms are localized.
The full list of changes is available in Δ48.2.
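To make the group separator change concrete, here is a minimal sketch of how a locale-specific separator affects formatted output. The lookup table and the function name are hypothetical and exist only for illustration; real implementations read separators from CLDR data (for example through ICU) rather than hard-coding them, and the exact separator character in the data may differ from the plain apostrophe used here.

```python
# Hypothetical lookup table for illustration only; real software reads
# group separators from CLDR locale data instead of hard-coding them.
GROUP_SEPARATORS = {
    "fr_CH": "'",  # apostrophe, as described in the release note
    "en_US": ",",
}


def format_grouped(value: int, locale: str) -> str:
    """Format an integer in groups of three using the locale's separator."""
    separator = GROUP_SEPARATORS[locale]
    # Python's "," format spec groups digits by thousands; swap in the
    # locale's separator afterwards.
    return f"{value:,}".replace(",", separator)


print(format_grouped(1234567, "fr_CH"))  # 1'234'567
print(format_grouped(1234567, "en_US"))  # 1,234,567
```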

ICU 78.3 includes the CLDR 48.2 changes
ICU also fixes a C++ code point iterator bug
Updates for timezone data 2026a

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

Tuesday, March 3, 2026

UTS #18: More Unicode Properties in Regular Expressions

Regular Expressions, or “Regex”, are the invisible workhorses of the digital world. Regex allows apps and computer systems to find, validate, and change text based on patterns rather than specific words. Unicode properties play a vital role in this. Rather than an application using a fixed list of characters like a-z, A-Z (and failing badly for all but English), Unicode properties take on the burden of supplying meaningful sets of characters, like letters, Greek characters, or Emoji. Properties can also be combined: for example, an expression like [\p{script=greek}&&\p{letter}] matches Greek letters.
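As a rough illustration of what combining two properties means, the sketch below intersects "is a letter" with "is Greek" using only Python's standard library. This is an approximation under stated assumptions: the standard `re` module does not support `\p{...}` and `unicodedata` does not expose the Script property, so the character name prefix "GREEK" stands in for the script test. The function names are illustrative.

```python
import unicodedata


def is_letter(ch: str) -> bool:
    # General_Category values for letters all start with "L"
    # (Lu, Ll, Lt, Lm, Lo).
    return unicodedata.category(ch).startswith("L")


def looks_greek(ch: str) -> bool:
    # Assumption: a character name starting with "GREEK" approximates
    # script=Greek. The real Script property is not in the stdlib.
    return unicodedata.name(ch, "").startswith("GREEK")


def greek_letters(text: str) -> list[str]:
    # Intersection of the two sets: keep characters in both.
    return [ch for ch in text if is_letter(ch) and looks_greek(ch)]


print(greek_letters("a α β 1 ; λ"))
```

A regex engine with full property support performs this kind of set intersection internally, using the actual property data instead of a name heuristic.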

This specification has been updated and now covers over 100 different properties. The following are the most important changes, with others found in the modification section.

Section 2.7 Full Properties lists the full set of properties recommended for support. This version adds: IDS_Unary_Operator, NFKC_Simple_Casefold, ID_Compat_Math_Start, ID_Compat_Math_Continue, Indic_Conjunct_Break, and RGI_Emoji_Qualification.

Special rules called “matching rules” are used when looking up properties and their values by name. This version recommends the matching rules from Section 5.9 Matching Rules of UAX #44.
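A minimal sketch of loose name matching follows. It assumes the rule for property names and symbolic values in UAX #44 is: ignore case, whitespace, underscores, hyphens, and an initial "is" prefix. That is a paraphrase, so consult the specification for the exact wording and edge cases. The function names are illustrative.

```python
import re


def loose_key(name: str) -> str:
    """Reduce a property name or value to a key for loose comparison."""
    # Ignore case, whitespace, underscores, and hyphens.
    key = re.sub(r"[\s_\-]", "", name).lower()
    # Ignore an initial "is" prefix. Both sides of a comparison are
    # reduced the same way, so the comparison stays consistent.
    if key.startswith("is"):
        key = key[2:]
    return key


def loosely_equal(a: str, b: str) -> bool:
    return loose_key(a) == loose_key(b)


print(loosely_equal("General_Category", "general category"))  # True
print(loosely_equal("IsGreek", "greek"))                      # True
print(loosely_equal("Greek", "Latin"))                        # False
```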

By expanding and refining property support in UTS #18, this update strengthens the foundation for global text processing.


Thursday, February 26, 2026

From Central Bank to Code Point: A Roadmap for Currency Symbol Implementation

In the past year, several new currency symbols have been proposed for encoding in the Unicode Standard:

February 2025: The Saudi Central Bank announced the creation of a new symbol for the Saudi riyal.
March 2025: The Central Bank of the U.A.E. announced creation of a new symbol for the UAE Dirham (cf. Dirham Currency Symbol Guideline).
May 2025: A proposal was submitted to encode the symbol for the Maldivian Rufiyaa. (The symbol was created by the Maldives Monetary Authority in 2022.)
November 2025: The Central Bank of Oman announced the creation of a new symbol for the Omani Rial.

The Saudi riyal sign was proposed for encoding just barely in time for it to be included in version 17.0 of the Unicode Standard, released in September 2025. Proposals for the other currency symbols were submitted too late for version 17.0, so the symbols will be encoded in version 18.0, which will be released in September 2026.

Recent currency symbol trend

Distinct currency symbols are not essential for local or international financial transactions, and most currencies are denoted with their written name or an abbreviation; e.g. “kr” for krone. However, in recent years, since the creation of the euro currency and its distinct symbol, several monetary authorities have created distinct symbols to denote their currency. A currency symbol could potentially be created only for private use by the monetary authority, for example for printing on bills or embossing on coins. Usually, however, currency symbols are intended for public use: to appear on shop signs, online retail sites, or anywhere that currency amounts are presented.

Such public usage leads to a need for the symbol to be encoded in the Unicode Standard and supported in commercial software and services. Standardization of a new character and subsequent support by vendors takes time: typically, at least one year, and often longer. All too often, however, monetary authorities announce creation of a new currency symbol anticipating immediate public adoption, then later discover there will be an unavoidable delay before the new symbol is widely supported in products and services.

For a contrast with another recent currency development, Bulgaria transitioned from their local lev currency to the euro in January 2026, but the transition was formally decided and announced in July 2025, several months before the change went into effect. This allowed several months for vendors to prepare for the change.

Implementing support for the new currency symbols

Vendor support for a new currency symbol can involve many different things, such as the following:

Updates to fonts
Updates to software keyboard layouts or new designs for physical keyboards
Updating locale data and programming interfaces for formatting currency values
Updating software used for generation of financial statements and reports
Updates to applications, online services or devices for commercial transactions

However, all of these require development time, and development can only begin after the new symbol is encoded in the Unicode Standard. People wishing to start using a new currency symbol in applications and services should anticipate that, from the time the symbol is proposed for encoding, it could take many months or even years before vendors have distributed product updates.

Because there is unavoidable delay from when a new currency symbol is proposed to when it can be supported by vendors, monetary authorities are strongly encouraged to engage with the Unicode Consortium at least one year in advance of when a new currency symbol is expected to go into public usage.

Note regarding support on devices

For many devices, including some mobile phones, vendors do not routinely provide updates, or they stop providing updates for older devices. For this reason, users should not be surprised if a new currency symbol is not supported natively on a device years after the symbol was introduced. Applications or online services accessed on those devices can have a different update policy, however, so the experience on such devices could reflect partial support.

Recommendations for implementation of Unicode 18.0 currency symbols

The following three new currency symbols have been approved for encoding in Unicode version 18.0, which will be published in September 2026:

U+20C2 RUFIYAA SIGN
U+20C3 UAE DIRHAM SIGN
U+20C4 OMANI RIAL SIGN

Complete details for these characters are included in the Unicode 18.0 Alpha preview release. The technical details — character names, code points, property data — are unlikely to change before Unicode 18.0 is released, but these details are not completely stable until the Unicode Technical Committee has made the final technical decisions for Unicode 18.0. For this reason, vendors can choose to start working on implementations once the Alpha preview is available, but vendors should not distribute product updates until after Unicode version 18.0 is released in September 2026.
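One way for a developer to see whether a given runtime already knows these characters is to query its Unicode database. The sketch below uses Python's `unicodedata` module with the code points and names from the list above, which remain provisional until Unicode 18.0 is final. The helper name is illustrative. On runtimes built against earlier Unicode versions the characters are expected to be reported as not yet known.

```python
import unicodedata

# Code points and names as listed in the post; provisional until
# Unicode 18.0 is released.
NEW_CURRENCY_SIGNS = {
    0x20C2: "RUFIYAA SIGN",
    0x20C3: "UAE DIRHAM SIGN",
    0x20C4: "OMANI RIAL SIGN",
}


def known_to_runtime(code_point: int) -> bool:
    """True if this Python build's Unicode data names the code point."""
    return unicodedata.name(chr(code_point), None) is not None


print("Unicode data version:", unicodedata.unidata_version)
for code_point, expected_name in NEW_CURRENCY_SIGNS.items():
    status = "known" if known_to_runtime(code_point) else "not yet known"
    print(f"U+{code_point:04X} {expected_name}: {status}")
```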

Extending support with CLDR

Many implementations use Unicode CLDR data for currency formatting, so incorporating the new symbols is an important step for widespread support. A CLDR release will follow not long after release of Unicode version 18.0, and will contain the new currency symbols for applicable currencies and locales.

However, the symbols will initially be listed as “alternative” symbols for the respective currencies. The reason for a symbol being an alternative, rather than the default, is to avoid the symbol being displayed in contexts in which available fonts might not yet support the new symbol, causing users to see a missing glyph for their currency instead of the intended symbol.

Later, when there is confidence that the symbols are more widely supported in platforms and fonts, a future CLDR version can update details to list the new currency symbol as the default, rather than as an alternative.
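The default-versus-alternative arrangement can be pictured with a small sketch. Everything here is hypothetical: the table shape, the placeholder default strings, and the `font_has_glyph` callback are assumptions for illustration and do not reflect actual CLDR data or any library API. The idea is only that an application opts in to the alternative symbol when it knows the glyph can be rendered, and otherwise falls back to the default.

```python
from typing import Callable

# Hypothetical data: a default symbol plus an optional alternative.
# The values are placeholders, not actual CLDR content.
CURRENCY_SYMBOLS = {
    "OMR": {"default": "OMR", "alt": "\u20C4"},
    "AED": {"default": "AED", "alt": "\u20C3"},
    "EUR": {"default": "\u20AC"},
}


def pick_symbol(currency: str, font_has_glyph: Callable[[str], bool]) -> str:
    """Prefer the alternative symbol only when it can be displayed."""
    entry = CURRENCY_SYMBOLS[currency]
    alt = entry.get("alt")
    if alt is not None and font_has_glyph(alt):
        return alt
    return entry["default"]


# With no font support for the new sign, the default is used.
print(pick_symbol("OMR", lambda symbol: False))  # OMR
```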

Working together to support local monetary authorities

When monetary authorities introduce a new symbol for their currency, it marks a significant milestone for financial and commercial activity in their domain. The Unicode Consortium is honored to work with monetary authorities, and would like to help make the launch of a new symbol as smooth as possible. With that in mind, we invite monetary authorities planning creation of a new currency symbol to engage with us well in advance of a planned launch.


Tuesday, February 24, 2026

UTS #58: Making URLs Readable for Humans: From %E0%A4%AE… to महात्मा

People around the world need to use their writing systems in URLs. This is important: in writing their native languages, the majority of humanity uses characters outside of A-Z, and they expect those characters to also work seamlessly.

Browsers and other programs generally handle Unicode in domain names well, though not all of them do, and many make the rest of the URL unreadable. For example, consider the common practice of providing user handles such as the following two:

x.com/rihanna

www.youtube.com/@핑크퐁

The first of these works well in practice because it is all ASCII. Copying from the address bar and pasting into text provides a readable result. However, in the second example, in many browsers and other programs, copying the address bar gives an unreadable string:

www.youtube.com/@핑크퐁

⇩

youtube.com/@%ED%95%91%ED%81%AC%ED%90%81

The names also expand in size and turn into very long, unreadable strings, such as:

hi.wikipedia.org/wiki/महात्मा_गांधी

⇩

hi.wikipedia.org/wiki/%E0%A4%AE%E0%A4%B9%E0%A4%BE%E0%A4%A4%E0%A5%8D%E0%A4%AE%E0%A4%BE_%E0%A4%97%E0%A4%BE%E0%A4%82%E0%A4%A7%E0%A5%80
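The expansion shown above is ordinary percent-encoding of the UTF-8 bytes of each character. A short sketch with Python's standard `urllib.parse` shows both directions, using the encoded handle from the earlier example. The helper names `display_form` and `wire_form` are illustrative.

```python
from urllib.parse import quote, unquote

encoded = "%ED%95%91%ED%81%AC%ED%90%81"

# unquote() decodes percent-escapes as UTF-8 by default.
readable = unquote(encoded)
print(readable)
print(len(readable), "characters, versus", len(encoded), "when encoded")


def display_form(path: str) -> str:
    """Naive readable form: decode every percent-escape."""
    return unquote(path)


def wire_form(path: str) -> str:
    """Percent-encode non-ASCII characters (and most punctuation)."""
    return quote(path)


# Encoding and then decoding returns the original text.
assert display_form(wire_form(readable)) == readable
```

Decoding every escape is too naive for real display code: for instance, an escaped %2F must not be turned into a path separator, since that would change what the URL means.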

The other side of the coin is making sure that programs add links to URLs in a predictable way, linkifying the entire URL without extending the link to include sentence punctuation. For example, many programs don’t add links properly to:

… see

https://example.com/αβγ/δεζ?θικ#λμν.

A commonly used email program, for example, stops midway through:

… see

https://example.com/αβγ/δεζ?θικ#λμν.

Others may include the sentence period, question mark, surrounding parentheses, etc.:

… see

https://example.com/αβγ/δεζ?θικ#λμν.

Users often insert spaces to prevent this. It should be automatic:

… see

https://example.com/αβγ/δεζ?θικ#λμν.

The new UTS #58: Unicode Link Detection and Formatting: URLs and Email Addresses specifies how to format and linkify URLs and email addresses in readable, predictable, user-friendly ways. The data files cover all of the 159,000+ characters in Unicode.
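To illustrate the problem of deciding where a link ends, here is a deliberately naive sketch. It is not the UTS #58 algorithm, which relies on per-character data files. It only assumes that a URL runs to the next whitespace and that trailing sentence punctuation and unbalanced closing parentheses should be excluded. All names are illustrative.

```python
import re
from typing import Optional

# Assumption: a candidate URL runs from the scheme to the next whitespace.
URL_CANDIDATE = re.compile(r"https?://\S+")
TRAILING_PUNCTUATION = ".,;:!?"


def link_extent(text: str) -> Optional[str]:
    """Return the first URL in text without trailing sentence punctuation."""
    match = URL_CANDIDATE.search(text)
    if match is None:
        return None
    url = match.group(0)
    while url:
        last = url[-1]
        if last in TRAILING_PUNCTUATION:
            url = url[:-1]
        elif last == ")" and url.count(")") > url.count("("):
            # A closing parenthesis with no opener inside the URL most
            # likely belongs to the surrounding sentence.
            url = url[:-1]
        else:
            break
    return url


print(link_extent("… see https://example.com/αβγ/δεζ?θικ#λμν."))
```

The question mark inside the URL is kept because only trailing characters are stripped. A heuristic like this still fails on many inputs, which is the gap the specification aims to close.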

We encourage implementers to adopt this specification for a consistent experience for users worldwide.


Tuesday, February 10, 2026

Unicode 18.0 Alpha review

Unicode 18.0 Alpha Review Opens for Feedback

By: Peter Constable, Chair of the Unicode Technical Committee

The repertoire for Unicode Version 18.0 is now open for early review and comment. During alpha review, the repertoire is reasonably mature and stable but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be considered.

For the alpha review, preliminary data files are also available, with data covering existing and new character repertoire. In addition, a draft for the core specification is available, with new block descriptions for some of the newly-added blocks and scripts.

The primary focus for the alpha review should be on the new character repertoire. This early review is provided so that reviewers may consider the repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2026). Once beta review begins, the repertoire, code points, and character names will all be locked down, and no longer be subject to changes.

Sample of representative glyphs for Seal script ideographs

Notable changes

The planned repertoire for Unicode 18.0 adds 13,048 new characters, which would bring the total to 172,849 characters.
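The figures can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this post (the Seal ideograph count appears in the script list that follows). The variable names are illustrative.

```python
added = 13_048             # new characters planned for Unicode 18.0
projected_total = 172_849  # total after the additions
seal_ideographs = 11_328   # Small Seal ideographs among the additions

# Total implied for the current version, before the additions.
implied_current_total = projected_total - added
print(implied_current_total)  # 159801

# Share of the new repertoire made up of Seal ideographs.
seal_share = seal_ideographs / added
print(f"{seal_share:.1%}")  # 86.8%
```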

The additions include four new scripts:

Small Seal (“Seal”): This comprises the largest portion of the new characters, with 11,328 ideographs. Seal is an important precursor to modern Han ideographs (aka, “CJK”), and has important cultural significance in China and for Chinese speakers throughout the world.
Chisoi: A modern script used in eastern India.
Jurchen: A historic ideographic script that was used in northeastern China during the Jin and Ming dynasties.
Proto-cuneiform: In this version, just numeric signs (other characters have been proposed for a future version).

Other additions include nine new emoji characters, 72 historical mathematical symbols, 323 Cuneiform numeric signs, and three new currency symbols for modern currencies:

Maldivian Rufiyaa
Omani Rial
UAE Dirham

Feedback for the alpha review should be reported under PRI #536 using the Unicode contact form by March 31, 2026.

