Friday, September 19, 2025

Unicode CLDR 48 Alpha available for testing

The Unicode CLDR 48 Alpha is now available for integration testing. 

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

The alpha has already been integrated into the development versions of ICU 78 and ICU4X. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed via CLDR Tickets.

Some of the most significant changes in this release are the following (for more detail, see the CLDR 48 release note page):

  • Updated for Unicode 17, including names and search terms for new emoji, a new sort order, and Han→Latin romanization additions for many characters.
  • Updated to the latest external standards and data sources, such as the language subtag registry, UN M49 macro regions, ISO 4217 currencies, etc.
  • Many enhancements to the CLDR specification (LDML) are due to be added by Oct 1.
  • Many additions to language data including:
    • Likely Subtags, for deriving the likely script and region from the language (used in many processes)
    • Language populations in countries: significant updates to improve accuracy and maintainability
  • New formatting options
    • Rational number formats added, allowing for formats like “5½”
    • For timezones, usesMetazone adds two new attributes stdOffset and dstOffset so that implementations can use either “vanguard” or “rearguard” TZDB data sources
    • Combination formats added for relative dates + times, such as “tomorrow at 12:30”
    • Additional units added for scientific contexts (coulombs, farads, teslas, etc.) and for English systems (fortnights, imperial pints, etc.)
  • Many corrections and updates to Metazone data and to calendars (including removal of eras and fixes to start dates).
  • This is the first release where the new CLDR Organization process is in place for DDL languages. As a result, several locales were able to reach higher levels (see below).
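
To make the Likely Subtags item above more concrete, here is a minimal Python sketch of how a consumer might fill in a missing script and region from a language code. It is not the LDML algorithm as specified: the lookup order is simplified, the table holds only a handful of sample entries rather than the full CLDR data, and the names SAMPLE_LIKELY_SUBTAGS, parse_tag, and add_likely_subtags are hypothetical, chosen for this illustration.

```python
# Illustrative sample only. A real implementation would load the complete
# likely-subtags data from a CLDR release instead of this hand-picked table.
# Keys are (language, script, region); None marks a missing subtag.
SAMPLE_LIKELY_SUBTAGS = {
    ("en", None, None): ("en", "Latn", "US"),
    ("sr", None, None): ("sr", "Cyrl", "RS"),
    ("zh", None, None): ("zh", "Hans", "CN"),
    ("zh", None, "TW"): ("zh", "Hant", "TW"),
    ("zh", "Hant", None): ("zh", "Hant", "TW"),
}


def parse_tag(tag):
    """Split a simple locale tag into (language, script, region).

    Only handles language, an optional 4-letter script, and an optional
    2-letter or 3-digit region; variants and extensions are ignored.
    """
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    script = region = None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part.title()
        elif (len(part) == 2 and part.isalpha()) or (len(part) == 3 and part.isdigit()):
            region = part.upper()
    return language, script, region


def add_likely_subtags(tag):
    """Return the tag with missing script/region filled in, if known."""
    language, script, region = parse_tag(tag)
    # Simplified lookup: try the most specific key first, then fall back.
    candidates = [
        (language, script, region),
        (language, script, None),
        (language, None, region),
        (language, None, None),
    ]
    for key in candidates:
        match = SAMPLE_LIKELY_SUBTAGS.get(key)
        if match is not None:
            _, likely_script, likely_region = match
            # Subtags supplied by the caller always win over the likely ones.
            return "-".join([language, script or likely_script, region or likely_region])
    return tag  # No data for this language: return the input unchanged.


if __name__ == "__main__":
    for example in ["en", "zh", "zh-TW", "sr-BA", "xx"]:
        print(example, "->", add_likely_subtags(example))
```

With this sample table, "zh" expands to "zh-Hans-CN" while "zh-TW" expands to "zh-Hant-TW", which shows why the region a user supplies can change the script that is inferred.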


Locale Coverage Levels

Level     Count  With Script  Regional Variants  Usage
Modern    104    5            305                Suitable for full UI internationalization
Moderate  13     0            1                  Suitable for “document content” internationalization, e.g., in a spreadsheet
Basic     57     10           22                 Suitable for locale selection, e.g., choice of language on a mobile phone


Changes in coverage 

±   New Level  Locales
📈  Modern     Akan, Bashkir, Chuvash, Kazakh (Arabic), Romansh, Shan, Quechua
📈  Moderate   Anii, Esperanto
📈  Basic      Buriat, Piedmontese, Sicilian, Tuvinian
📉  Basic*     Baluchi (Latin), Kurdish


For the details, see the CLDR 48 release note page, which explains how to access the data and review charts of the changes and, importantly, will cover Migration issues.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock



Tuesday, September 9, 2025

Unicode 17.0 Release Announcement


Announcing The Unicode® Standard, Version 17.0

The Unicode Standard is the foundation for all global digital communications, providing the encoding for text content used in all devices. The latest version of the standard, Version 17.0, is now available! This is a major update that includes new characters and code charts, updated data files, an updated Core Specification, and updated annexes and synchronized standards that cover implementation details for important aspects of text processing.

This version adds 4,803 new characters, including four new scripts, eight new emoji characters, and many other characters and symbols, bringing the total of encoded characters to 159,801.

One of the newly encoded symbols is SAUDI RIYAL SIGN. Its addition in Unicode 17.0 will allow interoperable support for the new symbol announced earlier this year by the Saudi Central Bank to represent their riyal currency.

The new additions also include 4,298 additional CJK unified ideographs in a new block, CJK Unified Ideographs Extension J, as well as 18 other CJK ideographs added to the existing Extension C and Extension E blocks. This increases the number of encoded CJK ideographs to over 100,000! Also, nearly 2,500 already-encoded CJK ideographs are horizontally extended by the addition of source references and glyphs reflecting use of those ideographs in China and Korea.

The following four new scripts increase the total number of supported scripts in the Unicode Standard to 172:
  • Beria Erfe is a modern-use script used by Zaghawa communities in central Africa.
  • Tolong Siki is a modern-use script used by Kurukh communities in northeast India.
  • Tai Yo is the traditional script of Tai Yo communities in northern Vietnam.
  • Sidetic is an historic script used in ancient Anatolia.
Support for these in Unicode is the key initial step in bridging the digital divide for users of these scripts. 

See the delta code charts for details on all the new scripts and characters. For additional details regarding new emoji, see Emoji Recently Added, v17.0. For complete details on Unicode Version 17.0, see https://www.unicode.org/versions/Unicode17.0.0/
