Monday, May 5, 2025

Unicode CLDR Version 48: Submission Open

The Unicode CLDR Survey Tool is open for submission for version 48. CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). All major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 48 is focusing on:
  • Unicode 17 additions: new emoji, script names, …
  • Changes to the root and/or English names of many exemplar cities and some metazones
  • Additional number and date formats:
    • New “relative” variant for date-time combining pattern
    • Two new currency formats
    • Rational number formats
    • New ‘Year-First’ calendar formatting for year-month-day order (Gregorian).
  • Units:
    • New units for languages in modern coverage
    • Reworking certain concentration units
  • New Languages available for submission in Survey Tool:
    • Buryat (bua)
    • Coptic (cop)
    • Haitian Creole (ht)
    • Kazakh (Latin) (kk-Latn)
    • Laz (lzz)
    • Luri Bakhtiari (bqi)
    • Nselxcin (Okanagan) (oka)
    • Pāli (pi)
    • Piedmontese (pms)
    • Q’eqchi’ (kek)
    • Samogitian (sgs)
    • Sunuwar (suz)
    • Chinese (Latin) (zh-Latn)
Submission of new data opened recently and is slated to finish on June 11. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on June 30. A public alpha makes the draft data available in early August, and the final release targets mid-October.

Each new locale starts with a small set of Core Data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle.

Once a language reaches Basic coverage, it has the minimum support for use in language selection, such as on mobile devices. In the next submission cycle, the name for that language is also added for translation for all languages at Modern coverage.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.


Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Highlights from UTC #183

By Peter Constable, Chair of UTC

Unicode Technical Committee (UTC) meeting #183 was held April 22 – 24. Thanks to member company Microsoft for hosting at its Mountain View, CA campus. Here are some highlights.

Unicode 17.0 Beta

Unicode 17.0 is scheduled for release in September of this year. At UTC #183, technical decisions were taken for updates to be reflected in the Beta release, which will be available for public review later this month.

The most significant changes affecting Unicode 17.0 are encoding of 14 additional characters:
  • A new currency symbol, SAUDI RIYAL SIGN, was proposed by the Saudi Central Bank and will be added to Unicode 17.0. This has been assigned to code point U+20C1. 
    • Note: We know that many vendors will want to implement support for this quickly. Keep in mind that, while it's unlikely that the code point will change, this isn't completely guaranteed until Unicode 17.0 is finalized at the next UTC meeting, in July.
    • For more background, see a recent Unicode Blog article, Support for the New Saudi Riyal Currency Symbol.
  • Thirteen new CJK unified ideographs will be added, twelve of which are needed for use in China. These were reviewed by experts in the Ideographic Research Group (IRG—a working group within ISO/IEC JTC 1/SC2), who recommended immediate encoding. For more information, see Sections 25 and 27 of the CJK & Unihan Working Group recommendations (L2/25-090).
Three characters that were to be newly-added have been removed. The Unicode 17.0 Alpha included the addition of Sidetic script, with 29 characters. (Sidetic is an historic script used in ancient Anatolia.) Based on expert feedback during the Alpha review, three of the characters were deemed not ready for encoding, and so will be removed from Unicode 17.0. Hence, the Beta will include only 26 Sidetic characters.

With these repertoire changes, Unicode 17.0 Beta will include 4,847 new characters.
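
For implementers who want to start testing with the new currency symbol, the following is a minimal Python sketch. It assumes the code point stays at U+20C1 (which, as noted above, is not guaranteed until Unicode 17.0 is finalized). The helper name describe is hypothetical. Because most installed Python versions ship an older Unicode database, the name lookup falls back to a placeholder instead of raising an error.

```python
import unicodedata

# Code point reported for SAUDI RIYAL SIGN in the Unicode 17.0 Beta.
# Assumption: it remains U+20C1 in the final release.
SAUDI_RIYAL_SIGN = "\u20C1"


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME', with a fallback when the local Unicode data lacks the name."""
    name = unicodedata.name(ch, "<not in this Python's Unicode database>")
    return f"U+{ord(ch):04X} {name}"


# The character is in the Basic Multilingual Plane, so UTF-8 uses three bytes.
utf8_bytes = SAUDI_RIYAL_SIGN.encode("utf-8")

print(describe(SAUDI_RIYAL_SIGN))
print(utf8_bytes.hex(" "))  # e2 83 81
```

Storing and transmitting the character works today because it is just a code point. Rendering it depends on font support, and name or property lookups depend on the Unicode version of the runtime.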

There were other notable changes related to CJK Unified Ideographs. Thanks to ongoing research by IRG experts, a number of corrections will be made affecting already-encoded ideographs, including changes to the region-specific glyphs shown in the code charts and to source references (the details that map CJK Unified Ideographs to the specific ideograph forms used in different regions). One significant change being made is the horizontal extension of 2,145 existing CJK Unified Ideographs with the addition of glyphs and source data for those characters reflecting use in China. For details, see section 28 of L2/25-090.

Operational criteria for security-related classification of characters

One Unicode specification, UTS 39, Unicode Security Mechanisms, provides guidance on Unicode characters that should or should not be used in identifier systems where security is an issue, such as Internet domain names. It defines a General Security Profile for identifiers, which gives all Unicode characters a status of allowed or restricted. This is based on a classification of characters by a character property, Identifier_Type. 
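
The relationship between Identifier_Type and the allowed/restricted status can be sketched in a few lines of Python. This is an illustration only: it assumes, following UTS #39 as published, that a character is allowed when its Identifier_Type includes Recommended or Inclusion and restricted otherwise. The function names and the toy_lookup table are hypothetical, and a real implementation would load the Identifier_Type data files published with UTS #39 instead.

```python
# Identifier_Type values that yield an "Allowed" status (assumption: per UTS #39).
ALLOWED_TYPES = frozenset({"Recommended", "Inclusion"})


def identifier_status(identifier_types):
    """Map a character's Identifier_Type values to 'Allowed' or 'Restricted'."""
    return "Allowed" if ALLOWED_TYPES & set(identifier_types) else "Restricted"


def is_identifier_allowed(identifier, type_lookup):
    """True if every character of the identifier has an 'Allowed' status.

    type_lookup is a caller-supplied function: character -> set of type names.
    """
    return all(identifier_status(type_lookup(ch)) == "Allowed" for ch in identifier)


def toy_lookup(ch):
    # Hypothetical stand-in for real data: ASCII letters are treated as
    # Recommended, everything else gets a placeholder restricted type.
    return {"Recommended"} if ch.isascii() and ch.isalpha() else {"Uncommon_Use"}


print(identifier_status({"Technical", "Obsolete"}))   # Restricted
print(is_identifier_allowed("example", toy_lookup))   # True
```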

Up to now, there has been a basic description of the different Identifier_Type values, but not detailed operational criteria for assigning characters to the various types. UTC reviewed a proposal for such operational criteria—see L2/25-069, Factors used in determining the Identifier_Type of characters. These criteria were informed by work done in ICANN in defining rules used for determining permitted DNS and second-level domain name labels. UTC approved these criteria to be incorporated into UTS #39 and used for this purpose going forward. 

Related to this, the Identifier_Type classifications of over 1000 characters will be revised in Unicode 17.0, in line with these criteria. (Similar changes were made during UTC #182 for a large number of CJK Unified Ideographs.)

New Unicode Technical Standards in development

In my email on highlights from UTC #182, I mentioned two technical documents in early stages of development that were available for public review:
  • PRI #509, Proposed Draft UTS #58, Unicode Link Detection and Serialization
  • PRI #510, Proposed Draft UTR #59, East Asian Spacing
UTC #183 advanced both of these from Proposed Draft to Draft status.

Also, the specification for East Asian spacing will be changed from a Unicode Technical Report (UTR) to a Unicode Technical Standard (UTS). Technical reports are used to provide technical information, which could include potential algorithms that could be useful for implementations. But they are not used as a basis for specifying data or algorithms where interoperability between implementations is required. As pointed out in document L2/25-138, this new Unicode technical document will be referenced by CSS specifications for the text-autospace property which is in development and being implemented in browsers. Hence, it is appropriate for this Unicode document to be designated as a UTS.

In addition, UTC reviewed a proposal for another UTS and authorized its development: Proposed Draft UTS #61, Unicode Set Notation. Unicode specs for properties and algorithms often need to refer to sets of code points or strings using property assignments. Certain conventions have been used in UTC specs as well as in certain Unicode-provided tools and implementations, including the Unicode Utilities and ICU, and in the Unicode CLDR LDML spec. However, the conventions used in these various contexts have not been mutually consistent and interoperable. The proposed new UTS is a first step toward convergence of the conventions across these contexts. The proposed draft UTS has been posted for public review, and UTC invites feedback on it:
  • PRI #523, Proposed Draft UTS #61, Unicode Set Notation
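
To make "sets of code points using property assignments" concrete, here is a small Python sketch that builds one such set by brute force. It is not the notation proposed in UTS #61. It only computes the kind of set that an expression such as \p{Sc} or [:Sc:] denotes in ICU's existing UnicodeSet syntax, using the General_Category data bundled with Python. The function name code_points_with_category is hypothetical.

```python
import sys
import unicodedata


def code_points_with_category(category, start=0, stop=sys.maxunicode + 1):
    """Return the set of characters in [start, stop) whose General_Category matches."""
    return {
        chr(cp)
        for cp in range(start, stop)
        if unicodedata.category(chr(cp)) == category
    }


# Currency symbols (General_Category=Sc) below U+3000, per this Python's Unicode data.
currency = code_points_with_category("Sc", 0x0000, 0x3000)

print("$" in currency, "\u20AC" in currency, "A" in currency)  # True True False
```

The contents of the set depend on the Unicode version of the runtime, which is one reason a shared, precisely specified notation is useful across tools.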
Note: some working group reports are referred to for background details, but be sure to check the minutes for definitive outcomes, which sometimes differ from what working groups recommended. For complete details, see the draft UTC #183 minutes.

Internationalization & Unicode Technologies: Learn the Concepts, Apply the Tools. @LocWorld Malmö

We are pleased to announce that the Unicode Consortium will be onsite at LocWorld53 from June 3-5, 2025 in Malmö, Sweden. LocWorld is a premier conference for localization professionals, networking, and industry innovation. We hope to see you there!

On June 3rd, Unicode will offer two training sessions designed specifically for localization specialists. These sessions provide a comprehensive introduction to software internationalization (i18n) and Unicode technologies.

While each session stands independently, they are also complementary. Session 1 offers a beginner-friendly overview, while Session 2 dives deeper into practical implementation. Whether you're new to i18n or looking to refine your skills, these sessions will equip you with the knowledge and tools to collaborate effectively with developers and create globally accessible software.


June 3rd - Global Toolbox Sessions Highlights

  • Registration for LocWorld Malmö is not required to attend a Global Toolbox session.
  • Register Early - Limited seating at each session.

Session 1

A Friendly Introduction to Software Internationalization (i18n) and Unicode Technologies

Ideal for: Localization Program & Project Managers (15-40 attendees)  

Join us for an engaging session that simplifies the complex world of software internationalization (i18n) and Unicode technologies. This session is tailored for non-technical localization specialists who want to bridge the gap between their expertise and the needs of software development teams. By the end, you’ll feel empowered to contribute to the creation of globally accessible, easily localizable applications and services.

Why You Should Attend  

Whether you’re new to i18n or looking to deepen your understanding, this session will demystify the topic and leave you equipped to champion internationalization in your organization. Plus, you'll get practical tips and insights straight from a Spotify expert!

Ready to expand your horizons?  Reserve your spot today and take the first step toward mastering software i18n!

Session 2

Practical Software Internationalization (i18n) with Unicode Technologies

Ideal for: Localization Program & Project Managers (15-40 attendees)

Take your internationalization (i18n) knowledge to the next level with this practical, hands-on session focused on applying Unicode i18n technologies. Whether you’re continuing from Session 1 or joining as a standalone attendee, this session is perfect for localization specialists looking to guide developers in implementing effective i18n solutions.

Why You Should Attend

Localization is about more than just translating words—it’s about creating seamless, culturally relevant experiences. This session breaks down the practical side of i18n and gives you the tools to turn concepts into action.  Whether you're tackling a global project or simply want to enhance your skill set, this session equips you with the knowledge to make an impact.

Ready to get practical? Sign up today and learn how to bring software i18n to life with confidence!

Pricing and Member Discounts

Attendees from Unicode Organization Members are offered a discount: €250 for each session, or €450 for both (for a single attendee). The cost for non-members is €300 for each session, or €550 for both (for a single attendee). The member discount is also available to Unicode individual members and to regular contributors and volunteers of the Unicode Technical Committees and Working Groups. If eligible, please contact jill@localizationinstitute.com from your company email address for your discount code. Registration to attend LocWorld is not required.

About Our Session Host: Joel Sahleen

Joel Sahleen most recently led the Internationalization Engineering team at Spotify, and is the volunteer lead for the Unicode Education Initiative. Trained as a classical Chinese linguist and a cross-cultural ethical theorist, he has worked for the last decade and a half in the field of software internationalization and localization as an engineer, an architect, a team lead and a manager. A regular speaker at both internationalization and localization conferences, he is passionate about the technical, practical, and ethical aspects of developing software for a global audience.


If you have any questions, please contact us at events@unicode.org.