Tuesday, February 10, 2026

Unicode 18.0 Alpha review

 Unicode 18.0 Alpha Review Opens for Feedback

By: Peter Constable, Chair of the Unicode Technical Committee


The repertoire for Unicode Version 18.0 is now open for early review and comment. During alpha review, the repertoire is reasonably mature and stable but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be considered.


For the alpha review, preliminary data files are also available, with data covering both the existing and the new character repertoire. In addition, a draft of the core specification is available, with new block descriptions for some of the newly added blocks and scripts.


The primary focus for the alpha review should be on the new character repertoire. This early review is provided so that reviewers may consider the repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2026). Once beta review begins, the repertoire, code points, and character names will all be locked down and will no longer be subject to change.


Sample of representative glyphs for Seal script ideographs


Notable changes

The planned repertoire for Unicode 18.0 adds 13,048 new characters, which would bring the total to 172,849 characters.


The additions include four new scripts:


  • Small Seal (“Seal”): This comprises the largest portion of the new characters, with 11,328 ideographs. Seal is an important precursor to modern Han ideographs (aka, “CJK”), and has important cultural significance in China and for Chinese speakers throughout the world.

  • Chisoi: A modern script used in eastern India.

  • Jurchen: A historic ideographic script that was used in northeastern China during the Jin and Ming dynasties.

  • Proto-cuneiform: In this version, only numeric signs are included (other characters have been proposed for a future version).


Other additions include nine new emoji characters, 72 historical mathematical symbols, 323 Cuneiform numeric signs, and three new currency symbols for modern currencies:


  • Maldivian Rufiyaa

  • Omani Rial

  • UAE Dirham


Feedback for the alpha review should be reported under PRI #536 using the Unicode contact form by March 31, 2026.

----------------------------------------------

Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
πŸ•‰️πŸ’—πŸŽ️🐨πŸ”₯πŸš€ηˆ±₿♜πŸ€

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock


Tuesday, February 3, 2026

Highlights from UTC Meeting #186

By: Peter Constable, Chair of the Unicode Technical Committee

The Unicode Technical Committee (UTC) met January 21 – 23 in Sunnyvale, CA. Thanks to Unicode member organization Google for hosting. Here are some highlights.

Progress on Unicode 18.0

Version 18.0 of the Unicode Standard is being prepared for publication in September of this year. At meetings 184 and 185, UTC had approved 12,995 characters for encoding in version 18.0. At this meeting, some additional characters were approved for this version. One of these new characters is the Omani rial sign, a currency symbol recently created by the Omani Central Bank. Other additions include 51 mathematical symbols and 10 standardized variation sequences proposed by the PHILIUMM Project.

UTC authorized the Unicode 18.0 Alpha preview release, which will be available February 10 for public review.

Future additions

A typical step in the process for encoding new characters is provisional assignment of code points for characters that UTC has deemed eligible for encoding. This allows working groups to begin development of content — property data, code charts, text for the core spec — for a future version. At this meeting, code points were provisionally assigned for several characters including three new scripts: Leke script, used in SE Asia; and Mwangwego and Shaaldaa scripts, used in Eastern Africa.

New UTS on links

UTC approved a new Unicode Technical Standard, UTS #58 Unicode Link Detection and Serialization. This standard includes character data, and this first version includes data for characters in Unicode 17.0. Starting with Unicode 18.0, this will become a synchronized standard, with a new version released together with each new version of the Unicode Standard.

New joint working group for orthographic sequences

At UTC #185, the Government of India proposed that Unicode develop specifications for orthographically valid cluster sequences for Hindi and other language orthographies. (See L2/26-061.) Such work would overlap the scopes of both the Unicode Technical Committee and the CLDR Technical Committee: the specs would deal with character sequences in a manner similar to UAX #29, Unicode Text Segmentation, which is maintained by UTC; but each document would cover the orthography of a specific language, which puts it in the scope of CLDR-TC.

After UTC #185, Unicode leaders discussed options and proposed formation of a joint working group (JWG) between CLDR-TC and UTC. (See L2/26-045.) At this UTC meeting, this JWG was approved by UTC. It was similarly approved by CLDR-TC at one of their recent meetings. This new JWG will get organized and begin working during the next quarter.

Metadata embedded in “plain” text

It recently came to light that another organization, the Coalition for Content Provenance and Authenticity (C2PA), has developed a specification to embed AI-related metadata into “unstructured” (i.e., “plain”) text. (See L2/26-042.) This was motivated by the EU AI Act (AIA), which goes into enforcement in August of this year. Article 50 of the AIA obligates vendors to “mark” AI-generated content with machine-readable metadata so that the content is detectable as artificially generated. This requirement applies to text content as well as other content types. However, Article 50 doesn’t specify what would count as “marking” of text, nor does it distinguish between different text formats: does it apply to generated source code? SMS messages? File names? C2PA has taken a conservative approach, anticipating that the EU might enforce the requirement on any AI-generated text.

Unfortunately, the scheme added to the C2PA specification embeds sequences of Unicode variation selector characters in a manner that does not conform to the Unicode Standard.
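
To make the concern concrete: a variation sequence in the Unicode Standard is a base character followed by a single variation selector, so a run of several consecutive selectors is not an ordinary variation sequence. The Python sketch below flags such runs in a string. It is only an illustrative heuristic, not the C2PA encoding (whose details are not described here) and not a conformance test. The function name `find_selector_runs` is hypothetical, and the Mongolian free variation selectors are left out for brevity.

```python
import re

# Variation selectors covered by this sketch (an assumption, not a full list):
# VS1 to VS16 (U+FE00..U+FE0F) and VS17 to VS256 (U+E0100..U+E01EF).
_VS_RUN = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]{2,}")


def find_selector_runs(text: str) -> list[tuple[int, int]]:
    """Return (start, length) for each run of two or more consecutive
    variation selectors. Offsets are in code points, as Python str uses."""
    return [(m.start(), m.end() - m.start()) for m in _VS_RUN.finditer(text)]


if __name__ == "__main__":
    ordinary = "\u2764\uFE0F"  # a base character followed by one selector
    suspicious = "Hello" + "\uFE00\uFE01\U000E0100" + " world"
    print(find_selector_runs(ordinary))    # []
    print(find_selector_runs(suspicious))  # [(5, 3)]
```

A single selector after a base character is deliberately ignored, since that is the shape of a normal variation sequence; whether a particular base and selector pair is actually defined would need the Unicode data files, which this sketch does not consult.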

UTC discussed this situation together with a representative from C2PA. On the one hand, the discussion brought to light that the text of the Unicode Standard wasn’t sufficiently clear about conformance requirements in relation to variation sequences. On the other hand, UTC was clear that this scheme is non-conformant. Other concerns were mentioned, including that it is a contradiction in terms to say that “unstructured” text can contain metadata. An outcome of this discussion was to recommend that Unicode establish a liaison relationship with C2PA, and that the topic be discussed further between the two organizations.

For complete details on outcomes from UTC #186, see the draft minutes.



Friday, January 30, 2026

Unicode ICU 78.2 and CLDR 48.1 released

Postal Horn emoji


There are new maintenance releases of ICU and CLDR, with some small but significant changes. To find out more and to download these releases, go to: 

CLDR and ICU are planning an additional maintenance release in March instead of a major release.

The next major releases, CLDR 49 and ICU 79, are planned for October. They will include Unicode 18 as well as the data from the next CLDR general submission period, which is planned to start in early Q2 2026.
