Tuesday, February 3, 2026

Highlights from UTC Meeting #186

By: Peter Constable, Chair of the Unicode Technical Committee

The Unicode Technical Committee (UTC) met January 21 – 23 in Sunnyvale, CA. Thanks to Unicode member organization Google for hosting. Here are some highlights.

Progress on Unicode 18.0

Version 18.0 of the Unicode Standard is being prepared for publication in September of this year. At meetings 184 and 185, UTC had approved 12,995 characters for encoding in version 18.0. At this meeting, some additional characters were approved for this version. One of these new characters is the Omani rial sign, a currency symbol recently created by the Omani Central Bank. Other additions include 51 mathematical symbols and 10 standardized variation sequences proposed by the PHILIUMM Project.

UTC authorized the Unicode 18.0 Alpha preview release, which will be available February 10 for public review.

Future additions

A typical step in the process for encoding new characters is provisional assignment of code points for characters that UTC has deemed eligible for encoding. This allows working groups to begin development of content (property data, code charts, text for the core spec) for a future version. At this meeting, code points were provisionally assigned for several characters, including those of three new scripts: the Leke script, used in Southeast Asia, and the Mwangwego and Shaaldaa scripts, used in Eastern Africa.

New UTS on links

UTC approved a new Unicode Technical Standard, UTS #58 Unicode Link Detection and Serialization. This standard includes character data, and this first version includes data for characters in Unicode 17.0. Starting with Unicode 18.0, this will become a synchronized standard, with a new version released together with each new version of the Unicode Standard.

New joint working group for orthographic sequences

At UTC #185, the Government of India proposed that Unicode develop specifications for orthographically valid cluster sequences for Hindi and other language orthographies. (See L2/26-061.) Such work would overlap the scopes of both the Unicode Technical Committee and the CLDR Technical Committee: the specifications would deal with character sequences in a manner similar to UAX #29, Unicode Text Segmentation, which is maintained by UTC; but each document would be for the orthography of a specific language, which puts this in the scope of CLDR-TC.

After UTC #185, Unicode leaders discussed options and proposed formation of a joint working group (JWG) between CLDR-TC and UTC. (See L2/26-045.) At this UTC meeting, this JWG was approved by UTC. It was similarly approved by CLDR-TC at one of their recent meetings. This new JWG will get organized and begin working during the next quarter.

Metadata embedded in “plain” text

It recently came to light that another organization, C2PA, has developed a specification to embed AI-related metadata into “unstructured” (i.e., “plain”) text. (See L2/26-042.) This has been motivated by the EU AI Act (AIA), which goes into enforcement in August of this year. Article 50 of the AIA obligates vendors to “mark” AI-generated content with machine-readable metadata so that the content can be detected as artificially generated. This requirement applies to text content as well as other content types. However, Article 50 doesn’t specify what would count as “marking” of text, nor does it distinguish between different text formats: does it apply to generated source code? SMS messages? File names? C2PA has taken a conservative approach, anticipating that the EU might enforce the requirement on any AI-generated text.

Unfortunately, the scheme added to the C2PA specification embeds sequences of Unicode variation selector characters in a manner that does not conform to the Unicode Standard.
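
As a rough illustration of why such sequences stand out, recall that a variation sequence in the Unicode Standard is a base character followed by a single variation selector. The Python sketch below flags runs of two or more consecutive variation selectors, which cannot be part of any one variation sequence. The details of the C2PA encoding are not described in this post, so the sketch does not attempt to decode anything; the function name is hypothetical, and the Mongolian free variation selectors are left out for brevity.

```python
import re

# Variation selectors VS1-VS16 (U+FE00..U+FE0F) and VS17-VS256 (U+E0100..U+E01EF).
# Assumption for this sketch: Mongolian free variation selectors are ignored.
VS_RUN = re.compile("[\uFE00-\uFE0F\U000E0100-\U000E01EF]{2,}")

def find_variation_selector_runs(text):
    """Return (start_index, run_length) for each run of 2+ consecutive selectors."""
    return [(m.start(), len(m.group())) for m in VS_RUN.finditer(text)]

plain = "Hello, world"
emoji = "\u2764\uFE0F"  # one base character followed by one selector (VS16)
suspect = "Hello" + "\uFE00\uFE01\U000E0100" + ", world"

print(find_variation_selector_runs(plain))    # []
print(find_variation_selector_runs(emoji))    # []
print(find_variation_selector_runs(suspect))  # [(5, 3)]
```

A check like this only detects the pattern; it says nothing about what the selectors encode, and a single stray selector after an unsuitable base character would not be caught.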

UTC discussed this situation together with a representative from C2PA. The discussion brought to light that the text of the Unicode Standard wasn’t sufficiently clear about conformance requirements in relation to variation sequences. Even so, UTC was clear that this scheme is non-conformant. Other concerns were mentioned, including that it is a contradiction in terms to say that “unstructured” text can contain metadata. An outcome of this discussion was to recommend that Unicode establish a liaison relationship with C2PA, and that the topic be discussed further between the two organizations.

For complete details on outcomes from UTC #186, see the draft minutes.
