The Unicode Blog: Highlights from UTC #182

Friday, January 31, 2025

Highlights from UTC #182

By Peter Constable, UTC Chair

A huge thank you to Google for hosting Unicode Technical Committee (UTC) meeting #182 last week, January 22–24, in Sunnyvale, CA!

For complete details, see the draft UTC #182 minutes.

Unicode 17.0 alpha repertoire

UTC took technical decisions for the Unicode 17.0 alpha, which will be released for public review next week. No changes were made to the new character repertoire approved for Unicode 17.0 at UTC #181, but changes were made to some details for certain characters.

Name changes were approved for several new characters (three Arabic honorific characters and Tolong Siki letters).
For Tangut, the glyph and stroke count will be changed for one character, and the default UCA ordering for Tangut components will be changed.
Four variation sequences will be added for “Sibe” forms of quotation marks (U+2018, U+2019, U+201C, U+201D).
For CJK, the representative glyphs for 11 characters will be changed (one “G” source and ten “V” source). Also, 1,685 “G” source references will be updated. Various Unihan property value changes were also approved.

Data files

A significant change was approved for data files for The Unicode Standard and the other version-synchronized standards: UTS #10, UTS #39, UTS #46, and UTS #51. Up through Version 16.0, data files for The Unicode Standard have been published in version folders under the /Public/ folder on the website (e.g., https://www.unicode.org/Public/16.0.0/), while the data files for the synchronized UTSes have been in separate, UTS-specific folders, each with version subfolders (for example, https://www.unicode.org/Public/emoji/16.0/). Starting with Unicode 17.0, data files for the synchronized UTSes will also be published within a version subfolder under /Public/.

For example, instead of UCD and UTS #51 data files being organized as follows,

https://www.unicode.org/Public/17.0/ucd

https://www.unicode.org/Public/emoji/17.0/

they will be organized like this:

https://www.unicode.org/Public/17.0/ucd

https://www.unicode.org/Public/17.0/emoji/

This is close to what has been done in the Public/draft folder for pre-release data files. The organization in that folder will be adjusted for the Unicode 17.0 alpha to match what will be used for the release.
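For tooling that downloads these files, the layout change amounts to swapping the order of two path segments. Here is a minimal sketch of that mapping in Python, assuming the folder patterns shown in the examples above. The helper name `data_folder_url` and the `FIRST_UNIFIED_MAJOR` constant are made up for illustration, and the version string is passed through unchanged, so callers need to supply whatever folder name is actually published (for example "16.0.0" for the UCD versus "16.0" for emoji).

```python
BASE = "https://www.unicode.org/Public"

# Per the post, the unified layout starts with Unicode 17.0.
FIRST_UNIFIED_MAJOR = 17


def data_folder_url(component: str, version: str) -> str:
    """Return the data folder URL for a component such as "ucd" or "emoji".

    Hypothetical helper: the folder patterns are taken from the examples in
    the post, and the version string is used exactly as given.
    """
    major = int(version.split(".")[0])
    if component == "ucd" or major >= FIRST_UNIFIED_MAJOR:
        # UCD files, and all synchronized UTS files from 17.0 onward,
        # live under the version folder.
        return f"{BASE}/{version}/{component}/"
    # Older synchronized UTS files live under a UTS-specific folder.
    return f"{BASE}/{component}/{version}/"


print(data_folder_url("emoji", "16.0"))
print(data_folder_url("emoji", "17.0"))
```

The sketch always emits a trailing slash, which is a normalization choice of this example rather than something the post specifies.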

Property/data changes

Some significant property changes were approved for Unicode 17.0, including the following:

The Identifier_Type property defined in UTS #39 is used by some identifier systems to limit the set of valid identifiers. Through Version 16.0, all CJK ideographs have had a property value that makes them valid in such identifier systems. UTC #182 approved a change to the Identifier_Type value for a large number of CJK ideographs to make them invalid, matching what ICANN has done for IDNA root zone labels.

The Extended_Pictographic code point property was created to make segmentation behaviours defined in UAX #14 and UAX #29 forward compatible for future emoji characters. When it was created in Unicode 11.0, all unassigned code points in the range 1F000..1FFFD were given this property. When non-emoji characters are assigned in that range, they should not have that property, but UTC has not consistently removed the property from those code points. This will be corrected in Unicode 17.0.
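The Extended_Pictographic cleanup can be pictured as a simple audit: find code points in 1F000..1FFFD that are assigned to non-emoji characters yet still carry the property. The sketch below assumes property data in the usual semicolon-separated "range ; property # comment" style. The sample lines and the set of non-emoji code points are invented for illustration and are not copied from any released file, and the function names are hypothetical.

```python
RESERVED_FIRST, RESERVED_LAST = 0x1F000, 0x1FFFD  # range named in the post


def parse_property_ranges(lines, prop):
    """Collect (first, last) code point ranges that list the given property."""
    ranges = []
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing comments
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2 or fields[1] != prop:
            continue
        first, _, last = fields[0].partition("..")
        ranges.append((int(first, 16), int(last or first, 16)))
    return ranges


def has_property(cp, ranges):
    """True if the code point falls inside any of the ranges."""
    return any(first <= cp <= last for first, last in ranges)


def stale_code_points(non_emoji_cps, ext_pict_ranges):
    """Non-emoji code points in the reserved range that still have the property."""
    return sorted(
        cp for cp in non_emoji_cps
        if RESERVED_FIRST <= cp <= RESERVED_LAST and has_property(cp, ext_pict_ranges)
    )


# Illustrative lines only; hypothetical ranges, not real property data.
SAMPLE_LINES = [
    "# sample data in the semicolon-separated style",
    "1F000..1F0FF ; Extended_Pictographic # hypothetical range",
    "1F10D..1F10F ; Extended_Pictographic # hypothetical range",
    "1F600 ; Emoji # hypothetical single code point",
]

ranges = parse_property_ranges(SAMPLE_LINES, "Extended_Pictographic")
hypothetical_non_emoji = {0x1F0A0, 0x1F200, 0x2600}  # made-up input set
print([f"{cp:04X}" for cp in stale_code_points(hypothetical_non_emoji, ranges)])
```

With real data, the two inputs would come from the published property files for the version under review rather than from inline samples.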

UTC #182 also authorized a proposed draft for a possible new UAX #60 to document data for non-CJK ideographs based on L2/25-052; a public review issue for this will be posted soon. This would be analogous to UAX #38 but would apply to ideographic scripts such as Nüshu and Tangut.

Please review!

UTC invites feedback on the following proposed specs:

PRI #509, Proposed Draft UTS #58, Unicode Link Detection and Serialization
PRI #510, Proposed Draft UTR #59, East Asian Spacing

As mentioned above, Identifier_Type property values for CJK characters are being changed based on analysis provided by ICANN. Other documents submitted to UTC propose other Identifier_Type changes based on similar analysis. UTC invites review and feedback on these documents; see the following public review issue for details:

PRI #517, Review of Identifier_Type for existing characters
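
For readers reviewing the Identifier_Type proposals, it may help to see how a property change turns into a validity change. In UTS #39, the Identifier_Type values Recommended and Inclusion correspond to an Identifier_Status of Allowed, and the other values correspond to Restricted. The sketch below assumes an identifier system that accepts only Allowed characters. The lookup tables use placeholder characters with toy values, not real property data, and the function names are hypothetical.

```python
# Identifier_Type values that map to Identifier_Status=Allowed in UTS #39.
ALLOWED_TYPES = frozenset({"Recommended", "Inclusion"})


def is_allowed(types):
    """True if a character's set of Identifier_Type values is all Allowed."""
    return bool(types) and set(types) <= ALLOWED_TYPES


def identifier_is_valid(identifier, lookup):
    """Check every character against a caller-supplied type table.

    Characters missing from the table are treated as restricted, which is an
    assumption of this sketch.
    """
    return all(is_allowed(lookup.get(ch, ())) for ch in identifier)


# Toy tables: "A" and "B" stand in for arbitrary characters, and the values
# are invented to show a before/after change.
TOY_BEFORE = {"A": {"Recommended"}, "B": {"Recommended"}}
TOY_AFTER = {"A": {"Recommended"}, "B": {"Uncommon_Use"}}

print(identifier_is_valid("AB", TOY_BEFORE))
print(identifier_is_valid("AB", TOY_AFTER))
```

An identifier that was accepted under the first table is rejected under the second once any one of its characters moves to a restricted type, which is why changes of this kind are put out for public review.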
