Friday, March 8, 2024

Breaking the Cycle 🔗💥

by Jennifer Daniel

(This article was originally published on Jennifer’s Substack, January 17, 2023. Republished here with minor revision.)

Phoenix image
In the fall of 2022, the Unicode Technical Committee announced that the 2023 release of the Unicode Standard would be a “dot” release with limited character additions, with the next major release in 2024. This wasn’t without precedent: COVID slowed down the release of Unicode 14.0 in 2020, and the world seemed to survive 😉. Subcommittees were well prepared and adjusted accordingly, discussing what this meant for their respective areas of expertise.

For the Emoji Subcommittee (ESC), the group responsible for defining the rules, algorithms, and properties necessary to achieve interoperability between different platforms for those smiley faces that appear on your keyboard (shout out 😁🥰🥹🤔🫣🫡😵‍💫!), this delay presented an opportunity. Sure, we were so close to exhaling a sigh of relief (the intake period for Emoji 16.0 proposals had just closed). But upon learning we couldn’t ship any new code points until 2024, we turned our energy towards recommending new emoji based on existing ones. (These are called emoji ZWJ sequences: a combination of multiple emoji that displays as a single emoji, like 👩 + 🏽 + 🏭 = 🧑🏽‍🏭.)
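To make the idea concrete, here is a minimal Python sketch that builds the factory-worker example above from its parts and lists its code points. It assumes the recommended sequence for 🧑🏽‍🏭 is PERSON + MEDIUM SKIN TONE + ZWJ + FACTORY; the helper names `zwj_join` and `code_points` are illustrative only.

```python
# Build an emoji ZWJ sequence from existing emoji and inspect its code points.
ZWJ = "\u200D"                   # ZERO WIDTH JOINER
PERSON = "\U0001F9D1"            # 🧑
MEDIUM_SKIN_TONE = "\U0001F3FD"  # 🏽 (EMOJI MODIFIER FITZPATRICK TYPE-4)
FACTORY = "\U0001F3ED"           # 🏭


def zwj_join(*parts: str) -> str:
    """Join emoji (or emoji + modifier) parts with ZWJ characters."""
    return ZWJ.join(parts)


def code_points(s: str) -> list[str]:
    """Return the code points of s in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]


factory_worker = zwj_join(PERSON + MEDIUM_SKIN_TONE, FACTORY)

print(code_points(factory_worker))  # ['U+1F9D1', 'U+1F3FD', 'U+200D', 'U+1F3ED']
print(len(factory_worker))          # 4 code points, displayed as one emoji
```

On a platform that supports the sequence, printing `factory_worker` shows a single glyph even though the string holds four code points.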

When Less is More

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can "do it all". And yet, as the emoji ecosystem has matured, our keyboards have ballooned, and some emoji categories have already reached saturation or are about to. Reflecting on how emoji are actually used, the ESC has entered a new era in which the primary way for emoji to move forward is no longer simply adding more of them to the Unicode Standard. Instead, the ESC approves fewer and fewer emoji proposals every year.

But our work is not done. Not by a longshot. Language is fluid and doesn’t stand still. There is more to do! This “off-cycle” gives us a chance to address some long-standing major pain points using emoji. The first one that came to mind: skin-tone.

What is a family?

The encoding of multi-person, multi-tone support has matured over the years; however, the implementation can seem random to the average person. While it’s true that all people emoji have toned options (with the exception of characters where you can’t see skin, like 🤺), there are … misfits. Some two-person emoji offer tone support (🧑🏻‍❤️‍🧑🏿); others do not (👯). A few non-RGI emoji render with tone but with no affordance to change one of the two characters. (For example, 🤼🏾‍♂ renders with skin tone on Android but as gold on iOS. WHY. This is why we standardize these things, people.)
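For readers curious how one two-person emoji carries two independent tones, here is a hedged Python sketch that assembles the couple-with-heart example above. It assumes the recommended structure PERSON + tone, ZWJ, HEART + VS16, ZWJ, PERSON + tone; the dictionary keys and the function name `couple_with_heart` are our own illustrative choices.

```python
# Assemble a multi-person, multi-tone ZWJ sequence (couple with heart).
ZWJ = "\u200D"
PERSON = "\U0001F9D1"   # 🧑
HEART = "\u2764\uFE0F"  # ❤ + VARIATION SELECTOR-16 (emoji presentation)

# Emoji modifiers U+1F3FB..U+1F3FF (Fitzpatrick types 1-2 through 6).
SKIN_TONES = {
    "light": "\U0001F3FB",
    "medium-light": "\U0001F3FC",
    "medium": "\U0001F3FD",
    "medium-dark": "\U0001F3FE",
    "dark": "\U0001F3FF",
}


def couple_with_heart(left_tone: str, right_tone: str) -> str:
    """Return a couple-with-heart sequence with an independent tone per person."""
    left = PERSON + SKIN_TONES[left_tone]
    right = PERSON + SKIN_TONES[right_tone]
    return ZWJ.join([left, HEART, right])


seq = couple_with_heart("light", "dark")
print([f"U+{ord(c):04X}" for c in seq])
# ['U+1F9D1', 'U+1F3FB', 'U+200D', 'U+2764', 'U+FE0F', 'U+200D', 'U+1F9D1', 'U+1F3FF']
```

By contrast, 👯 is the single code point U+1F46F, with no recommended toned variants.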

And then ... there is the suite of family emoji (👨‍👦👨‍👦‍👦👨‍👧👨‍👧‍👦👨‍👧‍👧👩‍👦👩‍👦‍👦👩‍👧👩‍👧‍👦👩‍👧‍👧 👨‍👨‍👦👨‍👨‍👦‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👧‍👧👩‍👩‍👦👩‍👩‍👦‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👧‍👧👨‍👩‍👦👨‍👩‍👦‍👦👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👧‍👧👪). These characters include two people, three people, sometimes four, and none of them have any tone support (!). We seem to have a lot of family emoji and yet simultaneously not enough.

The 26 “family” emoji can be broken down into four groups:

Families image

Despite the Unicode Standard containing 26 “family” emoji, each one of these glyphs is overly prescriptive with regard to delivering on a visual representation of a family. The inclusion of many permutations of families was well intentioned. But we can’t list them all, and by listing some of the combinations, it calls attention to the ones that are excluded.

What even is a family? For some, family is the people you were raised with. Others have embraced friends as their chosen family. Some families have children, other families have pets. There are multi-generational families, multiracial families and of course many families are any combination of all of these characteristics and more.

Fortunately, we don’t need to add 7000 variants to your keyboards (even this would fall short of capturing the breadth of "family" as a concept). Instead, we can juxtapose individual emoji to capture a concept with a reasonable level of specificity, not unlike arranging letters into words to convey concepts 😉

Different families image
For emoji keyboards to advance toward more intuitive and personalized experiences, the Emoji Subcommittee is recommending a visual deprecation of the family emoji. This small set of emoji will be redesigned as part of a multi-phase effort to “complete the set” of toned variants for the remaining multi-person emoji. This of course raises the question: when there are as many families as there are people in the world, is there an effective way of conveying the concept of “family” without being overly prescriptive about what is and is not a family? Well, thankfully icons can do a lot of heavy lifting without requiring very much detail.

Family symbol image

When is an emoji running for the police or getting chased by them?

Another area the ESC is actively exploring is how the semantics of emoji sequences can differ when the writing direction changes. Some emoji have an implicit direction built into their semantics, and when a string is mirrored, their meaning may be unintentionally lost or changed.

Left to Right emoji image
Left to Right Emoji Sequence: Quickly running towards an “exciting” police chase

Right to Left emoji image
Right to Left Emoji Sequence: Running away from the coppers

What, if anything, can we do to help ensure that messages are meaningfully translated, be they tiny pictures or tiny letters? As part of Emoji 15.1, we’re proposing a small set of emoji with strong directionality, with an initial focus on people, that face the opposite direction. Soon you too can run towards or away from ... excitement.
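In the Emoji 15.1 data as eventually released, the facing-right variants are ZWJ sequences that append ➡️ (U+27A1 U+FE0F) to the base people emoji. A minimal Python sketch of that construction follows; the helper name `facing_right` is illustrative, not part of any library.

```python
# Build a "facing right" variant of a directional people emoji.
ZWJ = "\u200D"
RIGHTWARDS_ARROW = "\u27A1\uFE0F"  # ➡️ with emoji presentation selector
PERSON_RUNNING = "\U0001F3C3"      # 🏃 (base glyphs are typically drawn facing left)


def facing_right(base: str) -> str:
    """Append ZWJ + ➡️ to request the mirrored, right-facing variant."""
    return base + ZWJ + RIGHTWARDS_ARROW


runner_right = facing_right(PERSON_RUNNING)
print([f"U+{ord(c):04X}" for c in runner_right])
# ['U+1F3C3', 'U+200D', 'U+27A1', 'U+FE0F']
```

A renderer without support for the sequence would usually fall back to showing 🏃 followed by ➡️, which still hints at the intended direction.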

Emoji 15.1

Given that the intake cycle of emoji proposals for Unicode 16.0 ended last July, the Emoji Subcommittee has also decided to temporarily delay the intake of Unicode Version 17.0 proposals until April 2024. Fortunately, you won’t have to wait until then to get new emoji. (Note: I know it sounds like I’m talking about the past and future simultaneously ... the emoji lifecycle is looooong and as a result overlaps with multiple releases. Expect a future blog post about the Emoji 15.0 candidates landing early this year (Shout out goose, pink heart, and pushing hands). I’ve been holding off writing about this set until you can actually see them on your phones but given that we’re already talking about 2024 maybe it’s time I dust that blog post off).

Emoji 2023 timeline image

Anyway, the list of Emoji 15.1 recommendations for 2024 includes 578 emoji (most of them the directional candidates described above). The list also includes a few humble additions: a broken chain, a lime, a non-poisonous mushroom, a nodding face and a shaking face, and a phoenix bird. Each of these is a unique, valid ZWJ sequence of existing emoji, so while they look like atomic characters made of a single code point, they are composed of two or more code points.

Broken chain and other emoji image

Broken chain is the result of a 🔗💥 ZWJ sequence and carries a variety of meanings, such as freedom, breaking a cycle, or perhaps a broken URL ;-). Nodding face (🙂‍↕️) and shaking face (🙂‍↔️) use arrows to imply movement in a still image. Oh, and of course there is a phoenix rising from the ashes (🐦‍🔥), an ancient metaphor that captures the zeitgeist of today.
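To see that these additions are multi-code-point sequences rather than new characters, the Python sketch below spells out several of them. The table reflects the Emoji 15.1 data as we understand it (lime and brown mushroom pair an existing emoji with a colored square); treat it as illustrative rather than authoritative, and `describe` as a hypothetical helper.

```python
# Several Emoji 15.1 ZWJ sequences, spelled out code point by code point.
ZWJ = "\u200D"

SEQUENCES = {
    "phoenix": "\U0001F426" + ZWJ + "\U0001F525",                      # 🐦 + 🔥
    "lime": "\U0001F34B" + ZWJ + "\U0001F7E9",                         # 🍋 + 🟩
    "brown mushroom": "\U0001F344" + ZWJ + "\U0001F7EB",               # 🍄 + 🟫
    "head shaking vertically": "\U0001F642" + ZWJ + "\u2195\uFE0F",    # 🙂 + ↕️ (nodding)
    "head shaking horizontally": "\U0001F642" + ZWJ + "\u2194\uFE0F",  # 🙂 + ↔️ (shaking)
}


def describe(seq: str) -> str:
    """Format a sequence as space-separated U+XXXX code points."""
    return " ".join(f"U+{ord(c):04X}" for c in seq)


for name, seq in SEQUENCES.items():
    print(f"{name}: {len(seq)} code points: {describe(seq)}")
```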

The Unicode Technical Committee (UTC) will review the required documents at its first meeting of 2023 in January – and if these candidates move forward, you can expect an update from the UTC later this Spring and Summer.


Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock

Tuesday, March 5, 2024

Unicode CLDR v45 Alpha available for testing


The Unicode CLDR v45 Alpha is now available for integration testing. 

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)


Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.


The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.


CLDR 45 is a closed release with no submission period, focusing on just a few areas:

MessageFormat 2.0 Tech Preview

Software needs to construct messages that incorporate various pieces of information. The complexities of the world's languages make this challenging. The goal for MessageFormat 2.0 is to allow developers and translators to create natural-sounding, grammatically correct user interfaces that can appear in any language and support the needs of diverse cultures.


The new MessageFormat defines the data model, syntax, processing, and conformance requirements for the next generation of dynamic messages. It is intended for adoption by programming languages, software libraries, and software localization tooling. It enables the integration of internationalization APIs (such as date or number formats), and grammatical matching (such as plurals or genders). It is extensible, allowing software developers to create formatting or message selection logic that extends the core capabilities. Its data model provides a means of representing existing syntaxes, thus enabling gradual adoption by users of older formatting systems.
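To illustrate the kind of selection MessageFormat 2.0 standardizes (choosing a message variant by a grammatical feature such as plural category, then filling in placeholders), here is a deliberately tiny Python sketch. It is not MessageFormat 2.0 syntax or any library's API; `english_plural_category` and `format_message` are hypothetical, and the plural rule covers only English integers, where CLDR uses the categories “one” and “other”.

```python
# Hypothetical sketch of plural-based message selection (NOT the MF2 API).

def english_plural_category(n: int) -> str:
    """Simplified CLDR plural category for English integers."""
    return "one" if n == 1 else "other"


# Message variants keyed by plural category; "other" is the fallback,
# similar in spirit to a catch-all variant in MessageFormat 2.0.
VARIANTS = {
    "one": "You have {count} new message.",
    "other": "You have {count} new messages.",
}


def format_message(count: int) -> str:
    """Select a variant by plural category, then substitute the placeholder."""
    category = english_plural_category(count)
    template = VARIANTS.get(category, VARIANTS["other"])
    return template.format(count=count)


print(format_message(1))  # You have 1 new message.
print(format_message(5))  # You have 5 new messages.
```

A real implementation would take plural rules per locale from CLDR (Arabic and Polish, for instance, use more categories than English) rather than hard-coding them.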



Keyboard 3.0 stable version

Keyboard support for digitally disadvantaged languages is often lacking or inconsistent between platforms. The updated LDML Keyboard 3.0 format specifies an interchange format for keyboard data. This will allow keyboard authors to create a single mapping file for their language, which implementations can use to provide that language’s keyboard mapping on their own platform. This format allows both physical and virtual (that is, on-screen or touch) keyboard layouts for a language to be defined in a single file.



Tooling changes

Many tooling changes, including performance work and UI improvements, are difficult to accommodate in a data-submission release. The changes in v45 provide faster turn-around for linguists and higher data quality. They are targeted at the v46 submission period, starting in May 2024.

For more information

See the draft CLDR v45 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.




Tuesday, February 6, 2024

Unicode 16.0 Alpha Review Opens for Feedback

The repertoire for Unicode Version 16.0 is now open for early review and comment until April 2. As a reminder, during alpha review the repertoire is reasonably mature and stable, but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be entertained.

This early review is provided so that reviewers may consider the character repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2024). Once beta review begins, the repertoire, code points, and character names will all be locked down and will no longer be subject to change.

Notable Changes

Unicode Version 16.0 adds 5,187 new characters, bringing the total number of characters to 155,000. The most significant addition for this release is 3,995 additional Egyptian Hieroglyph characters. There are also seven new scripts and many new symbols. See The Pipeline and the delta code charts for details.
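As a quick arithmetic check of these totals: 5,187 is the 1,192 characters UTC had approved before its January meeting plus the 3,995 Egyptian Hieroglyphs approved at that meeting, and adding it to Unicode 15.1's total (149,813 characters, a published figure not stated in this post) gives the round 155,000.

```python
# Reconcile the Unicode 16.0 alpha character counts.
previously_approved = 1_192    # approved before UTC #178
egyptian_hieroglyphs = 3_995   # approved at UTC #178
unicode_15_1_total = 149_813   # assumption: published Unicode 15.1 total, not in this post

new_in_16_0 = previously_approved + egyptian_hieroglyphs
total_16_0 = unicode_15_1_total + new_in_16_0

print(new_in_16_0)  # 5187
print(total_16_0)   # 155000
```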


In addition, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references will be added for over 36,000 CJK unified ideographs. This will be reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column.


Unicode Emoji 16.0 will include eight new emoji—see PRI #498 Unicode Emoji 16.0 Alpha Candidates.


Some of the new scripts in Unicode 16.0 (Kirat Rai, Tulu-Tigalari, Gurung Khema) include characters that have normalization behavior not seen in earlier versions, which could affect optimized implementations of Unicode normalization, and implementations using “quick check” properties. The relevant data files are available as part of the Unicode 16.0 alpha to allow early review.
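Python's standard `unicodedata` module (with `is_normalized`, available since Python 3.8) is a convenient way to experiment with normalization and quick-check style APIs. A minimal sketch follows. It deliberately uses a long-standing Latin example because the module's bundled data (`unicodedata.unidata_version`) may predate Unicode 16.0, so it says nothing about the new Kirat Rai, Tulu-Tigalari, or Gurung Khema behavior.

```python
import unicodedata

# The module's Unicode data version depends on the Python release.
print(unicodedata.unidata_version)

decomposed = "e\u0301"  # e + COMBINING ACUTE ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(composed == "\u00e9")                                  # True: NFC composes to é
print(unicodedata.is_normalized("NFC", decomposed))          # False
print(unicodedata.is_normalized("NFC", composed))            # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Optimized implementations can often answer “is this already normalized?” from precomputed quick-check properties without fully normalizing, which is why characters with unprecedented normalization behavior can break assumptions baked into such shortcuts.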


Feedback for the alpha review should be reported under PRI #497 using the Unicode contact form by April 2, 2024.






Monday, February 5, 2024

Highlights from UTC Meeting #178

Unicode Technical Committee (UTC) meeting #178 was held January 23 to 25 in Sunnyvale, California. Many thanks to Google for hosting. Here are some highlights from the meeting.

 

Preparing Unicode 16.0 Alpha

UTC made final decisions regarding the draft character repertoire for Unicode 16.0 and approved the alpha release. The alpha will be available for public review on February 6th.
 
UTC had previously approved 1,192 characters for Unicode 16.0, but also anticipated inclusion of a large set of Egyptian Hieroglyph extensions. Those were approved at this meeting — 3,995 additional characters, bringing the number of new characters for Unicode 16.0 to 5,187. See the Pipeline page for all characters currently approved for Unicode 16.0, along with code points provisionally assigned for future encoding.
 
There was some discussion about certain characters being added in Unicode 16.0 for new scripts (Kirat Rai, Tulu-Tigalari, Gurung Khema), because their normalization behaviour had not been seen before; it affected normalization optimizations in ICU and could affect other normalization implementations. This raised the question of whether to revisit the encoding model for those scripts, or to keep the encoding that UTC had already accepted and make adjustments in ICU. For various reasons, it was decided to do the latter. For more info, see section F.1 in L2/24-009.
 
UTC also approved a new data file to be added to the UCD: DoNotEmit.txt will capture, in machine-readable form, information already included in various chapters of the core spec regarding characters or sequences of characters that could occur in data but, in fact, should not be used. For example, certain sequences of Devanagari characters could appear visually identical to a Devanagari letter without being canonically equivalent to it, and should not be used. See section 19 of L2/24-013 for more information.
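To illustrate why such a file is useful, the Python sketch below shows that NFC normalization does not merge a discouraged Devanagari sequence with the letter it resembles: the core specification advises against spelling आ (U+0906) as अ + ा (U+0905 U+093E). The lookup table and `prefer_encoded_letters` are hypothetical and hand-written; they only mimic the idea of DoNotEmit.txt, not its actual format or contents.

```python
import unicodedata

# Hypothetical, hand-written table in the spirit of DoNotEmit.txt:
# discouraged sequence -> preferred encoding. Not the real file format.
DISCOURAGED_TO_PREFERRED = {
    "\u0905\u093E": "\u0906",  # LETTER A + VOWEL SIGN AA -> LETTER AA
}


def prefer_encoded_letters(text: str) -> str:
    """Replace each discouraged sequence with its preferred form."""
    for discouraged, preferred in DISCOURAGED_TO_PREFERRED.items():
        text = text.replace(discouraged, preferred)
    return text


s = "\u0905\u093E"
# NFC leaves the sequence alone: U+0906 has no canonical decomposition.
print(unicodedata.normalize("NFC", s) == "\u0906")  # False
print(prefer_encoded_letters(s) == "\u0906")        # True
```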
 
Future of UAX #42, UCDXML
Because the people who previously maintained UCDXML were no longer able to continue that work, UTC #177 decided on a plan to stabilize UCDXML at version 15.1. However, public review feedback showed that several projects continue to depend on UCDXML. Seeing that, John Wilcock of Microsoft volunteered to take over maintenance of UCDXML. Thus, UCDXML will be updated for Unicode 16.0 with the latest character repertoire and properties, and will continue to be maintained for future versions, as long as John is available to do that.
 
Text Terminal Working Group
The Text Terminal Working Group was created by UTC in April 2023 to develop specifications for supporting Unicode in text terminal environments. After a few months, however, the chair of the group no longer had time available to chair the group. During last week’s UTC meeting, a new chair was nominated and has since been confirmed by Unicode officers: Fraser Gordon. Fraser’s work involving Unicode began many years ago, extending LiveCode to support Unicode. He is currently also active in the C++ standards committee’s Unicode Study Group (ISO/IEC JTC 1/SC 22/SG 16).
 
Full details on these and other outcomes are provided in the minutes—see L2/24-006.







Wednesday, January 31, 2024

NEW Event on February 20 – Virtual Open House on MessageFormat

 Registration is Now Open!

MessageFormat is a critical API for anyone interested in building fluent, accessible, and well-localized applications. Any part of the user interface that displays data or varies dynamically at runtime needs to provide for the formatting requirements of the locale and the grammatical needs of the user’s language. As such, MessageFormat is “table stakes” for internationalizing applications.

The MessageFormat Working Group is a part of the CLDR Technical Committee of Unicode. After several years of work, they have produced a Technical Preview for MessageFormat 2.0, a next generation specification designed to address critical gaps in current formatting solutions, provide access to new internationalization APIs rooted in CLDR data, and build a syntax that is portable across many programming languages and runtime environments.


Now that the specification is close to being stabilized, the MessageFormat Working Group would like to engage with interested members of the internationalization, developer, localization, and translation communities.


Who: If you are a platform, framework, or programming language developer, localization manager, engineer, or translator, you will want to join us for this virtual Open House event to hear more about the progress achieved, and to bring your questions to the people involved.


When: Tuesday, February 20, 2024 starting at 9am (San Francisco), 12pm (New York), and 6pm (Berlin).


Register Now! Please note this session will be recorded and available via the Unicode YouTube channel.

Getting Started with Message Formatting


MessageFormat GitHub Repo

Why MessageFormat v2?

Goals & Deliverables

Draft Specification and Syntax

UTW MessageFormat v2 (Video)





About the Unicode Consortium

The Unicode Consortium is the premier non-profit open source, open standards body for the internationalization of all software and services. 


For more than 30 years, the Unicode Consortium has coordinated the efforts of a worldwide team of volunteer programmers and linguists to standardize, evolve, and maintain a global software foundation that allows virtually every computer system and service to help people connect using their native language. 


For additional information about Unicode, visit home.unicode.org.





Wednesday, January 17, 2024

Unicode Welcomes New Board Members!

Giammarresi (left) and Chilana (right)
The Unicode Consortium is pleased to welcome Salvatore “Salvo” Giammarresi of Airbnb and Kulpreet Chilana of Apple to its Board of Directors effective this month.

At its annual member meeting last November, the representatives of Unicode’s Full Level members unanimously elected Salvo and Kulpreet and renewed the terms of Brent Getlin (Adobe) and Teresa Marshall (Salesforce).

Salvo is the Head of Localization at Airbnb and a Board Member at Clear Global (formerly known as Translators without Borders). Previously he held global leadership roles at several technology companies including PayPal and Yahoo. He is a published author and speaker on numerous topics, including localization, internationalization, global program management, and international product management.

Kulpreet has worked in software localization at Apple for 8 years and has more than 12 years of experience in the localization and internationalization industry. He is passionate about using all parts of the software stack to preserve the richness of human culture. He currently manages a team of software engineers that evangelize localization across Apple’s platforms, build features for Apple’s international users and own the localization infrastructure in Xcode.

“We’re excited to have Salvo and Kulpreet join us — they both bring extensive experience in localization and internationalization to the Unicode board,” said Mark Davis, Unicode’s board chair and co-founder, “as well as providing different perspectives on technology and priorities. Speaking for the board, I’d also like to thank David Singer, who has retired from the board after 6 years. Aside from many other contributions, David has helped immensely with pivotal transitions in governance.”

Further information on the Unicode Board can be found here.






Wednesday, November 15, 2023

Looking to give back differently for this #GivingTuesday?

[image of 3 badges]
Adopt a Character or Emoji to give it the attention it deserves!

Now you can adopt a character and show off your hobby or business, favorite sport, or love. For that special someone who seems to have everything, you can also give a unique gift.

Allergies? 🤧 Traveling? ✈️ No worries, the cat emoji 😺 has no fur and requires no feeding! The dog emoji 🐶? No need to go out for a 3 am walk! Looking to be a Scrabble champ? The strong and fast letter Z is right for you!

Your good friend is studying to be a doctor. How about the stethoscope emoji 🩺 as a gift? Or even an emoji to support your favorite college football team this season! 🏈

With nearly 150,000 characters there's something for everyone! The possibilities are endless! It's also a tax-deductible donation in the United States, to the extent allowed by law. Your company may also provide matching funds.

☯🏏 🏈 ⚽ 🔥爱戀🥳 🙌 🎂💗💟₨ ₪ € ₭ ₱🥰 😍♕Ωπ

About Adopt-a-Character

The Adopt-a-Character program was launched in 2015 to support Unicode's mission to ensure everyone can communicate in their own languages. Adopt-a-Character funds have supported work on historic scripts, including Old Uyghur, Old Sogdian, Sogdian, Seal Script (China), Mayan Hieroglyphs, and Egyptian Hieroglyphs. Additional support has been provided to encode the modern scripts Hanifi Rohingya, Tolong Siki, and Sunuwar, among others.

Characters can be adopted at three levels:

Gold - $5,000
For any particular character there can only be one Gold adoption! Be the only!

Silver - $1,000
For any particular character there can only be five Silver adoptions! Be one of the five to adopt your favorite characters as a Silver adopter!

Bronze - $100
For any character, there are an unlimited number of Bronze-level adoptions! Also a wonderful option!

Each adoption is recognized with a digital badge that you (or your recipient!) can proudly share via your social channels and via websites. Adoptions also come with a digital certificate that you can print to display or email to your giftee!

About the Unicode Consortium

The Unicode Consortium is the premier 501(c)3 non-profit, open source, open standards body for the internationalization of software and services. It is arguably the most widely deployed software in the world, available across 20 billion devices and counting! At its core, Unicode enables people around the world to communicate in any language.

And if you simply want to make a donation to support Unicode’s work, you can do that, too!

This Giving Tuesday, let's come together to continue to celebrate and preserve linguistic diversity. Adopt a character and make a difference!

Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.


Monday, November 13, 2023

UTC #177 Highlights

by Peter Constable, UTC Chair

Unicode Technical Committee (UTC) meeting #177 was held November 1 to 3 in Cupertino, California, hosted by Apple. Here are some highlights from the meeting.

Starting the Unicode 16.0 cycle

UTC approved a plan and timeline for the Unicode 16.0 release. Here’s a summary of the timeline:
  • January 2024: UTC #178 will finalize content for the alpha release
  • February – March: alpha release for public review
  • April: UTC #179 will finalize content for the beta release
  • May – June: beta release for public review
  • July: UTC #180 will finalize 16.0 content
  • September: Unicode 16.0 release
UTC is still adjusting to changes in how work for each release is managed. So, while this will be a “full” release, UTC will be conservative about taking on too many changes, particularly to algorithm specifications (UAXes, UTSes). Also, a new format for the core text will be used in this release: instead of PDF, it will be published using Web technologies (HTML, etc.). To get early validation on format changes, the alpha release will include a sampling of content from the core text.

Unicode 16.0 character and emoji repertoire

UTC had previously approved 1,179 characters for encoding in Unicode 16.0. At this UTC meeting, 15 additional characters were approved for version 16.0, including seven emoji characters. UTC has been planning to include nearly 4,000 additional Egyptian Hieroglyphs in Unicode 16.0. The proposal was discussed, and a small revision was requested. It’s expected these will be approved for Unicode 16.0 at the next UTC meeting. Apart from the additional hieroglyphs, we expect no further characters will be added to the Unicode 16.0 repertoire.

Besides the characters approved for Unicode 16.0, code points were provisionally assigned for 184 new characters that are candidates for encoding in a future Unicode version.

See the Pipeline page for all characters currently approved for Unicode 16.0, along with code points provisionally assigned for future encoding.

Future of UAX #42, UCD in XML

UAX #42, Unicode Character Database in XML (UCDXML), was originally developed by Eric Muller. He and Laurentiu Iancu maintained UCDXML through many versions, and we’re very grateful for this contribution. Eric and Laurentiu are no longer available to maintain this, however, and no others have volunteered to take over maintenance. After discussion over several months in UTC and in the Properties and Algorithms working group, UTC has concluded the best option for the future of UAX #42 is to stabilize it, with data frozen at Unicode 15.1. A Public Review Issue will be posted to get feedback on this plan.
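For readers unfamiliar with UCDXML, the sketch below parses a tiny, hand-abbreviated fragment in its style using Python's standard `xml.etree.ElementTree`. It assumes the UAX #42 namespace `http://www.unicode.org/ns/2003/ucd/1.0` and the `cp`, `na`, and `gc` attributes on `<char>` elements; real files carry many more attributes and elements, and `parse_chars` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

UCD_NS = {"u": "http://www.unicode.org/ns/2003/ucd/1.0"}

# Hand-abbreviated sample; real UCDXML <char> elements carry many more properties.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <char cp="0041" na="LATIN CAPITAL LETTER A" gc="Lu"/>
    <char cp="0905" na="DEVANAGARI LETTER A" gc="Lo"/>
  </repertoire>
</ucd>"""


def parse_chars(xml_text: str) -> list[tuple[str, str, str]]:
    """Return (code point, name, General_Category) for each <char> element."""
    root = ET.fromstring(xml_text)
    rows = []
    for ch in root.iterfind("u:repertoire/u:char", UCD_NS):
        cp = int(ch.get("cp"), 16)
        rows.append((f"U+{cp:04X}", ch.get("na"), ch.get("gc")))
    return rows


for row in parse_chars(SAMPLE):
    print(row)
# ('U+0041', 'LATIN CAPITAL LETTER A', 'Lu')
# ('U+0905', 'DEVANAGARI LETTER A', 'Lo')
```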

Future maintenance of UCS repertoire

UTC discussed a proposal for ISO/IEC JTC 1/SC 2 to adopt a different process for future maintenance of the repertoire of ISO/IEC 10646, using a maintenance agency rather than the process used in the past, which is designed for developing entirely new standards. It was felt that this would be more agile and would align better with how expert input has guided actual encoding decisions for several years now. This proposal will be formally submitted to JTC 1/SC 2 as a proposal from the US national standards body.

Full details on these and other outcomes are provided in the minutes—see L2/23-231.


