Thursday, February 23, 2023

The Unicode CLDR v43 Alpha is now available for integration testing

[image] CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

The Alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.

Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Data may change if release-blocking bugs are found. The planned schedule is:
  • 2023 Mar 15, Wed — public Beta (data)
  • 2023 Mar 29, Wed — public Beta2 (data & spec)
  • 2023 Apr 12, Wed — Release
CLDR 43 is a limited-submission release, focusing on just a few areas:
  1. Formatting Person Names
    • Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
  2. Adding substantially to the LikelySubtags data
    • This is used to find the likely writing system and country for a given language, used in normalizing locale identifiers and inheritance.
    • The data has been contributed by SIL.
  3. Other data updates
    • Alternate names for Turkey / Türkiye
    • Name for the new timezone Ciudad Juárez
  4. Structure
    • Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
    • Cleanup of the inheritance structure in CLDR
  5. Collation & Searching
    • Treat various quote marks as equivalent at a Primary strength, also including Geresh and Gershayim.
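
As a rough illustration of how LikelySubtags data is used to fill in the likely writing system and country, here is a minimal Python sketch. The table is a tiny hand-copied excerpt for illustration only (the real data ships with CLDR), the function name `add_likely_subtags` is hypothetical, and the lookup order is a simplification that omits the canonicalization steps a real implementation would perform.

```python
from typing import Optional, Tuple

# Illustrative excerpt only; the full table is part of the CLDR data.
LIKELY_SUBTAGS = {
    "en": ("en", "Latn", "US"),
    "zh": ("zh", "Hans", "CN"),
    "zh_TW": ("zh", "Hant", "TW"),
    "sr": ("sr", "Cyrl", "RS"),
    "und_Cyrl": ("ru", "Cyrl", "RU"),
}


def add_likely_subtags(
    language: str,
    script: Optional[str] = None,
    region: Optional[str] = None,
) -> Optional[Tuple[str, str, str]]:
    """Fill in missing script/region (hypothetical helper, simplified)."""
    # Try the most specific key first, then fall back to less specific ones.
    candidates = []
    if script and region:
        candidates.append(f"{language}_{script}_{region}")
    if script:
        candidates.append(f"{language}_{script}")
    if region:
        candidates.append(f"{language}_{region}")
    candidates.append(language)

    for key in candidates:
        match = LIKELY_SUBTAGS.get(key)
        if match is not None:
            m_lang, m_script, m_region = match
            # Keep whatever the caller supplied; only fill in the gaps.
            return (
                m_lang if language == "und" else language,
                script or m_script,
                region or m_region,
            )
    return None  # No entry in this small excerpt.
```

With this excerpt, `add_likely_subtags("zh")` yields `("zh", "Hans", "CN")`, while `add_likely_subtags("zh", region="TW")` yields `("zh", "Hant", "TW")`, which shows why the region can change the likely script.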

To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.

Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.


Tuesday, February 7, 2023

Unicode 15.1 Alpha Review Opens for Feedback

[image] The repertoire for Unicode 15.1 is now open for early review and comment. As a reminder, during alpha review the repertoire is reasonably mature and stable, but is not yet completely locked down. Discussion regarding whether certain characters should be removed from the repertoire for publication is welcome. Character names and code point assignments are reasonably firm, but suggestions for improvement may still be entertained.

This early review is provided so that reviewers may consider the character repertoire and data file issues prior to the start of beta review (currently scheduled to start in May 2023). Once beta review begins, the repertoire, code points, and character names will all be locked down, and no longer be subject to changes.

Notable Changes

Unicode 15.1 adds exactly five characters, for a total of 149,191 characters. The five new characters are Ideographic Description Characters that are used in Ideographic Description Sequences, which represent a mechanism to visually describe the structure of ideographs.

In addition, the code charts for the CJK Unified Ideographs, CJK Unified Ideographs Extension A, and CJK Unified Ideographs Extension B blocks now include representative glyphs and source references for nearly 24,000 KP-source ideographs. Furthermore, the format of the code charts for the CJK Unified Ideographs block was updated to accommodate KP-source ideographs through the addition of a seventh column.

Version 15.1 does not add new emoji characters; however, 118 new RGI emoji ZWJ sequences will be defined.

Feedback for the alpha review should be reported under PRI #473 using the Unicode contact form by April 4, 2023.



Monday, February 6, 2023

Announcing New Unicode Adopt-a-Character Site

The Adopt-a-Character program was launched in 2015. Since that time, AAC funds have supported Unicode's mission to ensure everyone can communicate in their own language. This includes preserving historical scripts such as Egyptian hieroglyphics and providing better language support for digitally disadvantaged and under-resourced languages such as Hanifi Rohingya used in Myanmar and Bangladesh.

Now you can more easily adopt a character and show off your hobby or business, favorite sport, or love – while also supporting a good cause. You can also give the gift of a letter to someone in your life. The possibilities are endless – and each adoption helps Unicode’s goal to support the world’s languages.

All character adoptions are permanent. Adoption of a specific character at the limited gold and silver levels is on a first-come-first-served basis. All sponsors receive a digital badge and are recognized on Unicode’s website, Twitter feed, and Friends of Unicode Facebook page.

To start your adoption, visit our new page!

Unicode, Inc. is a non-profit, 501(c)3 organization and contributions may be eligible for a tax deduction. Please consult with a tax expert for details.


Monday, January 23, 2023

New Unicode Consortium CEO

— Mark Davis, President & Unicode Cofounder

In January 1991 I became the first president of the Unicode Consortium, and in that position have presided over the board of directors since then. I’ve had the honor of occupying those roles for just over a gigasecond now, and it's time for a change.

Over time, it became apparent to me, the Consortium’s other officers, and the Board of Directors that our management model was no longer sufficient for what the organization had become, and what it needed to be in the future. So, we began to explore a new, more sustainable governance and management model. And an important part of that was succession planning.

Among the first major steps in implementing this model was the hiring of Toral Cowieson as our first Executive Director and COO in 2021. Since then, Toral has helped professionalize the management of the Consortium. Working with the Board and the other Officers, Toral has also contributed to strengthening the Consortium’s governance.

The Board and I have also recognized that, as President, I have effectively occupied two distinct roles — CEO and CTO — and that these two different roles require the full attention of two different people. Accordingly, the Board has decided to split these two roles, formally creating the positions of CEO and CTO, while retiring the title of President.

And as its next step — I am delighted to announce — the Board has elected Toral Cowieson as CEO to replace me.

Toral has brought a wealth of experience in leadership across non-profits, corporations, and board service to Unicode. As executive director, she has connected with the people in the organization, provided thoughtful leadership, and instituted and guided changes in our operations and governance.

I’m not stepping off the stage completely. The Board has re-elected me as Chair of the Board, and elected me to the new position of CTO. I’ll also be continuing as chair of the CLDR technical committee as well as contributing to ICU and the UTC in focused areas.

The Unicode Consortium is the forum for companies, countries and other groups to work together on interoperable standards, code, and data — to support internationalizing software around the world. As a simple example, whenever you glance at the date on your cell phone, the text you see is Unicode characters, is formatted for your language according to CLDR language data (including for English), and uses ICU code libraries to make that all work.

As CTO, my main goal this year will be to work with the board, technical groups, and invited experts to continue maintaining and extending that foundation for so much of the world’s software, while formulating a strategy for meeting upcoming requirements and taking advantage of new technologies.

In addition, I am also pleased to announce some additional changes. I’ve worked extensively with each of these people, and have the fullest confidence that they will do great work in these new roles.

  • Peter Constable is a Technical Vice President and the Chair of the UTC. Since 2003, Peter has worked for Microsoft on various projects related to Unicode, internationalization, text display and fonts. He became a Unicode technical director in 2008 and later served as Treasurer.
  • Addison Phillips is the new Chair of the Message Formatting Working Group. Addison is also the chair of the W3C Internationalization Working Group and an active participant in the creation of internationalization standards such as Unicode. He and I are co-authors of IETF BCP 47, which is the standard for language and locale identifiers.
  • Elango Cheran is the Vice-Chair of the recently formed Community Engagement team and an internationalization engineer at Google. He actively contributes to the ICU and ICU4X projects, and to the MessageFormat Working Group.

Additional information is available here:
  • Unicode Executive Officers
  • Unicode Fellows, Staff and Support
  • Unicode Technical Committee Chairs
  • Unicode Organization Chart

Photo by Michael Dziedzic on Unsplash



Tuesday, January 17, 2023

What’s New in Emoji 15.1?

Doing more, with less

By: Jennifer Daniel, Chair of the Emoji Subcommittee

[image phoenix]

This past Fall, the Unicode Technical Committee announced the delay of Unicode 16.0. This wasn’t without precedent: COVID slowed down the release of Unicode 14.0 in 2020 and the world seemed to survive 😉. Subcommittees were well prepared and adjusted accordingly, discussing what this meant for their respective areas of expertise.

For the Emoji Subcommittee (ESC), the group responsible for defining the rules, algorithms, and properties necessary to achieve interoperability between different platforms for those smiley faces that appear on your keyboard (Shout out 😁🥰🥹🤔🫣🫡😵‍💫!), this delay presented an opportunity. Sure, we were so close to exhaling a sigh of relief (the intake period for Emoji 16.0 proposals had just completed). But upon learning we couldn’t ship any new codepoints until 2024 we turned our energy towards recommending new emoji based on existing ones. (These are called emoji ZWJ sequences. That's when a combination of multiple emoji display as a single emoji … like 👩 🏽 +🏭 = 🧑🏽‍🏭).
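
To make the idea of a ZWJ sequence concrete, here is a small Python sketch that builds the factory-worker example from its parts. The helper names `join_with_zwj` and `code_points` are made up for this illustration; the only assumption about the encoding is that the parts are joined with U+200D ZERO WIDTH JOINER and that the skin-tone modifier directly follows the person it modifies.

```python
ZWJ = "\u200D"                    # ZERO WIDTH JOINER
PERSON = "\U0001F9D1"             # 🧑
MEDIUM_SKIN_TONE = "\U0001F3FD"   # 🏽 (emoji modifier)
FACTORY = "\U0001F3ED"            # 🏭


def join_with_zwj(*parts: str) -> str:
    """Glue emoji parts together with ZWJ (hypothetical helper)."""
    return ZWJ.join(parts)


def code_points(text: str) -> list:
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


# The modifier attaches to the person; ZWJ then joins person and factory.
factory_worker = join_with_zwj(PERSON + MEDIUM_SKIN_TONE, FACTORY)

print(factory_worker)
print(code_points(factory_worker))
```

On a platform that supports the sequence, the four code points display as one glyph; elsewhere they fall back to the individual emoji shown side by side.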

When Less is More

An incredibly powerful aspect of written language is that it consists of a finite number of characters that can "do it all". And yet, as the emoji ecosystem has matured over time our keyboards have ballooned and emoji categories are about to hit or have hit a level of saturation. Upon reflecting on how emoji are used, the ESC has entered a new era where the primary way for emoji to move forward is not merely to add more of them to the Unicode Standard. Instead, the ESC approves fewer and fewer emoji proposals every year.

But our work is not done. Not by a longshot. Language is fluid and doesn’t stand still. There is more to do! This “off-cycle” gives us a chance to address some long-standing major pain points using emoji. The first one that came to mind: skin-tone.

What is a family?

The encoding of multi-person multi-tone support has matured over the years; however, the implementation can seem random to the average person. While it’s true that all people emoji have toned options (with the exception of characters where you can’t see skin, like 🤺), there are … misfits. Some two-person emoji offer tone support (🧑🏻‍❤️‍🧑🏿); others do not (👯). A few non-RGI emoji render with tone but with no affordance to change one of the two characters (for example, 🤼🏾‍♂).

And then … There is the suite of family emoji (👨‍👦👨‍👦‍👦👨‍👧👨‍👧‍👦👨‍👧‍👧👩‍👦👩‍👦‍👦👩‍👧👩‍👧‍👦👩‍👧‍👧 👨‍👨‍👦👨‍👨‍👦‍👦👨‍👨‍👧👨‍👨‍👧‍👦👨‍👨‍👧‍👧👩‍👩‍👦👩‍👩‍👦‍👦👩‍👩‍👧👩‍👩‍👧‍👦👩‍👩‍👧‍👧👨‍👩‍👦👨‍👩‍👦‍👦👨‍👩‍👧👨‍👩‍👧‍👦👨‍👩‍👧‍👧👪). These characters include two people, three people, sometimes four and none of them have any tone support (!). We seem to have a lot of family emoji and yet simultaneously not enough.

The 26 “family” emoji can be broken down into four groups:

[image families]

Despite the Unicode Standard containing 26 “family” emoji, each one of these glyphs is overly prescriptive with regard to delivering on a visual representation of a family. The inclusion of many permutations of families was well intentioned. But we can’t list them all, and by listing some of the combinations, it calls attention to the ones that are excluded.

What even is a family? For some, family is the people you were raised with. Others have embraced friends as their chosen family. Some families have children, other families have pets. There are multi-generational families, multi-racial families and of course many families are any combination of all of these characteristics and more.

Fortunately, we don’t need to add 7000 variants to your keyboards (even this would fall short of capturing the breadth of "family" as a concept). Instead we can juxtapose individual emoji together to capture a concept with some reasonable level of specificity, not too unlike arranging letters together to create words to convey concepts 😉

[image toned families]

For emoji keyboards to advance in creating more intuitive and personalized experiences, the Emoji Subcommittee is recommending a visual deprecation of the family emoji. This small set of emoji will be redesigned as part of a multi-phase effort to “complete the set” of toned variants for the remaining multi-person emoji. This of course raises the question: when there are as many families as there are people in the world, is there an effective way of conveying the concept of “family” without being overly prescriptive in defining what is and is not a family? Well, thankfully icons can do a lot of heavy lifting without requiring very much detail.
[image before-after]

When is an emoji running for the police or getting chased by them?

Another area the ESC is actively exploring is how the semantics of emoji sequences can differ when writing directionality changes. Some emoji characters have semantics that encode implicit directionality, but when the string is mirrored their meaning may be unintentionally lost or changed.

[image rightwards]
Left to Right Emoji Sequence
Quickly running towards an “exciting” police chase

[image leftwards]
Right to Left Emoji Sequence
Running away from the coppers

What, if anything, can we do to aid in ensuring that messages are meaningfully translated, be they tiny pictures or tiny letters? As part of 15.1 we’re proposing a small set of emoji with strong directionality (with an initial focus on people) to face the opposite direction. Soon you too can run towards or away from … excitement.

Emoji 15.1

Given that the intake cycle of emoji proposals for Unicode 16.0 ended last July, the Emoji Subcommittee has also decided to temporarily delay the intake of Unicode Version 17.0 proposals until April 2024. Fortunately, you won’t have to wait until then to get new emoji. The list of recommendations includes 578 characters (most of them the candidates described above to support directionality). The list also includes a few humble additions, including a broken chain, a lime, a non-poisonous mushroom, a nodding and shaking face, and a phoenix bird. Each one of these leverages a unique valid ZWJ sequence of emoji, so while they look like atomic characters made of a single codepoint, they are composed of two or more codepoints.

[image candidates]

Broken chain is the result of 🔗💥, with a variety of meanings, such as freedom, breaking a cycle, or perhaps a broken url ;-). Like the bi-directional emoji touched on above, shaking face and nodding face are the result of 🙂↔️ and 🙂↕️ respectively. Oh, and of course there is a phoenix rising from the ashes (🐦🔥), a perfect metaphor to capture where we are today.
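
Going the other way, a sequence can be taken apart to see which existing emoji it is built from. The sketch below is a hedged illustration in Python: the helper name `components` is hypothetical, and the phoenix string assumes the candidate is encoded as bird, then ZWJ, then fire, following the description above.

```python
import unicodedata

ZWJ = "\u200D"    # ZERO WIDTH JOINER
VS16 = "\uFE0F"   # VARIATION SELECTOR-16 (requests emoji presentation)


def components(sequence: str) -> list:
    """Return the Unicode names of the emoji joined in a ZWJ sequence.

    Hypothetical helper: splits on ZWJ and skips variation selectors.
    """
    return [
        unicodedata.name(ch)
        for part in sequence.split(ZWJ)
        for ch in part
        if ch != VS16
    ]


# Assumed encoding of the phoenix candidate: BIRD + ZWJ + FIRE.
phoenix = "\U0001F426" + ZWJ + "\U0001F525"

print(components(phoenix))  # ['BIRD', 'FIRE']
```

Because both parts are long-standing characters, a platform that does not yet know the sequence can still show the bird and the fire side by side.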

The Unicode Technical Committee (UTC) will review the required documents at its first meeting of 2023 in January – and if these candidates move forward, you can expect an update from the UTC later this Spring and Summer.
