Thursday, March 30, 2023

The Unicode CLDR v43 Beta is now available for integration testing

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

It is important to review the Migration section for changes that might require action by implementations using CLDR directly or indirectly (e.g., via ICU), as well as the Specification changes, because both are new since the Alpha.

We appreciate feedback from both ICU and non-ICU consumers of CLDR data. (The Beta has already been integrated into the development version of ICU.) Feedback can be filed at CLDR Tickets. Please file any tickets as soon as possible, because the target release date is Wednesday, April 12, 2023.

CLDR 43 is a limited-submission release, focusing on just a few areas:
  1. Formatting Person Names
    • Completing the data for formatting person names, allowing it to advance out of “tech preview”. For more information on the benefits of this feature, see Background.
  2. Locales
    • Adding substantially to the LikelySubtags data, which is used to find the likely writing system and country for a given language when normalizing locale identifiers and resolving inheritance. The data has been contributed by SIL.
    • Inheritance: Adding components to parentLocales, and documenting the different inheritance for rgScope data, which inherits primarily by region.
  3. Other data updates
    • Alternate names for Turkey / Türkiye
    • Name for the new timezone Ciudad Juárez
  4. Structure
    • Adding some structure and data needed for ICU4X & JavaScript, for calendar eras and parentLocales.
  5. Collation & Searching
    • Treating various quote marks, including Geresh and Gershayim, as equivalent at primary strength.
To find out more about these and other changes, see the draft CLDR v43 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.
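For readers unfamiliar with the LikelySubtags and parentLocales items in the list above, here is a minimal Python sketch of how such data is typically consumed. It is an illustration under stated assumptions, not CLDR's or ICU's implementation: the two tables are tiny hand-copied excerpts of the kind of mappings CLDR publishes, the function names (`add_likely_subtags`, `fallback_chain`) are hypothetical, and the lookup is a simplified version of the full algorithm in the specification. The region-based inheritance of rgScope data is not modeled.

```python
# Tiny illustrative excerpts; the complete tables ship in CLDR's
# supplemental data. Keys and values use underscore-separated subtags.
LIKELY_SUBTAGS = {
    "en": "en_Latn_US",
    "sr": "sr_Cyrl_RS",
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
}

# Explicit parents that override plain truncation of the identifier.
PARENT_LOCALES = {
    "es_MX": "es_419",
    "en_AU": "en_001",
    "pt_AO": "pt_PT",
    "zh_Hant": "root",
}


def parse(locale_id):
    """Split an ID into (language, script, region); simplified, no variants."""
    parts = locale_id.split("_")
    language, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part   # four letters: a script code such as "Hant"
        else:
            region = part   # otherwise assume a region such as "TW" or "419"
    return language, script, region


def add_likely_subtags(locale_id):
    """Fill in a missing script and/or region from the likely-subtags table."""
    language, script, region = parse(locale_id)
    # Try the most specific key first, then progressively less specific ones.
    candidates = [
        "_".join(p for p in (language, script, region) if p),
        "_".join(p for p in (language, region) if p),
        "_".join(p for p in (language, script) if p),
        language,
    ]
    for key in candidates:
        if key in LIKELY_SUBTAGS:
            _, likely_script, likely_region = parse(LIKELY_SUBTAGS[key])
            # Subtags supplied by the caller win over the likely ones.
            return "_".join((language, script or likely_script, region or likely_region))
    return locale_id  # no data for this language: leave the ID unchanged


def fallback_chain(locale_id):
    """List the locales consulted, in order, when looking up inherited data."""
    chain = [locale_id]
    current = locale_id
    while current != "root":
        if current in PARENT_LOCALES:
            current = PARENT_LOCALES[current]      # explicit parent
        elif "_" in current:
            current = current.rsplit("_", 1)[0]    # drop the last subtag
        else:
            current = "root"                       # bare language inherits from root
        chain.append(current)
    return chain


if __name__ == "__main__":
    print(add_likely_subtags("en_GB"))
    print(fallback_chain("es_MX"))
```

With these excerpt tables, `add_likely_subtags("en_GB")` yields `en_Latn_GB` (the script is filled in, the explicit region is kept), and `fallback_chain("es_MX")` yields `es_MX`, `es_419`, `es`, `root`, showing how an explicit parent takes precedence over simple truncation.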



Support Unicode
To support Unicode’s mission to ensure everyone can communicate in their languages across all devices, please consider adopting a character, making a gift of stock, or making a donation. As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)(3) organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.


Wednesday, March 22, 2023

Remembering John H. Jenkins (井作恆)

The Unicode community is greatly saddened and affected by the recent and sudden loss of John H. Jenkins, a long-time colleague and friend. John was most recently the Vice-Chair of the Unicode CJK & Unihan Group. The vast majority of characters in the Unicode Standard are Chinese, Japanese, and Korean (aka Han) ideographs, which are historically used with a broader range of languages. These have been challenging characters to deal with in script encoding, because of significant regional drift over hundreds of years. As an expert in Han ideographs, John contributed a non-trivial amount of work and effort, sometimes needing to make difficult character encoding decisions for the benefit of the large user community.

Many people have worked with John and appreciated his substantial contributions. Here are some reflections from two people who worked with him most closely.

From Lee Collins:

I met John when he joined our team at Apple in 1991. He came from an internship in Apple's Advanced Technology Group (ATG), having graduated in math and ancient Greek at UC Berkeley. In addition to his technical skills, he could read, write and speak Cantonese. All in all, he was a perfect addition to the team, since one of our main tasks was completion of the first version of the Unicode standard, in particular the Unified Han character set. A key component was the database we had built to track all the different Han character encodings, beginning with Xerox, later adding Mac OS versions of JIS, GB, Big5, and KSC, then the unified simplified and traditional mappings provided by Mr Zhang Zhoucai of China. The database was a HyperCard stack that ran on a version of Mac OS I cobbled together to allow Chinese, Japanese and Korean text to be edited and displayed simultaneously. John took over management of that system and database and began to learn the arcane art of Chinese character encoding. He also found time to write a Risk-like game based on the classical world. I don't remember the name of that game, but it was a nice diversion from work.

I had been the primary Unicode representative at the first meetings of international experts to refine what became the ISO 10646 Unified Repertoire and Ordering / Unicode V1.0. The group, initially known as the CJK-JRG (Chinese, Japanese, Korean Joint Research Group), later became the current IRG. Hoping he would take over my work, I invited John to join one of the early meetings in Hong Kong, November 1991, and he later became the primary representative. John continued to contribute to the IRG and the Unihan database for the rest of his career.

We both joined the ill-fated Taligent effort, where we developed the internationalization classes that later became the foundation for ICU. Those designs were probably one of the few things of value that came out of Taligent. I left Taligent and went back to Apple. John came back sometime later after IBM took it over completely. I was manager of the team charged with developing Apple's first Unicode-based text library, which we called ATSUI (Apple Type Services for Unicode Imaging). It was largely based on the model of text layout developed for QuickDraw GX. John was the engineer charged with developing the library. That role was not a good fit for John's talents, so he moved to the Typography group where he was responsible for the font tools Apple used to develop our TrueType fonts. My team also developed support for complex scripts like Hindi and Thai, so I often used John's tools to create fonts with the required layout tables.

I moved on to other areas of Apple, ceased to work directly with John, and eventually left Apple. But, since 2015 or so, I again became involved in the IRG as the representative for Vietnam. That allowed me to work with John once more in his various capacities on the Unicode Technical Committee, especially his responsibility for the Unihan database and participation in the IRG. I enjoyed being able to work with him again. Given the size and complexity of the work he did for Unicode, he will not be easily replaced.

While we had our differences on technical and work issues at times, he was always a kind and thoughtful person. The world is a lesser place without him.

John was much more familiar with Cantonese than Mandarin due to his missionary work in Hong Kong. I think John’s characters, 井作恆, satisfied two criteria: they are close to his name phonetically (zeng2 zok3 hang4) and look like an actual Chinese name. Purely phonetic transcriptions often use a limited set of characters that look obviously foreign. These don't.

From Ken Lunde:

Nothing brought more joy to John than attending IRG (Ideographic Research Group) meetings, particularly when they took place in Chinese-speaking regions, especially Hong Kong, which held a special place in John’s heart. For those who are unaware, the IRG is responsible for reviewing and preparing the thousands of characters in the growing number of CJK Unified Ideographs blocks, which comprise approximately two-thirds of the total number of characters in the Unicode Standard.

Fun fact: John and I had an unwritten and informal agreement that he would attend these one-week IRG meetings when they took place in Chinese-speaking regions, and I would attend those hosted elsewhere, in a quasi yin and yang relationship. This would completely explain why I have never attended an IRG meeting in a Chinese-speaking region. This relationship was also evident in John’s focus on all things Chinese and my focus on all things Japanese, though both of us performed sufficiently dangerous dabbling in the other language.

John and I began working much more closely together as a result of COVID-19, which necessitated the formation of the Unicode CJK & Unihan Group, with me serving as the Chair, and John serving as the Vice-Chair. This group, which was formed in early 2020, pre-digests proposals and public feedback, interacts with the IRG, and provides its recommendations to the UTC.

[Photo of Ken Lunde and John Jenkins, October 2022]
Please visit John’s obituary to read more about his extraordinary life, or to express condolences to John’s family:

https://www.larkinmortuary.com/obituary/view/john-howard-jenkins/