Wednesday, September 21, 2022

New Online Event – Overview of Internationalization and Unicode Projects

The Unicode Consortium is excited to invite you to our upcoming online event, “Overview of Internationalization and Unicode Projects.”

During this ~2-hour event, hear pre-recorded sessions from some of the experts working to ensure that everyone can fully communicate and collaborate in their languages across all software and services. Unicode representatives will be available for live Q&A for the last 30-40 minutes and our emcee throughout will be Elango Cheran of Google.

Topics and speakers include:
  1. An Introduction to Internationalization (i18n) - Addison Phillips, Internationalization Engineer
  2. Overview of the Unicode Consortium: History and Future - Mark Davis, Cofounder and President
  3. Scripts and Character Encoding - Deborah Anderson, Chair of the Script Ad Hoc Committee
  4. The Common Locale Data Repository (CLDR) - Mark Davis and Annemarie Apple, Chair and Vice Chair of the CLDR Committee
  5. International Components for Unicode (ICU) - Markus Scherer, Chair of ICU Committee
  6. Bringing Internationalization to More Programming Languages and Resource-Constrained Environments (ICU4X) - Shane Carr, Chair of ICU4X Subcommittee
Date: Wednesday, September 28, 2022
Time: 9:30am (California)/12:30pm (New York)/16:30 (UTC)/17:30 (London)
Location and Cost: Online, free to attend
Registration: Register here. Please freely share this link with colleagues and anyone else who may be interested. Registering also ensures that you receive updates about future Unicode events.

The recording and a playlist will be available on YouTube later this year for anyone who is unable to attend or if attendees want to share the information with others. Depending on community interest, Unicode project leaders will also be available in November and December for virtual “Office Hours” to talk more in depth and answer specific questions.

The link to share with your networks is: https://us06web.zoom.us/webinar/register/WN_ViDf3YFyS7WiAXnHYp88kw

Thanks and hope to see many of you on the 28th!


Over 144,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Tuesday, September 13, 2022

Announcing The Unicode® Standard, Version 15.0

[Nag Mundari image] Version 15.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 4,489 characters, bringing the total to 149,186 characters. These additions include two new scripts, for a total of 161 scripts, along with 20 new emoji characters, and 4,193 CJK (Chinese, Japanese, and Korean) ideographs. The new scripts and characters in Version 15.0 add support for modern language groups including:
  • Nag Mundari, a modern script used to write Mundari, a language spoken in India
  • A Kannada character used to write Konkani, Awadhi, and Havyaka Kannada in India
  • Kaktovik numerals, devised by speakers of Iñupiaq in Kaktovik, Alaska for the counting systems of the Inuit and Yupik languages
Among the popular symbol additions are 20 new emoji, including hair pick, maracas, jellyfish, khanda, and pink heart. For the full list of new emoji characters, see emoji additions for Unicode 15.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

[Image credit Noto Emoji]

Other symbol and notational additions include:
Support for other languages and scholarly work includes:
  • Kawi, a historical script found in Southeast Asia, used to write Old Javanese and other languages
  • Three additional characters for the Arabic script to support Quranic marks used in Turkey
  • Three Khojki characters found in handwritten and printed documents
  • Ten Devanagari characters used to represent auspicious signs found in inscriptions and manuscripts
  • Six Latin letters used in Malayalam transliteration
  • Sixty-three Cyrillic modifier letters used in phonetic transcription
Important chart font updates include:
  • A set of updated glyphs for Egyptian hieroglyphs, in addition to standardized variation sequences to support rotated glyphs found in texts
  • Improved glyphs for Unified Canadian Aboriginal Syllabics, which provide better support for Carrier and other languages
  • A new Wancho font, with improved and simplified shapes
Updates to the CJK blocks add:
  • 4,192 ideographs in the new CJK Unified Ideographs Extension H block
  • One ideograph in the CJK Unified Ideographs Extension C block
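The totals quoted in this announcement can be sanity-checked against whatever Unicode data a given runtime ships. The following is a minimal sketch using Python's standard `unicodedata` module. It assumes that the announced total counts graphic and format characters, that is, everything except unassigned, private-use, surrogate, and control code points. The function name `count_encoded_characters` is hypothetical and chosen for illustration.

```python
import sys
import unicodedata

# General categories assumed NOT to count toward the announced total:
# Cn = unassigned, Co = private use, Cs = surrogate, Cc = control.
EXCLUDED = {"Cn", "Co", "Cs", "Cc"}


def count_encoded_characters() -> int:
    """Count code points whose general category is outside EXCLUDED."""
    return sum(
        1
        for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) not in EXCLUDED
    )


# CJK figures from the announcement: Extension H plus one Extension C addition.
cjk_added = 4192 + 1

if __name__ == "__main__":
    # unidata_version reports which Unicode version this Python build bundles.
    print("Unicode data version:", unicodedata.unidata_version)
    print("Encoded characters:", count_encoded_characters())
    print("CJK ideographs added in 15.0:", cjk_added)
```

A runtime that bundles data older than Version 15.0 will report a lower count, so the printed data version should be read alongside the total.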
Unicode properties and specifications determine the behavior of text on computers and phones. The following six Unicode Standard Annexes and Technical Standards have noteworthy updates for Version 15.0:
  • UAX #9, Unicode Bidirectional Algorithm, amends the note in UAX9-C2 to emphasize the use of higher-level protocols to mitigate potential source code spoofing attacks.
  • UAX #31, Unicode Identifier and Pattern Syntax, provides more guidance on profiles for default identifiers, clarifies the use of default ignorable code points in identifiers, and discusses the relationship between Pattern_White_Space and bidirectional ordering issues in programming languages.
  • UAX #38, Unicode Han Database, adds the kAlternateTotalStrokes property. The kCihaiT property’s category was changed to Dictionary Indices, the kKangXi property was expanded, and Sections 3.0, 3.10, and 4.5 were added.
  • UTS #39, Unicode Security Mechanisms, changes the zero width joiner (ZWJ) and zero width non-joiner (ZWNJ) characters from Identifier_Status=Allowed to Identifier_Status=Restricted; they are therefore no longer allowed by the General Security Profile by default.
  • UAX #45, U-Source Ideographs, has records for new ideographs in its data file, “ExtH” was added as a new status, the status identifiers for the existing CJK Unified Ideographs blocks were improved, and Section 2.5 was added.
  • UTS #46, Unicode IDNA Compatibility Processing, clarified the edge case of the empty label in ToASCII and added documentation regarding the new IDNA derived property data files.
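The UAX #9 and UTS #39 items above both concern characters that are invisible in rendered text but can change how source code or identifiers are read. The sketch below shows the kind of pre-check a higher-level tool such as a linter might run. It is not an implementation of the Unicode Bidirectional Algorithm or of the General Security Profile. The function names are hypothetical, and it relies only on Python's standard `unicodedata` module.

```python
import unicodedata

# Bidi_Class values of the explicit directional formatting characters
# (embeddings, overrides, isolates, and their terminators).
BIDI_FORMATTING_CLASSES = {
    "LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI",
}

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200D"   # ZERO WIDTH JOINER


def find_bidi_formatting(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each explicit bidi formatting character."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.bidirectional(ch) in BIDI_FORMATTING_CLASSES
    ]


def has_restricted_joiner(identifier: str) -> bool:
    """True if the identifier contains ZWJ or ZWNJ."""
    return ZWNJ in identifier or ZWJ in identifier


if __name__ == "__main__":
    # A right-to-left override hidden inside an otherwise plain string.
    print(find_bidi_formatting("access\u202Elevel"))
    # An identifier that looks like "counter" but carries a joiner.
    print(has_restricted_joiner("coun\u200Dter"))
```

Reporting positions rather than silently stripping the characters leaves the decision to the caller, since some of these characters are legitimate in natural-language text.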

About the Unicode Standard

The Unicode Standard provides the basis for processing, storage and seamless data interchange of text data in any language in all modern software and information technology protocols. It provides a uniform, universal architecture and encoding for all languages of the world, with over 140,000 characters currently encoded.

Unicode is required by modern standards such as XML, Java, C#, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is a fundamental component of all modern software.

For additional information on the Unicode Standard, please visit https://home.unicode.org/.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. For a complete member list go to https://home.unicode.org/membership/members/.
For more information, please contact the Unicode Consortium https://home.unicode.org/connect/contact-unicode/.



Friday, August 26, 2022

Unicode CLDR v42 Alpha available for testing

[image] The Unicode CLDR v42 Alpha is now available for integration testing.

CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

The alpha has already been integrated into the development version of ICU. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. Feedback can be filed at CLDR Tickets.

Alpha means that the main data and charts are available for review, but the specification, JSON data, and other components are not yet ready for review. Some data may change if showstopper bugs are found. The planned schedule is:
  • Sep 14 — Beta (data)
  • Sep 28 — Beta2 (spec)
  • Oct 19 — Release
In CLDR 42, the focus is on:
  1. Locale coverage. The following locales now have higher coverage levels:
    1. Modern: Igbo (ig), Yoruba (yo)
    2. Moderate: Chuvash (cv), Xhosa (xh)
    3. Basic: Haryanvi (bgc), Bhojpuri (bho), Rajasthani (raj), Tigrinya (ti)
  2. Formatting Person Names. Added data and structure for formatting people's names. For more information on why this feature is being added and what it does, see Background.
  3. Emoji 15.0 Support. Added short names, keywords, and sort-order for the new Unicode 15.0 emoji.
  4. Coverage, Phase 2. Added additional language names and other items to the Modern coverage level, for more consistency (and utility) across platforms.
  5. Unicode 15.0 additions. Made the regular additions and changes for a new release of Unicode, including names for new scripts, collation data for Han characters, etc.
There are many other changes: to find out more, see the draft CLDR v42 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.

In version 42, the following levels were reached:

Level Languages Locales* Notes
Modern 94 366 Suitable for full UI internationalization
Afrikaans‎, ‎… Čeština‎, ‎… Dansk‎, ‎… Eesti‎, ‎… Filipino‎, ‎… Gaeilge‎, ‎… Hrvatski‎, ‎Indonesia‎, ‎… Jawa‎, ‎Kiswahili‎, ‎Latviešu‎, ‎… Magyar‎, ‎…Nederlands‎, ‎… O‘zbek‎, Polski‎, ‎… Română‎, ‎Slovenčina‎, ‎… Tiếng Việt‎, ‎… Ελληνικά‎, Беларуская‎, ‎… ‎ᏣᎳᎩ‎, ‎ Ქართული‎, ‎Հայերեն‎, ‎עברית‎, ‎اردو‎, … አማርኛ‎, ‎नेपाली‎, … ‎অসমীয়া‎, ‎বাংলা‎, ‎ਪੰਜਾਬੀ‎, ‎ગુજરાતી‎, ‎ଓଡ଼ିଆ‎, தமிழ்‎, ‎తెలుగు‎, ‎ಕನ್ನಡ‎, ‎മലയാളം‎, ‎සිංහල‎, ‎ไทย‎, ‎ລາວ‎, မြန်မာ‎, ‎ខ្មែរ‎, ‎한국어‎, ‎… 日本語‎, ‎…
Moderate 7 11 Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Binisaya, … ‎Èdè Yorùbá, ‎Føroyskt, ‎Igbo, ‎IsiZulu, ‎Kanhgág, ‎Nheẽgatu, ‎Runasimi, ‎Sardu, ‎Shqip, ‎سنڌي, …
Basic 29 43 Suitable for locale selection, such as choice of language in mobile phone settings.
Asturianu, ‎Basa Sunda, ‎Interlingua, ‎Kabuverdianu, ‎Lea Fakatonga, ‎Rumantsch, ‎Te reo Māori, ‎Wolof, ‎Босански (Ћирилица), ‎Татар, ‎Тоҷикӣ, ‎Ўзбекча (Кирил), ‎کٲشُر, ‎कॉशुर (देवनागरी), ‎…, ‎মৈতৈলোন্, ‎ᱥᱟᱱᱛᱟᱲᱤ, ‎粤语 (简体)‎

* Locales are variants for different countries or scripts.



Thursday, June 30, 2022

Working with Local Communities to Revitalize and Preserve Indigenous Languages in Canada

By Kevin King, Typotheque

The Typotheque Syllabics Project, an initiative based out of Toronto and The Hague, Netherlands, undertook research with language keepers across various Syllabics-using Indigenous communities in Canada to document and address both their local typographic preferences and the technical barriers they faced.

This research contributed to two proposals to amend the Unicode Standard for the Syllabics, which is an important step in the preservation and revitalization of Indigenous languages.

[Map, Image provided by Typotheque https://www.typotheque.com/, used with permission.]

The local Indigenous communities were given a voice in reclaiming ownership over the use of their language, as well as the resources for self-determined expression in the writing system that they identify with. Working in collaboration with Nattilik language keepers Nilaulaaq, Janet Tamalik, Attima and Elisabeth Hadlari, and elders in the community, the project identified key issues faced by the Nattilik community of Western Nunavut and discovered that 12 syllabic characters were missing from the Unicode Standard. Without them, the Nattilik community was unable to use their language reliably for even simple, everyday digital text exchanges such as email or text messaging.

[Syllables Block, Image provided by Typotheque https://www.typotheque.com/, used with permission.]
The Nattilik Kutaiřřutit (Nattilik special characters), required for representing sounds unique to the Nattilingmiutut dialect of Inuktut.


It was also revealed that the glyphs of the Carrier (Dakelh) community of central British Columbia were incorrectly represented in the UCAS code charts. Additionally, 4 characters for a now-obsolete sp series were successfully proposed to Unicode for representing and digitally preserving historical texts in the Cree and Ojibway languages. These important alterations mean that all Syllabics typefaces that are fully Unicode-compliant – including system-level typefaces on common operating systems – will be capable of accurately and legibly representing text for the Carrier, Sayisi, and Ojibway Syllabics-using communities moving forward.

When the project produced the comprehensive glyph set, the results not only provided a stable environment for the local Indigenous communities to use their languages on their devices, but also changed the standards for the development of all future Syllabics fonts and ensured that the writing systems of all communities will be accurately represented.

[Syllables, Image provided by Typotheque https://www.typotheque.com/, used with permission.]
Above, a representation of the missing characters for Nattilingmiutut, a dialect of Inuktut in Western Nunavut.


Where to learn more:

Acknowledgements

Special thanks to Liang Hai, Deborah Anderson, and Sarah Rivera for their contributions to this blog.



Wednesday, June 8, 2022

Unicode CLDR Version 42 Submission Open

[ballot box image] The Unicode CLDR Survey Tool is open for submission for version 42. CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the online Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.

Version 42 is focusing on:
  • Additional Coverage
    • Unicode 15.0 additions: new emoji, script names, collation data (Chinese & Japanese), …
    • New Languages: Adding Haryanvi, Bhojpuri, Rajasthani at a Basic level.
    • Up-leveling: Xhosa, Hinglish (Hindi-Latin), Nigerian Pidgin, Hausa, Igbo, Yoruba, and Norwegian Nynorsk.
  • Person Name Formatting: for handling the wide variety in the way that people’s names work in different languages.
    • People may have a different number of names, depending on their culture: they might have only one name (“Zendaya”), two (“Albert Einstein”), or three or more.
    • People may have multiple words in a particular name field, e.g., “Mary Beth” as a given name, or “van Berg” as a surname.
    • Some languages, such as Spanish, have two surnames (where each can be composed of multiple words).
    • The ordering of name fields can be different across languages, as well as the spacing (or lack thereof) and punctuation.
    • Name formatting needs to be adapted to different circumstances, such as a need to be presented shorter or longer; formal or informal context; or when talking about someone, or talking to someone, or as a monogram (JFK).
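The variations listed above can be made concrete with a small sketch. Everything in it is hypothetical: the `PersonName` fields, the `ORDER` table, and the `format_name` parameters are invented for illustration and are not the CLDR data structure or any library's API. It only shows why a formatter needs to know about field order, optional fields, length, and separators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PersonName:
    """A name with optional fields; each field may hold several words."""
    given: str
    surname: Optional[str] = None
    surname2: Optional[str] = None  # e.g. a second surname, as in Spanish


# Hypothetical field orders; real locale data would decide which applies.
ORDER = {
    "givenFirst": ("given", "surname", "surname2"),
    "surnameFirst": ("surname", "surname2", "given"),
}


def format_name(name: PersonName, order: str = "givenFirst",
                length: str = "long", separator: str = " ") -> str:
    """Format a name, skipping any field the person does not have."""
    fields = ORDER[order]
    if length in ("short", "monogram"):
        # Shorter forms drop the second surname in this sketch.
        fields = tuple(f for f in fields if f != "surname2")
    values = [getattr(name, f) for f in fields]
    values = [v for v in values if v]
    if length == "monogram":
        # One initial per field, with no separator.
        return "".join(v[0].upper() for v in values)
    return separator.join(values)


if __name__ == "__main__":
    print(format_name(PersonName("Zendaya")))
    print(format_name(PersonName("Mary Beth", "van Berg"),
                      order="surnameFirst", separator=", "))
    print(format_name(PersonName("Albert", "Einstein"), length="monogram"))
```

Keeping multi-word values such as “Mary Beth” inside a single field, rather than splitting on spaces, is what lets the same formatter handle both one-word and multi-word names.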
Submission of new data opened recently, and is slated to finish on June 22. The new data then enters a vetting phase, where contributors work out which of the supplied data for each field is best. That vetting phase is slated to finish on July 6. A public alpha makes the draft data available around August 17, and the final release targets October 19.

Each new locale starts with a small set of Core data, such as a list of characters used in the language. Submitters of those locales need to bring the coverage up to Basic level (very basic dates, times, numbers, and endonyms) during the next submission cycle. In version 41, the following levels were reached:

Level Languages Locales* Notes
Modern 89 361 Suitable for full UI internationalization
Afrikaans‎, ‎… Čeština‎, ‎… Dansk‎, ‎… Eesti‎, ‎… Filipino‎, ‎… Gaeilge‎, ‎… Hrvatski‎, ‎Indonesia‎, ‎… Jawa‎, ‎Kiswahili‎, ‎Latviešu‎, ‎… Magyar‎, ‎…Nederlands‎, ‎… O‘zbek‎, ‎Polski‎, ‎… Română‎, ‎Slovenčina‎, ‎… Tiếng Việt‎, ‎… Ελληνικά‎, ‎Беларуская‎, ‎… ‎ᏣᎳᎩ‎, ‎ Ქართული‎, ‎Հայերեն‎, ‎עברית‎, ‎اردو‎, … አማርኛ‎, ‎नेपाली‎, … ‎অসমীয়া‎, ‎বাংলা‎, ‎ਪੰਜਾਬੀ‎, ‎ગુજરાતી‎, ‎ଓଡ଼ିଆ‎, ‎தமிழ்‎, ‎తెలుగు‎, ‎ಕನ್ನಡ‎, ‎മലയാളം‎, ‎සිංහල‎, ‎ไทย‎, ‎ລາວ‎, ‎မြန်မာ‎, ‎ខ្មែរ‎, ‎한국어‎, ‎… 日本語‎, ‎…
Moderate 13 32 Suitable for full “document content” internationalization, such as formats in a spreadsheet.
Binisaya, … ‎Èdè Yorùbá, ‎Føroyskt, ‎Igbo, ‎IsiZulu, ‎Kanhgág, ‎Nheẽgatu, ‎Runasimi, ‎Sardu, ‎Shqip, ‎سنڌي, …
Basic 22 21 Suitable for locale selection, such as choice of language in mobile phone settings.
Asturianu, ‎Basa Sunda, ‎Interlingua, ‎Kabuverdianu, ‎Lea Fakatonga, ‎Rumantsch, ‎Te reo Māori, ‎Wolof, ‎Босански (Ћирилица), ‎Татар, ‎Тоҷикӣ, ‎Ўзбекча (Кирил), ‎کٲشُر, ‎कॉशुर (देवनागरी), ‎…, ‎মৈতৈলোন্, ‎ᱥᱟᱱᱛᱟᱲᱤ, ‎粤语 (简体)‎
* Locales are variants for different countries or scripts.

If you would like to contribute missing data for your language, see Survey Tool Accounts. For more information on contributing to CLDR, see the CLDR Information Hub.




Tuesday, May 31, 2022

Unicode 15.0 Beta Review

[Kawi beta chart image] The beta review period for Unicode 15.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 15.0 includes a number of changes and 4,489 new characters, including another major extension of CJK unified ideographs. A number of the Unicode Standard Annexes have significant modifications for Unicode 15.0. Two new scripts have been added, and there are also 20 additional emoji characters in Unicode 15.0.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by July 12, 2022. The review period will only be for six weeks, so prompt feedback is appreciated. Feedback instructions are on the beta page.

See https://www.unicode.org/versions/beta-15.0.0.html for more information about testing the 15.0.0 beta.

See https://www.unicode.org/versions/Unicode15.0.0/ for the current draft summary of Unicode 15.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Amazon, Apple, Emojipedia, Google, Government of Bangladesh, International Emerging Technology Company (ETCO), Meta, Microsoft, Netflix, Salesforce, SAP, Tamil Virtual Academy, The University of California (Berkeley), Yat Labs, plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/.

For more information, please contact the Unicode Consortium https://home.unicode.org/connect/contact-unicode/.



Wednesday, May 4, 2022

Out of this World: New Astronomy Symbols Approved for the Unicode Standard

Five Trans-Neptunian Objects to Join Character Set

By Deborah Anderson, Chair of Unicode Script Ad Hoc Committee

In January 2022, the Unicode Technical Committee approved five new symbols to be published in Unicode 15.0, which has a projected release date of September 2022. These symbols are based on newly discovered trans-Neptunian objects (TNOs) in the Solar System. They resulted from research efforts such as those led by astronomer and professor Dr. Michael Brown at the California Institute of Technology (CalTech).

These five objects orbit the Sun at a distance far larger than the major planets. They are currently believed to be large enough to be round, planetary worlds, in a category of objects called “dwarf planets” that also includes Ceres, Pluto, Eris and probably Sedna. The most famous trans-Neptunian object is Pluto, which historically had been considered to be the ninth planet from the Sun, but was reclassified as a dwarf planet in 2006 by the International Astronomical Union (IAU).[1]

[Pluto image]

How did this happen?

Individuals or organizations who want to propose new characters have to check existing characters to avoid duplicates, find out if there are equivalent forms already in existence, and, most critically, establish that there is a need to interchange them digitally, as with symbols that have been encoded for use by NASA and other agencies. The proposal authors then must submit a proposal that articulates how their request meets the criteria.

Once a proposal is submitted, the Unicode Technical Committee determines whether to review the proposal and accept or decline it. This process can take a couple of years or more. In the case of these five characters, the proposers demonstrated the need, clearing the path for approval. 

Tell me more about these new characters. What are their names?

The International Astronomical Union (IAU) has standard conventions for naming objects both within and outside of the Solar System. Objects orbiting the Sun outside the orbit of Neptune are named after mythological figures, particularly those associated with creation. But those that orbit in a two-to-three resonance with Neptune (the so-called “plutinos”, such as Pluto and Orcus) are named after figures associated with the underworld. In this case, the five TNOs, ordered by distance from the Sun, are named:
  • Orcus: the Etruscan and Roman god of the underworld.
  • Haumea: the Hawaiian goddess of fertility; the telescope used to discover this object is located on Hawaiʻi.
  • Quaoar: an important mythological figure of the Tongva, the indigenous people who originally occupied the land where CalTech is located.
  • Makemake: the creator god of the Rapanui of Easter Island.
  • Gonggong: a destructive Chinese water god.
What information is there on the actual symbols that will be available?

All five symbols were designed by Denis Moskowitz, a software engineer in Massachusetts who had previously designed the Unicode symbol for Sedna. He drew inspiration from existing symbols and the “native name or culture” of the objects’ namesakes [2] to create the characters.

[TNO glyphs image]

Denis explains his inspiration for each symbol below:
  • Orcus: The symbol for Orcus is a combination of the Latin letters “O” and “R”, stylized to resemble a skull and an orca’s grin.
  • Haumea: The symbol created for Haumea was a combination and simplification of Hawaiian petroglyphs for “childbirth” and “woman”.
  • Quaoar: The symbol is the Latin letter “Q” with the tail fashioned into the shape of a canoe. The angular shape is intended to reflect Tongva rock art.
  • Makemake: The Makemake symbol is a traditional petroglyph of the face of the creator god Makemake, stylized to suggest an “M”. The design was a collaboration with John T. Whelan.
  • Gonggong: Gonggong’s symbol was based on the first Chinese character in the god’s name, 共 gòng, with a snaky tail replacing the lower section.
What else should we know?

The five symbols supplement a set of other characters for planetary objects that were published in 2018 (Unicode 11.0) and earlier. Two of the newly approved characters appear in a NASA poster. Other people have used the symbols in various media, including tattoos and art. Ultimately, these five new characters will join the 149,180 other characters in the Unicode Standard Version 15.0 and be accessible to anyone, anywhere in the world, who is using a computer or mobile device.
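Whether a particular runtime already knows about the five symbols depends on the Unicode data it bundles. The sketch below looks each one up by character name using Python's standard `unicodedata` module. The names in `TNO_NAMES` are an assumption based on the object names above and should be checked against the published names list. The helper `lookup_symbols` is hypothetical.

```python
import unicodedata

# Assumed character names for the five symbols; verify against the
# published Unicode 15.0 names list before relying on them.
TNO_NAMES = ["ORCUS", "HAUMEA", "QUAOAR", "MAKEMAKE", "GONGGONG"]


def lookup_symbols(names):
    """Map each name to its character, or None if this runtime lacks it."""
    found = {}
    for name in names:
        try:
            found[name] = unicodedata.lookup(name)
        except KeyError:
            # Raised when the bundled Unicode data has no such name.
            found[name] = None
    return found


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for name, ch in lookup_symbols(TNO_NAMES).items():
        status = f"U+{ord(ch):04X}" if ch else "not available in this runtime"
        print(f"{name}: {status}")
```

A result of `None` says only that the runtime's data predates the character or uses a different name; seeing the glyph on screen additionally requires a font that covers it.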

Where can I learn more?
Acknowledgments

Special thanks to Sarah Rivera and Kirk Miller for their contributions to this blog.



Friday, April 8, 2022

ICU 71 Released

[ICU logo] Unicode® ICU 71 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR). ICU 71 updates to CLDR 41 locale data with various additions and corrections.

ICU 71 adds phrase-based line breaking for Japanese. Existing line breaking methods follow standards and conventions for body text but do not work well for short Japanese text, such as in titles and headings. This new feature is optimized for these use cases.

ICU 71 adds support for Hindi written in Latin letters (hi_Latn). The CLDR data for this increasingly popular locale has been significantly revised and expanded. Note that based on user expectations, hi_Latn incorporates a large amount of English, and can also be referred to as “Hinglish”.

ICU 71 and CLDR 41 are minor releases, mostly focused on bug fixes and small enhancements. (The fall CLDR/ICU releases will update to Unicode 15 which is planned for September.) We are also working to re-establish continuous performance testing for ICU, and on development towards future versions.

ICU 71 updates to the time zone data version 2022a. Note that pre-1970 data for a number of time zones has been removed, as has been the case in the upstream tzdata release since 2021b.

For details, please see https://icu.unicode.org/download/71.

