Friday, October 23, 2020

Announcing ICU4X 0.1

We are thrilled to announce the first pre-release version of the ICU4X internationalization components. ICU4X aims to provide high quality internationalization components with a focus on:
  • Modularity
  • Flexible data management
  • Performance, memory, safety and size
  • Universal access from programming languages and ecosystems (FFI)
ICU4X draws from the experience of projects such as ICU4C, ICU4J, ECMA-402, CLDR, and Unicode.

Target

ICU4X is initially focusing on a subset of internationalization APIs standardized in ECMA-402 in order to cover the needs of client-side ecosystems and thin clients.

ICU4X targets a wide range of programming languages and environments, aiming to expose its APIs to languages such as JavaScript, WebAssembly, Dart, C++, Python, PHP, and others.

With our focus on client-side ecosystems, much of the effort will go into minimizing size, memory, and CPU utilization, and into allowing for asynchronous data management.

More information on the design can be found in the project’s Announcement article.

Status

This first pre-release 0.1 version is written in Rust and introduces a small subset of APIs and scaffolding for flexible data management.

We would like to invite everyone to try it out. Take a look at the documentation and provide feedback on the API design. We’re also looking for feedback on the algorithms and data structures we use, especially from contributors with experience in Rust and ICU algorithms.

More information on the release can be found in the Release Notes.

Roadmap

The next version, 0.2, will focus on validating the ability to expose ICU4X APIs to other programming environments and extending the data management system to be asynchronous.

The project is fully open source and invites all interested parties to join the effort of designing and developing a modular internationalization components system in Rust.

To learn more about how to contribute to the project, visit the CONTRIBUTE document in the project’s repository.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Friday, October 9, 2020

Unicode CLDR Locale Data v38 beta available for testing

The beta version of Unicode CLDR version 38 is now available. The data will not be changed except for showstoppers, but the LDML v38 spec can still be changed. The final release of v38 is planned for October 28, 2020. If you find any problems, please file a ticket.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 includes:
  • Enhancements to existing locale data: adding support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for Unicode symbols that are non-emoji (~400), and annotations for Emoji v13.1.
  • Survey Tool upgrades: substantial performance improvements, plus structured forum entries to improve coordination among translators.
LDML v38 includes:
  • To make the canonicalization of locale identifiers clear and unambiguous, the relevant part of the specification was substantially restructured. (This was done in concert with fixes to the alias data so that it works better with the specification.)
  • To support inflected units of measurement:
    • minimalPairs adds new elements
      caseMinimalPairs and genderMinimalPairs
    • unit adds a new element gender
    • grammaticalData adds new elements
      grammaticalDerivations, deriveCompound, and deriveComponent
    • unitPattern adds a new attribute case
    • grammaticalCase, grammaticalGender, grammaticalDefiniteness add a new attribute scope
    • compoundUnitPattern1 adds new attributes case and gender
    • compoundUnitPattern adds a new attribute case
  • To allow for overriding dictionary-based segmentation breaks, added the Unicode Dictionary Break Exclusion Identifier, with the new key “dx”.
  • For picking the correct units of measurement for locales, defined the userPreferences skeleton more precisely.
  • For accurate plural categories in compact numbers, added the 'c' operand to plural rules to provide formatting for languages such as French.
See additional details in the draft CLDR v38 Release note.
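
As an illustration of the 'c' operand mentioned above, here is a minimal Python sketch of how a plural rule can take the compact decimal exponent into account. The operand names n, i, v, and c follow LDML conventions; the `Operands` class, the `operands` parser, the "1c6" input notation, and the rule in `french_like_category` are assumptions made for this example. The rule is a simplified approximation modeled on French, not a copy of the CLDR data.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Operands:
    n: Decimal  # absolute value of the number
    i: int      # integer digits of n
    v: int      # count of visible fraction digits (trailing zeros included)
    c: int      # compact decimal exponent; 0 when not in compact form


def operands(text: str) -> Operands:
    """Parse "2", "1.50" or "1c6" (mantissa 1, compact exponent 6).

    The "1c6" notation is an assumption of this sketch.
    """
    mantissa, _, exp = text.partition("c")
    c = int(exp) if exp else 0
    _, _, frac = mantissa.partition(".")
    value = abs(Decimal(mantissa)).scaleb(c)  # mantissa * 10**c
    v = max(0, len(frac) - c)                 # fraction digits left after scaling
    return Operands(n=value, i=int(value), v=v, c=c)


def french_like_category(op: Operands) -> str:
    """Hypothetical, simplified rule in the spirit of French plurals."""
    if op.i in (0, 1):
        return "one"
    whole_millions = op.c == 0 and op.i != 0 and op.i % 1_000_000 == 0 and op.v == 0
    large_compact = not (0 <= op.c <= 5)
    if whole_millions or large_compact:
        return "many"
    return "other"


if __name__ == "__main__":
    for sample in ["1", "2", "1000000", "1c3", "1c6", "1.2c6"]:
        print(sample, french_like_category(operands(sample)))
```

Without the c operand, a rule sees the same numeric value for "1000" and "1c3" and cannot tell whether the number will be rendered in compact form; passing the exponent lets the rule choose a category based on how the number is displayed.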

The overall changes to the data items were:

Added     Deleted   Changed   Total
155,131   33,805    45,895    2,175,821



Friday, September 18, 2020

Emoji 13.1 — Now final, to be widely available in 2021

Emoji 13.1 is now final with 217 new emoji sequences! Of these, 210 are skin tone variants and the other seven are new emoji.

Most of the skin tone variants are for the multi-person emoji groupings: couples with heart and couples kissing.

This minor release was created to add new emoji before 2022. The Unicode Consortium is a volunteer organization and we would be completely without new emoji in 2021 if it weren’t for the dedication of many volunteers who make this possible. Thank you! ✨

The new emoji are listed in Emoji Recently Added v13.1. The images provided on that page are just samples: vendors for mobile phones, PCs, and web platforms create their own images.

New emoji in this release should begin appearing on devices in the coming months. These new emoji will also be available for adoption. Donations for adoptions help the Unicode Consortium’s work on digitally disadvantaged languages.

For implementers:
  1. There are no new atomic characters. Instead, each emoji is a sequence of existing characters.
  2. UTS #51 and associated data files have been updated for Emoji 13.1.
  3. CLDR v38 alpha has also been updated for Emoji 13.1. This includes names, search keywords, and sort orderings for the new emoji, available for over 80 languages. It is scheduled for release at the end of October.
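
To make the first point concrete, the following Python sketch breaks one emoji sequence into its existing code points. The choice of "heart on fire" as the sample and the helper name `describe` are assumptions of this example. The sequence joins a heart and a fire character with U+200D ZERO WIDTH JOINER, so no new atomic character is required.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER, used to glue existing characters together

# Sample sequence chosen for illustration: heart + variation selector + ZWJ + fire
heart_on_fire = "\u2764\ufe0f" + ZWJ + "\U0001F525"


def describe(sequence: str) -> list[tuple[str, str]]:
    """Return (code point, character name) pairs for each character in a sequence."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in sequence
    ]


if __name__ == "__main__":
    for code_point, name in describe(heart_on_fire):
        print(code_point, name)
```

Platforms whose fonts do not yet support a given sequence typically fall back to showing the component emoji side by side.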


Tuesday, September 15, 2020

Unicode CLDR Locale Data v38 alpha available for testing

The alpha version of Unicode CLDR version 38 is now available for data testing. The final release of v38 is planned for October 22, 2020. If you find any problems with the data, please file a ticket.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 includes:
  • Enhancements to existing locale data: adding support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for Unicode symbols that are non-emoji (~400), and annotations for Emoji v13.1.
  • New locales added: Dogri and Sanskrit.
  • Survey Tool upgrades: substantial performance improvements, plus structured forum entries to improve coordination among translators.
See additional details in the draft CLDR v38 Release note.

The overall changes to the data items were:

Added     Deleted   Changed
155,131   33,805    45,895




Tuesday, September 1, 2020

Emoji 15.0 Submissions Re-Open April 2, 2021

The Unicode Consortium is postponing the submissions of new emoji for Unicode version 15.0 until April 2, 2021. This delay follows on the postponement of the release of the upcoming Unicode 14.0 version from March to September 2021.

This delay impacts related specifications and data, such as new emoji characters. As a consequence, the deadline for submission of new emoji character proposals for Emoji 14.0 was extended until September 1, 2020.

Pausing Processing of New Emoji Proposals ⏸️

The Emoji Subcommittee is in the process of revising the submission form. Until the new submission form is ready on April 2, 2021, proposals will be returned to sender. During this period the committee will also be prioritizing Emoji 15.0 initiatives as described in document L2/20-197.

Submissions for Emoji 15.0 Open April 2021 ▶️

The Emoji Subcommittee will be accepting new emoji character proposals for Emoji 15.0 from April 2, 2021 onward. Any new emoji characters incorporated into Emoji 15.0 can be expected to appear on devices such as computers, phones, and tablets in 2023.

