Tuesday, September 15, 2020

Unicode CLDR Locale Data v38 alpha available for testing

The alpha version of Unicode CLDR version 38 is now available for data testing. The final release of v38 is planned for October 22, 2020. If you find any problems with the data, please file a ticket.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 includes:
  • Enhancements to existing locale data: adding support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for Unicode symbols that are non-emoji (~400), and annotations for Emoji v13.1.
  • New locales added: Dogri and Sanskrit.
  • Survey Tool upgrades: substantial performance improvements, plus structured forum entries to improve coordination among translators.
See additional details in the draft CLDR v38 release note.

The overall changes to the data items were:

Added Deleted Changed
155,131 33,805 45,895

Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages


Tuesday, September 1, 2020

Emoji 15.0 Submissions Re-Open April 2, 2021

The Unicode Consortium is postponing the submissions of new emoji for Unicode version 15.0 until April 2, 2021. This delay follows the postponement of the release of the upcoming Unicode 14.0 version from March to September 2021.

This delay impacts related specifications and data, such as new emoji characters. As a consequence, the deadline for submission of new emoji character proposals for Emoji 14.0 was extended until September 1, 2020.

Pausing Processing of New Emoji Proposals ⏸️

The Emoji Subcommittee is in the process of revising the submission form. Until the new submission form is ready on April 2, 2021, proposals will be returned to sender. During this period the committee will also be prioritizing Emoji 15.0 initiatives as described in document L2/20-197.

Submissions for Emoji 15.0 Open April 2021 ▶️

The Emoji Subcommittee will be accepting new emoji character proposals for Emoji 15.0 from April 2, 2021 onward. Any new emoji characters incorporated into Emoji 15.0 can be expected to appear on devices such as computers, phones, and tablets in 2023.


Thursday, August 20, 2020

Tableaux des caractères Unicode 13.0 désormais disponibles en langue française

Les tableaux des caractères Unicode 13.0 en langue française sont désormais disponibles sur le site web d’Unicode. Après un long travail de traduction réalisé par des experts francophones (du Canada, de France et de Belgique), une grande partie du système proposé aux locuteurs anglophones pour l’accès en ligne aux tableaux de caractères (https://www.unicode.org/charts/) a été reproduite en langue française pour les utilisateurs francophones et est disponible sous ce lien : https://www.unicode.org/charts/fr/. Cette page du site propose un accès aux différents blocs définis dans les tableaux des caractères Unicode 13.0, rangés par catégorie (écritures, symboles, ponctuation, etc.). La recherche par code hexadécimal d’un caractère est également proposée sur cette page. Et une recherche par nom de caractère est possible sur cette autre page : https://www.unicode.org/charts/fr/charindex.html (un clic sur le lien intitulé « Index des noms » vous y conduira directement).

Les tableaux des caractères Unicode 13.0 en langue française sont également disponibles sous la forme d’un fichier unique à cette adresse : https://www.unicode.org/Public/13.0.0/charts/fr/ ; il n’est toutefois pas prévu de fournir des tableaux en langue française mettant en lumière les caractères ajoutés au répertoire de la version actuelle (c’est-à-dire des fichiers équivalents à ceux que l’on trouve sous ce lien : https://www.unicode.org/charts/PDF/Unicode-13.0/).

Ces tableaux sont également accessibles depuis : https://www.unicode.org/versions/Unicode13.0.0/#Code_Charts.

Marc Lodewijck a été le principal contributeur à la réalisation des tableaux de caractères en langue française pour la version 13.0 d’Unicode, un travail auquel ont largement participé, en particulier, Patrick Andries, Alain LaBonté, Michel Suignard et François Yergeau, ainsi que quelques autres personnes.

Avertissement : la fourniture des tableaux des caractères Unicode 13.0 en langue française n’implique nullement que le Consortium Unicode créera de tels tableaux (en français ou dans d’autres langues que l’anglais) pour les versions à venir du standard Unicode. Contrairement aux noms des caractères Unicode en langue anglaise, leurs équivalents en langue française ne constituent pas un élément normatif du standard Unicode.

Unicode 13.0 code charts now available in French

The Unicode 13.0 code charts are now also available in French on the Unicode web site. Following extensive translation work by French-speaking experts (from Canada, France, and Belgium), a large part of the online code chart mechanism available to English speakers at https://www.unicode.org/charts/ has been duplicated in French at https://www.unicode.org/charts/fr/. That page provides access to the various blocks defined in the Unicode 13.0 code charts, grouped by category (scripts, symbols, punctuation, etc.). Search by hex code is also available on the same page, and an index of character names is available at https://www.unicode.org/charts/fr/charindex.html (clicking the link labeled “Index des noms” will take you straight to it).

The French-language archival code charts for Unicode 13.0 (single file) are also available at https://www.unicode.org/Public/13.0.0/charts/fr/; however, there is no plan to provide a French version of the delta code charts, which highlight the characters added in the current version (equivalent to https://www.unicode.org/charts/PDF/Unicode-13.0/).

These code charts are also accessible from: https://www.unicode.org/versions/Unicode13.0.0/#Code_Charts.

Marc Lodewijck was the main contributor to the French-language Unicode 13.0 code charts, with substantial help from Patrick Andries, Alain LaBonté, Michel Suignard, and François Yergeau in particular, as well as a few other people.

Disclaimer: Providing these French-language code charts for Unicode 13.0 does not imply that the Unicode Consortium will create such code charts (in French or in other languages besides English) for future versions of the Unicode Standard. Unlike Unicode character names in English, their French-language equivalents are not a normative part of the Unicode Standard.


Thursday, June 18, 2020

Unicode Regular Expressions v21 Released

Regular expressions are a powerful tool for using patterns to search and modify text, and are vital in many programs, programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. The new version 21 broadens the scope of properties for regular expressions (regex) to allow for properties of strings (such as for emoji sequences). For example, the following matches all emoji flags except the French flag:
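In UTS #18 notation, an expression of this kind is a set difference over a property of strings, along the lines of `[\p{RGI_Emoji_Flag_Sequence}--\q{🇫🇷}]`. Python's standard `re` module does not support properties of strings, so the sketch below only approximates the idea: it assumes that any pair of regional indicator symbols counts as a flag, which is broader than the RGI flag set and ignores subdivision flags built from tag sequences. The names `FLAG`, `FRENCH_FLAG`, and `flags_except_french` are illustrative, not part of any specification.

```python
import re

# Assumption: a "flag" is any pair of regional indicator symbols
# (U+1F1E6..U+1F1FF). This is broader than the RGI flag set.
FLAG = re.compile("[\U0001F1E6-\U0001F1FF]{2}")

# The French flag is the pair REGIONAL INDICATOR SYMBOL LETTER F + R.
FRENCH_FLAG = "\U0001F1EB\U0001F1F7"


def flags_except_french(text):
    """Return every flag-like pair in text, minus the French flag.

    Matching pairs first and filtering afterwards keeps the pairing
    aligned inside a run of adjacent flags; the filter plays the role
    of the set difference.
    """
    return [flag for flag in FLAG.findall(text) if flag != FRENCH_FLAG]


if __name__ == "__main__":
    sample = "\U0001F1E9\U0001F1EA \U0001F1EB\U0001F1F7 \U0001F1EF\U0001F1F5"
    print(flags_except_french(sample))
```

Filtering after matching, rather than using a negative lookahead, avoids a subtle misalignment: a lookahead that rejects the French flag at one position would let the engine retry one symbol later and pair the second half of that flag with the first half of the next one.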


Among the improvements are:
  • Provides a new Annex D: Resolving Character Classes with Strings for handling negations of sets of strings.
  • Updates the full property list to include the latest UCD properties, plus Emoji properties and UTS #39 properties.
  • Removes obsolete text passages, and makes editorial changes for clarity.


Unicode Consortium Announces New Additions to Leadership Team

We are pleased to announce the following leadership additions at the Unicode Consortium. “Each of these individuals brings deep expertise in their field,” said Mark Davis, president of the Consortium. “They have already made significant improvements in their new roles.”

Unicode Emoji Subcommittee

Chair: Jennifer Daniel
Jennifer Daniel’s first contribution to Unicode was standardizing gender-inclusive representations in emoji. As a designer, author and former graphics editor at the New York Times, she now explores communication and messaging through verbal, written, auditory and visual expression at a small ad company called Google. Jennifer is a co-author and illustrator of a number of graphics books including How to Be Human, Space!, and The Origins of Almost Everything. Her work has been recognized by the Walker Art Museum, Society of Illustrators and published in the New Yorker, The Washington Post, and Time Magazine to name a few. She has had the honor to serve as a judge for the Society of News Design, Online News Association, Society of Illustrators, American Illustration, Data is Beautiful and the Art Director's Club. She lives in Berkeley, California but also in cyberspace.

Vice Chair: Ned Holbrook
Ned Holbrook is a typographic engineer at Apple, specializing in text layout and fonts. He was one of the participants in the industry-wide effort to standardize variable font technology in OpenType. He previously worked on wireless networking, virtualization, digital audio, embedded graphics, and remote filesystems.

Unicode CLDR Committee

Vice Chair: Kristi Lee
Kristi Lee is vice chair of the CLDR technical committee, where she represents Microsoft. She joined Microsoft in 1997 and has worked in a number of different divisions and product development groups. Her focus has been delivering solutions to international customers in localization and internationalization. She holds a mathematics degree from the University of Washington. Currently, she is in the Corporate division at Microsoft and works with engineering groups across the company, including Windows, .NET, Office, and others, on topics relating to CLDR and i18n.

Executive Officer

General Counsel: Anne Gundelfinger
Anne is an experienced legal executive with 30 years in private practice and in-house legal roles. From 2013 to 2019 she served as vice president for global intellectual property for Swarovski, a global fashion jewelry brand based in central Europe. Before that she held various positions over a decade in the Intel legal department, including vice president for global public policy, vice president for global sales & marketing legal affairs, and director of trademarks & brands. Early in her career she was an associate at Fenwick & West and director of trademarks at Sun Microsystems. Since retiring from Swarovski, Anne has been a consultant and has served as a World Intellectual Property Organization domain name panelist under the Uniform Dispute Resolution Policy of ICANN. Anne has long been a leader in the global IP bar. She served on the Board of Directors of the International Trademark Association for nearly a decade and served as the Association’s president in 2005.

Mark Davis, the former chair of the emoji subcommittee, will continue to contribute to the emoji subcommittee and serve as president of the Unicode Consortium. “I’d also like to thank John Emmons for his many years of service as chair and vice chair of the CLDR technical committee,” said Davis. “Especially for his work in promoting support for digitally disadvantaged languages.”
