Thursday, June 18, 2020

Unicode Regular Expressions v21 Released

Regular expressions are a powerful tool for using patterns to search and modify text, and are vital in many programs, programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. The new version 21 broadens the scope of properties for regular expressions (regex) to allow for properties of strings (such as for emoji sequences). For example, the following matches all emoji flags except the French flag:

/[\p{RGI_Emoji_Flag_Sequence}--\q{🇫🇷}]/
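Python's standard `re` module does not support properties of strings, so the following is only a minimal sketch of the example above. It uses a tiny hypothetical stand-in for `\p{RGI_Emoji_Flag_Sequence}`: four flags, not the full RGI list. The sketch treats a character class with strings as a set of strings and applies the `--` difference. It then compiles the result into an alternation that tries longer strings first, in keeping with the longest-first matching UTS #18 describes for such classes. The names `flag`, `FLAG_SEQUENCES`, and `class_to_pattern` are illustrative, not part of any standard API.

```python
import re

def flag(code: str) -> str:
    """Build a flag emoji from a two-letter region code using Regional Indicator Symbols."""
    base = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(base + ord(c) - ord("A")) for c in code.upper())

# Hypothetical stand-in for \p{RGI_Emoji_Flag_Sequence}: a few flags, not the full RGI list.
FLAG_SEQUENCES = {flag(c) for c in ("FR", "DE", "JP", "US")}

def class_to_pattern(strings: set[str]) -> str:
    """Turn a character class containing strings into an equivalent alternation.

    Longer strings are listed first so the regex engine tries them before
    shorter ones (the longest-first rule for classes with strings).
    """
    ordered = sorted(strings, key=lambda s: (-len(s), s))
    return "(?:" + "|".join(re.escape(s) for s in ordered) + ")"

# Set difference, mirroring [\p{RGI_Emoji_Flag_Sequence}--\q{🇫🇷}]
pattern = re.compile(class_to_pattern(FLAG_SEQUENCES - {flag("FR")}))

text = f"Flags: {flag('FR')} {flag('DE')} {flag('JP')}"
print(pattern.findall(text))  # the German and Japanese flags, but not the French one
```

A real engine would compute the difference over the full property data. The point here is only that a class with strings behaves like a set of strings, so ordinary set operations apply before matching.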

Among the improvements are:
  • Provides a new Annex D: Resolving Character Classes with Strings for handling negations of sets of strings.
  • Updates the full property list to include the latest UCD properties, plus Emoji properties and UTS #39 properties.
  • Removes obsolete text passages, and makes editorial changes for clarity.


Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages


Unicode Consortium Announces New Additions to Leadership Team

We are pleased to announce the following leadership additions at the Unicode Consortium. “Each of these individuals brings deep expertise in their field,” said Mark Davis, president of the Consortium. “They have already made significant improvements in their new roles.”

Unicode Emoji Subcommittee

Chair: Jennifer Daniel
Jennifer Daniel’s first contribution to Unicode was standardizing gender-inclusive representations in emoji. A designer, author, and former graphics editor at the New York Times, she now explores communication and messaging through verbal, written, auditory, and visual expression at a small ad company called Google. Jennifer has co-authored and illustrated a number of graphics books, including How to Be Human, Space!, and the Origins of Almost Everything. Her work has been recognized by the Walker Art Museum and the Society of Illustrators, and has been published in the New Yorker, The Washington Post, and Time Magazine, to name a few. She has had the honor to serve as a judge for the Society of News Design, the Online News Association, the Society of Illustrators, American Illustration, Data is Beautiful, and the Art Director's Club. She lives in Berkeley, California, but also in cyberspace.

Vice Chair: Ned Holbrook
Ned Holbrook is a typographic engineer at Apple, specializing in text layout and fonts. He was one of the participants in the industry-wide effort to standardize variable font technology in OpenType. He previously worked on wireless networking, virtualization, digital audio, embedded graphics, and remote filesystems.

Unicode CLDR Committee

Vice Chair: Kristi Lee
Kristi Lee is the vice chair of the CLDR technical committee, where she represents Microsoft. She joined Microsoft in 1997 and has worked in a number of divisions and product development groups, focusing on delivering localization and internationalization solutions to international customers. She holds a mathematics degree from the University of Washington. She currently works in Microsoft's Corporate division with engineering groups across the company, including Windows, .NET, Office, and others, on topics relating to CLDR and i18n.

Executive Officer

General Counsel: Anne Gundelfinger
Anne is an experienced legal executive with 30 years in private practice and in-house legal roles. From 2013 to 2019 she served as vice president for global intellectual property for Swarovski, a global fashion jewelry brand based in central Europe. Before that, she spent a decade in the Intel legal department in various positions, including vice president for global public policy, vice president for global sales & marketing legal affairs, and director of trademarks & brands. Early in her career she was an associate at Fenwick & West and director of trademarks at Sun Microsystems. Since retiring from Swarovski, Anne has been a consultant and has served as a World Intellectual Property Organization domain name panelist under the Uniform Dispute Resolution Policy of ICANN. Anne has long been a leader in the global IP bar. She served on the Board of Directors of the International Trademark Association for nearly a decade and served as the Association’s president in 2005.

Mark Davis, the former chair of the emoji subcommittee, will continue to contribute to the emoji subcommittee and serve as president of the Unicode Consortium. “I’d also like to thank John Emmons for his many years of service as chair and vice chair of the CLDR technical committee,” said Davis. “Especially for his work in promoting support for digitally disadvantaged languages.”



Friday, June 12, 2020

Unicode 13.0 Paperback Available

The Unicode 13.0 core specification is now available in paperback book form with a new, original cover design by Huijun Shan. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 13.0 of the Unicode Standard.

Each of the two volumes is a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. Please visit the separate description pages for Volume 1 and Volume 2 to order each volume in the set. The cost for the pair is US $29.58, plus shipping and taxes (if applicable).

Note that these volumes do not include the Version 13.0 code charts, nor do they include the Version 13.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 13.0 - Core Specification Volume 1 and Volume 2



Wednesday, June 10, 2020

PRI #418: Registration of additional sequences in the MSARG collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #418: A submission for the “Registration of additional sequences in the MSARG collection” has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2020-09-11. Please see the submission page for details and instructions on how to review this issue and provide comments:

https://www.unicode.org/ivd/pri/pri418/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for ideographs, which enables standardized interchange in plain text, in accordance with UTS #37.
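In plain text, an ideographic variation sequence is simply a base ideograph followed by one of the variation selectors VS17 through VS256 (U+E0100..U+E01EF). The Python sketch below builds and splits such sequences. It does not consult the IVD itself, so it cannot tell whether a given sequence is actually registered. The helper names `make_ivs` and `split_ivs` and the choice of U+845B as a base character are illustrative assumptions.

```python
# Variation selectors VS17..VS256 (U+E0100..U+E01EF) are the range used for
# ideographic variation sequences.
VS17 = 0xE0100
VS256 = 0xE01EF

def make_ivs(base: str, selector: int) -> str:
    """Append variation selector number `selector` (17..256) to one base character.

    Note: this does not check that `base` is an ideograph or that the
    resulting sequence is registered in the IVD.
    """
    if len(base) != 1:
        raise ValueError("base must be a single code point")
    if not 17 <= selector <= 256:
        raise ValueError("ideographic variation selectors run from VS17 to VS256")
    return base + chr(VS17 + selector - 17)

def split_ivs(seq: str) -> tuple[str, int]:
    """Split a two-code-point sequence back into its base and selector number."""
    if len(seq) != 2 or not VS17 <= ord(seq[1]) <= VS256:
        raise ValueError("not an ideographic variation sequence")
    return seq[0], ord(seq[1]) - VS17 + 17

ivs = make_ivs("\u845B", 17)  # U+845B followed by VARIATION SELECTOR-17
print([f"U+{ord(c):04X}" for c in ivs])  # ['U+845B', 'U+E0100']
```

Because the sequence is just two ordinary code points, it survives any plain-text channel that preserves Unicode. A renderer without the corresponding glyph variant can ignore the selector and fall back to the base ideograph.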

Friday, April 24, 2020

ICU 67 Released

Unicode® ICU 67 has just been released. ICU 67 updates to CLDR 37 locale data with many additions and corrections. This release also includes the updates to Unicode 13, subsuming the special CLDR 36.1 and ICU 66 releases. ICU 67 includes many bug fixes for date and number formatting, including enhanced support for user preferences in the locale identifier. The LocaleMatcher code and data are improved, and number skeletons have a new “concise” form that can be used in MessageFormat strings.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see http://site.icu-project.org/download/67.