Unicode® ICU 64 has just been released. It updates to Unicode 12 and to CLDR 35 locale data with many additions and corrections, and some new languages. ICU adds a data filtering/subsetting mechanism, an improved formatting API, and a C++ LocaleBuilder.
ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).
For details please see http://site.icu-project.org/download/64.
Friday, March 29, 2019
Wednesday, March 27, 2019
Unicode CLDR Version 35 Language/Locale Data Released
Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.
CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.
Data | 70,000+ new data fields, 13,400+ revised data fields |
Basic coverage | New languages at Basic coverage: Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo) |
Modern coverage | Languages Somali (so) and Javanese (jv) increased coverage from Moderate to Modern |
Emoji 12.0 | Names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names and keywords |
Collation | Collation updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation |
Measurement units | 23 additional units |
Date formats | Two additional flexible formats, and 20 new interval formats |
Japanese calendar | In Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (which include 年), and to consistently use narrow eras in numeric date formats such as “H31/3/27”. |
Region Names | Many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ). |
Segmentation | Enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva. |
A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.
For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes.
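The Japanese calendar change in the summary can be illustrated with a small sketch. It is not CLDR data or an ICU API: the function names are hypothetical, only the Heisei era is modelled, and the two pattern shapes are simplified stand-ins for the real locale patterns. It shows the distinction described above: numeric formats use a narrow era letter such as "H", while formats that include 年 write the first year of an era as 元年 (Gannen) rather than 1年.

```python
from datetime import date

# Assumption for this sketch: only the Heisei era is modelled,
# and its end date is not checked.
HEISEI_START = date(1989, 1, 8)


def heisei_year(d: date) -> int:
    """Return the year number within the Heisei era (first year is 1)."""
    if d < HEISEI_START:
        raise ValueError("date precedes the Heisei era")
    return d.year - HEISEI_START.year + 1


def format_numeric(d: date) -> str:
    """Numeric format: narrow era letter plus a plain year number."""
    return f"H{heisei_year(d)}/{d.month}/{d.day}"


def format_with_nen(d: date) -> str:
    """Non-numeric format (contains 年): year 1 is written as 元年."""
    year = heisei_year(d)
    year_text = "元" if year == 1 else str(year)
    return f"平成{year_text}年{d.month}月{d.day}日"


if __name__ == "__main__":
    print(format_numeric(date(2019, 3, 27)))
    print(format_with_nen(date(2019, 3, 27)))
    print(format_with_nen(date(1989, 1, 8)))
```

In production code the era boundaries and patterns would come from the locale data itself instead of being hard-coded as they are here.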
Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages
Tuesday, March 19, 2019
Adopt-A-Character Grant to Support Maya Inscriptional Hieroglyphs
The Adopt-a-Character Program is awarding the third in a series of grants to support the encoding of Maya hieroglyphs and their study by researchers. This grant is a continuation of earlier AAC-funded efforts to incorporate information from Maya codices into a multidimensional database. The work in 2019 will focus on hieroglyphs inscribed on monuments, and will fund work to advance the understanding of the corpus of inscriptional hieroglyphs by including this dataset in the multidimensional database developed for the Maya script. This work will further understanding of an appropriate encoding model for these complex hieroglyphs and will also provide support for new research work on the Maya script through the updated database.
The work will be led by Dr. Gabrielle Vail (Research Labs of Archaeology, University of North Carolina, Chapel Hill and Anthropology Program, University of South Florida, St. Petersburg) under the direction of Dr. Deborah Anderson (SEI, UC Berkeley).
The image included in this announcement is text from a lintel from the Maya site of Yaxchilan, Mexico. Photo by Gabrielle Vail.
Thursday, March 14, 2019
Emoji 12.0 Now Available for Adoption
The latest Unicode Emoji can now be adopted. See Emoji Recently Added, v12.0 for a full list.
New emoji include sloth, otter, waffle, ice cube, ringed planet, and flamingo.
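Whether a given system already knows these characters can be checked from Python's standard `unicodedata` module. The sketch below is only an illustration: the code points are not stated in the announcement and should be verified against the Unicode 12.0 code charts, and the helper names are hypothetical. The names are resolved only when the interpreter bundles Unicode 12.0 or later character data; otherwise a placeholder is returned.

```python
import unicodedata

# Code points assumed for the six emoji named above (not given in the post).
NEW_EMOJI = {
    "sloth": 0x1F9A5,
    "otter": 0x1F9A6,
    "waffle": 0x1F9C7,
    "ice cube": 0x1F9CA,
    "ringed planet": 0x1FA90,
    "flamingo": 0x1F9A9,
}

PLACEHOLDER = "<not in this Python's Unicode data>"


def unicode_data_version() -> tuple:
    """Version of the Unicode Character Database bundled with Python."""
    return tuple(int(part) for part in unicodedata.unidata_version.split("."))


def describe(code_point: int) -> str:
    """Return 'U+XXXX NAME', or a placeholder if the character is unknown."""
    name = unicodedata.name(chr(code_point), PLACEHOLDER)
    return f"U+{code_point:04X} {name}"


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for label, code_point in NEW_EMOJI.items():
        print(f"{label}: {describe(code_point)}")
```

A placeholder in the output only means that the interpreter's character data predates the emoji, not that the code point is invalid.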
You can adopt one of the new emoji yourself, or for friends, family, and so on. While the new emoji will appear on mobile phones and other devices later this year, you can adopt them right now! Gold level adoptions are special — if you adopt an emoji at the gold level, you are guaranteed to be the only sponsor at that level.
Your sponsorship helps to support the Unicode Consortium’s mission to enable a growing number of languages to be used on computers. The Adopt-a-Character program funds work on digitally disadvantaged languages, both modern and historic. In 2018 and 2019 the program awarded grants to support work on improved keyboard layouts, additional work on Mayan hieroglyphs, and more historic Indic scripts, among others.
You can now also adopt any of the nearly 500 other characters in Unicode 12.0, and of course you can adopt from any of the over 136,000 characters already in Unicode.
For more information on the program, or to adopt a character, see the Adopt-a-Character Page.
Tuesday, March 5, 2019
Announcing The Unicode® Standard, Version 12.0
Version 12.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 554 characters, for a total of 137,928 characters. These additions include four new scripts, for a total of 150 scripts, as well as 61 new emoji characters.
The new scripts and characters in Version 12.0 add support for lesser-used languages and unique written requirements worldwide, including:
- Elymaic, historically used to write Achaemenid Aramaic in the southwestern portion of modern-day Iran
- Nandinagari, historically used to write Sanskrit and Kannada in southern India
- Nyiakeng Puachue Hmong, used to write modern White Hmong and Green Hmong languages in Laos, Thailand, Vietnam, France, Australia, Canada, and the United States
- Wancho, used to write the modern Wancho language in India, Myanmar, and Bhutan
- Miao script additions to write several Miao and Yi dialects in China
- Hiragana and Katakana small letters, used to write archaic Japanese
- Tamil historic fractions and symbols, used in South India
- Lao letters used to write Pali
- Latin letters used in Egyptological and Ugaritic transliteration
- Hieroglyph format controls, enabling full formatting of quadrats for Egyptian Hieroglyphs
Popular symbol additions include:
- 61 emoji characters, including several new emoji for accessibility
- Marca registrada sign
- Heterodox and fairy chess symbols
Also in Version 12.0, the following Unicode Standard Annexes have notable modifications, often in coordination with changes to character properties. In particular, there are changes to:
- UAX #14, Unicode Line Breaking Algorithm
- UAX #29, Unicode Text Segmentation
- UAX #31, Unicode Identifier and Pattern Syntax
- UAX #38, Unicode Han Database (Unihan)
- UAX #45, U-Source Ideographs
- UTS #10, Unicode Collation Algorithm—sorting Unicode text
- UTS #39, Unicode Security Mechanisms—reducing Unicode spoofing
- UTS #46, Unicode IDNA Compatibility Processing—compatible processing of non-ASCII URLs