Wednesday, June 12, 2019

Unicode 12.0 Paperback Available

The Unicode 12.0 core specification is now available in paperback book form with a new, original cover design by Monica Tang. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 12.0 of the Unicode Standard.

Each of the two volumes is a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. The cost for the pair is US $23.46, plus shipping and taxes (if applicable). Please visit the description page to order.

Note that these volumes do not include the Version 12.0 code charts, nor do they include the Version 12.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 12.0 - Core Specification


Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages

[badge]

Monday, June 10, 2019

Unicode Adlam Chart Font Updated

The Unicode Consortium has recently updated the current code charts for the Adlam script, specifically to provide improved reference glyphs that align with current community practice. The new font is an updated Ebrima font, with the update coordinated by Judy Safran-Aasen and the font designed by Jamra Patel.

The new Adlam code chart can be viewed along with all of the current code charts.

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard and its associated standards and data form the foundation for CLDR and ICU releases.



Monday, June 3, 2019

CLDR v36 open for data submission

The Unicode CLDR Technical Committee is pleased to announce the opening of the CLDR Survey Tool for general data submission. CLDR relies on community contributions for its ongoing data refinement and to offer new data to the CLDR user community. The collected data will be released as Version 36 on October 15.

Unicode CLDR provides key building blocks for software to support the world's languages, and is used by much of the world’s software — for example, all major browsers and all modern mobile phones use CLDR for language support.

Version 36 is focusing on:
  • New measurement units and patterns
  • New names and search keywords for the draft candidate emoji for Emoji 13.0 (scheduled for release in 2020Q1)
  • Adding more locales for data contributions
  • Fleshing out Islamic calendar support
  • Improving translation quality in general
For more information on contributing to CLDR, see the CLDR Information Hub. If you would like to contribute missing data for your language, see Survey Tool Accounts.

The Common Locale Data Repository (CLDR) provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks as:
  • Locale-specific patterns for formatting and parsing: dates, times, timezones, numbers and currency values, measurement units,…
  • Translations of names: languages, scripts, countries and regions, currencies, eras, months, weekdays, day periods, time zones, cities, and time units, emoji characters and sequences (and search keywords),…
  • Language & script information: characters used; plural cases; gender of lists; capitalization; rules for sorting & searching; writing direction; transliteration rules; rules for spelling out numbers; rules for segmenting text into graphemes, words, and sentences; keyboard layouts;…
  • Country information: language usage, currency information, calendar preference, week conventions,…
  • Validity: Definitions, aliases, and validity information for Unicode locales, languages, scripts, regions, and extensions,…




Monday, May 13, 2019

IUC 43 Program Announced


Join Us At IUC 43!

Trained, Tested, Trusted: Understand best practices in process and among teams reliably delivering high quality global products. Examine how developers build, test, and deploy great global products. Explore technologies for design, localization, multilingual testing, workflow management, and content management.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

For over 28 years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. Join us in Santa Clara to promote your ideas and experiences working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

Track and Session Topics to Include:
  • Automation
  • Emojis
  • Internationalization
  • Programming
  • Case Studies
  • ICU/CLDR
  • Localization
  • Scripts

GOLD SPONSOR:
Adobe
MEDIA SPONSOR:
Multilingual

Tuesday, May 7, 2019

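Unicode Consortium Officially Releases Unicode 12.1 with Support for Reiwa (令和)

An English (英語) version of this announcement is also available.

Information and data files for Version 12.1 have been published. Version 12.1 adds only a single character, the ligature for the new Japanese era name Reiwa (令和), bringing the total number of characters to 137,929.

The added character is essential when ligatures are used to display dates in the Japanese calendar. The members of the Unicode Consortium are pleased to have been able to support this ligature immediately after the Japanese government announced the new era name. The Unicode Common Locale Data Repository (CLDR) and International Components for Unicode (ICU) have also been updated to support Reiwa.

All major files have been updated, including the following:
  • Unicode character data files: UCD data files
  • Unicode Collation Algorithm: DUCET data files
The Unicode Standard is the foundation for modern software and global communications, including operating systems, browsers, mobile devices, and the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard consists of data relating to the character set standard, together with CLDR and ICU.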

(Translated into Japanese by Mina Nishimura, Adobe Inc.)



Unicode Version 12.1 released in support of the Reiwa Era

A Japanese (日本語) version of this announcement is also available.

Version 12.1 of the Unicode Standard is now available with updated data files. This version adds a single important character, the square ligature for the name of the new Japanese era, Reiwa (令和), for a total of 137,929 characters.

This new character in Version 12.1 adds critical support for those Japanese implementations that depend on the ligature form of Japanese era names when presenting calendar information. The Unicode Consortium and its members are pleased to add support for this important new character with a timely release of Unicode Version 12.1, shortly after the selection of the name was finalized by the Japanese government. The Unicode Common Locale Data Repository (CLDR) and International Components for Unicode (ICU) have also been updated for the new calendar data during this time.
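
As a rough illustration of what the ligature form means for implementations, the sketch below uses Python's standard `unicodedata` module to expand square era-name ligatures into their two-kanji spelling via compatibility decomposition. The helper names `expand_era_ligatures` and `has_unicode_12_1` are hypothetical, and the Reiwa code point U+32FF is only decomposed when the interpreter's bundled Unicode data is at Version 12.1 or later; older builds leave it unchanged.

```python
import unicodedata

# Square era-name ligatures. U+32FF (Reiwa) is the character added in
# Unicode 12.1; the other four were already encoded in earlier versions.
ERA_LIGATURES = {
    "\u337E": "Meiji",
    "\u337D": "Taisho",
    "\u337C": "Showa",
    "\u337B": "Heisei",
    "\u32FF": "Reiwa",
}


def has_unicode_12_1() -> bool:
    """Report whether this Python build ships Unicode data >= 12.1."""
    major, minor = (int(p) for p in unicodedata.unidata_version.split(".")[:2])
    return (major, minor) >= (12, 1)


def expand_era_ligatures(text: str) -> str:
    """Replace each era ligature with its compatibility decomposition.

    Only the characters listed in ERA_LIGATURES are touched, so other
    compatibility characters in the text are left as they are.
    """
    return "".join(
        unicodedata.normalize("NFKD", ch) if ch in ERA_LIGATURES else ch
        for ch in text
    )


if __name__ == "__main__":
    # Heisei ligature followed by "31年" becomes "平成31年".
    print(expand_era_ligatures("\u337B31年"))
    if has_unicode_12_1():
        print(unicodedata.name("\u32FF"), expand_era_ligatures("\u32FF"))
```

Restricting the expansion to the five era characters, rather than normalizing the whole string, is a design choice of this sketch so that unrelated text is not altered.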

Critical data files have been updated for Version 12.1, including:
  • Unicode Character Database—UCD data files
  • Unicode Collation Algorithm—DUCET data files
The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard and its associated standards and data form the foundation for CLDR and ICU releases.



Wednesday, April 17, 2019

CLDR Version 35.1 Language/Locale Data Released for Reiwa Era, Unicode 12.1

Unicode CLDR 35.1 is a dot release focusing on calendar and date format support for the new Reiwa era in Japan, including support for the upcoming Unicode 12.1. Version 35.1 is the latest version of CLDR, the core open-source language data that major software systems use to adapt software to the conventions of over 80 different languages. The open-source Unicode ICU library incorporates the CLDR Version 35.1 data as part of its ICU 64.2 release. ICU code is used by many products for Unicode and language support, including Android, Cloudant, ChromeOS, Db2, iOS, macOS, Windows, and many others.

In addition to updates related to the new Reiwa era, the CLDR 35.1 release includes a small number of other updates, including more localized name updates for North Macedonia, and support for tzdata 2019a.

For further details and links to documentation, see the CLDR 35.1 Release Notes.


Friday, March 29, 2019

ICU 64 Released

Unicode® ICU 64 has just been released. It updates to Unicode 12 and to CLDR 35 locale data with many additions and corrections, and some new languages. ICU adds a data filtering/subsetting mechanism, an improved formatting API, and a C++ LocaleBuilder.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/64.

Wednesday, March 27, 2019

Unicode CLDR Version 35 Language/Locale Data Released

Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.

  • Data: 70,000+ new data fields, 13,400+ revised data fields
  • Basic coverage: new languages at Basic coverage are Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo)
  • Modern coverage: Somali (so) and Javanese (jv) increased coverage from Moderate to Modern
  • Emoji 12.0: names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names & keywords
  • Collation: updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation
  • Measurement units: 23 additional units
  • Date formats: two additional flexible formats, and 20 new interval formats
  • Japanese calendar: in the Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (which include 年), and to consistently use narrow eras in numeric date formats such as “H31/3/27”
  • Region names: many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ)
  • Segmentation: enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva
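
To make the Japanese calendar change above more concrete, here is a minimal hand-written sketch of the two behaviors it describes: Gannen (元年) numbering for the first year of an era in formats that include 年, and narrow era letters in numeric formats. It does not read CLDR data; the `ERAS` table, the function names, and the exact output patterns are assumptions chosen for illustration, and only the three most recent eras are covered.

```python
from datetime import date

# (Gregorian start date, kanji era name, narrow Latin abbreviation), newest first.
# Illustrative table only; a real implementation would take this from CLDR.
ERAS = [
    (date(2019, 5, 1), "令和", "R"),
    (date(1989, 1, 8), "平成", "H"),
    (date(1926, 12, 25), "昭和", "S"),
]


def era_for(d: date):
    """Return the (start, kanji, narrow) entry for the era containing d."""
    for start, kanji, narrow in ERAS:
        if d >= start:
            return start, kanji, narrow
    raise ValueError("date precedes the eras covered by this sketch")


def format_japanese(d: date, numeric: bool = False) -> str:
    """Format d in the Japanese calendar.

    numeric=True  -> narrow era plus numbers, e.g. "H31/3/27"
    numeric=False -> kanji form with 年; year 1 is written 元 (Gannen)
    """
    start, kanji, narrow = era_for(d)
    era_year = d.year - start.year + 1
    if numeric:
        return f"{narrow}{era_year}/{d.month}/{d.day}"
    year_text = "元" if era_year == 1 else str(era_year)
    return f"{kanji}{year_text}年{d.month}月{d.day}日"


if __name__ == "__main__":
    print(format_japanese(date(2019, 3, 27), numeric=True))  # H31/3/27
    print(format_japanese(date(2019, 5, 1)))                 # 令和元年5月1日
```

Note that the numeric form keeps the digit 1 for the first year, since Gannen numbering applies only to the non-numeric formats.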

A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.

For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes.


Tuesday, March 19, 2019

Adopt-A-Character Grant to Support Maya Inscriptional Hieroglyphs

The Adopt-a-Character Program is awarding the third in a series of grants to support the encoding of Maya hieroglyphs and their study by researchers. This grant is a continuation of earlier AAC-funded efforts to incorporate information from Maya codices into a multidimensional database. The work in 2019 will focus on hieroglyphs inscribed on monuments, and will fund work to advance the understanding of the corpus of inscriptional hieroglyphs by including this dataset in the multidimensional database developed for the Maya script. This work will further understanding of an appropriate encoding model for these complex hieroglyphs and will also provide support for new research work on the Maya script through the updated database.

The work will be led by Dr. Gabrielle Vail (Research Labs of Archaeology, University of North Carolina, Chapel Hill and Anthropology Program, University of South Florida, St. Petersburg) under the direction of Dr. Deborah Anderson (SEI, UC Berkeley).

The image included in this announcement is text from a lintel from the Maya site of Yaxchilan, Mexico. Photo by Gabrielle Vail.



Thursday, March 14, 2019

Emoji 12.0 Now Available for Adoption

The latest Unicode Emoji can now be adopted. See Emoji Recently Added, v12.0 for a full list.


New emoji include sloth, otter, waffle, ice cube, ringed planet, and flamingo.

You can adopt one of the new emoji yourself, or for friends, family, and so on. While the new emoji will appear on mobile phones and other devices later this year, you can adopt them right now! Gold level adoptions are special — if you adopt an emoji at the gold level, you are guaranteed to be the only sponsor at that level.

Your sponsorship helps to support the Unicode Consortium’s mission to enable a growing number of languages to be used on computers. The Adopt-a-Character program funds work on digitally disadvantaged languages, both modern and historic. In 2018 and 2019 the program awarded grants to support work on improved keyboard layouts, additional work on Mayan hieroglyphs, and more historic Indic scripts, among others.

You can now also adopt any of the nearly 500 other characters in Unicode 12.0, and of course you can adopt from any of the over 136,000 characters already in Unicode.

For more information on the program, or to adopt a character, see the Adopt-a-Character Page.



Tuesday, March 5, 2019

Announcing The Unicode® Standard, Version 12.0

Version 12.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 554 characters, for a total of 137,928 characters. These additions include four new scripts, for a total of 150 scripts, as well as 61 new emoji characters.

The new scripts and characters in Version 12.0 add support for lesser-used languages and unique written requirements worldwide, including:
  • Elymaic, historically used to write Achaemenid Aramaic in the southwestern portion of modern-day Iran
  • Nandinagari, historically used to write Sanskrit and Kannada in southern India
  • Nyiakeng Puachue Hmong, used to write modern White Hmong and Green Hmong languages in Laos, Thailand, Vietnam, France, Australia, Canada, and the United States
  • Wancho, used to write the modern Wancho language in India, Myanmar, and Bhutan
Additional support for lesser-used languages and scholarly work was extended worldwide, including:
  • Miao script additions to write several Miao and Yi dialects in China
  • Hiragana and Katakana small letters, used to write archaic Japanese
  • Tamil historic fractions and symbols, used in South India
  • Lao letters used to write Pali
  • Latin letters used in Egyptological and Ugaritic transliteration
  • Hieroglyph format controls, enabling full formatting of quadrats for Egyptian Hieroglyphs
The Egyptian temple ceiling painting shown above (from the Wikipedia article on Medinet Habu) includes a line of hieroglyphic text. That exact text is rendered again below the painting, represented in Unicode plain text, illustrating the use of the new hieroglyphic format controls, as well as cartouche brackets and directional controls. The example was developed by Andrew Glass, based on Microsoft’s Segoe UI Historic font, with outlines designed by James P. Allen.

Popular symbol additions include:
  • 61 emoji characters, including several new emoji for accessibility
  • Marca registrada sign
  • Heterodox and fairy chess symbols
For the full list of new emoji characters, see emoji additions for Unicode 12.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji. Version 12.0 also includes additional guidelines on gender and skin tone included in UTS #51 and data files.

Also in Version 12.0, several Unicode Standard Annexes have notable modifications, often in coordination with changes to character properties. Three other important Unicode specifications have also been updated for Version 12.0.

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.




Wednesday, February 27, 2019

Unicode CLDR 35 alpha available for testing

The alpha version of Unicode CLDR 35 is available for testing. The alpha period lasts until the beta release on March 13, which will include updates to the LDML spec. The final release is expected on March 27.

Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 35 included a limited Survey Tool data collection phase, adding approximately 54 thousand new translated fields:

  • Basic coverage: new languages at Basic coverage are Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo)
  • Modern coverage: Somali (so) and Javanese (jv) have increased coverage from Moderate to Modern
  • Emoji 12.0: names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names & keywords
  • Collation: updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation
  • Measurement units: 23 additional units
  • Date formats: two additional flexible formats, and 20 new interval formats
  • Japanese calendar: updated to Gannen (元年) number format
  • Region names: many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ)

A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.

For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes, Growth.

Tuesday, February 5, 2019

Unicode Emoji 12.0 — final for 2019

Emoji 12.0 data has been released, with 59 new emoji such as:
  • mechanical arm
  • deaf person
  • people holding hands
  • otter
  • waffle
  • ice cube
  • ringed planet
  • drop of blood

With 171 variants for gender and skin tone, this makes a total of 230 emoji including variants.

The new emoji are listed in Emoji Recently Added v12.0, with sample images. These images are just samples: vendors for mobile phones, PCs, and web platforms will typically use images that fit their overall emoji designs. In particular, the Emoji Ordering v12.0 chart shows how the new emoji sort compared to the others, with new emoji marked with rounded-rectangles. The other Emoji Charts for Version 12.0 have been updated to show the emoji.

The new emoji typically start showing up on mobile phones in September/October — some platforms may release them earlier. The new emoji will soon be available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

For implementers:
  1. The new Emoji 12.0 set includes the data needed for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 12.0, scheduled for March 5.
  2. The emoji specification (UTS #51) has additional guidelines on gender and skin tone, and other clarifications. The definitions in UTS #51 and the data files have been enhanced to be more consistent and useful. For details, see Modifications.
  3. The people holding hands emoji now have four combinations of gender and all the various combinations of skin tones, for a total of 71 new variants. Implementations may optionally support skin-tone combinations for other multi-person emoji.
  4. The CLDR names and search keywords for the new emoji characters in over 80 languages, and the sort order for emoji, will be finalized by the end of March with the release of CLDR v35.



Thursday, January 31, 2019

Membership Fee Changes

The Unicode Consortium is announcing changes to membership fees and categories. These changes include both the addition of a new membership category as well as a periodic adjustment of membership fees for inflation. These fee changes put the Consortium in a stronger position to continue its mission to enable people around the world to use computers in any language by providing freely-available specifications and data. Note also that a new category for Supporting, non-profit has been created. All other existing non-profit and individual memberships will have no change.

As of June 1, 2019, the annual membership fee will change as described in the following table:

Membership Level | Current Fee | New Fee
Full | $18,000 | $21,000
Institutional, governmental | $12,000 | $14,000
Supporting, for-profit organization | $7,500 | $8,750
Supporting, non-profit organization | N/A | $5,000
Associate, for-profit organization | $2,500 | $2,900

Existing members may renew their membership early at the current fee if they renew by May 31, 2019.

The Consortium continues to offer a multi-year discount option for all membership levels, when renewal fees are paid in advance:
  • 10 years, 20% discount
  • 5 years, 10% discount
  • 3 years, 6% discount
The Consortium also offers lifetime memberships to individual members. For further information please contact the Unicode office.

