The Unicode Blog: 2019

Friday, November 22, 2019

Unicode solicits feedback on open emoji definition: QID Emoji

The QID Emoji Tag Sequences (or QID emoji, for short) have been proposed to further open up the process of defining new emoji.

The proposal is intended to provide a well-defined mechanism for implementations to support additional valid emoji that are not representable by Unicode characters or standard emoji sequences. This proposed new mechanism would allow for the interchange of emoji whose meaning is discoverable, and which should be correctly parsed by all conformant implementations (although only displayed by implementations that support it). The meaning of each of these valid emoji would be established by reference to a Wikidata QID.
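As a rough illustration of what "well-formed but not necessarily displayable" could mean in practice, the sketch below builds and parses a tag sequence that carries a Wikidata QID. The announcement does not spell out the encoding, so the layout here is an assumption modeled on existing emoji tag sequences (such as subdivision flags): a base character, the QID digits as tag characters, and U+E007F CANCEL TAG as terminator. The choice of U+1F194 as the base and the function names are likewise assumptions; the proposal itself is the authority on the actual syntax.

```python
from typing import Optional

# Assumed layout: base + tag digits + CANCEL TAG. Check the proposal for the real syntax.
QID_BASE = "\U0001F194"    # assumed base character
TAG_OFFSET = 0xE0000       # tag characters mirror ASCII at this offset
CANCEL_TAG = "\U000E007F"  # terminator used by existing emoji tag sequences


def encode_qid(qid: str) -> str:
    """Turn a Wikidata ID such as 'Q42' into a tag sequence (sketch)."""
    digits = qid[1:]
    if not (qid.startswith("Q") and digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a Wikidata QID: {qid!r}")
    tags = "".join(chr(TAG_OFFSET + ord(d)) for d in digits)
    return QID_BASE + tags + CANCEL_TAG


def decode_qid(sequence: str) -> Optional[str]:
    """Return the QID carried by a sequence, or None if it is not well-formed."""
    if not (sequence.startswith(QID_BASE) and sequence.endswith(CANCEL_TAG)):
        return None
    tags = sequence[len(QID_BASE):-len(CANCEL_TAG)]
    digits = "".join(chr(ord(t) - TAG_OFFSET) for t in tags)
    if not digits or not all("0" <= d <= "9" for d in digits):
        return None
    return "Q" + digits
```

An implementation that cannot display a given QID emoji could still use something like `decode_qid` to recover the identifier and look up its meaning.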

The Unicode Consortium would appreciate feedback on this proposal.

Over 136,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Thursday, November 21, 2019

Call for feedback on UTS #18: Unicode Regular Expressions

Regular expressions are a powerful tool for using patterns to search and modify text. They are a key component of many programming languages, databases, and spreadsheets.

Starting in 1999, UTS #18: Unicode Regular Expressions has supplied guidelines and conformance levels for supporting Unicode in regular expressions. A proposed update of that specification is now available for public review and comment. The following are the main modifications in this draft:

Broadened the scope of properties to allow for properties of strings (as well as properties of code points).
Added 11 Emoji properties including RGI sets as Full Properties in Level 2.
Added other new properties as Full Properties in Level 2: Equivalent_Unified_Ideograph, Vertical_Orientation, Regional_Indicator, Indic_Positional_Category, Indic_Syllabic_Category.
Provided a draft data file with property metadata for matching and validating non-UCD properties and their values for syntax such as \p{pname=pvalue}, so that such properties can be used in the same way as UCD properties. See Annex D.
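To make the `\p{pname=pvalue}` syntax concrete, here is a minimal sketch that parses such an expression and tests single code points against it. Python's built-in `re` module does not implement `\p{...}`, so this is a stand-in built on `unicodedata`, not a UTS #18 implementation. The helper name `property_matcher` is hypothetical, only three code point properties are covered, and only short value aliases (for example `Lu`, not `Uppercase_Letter`) are recognized.

```python
import re
import unicodedata

# Matches the literal text \p{name=value}
_PROPERTY = re.compile(r"\\p\{(\w+)=(\w+)\}")

# The few properties that the standard library exposes directly.
_LOOKUPS = {
    "gc": unicodedata.category,          # General_Category
    "bc": unicodedata.bidirectional,     # Bidi_Class
    "ea": unicodedata.east_asian_width,  # East_Asian_Width
}


def property_matcher(expression: str):
    """Return a predicate over single characters for a \\p{name=value} expression."""
    match = _PROPERTY.fullmatch(expression)
    if match is None:
        raise ValueError(f"not a property expression: {expression!r}")
    name, value = match.groups()
    if name not in _LOOKUPS:
        raise ValueError(f"unsupported property: {name}")
    lookup = _LOOKUPS[name]
    return lambda ch: lookup(ch) == value


is_upper = property_matcher(r"\p{gc=Lu}")
print(is_upper("A"), is_upper("a"))
```

Properties of strings, the main extension in the draft, would need a predicate over whole strings rather than single characters, which this sketch does not attempt.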

There are a number of review notes requesting feedback on these and other possible changes. In particular, the Unicode Technical Committee would appreciate feedback on the discussion of and syntax for properties of strings, and on the recommended properties to be supported at Level 2.

The review period closes on 2020-01-06. For more information on reviewing and supplying feedback, see Proposed Update UTS #18, Unicode Regular Expressions.

Tuesday, November 19, 2019

Unicode 13.0 Beta Review

The beta review period for Unicode 13.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 13.0 includes a number of changes and 5,930 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 13.0, often in coordination with changes to character properties. For the first time, a CJK extension has been encoded in plane 3, the Tertiary Ideographic Plane. Four new scripts have been added in Unicode 13.0. There are also 55 additional emoji characters and many other new emoji, including the transgender flag and polar bear.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 6, 2020. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-13.0.0.html for more information about testing the 13.0.0 beta.

See http://unicode.org/versions/Unicode13.0.0/ for the current draft summary of Unicode 13.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to https://home.unicode.org/membership/members/.


Monday, October 28, 2019

Unicode 2019 Bulldog Awards

Image of Heninger (left) and Lindenberg (right)

The Unicode Consortium announces the 2019 Bulldog Award recipients: Andy Heninger and Norbert Lindenberg.

Andy Heninger is recognized for many years of contributions to the work of the Consortium, including providing crucial implementations of segmentation and regular expression support in International Components for Unicode (ICU). Prior to having these functions in ICU, support for them in Unicode implementations was very limited. Both contributions are key to robust text support. For example, correct segmentation is what keeps family emoji from splitting apart!

Norbert Lindenberg has made significant contributions over the years to internationalizing the Web and has brought deep script expertise to the Unicode Script Ad Hoc group. He has contributed to the models of many of the Unicode Standard’s complex scripts, including Thai, Myanmar, Khmer, Javanese, and Tamil. His work has been used by organizations such as Mozilla, Yahoo!, Sun Microsystems, and Apple.

For many years, both have been bulldogs for robust Unicode text support.

More details of their many contributions can be found on the Unicode Bulldog Award page.


Monday, October 21, 2019

Emoji 12.1 release: 168 Emoji added

Emoji 12.1, with 168 new emoji, has been released. There are 138 new gender-neutral forms, so you will soon be able to text about people without specifying their gender. Thirty new combinations of people holding hands with various skin tones were also added.

The new emoji are listed in Emoji Recently Added v12.1, along with sample images. These images are merely samples: vendors for mobile phones, PCs, and web platforms will typically design their own fonts for emoji. The Emoji Ordering v12.1 chart shows how the new emoji should be sorted within the order of existing emoji, with new emoji marked with rounded rectangles. The other Emoji Charts for Version 12.1 have also been updated to show the new emoji.

Initial names and search keywords are available in different languages in Unicode CLDR 36, such as health worker (doctor, nurse, ...). Those will be refined during this quarter.

The new Emoji 12.1 data is available for vendors to use for their emoji fonts and code. These new emoji should start showing up on mobile phones in this quarter and next quarter. The new emoji will soon be available for adoption to help the Unicode Consortium’s work to support digitally-disadvantaged languages.


Wednesday, October 9, 2019

The Most Frequent Emoji

How does the Unicode Consortium choose which new emoji to add? One important factor is data about how frequently the current emoji are used. Patterns of usage help to inform decisions about future emoji. The Consortium has been working to assemble this information and make it available to the public.

And the two most frequently used emoji in the world are...

😂 and ❤️

The new Unicode Emoji Frequency page shows a list of the Unicode v12.0 emoji ranked in order of how frequently they are used.

“The forecasted frequency of use is a key factor in determining whether to encode new emoji, and for that it is important to know the frequency of use of existing emoji,” said Mark Davis, President of the Unicode Consortium. “Understanding how frequently emoji are used helps prioritize which categories to focus on and which emoji to add to the Standard.”


Friday, October 4, 2019

ICU 65 Released

Unicode® ICU 65 has just been released. It updates to CLDR 36 locale data with many additions and corrections, and some new measurement units. The Java LocaleMatcher API is improved, and ported to C++. For building ICU data, there are new filtering options, and new tracing support for data loading in ICU4C.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/65.

Unicode CLDR Version 36 Language/Locale Data Released

Unicode CLDR 36 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR 36 included a full Survey Tool data collection phase, adding approximately 32K new translated fields, with significant increases in moderate and/or modern coverage for: ceb (Cebuano), ha (Hausa / Latin script), ig (Igbo), kok (Konkani), qu (Quechua), to (Tongan), yo (Yoruba). Seed data was added for several new languages: cic (Chickasaw), mus (Muscogee), osa (Osage, Osage script); an (Aragonese), su (Sundanese, Latin script), szl (Silesian).

Enhancements in v36 include:

New Emoji 13 draft candidates’ names and search keywords are included in this release to support smooth adoption of the upcoming Emoji release (scheduled for release in 2020Q1 as part of Unicode 13)
New measurement units and patterns: dot-per-centimeter, dot-per-inch, em, megapixel, pixel, pixel-per-centimeter, pixel-per-inch; decade; therm-us; bar, pascal; and a pattern for combining units in a multiplicative relationship, such as “newton-meter”.
Locale IDs:
- Extended Language Matching to have fallbacks for many encompassed languages.
- Added more languageAliases from the BCP47 language subtag registry, for deprecated languages.
A new test directory added for localeIdentifiers, graphemeClusters (for currently supported Indic languages) and transliterations.
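The multiplicative combining pattern mentioned above can be pictured with a small sketch. CLDR patterns use numbered placeholders such as `{0}` and `{1}`, but the specific pattern string and symbol table below are illustrative stand-ins rather than values copied from CLDR data, and `combine_units` is a hypothetical helper. It only handles identifiers whose parts are all multiplied together, so identifiers containing "per" are out of scope.

```python
# Illustrative stand-ins, NOT taken from CLDR data files.
TIMES_PATTERN = "{0}⋅{1}"
UNIT_SYMBOLS = {"newton": "N", "meter": "m", "pascal": "Pa", "second": "s"}


def combine_units(unit_id: str) -> str:
    """Format a multiplicative unit identifier such as 'newton-meter'."""
    parts = unit_id.split("-")
    result = UNIT_SYMBOLS[parts[0]]
    for part in parts[1:]:
        # Apply the pattern pairwise, left to right.
        result = TIMES_PATTERN.replace("{0}", result).replace("{1}", UNIT_SYMBOLS[part])
    return result


print(combine_units("newton-meter"))
```

Keeping the separator in a pattern rather than in code lets each locale supply its own way of joining the two unit names.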

There are some infrastructure changes to be aware of, including:

The cldr repository has moved from subversion to git, and queries using Trac no longer work. See CLDR Change Requests for new information.
The data in the cldr repository now preserves votes for inherited data, indicated with “↑↑↑”. In order to generate CLDR in the previous form without “↑↑↑” and with proper minimization, a new tool GenerateProductionData is available.
Note: Release data that has been processed with GenerateProductionData is available in a parallel cldr-staging repository, with the same release tags.
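One way to picture the difference between the repository form and the production form is the simplified sketch below. It is not the GenerateProductionData tool: the function name, the dictionary representation, and the single-parent model are assumptions made for illustration. Entries marked "↑↑↑" are dropped because they inherit, and entries that merely repeat the parent's value are dropped as minimization.

```python
INHERITED = "↑↑↑"  # marker for a vote that agrees with the inherited value


def to_production(child: dict, parent: dict) -> dict:
    """Strip inheritance markers and values already supplied by the parent (sketch)."""
    production = {}
    for path, value in child.items():
        if value == INHERITED:
            continue  # the value comes from the parent locale
        if parent.get(path) == value:
            continue  # redundant with the parent: minimize it away
        production[path] = value
    return production


# Hypothetical data paths and values, for illustration only.
parent = {"greeting": "Hello", "farewell": "Goodbye"}
child = {"greeting": INHERITED, "farewell": "Goodbye", "thanks": "Cheers"}
print(to_production(child, parent))
```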

The Common Locale Data Repository (CLDR) provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks as:

Locale-specific patterns for formatting and parsing: dates, times, time zones, numbers and currency values, measurement units,…
Translations of names: languages, scripts, countries and regions, currencies, eras, months, weekdays, day periods, time zones, cities, and time units, emoji characters and sequences (and search keywords),…
Language & script information: characters used; plural cases; gender of lists; capitalization; rules for sorting & searching; writing direction; transliteration rules; rules for spelling out numbers; rules for segmenting text into graphemes, words, and sentences; keyboard layouts;…
Country information: language usage, currency information, calendar preference, week conventions, …
Validity: Definitions, aliases, and validity information for Unicode locales, languages, scripts, regions, and extensions,…


Monday, September 30, 2019

Call for Unicode 13.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Version 13.0 of The Unicode Standard, scheduled for publication in March 2020.

The selected cover design will appear on the Unicode Standard 13.0 web pages, in the print-on-demand publications, and in associated promotional literature on the Unicode website. The artist whose design is selected for the cover will receive full credit in the colophon of the publication for which the art is used, and wherever else the design appears, and will receive $700. Two selected runner-up artists will receive $150 apiece.

Please see the announcement web page for requirements and more details.

Thursday, September 19, 2019

New Public Review Issues for Unicode Technical Reports

The Unicode Consortium has recently opened several Public Review Issues for proposed updates to Unicode Standard Annexes and other technical reports. The closing date for comments on these open issues is September 30, 2019, for feedback to be reviewed at the UTC meeting.

Highlights include a major proposed update to UTS #51, Unicode Emoji, as well as significant updates to UAX #14, Unicode Line Breaking Algorithm; UTS #18, Unicode Regular Expressions; UAX #29, Unicode Text Segmentation; and UAX #38, Unicode Han Database.

Please see the Public Review Issues page for a full list of the items for review and links to the documents.


Friday, August 30, 2019

Internationalization & Unicode Conference #43: Keynote Speaker Announced

Don’t Believe a Word: Multilingual Typographic Systems and a 100-year Publishing Project

Dr. Rathna Ramanathan
Reader in Intercultural Communication and Dean of the School of Communication, Royal College of Art, London

The Murty Classical Library of India aims to make accessible modern translations of Indian texts in print and online. In the first five years, 22 volumes in 12 different languages have been published. Please join us as Dr. Ramanathan reflects on the delights and challenges of building a complex, multilingual typographic system for this unique 100-year publishing project. In addition, Dr. Ramanathan will discuss a subsequent research project which aims to create typographic guidelines for Indian languages and scripts.

A typographer, researcher and educator, known for her expertise in intercultural communication, typography and alternative publishing practices, Dr. Ramanathan has, for the past 20 years, run a design studio (based in Chennai and London) with a focus on research-led, intercultural, multi-platform graphic communication. Her practice evidences an interest in the research and design of marginalised content, endangered languages and practices in South Asia and an expertise in the design of multilingual communication.

See What’s Happening At IUC 43

For over 28 years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. Join us in Santa Clara to promote your ideas and experiences working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

Join expert practitioners and industry leaders as they present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

For further information and to register, please visit the IUC website.


Wednesday, July 17, 2019

The Unicode Consortium Launches New Website in Celebration of World Emoji Day

The New Unicode.org Also Offers Emoji Enthusiasts the Chance to “Adopt a Character”

The Unicode Consortium, a nonprofit that maintains text standards to support all the world’s written languages across every device, today debuted a new look for unicode.org. The redesigned website will make information about the emoji proposal process more easily accessible while encouraging public participation and engagement in all Unicode initiatives.

“Unicode is a global technology standard that is one of the core building blocks of the internet,” said Unicode board member Greg Welch. “Unicode has helped facilitate the work of programmers and linguists from around the world since the 1990s. But with the rise of mobile devices and public enthusiasm for emoji, we knew it was time to redesign the Unicode website to make information more easily accessible, and increase community involvement.”

Emoji were adopted into the Unicode Standard in 2010 in a move that made the characters available everywhere. Today, emoji have been used by 92% of the world’s online population. And while emoji encoding and standardization make up just one small part of the Consortium’s text standards work, the growing popularity and demand for emoji have put the organization in the international spotlight.

“We’ve been working with the Unicode Consortium for several years to open up the emoji proposals process by making it more accessible and understandable,” said Jennifer 8. Lee, co-founder of Emojination. “While I personally found the late-90s aesthetic of the developer-centric Unicode.org site very retro and nerd charming, the new site redesign is a reflection of Unicode’s deep desire to engage the public in its work.”

In addition to offering a clearer picture of the emoji submission and standardization process, the new Unicode.org website offers information about the Consortium and its mission to enable people everywhere in the world to use any language on any device.

“Emoji are just one element of our broader mission,” said Mark Davis, president and co-founder of the Unicode Consortium. “The Consortium is a team of largely volunteers who are dedicated to ensuring that people all over the world can use their language of choice in digital communication across any computer, phone or other device. From English and Chinese to Cherokee, Hindi and Rohingya, the Consortium is committed to preserving every language for the digital era.”

A team of designers from Adobe provided design and branding support, as well as free access to leading design tools, to bring Unicode’s new website to life.

“The Unicode Consortium’s work to keep digitally disadvantaged languages alive is incredibly important,” said Adobe Design Program Manager Lisa Pedee. “We collaborated closely with the Consortium to develop a unique visual brand and streamlined web interface that makes everything from contributing language data to proposing an emoji more accessible, inclusive and user-friendly.”

The Consortium’s recent language work includes adding language data for Cherokee, encoding the Hanifi Rohingya script, and developing the Mayan hieroglyphic script.

The Consortium invites emoji and language enthusiasts to celebrate World Emoji Day on July 17 and “Adopt a Character” to support its ongoing efforts. More than 136,000 characters are up for adoption, including new Emoji 12.0 additions such as the sloth, the sea otter, the waffle and Saturn.

Those who choose to adopt will receive a custom digital badge they can display to publicly show their support, whether on their website or social media. The Unicode Consortium is a 501(c)(3) charitable organization and “adoption fees” are tax-deductible in the U.S. Additionally, some companies may provide matching funds. Learn more and adopt your character here.

About the Unicode Consortium
The Unicode Consortium is a nonprofit on a mission to enable anyone to use any language across every device, globally. The Consortium develops, extends and promotes the use of the Unicode Standard, freely-available specifications and data that form the foundation for software internationalization in all major operating systems, search applications and the web.

The Unicode Consortium is open to all and comprises individuals, companies, academic institutions and governments. Members include Adobe, Apple, Emojipedia, Facebook, Google, IBM, Microsoft, Netflix, Oracle and SAP, among others. For more information, please visit http://www.unicode.org.

Tuesday, July 16, 2019

Unicode Technical Committee Considers Emoji Color Mechanism

The Unicode Technical Committee (UTC) is discussing a mechanism for color changes to existing emoji characters. Such a mechanism could be used for emoji representations of a black cat or a glass of white wine, for example. The color mechanism would use the emoji color characters (including the seven colored square characters at U+1F7E5..U+1F7EB) that were added to the Unicode Standard Version 12.0 in early 2019.
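For readers who want to inspect the colored squares directly, the short sketch below lists the seven large colored squares added in Unicode 12.0, which occupy U+1F7E5 through U+1F7EB. It only enumerates the characters; it does not model the color mechanism itself, which was still under discussion. The function name is hypothetical, and the character names come from the Unicode data bundled with the running Python, so older interpreters will show the fallback text instead.

```python
import unicodedata


def colored_squares():
    """Return (code point label, character name) pairs for U+1F7E5..U+1F7EB."""
    squares = []
    for cp in range(0x1F7E5, 0x1F7EB + 1):
        name = unicodedata.name(chr(cp), "<not in this Python's Unicode data>")
        squares.append((f"U+{cp:04X}", name))
    return squares


for label, name in colored_squares():
    print(label, name)
```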

Emoji color mechanisms could potentially be defined as part of Unicode Emoji 13.0. The topic will be discussed at the upcoming July UTC meeting. Specific proposals for new colored emoji characters will not be taken up until the fundamental color mechanism has been established.

For more information, see the Working Draft for Proposed Update UTS #51: Unicode Emoji, section 2.9 “Color”.

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard and its associated standards and data form the foundation for CLDR and ICU releases.


Wednesday, June 12, 2019

Unicode 12.0 Paperback Available

The Unicode 12.0 core specification is now available in paperback book form with a new, original cover design by Monica Tang. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 12.0 of the Unicode Standard.

Each of the two volumes is a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. The cost for the pair is US $23.46, plus shipping and taxes (if applicable). Please visit the description page to order.

Note that these volumes do not include the Version 12.0 code charts, nor do they include the Version 12.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 12.0 - Core Specification


Monday, June 10, 2019

Unicode Adlam Chart Font Updated

The Unicode Consortium has recently updated the current code charts for the Adlam script specifically to provide improved reference glyphs that align with current community practice. The new font is an updated Ebrima font, with the update coordinated by Judy Safran-Aasen and the font designed by Jamra Patel.

The new Adlam code chart can be viewed along with all of the current code charts.


Monday, June 3, 2019

CLDR v36 open for data submission

The Unicode CLDR Technical Committee is pleased to announce the opening of the CLDR Survey Tool for general data submission. CLDR relies on community contributions for its ongoing data refinement and to offer new data to the CLDR user community. The collected data will be released as Version 36 on October 15.

Unicode CLDR provides key building blocks for software to support the world's languages, and is used by much of the world’s software — for example, all major browsers and all modern mobile phones use CLDR for language support.

Version 36 is focusing on:

New measurement units and patterns
New names and search keywords for the draft candidate emoji for Emoji 13.0 (scheduled for release in 2020Q1)
Adding more locales for data contributions
Fleshing out Islamic calendar support
Improving translation quality in general

For more information on contributing to CLDR, see the CLDR Information Hub. If you would like to contribute missing data for your language, see Survey Tool Accounts.


Monday, May 13, 2019

IUC 43 Program Announced

Join Us At IUC 43!

Trained, Tested, Trusted: Understand best practices in process and among teams reliably delivering high quality global products. Examine how developers build, test, and deploy great global products. Explore technologies for design, localization, multilingual testing, workflow management, and content management.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

For over 28 years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. Join us in Santa Clara to promote your ideas and experiences working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

Track and Session Topics to Include:

Automation
Emojis
Internationalization
Programming
Case Studies
ICU/CLDR
Localization
Scripts


Tuesday, May 7, 2019

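The Unicode Consortium Officially Releases Unicode 12.1 with Support for Reiwa (令和)

English version here

Information and data files for Version 12.1 have been published. Version 12.1 adds only a single character, the ligature for the new Japanese era name Reiwa (令和), bringing the total number of characters to 137,929.

The added character is essential wherever ligatures are used to display dates in the Japanese calendar. The members of the Unicode Consortium are pleased to have been able to support this ligature immediately after the Japanese government announced the new era name. The Unicode Common Locale Data Repository (CLDR) and International Components for Unicode (ICU) have also been updated to support Reiwa.

All of the major files have been updated, including the following:

Unicode Character Database: UCD data files
Unicode Collation Algorithm: DUCET data files

The Unicode Standard is the foundation for modern software and global communications, including operating systems, browsers, and mobile devices, plus the Internet and the Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard consists of data relating to the character set standard, together with CLDR and ICU.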

(Translated into Japanese by Mina Nishimura, Adobe Inc.)


Unicode Version 12.1 released in support of the Reiwa Era

Japanese (日本語) version here

Version 12.1 of the Unicode Standard is now available with updated data files. This version adds a single important character, the square ligature for the name of the new Japanese era, Reiwa (令和), for a total of 137,929 characters.

This new character in Version 12.1 adds critical support for those Japanese implementations that depend on the ligature form of Japanese era names when presenting calendar information. The Unicode Consortium and its members are pleased to add support for this important new character with a timely release of Unicode Version 12.1, shortly after the selection of the name was finalized by the Japanese government. The Unicode Common Locale Data Repository (CLDR) and International Components for Unicode (ICU) have also been updated for the new calendar data during this time.
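The relationship between the ligature and the two-character era name can be checked with a few lines of code. The new character is U+32FF SQUARE ERA NAME REIWA, and compatibility normalization (NFKC) expands it to the two ideographs 令和. The sketch below assumes a Python whose bundled Unicode data is at Version 12.1 or later (Python 3.8 and up); the helper name is hypothetical.

```python
import unicodedata

REIWA_LIGATURE = "\u32FF"  # SQUARE ERA NAME REIWA, added in Unicode 12.1


def expand_era_ligatures(text: str) -> str:
    """Expand square era-name ligatures into plain ideographs via NFKC.

    Note that NFKC also folds other compatibility characters in the text.
    """
    return unicodedata.normalize("NFKC", text)


print(unicodedata.name(REIWA_LIGATURE))
print(expand_era_ligatures(REIWA_LIGATURE))
```

Implementations that search or compare calendar text may want this kind of folding so that the ligature and the spelled-out era name match each other.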

Critical data files have been updated for Version 12.1, including:

Unicode Character Database—UCD data files
Unicode Collation Algorithm—DUCET data files


Wednesday, April 17, 2019

CLDR Version 35.1 Language/Locale Data Released for Reiwa Era, Unicode 12.1

Unicode CLDR 35.1 is a dot release focusing on calendar and date format support for the new Reiwa era in Japan, including support for the upcoming Unicode 12.1. Version 35.1 is the latest version of CLDR, the core open-source language data that major software systems use to adapt software to the conventions of over 80 different languages. The open-source Unicode ICU library incorporates the CLDR Version 35.1 data as part of its ICU 64.2 release. ICU code is used by many products for Unicode and language support, including Android, Cloudant, ChromeOS, Db2, iOS, macOS, Windows, and many others.

In addition to updates related to the new Reiwa era, the CLDR 35.1 release includes a small number of other updates, including more localized name updates for North Macedonia, and support for tzdata 2019a.

For further details and links to documentation, see the CLDR 35.1 Release Notes.

Friday, March 29, 2019

ICU 64 Released

Unicode® ICU 64 has just been released. It updates to Unicode 12 and to CLDR 35 locale data with many additions and corrections, and some new languages. ICU adds a data filtering/subsetting mechanism, an improved formatting API, and a C++ LocaleBuilder.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/64.

Wednesday, March 27, 2019

Unicode CLDR Version 35 Language/Locale Data Released

Unicode CLDR 35 provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

CLDR 35 included a limited Survey Tool data collection phase. The following summarizes the changes in the release.

Data	70,000+ new data fields, 13,400+ revised data fields
Basic coverage	New languages at Basic coverage: Cebuano (ceb), Hausa (ha), Igbo (ig), Yoruba (yo)
Modern coverage	Languages Somali (so) and Javanese (jv) increased coverage from Moderate to Modern
Emoji 12.0	Names and annotations (search keywords) for 90+ new emoji; also includes fixes for previous names and keywords
Collation	Collation updated to Unicode 12.0, including new emoji; Japanese single-character (ligature) era names added to collation and search collation
Measurement units	23 additional units
Date formats	Two additional flexible formats, and 20 new interval formats
Japanese calendar	In the Japanese locale, updated to use Gannen (元年) year numbering for non-numeric formats (which include 年), and to consistently use narrow eras in numeric date formats such as “H31/3/27”.
Region Names	Many names updated to local equivalents of “North Macedonia” (MK) and “Eswatini” (SZ).
Segmentation	Enhanced Grapheme Cluster Boundary rules for 6 Indic scripts: Gujr, Telu, Mlym, Orya, Beng, Deva.

A dot release, version 35.1, is expected in April, with further changes for the Japanese calendar.
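
The Japanese calendar change in the table above can be sketched in a few lines. This is a minimal illustration, not CLDR's actual data or patterns: the era table is hardcoded to two eras, the function names (era_year, format_long, format_numeric) are hypothetical, and the output patterns are simplified assumptions. It shows only the rule itself: the first year of an era is written 元年 in formats that contain 年, while numeric formats keep the digit and use the narrow era abbreviation.

```python
from datetime import date

# Assumption: a hardcoded era table (start date, era name, narrow form),
# newest first. Real implementations read this from CLDR data.
ERAS = [
    (date(2019, 5, 1), "令和", "R"),
    (date(1989, 1, 8), "平成", "H"),
]

def era_year(d):
    """Return (era name, narrow era, year within the era) for a date."""
    for start, name, narrow in ERAS:
        if d >= start:
            return name, narrow, d.year - start.year + 1
    raise ValueError("date precedes the eras covered by this sketch")

def format_long(d):
    """Non-numeric format (contains 年): year 1 is written as 元 (Gannen)."""
    name, _, year = era_year(d)
    year_text = "元" if year == 1 else str(year)
    return f"{name}{year_text}年{d.month}月{d.day}日"

def format_numeric(d):
    """Numeric format: narrow era plus digits, with no Gannen substitution."""
    _, narrow, year = era_year(d)
    return f"{narrow}{year}/{d.month}/{d.day}"

if __name__ == "__main__":
    print(format_numeric(date(2019, 3, 27)))  # H31/3/27
    print(format_long(date(2019, 5, 1)))      # 令和元年5月1日
    print(format_numeric(date(2019, 5, 1)))   # R1/5/1
```

Keeping the Gannen substitution out of the numeric formatter mirrors the distinction drawn in the release summary, where only formats that include 年 use 元年.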

For details, see Detailed Specification Changes, Detailed Structure Changes, Detailed Data Changes.

Tuesday, March 19, 2019

Adopt-A-Character Grant to Support Maya Inscriptional Hieroglyphs

Image from Maya site of Yaxchilan, Mexico

The Adopt-a-Character Program is awarding the third in a series of grants to support the encoding of Maya hieroglyphs and their study by researchers. This grant is a continuation of earlier AAC-funded efforts to incorporate information from Maya codices into a multidimensional database. The work in 2019 will focus on hieroglyphs inscribed on monuments, and will fund work to advance the understanding of the corpus of inscriptional hieroglyphs by including this dataset in the multidimensional database developed for the Maya script. This work will further understanding of an appropriate encoding model for these complex hieroglyphs and will also provide support for new research work on the Maya script through the updated database.

The work will be led by Dr. Gabrielle Vail (Research Labs of Archaeology, University of North Carolina, Chapel Hill and Anthropology Program, University of South Florida, St. Petersburg) under the direction of Dr. Deborah Anderson (SEI, UC Berkeley).

The image included in this announcement is text from a lintel from the Maya site of Yaxchilan, Mexico. Photo by Gabrielle Vail.

Thursday, March 14, 2019

Emoji 12.0 Now Available for Adoption

The latest Unicode Emoji can now be adopted. See Emoji Recently Added, v12.0 for a full list.

New emoji pictured: sloth, otter, waffle, ice cube, ringed planet, flamingo.

You can adopt one of the new emoji yourself, or for friends, family, and so on. While the new emoji will appear on mobile phones and other devices later this year, you can adopt them right now! Gold level adoptions are special — if you adopt an emoji at the gold level, you are guaranteed to be the only sponsor at that level.

Your sponsorship helps to support the Unicode Consortium’s mission to enable a growing number of languages to be used on computers. The Adopt-a-Character program funds work on digitally disadvantaged languages, both modern and historic. In 2018 and 2019 the program awarded grants to support work on improved keyboard layouts, additional work on Mayan hieroglyphs, and more historic Indic scripts, among others.

You can now also adopt any of the nearly 500 other characters in Unicode 12.0, and of course you can adopt from any of the over 136,000 characters already in Unicode.

For more information on the program, or to adopt a character, see the Adopt-a-Character Page.
