Monday, November 5, 2018

Unicode 12.0 Beta Review

The beta review period for Unicode 12.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, as well as the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 12.0 includes a number of changes and 554 new characters. Some of the Unicode Standard Annexes have modifications for Unicode 12.0, often in coordination with changes to character properties. In particular, there are minor changes to UAX #29, Unicode Text Segmentation, to account for differences in Georgian casing behavior. Four new scripts have been added in Unicode 12.0. There are also 61 additional emoji characters, as well as very significant enhancements to the representation and behavior of multiperson emoji.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by January 7, 2019. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-12.0.0.html for more information about testing the 12.0.0 beta.

See http://unicode.org/versions/Unicode12.0.0/ for the current draft summary of Unicode 12.0.0.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, SAP, Shopify, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.

Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Tuesday, October 23, 2018

Draft Candidates for Emoji 12.0 Beta (2019)

The Emoji 12.0 Beta contains 236 Emoji Draft Candidates, consisting of 61 characters plus 175 sequences. These are slated for release in 2019Q1 together with Unicode Version 12.0.

The emoji are in the following categories: 3 smileys & emotion, 209 people & body, 7 animals & nature, 9 food & drink, 6 travel & places, 3 activities, 15 objects, and 12 miscellaneous symbols. 50 of the new emoji (including gender/skin-tone variants) are for accessibility, such as ear with hearing aid and woman in manual wheelchair. The hearts, circles, and squares now have the same set of colors for decorative and/or descriptive uses.

Multi-person emoji now have skin-tone variants:

(A) Full Emoji v12.0 support requires that the holding-hands emoji (👫 👬 👭) with specific genders be supported with 55 combinations of skin tones (uniform and mixed), such as:
  • man with dark skin tone and woman with light skin tone holding hands
  • woman with medium skin tone and woman with medium light skin tone holding hands
  • man with light skin tone and man with light skin tone holding hands
(B) Full Emoji v12.0 support requires that the 6 multi-person emoji (👯 🤼 🤝 💏 💑 👪) without specific gender be supported with the 5 human skin tones, such as:
  • family (adult+adult+child) with dark skin tone
  • couples with heart (adult+adult) with medium skin tone
  • couples kissing (adult+adult) with light skin tone
A mechanism is provided for mixed skin tones for emoji in group B, such as with a family of man+woman+girl+boy, but support is optional.

The following notes are relevant for implementers:
  1. The 40 holding-hands emoji with mixed skin tones have a simpler internal representation, compared to the previous draft. The 15 with uniform skin tones use a single character plus skin-tone modifiers.
  2. Implementations may optionally support all combinations of mixed skin tones for the 6 multi-person emoji in the B group. This can be a large number — over 4,000 for the family emoji alone — and thus may not be practical for all devices.
  3. Clearer definitions are now provided in the specification, along with a new set for Basic_Emoji. For other details, see the specification.
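The counts quoted above (55 holding-hands combinations, of which 15 are uniform and 40 are mixed) can be reproduced with a short calculation. The sketch below is only an illustration: it assumes that the woman-and-man emoji distinguishes which person has which skin tone, while the two same-gender emoji treat a pair of tones as unordered. That reading is an assumption chosen because it matches the published totals; the specification remains the authority.

```python
from itertools import combinations_with_replacement, product

# The five skin-tone modifiers, named informally for this illustration.
SKIN_TONES = ["light", "medium-light", "medium", "medium-dark", "dark"]


def holding_hands_counts():
    """Return (total, uniform, mixed) skin-tone combinations.

    Assumption: for woman+man the two people are distinguishable, so
    ordered pairs are counted; for woman+woman and man+man the pair is
    counted once regardless of order.
    """
    woman_man = list(product(SKIN_TONES, repeat=2))                   # 5 * 5 = 25
    same_gender = list(combinations_with_replacement(SKIN_TONES, 2))  # 15

    all_pairs = woman_man + same_gender + same_gender
    uniform = [pair for pair in all_pairs if pair[0] == pair[1]]
    mixed = [pair for pair in all_pairs if pair[0] != pair[1]]
    return len(all_pairs), len(uniform), len(mixed)


if __name__ == "__main__":
    total, uniform, mixed = holding_hands_counts()
    print(f"total={total} uniform={uniform} mixed={mixed}")
```

Under these assumptions the totals are 25 + 15 + 15 = 55, with 15 uniform and 40 mixed combinations, which agrees with the figures in the implementer notes.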
The complete list of emoji sequences for Emoji 12.0 will be finalized during the next UTC meeting in January 2019. The CLDR English names and keywords for the new emoji characters will be finalized within the next month, and translation into 80+ languages (such as Slavic languages) will begin. Feedback is welcome on the sorting order and the English names and keywords.

Tuesday, October 16, 2018

ICU 63 Released

Unicode® ICU 63 has just been released. It updates to CLDR 34 locale data with many additions and corrections, and some new languages. ICU adds an API for number and currency range formatting, and an API for additional Unicode properties and for constructing custom properties. CLDR and ICU include data for testing readiness for the upcoming Japanese calendar era.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode encoding standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/63.

Monday, October 15, 2018

CLDR Version 34 Language/Locale Data Released

Version 34 is the latest version of CLDR, the core open-source language data that major software systems use to adapt software to the conventions of over 80 different languages. CLDR data is used by many products for Unicode and language support, including Android, Cloudant, Chrome OS, Db2, iOS, macOS, Windows, and many others.

CLDR 34 included a full Survey Tool data collection phase increasing to 85 languages at the “modern” (full) level, 4 at the lower “moderate” level (suitable for document content), 18 at the basic level, and about 100 others that don’t meet the level requirements.

Among the other changes: new units were added (e.g., atmosphere, petabyte); many new emoji keywords and names were corrected/refined, with updated emoji sort order; and preparations for the New Japanese Era (affecting most software for Japan) were made. The specification was also updated with many changes for Unicode Locale Identifier and BCP 47 Conformance sections, plus defining the syntax of unit identifiers. For other changes, details, and links to documentation, see the CLDR 34 Release Notes.

Tuesday, October 9, 2018

Unicode Arabic Mark Rendering UTR #53 Now Published

The combining classes of Arabic combining characters in Unicode are different from the combining classes in most other scripts. They are a mixture of special classes for specific marks plus two more generalized classes for all the other marks. This has resulted in inconsistent and/or incorrect rendering for sequences with multiple combining marks since Unicode 2.0.

The Arabic Mark Transient Reordering Algorithm (AMTRA) described in UTR #53 is the recommended solution to achieving correct and consistent rendering of Arabic combining mark sequences. This algorithm provides results that match user expectations and assures that canonically equivalent sequences are rendered identically, independent of the order of the combining marks.
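To make the underlying issue concrete, the following sketch uses Python's standard unicodedata module to show that two different orderings of shadda and fatha on the same base letter are canonically equivalent, which is why a renderer is expected to display them identically. It does not implement AMTRA itself; the helper names are hypothetical and chosen only for this illustration.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH
SHADDA = "\u0651"  # ARABIC SHADDA, canonical combining class 33
FATHA = "\u064E"   # ARABIC FATHA, canonical combining class 30

# The same base letter with the two marks entered in either order.
shadda_first = BEH + SHADDA + FATHA
fatha_first = BEH + FATHA + SHADDA


def combining_classes(text):
    """Return the canonical combining class of each code point."""
    return [unicodedata.combining(ch) for ch in text]


def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


if __name__ == "__main__":
    print(combining_classes(shadda_first))
    print(canonically_equivalent(shadda_first, fatha_first))
```

Normalization sorts the marks by combining class, so the shadda-first input is reordered to fatha-first. A rendering engine that simply stacks marks in stored order could therefore draw the two inputs differently even though they are equivalent, which is the kind of inconsistency the report addresses.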

The concepts in this algorithm were first proposed four years ago by Roozbeh Pournader. We are pleased it has now been published as an official Technical Report.

Monday, October 8, 2018

Unicode Board of Directors Election Results

The Unicode Consortium announces the election of four Directors for three-year terms beginning January 2019: Bob Jung, Iris Orriss, Alolita Sharma, and Greg Welch. A fifth candidate, Michele Coady, was elected for a one-year term.

Michele Coady and Iris Orriss join the Consortium Board of Directors for the first time. Bob Jung, Alolita Sharma, and Greg Welch have been re-elected to continue their service as Directors.

Michele Coady is a Director of Global Readiness at Microsoft, responsible for the Microsoft Global Readiness policy, which includes driving geopolitical, globalization and internationalization compliance, risk management and awareness company-wide. She has been providing geopolitical support and guidance for the Microsoft Unicode emoji work for several years.

Iris Orriss serves as Director of Internationalization at Facebook. She has been with Facebook since January 2013 and is passionate about eliminating language and cultural barriers on the internet. Her work focuses on growing Facebook in international markets. In addition, Iris is a member of the board at Translators without Borders, a nonprofit organization that provides vital information in the right language at the right time. Prior to Facebook, Iris was a director at Microsoft working on product internationalization and development process in the enterprise and language technology divisions. She is a native of Germany, speaks four languages, and was educated at Freie Universität Berlin.

The Unicode Consortium would like to thank Dachuan Zhang who will step down in 2019 after four years as a member of the Board of Directors.

For the listing of current directors and officers of the Consortium please see Unicode Directors, Officers and Staff.

Monday, October 1, 2018

New Unicode Technical Director

The Unicode Consortium would like to welcome a new Technical Director, Dr. Ken Lunde.

Ken Lunde has worked at Adobe since 1991, specializing in CJKV Type Development, meaning that he develops East Asian fonts, along with the specifications on which they are based. He architected the Adobe-branded “Source Han” and Google-branded “Noto CJK” open source Pan-CJK typeface families that were released in 2014 and 2017, is the author of “CJKV Information Processing” Second Edition that was published by O’Reilly Media at the end of 2008, and frequently publishes articles on Adobe’s CJK Type Blog. Ken holds BA, MA, and PhD degrees in linguistics from The University of Wisconsin-Madison. Ken has been Adobe’s representative to Unicode since 2006, has been the primary representative since 2015, serves as the IVD Registrar, participates in the Unicode Editorial Committee, and received the 2018 Unicode Bulldog Award.

For the listing of current directors and officers of the Consortium please see Unicode Directors, Officers and Staff

Friday, September 14, 2018

Unicode CLDR 34 alpha available for testing

The alpha version of Unicode CLDR 34 is available for testing. The alpha period lasts until the beta release on September 26, which will include updates to the LDML spec. The final release is expected on October 10.

CLDR 34 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for such common software tasks.

CLDR 34 included a full Survey Tool data collection phase. Other enhancements include several changes to prepare for the new Japanese calendar era starting 2019-05-01; updated emoji names, annotations, collation and grouping; and other specific fixes. The draft release page at http://cldr.unicode.org/index/downloads/cldr-34 lists the major features, and has pointers to the newest data and charts. It will be fleshed out over the coming weeks with more details, migration issues, known problems, and so on.

Please report any problems that you find using a CLDR ticket. We'd also appreciate it if programmatic users of CLDR data download the xml files and do a trial integration to see if any problems arise.

Thursday, September 6, 2018

New Japanese Era

A new era in the Japanese calendar is expected to begin on May 1, 2019, following the announced abdication of Japanese Emperor Akihito. This era will be represented in dates by two names: one consisting of a sequence of two existing kanji and one consisting of a new single Japanese character that combines those two. (Similarly, the current era Heisei can be represented by either “平成” or “㍻”.)

The Japanese calendar system and support for era names are essential for important public sector business functions. Therefore, most software distributed in Japan will need to adopt the new era name and add font support for the new character.

The current Heisei era has been in place since 1989 — during the evolution of modern computer systems. Because of this, most software systems have not been tested for such an event. The exact date of the announcement of the new era name is unknown, but current expectations are that there will be a very narrow window for implementing the new era information in IT environments, perhaps less than a month. Until the announcement, dates in 2019 and beyond will continue to be written with the Heisei era name and its year numbering.
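As a small illustration of the two representations and the year numbering described above, the sketch below uses Python's standard unicodedata module. It relies on two facts: the single-character form “㍻” (U+337B) has a compatibility decomposition to “平成”, and Heisei year 1 corresponds to 1989. The function names are hypothetical, and the year conversion deliberately ignores the exact start day of the era, so it is not a complete calendar implementation.

```python
import unicodedata

HEISEI_SQUARE = "\u337B"  # SQUARE ERA NAME HEISEI, the single-character form
HEISEI_FIRST_YEAR = 1989  # Gregorian year corresponding to Heisei 1


def era_name_in_kanji(square_era_char):
    """Expand a square era-name character to its two-kanji spelling.

    NFKC normalization applies the compatibility decomposition that the
    Unicode Character Database records for the square character.
    """
    return unicodedata.normalize("NFKC", square_era_char)


def heisei_year(gregorian_year):
    """Convert a Gregorian year to a Heisei year number (year-level only)."""
    if gregorian_year < HEISEI_FIRST_YEAR:
        raise ValueError("year precedes the Heisei era")
    return gregorian_year - HEISEI_FIRST_YEAR + 1


if __name__ == "__main__":
    print(era_name_in_kanji(HEISEI_SQUARE), heisei_year(2019))
```

Software that hard-codes a table like this for era names is the kind that will need an update once the new era name and its single-character form are announced.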

To prepare as well as possible for this unprecedented event, the Unicode Consortium has taken the following actions:

  • The code point U+32FF has been reserved for the new era character.
  • Once the new era name is announced, the Unicode Consortium will quickly issue a dot-release (Version 12.1) that will add that character at the reserved code point, U+32FF, with an appropriate character name, decomposition, and representative glyph.
  • Unicode CLDR and ICU are including test mechanisms in the 2018 October releases of CLDR 34 and ICU 63. Systems that use CLDR or ICU (all smartphones, for example) can test using these mechanisms.
  • Systems and applications that do not use CLDR or ICU will need to take similar steps for testing.
The short time window between the actual announcement and the effective date will present challenges to the IT industry. IT systems in Japan will be expected to have the support in place seamlessly. Because of the narrow timeframe and the need to upgrade or patch legacy software, it is important to start now to determine how soon your application/system can add support to your current implementations, stacks, and dependencies.

Thursday, August 23, 2018

IUC 42: Keynote Speaker Announced

The Advent of Mayan Script Encoding: Mapping the Last Frontiers of Mayan Hieroglyphic Decipherment

Carlos Pallan Gayol
Archaeologist & Epigrapher, Dept. of Old American Studies & Ethnology, University of Bonn


Mayan hieroglyphs rank among the most visually complex writing systems ever created. Deciphering them has entailed a 200+ year scholarly quest, but this task is not yet completed and poses an inviting challenge for applying new tools from the information age, culminating in the encoding of the Mayan script. Join us Tuesday morning, September 11th, as this keynote highlights the latest milestones attained in this pursuit by the NcodeX Project, where Carlos Pallan collaborates with Dr. Deborah Anderson, Researcher, Dept. of Linguistics, UC Berkeley, the Script Encoding Initiative and members of the Unicode advisory board. Stemming from research funded by Unicode’s Adopt-a-Character Program, it has been possible to produce new database tools and advanced functionalities, capable of mapping and analyzing all the textual contents of the extant Mayan books or Codices by relying on a novel catalog of Mayan signs with assigned code points.

See What’s Happening At IUC 42

For over 27 years the Internationalization & Unicode® Conference (IUC) has been the preeminent event highlighting the latest innovations and best practices of global and multilingual software providers. Join us in Santa Clara to promote your ideas and experiences working with natural languages, multicultural user interfaces, producing and supporting multinational and multilingual products, linguistic algorithms, applying internationalization across mobile and social media platforms, or advancements in relevant standards.

Join expert practitioners and industry leaders as they present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

Thursday, August 9, 2018

More Emoji Draft Candidates for 2019

There are now 179 proposed Emoji Draft Candidates (61 characters plus variants) for 2019. These are the short-listed candidates for Emoji 12.0, which is planned for release in 2019Q1 together with Unicode 12.0.

The following changes were made in the recent Unicode Technical Committee (UTC) meeting:
  1. Added a candidate emoji for deaf person
  2. Changed service animal vest to safety vest, and added a candidate emoji sequence using it: service dog
  3. Added candidate emoji sequences for couple holding hands, with 55 combinations of skin tone and gender
  4. Changed names and ordering for various characters
The list of draft candidates will be reviewed and finalized in the next UTC meeting, this coming September. Feedback is solicited on short names, keywords, and ordering. See also the Emoji 11.0 charts.

Eight Emoji Provisional Candidates for 2020 were also added: ninja, military helmet, mammoth, feather, dodo, magic wand, carpentry saw, and screwdriver.

Between now and March 2019, these and other Provisional Candidates will be collected. The Unicode emoji subcommittee will then assess the whole set, and make recommendations to the UTC for which emoji to advance to Draft Candidate status for 2020.

Wednesday, July 18, 2018

ICU moves to GitHub and Jira

International Components for Unicode (ICU) is a mature, widely used set of C/C++ and Java libraries providing Unicode and Globalization support for software applications. ICU is widely portable and gives applications the same results on all platforms and between C/C++ and Java software.

As of this week, ICU has moved from a self-hosted source code and bug tracking environment, to git on GitHub and Jira on Atlassian Cloud, respectively. Pull requests are welcome, as are bug reports on the new issue tracking system.

For more information, please see the following links:

ICU Repository Access: http://site.icu-project.org/repository
ICU Bug Tracking: http://site.icu-project.org/bugs

Friday, July 13, 2018

Unicode 11.0 Paperback Available

The Unicode 11.0 core specification is now available in paperback book form with a new, original cover design. This edition consists of a pair of modestly priced print-on-demand volumes containing the complete text of the core specification of Version 11.0 of the Unicode Standard.

Each of the two volumes is a compact 6×9 inch US trade paperback size. The two volumes may be purchased separately or together, although they are intended as a set. The cost for the pair is US $16.58, plus postage and applicable taxes. Please visit the description page to order.

Note that these volumes do not include the Version 11.0 code charts, nor do they include the Version 11.0 Standard Annexes and Unicode Character Database, which are all freely available on the Unicode website.

Purchase The Unicode Standard, Version 11.0 - Core Specification

Thursday, July 5, 2018

Unicode Consortium Announces Version 11.0 and Version 12.0 Cover Designs

The Unicode Consortium is pleased to announce the design selected for the cover of the forthcoming print-on-demand publication of The Unicode Standard, Version 11.0. The Unicode Consortium issued an open call for artists and designers to submit cover design proposals. An independent panel reviewed all submitted designs. Because of the accelerated release schedule for Version 12.0 (March 2019), the design for the print-on-demand publication of The Unicode Standard, Version 12.0 was also selected at this time.

The cover for Version 11.0 is an original design by Joyce S. Lee, a graduate student in the UC Berkeley School of Information. Her artwork was inspired by the well-known early 20th-century Bauhaus design school. She explains, “I see numerous parallels between the Bauhaus and the Unicode Consortium, including an intersection of workmanship and technological reproduction, a spirit of collaboration, as well as a widespread cultural influence. With this Bauhaus inspired cover, I thus aim to represent the Unicode Standard as a form of instructional reference for technologists around the world.”

[cover art by Monica Tang]
Cover artwork for Version 12.0 was created by Monica Tang, a computer science student at UC Berkeley. Her design was inspired by the simplicity of the geometric shapes that comprise the diversity of characters and symbols represented in the Unicode Standard. She notes, “Incorporating a variety of shapes and colors into a patterned design, I seek to convey the sheer breadth of the languages covered in the Unicode Standard as well as a sense of commonality.”

Runner-up designs by Feixiong “Hasutai” Liu and Maurice Meilleur were also selected. Hasutai is the founder and chief designer of Sir Sebsihiyan Sibe-Manchu Culture Center. Maurice Meilleur is Assistant Professor of Graphic Design at Appalachian State University.

Hasutai:
[art by Hasutai]
Maurice Meilleur:
[art by Maurice Meilleur]

Wednesday, June 27, 2018

New Gold Sponsor dotFM .FM TLD

The Unicode Consortium is pleased to announce that dotFM .FM TLD is now a gold sponsor for:

dotFM .FM TLD's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

BRS Media’s dotFM is pleased to sponsor Adopt a Character. This year, dotFM launched Emoji Domains within the .FM Top-Level Domain. Emoji domain is a domain name with an expressive digital image or icon in it. dotFM pioneered the ‘multimedia’ domain space since launching the .FM Top Level Domains in 1998. Today, the comprehensive portfolio of registrants not only includes broadcasters, Internet radio and the music community, but also interactive companies, premier social media ventures and podcast entrepreneurs worldwide.  — dotFM .FM TLD

The Unicode Consortium thanks dotFM .FM TLD for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 140,000 other characters are available for adoption — see Adopt a Character

Wednesday, June 20, 2018

ICU 62 Released

Unicode® ICU 62 has just been released. It upgrades to Unicode 11 and to CLDR 33.1 locale data. A new syntax for locale-neutral number skeleton strings can be used in MessageFormat for more control over number formatting. Several still-draft NumberFormatter methods and helper classes have been modified or renamed. In C++, DecimalFormat wraps the new NumberFormatter code, and there is a new implementation for number parsing.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode encoding standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/62

CLDR Version 33.1 Language/Locale Data Released for Unicode 11.0

Unicode CLDR 33.1 adds support for the recently released Unicode 11.0. Version 33.1 is the latest version of CLDR, the core open-source language data that major software systems use to adapt software to the conventions of over 80 different languages. The open-source Unicode ICU library incorporates the CLDR Version 33.1 data as part of its update to Unicode 11.0 in its ICU 62 release. ICU code is used by many products for Unicode and language support, including Android, Cloudant, ChromeOS, Db2, iOS, macOS, Windows, and many others.

The CLDR 33.1 release focuses on updates for Unicode 11.0: new names and keywords for the Unicode 11.0 emoji, Chinese collation stroke order, and script metadata. In addition, there are major improvements for names and annotations for the pre-11.0 emoji in CLDR languages. More extensive updates are planned for CLDR 34 (release expected in early October), with data submission still continuing.

For further details and links to documentation, see the CLDR 33.1 Release Notes.

Tuesday, June 5, 2018

Announcing The Unicode® Standard, Version 11.0

Version 11.0 of the Unicode Standard is now available, both the core specification and data files. Version 11.0 adds 684 characters, for a total of 137,374 characters. These additions include seven new scripts, for a total of 146 scripts, as well as 66 new emoji characters.

The new scripts and characters in Version 11.0 add support for lesser-used languages and unique written requirements worldwide, including:
  • Georgian Mtavruli capital letters, newly added to support modern casing practices
  • Hanifi Rohingya, used to write the modern Rohingya language in Southeast Asia
  • Medefaidrin, used for modern liturgical purposes in Africa
  • Mazahua, a Mesoamerican language recognized by law in Mexico
  • Mayan numerals used in printed materials in Central America
  • Historic Sanskrit, Gurmukhi, and the Buryats
  • Five urgently needed CJK unified ideographs: three for chemical names and two for Japan's government administration
Popular symbol additions:
  • Copyleft symbol
  • Half stars for rating systems
  • More astrological symbols
  • Xiangqi Chinese chess symbols
  • New emoji characters including:
🦸 👨🏽‍🦰
🧸 🦞
🧨 🥳

For the full list of emoji characters, see emoji additions for Unicode 11.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji. Version 11.0 also includes other improvements for emoji handling:
  • a mechanism to request the glyph direction for emoji
  • descriptions of the four new emoji hair components
  • descriptions of gender neutral emoji
  • simplified statements of emoji-related rules for grapheme cluster boundaries and for word boundaries.
Three other important Unicode specifications have been updated for Version 11.0:

Unicode 11.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications, often in coordination with changes to character properties. In particular, there are changes to:

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Adopt-a-Character

All the new characters including the new emoji are now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Wednesday, May 9, 2018

Emoji Draft Candidates for 2019

104 proposed Emoji Candidates (60 characters plus variants) have advanced to Draft Candidate status for 2019. These are the short-listed candidates for Emoji 12.0, which is planned for release in 2019Q1 together with Unicode 12.0.

The draft candidates include the following:

  • Guide dog
  • Kite
  • White heart

See Emoji Candidates for the full list.

That list of draft candidates will be reviewed and finalized this September. Feedback is solicited on short names, keywords, and ordering. See also the Emoji 11.0 charts.


Tuesday, April 17, 2018

Submissions open for 2020 Emoji

The deadline for emoji for 2019 was April 1, so any submissions received after that date are considered for release in 2020.

The submission form has undergone some revision, so please be sure to review the new text before putting together a proposal. There is a limited number of emoji characters considered each year, so be sure to follow the form so that you can provide the best case for any proposed emoji.

The emoji subcommittee has also produced a new page which shows the Emoji Requests submitted so far. You can look at what other people have proposed or suggested. In many cases, people have made suggestions, but have not followed through with complete submission forms, or have submitted forms, but not followed through on requested modifications to the forms.


Friday, April 13, 2018

Last Call on Unicode 11.0 Review

The beta review period for Unicode 11.0 and related technical standards will close on April 23, 2018. This is the last opportunity for technical comments before version 11.0 is released in Q2 2018. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments.

Unicode 11.0 adds seven new scripts, including Hanifi Rohingya, and 66 additional emoji characters, including four new components for hair color (for a total of 157 emoji sequences). The set of Georgian Mtavruli capital letters has been added to support modern casing practices.
In addition to the Unicode core specification, five Unicode Standard Annexes and four Unicode Technical Standards have significant specification and/or data file updates that are correlated with the new additions for Unicode 11.0.0. Review of those changes is strongly encouraged during the beta review period.

UAX #14, Unicode Line Breaking Algorithm
  • Uses Extended_Pictographic property for future-proofing
UAX #29, Unicode Text Segmentation
  • New support for Indic virama handling
  • Uses Extended_Pictographic property for future-proofing
  • A new table of formal regex definitions
UAX #31, Unicode Identifier and Pattern Syntax
  • Refines the use of ZWJ in identifiers
  • Broadens the definition of hashtag identifiers
UAX #38, Unicode Han Database (Unihan)
  • Five new fields and improved regular expressions
  • Documents extension of Unihan properties to non-Unihan
UAX #44, Unicode Character Database
  • New property Equivalent_Unified_Ideograph
  • New regular expressions for Bidi_Paired_Bracket and Equivalent_Unified_Ideograph
  • More discussion of emoji variation sequences
  • Clarification of values allowed for the Age property
UTS #10, Unicode Collation Algorithm
  • Updates data to Unicode 11.0
  • Clarification of search tailoring in visual-order scripts
UTS #39, Unicode Security Mechanisms
  • Updates data to Unicode 11.0
  • Enhances discussions of joining controls & combining sequences
UTS #46, Unicode IDNA Compatibility Processing
  • Updates data to Unicode 11.0
  • Changes the format of the test file for arbitrary input settings
  • Updates input setting for Transitional_Processing
UTS #51, Unicode Emoji
  • Supplies Extended_Pictographic property for future-proofing
  • Simplifies emoji sequence definitions
  • EBNF and Regex expressions for loose matches
  • More proposed guidelines: gender-neutral emoji, skin-tone modifiers, ZWJ visible fallbacks, hair-style components
  • Mechanism for changing the “facing” direction for emoji
Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are in each public review page. For more information, see the open public review issues.
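
For implementers testing the data files, the following is a minimal sketch of a reader for the common semicolon-separated layout used by many Unicode Character Database files (a code point or range, a semicolon, a value, and an optional "#" comment). The function name parse_ucd_lines and the SAMPLE text are hypothetical, and the sample lines are hand-written in that layout rather than copied from a released file. Some data files use additional fields or other layouts, so treat this as a starting point only.

```python
from typing import Iterable, Iterator, Tuple


def parse_ucd_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, value) for each data line.

    Hypothetical helper: assumes the 'range ; value # comment' layout.
    """
    for raw in lines:
        # Drop everything after '#', which starts a comment.
        data = raw.split("#", 1)[0].strip()
        if not data:
            continue  # blank line or comment-only line
        fields = [field.strip() for field in data.split(";")]
        if len(fields) < 2:
            continue  # not a 'range ; value' line; skipped in this sketch
        code_range, value = fields[0], fields[1]
        if ".." in code_range:
            first_hex, last_hex = code_range.split("..", 1)
        else:
            first_hex = last_hex = code_range  # single code point
        yield int(first_hex, 16), int(last_hex, 16), value


# Hand-written sample in the assumed layout (illustrative only).
SAMPLE = """\
# comment-only line
1C90..1CBA    ; Georgian # range form
10C7          ; Georgian # single code point form
"""

for first, last, value in parse_ucd_lines(SAMPLE.splitlines()):
    print(f"U+{first:04X}..U+{last:04X} {value}")
```

Comparing the parsed ranges from a beta file against those from the previous release is one simple way to spot unexpected changes before reporting feedback.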


Monday, April 9, 2018

Last call on UTS #51 Unicode Emoji

The Unicode Consortium is soliciting feedback on the text and data changes in the proposed update of UTS #51, Unicode Emoji. This specification is now synchronized with Unicode Version 11.0, and slated for release at the same time, in early June. Feedback is due by April 23; this is the last chance to provide feedback on any changes and any open review issues.

The recent changes modify the definition of emoji combining sequences, add a section on the stability of emoji properties (including under operations such as lowercasing), add a section providing EBNF and regex expressions for loose matches on emoji in running text, and clarify the treatment of gender-neutral characters.

Note: the emoji characters and properties for Version 11.0 have already been finalized, so this last call is just for the text of the specification, not the emoji characters or properties.

Tuesday, April 3, 2018

Updating Three Specifications Synchronized with Unicode Version 11.0

The Unicode Consortium is soliciting feedback on the text and data changes in the following proposed update specifications. These specifications are synchronized with Unicode Version 11.0, and slated for release at the same time, in early June. Feedback is due by April 23; this is the last chance to provide feedback on any changes and any open review issues.

UTS #39, Unicode Security Mechanisms, updates data for Unicode 11.0, adds a new section describing the handling of Joining Controls (ZWJ and ZWNJ), and adds tests to Section 5.4, Optional Detection, for checking nonspacing marks and sequences.

UTS #46, Unicode IDNA Compatibility Processing, updates data for Unicode 11.0 and extends the format of the test data file. The new test format allows implementations to determine more precisely where any validity test fails, and to filter for the exact combination of supported features.

UTS #10, Unicode Collation Algorithm, updates data for Unicode 11.0, and otherwise makes no material changes to the text.

Details of the Unicode 11.0 Beta and open Public Review Issues are available on the Unicode website.

Friday, March 30, 2018

ICU 61 Released

Unicode® ICU 61 has just been released. This version upgrades to CLDR 33, has a new Java implementation for number and currency parsing, and includes many small API additions, improvements, and bug fixes.

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode encoding standard and of the Unicode locale data (CLDR).

For details please see http://site.icu-project.org/download/61

Wednesday, March 28, 2018

CLDR Version 33 Released

Unicode CLDR 33 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

This release had a limited submission phase. The focus was on improvements to emoji keywords and to the Odia and Assamese locales, addition of typographic names data, and improvements to the structure for specifying keyboard layouts. Improvements include:
  • Structure
    • New structure for typographicNames translations (such as terms for Bold, Italic, ...), with data for 33 locales.
    • The structure for specifying keyboard layouts was significantly enhanced, with many new elements and attributes, and expanded syntax for some preëxisting attribute values.
  • Additional Translations/Data
    • Annotations (emoji keywords) for a limited set of locales had a full review (ar, en_GB, de, es, ja, ru).
    • Two additional locales (Odia, Assamese) were brought up to Modern coverage level; some missing items were added in other locales.
    • Added 4 new transforms, and number spellout rules for 6 additional languages.
  • Property files
    • The emoji property data file ExtendedPictographic.txt has been removed from CLDR data, since the contents are now part of the UTS #51 “Unicode Emoji” data.
    • labels.txt was added for emoji categories and subcategories. 
For further details and links to documentation, see the CLDR Release Notes.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, Emojipedia, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Shopify, Sultanate of Oman MARA, Oracle, Rajya Marathi Vikas Sanstha, SAP, Symantec, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html

For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.

Wednesday, March 14, 2018

Unicode 11.0 Beta Review

The beta review period for Unicode 11.0 has started. The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 11.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 11.0, often in coordination with changes to character properties. In particular, there are major changes to UAX #29, Unicode Text Segmentation. Seven new scripts have been added in Unicode 11.0, including Hanifi Rohingya. A major adjustment has been made to the Georgian script, with the introduction of uppercase Georgian letters. There are also 66 additional emoji characters.
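
As a quick way to see whether a runtime already reflects the new Georgian casing behavior, here is a minimal sketch in Python. The function name georgian_casing_supported is hypothetical. The check assumes that, in Unicode 11.0 data, U+10D0 GEORGIAN LETTER AN uppercases to U+1C90 GEORGIAN MTAVRULI CAPITAL LETTER AN and that U+1C90 lowercases back to U+10D0. The result depends on the Unicode version bundled with the interpreter, which unicodedata.unidata_version reports.

```python
import unicodedata


def georgian_casing_supported() -> bool:
    """Return True if this runtime maps Mkhedruli AN to Mtavruli AN and back.

    Hypothetical helper for illustration; it probes a single letter pair.
    """
    mkhedruli_an = "\u10D0"  # GEORGIAN LETTER AN
    mtavruli_an = "\u1C90"   # GEORGIAN MTAVRULI CAPITAL LETTER AN
    return (
        mkhedruli_an.upper() == mtavruli_an
        and mtavruli_an.lower() == mkhedruli_an
    )


print("Unicode data version:", unicodedata.unidata_version)
print("Georgian casing supported:", georgian_casing_supported())
```

A single pair is only a smoke test; code that stores or compares Georgian text after case conversion should be exercised against the full set of new mappings in the data files.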

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-11.0.0.html for more information about testing the 11.0.0 beta.

See http://unicode.org/versions/Unicode11.0.0/ for the current draft summary of Unicode 11.0.0.

Wednesday, March 7, 2018

Call for Unicode 11.0 and 12.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Versions 11.0 and 12.0 of The Unicode Standard. This call is being issued simultaneously for the next two versions of the standard, scheduled for publication in 2018 and 2019, respectively.

The two selected cover designs will appear on the Unicode Standard 11.0 and 12.0 web pages, in the print-on-demand publications, and in associated promotional literature on the Unicode website. The two artists whose designs are selected for the covers will receive full credit in the colophon of the publication for which the art is used, and wherever else the design appears, and will each receive $700. Two selected runner-up artists will receive $150 apiece.

Please see the announcement page for requirements and more details.