Tuesday, June 5, 2018

Announcing The Unicode® Standard, Version 11.0

Version 11.0 of the Unicode Standard is now available, both the core specification and data files. Version 11.0 adds 684 characters, for a total of 137,374 characters. These additions include seven new scripts, for a total of 146 scripts, as well as 66 new emoji characters.

The new scripts and characters in Version 11.0 add support for lesser-used languages and unique written requirements worldwide, including:
  • Georgian Mtavruli capital letters, newly added to support modern casing practices
  • Hanifi Rohingya, used to write the modern Rohingya language in Southeast Asia
  • Medefaidrin, used for modern liturgical purposes in Africa
  • Mazahua, a Mesoamerican language recognized by law in Mexico
  • Mayan numerals used in printed materials in Central America
  • Historic Sanskrit, Gurmukhi, and the Buryats
  • Five urgently needed CJK unified ideographs: three for chemical names and two for Japan's government administration
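
As a rough illustration of what the Mtavruli addition means for casing, the sketch below uppercases a few Mkhedruli letters with Python's built-in string methods. It assumes a runtime whose character database is at Unicode 11.0 or later (older databases leave the letters unchanged); the sample word and the helper name `to_mtavruli` are chosen here for illustration only.

```python
import unicodedata

def to_mtavruli(text: str) -> str:
    """Uppercase text; with Unicode 11.0+ data, Mkhedruli letters map to Mtavruli."""
    return text.upper()

# GEORGIAN LETTER AN, NAR, IN (Mkhedruli), written with escapes for clarity
word = "\u10d0\u10dc\u10d8"
upper = to_mtavruli(word)

for ch in upper:
    # The fallback string covers runtimes whose database predates these letters.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed in this database>"))

# Lowercasing the Mtavruli form should give back the original Mkhedruli letters.
print(upper.lower() == word)
```

The round trip through `upper()` and `lower()` is the part most code depends on, so it is worth checking on the runtime you actually deploy.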
Popular symbol additions:
  • Copyleft symbol
  • Half stars for rating systems
  • More astrological symbols
  • Xiangqi Chinese chess symbols
  • New emoji characters including:
🦸 👨🏽‍🦰
🧸 🦞
🧨 🥳

For the full list of emoji characters, see emoji additions for Unicode 11.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji. Version 11.0 also includes other improvements for emoji handling:
  • a mechanism to request the glyph direction for emoji
  • descriptions of the four new emoji hair components
  • descriptions of gender neutral emoji
  • simplified statements of emoji-related rules for grapheme cluster boundaries and for word boundaries.
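
To make the hair components more concrete, here is a small sketch that breaks an emoji sequence into its code points. It assumes the red-haired man shown above is encoded as man, medium skin tone modifier, zero width joiner (ZWJ), and the red hair component; the helper name `describe` is hypothetical, and character names are only available when the runtime's database knows the code point.

```python
import unicodedata

def describe(sequence: str):
    """Return (code point label, character name) pairs for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed in this database>"))
        for ch in sequence
    ]

# Assumed encoding: MAN + medium skin tone + ZWJ + red hair component
red_haired_man = "\U0001F468\U0001F3FD\u200D\U0001F9B0"

for label, name in describe(red_haired_man):
    print(label, name)
```

A renderer without support for the sequence can fall back to showing the parts separately, which is why the individual components matter to implementers.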
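Three other important Unicode specifications have been updated for Version 11.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing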

Unicode 11.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications, often in coordination with changes to character properties. In particular, there are changes to:

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Adopt-a-Character

All the new characters including the new emoji are now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Wednesday, May 9, 2018

Emoji Draft Candidates for 2019

104 proposed Emoji Candidates (60 characters plus variants) have advanced to Draft Candidate status for 2019. These are the short-listed candidates for Emoji 12.0, which is planned for release in 2019Q1 together with Unicode 12.0.

The draft candidates include the following:

  • Guide dog
  • Kite
  • White heart

See Emoji Candidates for the full list.

That list of draft candidates will be reviewed and finalized this September. Feedback is solicited on short names, keywords, and ordering. See also the Emoji 11.0 charts.


Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.


Tuesday, April 17, 2018

Submissions open for 2020 Emoji

The deadline for emoji for 2019 was April 1, so any submissions received after that date are considered for release in 2020.

The submission form has undergone some revision, so please be sure to review the new text before putting together a proposal. There is a limited number of emoji characters considered each year, so be sure to follow the form so that you can provide the best case for any proposed emoji.

The emoji subcommittee has also produced a new page which shows the Emoji Requests submitted so far. You can look at what other people have proposed or suggested. In many cases, people have made suggestions, but have not followed through with complete submission forms, or have submitted forms, but not followed through on requested modifications to the forms.


Friday, April 13, 2018

Last Call on Unicode 11.0 Review

The beta review period for Unicode 11.0 and related technical standards will close on April 23, 2018. This is the last opportunity for technical comments before Version 11.0 is released in Q2 2018. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments.

Unicode 11.0 adds seven new scripts, including Hanifi Rohingya, and 66 additional emoji characters, including four new components for hair color (for a total of 157 emoji sequences). The set of Georgian Mtavruli capital letters has been added to support modern casing practices.
In addition to the Unicode core specification, five Unicode Standard Annexes and two Unicode Technical Standards have significant specification and/or data file updates that are correlated with the new additions for Unicode 11.0.0. Review of those changes is strongly encouraged during the beta review period.

UAX #14, Unicode Line Breaking Algorithm
  • Uses Extended_Pictographic property for future-proofing
UAX #29, Unicode Text Segmentation
  • New support for Indic virama handling
  • Uses Extended_Pictographic property for future-proofing
  • A new table of formal regex definitions
UAX #31, Unicode Identifier and Pattern Syntax
  • Refines the use of ZWJ in identifiers
  • Broadens the definition of hashtag identifiers
UAX #38, Unicode Han Database (Unihan)
  • Five new fields and improved regular expressions
  • Documents the extension of Unihan properties to non-Unihan characters
UAX #44, Unicode Character Database
  • New property Equivalent_Unified_Ideograph
  • New regular expressions for Bidi_Paired_Bracket & Equivalent_Unified_Ideograph
  • More discussion of emoji variation sequences
  • Clarification of values allowed for the Age property
UTS #10, Unicode Collation Algorithm
  • Updates data to Unicode 11.0
  • Clarification of search tailoring in visual-order scripts
UTS #39, Unicode Security Mechanisms
  • Updates data to Unicode 11.0
  • Enhances discussions of joining controls & combining sequences
UTS #46, Unicode IDNA Compatibility Processing
  • Updates data to Unicode 11.0
  • Changes the format of the test file for arbitrary input settings
  • Updates input setting for Transitional_Processing
UTS #51, Unicode Emoji
  • Supplies Extended_Pictographic property for future-proofing
  • Simplifies emoji sequence definitions
  • Adds EBNF and regex expressions for loose matches
  • More proposed guidelines: gender-neutral emoji, skin-tone modifiers, ZWJ visible fallbacks, hair-style components
  • Mechanism for changing the “facing” direction for emoji
Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by April 23, 2018. Feedback instructions are in each public review page. For more information, see the open public review issues.
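
When adjusting code for a new release, one quick check is which version of the Unicode Character Database the runtime itself ships with. The sketch below uses Python's `unicodedata.unidata_version`; the helper names `parse_version` and `supports_unicode` are hypothetical, and the comparison assumes three-part version strings such as "11.0.0".

```python
import unicodedata

def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '11.0.0' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

def supports_unicode(required: str, available: str = unicodedata.unidata_version) -> bool:
    """Report whether the available database version is at least the required one."""
    return parse_version(available) >= parse_version(required)

print("Runtime database version:", unicodedata.unidata_version)
print("Unicode 11.0.0 data available:", supports_unicode("11.0.0"))
```

This only reports what the language runtime bundles. Libraries that carry their own data files would need to be checked separately.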


Monday, April 9, 2018

Last call on UTS #51 Unicode Emoji

The Unicode Consortium is soliciting feedback on the text and data changes in the proposed update of UTS #51, Unicode Emoji. This specification is now synchronized with Unicode Version 11.0, and slated for release at the same time, in early June. Feedback is due by April 23; this is the last chance to provide feedback on any changes and any open review issues.

The recent changes modify the definition of emoji combining sequences, add a section describing emoji property stability (including under operations like lowercasing), add a section providing EBNF and regex expressions for loose matches on emoji in running text, and clarify the handling of gender-neutral characters.

Note: the emoji characters and properties for Version 11.0 have already been finalized, so this last call is just for the text of the specification, not the emoji characters or properties.