Wednesday, August 16, 2017

Unicode Emoji 6.0 initial drafts / Draft Candidate chart updated

Emoji 6.0 is starting development, and initial drafts of the specification and data files are available. In the specification and data, a new property is added that helps to “future-proof” segmentation for emoji. The specification also contains more proposed guidelines: for gender-neutral emoji, the application of skin-tone modifiers, and others.

There are two types of emoji: characters and sequences. While these appear and behave similarly for users, they are released on different time schedules.
  • Emoji characters at Draft Candidate status are targeted at Unicode 11.0 (due in June 2018). These characters are “short-listed”. The Emoji Candidates chart has been updated with these characters, and feedback is solicited on names, keywords, and ordering. They will be reviewed at the October UTC meeting and are on track for Final Candidate status.
  • Emoji sequences may be released as a part of Emoji 6.0. The exact content and release schedule of Emoji 6.0 have yet to be determined: it could appear earlier than Unicode 11.0. Proposals for new sequences for Emoji 6.0 were presented in L2/17-287 and will be reviewed in the October UTC meeting. Other proposals may be considered at that meeting.
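
The difference between an emoji character and an emoji sequence is visible at the code point level. The following is a minimal sketch in Python, not part of the specification: the helper name `describe` is hypothetical, the two samples (mage, and woman mage as a ZWJ sequence) are chosen only for illustration, and the check simply counts code points rather than validating against the emoji data files.

```python
def describe(emoji: str) -> str:
    """Label a string as one emoji character or a sequence, by code point count."""
    code_points = [f"U+{ord(ch):04X}" for ch in emoji]
    kind = "character" if len(code_points) == 1 else "sequence"
    return f"{kind}: {' '.join(code_points)}"


# A single code point (mage).
MAGE = "\U0001F9D9"
# Mage + ZERO WIDTH JOINER + FEMALE SIGN + VARIATION SELECTOR-16 (woman mage).
WOMAN_MAGE = "\U0001F9D9\u200D\u2640\uFE0F"

print(describe(MAGE))
print(describe(WOMAN_MAGE))
```

Both strings typically render as one glyph, which is why they appear and behave similarly for users even though only the first is a single encoded character.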
Over 100,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Tuesday, August 15, 2017

PRI 354: Registration of additional sequences in the Moji_Joho collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #354: A submission for the "Registration of additional sequences in the Moji_Joho collection" has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-11-17. Please see the submission page for details and instructions on how to review this issue and provide comments:

http://www.unicode.org/ivd/pri/pri354/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.
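
As a rough illustration of what an ideographic variation sequence looks like in plain text, the sketch below pairs a base character with one of the variation selectors in the range U+E0100..U+E01EF (VS17 through VS256). The function names `make_ivs` and `is_ivs_shaped` are hypothetical, and the code only checks the shape of a sequence. Whether a given sequence is actually registered can only be determined from the IVD data files.

```python
VS17 = 0xE0100   # first variation selector used in ideographic variation sequences
VS256 = 0xE01EF  # last variation selector in that range


def make_ivs(base: str, selector_number: int) -> str:
    """Append variation selector VS<selector_number> (17..256) to a base character."""
    if len(base) != 1:
        raise ValueError("base must be a single character")
    if not 17 <= selector_number <= 256:
        raise ValueError("selector_number must be between 17 and 256")
    return base + chr(VS17 + (selector_number - 17))


def is_ivs_shaped(text: str) -> bool:
    """True if text is one character followed by a selector in VS17..VS256."""
    return len(text) == 2 and VS17 <= ord(text[1]) <= VS256


sample = make_ivs("\u845B", 17)  # U+845B followed by U+E0100
print([f"U+{ord(ch):04X}" for ch in sample])
```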

Monday, July 31, 2017

Gold Sponsor Oakland Athletics Baseball

The Unicode Consortium is pleased to announce that Oakland Athletics Baseball is now a gold sponsor for:





Oakland Athletics Baseball's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.
The team adopted the three characters as a visual representation of Oakland Athletics Baseball. The organization's nine World Series titles and 15 American League Pennants make the Athletics one of the most storied clubs in Major League Baseball. The Athletics take great pride in the achievements of the past, and view them as a challenge to push further. The Oakland Athletics are committed to creating winning experiences that encompass the many aspects of the game and the Oakland community. — Oakland Athletics Baseball
The Unicode Consortium thanks Oakland Athletics Baseball for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character

Monday, July 24, 2017

Gold Sponsor White Unicorn Agency

The Unicode Consortium is pleased to announce that White Unicorn Agency is now a gold sponsor for:


White Unicorn Agency's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.
White Unicorn Agency is a dynamic, full-service creative agency based in the Design District of Dallas, Texas. We’re a diverse team of dreamers who build remarkable brands with design, technology, and a touch of magic. We value process and function as much as beauty and creativity, and summon them all to help bring your ideas to life. For us, the creative process seamlessly unites strategy and imagination – producing insightful, innovative business solutions delivered in new and interesting ways. We work across a variety of mediums to create captivating experiences that convey our client’s message effectively. We're proud to be the official sponsor of the Unicorn Emoji and support the Unicode Consortium. White Unicorn Agency

The Unicode Consortium thanks White Unicorn Agency for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character

Thursday, July 20, 2017

Gold Sponsor JMP Software

The Unicode Consortium is pleased to announce that JMP is now a gold sponsor for:


JMP's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

Scientific advances, engineering breakthroughs and statistical discoveries: Clarity comes to those who explore data visually and interactively with statistical software from JMP. That’s why we adopted the lightbulb — to represent the aha moments scientists, engineers and other data explorers experience. From its beginning, JMP has connected statistics with data visualization for data analysis, later adding design of experiments, predictive modeling, and quality, reliability, and consumer research analysis. Unicode allows us to focus on statistical discovery rather than on character set differences among the seven languages our software supports or on multiple operating systems. That’s another reason we decided to support Unicode’s work, as explained in this post. — JMP
The Unicode Consortium thanks JMP for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character

Tuesday, July 18, 2017

Gold Sponsor discourse.org

The Unicode Consortium is pleased to announce that discourse.org is now a gold sponsor for:


discourse.org's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

Discourse is the 100% open source discussion platform built for the next decade of the Internet. It works as a mailing list, discussion forum, long-form chat room, and more! Install it yourself, or try our managed hosting service. As a team discussion platform, emoji (and Unicode) are essential to the Discourse mission. Thanks to the efforts of the greater community, Discourse has already been translated into 87 languages and counting. We’re thrilled to support the Unicode Consortium’s mission of making all software available in every language. — discourse.org
The Unicode Consortium thanks discourse.org for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character

Gold Sponsor dtSearch Corp.

The Unicode Consortium is pleased to announce that dtSearch Corp. is now a gold sponsor for:


dtSearch Corp.'s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.
dtSearch Corp. appreciates the critical role that the Unicode Standard has played in making search possible across so many of the world's languages. The recent Adopt-a-Character grants to support encoding Mayan Script and Egyptian Hieroglyphs demonstrate how the Unicode Consortium's continuing efforts further the preservation and sharing of human knowledge. — dtSearch Corp.
The Unicode Consortium thanks dtSearch Corp. for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character

Adopt-A-Character Grant to Support Three Historic Scripts

The Adopt-a-Character Program has awarded a grant to support further development of the following three historic scripts in the Unicode Standard:
  • Dhives Akuru, a Brahmi-based script formerly used to write the Maldivian language in the Maldive islands
  • Elymaic, an Aramaic-based script formerly used in the region southeast of the Tigris river in Iran
  • Khwarezmian, a script formerly used in the northern part of Uzbekistan and the adjacent areas of Turkmenistan and Kazakhstan
This grant will fund the development of proposals for encoding scripts that can be included in the Unicode Standard. The work will be done by Anshuman Pandey under the direction of Deborah Anderson (SEI, UC Berkeley) and Rick McGowan (Unicode Consortium).

Over 100,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Wednesday, July 5, 2017

Gold Sponsor MediaLab, Inc.

The Unicode Consortium is pleased to announce that MediaLab, Inc. is now a gold sponsor for:


MediaLab, Inc.'s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

The Unicode Consortium thanks MediaLab, Inc. for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.   

Tuesday, June 27, 2017

Gold Sponsor Elastic

The Unicode Consortium is pleased to announce that Elastic is now a gold sponsor for:

Elastic's sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

Elastic builds software to make data usable in real time and at scale for search, logging, security, and analytics use cases. Founded in 2012, Elastic develops the open source Elastic Stack (Elasticsearch, Kibana, Beats, and Logstash), X-Pack (commercial features), and Elastic Cloud (a SaaS offering). When Elastic Founder and CEO Shay Banon found out about the Unicode adoption program, he had a cool idea: why not allow every engineer at Elastic (as well as other teammates within the company) to choose and adopt a character? Check out this blog— Elastic


The Unicode Consortium thanks Elastic for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.   

Monday, June 26, 2017

Gold Sponsor Avocados from Mexico

The Unicode Consortium is pleased to announce that Avocados from Mexico is now a gold sponsor for:

Avocados from Mexico’s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

Avocados From Mexico are healthy, always in season and a delicious way to elevate go-to dishes into a nutritious meal. They provide naturally good fats, nearly 20 vitamins and minerals,  are cholesterol- and sodium-free, making this fresh fruit a heart-healthy fruit. You can find more information and recipe ideas at AvocadosFromMexico.com
Avocados from Mexico

The Unicode Consortium thanks Avocados from Mexico for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.   

Tuesday, June 20, 2017

Announcing The Unicode® Standard, Version 10.0

Version 10.0 of the Unicode Standard is now available. For the first time, both the core specification and the data files are available on the same date. Version 10.0 adds 8,518 characters, for a total of 136,690 characters. These additions include four new scripts, for a total of 139 scripts, as well as 56 new emoji characters.

The new scripts and characters in Version 10.0 add support for lesser-used languages and unique written requirements worldwide, including:
  • Masaram Gondi, used to write Gondi in Central and Southeast India
  • Nüshu, used by women in China to write poetry and other discourses until the late twentieth century
  • Soyombo and Zanabazar Square, used in historic Buddhist texts to write Sanskrit, Tibetan, and Mongolian
  • Syriac letters used for writing Suriyani Malayalam, also known as Garshuni and as Syriac Malayalam
  • Gujarati signs used for the transliteration of the Arabic script into Gujarati by Ismaili Khoja communities
  • A set of 285 Hentaigana characters used in Japan (historic variants of Hiragana characters)
  • CJK Extension F (7,473 Han characters)
Among important symbol additions are:
  • Bitcoin sign
  • A set of Typicon marks and symbols
  • 56 emoji characters including:
🧙  mage 🥥  coconut
🧚  fairy 🥦  broccoli
🧛  vampire 🥪  sandwich

For the full list of emoji characters, see emoji additions for Unicode 10.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

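Three other important Unicode specifications have been updated for Version 10.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing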

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard.

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Adopt-a-Character

All the additional 8,518 characters including 239 new emoji are now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, EmojiXpress, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, Rajya Marathi Vikas Sanstha, SAP, Symantec, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.

Monday, June 19, 2017

Gold Sponsor CSRA

The Unicode Consortium is pleased to announce that CSRA is now a gold sponsor for:

CSRA’s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

CSRA is a leading provider of next-generation technology to its public-sector customers. The company’s sponsorship of the U.S. flag emoji is symbolic of the nexus between its IT services and its customers, as featured in an article by NextGov.

The Unicode Consortium thanks CSRA for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.

Wednesday, June 14, 2017

Gold Sponsor ☮.com

The Unicode Consortium is pleased to announce that ☮.com is now a gold sponsor for:

☮

☮.com’s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

☮.com proudly supports Unicode's efforts because those efforts promote wider and clearer communication to prevent misunderstandings that can cause conflict, violence, and suffering anywhere around the world.

The Unicode Consortium thanks ☮.com for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.

Tuesday, June 13, 2017

Feedback on Draft Additional Repertoire for Amendments to ISO/IEC 10646:2017 (5th edition)

The Unicode Technical Committee is soliciting feedback on pending additions to the draft repertoire of characters, to help discover any errors in character names, incorrect glyphs, or other problems. There is a short window of opportunity to review and comment on the repertoire additions noted below.

Additional repertoire for two amendments to ISO/IEC 10646:2017 (5th Edition) is under review. See the associated repertoire in: Feedback on draft additional repertoire for Amendment 1.3 (PDAM) to ISO/IEC 10646:2017 (5th edition) and Feedback on draft additional repertoire for Amendment 2 (PDAM) to ISO/IEC 10646:2017 (5th edition).

Review of the Amendment 1.3 draft repertoire is especially urgent, as that content will be finalized by SC2 in September, and is scheduled for eventual publication in next year's Unicode 11.0. Note that the hentaigana and emoji portions of the amendment have already been accelerated for imminent publication in Unicode 10.0, so further comments on character names for those portions of the repertoire are no longer actionable.

There is more time to provide feedback on the Amendment 2 draft repertoire, but note that the addition of Mtavruli Georgian as part of that repertoire is also rather urgent.

The Unicode Standard is developed in synchrony with ISO/IEC 10646. After ISO balloting is completed on any repertoire additions, no further changes or corrections will be possible. (See the FAQ Standards Developing Organizations for additional information on the stages in ISO standards development.) Advance feedback on these repertoire additions will help inform the UTC discussions about its own contribution to the ISO balloting process.

Documents referenced in the draft repertoire with numbers such as L2/15-088 are available in the UTC Document Registry.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Monday, May 22, 2017

Unicode Emoji submission deadline now July 1

The next emoji character submission deadline has been moved up to July 1, 2017 to accommodate upcoming changes in the release schedule for Unicode versions. Emoji character proposals submitted before July 1 are eligible to be considered for the 2018 version of Unicode; those submitted after that date will be considered for the 2019 version at the earliest.

The change in deadline only affects proposals for new emoji characters; proposals that don’t involve new characters — such as for new ZWJ sequences or subdivision flags — are unaffected by the change in deadline.

The annual Unicode Standard release is being shifted from June to early March to better align with product development schedules across the industry, especially for mobile products. This shift will not fully take effect until 2019, but in preparation for this change the submission date for emoji character proposals is being adjusted now.

The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Thursday, May 18, 2017

Unicode Emoji 5.0 specification now final

The new Emoji 5.0 set was finalized in March 2017, making it available for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 10.0, scheduled for June 2017.

The Emoji 5.0 specification is now final as well. The specification has become a technical standard, adding conformance clauses and enhanced syntax definitions. A general mechanism for emoji tag sequences has been added, initially used for country subdivisions such as Scotland. The Emoji_Component property has been added, for filtering out characters from keyboard palettes. The design and usage guidelines have also been enhanced.
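
The tag sequence mechanism for subdivision flags can be sketched as follows: a base WAVING BLACK FLAG (U+1F3F4), then one tag character per letter of the subdivision code, then CANCEL TAG (U+E007F). This is a minimal Python illustration, not the specification's own algorithm. The function name `subdivision_flag` is hypothetical, and it does not check that the code names a real subdivision or that a vendor supports the resulting flag.

```python
BLACK_FLAG = "\U0001F3F4"  # WAVING BLACK FLAG, the base of the tag sequence
CANCEL_TAG = "\U000E007F"  # CANCEL TAG, which terminates the sequence
TAG_OFFSET = 0xE0000       # tag characters mirror ASCII at this offset


def subdivision_flag(code: str) -> str:
    """Build an emoji tag sequence for a subdivision code such as 'gbsct'."""
    code = code.lower()
    if not code or not code.isascii() or not code.isalnum():
        raise ValueError("expected ASCII letters and digits, e.g. 'gbsct'")
    tags = "".join(chr(TAG_OFFSET + ord(ch)) for ch in code)
    return BLACK_FLAG + tags + CANCEL_TAG


scotland = subdivision_flag("gbsct")
print([f"U+{ord(ch):04X}" for ch in scotland])
```

On systems without support for the sequence, the tag characters are normally invisible, so the fallback display is typically the plain black flag.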

The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Wednesday, April 26, 2017

Last Call on Unicode 10.0 Beta Review

The beta review period for Unicode 10.0 and related technical standards will close on May 1, 2017. This is the last opportunity for technical comments before version 10.0 is released in Q2 2017. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments soon.

In addition to the Unicode Standard proper, three other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 10.0.0. Review of that text and data is also encouraged during the beta review period.

  • UTS #10, Unicode Collation Algorithm (data files)
  • UTS #39, Unicode Security Mechanisms (data files)
  • UTS #46, Unicode IDNA Compatibility Processing (data files)

Additional documents are available for public review and will be discussed at the May UTC meeting, such as the final Emoji 5.0 text, and a proposed Unicode character property. For more information, see the open public review issues and the UTC document registry.

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.

Monday, April 17, 2017

ICU 59 Released

Unicode® ICU 59 has just been released! ICU is the main avenue for many software products and libraries to support the world's languages, implementing both the latest version of the Unicode encoding standard and of the Unicode locale data (CLDR).

ICU 59 upgrades to CLDR 31 and to emoji 5.0 data, together with segmentation and bidi updates from Unicode 10 beta. The Java code for number formatting has been completely rewritten for reliability and performance. There is also a new case mapping API for styled text, and a technology preview of enhanced language matching.

There are major changes for ICU4C that will make ICU easier to use but require changes in projects using ICU: C++11, char16_t, UTF-8 source files.

For details please see http://site.icu-project.org/download/59

Thursday, April 13, 2017

Call for Unicode 10.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Version 10.0 of The Unicode Standard.

The cover design will appear on the Unicode Standard 10.0 web page, in the print-on-demand publication, and in associated promotional literature on the Unicode website. The chosen artist will receive full credit in the colophon of the publication, and wherever else the design appears, and receive $700. The two runner-up artists will receive $150 apiece.

Please see the announcement web page for requirements and more details.

Friday, April 7, 2017

PRI #351: Combined registration of the KRName collection and of sequences in that collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #351: A submission for the “Combined registration of the KRName collection and of sequences in that collection” has been received by the IVD Registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-07-07. Please see the submission page for details and instructions on how to review this issue and provide comments:

http://www.unicode.org/ivd/pri/pri351/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.

Monday, March 27, 2017

Unicode Emoji 5.0 characters now final


Fifty-six new emoji characters are in the just released Emoji 5.0 data, including such characters as:

shushing face  mage
flying saucer  pie
T-Rex  broccoli*
* for healthy eaters!

The new Emoji 5.0 set is fixed, and available for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 10.0, scheduled for June 2017.

Most of the new emoji characters (34) are in Smileys & People, followed by 13 in Food & Drink, 6 in Animals & Nature, and a few others.

There are an additional 180 emoji sequences for gender and skin-tone in Smileys & People — such as woman in lotus position: medium skin tone — and new regional flags for England, Scotland, and Wales. This makes a total of 239 new emoji (characters and sequences). For a full list, see Emoji Recently Added.

The emoji charts have been updated to show the new characters and sequences. The draft Emoji 5.0 specification will be finalized in the May UTC meeting, and is still available for comment.
The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Adopt a Character

Monday, March 20, 2017

CLDR Version 31 Released

Unicode CLDR 31 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Some of the improvements in the release are:
  • Canonical codes
    • The subdivision codes have been changed to all have the bcp47 format.
    • The locales in the language-territory population data are in canonical format.
    • The timezone ID for GMT has been split from UTC.
    • There is a mechanism for identifying hybrid locales, such as Hinglish.
  • Emoji 5.0
    • Short names and keywords have been updated for English. (Data for other languages to be gathered in the next cycle).
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
    • For Emoji usage, subdivision names for Scotland, Wales, and England have been added for 65 languages.
For further details and links to documentation, see the CLDR Release Notes.
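
To illustrate the subdivision code change listed above, the sketch below converts an ISO 3166-2 style code such as GB-SCT into the lowercase, hyphen-free form gbsct. It assumes that this is what "bcp47 format" refers to here. The function name `to_bcp47_subdivision` is hypothetical, and the authoritative list of valid codes is the CLDR data itself.

```python
def to_bcp47_subdivision(iso_code: str) -> str:
    """Convert an ISO 3166-2 style code ('GB-SCT') to the assumed bcp47 form ('gbsct')."""
    region, separator, suffix = iso_code.partition("-")
    if not separator or not region.isalnum() or not suffix.isalnum():
        raise ValueError(f"expected a code like 'GB-SCT', got {iso_code!r}")
    return (region + suffix).lower()


for code in ("GB-SCT", "GB-WLS", "GB-ENG"):
    print(code, "->", to_bcp47_subdivision(code))
```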

Thursday, March 9, 2017

Unicode 10.0 Beta Review

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.