Monday, June 26, 2017

Gold Sponsor Avocados from Mexico

The Unicode Consortium is pleased to announce that Avocados from Mexico is now a gold sponsor for:
Avocados from Mexico’s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

Avocados From Mexico are healthy, always in season and a delicious way to elevate go-to dishes into a nutritious meal. The fresh fruit will provide you with nearly 20 vitamins and minerals, are cholesterol-free and sodium-free, naturally good in fats and are a good source of fiber per 50 g serving (one third of a medium avocado). You can find more information and recipe ideas at AvocadosFromMexico.com. —Avocados from Mexico

The Unicode Consortium thanks Avocados from Mexico for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.   

Tuesday, June 20, 2017

Announcing The Unicode® Standard, Version 10.0

Version 10.0 of the Unicode Standard is now available. For the first time, both the core specification and the data files are available on the same date. Version 10.0 adds 8,518 characters, for a total of 136,690 characters. These additions include four new scripts, for a total of 139 scripts, as well as 56 new emoji characters.

The new scripts and characters in Version 10.0 add support for lesser-used languages and unique written requirements worldwide, including:
  • Masaram Gondi, used to write Gondi in Central and Southeast India
  • Nüshu, used by women in China to write poetry and other discourses until the late twentieth century
  • Soyombo and Zanabazar Square, used in historic Buddhist texts to write Sanskrit, Tibetan, and Mongolian
  • Syriac letters used for writing Suriyani Malayalam, also known as Garshuni and as Syriac Malayalam
  • Gujarati signs used for the transliteration of the Arabic script into Gujarati by Ismaili Khoja communities
  • A set of 285 Hentaigana characters used in Japan (historic variants of Hiragana characters)
  • CJK Extension F (7,473 Han characters)
Among important symbol additions are:
  • Bitcoin sign
  • A set of Typicon marks and symbols
  • 56 emoji characters including:
🧙  mage 🥥  coconut
🧚  fairy 🥦  broccoli
🧛  vampire 🥪  sandwich

For the full list of emoji characters, see emoji additions for Unicode 10.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

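Three other important Unicode specifications have been updated for Version 10.0:
  • UTS #10, Unicode Collation Algorithm
  • UTS #39, Unicode Security Mechanisms
  • UTS #46, Unicode IDNA Compatibility Processing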

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard.

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Adopt-a-Character

All the additional 8,518 characters including 239 new emoji are now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.

The membership of the consortium represents a broad spectrum of corporations and organizations, many in the computer and information processing industry. Members include: Adobe, Apple, EmojiXpress, Facebook, Google, Government of Bangladesh, Government of India, Huawei, IBM, Microsoft, Monotype Imaging, Netflix, Sultanate of Oman MARA, Oracle, Rajya Marathi Vikas Sanstha, SAP, Symantec, Tamil Virtual University, The University of California (Berkeley), plus well over a hundred Associate, Liaison, and Individual members. For a complete member list go to http://www.unicode.org/consortium/members.html.

Monday, June 19, 2017

Gold Sponsor CSRA

The Unicode Consortium is pleased to announce that CSRA is now a gold sponsor for:

CSRA’s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

CSRA is a leading provider of next-generation technology to its public-sector customers. The company’s sponsorship of the U.S. flag emoji is symbolic of the nexus between its IT services and its customers, as featured in an article by NextGov.

The Unicode Consortium thanks CSRA for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.

Wednesday, June 14, 2017

Gold Sponsor ☮.com

The Unicode Consortium is pleased to announce that ☮.com is now a gold sponsor for:

☮

☮.com’s sponsorship directly funds the work of the Unicode Consortium in enabling modern software and computing systems to support the widest range of human languages. There are approximately 7,000 living human languages. Fewer than 100 of these languages are well-supported on computers, mobile phones, and other devices. AAC donations are used to improve support for digitally disadvantaged languages, and to help preserve the world’s linguistic heritage.

☮.com proudly supports Unicode's efforts because those efforts promote wider and clearer communication to prevent misunderstandings that can cause conflict, violence, and suffering anywhere around the world.

The Unicode Consortium thanks ☮.com for their support!

All sponsors are listed on Sponsors of Adopted Characters. More than 128,000 other characters are available for adoption — see Adopt a Character.

Tuesday, June 13, 2017

Feedback on Draft Additional Repertoire for Amendments to ISO/IEC 10646:2017 (5th edition)

The Unicode Technical Committee is soliciting feedback on pending additions to the draft repertoire of characters, to help discover any errors in character names, incorrect glyphs, or other problems. There is a short window of opportunity to review and comment on the repertoire additions noted below.

Additional repertoire for two amendments to ISO/IEC 10646:2017 (5th Edition) is under review. See the associated repertoire in: Feedback on draft additional repertoire for Amendment 1.3 (PDAM) to ISO/IEC 10646:2017 (5th edition) and Feedback on draft additional repertoire for Amendment 2 (PDAM) to ISO/IEC 10646:2017 (5th edition).

Review of the Amendment 1.3 draft repertoire is especially urgent, as that content will be finalized by SC2 in September, and is scheduled for eventual publication in next year's Unicode 11.0. Note that the hentaigana and emoji portions of the amendment have already been accelerated for imminent publication in Unicode 10.0, so further comments on character names for those portions of the repertoire are no longer actionable.

There is more time to provide feedback on the Amendment 2 draft repertoire, but note that the addition of Mtavruli Georgian as part of that repertoire is also rather urgent.

The Unicode Standard is developed in synchrony with ISO/IEC 10646. After ISO balloting is completed on any repertoire additions, no further changes or corrections will be possible. (See the FAQ Standards Developing Organizations for additional information on the stages in ISO standards development.) Advance feedback on these repertoire additions will help inform the UTC discussions about its own contribution to the ISO balloting process.

Documents referenced in the draft repertoire with numbers such as L2/15-088 are available in the UTC Document Registry.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions.

Monday, May 22, 2017

Unicode Emoji submission deadline now July 1

The next emoji character submission deadline has been moved up to July 1, 2017, to accommodate upcoming changes in the release schedule for Unicode versions. Emoji character proposals submitted before July 1 are eligible to be considered for the 2018 version of Unicode; those submitted after that date will be considered for the 2019 version at the earliest.

The change in deadline only affects proposals for new emoji characters; proposals that don’t involve new characters — such as for new ZWJ sequences or subdivision flags — are unaffected by the change in deadline.

The annual Unicode Standard release is being shifted from June to early March to better align with product development schedules across the industry, especially for mobile products. This shift will not fully take effect until 2019, but in preparation for this change the submission date for emoji character proposals is being adjusted now.

The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Thursday, May 18, 2017

Unicode Emoji 5.0 specification now final

The new Emoji 5.0 set was finalized in March 2017, making it available for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 10.0, scheduled for June 2017.

The Emoji 5.0 specification is now final as well. The specification has become a technical standard, adding conformance clauses and enhanced syntax definitions. A general mechanism for emoji tag sequences has been added, initially used for country subdivisions such as Scotland. The Emoji_Component property has been added, for filtering out characters from keyboard palettes. The design and usage guidelines have also been enhanced.
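
To illustrate the emoji tag sequence mechanism mentioned above, here is a minimal sketch in Python. It assumes the sequence layout described in UTS #51: a base U+1F3F4 WAVING BLACK FLAG, followed by tag characters that mirror the ASCII subdivision code at offset U+E0000, followed by U+E007F CANCEL TAG. The function name `subdivision_flag` is hypothetical and exists only for this example.

```python
# Sketch only: builds the code point sequence for a subdivision flag.
BLACK_FLAG = "\U0001F3F4"   # U+1F3F4 WAVING BLACK FLAG, the base character
CANCEL_TAG = "\U000E007F"   # U+E007F CANCEL TAG, terminates the sequence
TAG_OFFSET = 0xE0000        # tag characters mirror ASCII at this offset


def subdivision_flag(subdivision_id: str) -> str:
    """Return the tag sequence for an ID such as 'gbsct' (Scotland)."""
    if not (subdivision_id.isascii() and subdivision_id.isalnum()):
        raise ValueError("expected ASCII letters and digits only")
    # Map each lowercase ASCII character to its tag counterpart.
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in subdivision_id.lower())
    return BLACK_FLAG + tags + CANCEL_TAG


scotland = subdivision_flag("gbsct")
print(" ".join(f"U+{ord(c):04X}" for c in scotland))
```

Whether the result renders as a flag depends on font and platform support; systems without support typically show only the black flag.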

The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Wednesday, April 26, 2017

Last Call on Unicode 10.0 Beta Review

The beta review period for Unicode 10.0 and related technical standards will close on May 1, 2017. This is the last opportunity for technical comments before Version 10.0 is released in Q2 2017. Implementers and interested parties are encouraged to download data files, review proposed updates, and submit comments soon.

In addition to the Unicode Standard proper, three other Unicode Technical Standards have significant text and data file updates that are correlated with the new additions for Unicode 10.0.0. Review of that text and data is also encouraged during the beta review period.

  • UTS #10, Unicode Collation Algorithm (data files)
  • UTS #39, Unicode Security Mechanisms (data files)
  • UTS #46, Unicode IDNA Compatibility Processing (data files)

Additional documents are available for public review and will be discussed at the May UTC meeting, such as the final Emoji 5.0 text, and a proposed Unicode character property. For more information, see the open public review issues and the UTC document registry.

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.

Monday, April 17, 2017

ICU 59 Released

Unicode® ICU 59 has just been released! ICU is the main avenue for many software products and libraries to support the world's languages, implementing both the latest version of the Unicode encoding standard and of the Unicode locale data (CLDR).

ICU 59 upgrades to CLDR 31 and to emoji 5.0 data, together with segmentation and bidi updates from Unicode 10 beta. The Java code for number formatting has been completely rewritten for reliability and performance. There is also a new case mapping API for styled text, and a technology preview of enhanced language matching.

There are major changes for ICU4C that will make ICU easier to use but require changes in projects using ICU: C++11, char16_t, UTF-8 source files.

For details please see http://site.icu-project.org/download/59

Thursday, April 13, 2017

Call for Unicode 10.0 Cover Design Art

The Unicode Consortium is inviting artists and designers to submit cover design proposals for Version 10.0 of The Unicode Standard.

The cover design will appear on the Unicode Standard 10.0 web page, in the print-on-demand publication, and in associated promotional literature on the Unicode website. The chosen artist will receive full credit in the colophon of the publication, and wherever else the design appears, and receive $700. The two runner-up artists will receive $150 apiece.

Please see the announcement web page for requirements and more details.

Friday, April 7, 2017

PRI #351: Combined registration of the KRName collection and of sequences in that collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #351: A submission for the “Combined registration of the KRName collection and of sequences in that collection” has been received by the IVD Registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-07-07. Please see the submission page for details and instructions on how to review this issue and provide comments:

http://www.unicode.org/ivd/pri/pri351/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.
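
As a rough illustration of what a variation sequence looks like in plain text, the sketch below treats an ideographic variation sequence as one base character followed by one selector from the Variation Selectors Supplement (U+E0100..U+E01EF). The helper names are hypothetical, the example sequence is chosen only for illustration, and the code does not check that the base is an ideograph or that the sequence is actually registered; that can only be determined from the IVD data itself.

```python
VS17 = 0xE0100   # first code point of the Variation Selectors Supplement
VS256 = 0xE01EF  # last code point of the Variation Selectors Supplement


def looks_like_ivs(sequence: str) -> bool:
    """True if sequence is one character plus one selector in VS17..VS256."""
    if len(sequence) != 2:
        return False
    return VS17 <= ord(sequence[1]) <= VS256


def strip_ivs_selectors(text: str) -> str:
    """Drop VS17..VS256 so text can be compared while ignoring variants."""
    return "".join(c for c in text if not VS17 <= ord(c) <= VS256)


candidate = "\u845B\U000E0100"  # U+845B followed by VS17, for illustration
print(looks_like_ivs(candidate))
print(strip_ivs_selectors(candidate) == "\u845B")
```

Stripping the selectors is one possible fallback for searching or comparing text when the variant distinction does not matter.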

Monday, March 27, 2017

Unicode Emoji 5.0 characters now final


Fifty-six new emoji characters are in the just released Emoji 5.0 data, including such characters as:

🤫  shushing face 🧙  mage
🛸  flying saucer 🥧  pie
🦖  T-Rex 🥦  broccoli*
* for healthy eaters!

The new Emoji 5.0 set is fixed, and available for vendors to begin working on their emoji fonts and code ahead of the release of Unicode 10.0, scheduled for June 2017.

The majority of these new emoji characters are the 34 Smileys & People, with 13 new Food & Drink, followed up by 6 Animals & Nature and a few others.

There are an additional 180 emoji sequences for gender and skin-tone in Smileys & People — such as woman in lotus position: medium skin tone — and new regional flags for England, Scotland, and Wales. This makes a total of 239 new emoji (characters and sequences). For a full list, see Emoji Recently Added.
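
The arithmetic behind the total, and the shape of one of the new sequences, can be sketched as follows. The composition shown for "woman in lotus position: medium skin tone" assumes the usual ordering for gendered emoji with a skin tone: base character, skin tone modifier, zero width joiner, gender sign, and emoji presentation selector. The variable names are illustrative only.

```python
# Components of "woman in lotus position: medium skin tone".
PERSON_IN_LOTUS_POSITION = "\U0001F9D8"  # new emoji character in Emoji 5.0
MEDIUM_SKIN_TONE = "\U0001F3FD"          # emoji modifier
ZWJ = "\u200D"                           # zero width joiner
FEMALE_SIGN = "\u2640"
VS16 = "\uFE0F"                          # requests emoji presentation

woman_lotus_medium = (
    PERSON_IN_LOTUS_POSITION + MEDIUM_SKIN_TONE + ZWJ + FEMALE_SIGN + VS16
)
print(" ".join(f"U+{ord(c):04X}" for c in woman_lotus_medium))

# Counts taken from the announcement text.
new_characters = 56
gender_and_skin_tone_sequences = 180
regional_flags = 3  # England, Scotland, and Wales
total_new_emoji = new_characters + gender_and_skin_tone_sequences + regional_flags
print(total_new_emoji)
```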

The emoji charts have been updated to show the new characters and sequences. The draft Emoji 5.0 specification will be finalized in the May UTC meeting, and is still available for comment.

The 239 new emoji are also now available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages.

Adopt a Character

Monday, March 20, 2017

CLDR Version 31 Released

Unicode CLDR 31 provides an update to the key building blocks for software supporting the world’s languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Some of the improvements in the release are:
  • Canonical codes
    • The subdivision codes have been changed to all have the bcp47 format.
    • The locales in the language-territory population data are in canonical format.
    • The timezone ID for GMT has been split from UTC.
    • There is a mechanism for identifying hybrid locales, such as Hinglish.
  • Emoji 5.0
    • Short names and keywords have been updated for English. (Data for other languages to be gathered in the next cycle).
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
    • For Emoji usage, subdivision names for Scotland, Wales, and England have been added for 65 languages.
For further details and links to documentation, see the CLDR Release Notes.
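
As a small illustration of the subdivision code change, the sketch below converts an ISO 3166-2 style code such as GB-SCT into the bcp47-style form gbsct. It assumes the conversion amounts to removing the hyphen and lowercasing; the function name is hypothetical, and the release notes remain the reference for the exact rules.

```python
def to_bcp47_subdivision(iso_code: str) -> str:
    """Convert e.g. 'GB-SCT' to 'gbsct' (assumed rule: drop hyphen, lowercase)."""
    region, separator, suffix = iso_code.partition("-")
    if not separator or not region or not suffix:
        raise ValueError(f"expected REGION-SUFFIX, got {iso_code!r}")
    return (region + suffix).lower()


for code in ("GB-SCT", "GB-WLS", "GB-ENG"):
    print(code, "->", to_bcp47_subdivision(code))
```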

Thursday, March 9, 2017

Unicode 10.0 Beta Review

The Unicode Standard is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones, plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases. Thus it is important to ensure a smooth transition to each new version of the standard.

Unicode 10.0 includes a number of changes. Some of the Unicode Standard Annexes have modifications for Unicode 10.0, often in coordination with changes to character properties. In particular, there are changes to UAX #14, Unicode Line Breaking Algorithm, UAX #29, Unicode Text Segmentation, and UAX #31, Unicode Identifier and Pattern Syntax. In addition, UAX #50, Unicode Vertical Text Layout, has been newly incorporated as a part of the standard. Four new scripts have been added in Unicode 10.0, including Nüshu. There are also 56 additional emoji characters, a major new extension of CJK ideographs, and 285 hentaigana, important historic variants for Hiragana syllables.

Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 1, 2017. Feedback instructions are on the beta page.

See http://unicode.org/versions/beta-10.0.0.html for more information about testing the 10.0.0 beta.

See http://unicode.org/versions/Unicode10.0.0/ for the current draft summary of Unicode 10.0.0.

Friday, March 3, 2017

UTS #51, Unicode Emoji proposed update available

Proposed Update Version 5.0 of UTS #51, Unicode Emoji is available for public review and feedback. This new version is slated to be a Unicode Technical Standard, and thus adds a conformance section and related definitions.

This new version adds a mechanism to support regional flags, such as Scotland or California, although the choice of which of these flags to support is left to vendors beyond a recommended set of three. UTS #51 will have a separate data file for the valid emoji presentation sequences. It also reflects some changes to the recommended sort order that will be released soon in CLDR v31. For more details, see the Modifications section of the document.

Thursday, March 2, 2017

PRI #349: Registration of additional sequences in the Adobe-Japan1 collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #349: A submission for the "Registration of additional sequences in the Adobe-Japan1 collection" has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS #37, Unicode Ideographic Variation Database, with an expected close date of 2017-06-02. Please see the submission page for details and instructions on how to review this issue and provide comments: http://www.unicode.org/ivd/pri/pri349/

The IVD (Ideographic Variation Database) establishes a registry for collections of unique, and sometimes shared, variation sequences for CJK Unified Ideographs, which enables standardized interchange in plain text, in accordance with UTS #37, Unicode Ideographic Variation Database.

For further information on Public Review Issues, please see: http://www.unicode.org/review/

Tuesday, February 28, 2017

Netflix Upgrades to Full Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Netflix has upgraded from associate member to a full corporate member.

Netflix is the world’s leading Internet television network with over 93 million members in over 190 countries enjoying more than 125 million hours of TV shows and movies per day, including original series, documentaries and feature films.

We look forward to their contributions to the Unicode Standard, ICU, and the Common Locale Data Repository (CLDR) project, and are grateful for their financial support of the Consortium’s work. Full members of the consortium have a vote in all technical committees, and in the governance of the consortium.

Monday, February 27, 2017

Be a Part of IUC 41! Call for Participation

The Internationalization and Unicode Conference® (IUC) is the annual conference of the Unicode Consortium where experts and industry leaders gather to map the future of internationalization, ignite new ideas and present the latest in technologies and best practices for creation, management, and testing of global, web, and multilingual software solutions.

Join in with other industry leaders to present your ideas and solutions at the 41st Internationalization & Unicode Conference (IUC 41) in Santa Clara, California, October 16-18, 2017.

Please submit your proposals for presentations or tutorials by Friday, March 24, 2017. Topics can include case studies, best practices, innovative technology, or evolving standards.

Full details and information about how to submit an abstract can be found on the IUC 41 Call for Participation page.

Thursday, February 23, 2017

Proposed update of UTS #46, for Unicode domain names

UTS #46 “Unicode IDNA Compatibility Processing” is used by many applications to support internationalized domain names with non-English characters. The proposed update to Version 10.0 regenerates the UTS #46 data files based on new additions to the Unicode repertoire, and adds three new parameters for processing: CheckHyphens, CheckBidi, and CheckJoiners. These parameters allow implementations to reflect current practice in browsers. The note about the use of IDNA2008 now includes the number of “missing” IDNA2008 characters (26,568), and is reworded for clarity.
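
To give a feel for what one of these parameters controls, here is a minimal sketch of the hyphen restrictions as they are commonly summarized for UTS #46: a label must not begin or end with a hyphen, and must not have hyphens in both the third and fourth positions. The function name is hypothetical, the check is meant for a label that has already been converted to Unicode, and it covers none of the other validity criteria. CheckBidi and CheckJoiners depend on rules from the IDNA2008 RFCs and are not reproduced here.

```python
def passes_check_hyphens(label: str) -> bool:
    """Apply only the hyphen restrictions to a single domain label."""
    # The label must neither begin nor end with U+002D HYPHEN-MINUS.
    if label.startswith("-") or label.endswith("-"):
        return False
    # The label must not contain hyphens in both the 3rd and 4th positions.
    if label[2:4] == "--":
        return False
    return True


for label in ("example", "my-site", "-lead", "trail-", "ab--cd"):
    print(f"{label!r}: {passes_check_hyphens(label)}")
```

An implementation that sets the parameter to false would simply skip this check, which is how the parameter can reflect more lenient browser behavior.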

There are two review notes requesting feedback on the use of Joiner characters.

For details and information about how to provide feedback, please see Public Review Issue #347.

Monday, February 20, 2017

Unicode Locale Data v31α available for testing

The Alpha version of Unicode CLDR version 31 is available for testing. The beta v31 will contain updates to the LDML spec and should be available on March 1, with the release of v31 planned for March 15.

CLDR 31 provides an update to the key building blocks for software supporting the world's languages. This data is used by all major software systems for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.

Aside from the regular updates to codes and data, some of the more noticeable changes are:
  • Canonical codes
    • The subdivision codes were changed to consistently use the bcp47 format.
    • The locales in the language-territory population data and the exemplars directory were regularized (dropping likely scripts subtags).
    • The timezone ID for GMT has been split from UTC.
    • There is a new mechanism for identifying hybrid locales, such as Hinglish.
  • Subdivisions
    • Names for Scotland, Wales, and England have been added in many languages.
  • Emoji 5.0
    • Short names and keywords have been updated for English.
    • Collation (sorting) adds the new 5.0 Emoji characters and sequences, and some fixes for Emoji 4.0 characters and sequences.
  • Transforms
    • The Zawgyi→Unicode transform has been improved.
    • Tamil can now be transcribed to the International Phonetic Alphabet (IPA).
This release did not have a data-submission cycle, so the changes reflect cleanup and bug fixes. For more details, and important notes for smoothly migrating implementations, see Unicode CLDR Version 31. If you find a problem, please file a ticket.

Wednesday, January 11, 2017

New Unicode Character Property EquivalentUnifiedIdeograph

A new character property EquivalentUnifiedIdeograph is proposed for addition to Unicode 10.0. This property associates (where possible) the 365 characters in the CJK Radicals Supplement, Kangxi Radicals, and CJK Strokes blocks to an appropriate CJK Unified Ideograph.
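
The kind of mapping such a property provides can be sketched with a small parser. The sample lines below are illustrative only and assume the semicolon-separated layout used by other Unicode data files (code point or range, then the equivalent ideograph, then a comment); the proposed data linked from the Public Review Issue is the authority on the actual format and values. The function name is hypothetical.

```python
# Illustrative sample; the layout is an assumption modeled on other
# Unicode data files, not a copy of the proposed data.
SAMPLE = """\
# source ; equivalent unified ideograph
2E81       ; 5382  # CJK RADICAL CLIFF
2E8C..2E8D ; 5C0F  # CJK RADICAL SMALL ONE..CJK RADICAL SMALL TWO
2F00       ; 4E00  # KANGXI RADICAL ONE
"""


def parse_equivalents(text: str) -> dict:
    """Map each radical or stroke character to its unified ideograph."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        source, target = (field.strip() for field in line.split(";"))
        first, _, last = source.partition("..")
        start = int(first, 16)
        end = int(last, 16) if last else start
        for code_point in range(start, end + 1):
            mapping[chr(code_point)] = chr(int(target, 16))
    return mapping


equivalents = parse_equivalents(SAMPLE)
print(equivalents["\u2F00"] == "\u4E00")
```

A lookup table like this could be used, for example, to fold radicals to ordinary ideographs before searching text.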

For details of the proposal, a link to the proposed data, and information about how to provide feedback, please see Public Review Issue #344.