Tuesday, December 18, 2012

Unicode 6.2 Paperback Available

Unicode 6.2, Core Specification is now available as a paperback book.

Responding to requests, the editorial committee has created a modestly-priced print-on-demand volume that contains the complete text of the core specification of Version 6.2 of the Unicode Standard. This 692-page volume may be purchased from Lulu.com for $17.24, plus shipping.

Note that this volume does not include the Version 6.2 code charts, nor does it include the Version 6.2 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website, http://www.unicode.org/versions/Unicode6.2.0/ .

Purchase The Unicode Standard, Version 6.2 - Core Specification.

Friday, December 14, 2012

Unicode Stability Policies Updated

The Unicode Character Encoding Stability Policies ensure that developers know what they can depend on between successive releases of the Unicode Standard.
Recent changes to these policies include new guarantees:
  • Property aliases will not be reused later for different properties.
  • Property value aliases will not be reused later for different property values.
  • Characters with the General_Category of Number are guaranteed to have a corresponding Numeric_Type value.
Additionally, the wording for two earlier guarantees about General_Category and Bidi_Class has been clarified:
  • No new General_Category property values will ever be added.
  • New Bidi_Class property values can only be added for a tightly constrained class of new character additions.
For the exact wording of these new and updated guarantees, see Unicode Character Encoding Stability Policies.
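
As a rough illustration of the Numeric_Type guarantee listed above, the following Python sketch spot-checks a few characters using the standard unicodedata module. It assumes that a character has a Numeric_Type exactly when unicodedata.numeric() can return a value for it, and it only reflects whichever Unicode version is bundled with the Python runtime. The function names and the sample list are illustrative, not part of any Unicode specification.

```python
import unicodedata

def has_numeric_type(ch: str) -> bool:
    """True if the bundled Unicode data assigns the character a numeric value."""
    return unicodedata.numeric(ch, None) is not None

def check_number_guarantee(chars):
    """Return the characters whose General_Category is Nd, Nl, or No
    but which have no numeric value (expected to be none)."""
    return [
        ch for ch in chars
        if unicodedata.category(ch).startswith("N") and not has_numeric_type(ch)
    ]

# Hypothetical sample: ASCII digit, Arabic-Indic digit, Roman numeral,
# vulgar fraction, and a letter that is not a Number at all.
samples = ["5", "\u0663", "\u2167", "\u00bd", "A"]
violations = check_number_guarantee(samples)
```

With the guarantee in place, the violations list is expected to stay empty for any sample of characters.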

Wednesday, December 12, 2012

Feedback requested for Unicode 6.3


Unicode 6.3 is slated to be released in 2013Q3. Now is your opportunity to comment on the contents of this release.

The text of the Unicode Standard Annexes (segmentation, normalization, identifiers, etc.) is open for comments and feedback, with proposed update versions posted at UAX Proposed Updates. Initially, the contents of these documents are unchanged: the one exception is UAX #9 (BIDI), which has major revisions in PRI232. Changes to the text will be rolled in over the next few months, with more significant changes being announced. Feedback is especially useful on the changes in the proposed updates, and should be submitted by mid-January for consideration at the Unicode Technical Committee meeting at the end of January.

A later announcement will be sent when the beta versions of the Unicode character properties for 6.3 are available for comment. The only characters planned for this release are a small number of bidi control characters connected with the changes to UAX #9.

Monday, December 10, 2012

Unicode Collation Proposed Update

The Unicode Collation Algorithm (UCA) data is being modified to make all digits with the same numeric value sort the same, whether they are European (ASCII), Arabic, Devanagari, or others. In addition, the format of the main data table has changed to omit the (unused) 4th level weight, and some data tables are moved to the Unicode CLDR project.
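
The effect of the digit change can be approximated in a few lines of Python. This is only a sketch of the idea, not the UCA itself: it folds every decimal digit to its ASCII counterpart before comparing, using the standard unicodedata module. The function name digit_folded_key and the sample strings are made up for illustration; real collation would normally go through a library such as ICU.

```python
import unicodedata

def digit_folded_key(text: str) -> str:
    """Build a sort key in which every decimal digit is replaced
    by the ASCII digit with the same numeric value."""
    out = []
    for ch in text:
        value = unicodedata.decimal(ch, None)  # None for non-decimal characters
        out.append(str(value) if value is not None else ch)
    return "".join(out)

# Arabic-Indic three, ASCII one, Devanagari two
items = ["item \u0663", "item 1", "item \u0968"]
ordered = sorted(items, key=digit_folded_key)
```

With this key the three items sort by numeric value (1, 2, 3) regardless of which script the digit comes from.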

These and other changes are in the new proposed update: see PRI 235. For the exact list of modifications, see Modifications.

Friday, November 16, 2012

Unicode 6.2 core specification now available

The Unicode 6.2 core specification is now available. The text has been updated to align it with changes to Unicode algorithms and properties that were released in September, including the addition of the newly adopted Turkish lira sign. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 6.2.

For more details, see http://www.unicode.org/versions/Unicode6.2.0.

Friday, October 26, 2012

CLDR Version 22.1 Released

Oct 26, 2012 — Unicode CLDR 22.1 has been released, providing an update to the key building blocks for software supporting the world's languages.

Unicode CLDR 22.1 contains data for 215 languages and 227 territories—654 locales in all. Version 22.1 is an update release, with several important fixes to CLDR 22.0, such as addition of the new Turkish currency symbol, and simpler patterns for fallback timezone formatting (“Los Angeles Time” instead of “United States Time (Los Angeles)”). For details, see CLDR-22.1.

CLDR is by far the largest and most extensive standard repository of locale data, used by a wide spectrum of companies for their software internationalization and localization. It is widely deployed via International Components for Unicode (ICU), and also accessed directly by companies such as Apple, Google, IBM, Twitter, and many others. CLDR is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML)—an XML format used for general interchange of locale data, such as in Microsoft's .NET.

See the Charts pages for views of the CLDR data, organized in various ways. For more information about the Unicode CLDR project see cldr.unicode.org.





About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Tamil Virtual University, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.

Tuesday, October 16, 2012

Two New Public Review Issues, UAX #9 and UTR #20

The Unicode Technical Committee has posted two new issues for public review and comment.

    http://www.unicode.org/review/

Review periods for the new items close on January 21, 2013.
Please see the page for links to discussion and relevant documents.

Briefly, the new issues are:

PRI #232, Proposed Update UAX #9, Unicode Bidirectional Algorithm

UAX #9 will be updated for Unicode 6.2.1. This proposed update involves a substantial extension of the Unicode Bidirectional Algorithm to allow for the implementation of isolate runs. It also introduces a new X_Bidi_Class property in support of that extension. See the modifications section of the proposed update for information on specific changes to sections in the document.
http://www.unicode.org/review/pri232/

PRI #233, Proposed Update UTR #20, Unicode in XML and other Markup Languages

This Unicode Technical Report will have its references corrected and various other small editorial changes made to bring it up-to-date with Unicode 6.2.
http://www.unicode.org/review/pri233/

To supply feedback on these issues, see http://www.unicode.org/review/#feedback .

Wednesday, September 26, 2012

Announcing The Unicode Standard, Version 6.2

Version 6.2 of the Unicode Standard is now available. This version adds only a single character, the newly adopted Turkish Lira sign; however, the properties and behaviors for many other characters have been adjusted. Emoji and pictographic symbols now have significantly improved line-breaking, word-breaking and grapheme cluster behaviors. The script categorizations for some characters are improved and better documented.

The Unicode Collation Algorithm has been greatly enhanced for Version 6.2, with a major overhaul of its documentation. There have also been significant changes to the collation weight tables, including improved handling of tertiary weights for characters with decompositions, and changed weights for some pictographic symbols.

The newly encoded Turkish Lira sign, like other currency symbols, is expected to be heavily used in its target environment. The Unicode Consortium accelerated the release of Unicode 6.2, to accommodate the urgent need for this character.

For more details of this release, see http://www.unicode.org/versions/Unicode6.2.0/.

Monday, September 10, 2012

CLDR Version 22 Released


Mountain View, CA, Sept. 10, 2012  - The Unicode® Consortium announced today the release of a new version of the Unicode Common Locale Data Repository (Unicode CLDR 22.0), providing key building blocks for software to support the world's languages.

Unicode CLDR 22.0 contains data for 215 languages and 227 territories—654 locales in all. The main focus for this release is to flesh out data items in major languages and locales, yielding an increase of over 100% in the total number of data fields. Other major features include the addition of keyboard mapping data for different platforms, the new Zhuyin (Bopomofo) sort order for Chinese, and script metadata. There are also enhancements to compact decimals (such as formatting 1,000,000 as “1 million” or “1M”) for different languages and to rule-based number formats (such as writing 423 as "four hundred and twenty-three"). For more details, see the CLDR 22.0 Release Note.
CLDR is used to adapt software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; and transliterating different alphabets.
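
To give a feel for what compact decimal formatting does, here is a minimal Python sketch. The English patterns are hard-coded assumptions for illustration only; in practice the patterns come from the CLDR data for each locale, and the function name compact_decimal is hypothetical.

```python
def compact_decimal(value: int, style: str = "short") -> str:
    """Format a non-negative integer in an English-like compact form."""
    # Hypothetical English patterns; CLDR supplies real ones per locale.
    units = [
        (1_000_000_000, "B", "billion"),
        (1_000_000, "M", "million"),
        (1_000, "K", "thousand"),
    ]
    for threshold, short_name, long_name in units:
        if value >= threshold:
            scaled = value / threshold
            # Keep one decimal place, but drop it when it is zero.
            number = f"{scaled:.1f}".rstrip("0").rstrip(".")
            if style == "short":
                return f"{number}{short_name}"
            return f"{number} {long_name}"
    return str(value)
```

For example, compact_decimal(1_000_000) gives "1M", and passing style="long" gives "1 million".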

It is by far the largest and most extensive standard repository of locale data, used by a wide spectrum of companies for their software internationalization and localization. It is widely deployed via International Components for Unicode (ICU), and also accessed directly by companies such as Apple, Google, IBM, Twitter, and many others. 

CLDR is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML)—an XML format used for general interchange of locale data, such as in Microsoft's .NET. See the charts pages for views of the CLDR data, organized in various ways. For more information about the Unicode CLDR project see cldr.unicode.org.


Monday, July 23, 2012

Unicode Security Mechanisms, Version 3 Released

Version 3.0 of UTS #39, Unicode Security Mechanisms has been released by the Unicode Consortium, together with a new version of the associated UTR #36, Unicode Security Considerations. Because the Unicode Standard contains such a large number of characters for the writing systems of the world, caution is necessary to avoid exposing programs and systems to possible security attacks. These revised documents describe security considerations for Unicode and specify improved mechanisms for reducing the risk of problems.

Version 3.0 is a major revision. Significant changes include:
  • Mixed Script Detection has extensive revisions to its specification.
  • Restriction Level now has an explicitly defined process.
  • Mixed Number Detection now has an explicitly defined process.
  • Conformance requirements have been extended to include Restriction Level and Mixed Number Detection.
http://www.unicode.org/reports/tr36/
http://www.unicode.org/reports/tr39/
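
The idea behind Mixed Number Detection can be sketched in Python as follows. This is an informal illustration, not the normative process from UTS #39: it maps each decimal digit to the zero of its digit set and reports a mix when more than one zero is seen. It assumes that each set of decimal digits is encoded as a contiguous run from 0 to 9; the function names are hypothetical.

```python
import unicodedata

def decimal_zero_codepoints(text: str) -> set:
    """Collect the code point of the zero digit for every decimal digit in text."""
    zeros = set()
    for ch in text:
        if unicodedata.category(ch) == "Nd":
            # Assumes digits 0..9 of one system are contiguous code points.
            zeros.add(ord(ch) - unicodedata.decimal(ch))
    return zeros

def has_mixed_numbers(text: str) -> bool:
    """True if the text uses decimal digits from more than one digit set."""
    return len(decimal_zero_codepoints(text)) > 1
```

A string such as "2012" uses one digit set, while "20" followed by Arabic-Indic digits would be flagged as mixed.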

Thursday, July 19, 2012

Version 15 of UTS #18, Unicode Regular Expressions has been released by the Unicode Consortium. Regular expressions are used throughout much of the world's software for matching and manipulating text. UTS #18 provides the foundation for the handling of Unicode text in those expressions.

Version 15 is a major revision. Changes include:
  • Conformance clauses dealing with non 1:1 equivalences were either retracted or modified.
  • A Level 2 conformance clause for full properties was added.
  • New properties, including Name_Alias matching and Script_Extensions, were added.
  • A recommended compact form of Unicode escapes was added: \u{...}.
  • There were many clarifications of the text. See http://www.unicode.org/reports/tr18/tr18-15.html
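
Not every regular expression engine accepts the compact \u{...} form. Python's re module, for instance, does not, so the sketch below expands such escapes into literal characters before compiling. It handles only the single code point form, does not account for an escaped backslash in front of the u, and the helper name expand_braced_escapes is hypothetical.

```python
import re

# Matches \u{...} with one to six hexadecimal digits.
_BRACED_ESCAPE = re.compile(r"\\u\{([0-9A-Fa-f]{1,6})\}")

def expand_braced_escapes(pattern: str) -> str:
    """Replace each \\u{XXXX} escape with the (regex-escaped) character it names."""
    def replace(match):
        # chr() raises ValueError for values above 0x10FFFF.
        return re.escape(chr(int(match.group(1), 16)))
    return _BRACED_ESCAPE.sub(replace, pattern)

# A lira sign, an optional space, then one or more digits.
lira = re.compile(expand_braced_escapes(r"\u{20BA}\s?\d+"))
```

The compiled pattern can then be used as usual, for example lira.search(text).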

Monday, July 2, 2012

Dr. Vinton G. Cerf to Keynote IUC 36!

Dr. Vinton G. Cerf, Vice President and Chief Internet Evangelist at Google, has just been announced as the keynote speaker for the 36th Internationalization & Unicode Conference. Dr. Cerf has served as vice president and chief Internet evangelist for Google since October 2005. In this role, he is responsible for identifying new enabling technologies to support the development of advanced, Internet-based products and services from Google. Dr. Cerf is widely known as one of the “Fathers of the Internet,” for being a co-designer of the TCP/IP protocols. For details please see the on-line announcement: http://www.unicodeconference.org/e/IUC36-07-02-12.htm

Tuesday, June 26, 2012

Proposed updates for Unicode Collation and IDNA

The proposed update of UTS #10 Unicode Collation Algorithm (UCA) modifies the specification for certain edge cases (overlapping contractions), and tightens the requirements for well-formed collation element tables. The detailed descriptions of parametric tailoring options have been removed, and now refer to the corresponding section in LDML. That section adds new explanations and definitions. There are a number of improvements, including additional examples, and some rearrangement of text. See PRI #223.

The data has been updated for the Unicode 6.2 beta review, and the associated CollationAuxiliary.txt file in CollationAuxiliary.zip now includes a description of the implicit fractional weight generation and the context syntax. For more details, see Modifications.

There is also a proposed update of UTS #46 Unicode IDNA Compatibility Processing. The data has been updated for the Unicode 6.2 beta review, with minor changes to the text. See PRI #224.

Monday, June 25, 2012

Using the Unicode Glossary

The Unicode glossary is useful for people writing documents, specifications, and general-purpose articles. Each of the glossary entries now has a link on it, and clicking on that link exposes it in the address bar of your browser. This makes it easy to add links directly to the Unicode glossary for terms that may be unfamiliar to readers, such as
http://unicode.org/glossary/#grapheme_cluster or
http://unicode.org/glossary/#code_point.
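
If you generate such links programmatically, a small helper along these lines may be enough. It assumes that glossary anchors are the lowercased term with spaces replaced by underscores, which is inferred only from the two example links above and may not hold for every entry. The function name glossary_link is hypothetical.

```python
def glossary_link(term: str) -> str:
    """Build a Unicode glossary link for a term (anchor format is an assumption)."""
    anchor = "_".join(term.lower().split())
    return f"http://unicode.org/glossary/#{anchor}"
```

For instance, glossary_link("Grapheme Cluster") reproduces the first example link.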

Wednesday, June 13, 2012

Tutorials Announced for IUC 36

Tutorials Announced for 36th Internationalization and Unicode Conference
Santa Clara, Calif., USA; October 22-24, 2012
Mountain View, CA, USA – June 13, 2012 – The Unicode® Consortium today announced the tutorial sessions for the Thirty-sixth Internationalization and Unicode Conference (IUC). IUC 36 will take place in Santa Clara, Calif., USA at the Hyatt Regency Hotel on October 22-24, 2012, sponsored by Adobe. This is the premier conference on technologies and practices for the creation and management of global and multilingual software applications. For more information about the program, please visit http://www.unicodeconference.org/iuc36-tutorials
The Internationalization and Unicode Conference (IUC) covers the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.
Tutorial Sessions Include:
  • “An Introduction to Writing Systems & Unicode,” by Richard Ishida, Internationalization Activity Lead, W3C
  • “Unicode – A Grand Tour,” by Michael McKenna, International Product Engineer, Zynga, Inc., and Craig Cummings, Globalization Center of Excellence, Rearden Commerce and UTC Vice Chair, Unicode Consortium
  • “Internationalizing Domain Names in Applications (IDNA),” by Amit Gupta, Member Technical Staff, Adobe Systems
  • “Internationalization, An Introduction (Part I: Character Encoding) (Part II: Enabling),” by Addison Phillips, Globalization Architect, Lab 126
  • “Developing an OpenType Font for Complex Scripts Using Fontforge,” by Pravin Dinkar Satpute, Senior Software Engineer, Red Hat
  • “I18N in Javascript with iLib,” by Edwin Hoogerbeets, Independent Globalization Consultant
  • “Keyboard Design for Tavultesoft Keyman and Unicode,” by Marc Durdin, CEO, Tavultesoft Pty Ltd
  • “Web Internationalization – Standards and Best Practices,” by Tex Texin, Chief Globalization Architect, Rearden Commerce, Inc.
  • “Using ICU Workshop,” by Steven R. Loomis, Software Engineer, IBM
  • “Internationalization and Localization in Ruby and Ruby on Rails,” by Martin J. Dürst, Professor, Aoyama Gakuin University
  • “The Road to World-Class Starts with World-Ready,” by Michael Kuperstein, Localization Engineer and Loïc Dufresne de Virel, Localization Strategist, Intel Corporation
  • “Building Multilingual Websites in Drupal 7 and Joomla 2.5,” by Jim DeLaHunt, Principal, Jim DeLaHunt & Associates
MultiLingual Magazine is the media sponsor. The early-bird registration deadline is September 7, 2012. Sponsorships and exhibit space are available; for more information on sponsoring contact Ken Berk at ken.berk@omg.org, +1-781-444 0404. For exhibiting questions email event_marketing@omg.org. For all other questions email info@unicodeconference.org.
###
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.
The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, Rearden Commerce, SAP, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html
About the Event Producer
OMG® is the Event Producer for the Internationalization & Unicode Conferences. OMG is an open membership, not-for-profit consortium that produces and maintains computer industry specifications for interoperable enterprise applications. Our specifications include MDA®, UML®, CORBA®, MOF™, XMI® and CWM™. OMG’s specifications are all available for download by everyone without charge.
For more information about OMG, visit us online at http://www.omg.org.

Thursday, June 7, 2012

CLDR 21.0.2: New T Extensions for language/locale identifiers

New T Extension fields and subfields [RFC 6497] are now available for use in BCP47 and Unicode Locale/Language Identifiers. These T extensions provide for the identification of transforms that can be used for tagging content or requesting resources. The new T extension fields and subfields are defined in the following files, as part of the CLDR 21.0.2 release:
For example:
  • "zh-t-i0-pinyin", to indicate Chinese text generated with a pinyin input method
  • "en-t-k0-dvorak", to identify a Dvorak keyboard for English
  • "it-t-k0-osx-extended", to request an extended Mac keyboard for Italian
The private use subfields can be used for private agreements, such as:
  • "ru-t-en-x0-mobile", to indicate a translation from English to Russian for use on a mobile device, or
  • "ja-t-de-t0-und-x0-medical", to identify a machine translation from German to Japanese with a specialized dictionary for medical terms.
Related to this, there is draft keyboard layout data currently slated for CLDR 22.0: see Draft Keyboard Charts.
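
The example tags above can be taken apart with a short Python sketch. It is a simplified reading of the T extension syntax, not a full BCP47 parser: it assumes a well-formed tag, treats two-character subtags made of a letter followed by a digit as field keys, and treats anything between the "t" singleton and the first field key as the source language. The function name parse_t_extension and the shape of the returned dictionary are choices made for this illustration.

```python
import re

# A T extension field key is one letter followed by one digit, e.g. "k0".
_FIELD_KEY = re.compile(r"^[a-z][0-9]$")

def parse_t_extension(tag: str) -> dict:
    """Split a language tag into its base language, T source language, and T fields."""
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return {"language": tag.lower(), "source": None, "fields": {}}
    t_index = subtags.index("t")
    result = {"language": "-".join(subtags[:t_index]), "source": None, "fields": {}}
    source, current_key = [], None
    for subtag in subtags[t_index + 1:]:
        if len(subtag) == 1:
            break  # another singleton starts a different extension
        if _FIELD_KEY.match(subtag):
            current_key = subtag
            result["fields"][current_key] = []
        elif current_key is None:
            source.append(subtag)  # still inside the source language tag
        else:
            result["fields"][current_key].append(subtag)
    if source:
        result["source"] = "-".join(source)
    return result
```

Applied to "ja-t-de-t0-und-x0-medical", this yields the base language "ja", the source language "de", and the fields t0 and x0 with their values.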

Wednesday, June 6, 2012

PRI #231: Bidi Parenthesis Algorithm

The Unicode Technical Committee is seeking feedback on a proposal to enhance the Unicode Bidirectional Algorithm (UAX #9) with additional logic--a bidirectional parenthesis algorithm (BPA)--for processing paired punctuation marks such as parentheses. This proposal is intended to produce better bidi-layout results in common text sequences that involve paired punctuation marks. Details of the proposal, with questions for reviewers and a detailed background document are available through the PRI #231 page:
http://www.unicode.org/review/pri231/

PRI #229: Linebreaking Changes for Pictographic Symbols

The UTC is proposing changes to the line break property of many pictographic characters. Details of the proposed changes are on the PRI #229 page and its associated background document.

Please see: http://www.unicode.org/review/pri229/

Tuesday, June 5, 2012

New Public Review Issues: Changes to Character Properties

The UTC is considering some property changes for Unicode 6.2. PRI #227 proposes changes to the Script Extensions property values for certain combining marks. PRI #228 proposes changes of General Category for some common characters from Punctuation to Symbol, to better align with expectations about how those characters behave. Because these characters are quite common, the proposed change may impact a large number of implementations.

For details of the proposals, see the PRI pages:
http://www.unicode.org/review/pri227/
http://www.unicode.org/review/pri228/

Monday, June 4, 2012

New Public Review Issues: Changes to the Unihan database

The UTC is considering two changes relating to properties for the Unihan database. PRI #225 proposes transforming the data for the kHanyuPinlu field so that it also uses accented pinyin. PRI #226 proposes deprecation of the kCompatibilityVariant field.

For details of these see the PRI pages:
http://www.unicode.org/review/pri225/
http://www.unicode.org/review/pri226/

Friday, June 1, 2012

Unicode 6.2 Beta Review

The Unicode® Consortium today announced the start of beta review for the forthcoming Unicode 6.2.0. All beta feedback must be submitted by July 23, 2012.

Unicode 6.2 is a minor release of the Unicode Standard. The main feature of this release is the inclusion of the newly encoded Turkish lira symbol. However, there are other important changes to Unicode properties and annexes, affecting segmentation and more.

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML,...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.2.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.2.0 in September 2012.

• See http://www.unicode.org/versions/beta-6.2.0.html for information about testing the 6.2.0 beta.
• See http://www.unicode.org/versions/Unicode6.2.0/ for the current draft summary of Unicode 6.2.0.

Thursday, May 31, 2012

Unicode sessions at Localization World Paris

On Monday, 4 June, noted experts Richard Ishida (W3C) and Addison Phillips (Lab126) have teamed up to present a full day of sessions on Unicode.

In the morning, Richard Ishida will present “An Introduction to Writing Systems and Unicode”, a tutorial that will introduce the basic functioning of Unicode in dealing with non-Latin writing systems. It is an excellent orientation for people new to these concepts, but it also offers content for people at intermediate and advanced levels due to the breadth of scripts discussed.

In the afternoon, Addison will present "Internationalization: An Introduction", a two-part tutorial covering:

• What is internationalization?
• What is Unicode? Implementing and using the standard.
• How do you prepare software for localization and translation?

Finally, Richard and Addison will present "Towards the Promised Land: Globalization Developments in Web Standards", which surveys current developments at the W3C.

You may register for any or all of these sessions via http://localizationworld.com/lwparis2012/registration.php where you will see the sessions in the preconference day.

This is an opportunity to get a taste of the Unicode conference to be held in California on the following October 22-24, and see how the people on your staff can benefit from a deeper knowledge of Unicode and internationalization.

Friday, May 25, 2012

Unicode 6.1 Paperback Available

Unicode 6.1, Core Specification is now available as a paperback book.

Responding to requests, the editorial committee has created a modestly-priced print-on-demand volume that contains the complete text of the core specification of Version 6.1 of the Unicode Standard. This 692-page volume may be purchased from Lulu.com for $15.96, plus shipping.

Note that this volume does not include the Version 6.1 code charts, nor does it include the Version 6.1 Standard Annexes and Unicode Character Database, all of which are available only on the Unicode website, http://www.unicode.org/versions/Unicode6.1.0/ .

Purchase The Unicode Standard, Version 6.1 - Core Specification.

Tuesday, May 15, 2012

Unicode 6.2 to Support the Turkish Lira Sign

In March of this year, the Central Bank of Turkey announced the adoption of a new currency symbol for the Turkish lira. Public use of the new currency symbol has already been adopted within Turkey.

Recognizing the urgent need to support the new currency symbol in information systems, the Unicode Consortium has scheduled its next release, Unicode 6.2, for the third quarter of 2012. That release will include the new character, U+20BA TURKISH LIRA SIGN.
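
One quick way to see whether a given environment already knows the new character is to ask its Unicode database. The Python sketch below does this with the standard unicodedata module; it assumes a Python runtime whose bundled Unicode data is at version 6.2 or later, and the variable names are illustrative.

```python
import unicodedata

LIRA = "\u20ba"  # U+20BA

info = {
    # Falls back to a placeholder if the runtime's Unicode data predates 6.2.
    "name": unicodedata.name(LIRA, "<unknown to this Unicode version>"),
    "category": unicodedata.category(LIRA),
    "utf8": LIRA.encode("utf-8").hex(),
}
```

On a runtime with current Unicode data, the name is reported as TURKISH LIRA SIGN, the General_Category as Sc (currency symbol), and the UTF-8 encoding is the three bytes e2 82 ba.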

Additional information regarding the new Turkish lira sign is available from the Central Bank of Turkey: http://www.tcmb.gov.tr/yeni/iletisimgm/TurkishLira.php

Monday, April 23, 2012

Unicode Version 6.1 - Complete Text of Core Specification Published

Mountain View, CA, April 23, 2012 - The Unicode® Consortium is pleased to announce the publication of the final text of the core specification for Unicode 6.1. The Unicode 6.1 core specification documents newly encoded scripts, certain conformance clarifications, and other updates and improvements to the text. In Version 6.1, the standard grew by 732 characters.

Version 6.1 of the Unicode Standard continues the Unicode Consortium's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world.

This version of the Standard brings technical improvements to support implementers, particularly with improvements to property values and their aliases that enable easier programmatic use. Other improvements include line-breaking behavior of Hebrew and Japanese text and segmentation behavior of Thai, Lao, and other similar languages.

In January 2012, the other portions of Unicode 6.1 were released: the Unicode Standard Annexes, code charts, and the Unicode Character Database, to allow vendors to update their implementations of Unicode 6.1 as quickly as possible. The release of the core specification completes the definitive documentation of the Unicode Standard, Version 6.1.

For more information on all of The Unicode Standard, Version 6.1, see http://www.unicode.org/versions/Unicode6.1.0/ .

Wednesday, April 4, 2012

Unicode CLDR Survey Tool now open for data submissions

April 4, 2012 — The Unicode CLDR Survey Tool is now open for data submissions for Version 22. Organizations and individuals are invited to help contribute translations to this repository.

CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. That repository is used in a wide variety of products, including most smart phones.

The survey tool (http://cldr.org/index/survey-tool) is used to submit translations to this repository, and to vote on others’ translations. For Version 22, the survey tool has undergone substantial revision, with dramatic improvements in performance and usability.

The data submission phase is scheduled to run from now until May 30, 2012, after which the vetting stage will begin. During the vetting stage, users can vote on translations, and correct new translations, but cannot otherwise enter translations.

If you have used the survey tool in a previous release of CLDR, your login ID and password are still active. Otherwise you will need to set up a new account; please see the account instructions (http://cldr.org/index/survey-tool/accounts).

Friday, March 30, 2012

Call for Participation! - The 36th Internationalization & Unicode Conference - October 22-24, 2012

The Internationalization and Unicode Conference (IUC) is the premier event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.

The Program Committee is soliciting proposals for presentations that describe case studies, best practices, effective software design, innovative technology, or important standards. Tutorial presentations are also welcome. Suitable topics include, but are not limited to:

Application Areas
  • Designing software platforms, operating systems, software as a service (SAAS), or programming environments
  • Social networks
  • Search engines, SEO, discovery and navigation best practices
  • Websites and web services
  • Libraries and education
  • Mobile applications including iPhone, Android, iPad, Kindle, Windows Mobile, tablets, etc.
  • Publishing and broadcasting for a global audience
  • Internationalized Domain Names and other identifiers
  • Security concerns and practices
  • Voice to text, text to voice
  • Machine translation
  • Unicode, encodings, scripts, character properties, and algorithms

General Techniques
  • Advances in technologies, algorithms or methodologies
  • Using internationalization libraries and programming environments
  • Handling bidirectional or other complex scripts
  • HTML5 and HTML5-based applications
  • Dealing with data formats: XML, JSON, HTML5, DITA, and upcoming standards
  • Project management and methodologies for global development teams e.g. Agile
  • Best practices in localization process and technology
  • Best practices in world-ready development, test, and deployment
  • Improving globalization capabilities within organizations
  • Approaches for migrating legacy applications to global markets
  • Font development and Typography
Culture and Technology
  • Endangered Languages
  • Unencoded Languages
  • Case studies and research on cross-culture communication
  • Digital Divide
  • ISO language tag issues
Regional Considerations
  • Languages of Africa, Asia, and the Middle East
  • Locales and the Unicode Common Locale Data Repository (CLDR)
  • Emoji support

Tutorial presenters receive complimentary conference registration and two nights' lodging. Session presenters receive a fifty percent conference discount and two nights' lodging.

To be considered as a presenter for the conference, please submit a brief abstract by the deadline of Friday, May 18th.

The Program Committee will notify authors by Friday, June 1st. Final presentation materials will be required from selected presenters by Friday, August 3.

Wednesday, March 21, 2012

Unicode Releases Common Locale Data Repository, Version 21.0.1

The Unicode CLDR 21.0.1 maintenance release is now available. See http://cldr.unicode.org/index/downloads/cldr-21-0-1 for details.
The next major release is CLDR 22, scheduled for late August. The CLDR 22 release does involve general data submission, which will begin soon. For the latest schedule, see http://cldr.unicode.org .

Unicode CLDR Survey Tool Beta

March 21, 2012 — The Unicode CLDR Survey Tool is open for beta testing today. CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. The survey tool is an online tool used by organizations and individuals to contribute data to this repository, and to vote on alternative contributions.



The survey tool has undergone substantial revision, with dramatic improvements in performance and usability. We would appreciate people trying out the tool so that we can identify any remaining problems before we start data submission (currently scheduled for April 4). For more information, see http://goo.gl/7M1IG.


Access

  1. Production Survey Tool. If you have an existing survey tool account, you can go to the production tool at Production Survey Tool.
  2. Smoke-Test Survey Tool. If you don’t have an account, you can still try out the survey tool using the Smoke Test version. It will create a test account automatically.


So that you can try out the tool as you wish, none of the data you enter during beta is saved.



The Smoke Test tool may be restarted at any time, because it is used for development. If you get disconnected when this happens, then refresh your browser: all of your changes should be saved.

Guide

If you haven’t used the survey tool before, you may want to take a quick look at:

  1. http://cldr.unicode.org/index/survey-tool/guide
  2. http://cldr.unicode.org/index/survey-tool/walkthrough

This documentation should be updated in the next few days for some of the changes in UI.

How you can help

Visit and randomly vote and enter changes.
  1. Pick your favorite locale
  2. Visit different Sections of the locale (Code Lists, etc), and different pages in the Section.
    1. Vote for different choices (including Proposed, Others, Abstain)
    2. Change a value (Change column)
    3. Try zooming on different columns (clicking on the following cells)
      1. St (status, eg error/alert)
      2. Draft (the approval status)
      3. Voted (the voting status)
      4. Proposed / Others (particular values)
      5. (Clicking on Code shows some internals; it is not really a user-focused item.)
    4. Reset your Coverage Level (at the top). (This changes how many items show).
  3. Verify that data was accepted, or is rejected (appropriately) because of an error.
  4. Periodically, refresh the entire page you are on and verify that items previously added remain visible.
  5. Report any new issues at http://unicode.org/cldr/trac/newticket. (Skip those below). Please include the URL to the page where you found the error.

Known issues

Please read these over so you know what to skip:

  1. Use Firefox/Safari/Chrome, not IE 8 or other browsers.
  2. Some locales, such as English (en), are read-only.
  3. Do not post comments in the ‘forums’ or try “Show Coverage”.
  4. Some generated examples use English instead of the local language, or the wrong currency.
  5. The information in the “Code” column will be simplified during the beta process; some rows will move to different sections or pages.
  6. There will be a bookmark on each row, for reference.
  7. Other items may be added to this list during the beta period.

What changed?

  1. Page access is 10-30 times faster, depending on the operation.
  2. Items are submitted individually (with Return/Enter), instead of having to submit a whole page.
  3. Pages are not broken up into multiple subpages, simplifying navigation.
  4. Errors and Warnings appear when you submit an item.
  5. There is no “zoom” window; instead, zooming is in-place, with separate versions depending on what part of a row you click on.


Wednesday, March 7, 2012

Proposed Draft UTR #50, "Unicode Properties for Vertical Text Layout" has been updated to revision three. For instructions and information about commenting on this UTR, please see: http://www.unicode.org/review/pri207/

Updates include:
  • Mongolian and Egyptian Hieroglyphs changed to U.
  • Implementation of recent UTC decisions
    • Removal of the East Asian Class property
    • East Asian Orientation renamed East Asian Vertical Orientation
    • New property, Default Vertical Orientation

Tuesday, March 6, 2012

PRI #182: Unicode Regular Expressions: new proposed update

UTS #18, Unicode Regular Expressions provides the foundation for handling Unicode characters in regular expression engines, a key component of many programs and programming languages.

There are significant additions and changes in the new proposed update of this specification, with the addition of Name_Alias matching, matching rules from UAX #44, use of the new Script_Extensions property, new recommended properties, a compact form of \u{...}, alignment of rule RL1.4 with Appendix C, and the incorporation of text for PRI #179.

There are several review notes requesting feedback on particular issues. Please submit feedback on those and the rest of this document by May 1 for consideration at the UTC meeting starting on May 7. For details, see:

http://www.unicode.org/review/pri182/

PRI #208, #209: Unicode Security: new proposed updates

UTR #36, Unicode Security Considerations and UTS #39, Unicode Security Mechanisms provide guidance and mechanisms to help deal with Unicode Security issues.

There are significant additions and changes in the new proposed updates of these specifications. The definition of Restriction Levels has moved from UTR #36 to UTS #39, which also adds two new conformance clauses and specifications for Restriction Levels and mixed number detection, an amended specification for mixed script detection, and updates for Unicode 6.1.

There are several review notes requesting feedback on particular issues. Please submit feedback on those and the rest of these documents by May 1 for consideration at the UTC meeting on May 7. For details, see:

http://www.unicode.org/review/pri208/
http://www.unicode.org/review/pri209/

Saturday, March 3, 2012

New version of Unicode Ideographic Variation Database released

The Unicode Consortium is pleased to announce the release of version 2012-03-02 of the Unicode Ideographic Variation Database (IVD). This release adds 32 new sequences to the registered Adobe-Japan1 collection, and 8,850 new sequences to the registered Hanyo-Denshi collection. It also introduces a new datafile called IVD_Stats.txt that details Ideographic Variation Sequence (IVS) and Variation Selector (VS) usage for the entire IVD and on a per-collection basis. Details can be found at http://www.unicode.org/ivd/.

Thursday, March 1, 2012

IUC 36: October 22-24, 2012, Santa Clara, CA, USA

The Internationalization and Unicode Conference (IUC) is the premier event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps. This year's conference will also highlight new features in Unicode Version 6.1 and other relevant standards published this year.

Reasons to attend include:

  • tutorials and sessions for beginners, to train you and your staff on basic practices and implementation techniques for creating international software 
  • learn recommended solutions to difficult problems or sophisticated requirements from industry leaders and experts in attendance 
  • find help from tool and product vendors to get you to market quickly and cost-effectively 

Click here for more information.  

Friday, February 17, 2012

Localization World Unicode workshop, June 2012, Paris

We are pleased to announce that Localization World is organizing a one-day workshop on Unicode, including an introduction with Richard Ishida and three additional sessions. This will take place on the preconference day, June 4, 2012, in Paris. Richard is an experienced presenter at Unicode conferences, and is well known for his clear and effective presentations.

The Unicode Consortium’s goal is to enable people around the world to use computers in any language. The Consortium is involved in core internationalization specifications at the heart of all modern software, such as the Unicode Standard for character encoding. The Consortium’s involvement in localization is a key extension of this work. The Unicode Consortium maintains and extends the Common Locale Data Repository (CLDR), and in 2011 established the Unicode Localization Interoperability Technical Committee to improve the interoperability of localization data interchange.


For more information, including the program of the June LocalizationWorld Conference, please see http://www.localizationworld.com/lwparis2012/program.php .

Helena Chapman, chair, Unicode Localization Interoperability Technical Committee
Ulrich Henes, Donna Parrish and Daniel Goldschmidt, chair, vice-chairs, Localization World Conference Program Committee

Friday, February 10, 2012

Unicode Releases Common Locale Data Repository, Version 21.0


Mountain View, CA, February 10, 2012 - The Unicode® Consortium announced today the release of a new version of the Unicode Common Locale Data Repository (Unicode CLDR 21.0), providing key building blocks for software to support the world's languages.

Unicode CLDR 21.0 contains data for 193 languages and 170 territories: 528 locales in all. This release did not include a public data submission phase, and focused on improvements to the LDML structure and tools, and consistency of data.

Main features included the updates for Unicode 6.1, a major cleanup of timezone names, date format data, and delimiters (“…” vs „…“ vs „…” vs …); the new BCP47 -t- extension; addition of ordinal categories (1st, 2nd,…), collation reordering (eg, Cyrillic before Latin), multiple numbering systems for a locale, abbreviated numbers (eg, “1.2 B”); and restructuring of Chinese calendar data. For more information on other changes since the 2.0.1 release, see the CLDR 21 Release Note.
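
As an illustration of the ordinal categories mentioned above, here is a minimal Python sketch for English. The category names (one, two, few, other) follow the plural-category vocabulary used by CLDR; the function names and the suffix table are assumptions made for this example, and the rules are hard-coded rather than read from CLDR data files.

```python
# Hypothetical helper: the English ordinal rules are hard-coded here,
# not loaded from CLDR data.
ENGLISH_ORDINAL_SUFFIX = {"one": "st", "two": "nd", "few": "rd", "other": "th"}


def english_ordinal_category(n: int) -> str:
    """Return the ordinal category of a non-negative integer for English."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"    # 1st, 21st, 101st
    if n % 10 == 2 and n % 100 != 12:
        return "two"    # 2nd, 22nd
    if n % 10 == 3 and n % 100 != 13:
        return "few"    # 3rd, 23rd
    return "other"      # 4th, 11th, 12th, 13th


def format_english_ordinal(n: int) -> str:
    """Append the English suffix that belongs to the number's category."""
    return f"{n}{ENGLISH_ORDINAL_SUFFIX[english_ordinal_category(n)]}"
```

Other languages need different categories and different rules, which is why this information is kept as locale data rather than in code.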

Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; transliterating different alphabets; and many others. Unicode CLDR 21 is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML: http://unicode.org/reports/tr35/). LDML is an XML format used for general interchange of locale data, such as in Microsoft's .NET.

For web pages with different views of CLDR data, see http://cldr.unicode.org/index/charts. For more information about the Unicode CLDR project (including charts) see http://cldr.unicode.org/.

Thursday, February 2, 2012

UTS #10, Unicode Collation Algorithm, Version 6.1 Released

Mountain View, CA, USA – February 2, 2012 – The new version of Unicode Technical Standard #10, Unicode Collation Algorithm has been released, updating to Unicode Version 6.1.
This new version adds a number of features:
  • The collation ordering for the 732 new Unicode characters.
  • A major revision to the ordering of "variable" characters into groups, separating punctuation and symbols. This change may present migration issues for some implementations.
  • Options added for ignoring spaces and punctuation (but not symbols), and for reordering groupings of characters, such as putting Latin characters before Greek (for Greek users), or digits after letters.
There are also important improvements in documentation:
  • A new section on asymmetric search (where a query of the base character 'e' matches é, è,…, but a query of the more specific é doesn't match other accented versions or the base character).
  • Important restructuring and clarifications of other sections.
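
The idea of asymmetric search described above can be sketched in a few lines of Python. This is a simplified illustration only: it compares characters one by one using canonical decomposition from the standard `unicodedata` module, whereas UTS #10 defines the behavior in terms of collation elements. The function names are assumptions made for this example.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text (NFD) and drop all combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def asymmetric_match(query: str, candidate: str) -> bool:
    """Unaccented query characters match accented variants, but not vice versa.

    Simplification: both strings are compared character by character in NFC,
    so accented letters without a precomposed form are not handled.
    """
    q = unicodedata.normalize("NFC", query)
    c = unicodedata.normalize("NFC", candidate)
    if len(q) != len(c):
        return False
    for qc, cc in zip(q, c):
        if qc == cc:
            continue
        # A bare base character in the query may match an accented candidate.
        if strip_marks(qc) == qc and strip_marks(cc) == qc:
            continue
        return False
    return True
```

With this sketch, `asymmetric_match("e", "é")` is true, while `asymmetric_match("é", "e")` and `asymmetric_match("é", "è")` are false.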

Wednesday, February 1, 2012

UTS #46, Unicode IDNA Compatibility Processing, Version 6.1 Released

Mountain View, CA, USA – February 1, 2012 – The new version of Unicode Technical Standard #46, Unicode IDNA Compatibility Processing has been released, updating to Unicode Version 6.1. It adds support for 528 additional characters in internationalized domain names (IDN).
The specification provides two main features for use with the internationalized domain names specification released in August 2010 (IDNA2008):
  1. A comprehensive mapping to reflect user expectations for casing and other variants of domain names. This mapping is allowed by IDNA2008, and follows the same principles as in the previous version of that specification (IDNA2003). It thus provides users consistency between old and new versions.
  2. A compatibility mechanism that supports internationalized domain names valid under the IDNA2003 specification and the IDNA2008 specification. This second feature allows browsers, search engines, and other clients to handle both old and new domain names during the transitional period until registries update their rules to follow IDNA2008.
UTS #46 supplies normative data tables that are synchronized with the latest version of the Unicode Standard, allowing implementations to update without recalculation. This new version also provides an "NV8" flag in the data files, making it easier for implementations to disable the compatibility mechanism.
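
To illustrate the kind of case mapping that users expect from domain names, here is a small Python sketch. Note the assumption: it uses the `idna` codec from the Python standard library, which follows IDNA2003 and does not implement UTS #46. It only shows the mapping principle that UTS #46 carries forward; the function name is hypothetical.

```python
def to_ascii_idna2003(domain: str) -> str:
    """Convert a domain name to its ASCII (Punycode) form.

    Assumption: relies on the standard library "idna" codec, which
    implements IDNA2003 rather than UTS #46. Labels containing non-ASCII
    characters are case-folded before they are encoded.
    """
    return domain.encode("idna").decode("ascii")


# Upper- and lowercase spellings of a non-ASCII label map to the same name.
upper_form = to_ascii_idna2003("BÜCHER.de")
lower_form = to_ascii_idna2003("bücher.de")
```

Both calls produce the same ASCII form, so a user typing either spelling reaches the same domain.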

Tuesday, January 31, 2012

Announcing the Unicode Standard, Version 6.1

Mountain View, January 31, 2012. The Unicode Consortium announces the release of Version 6.1 of the Unicode Standard, continuing Unicode's long-term commitment to support the full diversity of languages around the world. This latest version adds characters to support additional languages of China, other Asian countries, and Africa. It also addresses educational needs in the Arabic-speaking world. A total of 732 new characters have been added. For full details, see http://www.unicode.org/versions/Unicode6.1.0/.

This version of the Standard also brings technical improvements to support implementers. Changes to property values and their aliases mean that properties now have easy-to-specify labels. The new labels, combined with a new script extensions property, mean that regular expressions can be more straightforward and easier to validate.

Over 200 new Standardized Variants have been added for emoji characters, allowing implementations to distinguish preferred display styles between text and emoji styles. For example:

U+26FA U+FE0E: TENT, text style
U+26FA U+FE0F: TENT, emoji style
U+26FD U+FE0E: FUEL PUMP, text style
U+26FD U+FE0F: FUEL PUMP, emoji style
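
The sequences listed above can be built by appending a variation selector to the base character. The following Python sketch shows this; the constant and function names are assumptions made for this example, and whether the two styles actually look different depends on the fonts and rendering system in use.

```python
import unicodedata

TEXT_STYLE = "\uFE0E"   # VARIATION SELECTOR-15: request text presentation
EMOJI_STYLE = "\uFE0F"  # VARIATION SELECTOR-16: request emoji presentation


def with_style(base: str, selector: str) -> str:
    """Return the variation sequence: base character followed by selector."""
    return base + selector


def describe(sequence: str) -> str:
    """List the code points of a sequence in U+XXXX notation."""
    return " ".join(f"U+{ord(ch):04X}" for ch in sequence)


tent_text = with_style("\u26FA", TEXT_STYLE)
tent_emoji = with_style("\u26FA", EMOJI_STYLE)
print(unicodedata.name("\u26FA"), "|", describe(tent_text), "|", describe(tent_emoji))
```

Each sequence is two code points long, so code that counts or truncates characters should treat the base character and its selector as a unit.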

Among the notable property changes and additions in Unicode 6.1 are two new line break property values, which improve the line-breaking behavior of Hebrew and Japanese text. Segmentation behavior was also improved for Thai, Lao, and similar languages.

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and have updates for Version 6.1. These will be finalized in February:
  • UTS #10, Unicode Collation Algorithm
  • UTS #46, Unicode IDNA Compatibility Processing

Friday, January 6, 2012

Release candidate for Unicode 6.1 character data

Because Unicode is at the foundation of all modern software using text, it is important to verify that problems are not introduced with new versions. If your implementation uses Unicode data, please download and test the final release candidate of the Unicode 6.1 data (UCD) with your implementation now. Please note that the Unicode Collation Algorithm (UCA) and the Unicode IDNA Compatibility Processing are correlated with version 6.1; if you have an implementation of them, please check the data below as well.

That data can be found in:
  1. Unicode
    1. http://unicode.org/Public/6.1.0/ucd/ (data, semicolon-delimited)
    2. http://unicode.org/Public/6.1.0/ucdxml/ (data, xml)
    3. http://www.unicode.org/reports/tr44/proposed.html (documentation)
  2. UCA
    1. http://unicode.org/Public/UCA/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr10/proposed.html (documentation)
  3. IDNA compatibility
    1. http://unicode.org/Public/idna/6.1.0/ (data)
    2. http://www.unicode.org/reports/tr46/proposed.html (documentation)
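
For implementers who want to start testing, the following Python sketch shows one way to read a line of the semicolon-delimited data. It assumes the field layout of UnicodeData.txt as documented in UAX #44 (code point, name, General_Category, combining class, Bidi_Class, and so on); the record type and function name are hypothetical, and only a few of the fields are extracted.

```python
from typing import NamedTuple


class UnicodeDataRecord(NamedTuple):
    """A few selected fields from one line of UnicodeData.txt."""
    code_point: int
    name: str
    general_category: str
    bidi_class: str


def parse_unicode_data_line(line: str) -> UnicodeDataRecord:
    """Split one semicolon-delimited line and pick out selected fields."""
    fields = line.rstrip("\n").split(";")
    return UnicodeDataRecord(
        code_point=int(fields[0], 16),  # field 0: code point in hexadecimal
        name=fields[1],                 # field 1: character name
        general_category=fields[2],     # field 2: General_Category
        bidi_class=fields[4],           # field 4: Bidi_Class
    )


# Sample line for U+0041 in the UnicodeData.txt format.
SAMPLE = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
record = parse_unicode_data_line(SAMPLE)
```

A regression test can parse the old and new versions of the file this way and compare the records for code points that existed in both.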
For more information, see http://unicode.org/versions/beta.html.
Note that at this point in the process, no substantive changes can be made unless:
  1. a problem is found in carrying out the actions directed by the Unicode Technical Committee for the release, or
  2. an editorial problem is found in the data comments or documentation.
The Unicode Consortium is planning to move up the release date of Unicode 6.1 (UCD and UAXes) to January instead of February, so any final comments should be made by January 6th. You can send your comments using the Contact Form (http://www.unicode.org/reporting.html).

The draft code charts for Unicode 6.1 have also been updated. We encourage users to check the code charts carefully to verify correctness of the new characters added to Unicode 6.1 and to ensure that there are no regressions in glyph shapes for previously encoded characters. For links to the charts, see http://unicode.org/versions/beta.html.