The Unicode Blog: June 2012

Tuesday, June 26, 2012

Proposed updates for Unicode Collation and IDNA

The proposed update of UTS #10 Unicode Collation Algorithm (UCA) modifies the specification for certain edge cases (overlapping contractions) and tightens the requirements for well-formed collation element tables. The detailed descriptions of parametric tailoring options have been removed; the text now refers to the corresponding section in LDML, which adds new explanations and definitions. There are also a number of improvements, including additional examples and some rearrangement of text. See PRI #223.

The data has been updated for the Unicode 6.2 beta review, and the associated CollationAuxiliary.txt file in CollationAuxiliary.zip now includes a description of the implicit fractional weight generation and the context syntax. For more details, see Modifications.

There is also a proposed update of UTS #46 Unicode IDNA Compatibility Processing. The data has been updated for the Unicode 6.2 beta review, with minor changes to the text. See PRI #224.

Monday, June 25, 2012

Using the Unicode Glossary

The Unicode glossary is useful for people writing documents, specifications, and general-purpose articles. Each glossary entry now has a link on it, and clicking that link exposes the entry's address in the address bar of your browser. This makes it easy to add links directly to the Unicode glossary for terms that may be unfamiliar to readers, such as
http://unicode.org/glossary/#grapheme_cluster or
http://unicode.org/glossary/#code_point.
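
As a rough illustration, links of this shape can be generated from a term. This is only a sketch: the anchor convention (lowercase, with spaces replaced by underscores) is inferred from the two example links above and is not a documented rule, and `glossary_link` is a hypothetical helper name.

```python
# Base address taken from the example links above.
GLOSSARY_BASE = "http://unicode.org/glossary/"


def glossary_link(term: str) -> str:
    """Build a glossary link for a term.

    Assumption: anchors are the lowercased term with runs of whitespace
    replaced by a single underscore, as in the two examples above.
    """
    anchor = "_".join(term.strip().lower().split())
    return f"{GLOSSARY_BASE}#{anchor}"


if __name__ == "__main__":
    for term in ("Grapheme Cluster", "code point"):
        print(glossary_link(term))
```

A generated link should still be checked in a browser, since other glossary entries may not follow the same anchor pattern.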

Wednesday, June 13, 2012

Tutorials Announced for IUC 36

Tutorials Announced for 36th Internationalization and Unicode Conference
Santa Clara, Calif., USA; October 22-24, 2012

Mountain View, CA, USA – June 13, 2012 – The Unicode® Consortium today announced the tutorial sessions for the Thirty-sixth Internationalization and Unicode Conference (IUC). IUC 36 will take place in Santa Clara, Calif., USA at the Hyatt Regency Hotel on October 22-24, 2012, sponsored by Adobe. This is the premier conference on technologies and practices for the creation and management of global and multilingual software applications. For more information about the program, please visit http://www.unicodeconference.org/iuc36-tutorials.
The Internationalization and Unicode Conference (IUC) covers the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.
Tutorial Sessions Include:

“An Introduction to Writing Systems & Unicode,” by Richard Ishida, Internationalization Activity Lead, W3C
“Unicode – A Grand Tour,” by Michael McKenna, International Product Engineer, Zynga, Inc., and Craig Cummings, Globalization Center of Excellence, Rearden Commerce and UTC Vice Chair, Unicode Consortium
“Internationalizing Domain Names in Applications (IDNA),” by Amit Gupta, Member Technical Staff, Adobe Systems
“Internationalization, An Introduction (Part I: Character Encoding) (Part II: Enabling),” by Addison Phillips, Globalization Architect, Lab 126
“Developing an OpenType Font for Complex Scripts Using Fontforge,” by Pravin Dinkar Satpute, Senior Software Engineer, Red Hat
“I18N in Javascript with iLib,” by Edwin Hoogerbeets, Independent Globalization Consultant
“Keyboard Design for Tavultesoft Keyman and Unicode,” by Marc Durdin, CEO, Tavultesoft Pty Ltd
“Web Internationalization – Standards and Best Practices,” by Tex Texin, Chief Globalization Architect, Rearden Commerce, Inc.
“Using ICU Workshop,” by Steven R. Loomis, Software Engineer, IBM
“Internationalization and Localization in Ruby and Ruby on Rails,” by Martin J. Dürst, Professor, Aoyama Gakuin University
“The Road to World-Class Starts with World-Ready,” by Michael Kuperstein, Localization Engineer and Loïc Dufresne de Virel, Localization Strategist, Intel Corporation
“Building Multilingual Websites in Drupal 7 and Joomla 2.5,” by Jim DeLaHunt, Principal, Jim DeLaHunt & Associates

MultiLingual Magazine is the media sponsor. The early-bird registration deadline is September 7, 2012. Sponsorships and exhibit space are available; for more information on sponsoring contact Ken Berk at ken.berk@omg.org, +1-781-444 0404. For exhibiting questions email event_marketing@omg.org. For all other questions email info@unicodeconference.org.

###

About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.
The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, Rearden Commerce, SAP, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium at http://www.unicode.org/contacts.html.
About the Event Producer
OMG® is the Event Producer for the Internationalization & Unicode Conferences. OMG is an open membership, not-for-profit consortium that produces and maintains computer industry specifications for interoperable enterprise applications. Our specifications include MDA®, UML®, CORBA®, MOF™, XMI® and CWM™. OMG’s specifications are all available for download by everyone without charge.
For more information about OMG, visit us online at http://www.omg.org.

Thursday, June 7, 2012

CLDR 21.0.2: New T Extensions for language/locale identifiers

New T Extension fields and subfields [RFC 6497] are now available for use in BCP47 and Unicode Locale/Language Identifiers. These T extensions provide for the identification of transforms that can be used for tagging content or requesting resources. The new T extension fields and subfields are defined in the following files, as part of the CLDR 21.0.2 release:

i0 - Input Method Transformations: bcp47/transform_ime.xml
k0 - Keyboard Transformations: bcp47/transform_keyboard.xml
t0 - Machine Translations: bcp47/transform_mt.xml
x0 - Private Use fields: bcp47/transform_private_use.xml

For example:

"zh-t-i0-pinyin", to indicate Chinese text generated with a pinyin input method
"en-t-k0-dvorak", to identify a Dvorak keyboard for English
"it-t-k0-osx-extended", to request an extended Mac keyboard for Italian

The private use subfields can be used for private agreements, such as:

"ru-t-en-x0-mobile", to indicate a translation from English to Russian for use on a mobile device, or
"ja-t-de-t0-und-x0-medical", to identify a machine translation from German to Japanese with a specialized dictionary for medical terms.

Related to this, there is draft keyboard layout data currently slated for CLDR 22.0: see Draft Keyboard Charts.
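
To make the structure of the example tags above more concrete, here is a minimal sketch of how such a tag might be split into its parts. It is a simplification, not a conforming BCP 47 parser: it assumes a well-formed tag, treats everything before the first "-t-" as the language part, and treats any two-character subtag made of a letter followed by a digit (such as i0, k0, t0, or x0) as a field key. The function name `parse_t_extension` and the returned dictionary layout are hypothetical.

```python
import re

# A field key is a letter followed by a digit, e.g. i0, k0, t0, x0.
FIELD_KEY = re.compile(r"^[a-z][0-9]$")


def parse_t_extension(tag: str) -> dict:
    """Split a tag with a T extension into language, source, and fields."""
    subtags = tag.lower().split("-")
    if "t" not in subtags[1:]:
        # No T extension present.
        return {"language": "-".join(subtags), "source": None, "fields": {}}

    t_index = subtags.index("t", 1)
    language = "-".join(subtags[:t_index])

    source = []        # subtags of the source language, if any
    fields = {}        # field key -> list of value subtags
    current_key = None
    for subtag in subtags[t_index + 1:]:
        if len(subtag) == 1:
            # Another singleton starts a different extension; stop here.
            break
        if FIELD_KEY.match(subtag):
            current_key = subtag
            fields[current_key] = []
        elif current_key is None:
            source.append(subtag)
        else:
            fields[current_key].append(subtag)

    return {
        "language": language,
        "source": "-".join(source) or None,
        "fields": {key: "-".join(value) for key, value in fields.items()},
    }


if __name__ == "__main__":
    for example in ("zh-t-i0-pinyin", "it-t-k0-osx-extended",
                    "ja-t-de-t0-und-x0-medical"):
        print(example, "->", parse_t_extension(example))
```

With this sketch, "ja-t-de-t0-und-x0-medical" yields the language "ja", the source "de", and the fields t0 = "und" and x0 = "medical". Production code should rely on a full locale library instead.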

Wednesday, June 6, 2012

PRI #231: Bidi Parenthesis Algorithm

The Unicode Technical Committee is seeking feedback on a proposal to enhance the Unicode Bidirectional Algorithm (UAX #9) with additional logic, a bidirectional parenthesis algorithm (BPA), for processing paired punctuation marks such as parentheses. This proposal is intended to produce better bidi layout results in common text sequences that involve paired punctuation marks. Details of the proposal, with questions for reviewers and a detailed background document, are available through the PRI #231 page:
http://www.unicode.org/review/pri231/
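
For background, parentheses have a neutral bidirectional class, so their direction is resolved from the surrounding text. The sketch below only inspects the relevant character properties using Python's standard `unicodedata` module; it does not implement the proposed BPA, and `bidi_profile` is a hypothetical helper name.

```python
import unicodedata


def bidi_profile(text: str) -> list:
    """Return (character, bidi class, is mirrored) for each character."""
    return [
        (ch, unicodedata.bidirectional(ch), bool(unicodedata.mirrored(ch)))
        for ch in text
    ]


if __name__ == "__main__":
    # Latin letter, Hebrew letter alef, and a pair of parentheses.
    for ch, bidi_class, mirrored in bidi_profile("a\u05d0()"):
        print(f"U+{ord(ch):04X} class={bidi_class} mirrored={mirrored}")
```

The letters report strong classes (L and R), while both parentheses report ON (Other Neutral) and are flagged as mirrored.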

PRI #229: Linebreaking Changes for Pictographic Symbols

The UTC is proposing changes to the line break property of many pictographic characters. Details of the proposed changes are on the PRI #229 page and its associated background document.

Please see: http://www.unicode.org/review/pri229/

Tuesday, June 5, 2012

New Public Review Issues: Changes to Character Properties

The UTC is considering some property changes for Unicode 6.2. PRI #227 proposes changes to the Script Extensions property values for certain combining marks. PRI #228 proposes changes of General Category for some common characters from Punctuation to Symbol, to better align with expectations about how those characters behave. Because these characters are quite common, the proposed change may impact a large number of implementations.

For details of the proposals, see the PRI pages:
http://www.unicode.org/review/pri227/
http://www.unicode.org/review/pri228/
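
Implementers who want to see where their code depends on the Punctuation versus Symbol distinction can start by inspecting General Category values. The following is a minimal sketch using Python's standard `unicodedata` module. The sample characters are arbitrary illustrations and are not taken from the list in PRI #228, and the values reported reflect the Unicode version bundled with the Python interpreter, not the 6.2 beta data. `describe_category` is a hypothetical helper name.

```python
import unicodedata


def describe_category(ch: str) -> str:
    """Report a character's General Category and its major class."""
    category = unicodedata.category(ch)
    if category.startswith("P"):
        major = "Punctuation"
    elif category.startswith("S"):
        major = "Symbol"
    else:
        major = "Other"
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{ord(ch):04X} {name}: {category} ({major})"


if __name__ == "__main__":
    # Arbitrary sample characters, chosen only for illustration.
    for ch in "!(+$":
        print(describe_category(ch))
    print("Unicode data version:", unicodedata.unidata_version)
```

Code that branches on the major class, as this helper does, is the kind most likely to be affected when a character moves from one class to the other.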

Monday, June 4, 2012

New Public Review Issues: Changes to the Unihan database

The UTC is considering two changes relating to properties for the Unihan database. PRI #225 proposes transforming the data for the kHanyuPinlu field so that it also uses accented pinyin. PRI #226 proposes deprecation of the kCompatibilityVariant field.

For details of the proposals, see the PRI pages:
http://www.unicode.org/review/pri225/
http://www.unicode.org/review/pri226/

Friday, June 1, 2012

Unicode 6.2 Beta Review

The Unicode® Consortium today announced the start of beta review for the forthcoming Unicode 6.2.0. All beta feedback must be submitted by July 23, 2012.

Unicode 6.2 is a minor release of the Unicode Standard. The main feature of this release is the inclusion of the newly encoded Turkish lira symbol. However, there are other important changes to Unicode properties and annexes, affecting segmentation and more.

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML, ...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.2.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.2.0 in September 2012.

• See http://www.unicode.org/versions/beta-6.2.0.html for information about testing the 6.2.0 beta.
• See http://www.unicode.org/versions/Unicode6.2.0/ for the current draft summary of Unicode 6.2.0.