The proposed update of UTS#10 Unicode
Collation Algorithm (UCA) modifies the specification for certain edge cases
(overlapping contractions), and tightens the requirements for well-formed
collation element tables. The detailed descriptions of parametric tailoring
options have been removed; the text now refers to the corresponding section in
LDML, which adds new explanations and definitions. There are also a number of
improvements, including additional examples and some rearrangement of text. See
PRI #223.
The data has been updated for the
Unicode 6.2 beta review, and the associated CollationAuxiliary.txt file in
CollationAuxiliary.zip now
includes a description of the implicit fractional weight generation and the
context syntax. For more details, see
Modifications.
There is also a proposed update of UTS
#46 Unicode IDNA Compatibility Processing. The data has been updated for the
Unicode 6.2 beta review, with minor changes to the text. See
PRI #224.
Tuesday, June 26, 2012
Monday, June 25, 2012
Using the Unicode Glossary
The Unicode glossary is useful for people writing documents,
specifications, and general-purpose articles. Each of the glossary
entries now has a link on it, and clicking on that link exposes it in
the address bar of your browser. This makes it easy to add links directly to the Unicode
glossary for terms that may be unfamiliar to readers, such as
http://unicode.org/glossary/#grapheme_cluster or
http://unicode.org/glossary/#code_point.
Wednesday, June 13, 2012
Tutorials Announced for IUC 36
Tutorials Announced for 36th Internationalization and
Unicode Conference
Santa Clara, Calif., USA; October 22-24, 2012
Mountain View, CA, USA – June 13, 2012 – The Unicode® Consortium today
announced the tutorial sessions for the Thirty-sixth Internationalization and
Unicode Conference (IUC). IUC 36 will take place in Santa Clara, Calif., USA at
the Hyatt Regency Hotel on October 22-24, 2012, sponsored by
Adobe. This is the premier
conference on technologies and practices for the creation and management of
global and multilingual software applications. For more information about the
program, please visit
http://www.unicodeconference.org/iuc36-tutorials.
The Internationalization and Unicode Conference (IUC) covers the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.
Tutorial Sessions Include:
- “An Introduction to Writing Systems & Unicode,” by Richard Ishida, Internationalization Activity Lead, W3C
- “Unicode – A Grand Tour,” by Michael McKenna, International Product Engineer, Zynga, Inc., and Craig Cummings, Globalization Center of Excellence, Rearden Commerce and UTC Vice Chair, Unicode Consortium
- “Internationalizing Domain Names in Applications (IDNA),” by Amit Gupta, Member Technical Staff, Adobe Systems
- “Internationalization, An Introduction (Part I: Character Encoding) (Part II: Enabling),” by Addison Phillips, Globalization Architect, Lab 126
- “Developing an OpenType Font for Complex Scripts Using Fontforge,” by Pravin Dinkar Satpute, Senior Software Engineer, Red Hat
- “I18N in Javascript with iLib,” by Edwin Hoogerbeets, Independent Globalization Consultant
- “Keyboard Design for Tavultesoft Keyman and Unicode,” by Marc Durdin, CEO, Tavultesoft Pty Ltd
- “Web Internationalization – Standards and Best Practices,” by Tex Texin, Chief Globalization Architect, Rearden Commerce, Inc.
- “Using ICU Workshop,” by Steven R. Loomis, Software Engineer, IBM
- “Internationalization and Localization in Ruby and Ruby on Rails,” by Martin J. Dürst, Professor, Aoyama Gakuin University
- “The Road to World-Class Starts with World-Ready,” by Michael Kuperstein, Localization Engineer and Loïc Dufresne de Virel, Localization Strategist, Intel Corporation
- “Building Multilingual Websites in Drupal 7 and Joomla 2.5,” by Jim DeLaHunt, Principal, Jim DeLaHunt & Associates
###
About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards.
The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, Rearden Commerce, SAP, The Society for Natural Language Technology Research, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.
For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.
About the Event Producer
OMG® is the Event Producer for the Internationalization & Unicode Conferences. OMG is an open membership, not-for-profit consortium that produces and maintains computer industry specifications for interoperable enterprise applications. Our specifications include MDA®, UML®, CORBA®, MOF™, XMI® and CWM™. OMG’s specifications are all available for download by everyone without charge.
For more information about OMG, visit us online at http://www.omg.org.
Thursday, June 7, 2012
CLDR 21.0.2: New T Extensions for language/locale identifiers
New T Extension fields and subfields [RFC 6497] are now available for use in BCP47 and Unicode Locale/Language Identifiers.
These T extensions provide for the identification of transforms that
can be used for tagging content or requesting resources. The new T
extension fields and subfields are defined in the following files, as
part of the CLDR 21.0.2 release:
- i0 - Input Method Transformations: bcp47/transform_ime.xml
- k0 - Keyboard Transformations: bcp47/transform_keyboard.xml
- t0 - Machine Translations: bcp47/transform_mt.xml
- x0 - Private Use fields: bcp47/transform_private_use.xml
Examples of their use:
- "zh-t-i0-pinyin", to indicate Chinese text generated with a pinyin input method
- "en-t-k0-dvorak", to identify a Dvorak keyboard for English
- "it-t-k0-osx-extended", to request an extended Mac keyboard for Italian
- "ru-t-en-x0-mobile", to indicate a translation from English to Russian for use on a mobile device, or
- "ja-t-de-t0-und-x0-medical", to identify a machine translation from German to Japanese with a specialized dictionary for medical terms.
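The structure of the example tags above can be illustrated with a small sketch that splits a tag into its base language, the optional source language after the "t" singleton, and the two-character fields that follow. The function name parse_t_extension and the shape of the returned dictionary are assumptions made for illustration; this is not a validating BCP47 parser, and it assumes the first "t" subtag in the tag is the T extension singleton.

```python
import re

# T extension field keys are two characters: a letter followed by a digit,
# such as "i0", "k0", "t0", or "x0".
_FIELD_KEY = re.compile(r"[a-z][0-9]")


def parse_t_extension(tag):
    """Split a language tag into base language, T-extension source, and fields.

    Hypothetical helper for illustration only; it does not validate subtags.
    """
    subtags = tag.lower().split("-")
    if "t" not in subtags:
        return {"language": "-".join(subtags), "source": None, "fields": {}}

    t_index = subtags.index("t")
    source_parts = []
    fields = {}
    current_key = None
    for subtag in subtags[t_index + 1:]:
        if len(subtag) == 1:
            # Another singleton starts a different extension; stop here.
            break
        if _FIELD_KEY.fullmatch(subtag):
            current_key = subtag
            fields[current_key] = []
        elif current_key is None:
            # Subtags before the first field key describe the source language.
            source_parts.append(subtag)
        else:
            fields[current_key].append(subtag)

    return {
        "language": "-".join(subtags[:t_index]),
        "source": "-".join(source_parts) or None,
        "fields": {key: "-".join(values) for key, values in fields.items()},
    }


if __name__ == "__main__":
    for example in ("zh-t-i0-pinyin", "it-t-k0-osx-extended",
                    "ja-t-de-t0-und-x0-medical"):
        print(example, parse_t_extension(example))
```

Applied to "ja-t-de-t0-und-x0-medical", the sketch reports "ja" as the language, "de" as the source, and the fields t0 and x0 with the values "und" and "medical".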
Related to this, there is draft keyboard layout data currently slated for CLDR 22.0: see Draft Keyboard Charts.
Wednesday, June 6, 2012
PRI #231: Bidi Parenthesis Algorithm
The Unicode Technical Committee is seeking feedback on a proposal to
enhance the Unicode Bidirectional Algorithm (UAX #9) with additional
logic--a bidirectional parenthesis algorithm (BPA)--for processing
paired punctuation marks such as parentheses. This proposal is intended
to produce better bidi-layout results in common text sequences that
involve paired punctuation marks. Details of the proposal, with
questions for reviewers and a detailed background document are available
through the PRI #231 page:
http://www.unicode.org/review/pri231/
PRI #229: Linebreaking Changes for Pictographic Symbols
The UTC is proposing changes to the line break property of many
pictographic characters. Details of the proposed changes are on the PRI
#229 page and its associated background document.
Please see: http://www.unicode.org/review/pri229/
Tuesday, June 5, 2012
New Public Review Issues: Changes to Character Properties
The UTC is considering some property changes for Unicode 6.2. PRI #227
proposes changes to the Script Extensions property values for certain
combining marks. PRI #228 proposes changes of General Category for some
common characters from Punctuation to Symbol, to better align with
expectations about how those characters behave. Because these characters
are quite common, the proposed change may impact a large number of
implementations.
For details of the proposals, see the PRI pages:
http://www.unicode.org/review/pri227/
http://www.unicode.org/review/pri228/
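Implementers who want to gauge how much their code depends on the Punctuation versus Symbol distinction could start by listing the General Category their runtime reports for the characters they handle. The following is a minimal sketch using Python's standard unicodedata module; the function name punctuation_and_symbols is an assumption for illustration, and the values it reports come from whatever Unicode version the interpreter ships with, not from the 6.2 beta data files.

```python
import unicodedata


def punctuation_and_symbols(text):
    """Map each punctuation or symbol character in text to its General Category.

    Hypothetical helper: categories beginning with "P" are Punctuation and
    those beginning with "S" are Symbol.
    """
    found = {}
    for ch in text:
        category = unicodedata.category(ch)
        if category[0] in ("P", "S"):
            found[ch] = category
    return found


if __name__ == "__main__":
    # The Unicode version of the character database bundled with this Python.
    print("Unicode data version:", unicodedata.unidata_version)
    print(punctuation_and_symbols("a + (b) = c"))
```

Running this over representative input before and after a data update gives a rough list of characters whose category an application should re-test.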
Monday, June 4, 2012
New Public Review Issues: Changes to the Unihan database
The UTC is considering two changes relating to properties for the Unihan
database. PRI #225 proposes transforming the data for the kHanyuPinlu
field so that it also uses accented pinyin. PRI #226 proposes
deprecation of the kCompatibilityVariant field.
For details of these proposals, see the PRI pages:
http://www.unicode.org/review/pri225/
http://www.unicode.org/review/pri226/
Friday, June 1, 2012
Unicode 6.2 Beta Review
The Unicode® Consortium today announced the start of beta review for the forthcoming Unicode 6.2.0. All beta feedback must be submitted by July 23, 2012.
Unicode 6.2 is a minor release of the Unicode Standard. The main feature of this release is the inclusion of the newly encoded Turkish lira symbol. However, there are other important changes to Unicode properties and annexes, affecting segmentation and more.
Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML,...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.2.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.2.0 in September 2012.
• See http://www.unicode.org/versions/beta-6.2.0.html for information about testing the 6.2.0 beta.
• See http://www.unicode.org/versions/Unicode6.2.0/ for the current draft summary of Unicode 6.2.0.