The Unicode Blog: 2013

Monday, December 23, 2013

Save the Date for IUC 38 - Nov 3-5, 2014

Nov 3-5, 2014, Santa Clara, CA, USA

The Internationalization and Unicode Conference (IUC) is the premier event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.
Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.
This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps.

Reasons to Attend Include:

Tutorials and sessions for beginners, to train you and your staff on basic practices and implementation techniques for creating international software
Learn recommended solutions to difficult problems or sophisticated requirements from industry leaders and experts in attendance
Find help from tool and product vendors to get you to market quickly and cost-effectively

http://www.unicodeconference.org/e/IUC38-SaveDate-12-20-13.htm

Friday, December 13, 2013

Unicode 7.0 Annexes Available for Early Review

As technical work gets underway to prepare the publication of Unicode 7.0 (tentatively scheduled for June, 2014), the Unicode Technical Committee has posted proposed updates for several important specifications:

PRI #260, Proposed Update UTS #10, Unicode Collation Algorithm
PRI #261, Proposed Update UAX #15, Unicode Normalization Forms
PRI #262, Proposed Update UAX #44, Unicode Character Database

In UTS #10, collation weights are discussed more generically, with fewer references to the 16-bit weights used in the DUCET. Section 6.3.2, Large Values for Secondary or Tertiary Weights, was merged into Section 6.2, Large Weight Values. In UAX #44, the derivation of the Alphabetic property has been updated, and the discussion of @missing in Section 4.2.10, @missing Conventions, has been simplified to reflect the revised conventions in the UCD data files, which eliminated special edge cases.
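For readers who have not met the @missing convention, here is a minimal sketch of how a tool might read such a comment line from a UCD data file. The two line shapes handled (with and without a property-name field) are given as illustrations; the helper name parse_missing and the MissingDefault type are hypothetical, and UAX #44 remains the authoritative description.

```python
import re
from typing import NamedTuple, Optional


class MissingDefault(NamedTuple):
    start: int            # first code point of the range
    end: int              # last code point of the range
    prop: Optional[str]   # property name, when the line carries one
    value: str            # default value for code points not listed


# Matches comment lines such as "# @missing: 0000..10FFFF; Unknown"
_MISSING = re.compile(
    r"^#\s*@missing:\s*([0-9A-F]{4,6})\.\.([0-9A-F]{4,6})\s*;\s*(.+?)\s*$"
)


def parse_missing(line: str) -> Optional[MissingDefault]:
    """Return the default described by an @missing line, or None."""
    m = _MISSING.match(line)
    if m is None:
        return None
    start, end = int(m.group(1), 16), int(m.group(2), 16)
    fields = [f.strip() for f in m.group(3).split(";")]
    if len(fields) == 1:
        # Only a value: the property is implied by the file.
        return MissingDefault(start, end, None, fields[0])
    # Property name followed by value.
    return MissingDefault(start, end, fields[0], fields[1])


print(parse_missing("# @missing: 0000..10FFFF; Unknown"))
print(parse_missing("# @missing: 0000..10FFFF; NFD_QC; Yes"))
```

Ordinary data lines and other comments return None, so the function can be run over every line of a file.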

Review periods for these new public review issues close January 27, 2014. For details about reviewing and commenting, please see the Public Review Issues page.

http://unicode-inc.blogspot.com/2013/12/unicode-70-annexes-available-for-early.html

Tuesday, December 10, 2013

PRI #259: Combined registration of the Moji_Joho collection and of sequences in that collection

The Unicode Consortium has posted a new issue for public review and comment.

Public Review Issue #259: A submission for the "Combined registration of the Moji_Joho collection and of sequences in that collection" has been received by the IVD registrar.

This submission is currently under review according to the procedures of UTS#37, Ideographic Variation Database, with an expected close date of 2014-03-10. Please see the submission page for details and instructions on how to review this issue and provide comments:
http://www.unicode.org/ivd/pri/pri259/

Tuesday, November 19, 2013

Unicode Regular Expressions Updated

Regular expressions are used throughout much of the world's software for matching and manipulating text. UTS #18: Unicode Regular Expressions provides the foundation for the handling of Unicode text in those expressions.

Version 17 of this standard adds the Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type properties, both new in Unicode 6.3, and it expands the guidelines and requirements for support of the Script_Extensions property.

Friday, November 15, 2013

Unicode Security Standard version 6.3 Released

Version 6.3 of UTS #39: Unicode Security Mechanisms has been released. Because the Unicode Standard contains such a large number of characters for the writing systems of the world, caution is necessary to avoid exposing programs and systems to possible security attacks. This document provides mechanisms for reducing the risk of problems, while the associated UTR #36: Unicode Security Considerations describes a variety of security considerations for Unicode and guidelines for dealing with them.

UTS #39 includes a new Restriction Level (Single Script), and a number of clarifications for confusable detection, restriction levels, and optional detection. It also contains a new section describing how the identifier data is generated. That identifier data has been expanded to include certain characters from UAX #31: Unicode Identifier and Pattern Syntax, a few extra characters allowed in IDNA2008 (Internationalized Domain Name Architecture, http://tools.ietf.org/html/rfc5890), and certain characters based on user feedback. The version numbering has also been changed to align with versions of the Unicode Standard.
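To make the risk concrete, the following sketch shows two identifiers that can look alike on screen but differ in one character, plus a rough stand-in for a single-script check. This is not the UTS #39 algorithm: the standard works from the Script and Script_Extensions properties, which Python's standard library does not expose, so the sketch assumes the first word of each letter's character name is a usable hint. The function names are hypothetical.

```python
import unicodedata


def script_hints(text: str) -> set:
    """Collect a rough script hint for every letter in the text.

    Assumption: the first word of the character name ("LATIN",
    "CYRILLIC", ...) stands in for the real Script property.
    """
    hints = set()
    for ch in text:
        if unicodedata.category(ch).startswith("L"):
            hints.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return hints


def looks_single_script(text: str) -> bool:
    """True when all letters share one script hint."""
    return len(script_hints(text)) <= 1


genuine = "paypal"
spoofed = "p\u0430ypal"  # second letter is U+0430 CYRILLIC SMALL LETTER A

print(genuine == spoofed)             # False
print(sorted(script_hints(spoofed)))  # ['CYRILLIC', 'LATIN']
print(looks_single_script(genuine), looks_single_script(spoofed))
```

A production check would use the data files published with UTS #39 rather than character names.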

The associated UTR #36 has some smaller changes. There are a few important corrections, and the addition of new sections discussing security issues with transitivity and idempotence. There are also a few related new FAQ entries on http://www.unicode.org/faq/security.html.

Wednesday, November 13, 2013

Version 6.3 of UTS #46, Unicode IDNA Compatibility Processing

Unicode Technical Standard #46 version 6.3 has been released, synchronized with Unicode 6.3. The data tables are identical with the previous version, with the exception of the 5 new Bidi_Control characters. The table derivation has been modified to forbid Bidi_Control characters, now and in the future: this is consistent with the intent of IDNA2003, and with the treatment of these characters in IDNA2008.
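As an illustration of what forbidding Bidi_Control characters means for an implementation, here is a small sketch that reports any such characters in a label. It covers only this one check and is not UTS #46 processing; the code point list is written out by hand from the Bidi_Control property as of Unicode 6.3, and the function names are hypothetical.

```python
# Code points with the Bidi_Control property as of Unicode 6.3.
BIDI_CONTROL = frozenset(
    [0x061C, 0x200E, 0x200F]          # ALM, LRM, RLM
    + list(range(0x202A, 0x202F))     # U+202A..U+202E embeddings and overrides
    + list(range(0x2066, 0x206A))     # U+2066..U+2069 isolates
)


def bidi_controls_in(label: str) -> list:
    """List the Bidi_Control characters found in a label, as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in label if ord(ch) in BIDI_CONTROL]


def label_allowed(label: str) -> bool:
    """True when the label contains no Bidi_Control characters."""
    return not bidi_controls_in(label)


print(bidi_controls_in("exa\u2066mple"))
print(label_allowed("example"))
```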

Monday, October 28, 2013

Proposed Updates for UTR #36 and UTS #39 now open for review

The Unicode Technical Committee has posted two new issues for public review and comment.

PRI #257, Proposed Update UTR #36, Unicode Security Considerations
PRI #258, Proposed Update UTS #39, Unicode Security Mechanisms

The data for Unicode Technical Standard #39, Unicode Security Mechanisms, has been updated for Unicode 6.3, and there are some important additional changes to the recommended characters for identifiers. There is a new Restriction Level defined in UTS #39, and textual clarifications and fixes to both UTS #39 and UTR #36.

Unicode Technical Report #36, Unicode Security Considerations, is being updated in conjunction with the update of UTS #39 for Unicode 6.3.

As part of this public review, the UTC is soliciting information about links to relevant articles and blog posts which have a bearing on Unicode-related security issues.

Review periods for the new items close on January 27, 2014.

Tuesday, October 8, 2013

Feedback on repertoire for ISO/IEC 10646:2014 (4th Edition)

ISO/IEC 10646:2014 (4th Edition) is currently in its DIS ballot stage. Additionally, Amendment 1 to ISO/IEC 10646:2014 (4th Edition) is currently in its PDAM ballot stage. Documents showing the Draft Additional Repertoire for ISO/IEC 10646:2014 (4th Edition) and the Draft Additional Repertoire for Amendment 1 to ISO/IEC 10646:2014 (4th Edition) are posted in the UTC Document Register, for reference and feedback.

There is a short window of opportunity to review and comment on these major repertoire additions for the 4th Edition, before the ISO balloting has been completed — at which point no further changes or corrections will be possible. (See http://www.unicode.org/faq/sdos.html for additional information on the stages in ISO standards development.)

The UTC is soliciting feedback on the draft additional repertoire, to help discover any errors in character names, incorrect glyphs, or other problems in the repertoire under ballot. Such feedback will help inform the UTC discussions about its own contribution to the ISO balloting process.

Please see the PRI pages for details and links to the documents for review:

255 Feedback on repertoire for Amendment 1 to ISO/IEC 10646:2014 (4th Edition)
256 Feedback on repertoire for ISO/IEC 10646:2014 (4th Edition)

Please also see the general instructions for Public Review Issues.

Monday, September 30, 2013

Announcing The Unicode Standard, Version 6.3

The Unicode Consortium announces Version 6.3 of the Unicode Standard and with it, significantly improved bidirectional behavior. The updated Version 6.3 Unicode Bidirectional Algorithm now ensures that pairs of parentheses and brackets have consistent layout and provides a mechanism for isolating runs of text.

Based on contributions from major browser developers, the updated Bidirectional Algorithm and five new bidi format characters will improve the display of text for hundreds of millions of users of Arabic, Hebrew, Persian, Urdu, and many other languages. The display and positioning of parentheses will better match the normal behavior that users expect. By using the new methods for isolating runs of text, software will be able to construct messages from different sources without jumbling the order of characters. The new bidi format characters correspond to features in markup (such as in CSS). Overall, these improvements also bring greater interoperability and an improved ability for inserting text and assembling user interface elements.
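A minimal sketch of the message-building case, assuming plain-text output where markup is not available: each inserted value is wrapped in U+2068 FIRST STRONG ISOLATE and U+2069 POP DIRECTIONAL ISOLATE so that its direction is resolved on its own. The helper names and the sample template are hypothetical.

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str) -> str:
    """Wrap text so a bidi-aware renderer treats it as an isolated run."""
    return f"{FSI}{text}{PDI}"


def build_message(template: str, **values: str) -> str:
    """Fill a template, isolating every inserted value."""
    return template.format(**{key: isolate(val) for key, val in values.items()})


# A Hebrew user name inserted into an English sentence.
message = build_message(
    "{user} commented on {title}",
    user="\u05D3\u05E0\u05D9",
    title="Unicode 6.3",
)
print([f"U+{ord(ch):04X}" for ch in message[:5]])
```

Only the inserted values are wrapped; the template text itself is left untouched. How the result is displayed still depends on a renderer that implements the Version 6.3 algorithm.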

The improvements come with new rigor: the Consortium now offers two reference implementations and greatly improved testing and test data.

In a major enhancement for CJK usage, this new version adds standardized variation sequences for all 1,002 CJK compatibility ideographs. These sequences address a well-known issue of the CJK compatibility ideographs — that they could change their appearance when any process normalized the text. Using the new standardized variation sequences allows authors to write text which will preserve the specific required shapes of these CJK ideographs, even under Unicode normalization.
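The normalization issue can be seen with Python's standard unicodedata module. The sketch below uses U+F900, which normalizes to the unified ideograph U+8C48, and compares it with a sequence of U+8C48 followed by a variation selector. Pairing this ideograph with VARIATION SELECTOR-1 (U+FE00) is an assumption made for illustration; the StandardizedVariants.txt data file is the authoritative list of sequences.

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

COMPAT = "\uF900"    # CJK COMPATIBILITY IDEOGRAPH-F900
UNIFIED = "\u8C48"   # the unified ideograph it normalizes to
VS1 = "\uFE00"       # VARIATION SELECTOR-1 (assumed selector, see lead-in)


def survives_normalization(text: str) -> bool:
    """True when all four normalization forms leave the text unchanged."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


print(survives_normalization(COMPAT))                 # False
print(unicodedata.normalize("NFC", COMPAT) == UNIFIED)  # True
print(survives_normalization(UNIFIED + VS1))          # True
```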

Version 6.3 includes other improvements as well:

Improved Unihan data to better align with ISO/IEC 10646
Better support for Hebrew word break behavior and for ideographic space in line breaking

Get started with Unicode 6.3 today! http://www.unicode.org/versions/Unicode6.3.0/

Wednesday, September 18, 2013

CLDR Version 24 released

Unicode CLDR 24 has been released, providing an update to the key building blocks for software supporting the world's languages. This data is used by a wide spectrum of companies for their software internationalization and localization, adapting software to the conventions of different languages for common software tasks.
Unicode CLDR 24 focused on additional structure for formatting units, dates, and times, and improving data coverage. This version contains data for 238 languages and 259 territories—740 locales in all. Ten languages were added to the 100%-modern-coverage list for a total of 70 languages. Between the new languages, and the new structure, more data was entered than in any previous release.

The new structure focused primarily on formatting of units and improvements to date and time formatting.

fractional plural forms. major extension to handle fractions (e.g., some languages use the equivalent of “1.2 teaspoons” but “2.1 teaspoon”)
measurement units. many additional unit types (“10.3 kg”), in up to 6 plural forms per language
compound units. video length: "23 hrs, 7 mins", or "23:07"
dates/times. new relative fields such as "last Sunday", and "now"; 12-hour time formats that omit "am/pm"; neutral eras ("405 BCE"); additional timezone fallback regional patterns ("{city} Daylight Time")
number formatting. exponential notation (1.42×10²³), at-least ("99+"), ranges ("3.5-4.5 kg"), narrow currency symbols (both "US$12.23" and "$12.23").
collation. major simplification of rule syntax, updated root files to Unicode 6.3; preliminary version of European Ordering Rules; documentation of the CLDR Collation Algorithm (extending UCA)
JSON. improved support, including new structure and data.
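To show why fractions matter for plural forms, here is a small sketch of plural-category selection for English, where the category is "one" only for the integer 1 written with no visible fraction digits. The operand names i (integer part) and v (count of visible fraction digits) follow CLDR usage; the unit patterns and function names are hypothetical, the sketch handles only non-negative decimal strings, and real applications would take both the rules and the patterns from the CLDR data.

```python
def plural_operands(number_text: str) -> tuple:
    """Return (i, v): integer part and number of visible fraction digits."""
    integer_part, _, fraction_part = number_text.partition(".")
    return int(integer_part), len(fraction_part)


def english_plural_category(number_text: str) -> str:
    """English cardinal rule: 'one' when i = 1 and v = 0, else 'other'."""
    i, v = plural_operands(number_text)
    return "one" if i == 1 and v == 0 else "other"


# Hypothetical unit patterns keyed by plural category.
UNIT_PATTERNS = {"one": "{0} teaspoon", "other": "{0} teaspoons"}


def format_teaspoons(number_text: str) -> str:
    """Pick the pattern for the number's plural category and fill it in."""
    return UNIT_PATTERNS[english_plural_category(number_text)].format(number_text)


for sample in ("1", "1.0", "1.2", "2"):
    print(format_teaspoons(sample))
```

The number is passed as text because "1" and "1.0" have the same numeric value but select different categories.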

In addition, the data already present from CLDR v23 was reviewed for the supported languages, and many improvements made.

Details of coverage improvements and new features are provided in http://cldr.unicode.org/index/downloads/cldr-24, along with a detailed Migration section.

About the Unicode Consortium
The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Sultanate of Oman MARA, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium:
http://www.unicode.org/contacts.html.

Monday, September 16, 2013

Henry Luce Foundation Grant to Unicode in Support of Encoding Tangut

The Consortium is very pleased to announce the generous grant made by the Henry Luce Foundation to support progress on encoding Tangut. The Luce Foundation has made a one-time grant to the Unicode Consortium to support a December 2013 meeting to advance the Tangut script toward its eventual incorporation into the Unicode Standard and the associated ISO/IEC 10646 International Standard. The meeting will bring together scholars of Tangut and experts in the character encoding process to agree on the character repertoire for this large and complex script. Work on this grant is directed by Dr. Deborah Anderson, Technical Director of the Consortium and Project Leader of UC Berkeley's Script Encoding Initiative.

Wednesday, September 4, 2013

UTR #50, Unicode Vertical Text Layout, now published

The Unicode Consortium is pleased to announce the first published version of UTR #50, Unicode Vertical Text Layout. Up until now, vertical text layout has been challenging for implementers due to the somewhat ambiguous nature of character orientation, and often relied on information that is buried deep within font formats. This Unicode Technical Report lays the groundwork for a non-ambiguous vertical text layout model that can serve a broad range of environments. Details can be found at http://www.unicode.org/reports/tr50/.

For general information about Unicode Technical Reports, please see http://www.unicode.org/reports/about-reports.html

Friday, June 28, 2013

Testing the Unicode Bidirectional Algorithm for Unicode 6.3

Unicode Standard Annex #9, Unicode Bidirectional Algorithm (UBA), has a major update slated for release in September, 2013. This update is the most significant change in Unicode 6.3. The changes to the algorithm and text have already been approved by the Unicode Technical Committee, subject to final editorial review.

The Unicode Technical Committee is encouraging implementations to test their code against the new test files and the two reference implementations during the month of July, 2013. It is vital that the interpretation of the text of the specification in UAX #9 be absolutely clear, and that the values in the test data be thoroughly tested by at least two implementations before release, because any changes after release—even to fix problems—can cause significant interoperability problems. The UBA is used for displaying all Arabic and Hebrew text on the web and in application programs, so there are significant ramifications for any changes to the algorithm.

The proposed update to UAX #9 involves a substantial extension of the UBA to allow for the implementation of isolate runs, introducing new Bidi_Class property values and formatting characters in support of that extension. There are also changes to Section 3.3.5, Resolving Neutral and Isolate Formatting Types, to resolve paired punctuation marks as a unit. For details, see http://www.unicode.org/reports/tr9/tr9-28.html.

For further information about the review see http://www.unicode.org/review/pri254/.

Thursday, May 16, 2013

CLDR Version 23.1 Released

May 15, 2013 — Unicode CLDR 23.1 has been released, providing an update to the key building blocks for software supporting the world's languages. Unicode CLDR 23.1 contains data for 215 languages and 228 territories—657 locales in all. Unicode CLDR 23.1 is an update release, designed to address specific issues that have arisen since the initial publication of CLDR 23. The release also contains enhancements and fixes to the JSON conversion utility, as well as a downloadable sample of JSON format data for CLDR.

For more details on the contents of CLDR 23.1, please refer to the release note located at:

http://cldr.unicode.org/index/downloads/cldr-23-1

Wednesday, May 1, 2013

Unicode CLDR 24 Survey Tool is open for Data Submission

May 1, 2013 — The Unicode CLDR 24 Survey Tool is open for data submission starting today. CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. The survey tool is an online tool used by organizations and individuals to contribute data to this repository, and to vote on alternative contributions. For a complete description of the new enhancements and features in the CLDR survey tool, please see http://cldr.unicode.org/index/survey-tool/whats-new

If you do not have a CLDR survey tool account and would like information on how you or your organization can contribute data to the CLDR project, please see http://cldr.unicode.org/index/survey-tool/accounts

Friday, April 26, 2013

Draft UTR #50, Unicode Vertical Text Layout

Draft UTR #50, Unicode Vertical Text Layout is now available for public review and comment. The main changes in this version from the previous proposed draft:

The intended interaction with document formats has been clarified.
The guidelines for the assignment of a property value have been clarified.
A number of characters have been assigned a different property value.

Friday, April 19, 2013

Proposed Update UTS #18, Unicode Regular Expressions

UTS #18, Unicode Regular Expressions, is being updated to bring it into alignment with Unicode 6.3. Two new bidi-related properties introduced in Unicode 6.3 for the Unicode Bidirectional Algorithm are being added. In addition, the discussion of script extensions is being extended and clarified.

The proposed update document is now available for review and comment. See Public Review Issue #252.

Thursday, April 18, 2013

Membership Fee Changes

The Unicode Consortium is announcing an increase in the Full membership fee. As of June 1, 2013, the annual Full membership fee will increase from $15,000 to $18,000. This fee increase will enable the Consortium to continue its mission to enable people around the world to use computers in any language by providing freely-available specifications and data.

The Consortium has also added a multi-year discount option for all membership levels. Effective June 1, 2013, any member may pay fees in advance to receive the following discounts:

10 years, 20% discount
5 years, 10% discount
3 years, 6% discount

For further information please contact the Unicode office.

Wednesday, April 17, 2013

Unicode CLDR 24 Survey Tool Beta

April 15, 2013 — The Unicode CLDR Survey Tool is open for beta testing today. CLDR provides key building blocks for software to support the world's languages, with the largest and most extensive standard repository of locale data available. The survey tool is an online tool used by organizations and individuals to contribute data to this repository, and to vote on alternative contributions.

The survey tool has again undergone substantial revision, with dramatic improvements in usability. We would appreciate people trying out the tool so that we can identify any remaining problems before we start data submission for CLDR 24 on April 24th.

For more information, including how to login, try it out, and supply feedback, see Survey Tool Beta.

Monday, April 15, 2013

UTC Document Register Now Public

The Unicode Technical Committee (UTC) document register is now freely available for public access. This change has been made to increase public involvement in the ongoing deliberations of the UTC in its work developing and maintaining the Unicode Standard and other related standards and reports. Open access to the document register makes it easier to search both current and historical documents for topics of interest, using widely available search engines. The UTC document register contains online documents dating back to 1997 and online registers for paper document distributions dating back to 1991.
http://www.unicode.org/L2/all-docs.html

Friday, March 15, 2013

CLDR Version 23 Released

Unicode CLDR 23 has been released, providing an update to the key building blocks for software supporting the world's languages.

Unicode CLDR 23.0 contains data for 215 languages and 227 territories—654 locales in all. This release focused primarily on improvements to the LDML structure and tools, and on consistency of data. It includes substantially improved support for non-Gregorian calendars (such as the Japanese Imperial calendar used extensively in Japan). The data and structure have also been modified to easily permit changing between 12-hour and 24-hour formats, and between 2-digit and 4-digit years. The new Unicode character is used for the Turkish Lira, and information is provided for currencies that round to 5 cents (or other subunits) in cash transactions. For most languages that use non-Latin scripts, characters in the language’s script now collate before those in other scripts (including A-Z). Language-specific letter-casing changes (Lower, Upper, Title) have been added for Azerbaijani, Greek, Lithuanian, and Turkish. Keyboard data has also been updated for Android. Also, as of this release, the LDML specification is split into multiple parts, each focusing on a particular area.

The release had a short cycle so that we could move to the new regular semi-annual schedule. It thus only included a limited data submission phase, for 4 languages only: Armenian (hy), Georgian (ka), Mongolian (mn), and Welsh (cy). For those languages, the data increased by over 100%.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, Google, Government of Andhra Pradesh, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Tamil Virtual University, The University of California (Berkeley), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members. For more information, please contact the Unicode Consortium http://www.unicode.org/contacts.html.

Tuesday, March 12, 2013

Unicode 6.3 Beta Review

The Unicode® Consortium today announced the start of beta review for the forthcoming Unicode 6.3.0. All beta feedback must be submitted by April 29, 2013.

The main feature of Unicode 6.3 is the update of the Unicode Bidirectional Algorithm and five newly-encoded bidirectional format control characters: U+061C ARABIC LETTER MARK and the isolate span controls U+2066..U+2069. This version also rolls in various minor corrections for errata and other small updates for the Unicode Character Database.
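The five characters can be inspected with Python's standard unicodedata module, as in the sketch below. This assumes a Python build whose Unicode database is at Version 6.3 or later (older builds will not know these code points); the describe helper is hypothetical.

```python
import unicodedata

# U+061C plus the isolate controls U+2066..U+2069
NEW_BIDI_CONTROLS = ["\u061C", "\u2066", "\u2067", "\u2068", "\u2069"]


def describe(ch: str) -> tuple:
    """Return (code point, character name, Bidi_Class) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch, "<unnamed>"),
        unicodedata.bidirectional(ch),
    )


print("Unicode database version:", unicodedata.unidata_version)
for ch in NEW_BIDI_CONTROLS:
    print(describe(ch))
```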

Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smart phones; modern web protocols (HTML, XML,...); and internationalized domain names. Thus it is important to ensure a smooth transition to each new version of the Unicode Standard. Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.3.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so they will be ready for the release of Unicode 6.3.0 in June, 2013.

• See http://www.unicode.org/versions/beta-6.3.0.html for information about testing the 6.3.0 beta.
• See http://www.unicode.org/versions/Unicode6.3.0/ for the current draft summary of Unicode 6.3.0.

Wednesday, March 6, 2013

In Memoriam page for Unicode contributors

Unicode is a project that has been built by hundreds of people over many decades. Some people involved in this project are no longer with us, and we wish to remember their contributions: http://www.unicode.org/consortium/memoriam.html

Tuesday, March 5, 2013

Specifying Optional Conjuncts in Malayalam

The UTC has posted a new Public Review Issue regarding a proposal to specify optional conjuncts in Malayalam.

In Malayalam there are two prevailing orthographies, traditional and reformed. Both are written using the same Malayalam character set. The difference between them is typically manifested only by the font. Traditional orthography accommodates more full conjuncts, while the reformed orthography would use visible virama (Chandrakkala) separated sequences for many of those full conjuncts.

This proposal specifies the further use of ZWJ and ZWNJ in sequences in the Malayalam script to indicate preferences for optional display of conjuncts. Such sequences are intended to indicate the preferences, both for rendering systems that support the reformed Malayalam orthography and for systems that support the traditional Malayalam orthography.

The UTC is seeking feedback on this proposal, regarding its advisability and potential impacts on implementations, as well as any suggestions for alternative approaches to the issues raised in the background document.

Friday, March 1, 2013

New FAQ on Private-use Characters, Noncharacters and Sentinels

A new FAQ page devoted to the topic of private-use characters, noncharacters, and sentinels has been posted on the Unicode web site. This FAQ aims to clear up confusion about whether noncharacters are permitted in Unicode text, and how they differ from ordinary private-use characters. The recently published Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permitted even in interchange, and the new FAQ page addresses some of the fine points about their usage and about differences from other types of Unicode code points. The brief mentions of noncharacters in other FAQ pages have also been updated accordingly.

Are you unclear about what Unicode “noncharacters” even are? The new FAQ page also answers basic questions about noncharacters and private-use characters, and provides a bit of history about how they came to be part of the Unicode Standard.

Friday, February 22, 2013

Be a Part of IUC 37! Call for Participation

SUBMISSION DEADLINE: Friday, March 29th


Do you have knowledge or experience with creating global software that will benefit others? Join other experts and industry leaders and present your ideas at The Thirty-seventh Internationalization & Unicode® Conference (IUC 37), taking place in Santa Clara, Calif., USA; October 21-23, 2013. This is the premier conference on technologies and practices for the creation and management of global and multilingual software solutions.

The Unicode Consortium hosts this event annually, and the conference is recognized for its excellent technical content, industry-tested recommendations and updates on the latest standards. Topics from previous conferences can be found on the IUC 37 website.

Submit your proposals for presentations or tutorials regarding case studies, best practices, innovative technology, or evolving standards. Suitable topics include, but are not limited to:

Application Areas

•	Designing software platforms, operating systems, software as a service (SAAS), or programming environments
•	Social networks
•	Search engines, SEO, discovery and navigation best practices
•	Websites and web services
•	Libraries and education
•	Mobile applications including iPhone, Android, iPad, Kindle, Windows Mobile, tablets, etc.
•	Game, Cable Boxes, and other platforms
•	Publishing and broadcasting for a global audience
•	Security concerns and practices
•	Voice to text, text to voice
•	Machine translation

General Techniques

•	Advances in technologies, algorithms or methodologies
•	Using internationalization libraries and programming environments
•	Handling bidirectional or other complex scripts
•	Locales and the Unicode Common Locale Data Repository (CLDR)
•	Font development and Typography

Managing Global Software Development and Geographically Distributed Teams

•	Project management and methodologies e.g. Agile
•	Best practices in localization process and technology
•	Best practices in world-ready development, testing, and deployment
•	Improving globalization capabilities within organizations
•	Approaches for migrating legacy applications to global markets

Evolving Standards and Related Practices

•	Endangered or Unencoded Languages
•	Case studies and research on cross-culture communication
•	Internationalized Domain Names and other identifiers
•	Languages of Africa, Asia, and the Middle East
•	ISO language tag topics
•	HTML5, CSS3, and modern browser topics
•	Dealing with data formats: XML, JSON, HTML5, DITA, and upcoming standards
•	Unicode, encodings, scripts, character properties, and algorithms
•	Emoji support

Tutorial presenters receive complimentary conference registration, and two nights lodging. Session presenters receive a fifty percent conference discount and two nights lodging.

To be considered as a presenter for the conference, please submit a brief abstract by the deadline of Friday, March 29th.

The Program Committee will notify authors by Friday, May 3rd. Final presentation materials will be required from selected presenters by Friday, July 20th.

Wednesday, February 20, 2013

Corrigendum #9 clarifies noncharacter usage in Unicode

There has been confusion about whether noncharacters were permitted in Unicode text. The new Corrigendum #9: Clarification About Noncharacters makes it clear that noncharacters are permissible even in open interchange, although their intended semantics may not be interpretable in such contexts. The UTF-8, UTF-16, UTF-32 & BOM FAQ has also been updated for clarity, and other informative text about noncharacters will be revised over time, including the Core Specification.

Background. There are 66 noncharacters permanently reserved for internal use, typically used for some sort of internally-defined control function or sentinel value. They should be supported by APIs, components, and applications that handle (i.e., either process or pass through) all Unicode strings, such as a text editor or string class. Where an application does make internal use of a noncharacter, it should take some measures to sanitize input text from unknown sources. The best practice is to replace that particular noncharacter on input by U+FFFD. (The noncharacter should not be simply deleted, since that can cause security problems. For more information, see Section 3.5 Deletion of Code Points in UTR #36, Unicode Security Considerations.)
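The practice described above can be sketched as follows. The 66 noncharacters are U+FDD0..U+FDEF plus the last two code points of each of the 17 planes. The choice of U+FDD0 as an application's internal sentinel, and the function names, are hypothetical.

```python
def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for U+nFFFE / U+nFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


REPLACEMENT = "\uFFFD"  # REPLACEMENT CHARACTER
SENTINEL = "\uFDD0"     # hypothetical noncharacter this application uses internally


def sanitize_input(text: str, sentinel: str = SENTINEL) -> str:
    """Replace the application's own sentinel in untrusted input by U+FFFD.

    The character is replaced, not deleted, so the text keeps its length
    and other noncharacters pass through unchanged.
    """
    if not is_noncharacter(ord(sentinel)):
        raise ValueError("sentinel must be a noncharacter")
    return text.replace(sentinel, REPLACEMENT)


print(sum(is_noncharacter(cp) for cp in range(0x110000)))  # 66
print(sanitize_input("ab\uFDD0cd") == "ab\uFFFDcd")        # True
```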

Tuesday, February 12, 2013

IUC 37: Save The Date - Oct 21-23, 2013

The Internationalization and Unicode Conference (IUC) is the premier event covering the latest in industry standards and best practices for bringing software and Web applications to worldwide markets. This annual event focuses on software and Web globalization, bringing together internationalization experts, tools vendors, software implementers, and business and program managers from around the world.

Expert practitioners and industry leaders present detailed recommendations for businesses looking to expand to new international markets and those seeking to improve time to market and cost-efficiency of supporting existing markets. Recent conferences have provided specific advice on designing software for European countries, Latin America, China, India, Japan, Korea, the Middle East, and emerging markets.

This highly rated conference features excellent technical content, industry-tested recommendations and updates on the latest standards and technology. Subject areas include cloud computing, upgrading to HTML5, integrating with social networking software, and implementing mobile apps. This year's conference will also highlight new features in Unicode Version 6.1 and other relevant standards published this year.

Reasons to Attend Include:

Tutorials and sessions for beginners, to train you and your staff on basic practices and implementation techniques for creating international software
Learn recommended solutions to difficult problems or sophisticated requirements from industry leaders and experts in attendance
Find help from tool and product vendors to get you to market quickly and cost-effectively

Tuesday, February 5, 2013

Unicode Board Members and Officers

The Unicode Consortium would like to welcome two new board members, Bob Jung and Greg Welch, and a new vice president, Peter Constable.

Bob Jung is the Director of Engineering for Internationalization at Google, Inc. He built and leads the globally distributed team that develops highly scalable technologies and infrastructure used throughout Google to deliver internationalized and localized products. Previously, at Netscape, he built the team that established much of the early work on internationalization for the web and browsers. Even earlier, he helped drive the initial Unix/POSIX internationalization specifications and standards via work with industry consortiums (/usr/grp, Uniforum, Unix International). Prior to Google, Bob worked for Netscape/AOL, Apple, MIPS, Nippon Unisoft and UniSoft.

Greg Welch of Intel Corporation is Director of Strategic Marketing in Intel’s PC Client Group. Among his recent accomplishments has been responsibility for driving the formulation and coordination of Intel’s Ultrabook™ program. Previous positions at Intel include:

Director, Intel’s Architecture Group, Global WIMAX Organization: responsible for business development relationships between Intel, Clearwire, Best Buy and OEMs to promote the world’s first national 4G network.
Director of Strategy and Industry Initiatives in Intel’s Software and Solutions Group: drove Intel’s efforts to enable software for multi-core architectures.
Director of Strategic Planning for Intel's Mobile Platforms Group: oversaw long-range roadmap planning and business strategy for all notebook platform, processor, and chipset products that became the Core® family of processors.
Director of Brand Strategy: spearheaded the segmentation of Intel’s processor brands including the Itanium® and Xeon® brands for high-end server products, and the Celeron® brand for value PCs.

Peter Constable is Senior Program Manager at Microsoft. He was exposed to the challenges of supporting non-Latin scripts in software systems and digital fonts while living in Thailand for five years. He began working on software internationalization in 1996 and became active in work on Unicode and other i18n standards activities shortly thereafter. Since 2003, he has worked for Microsoft on Unicode support and international text display. He has long been active in the UTC, became a Unicode technical director in 2008, and has been the Unicode liaison to SC2 since 2007.

The Unicode Consortium would like to thank Vint Cerf and Harald Alvestrand, who recently stepped down after many years of contributions as members of the board of directors.

Vinton G. Cerf is vice president and Chief Internet Evangelist for Google. He is responsible for identifying new enabling technologies and applications on the Internet and other platforms for the company. Widely known as a "Father of the Internet," Vint is the co-designer with Robert Kahn of the TCP/IP protocols and the basic architecture of the Internet. In 1997, President Clinton recognized their work with the U.S. National Medal of Technology. In 2005, Vint and Robert Kahn received the highest civilian honor bestowed in the U.S., the Presidential Medal of Freedom. It recognizes the fact that their work on the software code used to transmit data across the Internet has put them "at the forefront of a digital revolution that has transformed global commerce, communication, and entertainment." He served on the board of the Unicode Consortium from 2010 until now.

Harald Alvestrand has worked for Norsk Data, UNINETT (the University Network of Norway), EDB Maxware, Cisco Systems and, since 2006, for Google, Inc. Harald has been active in Internet standardization since 1991, and has written a number of RFCs. He was an area director of Applications and of Operations & Management in the IETF and a member of the IAB before serving as chair of the IETF from 2001 to 2006. He served on the board of the Unicode Consortium from 2001 until now.

The Consortium also would like to thank Vice President Eric Muller, and Technical Directors John Jenkins and Mike Ksar, who recently stepped down from their roles as officers of the Consortium after serving for many years. They will continue to work with the Consortium on ongoing technical work.

Eric Muller is the former chair of INCITS/L2, the U.S. committee which coordinates its work closely with the ongoing work of the Unicode Technical Committee. Eric continues his contributions to the technical work of the Consortium through his work with the Unicode Technical Committee. John Jenkins has worked with the Ideographic Rapporteur Group (IRG) for many years, and continues to provide crucial maintenance and updates for the Unicode Database. Mike Ksar has convened ISO/IEC JTC1/SC2/WG2 for many years, and continues in that capacity.

For the listing of current directors and officers of the Consortium please see Unicode Directors, Officers and Staff. See also Former Board Members and Former Officers.

Tuesday, January 22, 2013

Making UTC Document Register Public

The Unicode Technical Committee (UTC) is making its document register freely available for public access, starting on April 15, 2013. This decision has been taken in the interest of increasing public involvement in the ongoing deliberations of the UTC regarding the development of the Unicode Standard and the other standards and reports that it maintains. Open access to the document register will also make it easier to search the documents, both current and historical, for topics of interest, using widely available search engines. The UTC document register contains online documents dating back to 1997 and online registers for paper document distributions dating back to 1991.

The date for opening up access has been set to April 15 to provide sufficient time for anyone who might have issues concerning this change to raise their concerns with the Unicode Consortium. In particular, authors who submitted a document to the UTC under the old rules, on the assumption that it would be available only to current members of the Consortium for review, and who have concerns about that document being made publicly accessible, are encouraged to contact the Unicode Consortium. Please identify precisely the document of concern and the reasons why you might not wish for it to be included in the publicly accessible set. Please note that the change to make the document register publicly accessible does not change anything with regard to copyright status of existing documents – these documents are not being put in the public domain; rather, the UTC is simply removing the requirement for password access to view them.

Thursday, January 3, 2013

Major changes to Unicode for Arabic & Hebrew

UAX #9, Unicode Bidirectional Algorithm, will be updated for Unicode 6.3. The Unicode BIDI algorithm is used for displaying all Arabic and Hebrew text on the web and in application programs, so any changes require careful review.

This proposed update involves a substantial extension of the Unicode Bidirectional Algorithm to allow for the implementation of isolate runs. It also introduces new Bidi_Class property values and formatting characters in support of that extension.
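As a rough illustration of how the isolate formatting characters might be used, the sketch below wraps a piece of inserted text so that its direction does not affect the surrounding text. The code points are those in the published UAX #9 for Unicode 6.3 (U+2066 LRI, U+2067 RLI, U+2068 FSI, U+2069 PDI); they are not listed in this post, and the helper name `isolate` is hypothetical.

```python
# Isolate formatting characters (per published UAX #9, Unicode 6.3).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in an isolate so it is laid out independently of its context.

    direction is "ltr", "rtl", or "auto" (direction taken from the first
    strong character of the text, via FSI).
    """
    openers = {"ltr": LRI, "rtl": RLI, "auto": FSI}
    if direction not in openers:
        raise ValueError(f"unknown direction: {direction!r}")
    return openers[direction] + text + PDI


# Example: inserting a user-supplied name of unknown direction into a message.
username = "\u05D0\u05D1\u05D2"  # three Hebrew letters
message = "User " + isolate(username) + " posted 3 comments"
```

The "auto" default suits text whose direction is not known in advance, such as user-supplied names.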

There are also changes to Section 3.3.4 Resolving Neutral Types to resolve paired punctuation marks as a unit. This adds a new rule N0.

See the modifications section of the proposed update for information on specific changes to sections in the document.

The proposed update is available here: http://www.unicode.org/reports/tr9/tr9-28.html