The Unicode Blog: 2010

Wednesday, December 29, 2010

Proposed Update UTS #46, Unicode IDNA Compatibility Processing, Version 6.0.1

The Unicode Consortium has released a proposed update for UTS #46, Unicode IDNA Compatibility Processing, Version 6.0.1. This update is intended to make it easier for implementations to both support IDNA2008, and use the mappings in UTS #46. Those mappings allow implementations to meet user expectations for handling uppercase and lowercase, and other character variants, and maintain compatibility with IDNA2003. The proposed data is found in: http://www.unicode.org/Public/idna/6.0.1/

The proposed draft does not change the UTS #46 status or mapping data for Unicode 6.0 characters; instead, it adds new informative fields to the data file and the conformance test file, fields that provide information as to which characters are allowed under IDNA2008. Because UTS #46 is targeted at client software such as browsers, the conformance tests do not check for the CONTEXTO conditions of IDNA2008, which are optional for client software.
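
A consumer of the data file might read the added fields along the following lines. This is a minimal sketch, assuming the file keeps the usual layout of Unicode data files (semicolon-separated fields, `#` comments, code point ranges written as `XXXX..YYYY`). The sample lines are illustrative stand-ins rather than lines copied from the proposed 6.0.1 data, and the names `Entry` and `parse_lines` are hypothetical.

```python
from typing import Iterable, Iterator, List, NamedTuple


class Entry(NamedTuple):
    first: int          # first code point of the range
    last: int           # last code point of the range (same as first for a single code point)
    status: str         # status field, e.g. "valid", "mapped", "deviation"
    extra: List[str]    # any further fields (mapping, informative fields), left unparsed


def parse_lines(lines: Iterable[str]) -> Iterator[Entry]:
    """Parse semicolon-separated data lines, skipping comments and blank lines."""
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # drop the trailing comment
        if not line:
            continue
        parts = [p.strip() for p in line.split(";")]
        start, _, end = parts[0].partition("..")
        first = int(start, 16)
        last = int(end, 16) if end else first
        yield Entry(first, last, parts[1], parts[2:])


# Illustrative lines only (assumed layout, not the actual proposed data).
SAMPLE = [
    "# sample in the style of a Unicode data file",
    "0061..007A ; valid # LATIN SMALL LETTER A..Z",
    "00DF ; deviation ; 0073 0073 # LATIN SMALL LETTER SHARP S",
]

entries = list(parse_lines(SAMPLE))
for e in entries:
    print(f"U+{e.first:04X}..U+{e.last:04X}", e.status, e.extra)
```

Keeping the trailing fields as an unparsed list means the sketch does not depend on where exactly an added informative field sits in each line.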

Feedback on the proposed draft is welcome. Of particular interest are independent mechanical verification of the new field values, and feedback as to whether it would be useful to add checks for the CONTEXTO conditions to the conformance tests.

Details of the Public Review Issue are on the following web page:

http://www.unicode.org/review/

Review periods for the new items close on January 31, 2011.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Monday, December 20, 2010

Galley proofs for chapters 1-7 of the Unicode 6.0 Core Specification now online

Pre-publication versions of Chapters 1-7 of the Unicode Core Specification, Version 6.0, are now available for online viewing at http://www.unicode.org/versions/Unicode6.0.0/ . These pre-publication chapters are in the final copy editing stage and may have minor edits before the final version is published. The final version of the entire core specification will be published in February 2011.

Thursday, December 2, 2010

Unicode Releases Common Locale Data Repository, Version 1.9

Mountain View, CA, December 1, 2010 - The Unicode® Consortium announced today the release of a new version of the Unicode Common Locale Data Repository (Unicode CLDR 1.9), providing key building blocks for software to support the world's languages. The main features of CLDR 1.9 are enhanced collation and transliteration support, new structure, and modifications for data consistency. The details are found in the CLDR 1.9 Release Note (http://cldr.unicode.org/index/downloads/cldr-1-9).

Unicode CLDR is by far the largest and most extensive standard repository of locale data. This data is used by a wide spectrum of companies for their software internationalization and localization: adapting software to the conventions of different languages for such common software tasks as formatting of dates, times, time zones, numbers, and currency values; sorting text; choosing languages or countries by name; transliterating different alphabets; and many others. Unicode CLDR 1.9 is part of the Unicode locale data project, together with the Unicode Locale Data Markup Language (LDML: http://unicode.org/reports/tr35/). LDML is an XML format used for general interchange of locale data, such as in Microsoft's .NET.

For web pages with different views of CLDR data, see http://unicode.org/cldr/charts.html. For more information about the Unicode CLDR project (including charts) see http://cldr.unicode.org .

Saturday, November 20, 2010

New version of Unicode Ideographic Variation Database released

The Unicode Consortium is pleased to announce the release of version 2010-11-14 of the Unicode Ideographic Variation Database. This release adds a new collection, Hanyo-Denshi, with 4,195 sequences in that collection. Details can be found at <http://www.unicode.org/ivd/> and <http://www.itscj.ipsj.or.jp/domestic/sc02/hanyo-denshi/20100331>.

Friday, November 19, 2010

Corrigendum #8 issued for U+070F SYRIAC ABBREVIATION MARK

Corrigendum #8 has been issued, to correct the Bidi_Class for U+070F SYRIAC ABBREVIATION MARK. This corrigendum corrects the Bidi_Class to the value which was intended for Unicode 6.0, so that U+070F will not be separated into distinct directional runs from the other Syriac characters it is used with.
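
To see which Bidi_Class value a given runtime reports for U+070F, a quick check such as the following can help. It is a minimal sketch using Python's standard `unicodedata` module; the value printed depends on the Unicode version bundled with the interpreter, so no particular result is assumed here. The helper name `bidi_classes` is hypothetical.

```python
import unicodedata


def bidi_classes(text: str) -> list:
    """Return the Bidi_Class of each character in text, as reported by this runtime."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Version of the Unicode Character Database bundled with this Python build.
print("UCD version:", unicodedata.unidata_version)

# U+070F SYRIAC ABBREVIATION MARK followed by U+0710 SYRIAC LETTER ALAPH.
sample = "\u070F\u0710"
for ch, cls in zip(sample, bidi_classes(sample)):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), cls)
```

If both characters report the same class, the abbreviation mark stays in the same directional run as the Syriac letters around it.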

As for other corrigenda, this correction to the Bidi_Class value for U+070F does not modify the content of Unicode 6.0. However, it makes it possible for applications to declare conformance to Unicode 6.0 plus Corrigendum #8, if needed.

For details please see: http://www.unicode.org/versions/corrigendum8.html http://www.unicode.org/standard/versions/components-6.0.0.html#Unicode_6_0_0_With_Corrigendum

Saturday, October 30, 2010

Unicode 6.0 Sorting

Mountain View, CA, USA – October 29, 2010 – The new version of Unicode Technical Standard #10, Unicode Collation Algorithm (UCA), has been updated for Unicode Version 6.0, adding support for 2,088 characters in sorting, searching, and matching. This release also includes new data files for support of the Unicode Common Locale Data Repository (CLDR), which provides customization for different languages.

Reorderable Categories. The data files for CLDR order characters strictly by certain major categories. This allows programmers to parametrically reorder these groups of characters to put them in the desired order for different languages. For example, numbers can be ordered after letters, or Cyrillic before Latin. The reorderable categories are:

whitespace, punctuation, general symbols, currency symbols, and numbers, then Latin, Greek, Coptic, Cyrillic, ..., Egyptian Hieroglyphs, and finally, CJK.
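
The idea of parametric reordering can be sketched as follows. This is a toy illustration, not the UCA: it classifies a string only by its first character, uses character names as a rough stand-in for script, and falls back to code point order within a group. The names `group_of` and `sort_key` and the short group list are assumptions made for the example.

```python
import unicodedata

DEFAULT_ORDER = ["number", "Latin", "Greek", "Cyrillic"]


def group_of(ch: str) -> str:
    """Rough group of a character, derived from its digit status or name prefix."""
    if ch.isdigit():
        return "number"
    name = unicodedata.name(ch, "")
    for script in ("LATIN", "GREEK", "CYRILLIC"):
        if name.startswith(script):
            return script.capitalize()
    return "other"


def sort_key(text: str, order: list) -> tuple:
    """Sort by the rank of the first character's group, then by code point order."""
    rank = {group: i for i, group in enumerate(order)}
    group = group_of(text[0]) if text else "other"
    return (rank.get(group, len(order)), text)


words = ["zebra", "42", "Яблоко", "αλφα"]

default_sorted = sorted(words, key=lambda w: sort_key(w, DEFAULT_ORDER))
cyrillic_first = sorted(
    words, key=lambda w: sort_key(w, ["Cyrillic", "Latin", "Greek", "number"])
)

print(default_sorted)
print(cyrillic_first)
```

Only the order list changes between the two calls; the data being sorted and the per-group comparison stay the same, which is the point of keeping the groups strictly separated.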

Distinguishing Symbols from Punctuation. UCA provides an option for ignoring certain characters when comparing strings. By default, these are whitespace, punctuation, and general symbols. The data files for CLDR modify that default so that symbols are compared significantly, while still ignoring whitespace and punctuation. Thus, for example, "I♥NY" is not sorted the same as "I☠NY".
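
The effect of the two defaults can be approximated with a comparison key that drops ignorable characters. This is a simplified sketch, assuming General_Category is a good enough proxy: categories starting with Z (separators) and P (punctuation) are always ignored, and categories starting with S (symbols) are ignored only on request. Real UCA behavior is driven by collation weights, not by General_Category, and the function name `comparison_key` is hypothetical.

```python
import unicodedata


def comparison_key(text: str, ignore_symbols: bool) -> str:
    """Drop whitespace and punctuation, and optionally symbols, before comparing."""
    kept = []
    for ch in text:
        major = unicodedata.category(ch)[0]
        if major in "ZP":
            continue                      # separators and punctuation are always ignored
        if ignore_symbols and major == "S":
            continue                      # symbols are ignored only in the older default
        kept.append(ch)
    return "".join(kept)


a, b = "I♥NY", "I☠NY"

# Symbols ignored: the two strings compare as equal.
print(comparison_key(a, ignore_symbols=True) == comparison_key(b, ignore_symbols=True))

# Symbols significant: the two strings differ.
print(comparison_key(a, ignore_symbols=False) == comparison_key(b, ignore_symbols=False))
```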

Special Database Values. The data files for CLDR provide special weights for two noncharacters:

1. A special noncharacter <HIGH> (U+FFFF) for specification of a range in a database, allowing "Sch" ≤ X ≤ "Sch<HIGH>" to pick all strings starting with "sch" plus those that sort equivalently.

2. A special noncharacter <LOW> (U+FFFE) for merged database fields, allowing "Disílva<LOW>John" to sort next to "Disilva<LOW>John".
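
The range query in the first item can be sketched as follows. This is only an approximation: it uses plain code point order on a sorted Python list instead of UCA collation keys, so it is case-sensitive and does not pick up strings that merely sort equivalently. U+FFFF works as an upper bound here because it is greater than any other character in the Basic Multilingual Plane. The helper name `prefix_range` is hypothetical.

```python
import bisect

HIGH = "\uFFFF"  # noncharacter used as an upper bound for a prefix range


def prefix_range(sorted_names: list, prefix: str) -> list:
    """Return entries X with prefix <= X <= prefix + HIGH from a sorted list."""
    lo = bisect.bisect_left(sorted_names, prefix)
    hi = bisect.bisect_right(sorted_names, prefix + HIGH)
    return sorted_names[lo:hi]


names = sorted(["Sanchez", "Sch", "Schmidt", "Schulz", "Scott", "Smith"])
matches = prefix_range(names, "Sch")
print(matches)
```

With a collation-aware index the same two bounds would be turned into sort keys first, so that the range also follows the language's ordering rules.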

The version of CLDR using these new data files is planned for release at the start of December, 2010.

The text of the UCA standard has been clarified in different areas. Implementers should pay special attention to the changes regarding ill-formed sequences, noncharacters, and unassigned code points in CJK blocks.

For more information, see:

* The UCA Standard 6.0.0: http://www.unicode.org/reports/tr10/
* The UCA charts: http://unicode.org/charts/collation/
* The UCA data: http://unicode.org/Public/UCA/6.0.0/
* Merged database fields: http://unicode.org/reports/tr10/#Interleaved_Levels

About The Unicode Consortium

Members are: Adobe, Apple, Google, Government of Bangladesh, Government of India, IBM, Microsoft, Monotype Imaging, Oracle, The Society for Natural Language Technology Research, SAP, The University of California (Berkeley), The University of California (Santa Cruz), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.

For more information, please contact the Unicode Consortium. http://www.unicode.org/contacts.html

Unicode 6.0 Internationalized Domain Names

Mountain View, CA, USA – October 29, 2010 – The new version of Unicode Technical Standard #46, Unicode IDNA Compatibility Processing, has been updated for Unicode Version 6.0, adding support for 2,088 characters in internationalized domain names (IDN).

The specification provides two main features for use with the new specification for internationalized domain names released in August 2010 (IDNA2008):

1. A comprehensive mapping to reflect user expectations for casing and other variants of domain names. This mapping is allowed by IDNA2008, and follows the same principles as in the previous version of that specification (IDNA2003, in force from 2003 until August 2010). It thus provides users consistency between old and new versions.

2. A compatibility mechanism that supports internationalized domain names valid under the IDNA2003 specification and the IDNA2008 specification. This second feature allows browsers, search engines, and other clients to handle both old and new domain names during the transitional period until registries update their rules to follow IDNA2008.
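
The kind of mapping described in the first item can be observed with Python's built-in `idna` codec. Note the assumption: that codec implements IDNA2003 (RFC 3490), not UTS #46 or IDNA2008, so this sketch only illustrates the case and variant mapping that users have come to expect. The function name `to_ascii_idna2003` is hypothetical.

```python
def to_ascii_idna2003(domain: str) -> str:
    """Convert a domain name to its ASCII form using Python's IDNA2003 codec."""
    return domain.encode("idna").decode("ascii")


# The uppercase "B" is mapped to lowercase before Punycode encoding.
print(to_ascii_idna2003("Bücher.example"))   # xn--bcher-kva.example

# Under IDNA2003, sharp s is mapped to "ss", so the result is plain ASCII.
print(to_ascii_idna2003("faß.example"))      # fass.example
```

The second case is one where IDNA2003 and IDNA2008 treat a character differently, which is the sort of difference the compatibility mechanism in the second item has to handle.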

UTS #46 supplies normative data tables that are synchronized with the latest version of Unicode, allowing implementations to update without recalculation.

This new release of UTS #46 also provides a custom option to recognize legacy international domain names containing special ASCII characters such as "_".

Tuesday, October 12, 2010

Unicode Version 6.0: Support for Popular Symbols in Asia

The newly finalized Unicode Version 6.0 adds 2,088 characters, with over 1,000 new symbols.

A long-awaited feature of Unicode 6.0 is the encoding of hundreds of symbols for mobile phones. These emoji characters are in widespread use, especially in Japan, and have become an essential part of text messages there and elsewhere. Unicode 6.0 now provides for data interchange between different mobile vendors and across the internet. The symbols cover many domains: maps and transport, phases of the moon, UI symbols (such as fast-forward), and many others.

A late-breaking addition is the newly created official symbol for the Indian rupee. With the help of the Indian government and our colleagues in ISO, the consortium was able to accelerate the encoding process. Once computers and mobile phones update to the new version of Unicode, people will be able to use the rupee sign like they use $ or € now.

This October 2010 release includes the Unicode Character Database (UCD), Unicode Standard Annexes (UAXes), and code charts. With the release of these components, implementers are able to update their software to Unicode 6.0 without delay. The final text of the core specification will be available in early 2011.

To access Unicode 6.0, see http://www.unicode.org/versions/Unicode6.0.0.

For more information on emoji, see http://unicode.org/faq/emoji_dingbats.html

For a formatted version of this message with images, see http://unicode.org/press/pr-6.0.html.

Tuesday, August 31, 2010

Public Review Issue #172: Proposed Update Unicode IDNA Compatibility Processing

The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page:

http://www.unicode.org/review/

Review period for the new item closes on September 9, 2010.

Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

#172 Proposed Update UTS #46, Unicode IDNA Compatibility Processing

http://www.unicode.org/reports/tr46/proposed.html

There is a proposed update with the following features: alignment with Unicode 6.0, the addition of conformance test files, and support of the IDNA2003 option UseSTD3ASCIIRules=false.

Feedback is requested on both the draft text http://www.unicode.org/reports/tr46/proposed.html and the draft data files http://unicode.org/Public/idna/6.0.0/

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

http://www.unicode.org/consortium/distlist.html

Tuesday, August 24, 2010

Public Review Issue #176: Properties of Two Khmer Characters

The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/.

Review periods for the new items close on October 25, 2010.
Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

PRI #176: Properties of Two Khmer Characters

The UTC is considering potential changes to the General_Category property values and default collation weighting of two Khmer characters, U+17B4 KHMER VOWEL INHERENT AQ and U+17B5 KHMER VOWEL INHERENT AA. The UTC is seeking feedback on this topic. In particular, the UTC would be interested in learning of any current implementations which might be adversely affected by any of the proposed modifications to the General_Category and/or default collation weighting of these two characters. Please see the background document http://www.unicode.org/review/pr-176.html for details on the proposal.
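
Code that filters text by General_Category is one kind of implementation that could be sensitive to such a change. The following is a minimal sketch using Python's standard `unicodedata` module; the category reported for the two Khmer characters depends on the Unicode version bundled with the interpreter, so no value is assumed. The helper name `strip_categories` is hypothetical.

```python
import unicodedata


def strip_categories(text: str, categories: set) -> str:
    """Remove every character whose General_Category is in the given set."""
    return "".join(ch for ch in text if unicodedata.category(ch) not in categories)


# Report what this runtime says about the two characters under review.
for cp in (0x17B4, 0x17B5):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))

# A filter like this behaves differently for a character whose category changes.
cleaned = strip_categories("a\u200Db", {"Cf"})   # U+200D ZERO WIDTH JOINER is a format character
print(cleaned)
```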

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page: http://www.unicode.org/reporting.html.

If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html.

----
All of the Unicode Consortium lists are strictly opt-in lists for members or interested users of our standards. We make every effort to remove users who do not wish to receive e-mail from us. To see why you are getting this mail and how to remove yourself from our lists if you want, please see http://www.unicode.org/consortium/distlist.html#announcements.

Public Review Issue #175: CLDR 1.9 Collation Changes

The Unicode CLDR committee is making Unicode locale-sensitive collation a major focus for the next release, CLDR 1.9. There are specific changes for a large number of languages, plus a change in the default ordering of punctuation vs symbols for all languages.

Please see the background document for more information: http://www.unicode.org/review/pr-175.html

If you have any feedback on any of the actions, please file a ticket with CLDR as described in the background document.

Review period for this issue closes on October 1, 2010.

If you wish to discuss issues on the CLDR Users mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the mail list are not automatically recorded as input to the committee. You must use the submission mechanism described in the background document to generate comments for consideration. http://www.unicode.org/consortium/distlist.html

Friday, August 6, 2010

Unicode Security and Domain Names

The Unicode Consortium has released three important specifications related to Internationalized Domain Names (IDNs) and Security.

UTS #46: Unicode IDNA Compatibility Processing
http://www.unicode.org/reports/tr46/

UTR #36: Unicode Security Considerations
http://www.unicode.org/reports/tr36/

UTR #39: Unicode Security Mechanisms
http://www.unicode.org/reports/tr39/

UTS #46: Unicode IDNA Compatibility Processing

Client software, such as browsers and emailers, faces a difficult transition from the version of international domain names approved in 2003 (IDNA2003), to the revision approved in 2010 (IDNA2008). The specification in this document provides a mechanism that minimizes the impact of this transition for client software, allowing client software to access domains that are valid under either system. The specification provides two main features. One is a comprehensive mapping to support current user expectations for casing and other variants of domain names. Such a mapping is allowed by IDNA2008. The second is a compatibility mechanism that supports the existing domain names that were allowed under IDNA2003. This second feature is intended to improve client behavior during the transitional period.

UTR #36: Unicode Security Considerations

Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This is especially important as more and more products are internationalized.

This document describes some of the security considerations that programmers, system analysts, standards developers, and users should take into account, and provides specific recommendations to reduce the risk of problems.

UTR #39: Unicode Security Mechanisms

Because Unicode contains such a large number of characters and incorporates the varied writing systems of the world, incorrect usage can expose programs or systems to possible security attacks. This document specifies mechanisms that can be used to detect possible security problems.

Monday, August 2, 2010

New PRI: Proposed Draft UTR #49 Unicode Character Categories

The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review periods for the new items close on October 25, 2010. Please see the page for links to discussion and relevant documents.

Briefly, the new issue is:
PRI #174 Proposed Draft UTR #49, "Unicode Character Categories"
http://www.unicode.org/reports/tr49/tr49-1.html
This proposed draft UTR presents an approach to the categorization of Unicode characters, and documents a data file that implementers can use for defining Unicode character categories.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html

Wednesday, July 21, 2010

Submit nominations for the Unicode Bulldog Award

The Unicode Consortium sponsors an annual award for outstanding personal contributions to the philosophy and dissemination of the Unicode Standard. Known as the “Bulldog Award,” it is presented at the Unicode conference to recognize “those tenacious champions of Unicode who have produced solid achievements in promoting its use around the globe”.

The Consortium invites the Unicode community to nominate up to two people they believe are most deserving of this award. Nominations should include a brief rationale why the candidate would be a good choice. Please check http://unicode.org/conference/bulldog.html to see a list of past winners. Executive officers and staff of the Unicode Consortium are not eligible for the award. Send nominations to magda@unicode.org with “Bulldog Award nomination” in the subject line by August 22, 2010.

Tuesday, July 20, 2010

Unicode Locale Identifier Stability

The Unicode Locale stability policies have been extended to cover
Unicode locale identifiers more clearly. See
http://unicode.org/policies/locales_stability.html

Tuesday, July 13, 2010

Unicode 5.0 now in Chinese

Pearson Education Asia Ltd. and Tsinghua University Press just published a translation of the Unicode Standard 5.0 in Simplified Chinese, available at Amazon.cn: http://www.amazon.cn/gp/product/B00328IJ46?ver=gp

Monday, July 12, 2010

New ways to follow Unicode

Follow us now on Facebook and Twitter, or read our blog. New icons below the menu on our home page give convenient links to these sites.
http://www.unicode.org/
What's new?
1. The Unicode Blog: http://unicode-inc.blogspot.com/
Our blog automatically receives the same announcements sent to our official e-mail list making it another convenient way to follow what's new. The blog is also available as an RSS or Atom newsfeed, so you can get the blog posts directly in your reader or e-mail client.
2. Twitter: http://twitter.com/unicode/
We are tweeting announcements as well as other interesting information for the Unicode community.
3. Facebook: http://www.facebook.com/pages/Friends-of-Unicode/127785250588285
If you like Unicode and use Facebook, please consider becoming a Friend of Unicode.

Friday, July 2, 2010

Unicode welcomes Government of Bangladesh

The Unicode Consortium is pleased to welcome the Government of
Bangladesh as a new institutional member. Their website is at:
http://www.mosict.gov.bd/

Thursday, June 24, 2010

New Public Review Issue #173: Invariant Tests

The Unicode Technical Committee has posted a new issue for public
review and comment. Details are on the following web page:

http://www.unicode.org/review/#pri173

Review periods for the new items close on August 2, 2010.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:

Issue #173 Invariant Tests
This PRI proposes to add to the UCD a new machine-readable file that is
used to test invariants for each release of Unicode. The data documents
what is tested prior to the release of a version of the UCD. UAX #44
would be augmented with a short section documenting the structure and
usage. Details are in the PRI itself and the associated background
documents.

If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Wednesday, June 9, 2010

New Public Review Issues (UTS #46 data, and two others)

The Unicode Technical Committee has posted three new issues for public review and comment. Details are on the following web page: http://www.unicode.org/review/ Review periods for the new items close on August 2, 2010. Please see the page for links to discussion and relevant documents. Briefly, the new issues are:

169 Glyph Variation of Double Oblique Hyphen
171 Proposal to change properties of U+06DE ARABIC START OF RUB EL HIZB
172 Proposed Update UTS #46: Unicode IDNA Compatibility Processing

The data for UTS #46 is being updated to synchronize with Unicode 6.0, and the UTC would like to get feedback on the data tables. Please see the text of the PRI for details.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page: http://www.unicode.org/reporting.html If you wish to discuss issues on the Unicode mail list, then please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration. http://www.unicode.org/consortium/distlist.html

Wednesday, June 2, 2010

Unicode 6.0 Beta, including new support for mobile phones

Mountain View, CA, USA – June 2, 2010 – The Unicode® Consortium today announced the availability of the Unicode 6.0 beta. A smooth transition to each new version of the Unicode Standard is vital, because it is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, and smartphones; modern web protocols (HTML, XML,...); and internationalized domain names.

Software developers and other experts are strongly encouraged to review the beta data files and documentation for Unicode 6.0 carefully, and to provide any feedback regarding errors or other issues to the Unicode Consortium. Software developers can also get an early start in testing their programs with the beta data files so that they will be ready for the release of Unicode 6.0 at the end of September.

A long-awaited new feature of Unicode 6.0 is the support of new characters for mobile phones. The emoji (pictographic) characters are in very widespread use, especially in Japan. They have distinct semantics, and are often substituted for related words. For the first time, there is a standard encoding for these characters that allows lossless interchange between different vendors. Unicode 6.0 also adds 222 new CJK unified ideographs in common use in China and Japan, and a number of other symbols and letters used by other languages.

See http://unicode.org/versions/Unicode6.0.0/ for the current draft summary.
See http://unicode.org/versions/beta.html for more information about the beta.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop, extend and promote use of the Unicode Standard and related globalization standards. The membership of the consortium represents a broad spectrum of corporations and organizations in the computer and information processing industry. Members are: Adobe Systems, Apple, DENIC eG, Google, Government of India, Government of West Bengal, IBM, Microsoft, Monotype Imaging, Oracle, SAP, Sybase, The University of California (Berkeley), The University of California (Santa Cruz), Yahoo!, plus well over a hundred Associate, Liaison, and Individual members.

For more information, please contact the Unicode Consortium.

Thursday, April 29, 2010

[Unicode Announcement] Unicode Releases Common Locale Data Repository, Version 1.8.1

The Unicode CLDR 1.8.1 maintenance release is now available. See
http://cldr.unicode.org/index/downloads/cldr-1-8-1 for details.

The next major release is CLDR 1.9, scheduled for the end of October.
Two milestone releases are planned for 1.9 as well. The 1.9 release is
focused on tooling and structural changes, while the CLDR 2.0 release
will involve general data submission. For the tentative schedule, see
http://cldr.unicode.org.

Tuesday, April 27, 2010

[Unicode Announcement] Call for Participation: IUC 34, Oct 18-20, 2010

Mountain View, CA, USA – April 26, 2010 – The Unicode® Consortium today
announced a call for participation in The Thirty-fourth
Internationalization & Unicode® Conference (IUC 34), taking place in
Santa Clara, Calif., USA; October 18-20, 2010. The conference is
produced by OMG™.

The Internationalization & Unicode Conference is the premier annual
technical conference for topics on the design and global deployment of
multilingual applications and web sites. Internationalization and
Unicode experts, implementers, clients, teachers, students, and vendors
are invited to attend this unique conference. The interactive format
makes the Internationalization & Unicode Conference a great place to
meet and exchange ideas with leading experts, find out about the needs
of potential clients, and get information about Unicode-enabled products.

To be considered as a presenter for the conference, please submit a
brief abstract before Wednesday, May 26. Topics should be related to
internationalization and localization; presentations structured as
tutorials are also welcome. Suitable topics include, but are not limited
to:

Best Practices and New Approaches

• New technologies, algorithms and methodologies
• Using internationalization libraries and programming environments
• Handling bidirectional or other complex scripts
• Data formats and evolving standards, e.g. XML, JSON, HTML5, DITA
• Project management for global development teams
• Localization technologies, Crowd Sourcing, Machine Translation, et al
• Development, test, and deployment techniques and experiences
• Improving globalization capabilities within organizations
• Migrating legacy applications to global markets
• Unicode, Emoji, and character encodings

Application Areas

• Social networks
• Search engines, SEO, discovery and navigation best practices
• Websites, Cloud Computing, SAAS, and Web services
• Libraries and education
• Mobile applications, including iPhone, Android, iPad, Kindle, etc.
• Publishing and broadcasting for a global audience
• Internationalized Domain Names and other identifiers
• Security concerns and practices

Language and Locale Support

• African, Asian, Middle Eastern, and support for other languages
• Unicode Common Locale Data Repository (CLDR)
• Font development

Details of the call for participation are available at:
http://www.unicodeconference.org/iuc34call

Interested individuals or organizations are invited to submit a brief
(up to 600 word) abstract of their proposed conference presentation by
Wednesday, May 26 using this web form:
http://www.unicodeconference.org/abstracts

The Program Committee will notify authors by Wednesday, June 9. Final
presentation materials will be required from selected presenters by
Tuesday, August 31. The conference agenda will be available by Tuesday,
June 15 at: http://www.unicodeconference.org/

Sponsorships and exhibit space are available; for more information on
sponsoring contact Ken Berk at kenberk@omg.org, +1-781-444 0404. For
exhibiting questions email event_marketing@omg.org . For all other
questions email: info@unicodeconference.org

###

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop,
extend and promote use of the Unicode Standard and related globalization
standards.

The membership of the consortium represents a broad spectrum of
corporations and organizations in the computer and information
processing industry. Members are: Adobe Systems, Apple, DENIC eG,
Google, Government of India, Government of Tamil Nadu, IBM, Microsoft,
Monotype Imaging, Oracle, The Society for Natural Language Technology
Research, SAP, Sybase, The University of California (Berkeley), The
University of California (Santa Cruz), Yahoo!, plus well over a hundred
Associate, Liaison, and Individual members.

For more information, please contact the Unicode Consortium at
http://www.unicode.org/contacts.html, or visit http://www.unicode.org.

About the Event Producer

OMG™ is the Event Producer for the Internationalization & Unicode
Conferences. OMG is an open membership, not-for-profit consortium that
produces and maintains computer industry specifications for
interoperable enterprise applications. Our specifications include MDA®,
UML®, CORBA®, MOF™, XMI® and CWM™. OMG's specifications are all
available for download by everyone without charge.

For more information about OMG, visit us online at http://www.omg.org.

Note to editors: Unicode Standard, Unicode and the Unicode Logo are
trademarks of Unicode, Inc. Unicode Consortium is a registered trademark
of Unicode, Inc. OMG and Object Management Group are trademarks of
Object Management Group. All other trademarks are the property of their
respective owners.

Thursday, April 15, 2010

[Unicode Announcement] New Public Review Issue: Two New Provisional Properties for Characters in Indic Scripts

The Unicode Technical Committee has posted a new issue for public review
and comment. Details are on the following web page:

http://www.unicode.org/review/

Review periods for the new items close on May 3, 2010.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:

Issue #168: Two New Provisional Properties for Characters in Indic Scripts
http://www.unicode.org/review/#pri168

The UTC is considering the addition of two new, enumerated provisional
character properties for Indic scripts: Indic_Syllabic_Category and
Matra_Placement. These are to assist in the analysis and processing of
syllables for various Brahmi-derived scripts, providing classificatory
information that is not easy to extract or derive for all of the Indic
scripts in the standard. Feedback is welcome on the construction of the
proposed properties, the details of the proposed assignment of values
for characters, and on the question of the usefulness of defining such
properties.

If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Saturday, April 10, 2010

[Unicode Announcement] Tracking proposed updates to Unicode technical publications

To make it easier to find and track proposed updates to Unicode
technical publications, the editorial committee has made several
improvements:

* Proposed updates can be found by a predictable, stable URL
* This URL is always accessible from the latest approved version of
the document via a header field titled "Latest Proposed Update".

For example, if you look at the approved 5.2.0 version of UAX
#15, Unicode Normalization Forms
(http://www.unicode.org/reports/tr15/), you'll find at the top
of the document under "Latest Proposed Update" a link to
http://www.unicode.org/reports/tr15/proposed.html. That URL
points to the latest proposed update for UAX #15.

* If there is currently no proposed update for a document, the URL
will point to a stub document indicating that there is no current
proposed update.

In addition, a predictable, stable URL is used for the modifications
section within each proposed update. That section summarizes the changes
that have been made from the previous version. These URLs follow the
format http://www.unicode.org/reports/tr15/proposed.html#Modifications .
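The URL scheme described above can be sketched in a few lines of Python.
This is only an illustration: the helper names are hypothetical, and it
assumes that other reports follow the same pattern as the UAX #15 example,
with only the report number changing.

```python
# Base location of Unicode technical reports, taken from the example above.
REPORTS_BASE = "http://www.unicode.org/reports"


def proposed_update_url(report_number: int) -> str:
    """Build the stable URL of the latest proposed update for a report.

    Assumption: every report uses the same path layout as UAX #15.
    """
    return f"{REPORTS_BASE}/tr{report_number}/proposed.html"


def modifications_url(report_number: int) -> str:
    """Build the stable URL of the modifications section of a proposed update."""
    return proposed_update_url(report_number) + "#Modifications"


if __name__ == "__main__":
    # UAX #15, Unicode Normalization Forms
    print(proposed_update_url(15))
    print(modifications_url(15))
```

If a report has no current proposed update, the first URL is expected to
lead to the stub document mentioned above rather than to an error.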

Thursday, April 8, 2010

[Unicode Announcement] W3C India Conference

Here is an announcement for a conference in India which should be of interest to Unicode members.

---------------

The W3C India Office is organizing an International Conference "World Wide Web: Technology, Standards and Internationalization - 2010" in New Delhi on May 6-7, 2010.

The focus of the conference is to promote and proliferate W3C Standards in India to enable seamless Web access in Indian languages. One of the major aspects to be covered in the conference is Internationalization, especially in light of the complexity of implementing Indian Languages.

Core Technology Tracks in the Conference include:

1. W3C and Web Technologies
2. Internationalization Aspects in W3C
3. Web Access through mobile and hand-held devices
4. CSS and Styling issues
5. Web Architecture and Semantic web
6. Human Machine Interface for the Web
7. Web Content Accessibility in Indian Languages
8. W3C and E-Governance

Kindly visit the W3C India Website http://www.w3cindia.in and the Conference Website http://www.w3cindia.in/conf-site/conference-index.htm for more details. The Conference will also attempt to evolve a Roadmap for proliferation and specific requirements for Indian Languages in W3C and associated standards. We look forward to your active cooperation and participation in the Conference.

Tuesday, March 30, 2010

[Unicode Announcement] Public Review Issue: IVD Submission, PRI #167

The Unicode Consortium has posted a new issue for public review and
comment. Details are on the following web page:

http://www.unicode.org/review/

Review periods for the new items close on June 26, 2010.

Please see the page for links to discussion and relevant documents.
Briefly, the new issue is:

PRI #167 Ideographic Variation Database Submission

The Ideographic Variation Database provides a registry for collections
of unique variation sequences containing unified ideographs, allowing
for standardized interchange according to UTS #37, Ideographic Variation
Database. A submission to the Ideographic Variation Database has been
received for: "Combined registration of the Hanyo-Densi collection and
of sequences in that collection". Details are in the background document.

http://www.unicode.org/ivd/pri/pri167/index.html
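
For readers unfamiliar with the shape of a variation sequence, here is a
minimal Python sketch. It assumes the form used for ideographic variation
sequences: a base ideograph followed by one variation selector in the range
U+E0100..U+E01EF. The function name is hypothetical, the name-based test for
a unified ideograph is only an approximation, and the sketch does not check
whether a sequence is actually registered in any collection.

```python
import unicodedata

# Variation selectors assumed to be available for ideographic variation
# sequences: VARIATION SELECTOR-17 through VARIATION SELECTOR-256.
VS_FIRST = 0xE0100
VS_LAST = 0xE01EF


def looks_like_ivs(text: str) -> bool:
    """Return True if text has the shape <unified ideograph, variation selector>.

    This checks shape only; it does not consult the Ideographic Variation
    Database, so it cannot tell whether the sequence is registered.
    """
    if len(text) != 2:
        return False
    base, selector = text
    # Approximation: treat a character as a unified ideograph if its
    # character name starts with "CJK UNIFIED IDEOGRAPH".
    name = unicodedata.name(base, "")
    is_unified = name.startswith("CJK UNIFIED IDEOGRAPH")
    return is_unified and VS_FIRST <= ord(selector) <= VS_LAST


if __name__ == "__main__":
    # The code points below illustrate the shape only.
    print(looks_like_ivs("\u8fbb\U000E0100"))  # ideograph + selector
    print(looks_like_ivs("\u8fbb"))            # ideograph alone
    print(looks_like_ivs("A\U000E0100"))       # non-ideograph base
```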

Reviewers are encouraged to comment on any aspect of the submissions,
but more particularly on:

* whether the intent of a proposed collection is appropriately described
* whether the glyphic subset corresponding to a proposed sequence is
indeed a glyphic subset of the base character for the sequence
* whether the proposed sequences are congruent with the scope of their
collection, or whether a new collection may be more appropriate

If you have comments for official consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please use
the following link to subscribe (if necessary). Please be aware that
discussion comments on the Unicode mail list are not automatically
recorded as input to the IVD registrar. You must use the reporting link
above to generate comments for official consideration.

http://www.unicode.org/consortium/distlist.html

Wednesday, March 17, 2010

[Unicode Announcement] The Unicode Consortium Releases CLDR, Version 1.8

Mountain View, CA, March 17, 2010 - The Unicode Consortium announced
today the release of the new version of the Unicode Common Locale Data
Repository (Unicode CLDR 1.8), providing key building blocks for
software to support the world's languages.

CLDR 1.8 contains data for 186 languages and 159 territories: 501
locales in all. Version 1.8 of the repository contains over 22% more
locale data than the previous release, with over 42,000 new or modified
data items from over 300 different contributors.

For this release, the Unicode Consortium partnered with ANLoc, the
African Network for Localization, a project sponsored by Canada's
International Development Research Centre (IDRC), to help extend modern
computing on the African continent. ANLoc's vision is to empower
Africans to participate in the digital age by enabling their languages
in computers. A sub-project of ANLoc, called Afrigen, focuses on
creating African locales.

The Afrigen-ANLoc project's mission is to create viable locale data for
at least 100 of the over 2000 languages spoken in Africa, and
incorporate the data into Unicode's CLDR project and OpenOffice.org.
Implementation of fundamental locale data within CLDR is a critical step
for providing computer applications that can be localized into these
African languages, thus reaching populations that have never before been
able to use their native languages on computers and mobile phones.

The Afrigen-ANLoc project selected approximately 200 candidate
languages, including all official languages recognized by a national
government and all languages with at least 500,000 native speakers.
Additional languages were incorporated when volunteers stepped forward.
Data was collected through the Afrigen-ANLoc project by native-speaking
volunteers around the world, entered via a web-based utility designed
specifically for this purpose, and then merged into the CLDR repository.
In all, over 150 volunteers gathered locale data for 72 African
languages, with data for 54 of those incorporated into the CLDR 1.8
release. Of these, 41 languages are completely new to the Unicode CLDR
project, while 13 others existed in earlier versions of CLDR and were
enhanced with additional data. These languages are spoken in 26
countries across the entire African continent.

"The partnership with Afrigen has been a huge benefit for us," says John
Emmons, vice-chair of the Unicode CLDR technical committee and lead CLDR
engineer for IBM. "The Afrigen effort has allowed us to bring many new
languages on board that we wouldn't be able to do through our normal
process, while still maintaining the level of quality and consistency
that we require for every language."

For more information about Unicode CLDR 1.8, see
http://cldr.unicode.org/index/downloads/cldr-1-8

The Afrigen-ANLoc data collection tool was developed by Louise
Berthilson of IT46 (http://www.it46.se), and the project is managed by
Martin Benjamin, director of Kamusi Project International
(http://kamusi.org). For more information about the African Network for
Localization, see http://www.africanlocalisation.net. For more
information about the Afrigen-ANLoc project, see
http://www.it46.se/afrigen. For more information about IDRC, see
http://www.idrc.ca.

About the Unicode Consortium

The Unicode Consortium is a non-profit organization founded to develop,
extend and promote use of the Unicode Standard and related globalization
standards.

For more information, please contact the Unicode Consortium.
http://www.unicode.org/contacts.html

Friday, February 26, 2010

[Unicode Announcement] New Public Review Issues for Unicode 6.0 UAXes

The Unicode Consortium is revising the Unicode Standard Annexes for
eventual release of Unicode 6.0. A standard part of our development
process is to open all of the annexes for public review. All the annexes
are currently available on the PRI page. We encourage all interested
parties to participate in this review. To look at the annexes and make
suggestions for additions and improvements to their content, please see
the Public Review Issues page:

http://www.unicode.org/review/

Comments should be submitted by May 3, 2010.

If you have comments for official UTC consideration, please post them by
submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, then please
use the following link to subscribe (if necessary). Please be aware
that discussion comments on the Unicode mail list are not automatically
recorded as input to the UTC. You must use the reporting link above
to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Tuesday, February 16, 2010

[Unicode Announcement] Print-on-Demand Survey for Unicode 6.0

The Unicode Standard 5.2 is now online only and can be printed freely.
For the next edition of the standard, Unicode 6.0, the editorial
committee would like to find out whether people are interested in the
ability to obtain bound copies of the standard in the form of
individually available volumes covering different selections of the
text and code charts. (Unlike in the past, Unicode 6.0 will not be
published as a single volume by an outside publisher.)

If there is sufficient interest, the text and code charts could be
delivered using Print-on-Demand technology. To prepare for that
option, the editorial committee would need to do some restructuring.
In either case, users would be able to print any part of the online
files free of charge; the print-on-demand option would simply offer a
convenient way to purchase bound copies of some or all sections of the
standard.

We have created a very brief survey that gives some options and
estimates costs for a print-on-demand version of Unicode 6.0. We would
appreciate your feedback. There is also space on the form for
comments. Feel free to forward this link to others who are not on the
Unicode mailing lists, but who might be interested. The survey will
close on February 24.

The survey can be found here: http://www.unicode.org/pod-survey

The Unicode Editorial Committee

Tuesday, January 5, 2010

[Unicode Announcement] New lower individual member rates

Individual and student members are important contributors to the work of
the Unicode Consortium. From the encoding of rare and minority scripts
to collecting and vetting locale data, individuals help develop the next
generation of Unicode standards and data. To promote participation, the
Unicode Consortium has lowered the price for individual memberships to
$75 per year and student memberships to $35 per year. The new rates are
effective January 1, 2010 and existing memberships will be extended in
accordance with the new lower prices.

The Unicode Consortium has also moved to online publication of the
standard as of Version 5.2, and the entire text of the standard is
freely available for online reading and printing. Copies of the book are
no longer printed and are therefore no longer included as a membership
benefit.

For more information on the benefits of individual membership and how to
join, see: http://www.unicode.org/consortium/join.html
