Wednesday, October 21, 2009

[Unicode Announcement] Unicode Collation Algorithm Version 5.2 Released

Version 5.2 of the Unicode Collation Algorithm has been released.
See http://www.unicode.org/reports/tr10/.
This version resynchronizes the Unicode Collation Algorithm with all
of the updates for the Unicode Standard, Version 5.2. Please note
the following changes and issues for implementations:

* The text of UTS #10 has been updated. Among other changes, the
revised text makes it clear that the BASE values used to generate
implicit weights for Han characters do not apply to unassigned
code points, even those inside CJK blocks; such code points take
the BASE used for all other code points.
* There are small changes in Gujarati, Telugu, Malayalam
(including weighting for chillus), Tamil, and Sinhala. While
these changes move in the direction of expected behavior, good
results will come only from language-specific tailoring, such as
that provided by CLDR.
* There have been significant changes to the ordering of many
combining marks. Many combining marks that are not in customary
use in modern languages now have the same secondary weight and
are distinguished only at a fourth level, by code point order.
This can be seen in the Unicode Collation Charts
(http://unicode.org/charts/collation/). In 5.2, many characters
are shown with a white background, indicating that they sort
exactly the same as the preceding character unless a fourth
(code point) level is used.
* Implementations of UCA should take note that the increased
number of characters may cause overflows if the implementing
code makes assumptions or optimizations about table size or
weight ranges. Overflows can result either from the new character
additions (which increase the number of distinct weights in the
table) or from changes in the way weights, particularly secondary
weights, are assigned in the table. The latter may cause an
unexpectedly large number of characters to share the same weight.
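The implicit-weight rule in the first item above can be sketched as follows. This is a minimal illustration, not the normative text: it assumes the UTS #10 formulas AAAA = BASE + (CP >> 15) and BBBB = (CP & 0x7FFF) | 0x8000, with BASE FB40 for unified ideographs in the CJK Unified Ideographs and CJK Compatibility Ideographs blocks, FB80 for other unified ideographs, and FBC0 for every other code point. The range table and the function names (implicit_weights and its helpers) are hypothetical; the ranges are hand-copied for Unicode 5.2 and a real implementation should derive them from the Unified_Ideograph property in PropList.txt.

```python
# Sketch of UCA implicit weight generation for code points with no explicit
# table entry. ASSUMPTION: these ranges approximate Unified_Ideograph for
# Unicode 5.2; load them from PropList.txt in real code.
UNIFIED_IDEOGRAPH_RANGES = [
    (0x3400, 0x4DB5),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FCB),    # CJK Unified Ideographs (assigned part)
    (0xFA0E, 0xFA0F), (0xFA11, 0xFA11), (0xFA13, 0xFA14),
    (0xFA1F, 0xFA1F), (0xFA21, 0xFA21), (0xFA23, 0xFA24),
    (0xFA27, 0xFA29),    # unified ideographs inside the compatibility block
    (0x20000, 0x2A6D6),  # Extension B
    (0x2A700, 0x2B734),  # Extension C
]


def is_unified_ideograph(cp: int) -> bool:
    return any(lo <= cp <= hi for lo, hi in UNIFIED_IDEOGRAPH_RANGES)


def in_core_han_blocks(cp: int) -> bool:
    # CJK Unified Ideographs block or CJK Compatibility Ideographs block.
    return 0x4E00 <= cp <= 0x9FFF or 0xF900 <= cp <= 0xFAFF


def implicit_base(cp: int) -> int:
    if is_unified_ideograph(cp):
        return 0xFB40 if in_core_han_blocks(cp) else 0xFB80
    # Unassigned code points, even those inside CJK blocks, land here.
    return 0xFBC0


def implicit_weights(cp: int) -> list[tuple[int, int, int]]:
    """Return the pair of elements [.AAAA.0020.0002][.BBBB.0000.0000]."""
    aaaa = implicit_base(cp) + (cp >> 15)
    bbbb = (cp & 0x7FFF) | 0x8000
    return [(aaaa, 0x0020, 0x0002), (bbbb, 0x0000, 0x0000)]


print(f"{implicit_weights(0x4E00)[0][0]:04X}")  # FB40: assigned Han character
print(f"{implicit_weights(0x9FFF)[0][0]:04X}")  # FBC1: unassigned, in a CJK block
```

U+9FFF, unassigned in 5.2 but inside the CJK Unified Ideographs block, gets FBC1 under this sketch rather than the Han-based FB41 it would get if the Han BASE covered unassigned code points.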
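What "distinguished only at a fourth level, by code point order" means in the combining-mark item above can be shown with a small multi-level comparison. This is a sketch under stated assumptions: the weights in TABLE are invented placeholders, not values from the 5.2 table, and U+0350 and U+0351 are given identical weights purely to model two marks that share a secondary weight. The function sort_key is hypothetical and models the fourth level as plain code point order.

```python
# Illustrative multi-level comparison. The weights are INVENTED placeholders,
# not UCA 5.2 data; the two marks share weights only to model a tie.
TABLE = {
    0x0061: (0x15EF, 0x0020, 0x0002),  # LATIN SMALL LETTER A
    0x0350: (0x0000, 0x00A1, 0x0002),  # combining mark, hypothetical weights
    0x0351: (0x0000, 0x00A1, 0x0002),  # combining mark, same hypothetical weights
}


def sort_key(s: str, use_level4: bool = False) -> tuple:
    """Compare level 1, then 2, then 3, then (optionally) level 4."""
    elements = [TABLE[ord(ch)] for ch in s]
    # A zero weight is ignorable at its level, so it is dropped there.
    levels = [tuple(ce[i] for ce in elements if ce[i] != 0) for i in range(3)]
    if use_level4:
        # Fourth level modeled as the string's code points, in order.
        levels.append(tuple(ord(ch) for ch in s))
    return tuple(levels)


x, y = "a\u0350", "a\u0351"
print(sort_key(x) == sort_key(y))              # True: tie on levels 1 to 3
print(sort_key(x, True) < sort_key(y, True))   # True: code point breaks the tie
```

Comparing tuples level by level behaves like the UCA practice of concatenating levels with a low separator: a shorter level that is a prefix of a longer one sorts first.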
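The overflow warning in the last item typically affects code that packs weights into fixed-width fields. The sketch below assumes a hypothetical layout of a 16-bit primary, an 8-bit secondary and an 8-bit tertiary weight in one 32-bit integer (UTS #10 does not prescribe this layout), and it validates each weight against its field width instead of silently truncating. The names pack, check_table and WeightOverflow are illustrative only.

```python
# HYPOTHETICAL packed layout: 16-bit primary | 8-bit secondary | 8-bit tertiary.
FIELD_BITS = (16, 8, 8)


class WeightOverflow(ValueError):
    """Raised when a weight does not fit its assumed field width."""


def pack(element: tuple[int, int, int]) -> int:
    packed = 0
    for weight, bits in zip(element, FIELD_BITS):
        if not 0 <= weight < (1 << bits):
            raise WeightOverflow(f"weight {weight:#x} does not fit in {bits} bits")
        packed = (packed << bits) | weight
    return packed


def check_table(table: dict[int, tuple[int, int, int]]) -> list[int]:
    """Return the code points whose collation element breaks the layout."""
    bad = []
    for cp, element in table.items():
        try:
            pack(element)
        except WeightOverflow:
            bad.append(cp)
    return bad


# Placeholder weights: the second entry's secondary needs more than 8 bits.
sample = {0x0061: (0x15EF, 0x0020, 0x0002), 0x0300: (0x0000, 0x01A0, 0x0002)}
print([f"U+{cp:04X}" for cp in check_table(sample)])  # ['U+0300']
```

Running such a check over the whole table when loading new data turns a silent overflow into a visible failure. The same idea applies to other fixed-size structures, such as per-weight buckets sized on the assumption that only a few characters share a secondary weight.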

----
All of the Unicode Consortium lists are strictly opt-in lists for members
or interested users of our standards. We make every effort to remove
users who do not wish to receive e-mail from us. To see why you are getting
this mail and how to remove yourself from our lists if you want, please
see http://www.unicode.org/consortium/distlist.html#announcements