Wednesday, September 30, 2015

UAX #29, Unicode Text Segmentation, update to improve Mongolian word segmentation

Unicode Standard Annex #29, Unicode Text Segmentation, will be updated for Unicode 9.0. A draft of the proposed update is available for general public review and comment.

The Word_Break classification of U+202F NARROW NO-BREAK SPACE (NNBSP) is revised to correct its text segmentation behavior in Mongolian usage. For further background on this issue and possible ways to address it, see PRI #308, Property Change for U+202F NARROW NO-BREAK SPACE (NNBSP).

In this revision, the formerly empty Prepend class of the Grapheme_Cluster_Break property is redefined to consist of all prefixed format control characters and a few other characters with certain Indic_Syllabic_Category property values.

The corresponding property value changes will be incorporated in the UCD data files for Unicode 9.0.