Tuesday, November 19, 2013

Unicode Regular Expressions Updated

Regular expressions are used throughout much of the world's software for matching and manipulating text. UTS #18: Unicode Regular Expressions provides the foundation for the handling of Unicode text in those expressions.

Version 17 of this standard adds the Bidi_Paired_Bracket and Bidi_Paired_Bracket_Type properties, both new in Unicode 6.3, and it expands the guidelines and requirements for support of the Script_Extensions property.

Friday, November 15, 2013

Unicode Security Standard version 6.3 Released

Version 6.3 of UTS #39: Unicode Security Mechanisms has been released. Because the Unicode Standard contains such a large number of characters for the writing systems of the world, caution is necessary to avoid exposing programs and systems to possible security attacks. This document provides mechanisms for reducing the risk of problems, while the associated UTR #36: Unicode Security Considerations describes a variety of security considerations for Unicode and guidelines for dealing with them.

UTS #39 includes a new Restriction Level (Single Script) and a number of clarifications for confusable detection, restriction levels, and optional detection. It also contains a new section describing how the identifier data is generated. That identifier data has been expanded to include certain characters from UAX #31: Unicode Identifier and Pattern Syntax, a few extra characters allowed in IDNA2008 (Internationalized Domain Names for Applications, http://tools.ietf.org/html/rfc5890), and certain characters based on user feedback. The version numbering has also been changed to align with versions of the Unicode Standard.

The associated UTR #36 has smaller changes: a few important corrections, plus new sections discussing security issues with transitivity and idempotence. There are also a few related new FAQ entries at http://www.unicode.org/faq/security.html.

Wednesday, November 13, 2013

Version 6.3 of UTS #46, Unicode IDNA Compatibility Processing

Unicode Technical Standard #46 version 6.3 has been released, synchronized with Unicode 6.3. The data tables are identical to those of the previous version, with the exception of the five new Bidi_Control characters. The table derivation has been modified to forbid Bidi_Control characters, now and in the future: this is consistent with the intent of IDNA2003, and with the treatment of these characters in IDNA2008.
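
As a rough illustration of what forbidding Bidi_Control characters means in practice, the following Python sketch flags any such character in a domain label. It is not the UTS #46 processing algorithm, which works from the published mapping table. The names BIDI_CONTROL, find_bidi_controls, and is_label_allowed are hypothetical, and the code point list is an assumption based on the Bidi_Control property as of Unicode 6.3 (the five additions being U+061C and U+2066..U+2069).

```python
# Illustrative sketch only: the real UTS #46 check is table-driven.
# Assumed Bidi_Control set as of Unicode 6.3 (12 code points).
BIDI_CONTROL = frozenset(
    [0x061C, 0x200E, 0x200F]          # ALM, LRM, RLM
    + list(range(0x202A, 0x202F))     # LRE, RLE, PDF, LRO, RLO
    + list(range(0x2066, 0x206A))     # LRI, RLI, FSI, PDI
)


def find_bidi_controls(label):
    """Return (index, 'U+XXXX') pairs for each Bidi_Control character in label."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(label)
        if ord(ch) in BIDI_CONTROL
    ]


def is_label_allowed(label):
    """Hypothetical helper: True if the label has no Bidi_Control characters."""
    return not find_bidi_controls(label)
```

For example, is_label_allowed("example") is true, while a label containing U+2066 (LEFT-TO-RIGHT ISOLATE) is rejected and find_bidi_controls reports its position.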