Unicode is the foundation for all modern software and communications around the world, including all modern operating systems, browsers, laptops, and smart phones – plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). Thus it is important to ensure a smooth transition to each new version of the Unicode Standard.
Unicode 9.0.0 comprises several additions and changes which require careful migration in implementations. These include asymmetric case mappings, numerous variation sequences, new fractional numeric values, and changes to property values, especially East_Asian_Width values. The line breaking and text segmentation algorithms handle character sequences that represent emoji as indivisible units via the addition of new property values and rules. Implementers need to modify code and check assumptions for all affected processes to support these additions and changes.
The new character repertoire includes 74 emoji symbols, 19 symbols used in Japanese TV broadcasting, and multiple additions to existing scripts. There are six new scripts, of which three are in modern use (Adlam, Osage, and Newa) and three are historic (Bhaiksuki, Marchen, and Tangut). Adlam and Osage have case pairs and require data updates for casing functions. Tangut is a large ideographic script whose addition incurred changes to the Unicode Collation Algorithm (used as the basis for sorting text in all languages).
Please review the documentation, adjust your code, test the data files, and report errors and other issues to the Unicode Consortium by May 2, 2016. Feedback instructions are on the beta page.
See http://unicode.org/versions/beta-9.0.0.html for more information about testing the 9.0.0 beta.
See http://unicode.org/versions/Unicode9.0.0/ for the current draft summary of Unicode 9.0.0.