Tuesday, October 29, 2024

Script Encoding and Cultural Identity: Navigating Digital Exclusion

By Maroua Bezzaoui, SILICON Intern

During the summer of 2024, Unicode’s internship program included interns from Stanford University, Northeastern University, and Google’s Summer of Code. Several of the interns have shared their experiences. The second featured piece is from Maroua Bezzaoui at Stanford University.

Friday, October 25, 2024

ICU 76 Released

Unicode® ICU 76 has just been released. ICU is the premier library for software internationalization, used by a wide array of companies and organizations to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

ICU 76 updates to Unicode 16 (blog), including new characters and scripts, emoji, collation & IDNA changes, and corresponding APIs and implementations. It also updates to CLDR 46 (beta blog) locale data with new locales, significant updates to existing locales, and various additions and corrections. For example, the CLDR and Unicode default sort orders are now very nearly the same.

Most of the java.time (Temporal) types can now be formatted directly using the existing ICU4J date/time formatting classes.

There are some new APIs to make ICU easier to use with modern C++ and Java patterns. Most of the C/C++ APIs added for this purpose are implemented as C++ header-only APIs, and usable on top of binary stable C APIs, which is a first for ICU.

The Java and C++ technology preview implementations of the (also in tech preview) CLDR MessageFormat 2.0 specification have been updated to match recent changes.

ICU 76 and CLDR 46 are major releases, including a new version of Unicode and major locale data improvements.

For details, please see
https://unicode-org.github.io/icu/download/76.html.


Adopt a Character and Support Unicode’s Mission

Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀

Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.

Each adoption includes a digital badge and certificate that you can proudly display!

Have fun and support a good cause

You can also donate funds or gift stock


As Unicode, Inc. is a US-based open source, open standards, non-profit, 501(c)3 organization, your contribution may be eligible for a tax deduction. Please consult with a tax advisor for details.

Unicode CLDR 46 available

Unicode CLDR 46 is now available and has been integrated into version 76 of ICU.

The most significant data changes in this release were: 
  • Updated to Unicode 16.0 (including major changes to collation)
  • Substantial additions and modifications of Emoji search keyword data
  • ‘Upleveling’ the locale coverage (see below)
The most significant changes in the specification were:
  • Updates to Message Format in tech preview
  • Updates to conformance
  • New tech preview section on semantic skeletons
CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)

Via the Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems. 

In version 46, the following levels were reached:

New / Upleveled Locales

±    New Level   Locales
📈   Modern      Nigerian Pidgin, Tigrinya
📈   Moderate    Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
📈   Basic       Ewe, Ga, Kinyarwanda, Konkani (Latin), Northern Sotho, Oromo, Sichuan Yi, Southern Sotho, Tswana
📉   Basic*      Chuvash, Anii


We are currently planning for CLDR 47 to be a closed release with no data submission period. The focus will be on improving the Survey Tool used for data submission, making necessary infrastructure changes, and some high priority data quality fixes.

For more information

See the CLDR 46 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.

Tuesday, October 22, 2024

Time and Trust

By Samuel Minev-Benzecry, SILICON Intern


During the summer of 2024, Unicode’s internship program included interns from Stanford University, Northeastern University, and Google’s Summer of Code. Several of the interns have shared their experiences. The first featured piece is from Samuel Minev-Benzecry at Stanford University.

Friday, September 27, 2024

Unicode Technology Workshop on October 22-23 – Program Updates!

By the UTW 2024 Program Committee


Join us for two days of community building around the Unicode technology that makes software work for billions of people. With a deeper emphasis on case studies, unconference, and workshop-style sessions, this event will enable participants to collaborate and learn from each other to tackle the latest challenges. Register Now for this in-person-only event hosted at Google in Sunnyvale, CA. The full program, including session details and bios, is available here:

UTW 2024 Event Website!

⭐ Highlights:

  • Build connections within the internationalization community
  • Learn best practices from peers and case studies
  • Network with the developers and users to help shape the future of Unicode technology
  • Deepen knowledge of how to solve tough problems in the i18n and l10n space and how to engineer products that work better for global users

📢 Confirmed Sessions!

  • “A User-Centric Approach to a Bidi Text Interface” with Adil Allawi
  • “Common Locale Data Repository - Using the Survey Tool to Expand Language Coverage” with Conrad Nied
  • "Talking Emoji" with Jennifer Daniel
  • “Design Deep-Dive” with Mark Davis
  • “How Would You Like Your Text Today?” with John Hudson
  • "Indic Script Policy & Planning in the Digital Age" with Karthik Malli
  • “Language and Direction Metadata on the Web” with Addison Phillips
  • “MessageFormat 2 Technical Preview: Where Are We Now?” with Addison Phillips
  • “Tracking Language Digitization in the UNESCO World Atlas” with Jeannette Stewart and Tex Texin
  • “Why Does Unicode Do That?” with Mark Davis
  • "Volunteers for Keyboards for Indigenous Language Communities" with Tex Texin
  • "Optimizing Glyphs for Real-Time Vector Rendering" with Eric Lengyel
  • "How To Not Run Towards The Bear: Directionality & Emoji" with Kamilé Demir and Ben Joeng (Yang)
  • "What is a Valid Person Name?" with Michael McKenna
  • “Case Study - Solving Inflection” with Nebojša Ćirić (Chair of the Unicode ICU-Language Inflection Working Group) and George Rhoten
  • “Bridging Languages in ICU4X: How Diplomat Brings i18n to the Web and Beyond” with Tyler Knowlton
  • “We Need a New Message Resource Format” with Eemeli Aro
  • "New in CLDR/ICU" with Mark Davis
  • "Could You Give Me an Example? Simplifying the CLDR Survey Tool" with Helena Aytenfisu and Emiyare Ikwut-Ukwa
  • “ICU4X 2.0: Next Level i18n” with Shane F Carr (Chair of the Unicode ICU4X Technical Committee)
  • "From Oral to Digital in One Generation - An Exploration of Amazonian Languages and Their Path to Digital Inclusion" with Samuel Minev-Benzecry
  • “Encoding Expectations: How Long Does It Really Take?” with Anushah Hossain and Ahad Bashir
  • "Date, Time, and Timezone for Netflix Live Events" with Shawn Xu and Chester Fung
  • "Behind the Curtains: Unicode Technical Groups" with Mark Davis (Unicode Co-founder and CTO)
  • “Ask Unicode Anything” with Toral Cowieson, Mark Davis, Cathy Wissink
Please note that sessions are continually being added for the two tracks.

Expect workshops, seminars, free-form discussions, and lightning talks on:

  • i18n libraries
  • locale data frameworks
  • globalization tooling
  • input methods
  • text rendering
  • localization pipelines

❓Who should attend?:

  • Whether you’re an experienced GILT professional, an internationalization or Unicode enthusiast, just starting out, or a student, the UTW 2024 sessions will enrich your understanding of key issues!

❗Space is limited so be sure to secure your spot today!

  • Discounts are available for Unicode members and students. Registration fees include continental breakfast, lunch, refreshments, and Mix & Mingle at the end of the first day.

Register Now!


Thursday, September 26, 2024

Unicode CLDR 46 Beta available for specification review

 

The Unicode CLDR 46 Beta is now available for specification review and integration testing. The release is planned for October 24, but any feedback on the specification needs to be submitted well in advance of that date. The beta specification is available at Draft LDML Modifications. The biggest changes in the specification are the updates to Message Format in tech preview, the updates to conformance, and the new tech preview section on semantic skeletons. See also the Migration section of the new release page.


CLDR provides key building blocks for software to support the world's languages (dates, times, numbers, sort-order, etc.). For example, all major browsers and all modern mobile phones use CLDR for language support. (See Who uses CLDR?)


Via the Survey Tool, contributors supply data for their languages — data that is widely used to support much of the world’s software. This data is also a factor in determining which languages are supported on mobile phones and computer operating systems.


The beta has already been integrated into the development version of ICU 76. We would especially appreciate feedback from non-ICU consumers of CLDR data and on Migration issues. 


Feedback can be filed at CLDR Tickets.


The most significant changes to the data in this release are:
  • Updates to Unicode 16.0 (including major changes to collation)
  • Substantial additions and modifications of Emoji search keyword data
  • ‘Upleveling’ the locale coverage (see below)


For the details, see the draft CLDR 46 release page, which has information on accessing the data, reviewing charts of the changes, and, importantly, Migration issues.

New / Upleveled Locales

±    New Level   Locales
📈   Modern      Nigerian Pidgin, Tigrinya
📈   Moderate    Akan, Baluchi (Latin), Kangri, Tajik, Tatar, Wolof
📈   Basic       Ewe, Ga, Kinyarwanda, Konkani (Latin), Northern Sotho, Oromo, Sichuan Yi, Southern Sotho, Tswana
📉   Basic*      Chuvash, Anii

For more information


See the draft CLDR 46 release page, which has information on accessing the data, reviewing charts of the changes, and — importantly — Migration issues.



Monday, September 16, 2024

Bloomberg Joins as Supporting Member of the Unicode Consortium

The Unicode Consortium is pleased to announce that Bloomberg has joined as a Supporting Member.

Bloomberg is a global leader in business and financial information, delivering trusted data, news, and insights that bring transparency, efficiency, and fairness to markets. The company helps connect influential communities across the global financial ecosystem via reliable technology solutions, such as the Bloomberg Terminal, that enable financial professionals to make more informed decisions and foster better collaboration.

For more than four decades, the Bloomberg Terminal has revolutionized the financial services industry by bringing transparency and innovation to the capital markets. Trusted by the world’s most influential decision-makers, the Terminal provides real-time access to news, data, insights and trading tools that help our customers turn knowledge into action.

“Bloomberg is excited to join the Unicode Consortium and to help advance the state of internationalized software for global financial applications,” said Matt O’Conor, leader of Bloomberg’s Internationalization Infrastructure Engineering team. “Unicode and its associated technologies are instrumental to the continued success of modern computing. The Bloomberg Terminal is built on Unicode, in order to support our users who speak a variety of languages. We are honored to take part in this worldwide community and to share our own insights and expertise in the global markets with others, as well as to learn from the greater Unicode community as we continue providing first-class products and services to our clients in their respective locales.”

“We are pleased to welcome Bloomberg as a Unicode Consortium member, recognizing the company’s pivotal role in global financial communications and data exchange,” said Toral Cowieson, CEO of Unicode. “As the first financial technology firm to join Unicode, Bloomberg’s expertise and commitment to Unicode’s work will greatly enhance our collective efforts to ensure internationalization standards meet the evolving needs of industries worldwide.”

Supporting Members of the Consortium have representation on up to two technical committees, and a half vote in each one.

Information on Full, Supporting, and Associate memberships and benefits can be found on Unicode’s website along with the list of current members.


Tuesday, September 10, 2024

Announcing The Unicode® Standard, Version 16.0

Version 16.0 of the Unicode Standard is now available. This is a major version update that includes new characters and code charts, new data files and annexes, an updated core specification, and updated annexes and synchronized standards.

This version adds 5,185 new characters, including 3,995 additional Egyptian Hieroglyph characters plus seven new scripts, seven new emoji characters, and over 700 symbols from legacy computing environments, for a total of 154,998 characters. See the delta code charts for details on all the new scripts and characters. For additional details regarding new emoji, see Emoji Recently Added, v16.0.
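The counts quoted above can be cross-checked with a little arithmetic. The following is a minimal sketch that uses only the figures stated in this announcement; the variable names are illustrative, and the "remaining" figure is simply what is left after subtracting the two exact sub-counts (it would cover the seven new scripts, the 700-plus legacy computing symbols, and any other additions, which the post does not break down further).

```python
# Figures quoted in the announcement above.
new_characters = 5_185        # characters added in Version 16.0
total_after_16_0 = 154_998    # total characters as of Version 16.0
egyptian_hieroglyphs = 3_995  # additional Egyptian Hieroglyph characters
new_emoji = 7                 # new emoji characters

# Total implied for the repertoire before Version 16.0.
total_before_16_0 = total_after_16_0 - new_characters

# Additions not accounted for by the two exact sub-counts above.
remaining_additions = new_characters - egyptian_hieroglyphs - new_emoji

print(f"Characters before 16.0: {total_before_16_0:,}")
print(f"Additions other than hieroglyphs and emoji: {remaining_additions:,}")
```

By these figures, the Egyptian Hieroglyph additions alone account for roughly three quarters of the new characters.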

In addition to new characters, new “Moji Jōhō Kiban” (文字情報基盤) Japanese source references have been added for over 36,000 CJK unified ideographs. This is reflected in the code charts for virtually all CJK unified ideograph blocks by additional representative glyphs in the “J” column.

The core specification for Version 16.0 is now available for browsing online as per-chapter web pages with “breadcrumb” and other links for easy navigation.

Two new annexes have been added to this version:
  • UAX #53, Unicode Arabic Mark Rendering: This annex, which was previously published as a Technical Report, specifies an algorithm for handling combining marks when rendering to ensure correct and consistent display of Arabic script text.
  • UAX #57, Unicode Egyptian Hieroglyph Database (Unikemet): This annex documents the format of the Unikemet.txt data file, which provides information clarifying the identity of Egyptian Hieroglyph characters and properties useful for implementations.
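As background for UAX #53, it can help to see how Arabic combining marks are ordered in normalized text. The sketch below is not an implementation of the UAX #53 algorithm; it only uses Python's standard unicodedata module to show that canonical reordering sorts marks by their combining class, so a shadda stored before a fatha ends up after it in NFD. The choice of letter and marks is an illustrative assumption, and the rules a renderer should apply on top of this are those defined in the annex itself.

```python
import unicodedata

BEH = "\u0628"     # ARABIC LETTER BEH (base letter)
SHADDA = "\u0651"  # ARABIC SHADDA, canonical combining class 33
FATHA = "\u064E"   # ARABIC FATHA, canonical combining class 30

# A sequence stored as base letter, then shadda, then fatha.
typed = BEH + SHADDA + FATHA

# Canonical decomposition reorders adjacent combining marks by ascending
# combining class, so the fatha (30) moves ahead of the shadda (33).
normalized = unicodedata.normalize("NFD", typed)

for ch in normalized:
    ccc = unicodedata.combining(ch)
    print(f"U+{ord(ch):04X} ccc={ccc:>2} {unicodedata.name(ch)}")
```

Because normalization can change the stored order of marks in this way, a renderer cannot rely on the stored order alone when stacking multiple marks, which is the kind of inconsistency a dedicated mark-rendering algorithm is meant to address.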
For complete details on Unicode Version 16.0, see https://www.unicode.org/versions/Unicode16.0.0/.

