Tuesday, November 26, 2024

Feedback Requested on Proposed Draft UTS #58 Unicode Linkification

Feedback is requested on Proposed Draft UTS #58 Unicode Linkification, especially from technologists working on browsers and other programs that automatically apply links to URLs, such as email programs.

So what is Linkification?

With most email programs, when someone pastes in the plain text:
The page https://ja.wikipedia.org/wiki/アルベルト・アインシュタイン contains information about Albert Einstein.
and sends it to someone else, the recipient sees the URL rendered as a clickable link:
The page https://ja.wikipedia.org/wiki/アルベルト・アインシュタイン contains information about Albert Einstein.

URLs are also “linkified” in many other applications, such as when pasting into a word processor (triggered by typing a space afterward, for example).

Problem

However, many products (many text messaging apps, video messaging chats, etc.) completely fail to recognize any non-ASCII characters past the domain name. And even among those that do recognize such non-ASCII characters, there are gratuitous differences in where they stop linkifying.

The linkification process for URLs is already fragmented, with different implementations producing very different results, and that fragmentation is amplified once non-ASCII characters are involved, because they often behave very differently. That is, developers’ lack of familiarity with the behavior of non-ASCII characters has caused the different implementations of linkification to splinter. Yet non-ASCII characters are very important for readability. People do not want to see the above URL expressed in escaped ASCII:
https://ja.wikipedia.org/wiki/%E3%82%A2%E3%83%AB%E3%83%99%E3%83%AB%E3%83%88%E3%83%BB%E3%82%A2%E3%82%A4%E3%83%B3%E3%82%B7%E3%83%A5%E3%82%BF%E3%82%A4%E3%83%B3
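The escaped form of that URL can be reproduced with a short sketch. This is only an illustration using Python's standard urllib.parse module; the helper name escape_path is hypothetical, and the code percent-encodes every non-ASCII character in the path as UTF-8, which is not necessarily the minimal escaping that the draft defines.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_path(url: str) -> str:
    """Percent-encode the path of a URL as UTF-8 (hypothetical helper).

    Only the path is touched; the scheme, host, query, and fragment
    are passed through unchanged.
    """
    parts = urlsplit(url)
    # quote() encodes each non-ASCII character as its UTF-8 bytes,
    # written as %XX triplets; "/" is kept so path segments survive.
    escaped = quote(parts.path, safe="/")
    return urlunsplit(
        (parts.scheme, parts.netloc, escaped, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    readable = "https://ja.wikipedia.org/wiki/アルベルト・アインシュタイン"
    print(escape_path(readable))
```

Each katakana character becomes three %XX triplets, so the 14-character page name grows to 126 ASCII characters that no longer convey anything to a human reader.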

Proposed Solution

This proposed draft Unicode Technical Standard #58 Unicode Linkification specifies a standard mechanism for detecting URLs embedded in plain text — in particular, detecting URLs containing non-ASCII characters. It also defines the minimally necessary escaping of non-ASCII code points in the Path, Query, and Fragment portions of a URL that aligns with the mechanism for detecting URLs.
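To make the detection problem concrete, the sketch below contrasts an ASCII-only URL matcher with a naive Unicode-aware one. It is not the algorithm specified in the draft: the two regular expressions, the find_url helper, and the set of trailing punctuation to trim are all assumptions chosen for illustration.

```python
import re
from typing import Optional

# ASCII-only matcher: accepts printable ASCII (U+0021 to U+007E) after the
# scheme, so it stops at the first non-ASCII character.
ASCII_ONLY = re.compile(r"https?://[!-~]+")

# Naive Unicode-aware matcher: accepts any run of non-whitespace characters.
UNICODE_AWARE = re.compile(r"https?://\S+")

# Punctuation trimmed from the end of a match (an illustrative choice only).
TRAILING = ".,;:!?"


def find_url(text: str, pattern: re.Pattern) -> Optional[str]:
    """Return the first URL-like match in text, or None if there is none."""
    match = pattern.search(text)
    if match is None:
        return None
    # Sentence punctuation right after a URL is usually not part of it.
    return match.group(0).rstrip(TRAILING)


if __name__ == "__main__":
    text = (
        "The page https://ja.wikipedia.org/wiki/アルベルト・アインシュタイン "
        "contains information about Albert Einstein."
    )
    print(find_url(text, ASCII_ONLY))     # link ends right after /wiki/
    print(find_url(text, UNICODE_AWARE))  # link covers the whole page name
```

Even this small example requires a decision about where a link ends (here, the trailing-punctuation rule), and different implementations making different decisions of this kind is the sort of divergence a shared specification is meant to remove.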

How to Provide Feedback

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the feedback and discussion instructions. The closing date is 2025 January 02 for this draft, but this is only the first step towards approval.

