Tuesday, February 24, 2026

UTS #58: Making URLs Readable for Humans: From %E0%A4%AE… to महात्मा

People around the world need to use their writing systems in URLs. This is important: in writing their native languages, the majority of humanity uses characters outside of A-Z, and they expect those characters to also work seamlessly.

Browsers and other programs generally handle Unicode in domain names well, though not all of them do, and many make the rest of the URL unreadable. For example, consider the common practice of providing user handles such as the following two:

x.com/rihanna

www.youtube.com/@핑크퐁

The first of these works well in practice because it is all ASCII: copying from the address bar and pasting into text provides a readable result. In the second example, however, copying from the address bar in many browsers and other programs gives an unreadable string:

www.youtube.com/@핑크퐁
youtube.com/@%ED%95%91%ED%81%AC%ED%90%81

The names also expand in size and turn into very long, unreadable strings, such as:

hi.wikipedia.org/wiki/महात्मा_गांधी
hi.wikipedia.org/wiki/%E0%A4%AE%E0%A4%B9%E0%A4%BE%E0%A4%A4%E0%A5%8D%E0%A4%AE%E0%A4%BE_%E0%A4%97%E0%A4%BE%E0%A4%82%E0%A4%A7%E0%A5%80
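
The expansion shown above can be reproduced with a minimal Python sketch. It uses only the standard library's urllib.parse; the variable names are illustrative, and this is not the formatting procedure defined by UTS #58, just a demonstration of why the encoded form grows: each non-ASCII character becomes one %XX escape per UTF-8 byte.

```python
from urllib.parse import quote, unquote

# The readable form, as a person would type or read it.
readable = "hi.wikipedia.org/wiki/महात्मा_गांधी"

# quote() leaves ASCII letters, digits, "_.-~" and (by default) "/" alone,
# and replaces every other character with %XX escapes of its UTF-8 bytes.
encoded = quote(readable, safe="/")

print(encoded)
print(len(readable), "characters ->", len(encoded), "characters")

# unquote() reverses the process, decoding the escapes as UTF-8.
assert unquote(encoded) == readable

# The same decoding recovers the handle from the earlier example.
handle = unquote("%ED%95%91%ED%81%AC%ED%90%81")
print(handle)
```

Each Devanagari or Hangul character here takes three bytes in UTF-8, so it turns into nine ASCII characters once escaped.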

The other side of the coin is making sure that programs add links to URLs in a predictable way, linkifying the entire URL without extending the link to include sentence punctuation. For example, many programs don’t add links properly to:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


A commonly used email program, for example, stops midway through:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


Others may include the sentence period, question mark, surrounding parentheses, etc.:

… see
https://example.com/αβγ/δεζ?θικ#λμν.


Users often insert spaces to prevent this. It should be automatic:

… see
https://example.com/αβγ/δεζ?θικ#λμν.
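
The desired behavior can be approximated with a short Python sketch. This is a deliberately simplified heuristic, not the algorithm in UTS #58 (which relies on per-character data covering all of Unicode). The names URL_RE, TRAILING, PAIRS, trim_link, and find_links are hypothetical, and the punctuation sets are assumptions chosen for illustration.

```python
import re

# Assumption: a candidate link is a scheme followed by any run of
# non-whitespace characters, so non-ASCII path, query, and fragment
# characters are kept rather than cutting the link short.
URL_RE = re.compile(r"https?://\S+")

# Assumption: a small ASCII-only set of sentence punctuation and brackets.
TRAILING = ".,;:!?"
PAIRS = {")": "(", "]": "[", "}": "{"}

def trim_link(candidate: str) -> str:
    """Drop trailing sentence punctuation and unmatched closing brackets."""
    while candidate:
        last = candidate[-1]
        if last in TRAILING:
            candidate = candidate[:-1]
        elif last in PAIRS and candidate.count(last) > candidate.count(PAIRS[last]):
            # A closing bracket with no opener inside the URL most likely
            # belongs to the surrounding sentence.
            candidate = candidate[:-1]
        else:
            break
    return candidate

def find_links(text: str) -> list[str]:
    """Return the trimmed link targets found in text."""
    return [trim_link(m.group()) for m in URL_RE.finditer(text)]

print(find_links("… see https://example.com/αβγ/δεζ?θικ#λμν."))
print(find_links("(see https://example.com/a_(b))"))
```

With this sketch, the link covers the whole URL through the fragment and leaves the sentence-final period outside it, while a balanced parenthesis inside the path is preserved.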


The new UTS #58 specifies how to format and linkify URLs and email addresses in readable, predictable, user-friendly ways. The data files cover all of the 159,000+ characters in Unicode.

We encourage implementers to adopt this specification for a consistent experience for users worldwide.
