
People around the world need to use their writing systems in URLs. This is important: in writing their native languages, the majority of humanity uses characters outside of A-Z, and they expect those characters to also work seamlessly.
Browsers and other programs generally handle Unicode in domain names well. But not all browsers and other programs do a good job with domain names, and many make the rest of the URL unreadable. For example, consider the common practice of providing user handles such as the following two:
x.com/rihanna
www.youtube.com/@핑크퐁
The first of these works well in practice — because it is all ASCII. Copying from the address bar and pasting into text provides a readable result. However in the second example, in many browsers and other programs, copying the address bar gives an unreadable string:
www.youtube.com/@핑크
⇩
youtube.com/@%ED%95%91%ED%81%AC%ED%90%81
The names also expand in size and turn into very long, unreadable strings, such as:
hi.wikipedia.org/wiki/महात्मा_गांधी
⇩
hi.wikipedia.org/wiki/%E0%A4%AE%E0%A4%B9%E0%A4%BE%E0%A4%A4%E0%A5%8D%E0%A4%AE%E0%A4%BE_%E0%A4%97%E0%A4%BE%E0%A4%82%E0%A4%A7%E0%A5%80
The other side of the coin is making sure that when programs add links to URLs in a predictable way, linkifying the entire URL, and without extending the link to include sentence punctuation. For example, many programs don’t add links properly to:
… see
https://example.com/αβγ/δεζ?θικ#λμν.
A commonly used email program, for example, stops midway through:
… see
https://example.com/αβγ/δεζ?θικ#λμν.
Others may include the sentence period, question mark, surrounding parenthesis, etc.:
… see
https://example.com/αβγ/δεζ?θικ#λμν.
Users often insert spaces to prevent this. It should be automatic:
… see
https://example.com/αβγ/δεζ?θικ#λμν.
The new UTS #58: Unicode Link Detection and Formatting: URLs and Email Addresses specifies how to format and linkify URLs and email addresses in readable, predictable, user-friendly ways. The data files cover all of the 159,000+ characters in Unicode.
We encourage implementers to adopt this specification for a consistent experience for users worldwide.
----------------------------------------------
Adopt a Character and Support Unicode’s Mission
Looking to give that special someone a special something?
Or maybe something to treat yourself?
🕉️💗🏎️🐨🔥🚀爱₿♜🍀
Adopt a character or emoji to give it the attention it deserves, while also supporting Unicode’s mission to ensure everyone can communicate in their own languages across all devices.
Each adoption includes a digital badge and certificate that you can proudly display!
Have fun and support a good cause
You can also donate funds or gift stock
