Wednesday, May 4, 2016

Not Just Emoji

Every programmer knows about Unicode. Most other people have no idea what it is, even though they use Unicode every day. Every character you type on your smartphone or laptop — and every character you read — is defined by the Unicode Consortium.

Awareness of the Unicode Consortium has grown recently with the spread of emoji. But from the news articles, it’s easy to get the impression that emoji are the only thing we work on. In reality, there are over 120,000 characters defined, and as you see below, only a small fraction of them are emoji.

Emoji and Non-Emoji

For example, this June we’ll be adding 7,500 characters, and fewer than 1% of those new characters are emoji. The majority come from 6 new scripts, some in modern use and some historic.

CLDR (the Common Locale Data Repository) is the other main project of the Unicode Consortium. It provides the building blocks for supporting a wide variety of languages. We’ve just released CLDR v29, and are about to start data submission for v30. If you are a native speaker of a “digitally disadvantaged” language, we especially encourage you to join the other contributors to CLDR and help with this effort.

The Unicode Consortium is a volunteer-driven 501(c)(3) non-profit organization. Some people work on emoji, while others work on ancient scripts or Chinese ideographs. Still others work on the language support in CLDR or on other projects.

You can help fund the work of the consortium — even if you don’t contribute technically — by adopting your favorite character through the Adopt A Character program.

— Mark Davis, President