Most people think that the Internet is a great equalizer: anyone can access enormous quantities of information with a single search. Unfortunately, several obstacles stand in the way. In addition to the difficulty of getting access to technology or Wi-Fi, linguistic barriers restrict the information available to any given person.
A significant portion of the Internet is in English, followed somewhat closely by Chinese and less closely by Spanish and Arabic. While the number of languages used on the Internet has increased significantly in recent decades, so that it is now far from monolingual, a look at Wikipedia shows that information is not equally available across languages. For instance, English Wikipedia, which has the most articles by far, has only a 51% overlap with German Wikipedia, which has the second most articles. Google searches also reveal a lack of linguistic equity. For example, in the West Bank, a search for restaurants returns far more results when typed in Hebrew, or even in English, than in Arabic.
Even more telling, many languages are not used on the Internet at all, particularly endangered languages and languages with different writing systems. Unicode is the standard used to encode letters, symbols, and emojis, but it does not support all the characters in all writing systems. Bengali has “as many native speakers as French, Italian, and German combined,” but several characters from its writing system are completely left out. Sure, there are only so many slots in Unicode, but there are enough for some really bizarre emojis and even one undeciphered script, yet not enough for all Bengali speakers to write their own names. And what about sign language? Unicode has only a few very basic handshapes.
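For readers curious what "support" means in practice, here is a minimal Python sketch. Unicode works by giving each character a number, called a code point, along with an official name. The helper `describe` is a hypothetical name made up for this illustration; it relies only on Python's built-in `unicodedata` module, and what it reports depends on the version of the Unicode database bundled with your Python installation.

```python
import unicodedata


def describe(text):
    """Return (character, code point, official Unicode name) for each character."""
    return [
        # ord() gives the character's number; name() looks up its official name.
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name found>"))
        for ch in text
    ]


# A Latin letter, a Bengali letter, and a Cherokee syllable.
for ch, code_point, name in describe("aকᎠ"):
    print(code_point, name)
```

A character with no code point of its own cannot be typed, stored, or searched for as text in the usual way, which is why being left out matters so much.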
Far from being an equal-access database for the world, the Internet, down to the characters we type and the emojis we add to texts, includes a ranking of which languages are the most valuable and whose voices should be heard. However, if it were made more inclusive, the Internet could be a tool to preserve languages instead of propagating linguistic homogenization. Recently, a font designer developed a new font family for the Cherokee syllabary to encourage its use on the Internet. Until now, the same typeface has been used for Cherokee for most of the syllabary's existence. The new typeface harkens back to historic and handwritten forms of the characters and includes an italic version. Digitization could be used not just to preserve old books and ideas, but to keep languages new and alive.
Image Credit to: https://blog.unbabel.com/2015/06/10/top-languages-of-the-internet/