Skip to content

A reminder that Unicode fonts can still spell danger on the Internet

April 23, 2017

Opera 44.0 being halfway fooled by a fake Unicode URL.

Out of the blue, a neglected, eight-year-old security vulnerability caused by the way the Internet supports multi-language Unicode typefaces has resurfaced with a splash.

Two weeks ago an information security researcher demonstrated how easy it was to still use Unicode font tricks to make malicious faked websites look safe and legitimate in modern web browsers.

In an April 14th blog post entitled “Phishing with Unicode Domains

Of course, the website had nothing to do with Apple and neither did the URL—it just looked that way.

The “аррӏе” of Zheng’s URL was made using characters from the Slavic Cyrillic alphabet—an alphabet included in every Unicode typeface. The Cyrillic characters and the order which Zheng used them in just happen to bear a near-perfect resemblance to the Latin alphabet characters that spell “apple”, though the pronunciation of the Cyrillic “аррӏе” would be closer to “arr-eh”.

What Zheng had pulled off is well-known among Internet security experts as a homograph attack, one exploiting the superficial resemblance between characters in different language scripts to fake website URLs.

More specifically it is called an IDN homograph attack, because it exploits the Unicode font support of the Internationalized Domain Name (IDN) system, which has allowed for the creation and registration of non-English domain (website) names for the last eight years.

However, before I go any further to explain the flaw that Zheng exploited, I want to quickly address his claim that certain web browsers are still susceptible to it—specifically that Mozilla (the makers of Firefox) when informed of the vulnerability in January of this year, decided not to issue a security patch and therefore that Firefox continues to be vulnerable.

A “proof of concept” proves not so dangerous after all

Google Chrome 57.0.2987.133 displayed the faked URL as “” but the latest version, 58.0.3029.81, wasn’t taken in.

My personal tests with the eight web browsers that I have on hand confirm that Google Chrome was vulnerable until March 30th but not after that date and that the Opera browser, up to the very latest version, is still vulnerable.

However, none of the Firefox versions I tested (going back to 2014) were fooled in the slightest and even the aged Internet Explorer 10 (from 2012) didn’t actually fail.

Basically, only two of the eight web browsers I tested displayed the spoofed Unicode URL as ““:

  • Chromium-based Opera, version 44.0 (latest since updated to fix this flaw)
  • Google Chrome 57.0.2987.133 (March 30, 2017)

Firefox-based Pale Moon 27.0 and every other Firefox browser I tries displayed the url properly as ““.

Six of the eight web browsers I tested safely displayed the spoofed Unicode URL in its non-Cyrillic ASCII form: ““.

  • Google Chrome 58.0.3029.81 (latest)
  • Midori 0.5.11 (latest)
  • Firefox-based Pale Moon 27.0 (2016)
  • Chromium-based Vivaldi 1.0.435.42 (2016)
  • Maxthon Nitro (2015)
  • Firefox 28 (March 2014)
  • Internet Explorer 10.0.9200.16843 (2012)

The 2012-vintage Internet Explorer 10 safely displayed the ASCII version of the URL—in the address bar at least.

You can test whatever web browsers you use on Zheng’s proof of concept example.

Now back to the previously scheduled technical explanation, already in progress.

The perils of teaching an old dog new tricks

Unicode is an international standard created in 1991 for digital text which recognizes a total of 128,000 characters covering 135 of the world’s modern and historic scripts. Most modern typefaces include only a fraction of the Unicode total—Arial, for example, contains a total of 27,720 characters, including special invisible ones to indicate things like a change in the reading direction.

The Unicode standard, when it came into being, only had desktop computers in mind. The Internet—which was a nearly 20-year-old U.S. government research project when the Unicode was being implemented—wasn’t even on the public radar yet and anyways, thanks to the age of its conception, the Net was already firmly based on U.S. English.

The Domain Name System (DNS), for example, gave every numeric server address on the Internet (email, ftp, website and so on) an equivalent, easier-to-remember (for people at least) English-language domain name, drawn from a subset of the American Standard Code for Information Interchange (ASCII) character set—specifically the letters A through Z, the digits 0 through 9, and the hyphen character.

It wasn’t until 2009 that Internet domain names became truly multilingual (after a fashion) with the implementation of the Unicode Internationalized domain name (IDN) system.

Actually, all server address domain names are still based on the U.S. English ASCII character set. But, just as the ASCII DNS system sits atop the numeric address system, so to the Unicode-based IDN system now sits atop the ASCII-based DNS system, with all Unicode domain names having

To take ‘s deceptive Unicode IDN as an example, the Unicode , which is, itself, just a mask for a numeric address, such as .

Using Unicode to bait a “phish”hook

Opera 44.0 takes the bait of a Cyrillic-styled “paypal” website link found in Google and which now fortunately points nowhere.

Since the introduction of Unicode support to Internet systems in 2009, security experts have warned of a variety of ways in which Unicode typeface characters could be used for so-called “phishing” attacks.

Phishing is the term for fraudulent schemes to trick computer users into giving up sensitive personal data, usually by tricking them into downloading from the Internet and installing onto their computers disguised software, or “malware”, designed to secretly collect and transmit their passwords and credit card numbers.

A key to this kind of online con job is to use the most compelling clickbait email offers and/or trustworthy-looking counterfeit websites.

Beyond Zheng’s example, consider the word at the end of this sentence: “Раураӏ” It’s made of the following Cyrillic characters: “Er-a-u-er-a-palochka”. I dare you to distinguish it from the ASCII word “Paypal”.

Which is to reiterate that Unicode font support allows for the creation of some very convincing-looking fake domains of famous Western websites, using lookalike characters from non-roman Unicode alphabets, such as Cyrillic but that’s not all Unicode offers would-be fraudsters.

In a 2015 post on Unicode spoofing tricks, I covered a range of possible Unicode homograph attacks, from website domain name spoofing—which may (or may not) still just be a speculative risk—to the well documented use of invisible Unicode RLO characters, designed to indicate changes of reading direction,  to disguise dangerous malware programs as harmless picture files, in order to trick computer users into opening them.

Steps have been taken over the years to mitigate various vulnerabilities inherent in Unicode support. Web browsers will now always render domain names which mix Unicode and ASCII characters in their ASCII/punycode equivilent.

However, Zheng’s pure Unicode domain name was (and is) able to slip past some current web browsers. If nothing else, we have to hope that the fact serves as a wake up call to the various browser makers to once and for all try to patch all the known security vulnerabilities presented by Unicode font support.

For my part, I’ll certainly be laying off using Opera until I hear that the still-glaring Unicode flaw in that browser has been patched.

Update: I didn’t have to wait long! According to the Opera blog, Opera Stable 44.0.2510.1449 update, released April 25, “has been enriched with a security patch to prevent possible phishing with Unicode domains” The Opera blog even includes screen shots to show how the update now renders ‘s fake Cyrillic Apple domain in its punycode equivalent. Click the images to enlarge them.

From → Internet

  1. I got a message in WhatsApp, not 20 minutes ago, that used this.The w and the t were swapped so it looks like шһатѕарр. Thank god I remembered reading about this, and was able to also warn my friends, since it came from a local number


    • The “ш” is Unicode character 448, Cyrillic small letter “Sha” and the “т” is Unicode 0442, Cyrillic small letter “Te”. Was this part of a website link, or was there a website link in the message? That is to say, what action was the message trying to get you to make? I understand Whatsapp messages expire but if their’s any way for you to get a screen shot, please email it to me at


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: