Lookalike Domain Names Yet Another Browser Scam

John Lister's picture

A security firm says efforts to make the Internet truly global could make scams easier. It also says a program for registering domain names in numerous languages can be abused for scam purposes.

The issue involves the Internationalized Domain Name (IDN) system. This builds on the original Domain Name System (DNS) that helps 'translate' a website name (such as www.infopackets.com) into an IP address. That numeric address identifies the location of the server, which allows the server and client software (such as a web browser) to communicate.

The basic Domain Name System only allows the 26 letters of the Latin alphabet that are used for languages such as English, along with digits and hyphens. That means it doesn't work well with languages that use accented letters or other alphabets.

The Internationalized Domain Name system gets around this by allowing for new top level domains that work in the same way as those used for specific countries (for example, the .ca domain is used for many Canadian websites). These new top level domains will have their own rules for which letters and symbols can be used in website addresses seen by the public.
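
Behind the scenes, an internationalized name is still stored in DNS as plain ASCII: each label containing other characters is converted to a form that starts with "xn--". The short Python sketch below shows that round trip using Python's built-in "idna" codec, which follows the older IDNA 2003 rules. The helper names and the sample address are illustrative assumptions, not something taken from the Farsight report.

```python
# Sketch only: convert a name between its readable form and the ASCII
# form that DNS stores. Uses Python's built-in "idna" codec (IDNA 2003).

def to_ascii(hostname: str) -> str:
    """Return the ASCII ("xn--") form of a hostname."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname: str) -> str:
    """Return the readable form of an ASCII hostname."""
    return hostname.encode("ascii").decode("idna")


# Illustrative lookalike name; \u00f6 is the letter o with umlaut.
lookalike = "c\u00f6ke.co"
ascii_form = to_ascii(lookalike)

print(ascii_form)              # an ASCII-only name beginning with "xn--"
print(to_unicode(ascii_form))  # back to the readable form
print(to_ascii("coke.com"))    # a plain ASCII name passes through unchanged
```

A browser that showed the "xn--" form instead of the readable form would make a lookalike obvious, at the cost of making legitimate non-English addresses unreadable.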

Addresses Near Identical

Farsight Security says this creates a problem. In some cases, a character in a non-Latin alphabet will be remarkably similar to one in the Latin alphabet. The difference can be as small as a tiny mark beside an otherwise identical character. The company uses the term "homograph" for such situations. (Source: globenewswire.com)

For example, the letter o with umlaut (ö) appears in the German alphabet and looks similar to the plain letter "o" used in English. This means it's possible to register a website with a domain name that looks almost identical to the one used for a recognizable brand name. The difference can be particularly difficult to spot on a small phone screen.

This could make it easier to trick people into clicking on a link that they thought pointed to a legitimate recognized website but was actually a scam. Users could then be fooled into typing in login details or other security information.

27 Percent Of Sites Could Be Scams

Farsight says it examined 100 million domain names created under the Internationalized Domain Name system and believed 27 percent of them may have been registered for scam purposes. (Source: bbc.co.uk)

One limitation to such scams is that although the main part of the website address will appear to be the recognizable name, the suffix of the domain will not contain the ".com," but rather an alternative top level domain (TLD).

Top level domains typically designate the category of website: ".com" is used for "commercial" websites and ".net" is used for "networking" websites, though the two have been used interchangeably. As of April 2018, there are 1543 variations of top level domains.

What's Your Opinion?

Are such scams inevitable? Is it a price worth paying for opening up the web to a wider range of languages? Should browsers warn when you visit a website with a non-Latin alphabet address?


Dennis Faas's picture

This type of scam can be difficult to spot unless your eyesight is keen. That said, web browsers could be programmed to "spot" the difference by comparing Unicode characters in the domain name with their similar-looking ASCII counterparts. Unicode covers non-Latin alphabets and accented letters, whereas ASCII contains only the basic Latin letters, digits and punctuation. I have had to deal with interchanging Unicode and ASCII characters in programs I've written for web servers, and it is a real pain to deal with, but it is certainly possible.
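
As a rough illustration of that kind of comparison, here is a minimal Python sketch. It strips accents from a name and checks whether the result matches a known ASCII name. The list of known names and the function names are hypothetical, and this is not how any particular browser does it.

```python
import unicodedata

# Hypothetical list of names worth protecting. A real browser would need
# a far larger list and a proper table of confusable characters.
KNOWN_NAMES = {"coke", "infopackets"}


def fold_to_ascii(label: str) -> str:
    """Strip accents by decomposing characters and dropping the marks."""
    decomposed = unicodedata.normalize("NFKD", label.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def looks_like_known_name(label: str) -> bool:
    """True if a non-ASCII label collapses to a known ASCII name."""
    if label.isascii():
        return False  # plain ASCII names are not lookalikes of themselves
    return fold_to_ascii(label) in KNOWN_NAMES


print(looks_like_known_name("c\u00f6ke"))  # o with umlaut in place of o
print(looks_like_known_name("coke"))       # the plain ASCII name
```

This approach only catches accented variants. A letter borrowed from another alphabet, such as the Cyrillic "о", has no accent to strip and would need a lookup table of lookalike characters instead.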

Focused100's picture

Hi Dennis,
I like your idea of getting the browser to keep track of this.
What about HTTPS? Wouldn't that be a tip that the bad site is not legit?

Dennis Faas's picture

HTTPS, if enabled, only ensures that the page you're viewing is secure (using an SSL certificate). The purpose of HTTPS is so that (a) you can verify that the site you're connected to in fact has a secure certificate, and (b) most importantly, third parties cannot intercept the viewing of the page as it is transmitted to your web browser and vice versa. HTTPS does nothing to prevent website forgeries, such as phishing scams described in the article.

It is possible for scam websites to incorporate HTTPS using a certificate just like any other website. The certificates don't cost that much, though setting one up takes some technical know-how. That said, I would think the majority of scam sites would not bother with this due to the extra cost and deployment.

SteveMann's picture

I thought that IPv6 was supposed to fix problems like this. Every device, every node and every user would have a unique IP address, making it easy to verify the sender. Even spoofed IP addresses could be quickly detected.

IPv6 provides 340,282,366,920,938,463,463,374,607,431,768,211,456 unique IP addresses. Or, looked at another way, that would provide 39,614,081,257,132,168,796,771,975,168 IP addresses for every person on Earth. It makes the mere 40 IP addresses on my local net look really tiny.
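
Those figures can be reproduced with a few lines of Python. This is only a sketch: the divisor of 2 to the power of 33 (roughly 8.6 billion people) is an assumption, chosen because it reproduces the per-person figure quoted above. The comment does not say which population figure was used.

```python
# IPv6 addresses are 128 bits long, so there are 2**128 of them.
total_addresses = 2 ** 128

# Assumed population: 2**33, about 8.6 billion people.
assumed_population = 2 ** 33

per_person = total_addresses // assumed_population

print(f"{total_addresses:,}")
# 340,282,366,920,938,463,463,374,607,431,768,211,456
print(f"{per_person:,}")
# 39,614,081,257,132,168,796,771,975,168
```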

Why is IPv6 implementation taking so long?

Dennis Faas's picture

Domain names are not IP addresses; rather, domain names (website names) point to IP addresses. So, whether the domain uses IPv4 or IPv6 has nothing to do with the problem. The problem is with the name itself. So, you might have Coke.com, which is the real domain name for the Coca Cola website, and a fake domain such as Cöke.co. They look incredibly similar (for a reason), but are not the same. This is the issue at hand.

Chief's picture

The internet was designed to use 26 letters. Opening it up to additional letters borders on lunacy. Sadly, the super smart people can also be super short-sighted.

rohnski's picture

Great Idea.

Sure, I am a native English speaker. The 26 characters of the "Latin" alphabet are my comfort zone. But the same point can be made for speakers of any language. I don't look for umlauts and other "freaky" stuff. It is easy to miss. ESPECIALLY when we have to deal with shortened domain names and browser address windows that don't / can't show the whole domain address at once. And teeny tiny 7 inch or smaller phone and tablet screens make it that much easier to sneak these characters past users without some sort of automated watch dog.

Our computers are insanely powerful compared to the 8086/286 and 386 computers that first built the internet. Look at your task manager. These days, a computer being used for web surfing is 99.99% idle. So throwing in some program logic to burn a few CPU cycles to validate domain names makes total sense to me.

Spitballing, how about running a check on the characters in a domain name against the characters that are native to the base language the computer is running on? If it finds any foreign characters, there needs to be a warning. Especially in TLDs, but almost as important in the rest of the name.
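
A check like that could be sketched in Python along these lines. It assumes "native" means the plain letters, digits, hyphen and dot used in ordinary English addresses. The names NATIVE_CHARS, foreign_characters and warning_for are made up for this example.

```python
import unicodedata
from typing import List, Optional, Tuple

# Assumption: the "native" set for an English-language computer. Another
# locale would swap in its own set of expected characters.
NATIVE_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789-.")


def foreign_characters(hostname: str) -> List[Tuple[str, str]]:
    """Return (character, Unicode name) pairs for non-native characters."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in hostname.lower()
        if ch not in NATIVE_CHARS
    ]


def warning_for(hostname: str) -> Optional[str]:
    """Build a warning message, or return None if the name looks native."""
    found = foreign_characters(hostname)
    if not found:
        return None
    details = ", ".join(f"{ch!r} ({name})" for ch, name in found)
    return f"Warning: {hostname} contains unexpected characters: {details}"


print(warning_for("C\u00f6ke.co"))  # flags the o with umlaut
print(warning_for("coke.com"))      # None: nothing unexpected
```

The same routine works on the TLD as well as the rest of the name, since it looks at every character in the address.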

I can also see a context sensitive component. If the user types a non-native character, that is probably safe. A web link with a non-native character is suspect. An IP address that resolves to a name with a non-native character is suspect too.

Adding this sort of feature to our browsers makes perfect sense. It is a natural enhancement to have the browser inform us about suspect characters in a domain name. It is much the same as the "green lock" icon being displayed for a properly formatted HTTPS connection. Heck, it could even be wired in to that mechanism.

Automation can only do so much to protect us. Let's make that automation work for us. The final, informed, decision has to be made by educated users.

HALLANE_10197's picture

This 'essentially identical-looking characters in a URL' problem will increase, and definitely needs to be protected against.

Could a browser plug-in fix this problem prior to the browser companies fixing it?
- Hal Lane