A user clicks a link and sees paypal.com in the address bar. The site loads. The padlock is green. The TLS certificate is valid. The branding is correct. And the credentials they enter go straight to an attacker, because the domain wasn’t paypal.com. It was pаypal.com, with a Cyrillic “а” in place of the Latin one. To the eye, identical. To DNS, a completely different domain.
This is a homograph attack, and it’s one of the few impersonation techniques that exploit a flaw in human perception rather than a software flaw. It doesn’t depend on tricking distracted users into misreading a URL. Even a careful person, looking directly at the address bar, sees the brand they expect.
For adtech, affiliate operations, and brand teams, the relevance is direct. Homograph and look-alike domains increasingly impersonate networks, advertisers, and publishers in phishing, ad-fraud landing infrastructure, and reputation attacks.
What follows is the mechanism itself, where defenses break, and how to think about exposure operationally rather than theoretically.
Contents
- The Side-by-Side Most Defenders Underestimate
- Spot the Difference
- What Is a Homograph Attack?
- How the Attack Actually Works
- How a Homograph Attack Moves Through the Chain
- Where the Address Bar Stops Protecting You
- Why Adtech Is a Particular Target
- Real Incidents Worth Knowing
- Homograph & Typosquat Incidents in the Wild
- The Confusable Universe Is Larger Than Most Defenders Think
- Categories of Domain Confusables
- A Defense Layer Comparison That Actually Helps
- When Defensive Domain Registration Stops Making Sense
- The False Positive Problem Nobody Discusses
- What to Do Differently on Monday
- The Underlying Point
- FAQ
The Side-by-Side Most Defenders Underestimate
Visual · Homograph Attack
Spot the Difference
| Certificate detail | Legitimate domain | Spoofed domain |
| Issued To | PayPal, Inc. | none (DV only) |
| CA | DigiCert EV | Let’s Encrypt |
| Punycode | paypal.com | xn--pypal-4ve.com |
The spoofed domain differs at position 2: U+0430 (Cyrillic small letter a), visually identical to Latin “a” (U+0061).
If your eye can’t tell which is which, that’s the point. The rest of this article is about how to build defenses that don’t depend on your eyes.
What Is a Homograph Attack?
A homograph attack, sometimes called an IDN homograph attack, uses visually identical or near-identical characters from different scripts (or different code points within the same script) to register a domain that looks like a legitimate one but resolves to attacker-controlled infrastructure.
The technical foundation is Internationalized Domain Names (IDN). DNS itself only allows ASCII, so non-Latin domains are encoded into ASCII through Punycode. The Cyrillic-spoofed pаypal.com is internally xn--pypal-4ve.com. Browsers decide whether to render the Punycode form (visibly suspicious) or the Unicode form (invisibly identical to the legitimate domain) based on script-mixing rules that, as we’ll see, are not foolproof.
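The encoding is easy to inspect directly. A minimal sketch, assuming Python’s built-in `idna` and `punycode` codecs (which follow the older IDNA 2003 rules, sufficient for this illustration); the helper names `to_ascii` and `label_to_unicode` are illustrative, not part of any standard API:

```python
# Cyrillic "а" (U+0430) in place of Latin "a" (U+0061) at position 2.
spoofed = "p\u0430ypal.com"
legit = "paypal.com"


def to_ascii(domain: str) -> str:
    """Return the ASCII form of a domain, which is what DNS resolves."""
    return domain.encode("idna").decode("ascii")


def label_to_unicode(a_label: str) -> str:
    """Decode a single xn-- label back to Unicode for inspection."""
    if a_label.lower().startswith("xn--"):
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label


print(to_ascii(legit))    # pure-ASCII input passes through unchanged
print(to_ascii(spoofed))  # the Punycode form of the spoofed name
print([f"U+{ord(c):04X}" for c in label_to_unicode("xn--pypal-4ve")])
```

Printing the code points rather than the characters is deliberate: the decoded label looks the same as the legitimate one on screen, so the numeric form is the only reliable view.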
The Unicode Consortium maintains a formal catalog of which characters are visually confusable with which others, in Unicode Technical Standard #39, “Unicode Security Mechanisms”, and lists thousands of pairings. A homograph attack is, in essence, the weaponization of that catalog against domain trust.
Three terms readers often blur, and shouldn’t:
- Homograph attack / IDN homograph attack: characters from different scripts that look the same — Latin “a” vs Cyrillic “а”, Latin “o” vs Greek “ο”.
- Homoglyph attack: broader umbrella that also includes within-script substitutions: lowercase “l” vs digit “1”, uppercase “I” vs lowercase “l”, “rn” rendered as “m”.
- Typosquatting: plausible misspellings such as gooogle.com or propelerads.com. A different attack class, often combined with homoglyphs in the same campaign.
The distinction matters because the defenses for each are different, and conflating them is one of the more common reasons brand-protection programs leave gaps. Most “look-alike domains attack” coverage in the wild is actually a mix of all three running at once.
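The three terms can be separated mechanically. The sketch below is a rough illustration, assuming a tiny hand-picked confusable table (a real program would load the full data published with UTS #39); `CROSS_SCRIPT`, `WITHIN_SCRIPT`, and `classify` are hypothetical names introduced here:

```python
# Hypothetical, hand-picked subsets; not the UTS #39 data files.
CROSS_SCRIPT = {"\u0430": "a", "\u0435": "e", "\u043e": "o",   # Cyrillic
                "\u0440": "p", "\u0441": "c", "\u03bf": "o"}   # Cyrillic, Greek
WITHIN_SCRIPT = {"1": "l", "0": "o"}


def fold(label: str, table: dict) -> str:
    """Replace each character with its look-alike target, if it has one."""
    return "".join(table.get(ch, ch) for ch in label)


def fold_within(label: str) -> str:
    """Fold same-script look-alikes, including the two-letter 'rn' for 'm'."""
    return fold(label, WITHIN_SCRIPT).replace("rn", "m")


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def classify(candidate: str, brand: str) -> str:
    """Label a candidate domain label relative to a brand label."""
    if candidate == brand:
        return "exact"
    cross = fold(candidate, CROSS_SCRIPT)
    if cross != candidate and cross == brand:
        return "homograph"      # cross-script substitution
    if fold_within(cross) == fold_within(brand):
        return "homoglyph"      # within-script look-alike
    if edit_distance(cross, brand) <= 1:
        return "typosquat"      # one-character misspelling
    return "unrelated"


for cand, brand in [("p\u0430ypal", "paypal"), ("examp1e", "example"),
                    ("gooogle", "google"), ("propelerads", "propellerads")]:
    print(brand, "->", classify(cand, brand))
```

The order of the checks matters: a candidate is tested for cross-script substitution first, then within-script folding, and only then for plain misspelling, so each domain gets the most specific label that fits.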
How the Attack Actually Works
The mechanism is simpler than the defensive complexity around it suggests.
Attack Chain · Defender Visibility
How a Homograph Attack Moves Through the Chain
The attacker sees every step. Defenders usually see only the earliest registration signals — and the damage report at the end.
| Stage | Attacker action | Defender visibility |
| 01 Registration | Registers a visually identical IDN domain using lookalike Unicode characters. | Visible only if WHOIS, zone files, or IDN-aware monitoring are watching. |
| 02 Certificate Setup | Obtains a basic DV certificate so the spoofed domain appears locked and credible. | Visible through certificate transparency logs if the brand monitors variants. |
| 03 Delivery | Sends the domain through email, ads, search results, or social channels. | Usually hidden unless the delivery channel itself is instrumented. |
| 04 Rendering | Relies on visual similarity and browser-specific IDN rules. | Victims see a domain that appears to belong to the real brand. |
| 05 Exploitation | Captures credentials, drops malware, or stages “scam exposé” content for reputation attacks. | Invisible. Surfaces only when victims report, long after damage is done. |
The Asymmetry
Defenders have only two real windows of visibility — WHOIS and CT logs — both at the very start of the chain. Everything from delivery onward happens in the dark unless the channel itself is instrumented.
What makes the attack durable is the asymmetry of effort. Registering a confusable IDN costs the same as any other domain. A valid TLS certificate is free from public CAs and can be issued in minutes.
The attacker only needs the chain to work once for a given target: one click, one credential, one indexed page. Defenders have to hold a perimeter around every brand string the attacker might plausibly target, across every TLD, every script combination, and every confusable variant.
That asymmetry is the actual reason this attack class survives. It isn’t technical sophistication.
Where the Address Bar Stops Protecting You
Most modern browsers implement IDN display rules to defuse the obvious cases. Chrome and Chromium-based browsers apply a layered set of conditions before showing a domain in Unicode rather than Punycode, including checks for mixed scripts, character whitelisting, and TLD-script alignment.
Firefox uses a similar but distinct algorithm maintained by Mozilla. Where those rules trigger, the user sees xn--pypal-4ve.com, and the impersonation collapses.
Where they don’t trigger, and there are gaps, the user sees the spoof.
The most-cited demonstration remains researcher Xudong Zheng’s 2017 disclosure of xn--80ak6aa92e.com rendering as аррӏе.com, a fully Cyrillic string that bypassed Chrome and Firefox’s same-script heuristic precisely because it didn’t mix scripts. Both browsers were patched, but the underlying logic, “single-script domains are probably safe,” remains a structural assumption that adversaries continue to probe.
This is the first hidden failure mode worth naming. Browser IDN rules are heuristic, not deterministic. They’re tuned against a known catalog of confusables and a known set of bypass patterns. They don’t generalize gracefully to new TLDs, new scripts entering common use, or single-script lookalikes within recently added Unicode blocks.
Treating the address bar as the trust boundary is treating the last and weakest layer as if it were the first.
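The single-script gap can be reproduced in a few lines. This is a rough sketch of a mixed-script check, not any browser’s actual algorithm; it approximates a character’s script from the first word of its Unicode name, because the Python standard library does not expose the Script property. `scripts_in` and `naive_is_suspicious` are hypothetical names:

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Approximate the set of scripts used in a label."""
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphens are shared across scripts
        scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def naive_is_suspicious(label: str) -> bool:
    """Flag only labels that mix scripts (the assumption under test)."""
    return len(scripts_in(label)) > 1


mixed = "p\u0430ypal"                        # Latin with one Cyrillic letter
single = b"80ak6aa92e".decode("punycode")    # label of xn--80ak6aa92e.com

print(scripts_in(mixed), naive_is_suspicious(mixed))
print(scripts_in(single), naive_is_suspicious(single))
```

The mixed-script label is flagged; the fully Cyrillic label from the 2017 disclosure is not, which is exactly the case a mixed-script rule alone cannot see.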
Mobile makes it worse
Desktop discussion dominates this topic, but most adtech traffic is mobile. On mobile browsers, the address bar is shorter, often hidden during scroll, and frequently truncates the domain to the leftmost characters. A user who would catch pаypal.com on a 27-inch monitor often won’t catch accounts.pаypal-secure.com on an iPhone, where only accounts.pаypal-… is visible.
iOS Safari, Chrome on Android, and most in-app browsers (Instagram, TikTok, Telegram WebView) all carry their own quirks in IDN rendering and certificate-warning behavior. None of them is a stronger trust boundary than desktop Chrome, and several are weaker.
If your threat model includes mobile traffic, and in adtech it always does, browser-level protection is materially less reliable than the desktop case studies suggest.
Why Adtech Is a Particular Target
Homograph and look-alike domains tend to surface in three operational contexts inside ad-supported ecosystems, each with different consequences and different ownership inside the buying organization.
In malvertising landing-page impersonation, a creative directs traffic to what visually appears to be a known retailer, fintech brand, or software vendor, but the destination is a credential-harvesting clone.
From the buy-side perspective, this is a brand-safety incident; from the sell-side perspective, it’s a creative-review failure; from the user’s perspective, it’s indistinguishable from a real visit until the damage is done.
Partner and supplier phishing is the costlier category in practice. Operations teams at networks, agencies, and exchanges receive a high volume of legitimate inbound mail from external counterparties, and a message from a spoofed sender domain that visually matches a known partner (pаrtner.com, examp1e.com) is often opened, clicked, and acted on within the same hour.
The cost here is rarely a stolen password – it’s wire instructions changed, a payment redirected, an API key disclosed.
Reputation infrastructure uses homograph or near-homograph domains to host defamatory or “scam exposé” content designed to capture branded search traffic. This overlaps more with typosquatting than with pure homograph attacks, but the operational playbook is the same.
ADEX’s own brand-protection team documented one such case: a one-letter-off domain was used to host coordinated negative content against a major ad network, and the domain was eventually transferred via WIPO proceedings. The domain-intelligence to UDRP case study is worth reading for the registrar data and OSINT mechanics, which apply directly to homograph cases as well.
These three contexts are usually the same actors at different stages of the same campaign. The phishing infrastructure, the malvertising landing pages, and the reputation domains tend to share registrar patterns, DNS providers, and registrant fingerprints. Treating them as separate problems is one of the reasons remediation feels endless.
Real Incidents Worth Knowing
Notable Cases
Homograph & Typosquat Incidents in the Wild
A non-exhaustive timeline of public disclosures shaping today’s IDN threat landscape.
- аррӏе.com (xn--80ak6aa92e.com), 2017: A proof-of-concept fully Cyrillic domain rendered as apple.com in unpatched Chrome and Firefox builds (Xudong Zheng). Patched, but the structural single-script bypass logic remains. Reference: Zheng’s original disclosure on his personal blog.
- binance.com · kraken.com · coinbase.com, 2018–2022: Crypto-exchange phishing waves. Multiple campaigns used IDN look-alikes with Latin/Cyrillic substitutions to host credential-harvesting clones. Documented by Group-IB and Recorded Future in periodic threat-landscape reports.
- Epik registrar wave, 2018: Coordinated homoglyph batch. Farsight Security identified hundreds of registrations targeting major banks and fintechs, channeled through a small set of registrars and resold for phishing infrastructure. Source: Farsight Security research blog.
- Healthcare-sector wave, 2020–2021: COVID-era impersonation. Multiple national CERTs reported homograph domains impersonating ministries of health and vaccine portals, used for credential phishing and malware drop pages. Documented by ENISA in its 2021 threat landscape report.
- propelerads.com, ADEX · 2025: Adtech reputation case. A typosquatting domain with one missing letter was used to host defamatory “scam reveal” content targeting branded search traffic. Source: ADEX brand-protection investigation.
The pattern across these cases is that the technical novelty is low and the operational discipline is high. Attackers reuse the same registrar relationships, DV-cert pipelines, and content templates. Defenders who look for novelty miss the campaign; defenders who look for repetition catch it.
The Confusable Universe Is Larger Than Most Defenders Think
There’s a tendency to picture homograph attacks as the Cyrillic-vs-Latin example and stop there. The actual confusable surface is wider, and parts of it are routinely missed by defensive monitoring.
Categories of Domain Confusables
Five branches of look-alike risk — and two systemic blind spots most brand programs under-monitor.
Confusable domains (root node) branch into:
- Cross-script IDN homographs: Latin / Cyrillic / Greek substitutions across writing systems.
- Within-script homoglyphs: same-alphabet look-alikes such as l / 1 / I, rn / m, O / 0.
- Diacritic & accent confusables (blind spot): marked letters such as á / a, ç / c, ñ / n. Treated as legitimate within local language context.
- Punycode & encoded edge cases (blind spot): bidirectional text, zero-width characters, combining marks. Often slip past registrar abuse detection.
- Typosquatting overlap: single-letter omissions, doublings, transpositions, frequently combined with confusables.
Browser detection across these branches ranges from often, to rarely or inconsistently, to never. Diacritic and zero-width categories are where most programs leave gaps.
Diacritic substitutions (“á” for “a”, “ç” for “c”) rarely trigger browser Punycode display because most browsers treat them as legitimate within their language context.
Zero-width and combining characters render as nothing visible and can be inserted to create domains that are technically distinct but visually identical. These aren’t theoretical. They appear in security research regularly, and they aren’t uniformly caught by abuse-detection systems at registrars.
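Both blind spots can be screened for with Unicode general categories. A minimal sketch using only `unicodedata` from the standard library; the function names and the choice of categories (`Cf` for format characters, `Mn`/`Mc`/`Me` for combining marks) are assumptions made for illustration, not a registrar or browser rule:

```python
import unicodedata

INVISIBLE_OR_COMBINING = {"Cf", "Mn", "Mc", "Me"}


def hidden_characters(label: str) -> list:
    """List (index, code point, category) for format and combining characters."""
    return [(i, f"U+{ord(ch):04X}", unicodedata.category(ch))
            for i, ch in enumerate(label)
            if unicodedata.category(ch) in INVISIBLE_OR_COMBINING]


def strip_marks(label: str) -> str:
    """Decompose (NFD), then drop marks and format characters.

    After this step "á" compares equal to "a", and a label with an
    inserted zero-width joiner compares equal to the clean label.
    """
    decomposed = unicodedata.normalize("NFD", label)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) not in INVISIBLE_OR_COMBINING)


print(hidden_characters("pay\u200dpal"))   # zero-width joiner at index 3
print(strip_marks("payp\u00e1l"))          # diacritic form folds to the base letters
```

Comparing `strip_marks(candidate)` against the brand string is the point: the visual distance is near zero, so the comparison has to happen on normalized code points, not on what is rendered.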
A Defense Layer Comparison That Actually Helps
Most defense advice for homograph attacks reads as a checklist of equally weighted measures. In production, the layers don’t contribute equally, and they fail in different ways. The table below is the breakdown that the ADEX Brand Protection team uses internally when scoping a program.
| Defense layer | What it actually does | Where it works well | Where it fails |
| Browser IDN display rules | Forces Punycode display when script-mixing or whitelist conditions trigger | Cross-script Latin/Cyrillic spoofs in major TLDs | Single-script lookalikes, new TLDs, niche scripts, mobile and in-app browsers |
| Defensive registration | Pre-registers high-risk confusables and common typos | Closing the obvious attack surface for high-value brands | Combinatorial explosion across TLDs and scripts; cost grows non-linearly with brand portfolio size |
| Certificate Transparency monitoring | Surfaces newly issued TLS certs for confusable domains in near-real time | Early detection, often hours after registration | High false-positive volume; requires confusable-aware matching, not just substring matching |
| DMARC / SPF / DKIM enforcement | Prevents direct spoofing of the legitimate domain in email | The “From: yourbrand.com” attack | Does not stop email from a homograph domain — the attacker isn’t spoofing you, they’re impersonating you |
| Registry-level script-mixing policies | Some TLD operators (e.g., Verisign for .com IDN tables, some ccTLDs) restrict which scripts can be combined | Reduces the cross-script attack surface for protected TLDs | Coverage varies by registry; most gTLDs accept far more permissive registrations than ccTLDs |
| Registrar abuse reporting | Triggers takedown via the registrar’s policies | Clear-cut TOS violations with documented harm | Slow, inconsistent across registrars, often requires repeated escalation |
| UDRP / legal escalation | Forces the transfer of trademark-infringing domains | When trademark rights are clear, and bad-faith use is documented | Months-long process; weak protection while the proceeding runs; requires registered trademarks |
| Endpoint and email-gateway URL inspection | Catches confusable domains in user-facing channels | Phishing emails, internal click protection | Out-of-band channels (SMS, chat apps, social DMs); user-facing sites the gateway doesn’t see |
No single layer closes the attack surface, and the layers have fundamentally different latency profiles. Defensive registration is slow and expensive but permanent.
CT monitoring is fast but noisy. UDRP is decisive but trails the attack by weeks or months. A program that overweights any one of them ends up with a familiar failure pattern: the obvious typos are pre-registered, the email channel is locked down, and the actual incident comes through a confusable domain in a Telegram message to an affiliate manager.
When Defensive Domain Registration Stops Making Sense
There’s a point at which the standard advice (“register your common confusables”) becomes a budget sink that doesn’t improve security posture. For a brand with a five-character primary domain, the number of plausible homograph and homoglyph variations across major TLDs runs into the thousands. Registering them all is a procurement project, not a security strategy.
A more defensible threshold: defensively register variants that are either (a) high-prevalence in user-typed traffic, as visible in Search Console misspelling data, or (b) flagged as confusable under the Single-Script Confusables set of Unicode TR39 (UTS #39) for the brand’s primary script, in the TLDs where the brand is commercially active. Everything beyond that is better served by detection (CT monitoring, branded-search alerts, abuse intake) than by pre-emptive ownership.
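The arithmetic behind the combinatorial growth is worth seeing once. The sketch below counts substitution-only variants for a five-character label, assuming a small hypothetical table of look-alike options per character and an assumed count of 12 TLDs. It is an upper bound for illustration: some combinations would be rejected by registries or shown as Punycode by browsers, while typos (omissions, doublings, transpositions) would add more.

```python
from math import prod

# Hypothetical look-alike options per character; each list includes the
# original character itself.
OPTIONS = {
    "a": ["a", "\u0430", "\u00e1", "\u00e0"],  # Latin, Cyrillic, acute, grave
    "e": ["e", "\u0435", "\u00e9"],            # Latin, Cyrillic, acute
    "l": ["l", "1"],                           # letter, digit
    "o": ["o", "\u043e", "0", "\u00f3"],       # Latin, Cyrillic, digit, acute
    "p": ["p", "\u0440"],                      # Latin, Cyrillic
}


def substitution_variants(brand: str) -> int:
    """Count look-alike labels reachable by per-character substitution."""
    per_position = [len(OPTIONS.get(ch, [ch])) for ch in brand]
    return prod(per_position) - 1  # minus the genuine label itself


def registration_count(brand: str, tld_count: int) -> int:
    """Registrations needed to own every variant in every TLD considered."""
    return substitution_variants(brand) * tld_count


print(substitution_variants("apple"))      # 4 * 2 * 2 * 2 * 3 - 1
print(registration_count("apple", 12))
```

Even with this deliberately small table, one five-character label reaches four figures across a dozen TLDs, which is why the threshold above prunes by observed traffic and primary script rather than enumerating everything.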
This is unglamorous, but it’s the threshold most mature brand-protection programs converge on after one or two budget cycles of trying the maximalist approach.
The False Positive Problem Nobody Discusses
The usefulness of any homograph-detection or domain-spoofing-detection system is bounded by its tolerance for false positives. Confusable matching is inherently a fuzzy comparison. The more aggressive the match, the more legitimate domains get flagged: partner brands, regional variants, internationalized versions of the brand’s own properties.
In our monitoring environments, naive confusable matching on a brand string produces double-digit daily alerts, and the majority are non-malicious: regional resellers, lookalike but unrelated business names, archived domains. Tightening the match reduces noise but starts to miss the cases that matter, particularly the diacritic and zero-width attacks where the visual distance is essentially zero, but the string distance is meaningful.
The programs that hold up are the ones that accept this trade-off explicitly. They treat confusable detection as a triage signal, not a verdict, and they invest in the human review layer rather than chasing a higher-precision automated filter that doesn’t exist. The detection pipeline is a layer; the decision is still a person’s.
What to Do Differently on Monday
Three concrete actions that meaningfully change exposure, in order of effort-to-impact.
- Instrument Certificate Transparency for your brand strings with confusable expansion.
Open-source CT log monitors are widely available; the value comes from feeding them a list that includes Punycode variants, diacritic forms, and within-script homoglyphs of your brand, not just the literal string. This is the single fastest way to compress the gap between a hostile registration and your awareness of it. A well-tuned pipeline gets you from “registration” to “alert in queue” in under six hours.
- Audit the channels where your team and your partners actually receive URLs.
Email gateways are usually well-instrumented. Slack, Telegram, WhatsApp, LinkedIn, and SMS usually aren’t. The asymmetry between channel coverage and channel use is where most successful homograph and look-alike domain deliveries land. You don’t need full DLP on Telegram. You need a written rule that financial or credential-bearing changes never get acted on from a chat-app URL.
- Make sure the trademark and the legal escalation path are in place before you need them.
The UDRP process assumes a registered trademark and documented bad-faith use. Assembling those after a campaign is already running costs weeks the brand can’t afford.
Across the cases we’ve worked, including the typosquatting-to-UDRP matter referenced earlier, the speed of resolution correlates almost entirely with the quality of the documentation accumulated before escalation, not with the merits of the case itself.
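For the first action, the difference between substring matching and confusable-aware matching on hostnames taken from CT log entries can be sketched as follows. This assumes hostnames arrive as plain strings from whatever CT monitor is in use (no specific monitor’s API is implied), and `CONFUSABLES` is a small hypothetical table rather than the full UTS #39 data:

```python
import unicodedata

# Hypothetical, hand-picked table for illustration only.
CONFUSABLES = {"\u0430": "a", "\u0435": "e", "\u043e": "o",
               "\u0440": "p", "\u0441": "c", "1": "l", "0": "o"}


def to_unicode_label(label: str) -> str:
    """Decode an xn-- label; leave anything else (or malformed input) as-is."""
    if label.lower().startswith("xn--"):
        try:
            return label[4:].encode("ascii").decode("punycode")
        except UnicodeError:
            return label  # malformed label: keep it for manual review
    return label


def fold(label: str) -> str:
    """Decode, strip diacritics and invisible characters, map look-alikes."""
    text = unicodedata.normalize("NFD", to_unicode_label(label).lower())
    visible = (ch for ch in text if unicodedata.category(ch) not in {"Mn", "Cf"})
    return "".join(CONFUSABLES.get(ch, ch) for ch in visible)


def matches_brand(hostname: str, brand: str) -> bool:
    """True if any label of the hostname contains the brand after folding."""
    folded_brand = fold(brand)
    return any(folded_brand in fold(label)
               for label in hostname.rstrip(".").split("."))


for host in ["xn--pypal-4ve.com", "secure-paypa1.net", "example.org"]:
    print(host, "paypal" in host, matches_brand(host, "paypal"))
```

A literal substring search never sees the brand inside `xn--pypal-4ve.com`, because certificates carry the Punycode form. Decoding and folding before comparison is what turns a CT feed into a usable signal, at the cost of the false positives discussed earlier.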
The Underlying Point
Homograph attacks persist not because they’re technically sophisticated. They aren’t. They persist because they exploit a layer of the system, the visual rendering of identifiers, that was never designed to be a trust boundary and that the rest of the security stack tacitly assumes is one. Every defense above is intended to compensate for an assumption that shouldn’t be made: that the user can verify a domain by looking at it.
Drop that assumption, and the work becomes operational.
- You can measure the latency between a hostile registration and your detection of it.
- You can measure your takedown time.
- You can list which channels are instrumented and which aren’t.
- You can name which scripts and confusable categories are in your monitoring set.
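The first two measurements need nothing more than three timestamps per incident. A minimal sketch, assuming incident records are kept with ISO 8601 timestamps; the records and field names below are invented for illustration and are not real case data:

```python
from datetime import datetime
from statistics import median

# Hypothetical incident log; every value here is illustrative.
incidents = [
    {"registered": "2025-01-10T08:00:00", "detected": "2025-01-10T13:30:00",
     "taken_down": "2025-01-17T13:30:00"},
    {"registered": "2025-02-03T09:00:00", "detected": "2025-02-04T09:00:00",
     "taken_down": "2025-02-06T09:00:00"},
    {"registered": "2025-03-01T00:00:00", "detected": "2025-03-01T03:00:00",
     "taken_down": "2025-03-05T03:00:00"},
]


def hours_between(start: str, end: str) -> float:
    """Elapsed hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 3600


# Median is used so a single slow takedown does not dominate the figure.
detection_latency_h = median(
    hours_between(i["registered"], i["detected"]) for i in incidents)
takedown_time_h = median(
    hours_between(i["detected"], i["taken_down"]) for i in incidents)

print(f"median detection latency: {detection_latency_h} h")
print(f"median takedown time: {takedown_time_h} h")
```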
Most brand-protection programs aren’t structured around those measurements, which is why this attack class continues to work more than two decades after the underlying mechanism was first publicly described.
FAQ
Is a homograph attack the same as typosquatting?
No. Typosquatting relies on plausible misspellings of a legitimate domain (gooogle.com). A homograph attack uses visually identical characters from different scripts or character sets to register a domain that looks identical to the original. The two are often combined in the same campaign, but the defenses differ: typosquatting can be partly addressed by defensive registration, while homograph attacks require confusable-aware monitoring.
Does Chrome block homograph attacks?
Partially. Chrome applies a layered set of IDN display rules that force Punycode rendering when scripts are mixed or when the domain doesn’t meet whitelist conditions. These rules catch the obvious cross-script cases. They don’t catch single-script lookalikes (e.g., a fully Cyrillic spoof), zero-width character attacks, or many diacritic substitutions. Mobile and in-app browsers are weaker still.
What is Punycode, and why does it matter for fake URL attacks?
Punycode is the ASCII encoding used to represent Unicode characters in DNS, since DNS itself only allows ASCII. A Cyrillic-spoofed pаypal.com is internally xn--pypal-4ve.com. When browsers render Punycode, the spoof is obvious. When they render the Unicode form, it’s invisible. Whether the browser shows one or the other depends on its IDN display algorithm.
How do I detect homograph and look-alike domains targeting my brand?
Three layers in combination: Certificate Transparency log monitoring with confusable-expanded brand strings; branded-search alerting for “scam,” “fraud,” “review,” and similar modifiers; and inbound abuse intake from partners, customers, and your own employees. Pure WHOIS monitoring is too slow on its own.
Are diacritic domains (pаypál.com) considered homograph attacks?
They sit on the boundary. Most browser IDN rules treat them as legitimate within a language context, so they often render in Unicode without warning. Operationally, they should be in your monitoring set, because they’re under-defended by browser logic and registrar abuse policies alike.
How long does a UDRP proceeding take to transfer a homograph domain?
Roughly two to three months from filing to decision in standard cases, assuming a registered trademark and clear bad-faith evidence. The 2025 ADEX case referenced above ran approximately two months from complaint to transfer order. Speed correlates more with documentation quality than with case complexity.


