Unicode Confusables Detector

The Unicode Consortium publishes confusables.txt, a data file that maps characters and character sequences to visually similar counterparts. This tool uses a curated subset of that data to flag the most security-relevant confusables in any text.

A confusable (Unicode term) is any character that a reasonable reader could mistake for a different character. The Unicode Security Mechanisms specification (UTS#39) defines a "skeleton" algorithm that reduces confusable sequences to a common form so they can be compared, plus separate mechanisms for detecting mixed-script identifiers. The full confusables.txt database contains over 7,000 pairs; the most security-critical are the Latin lookalikes from the Cyrillic, Greek, and fullwidth blocks, which the Homoglyph Detector covers.
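The skeleton idea can be sketched in a few lines of Python. This is a simplified illustration, not this tool's implementation: the PROTOTYPES table is a hypothetical hand-picked map of six characters standing in for the full confusables.txt data, and the function names skeleton and are_confusable are chosen here for the example.

```python
import unicodedata

# Hypothetical, hand-picked prototype map. The real mappings come from
# confusables.txt and number in the thousands.
PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def skeleton(text: str) -> str:
    """Simplified skeleton: NFD, map each character to its prototype, NFD again."""
    decomposed = unicodedata.normalize("NFD", text)
    mapped = "".join(PROTOTYPES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def are_confusable(a: str, b: str) -> bool:
    """Two strings are treated as confusable when their skeletons are equal."""
    return skeleton(a) == skeleton(b)
```

With this table, are_confusable("paypal", "p\u0430ypal") returns True because the Cyrillic а maps to the Latin a, while "paypal" and "paypa1" stay distinct because the digit 1 is not in the sample map.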

Script categories detected

Cyrillic (highlighted red) — А В С Е Н І К М О Р Ѕ Т У Х and their lowercase equivalents. Primary source of IDN homograph attacks and phishing domains.
Greek (highlighted orange) — Α Β Ε Ζ Η Ι Κ Μ Ν Ο Ρ Τ Υ Χ and lowercase α ο ρ υ χ ν. Used in mixed-script usernames and code identifiers.
Fullwidth Latin (highlighted blue) — Ａ–Ｚ, ａ–ｚ, ０–９ (U+FF21–U+FF3A, U+FF41–U+FF5A, U+FF10–U+FF19). Designed for CJK typesetting, abused in social-media name spoofing.
Mathematical variants (highlighted purple) — 𝐀–𝐙, 𝐚–𝐳 (mathematical bold), and italic capitals. Used in stylized display names and package name spoofing.
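The four categories above can be approximated with code point ranges and Unicode character names. The sketch below is an assumption about how such a classifier might look, not the tool's actual logic: it flags every Cyrillic or Greek character rather than only the curated lookalikes, and it treats the whole Mathematical Alphanumeric Symbols block (U+1D400–U+1D7FF) as one category. The names classify and find_confusables are hypothetical.

```python
import unicodedata
from typing import List, Optional, Tuple


def classify(ch: str) -> Optional[str]:
    """Return a coarse category name for one character, or None if unflagged."""
    cp = ord(ch)
    # Fullwidth digits, capitals, and small letters.
    if 0xFF10 <= cp <= 0xFF19 or 0xFF21 <= cp <= 0xFF3A or 0xFF41 <= cp <= 0xFF5A:
        return "fullwidth"
    # Mathematical Alphanumeric Symbols block.
    if 0x1D400 <= cp <= 0x1D7FF:
        return "mathematical"
    # Fall back to the character name; broader than a curated lookalike list.
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC "):
        return "cyrillic"
    if name.startswith("GREEK "):
        return "greek"
    return None


def find_confusables(text: str) -> List[Tuple[int, str, str, str]]:
    """List (index, character, code point label, category) for flagged characters."""
    hits = []
    for index, ch in enumerate(text):
        category = classify(ch)
        if category is not None:
            hits.append((index, ch, f"U+{ord(ch):04X}", category))
    return hits
```

For example, find_confusables("p\u0430y") returns [(1, "\u0430", "U+0430", "cyrillic")]. A production detector would replace the name-prefix check with the curated character sets listed above so that legitimate Cyrillic or Greek text is not flagged wholesale.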

How to detect Unicode confusables

Paste the text you want to check into the Homoglyph Detector input area.
Click Analyze. Each confusable character is highlighted by script category with its Unicode codepoint name and the ASCII character it resembles.
Use Clean to produce an ASCII-only version, or Compare mode to diff two strings at the codepoint level.
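
The Clean and Compare steps can be approximated offline. The following is a rough sketch under stated assumptions, not the tool's own code: clean relies on NFKC normalization to fold fullwidth and mathematical variants to ASCII, uses a hypothetical six-entry LOOKALIKES table for Cyrillic and Greek, and replaces anything still non-ASCII with a placeholder; compare reports positions where two strings differ by code point.

```python
import unicodedata
from itertools import zip_longest
from typing import List, Optional, Tuple

# Hypothetical sample table; a real cleaner would load a much larger mapping.
LOOKALIKES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def clean(text: str, placeholder: str = "?") -> str:
    """Return an ASCII-only approximation of text."""
    # NFKC folds compatibility characters such as fullwidth letters to ASCII.
    folded = unicodedata.normalize("NFKC", text)
    out = []
    for ch in folded:
        ch = LOOKALIKES.get(ch, ch)
        out.append(ch if ch.isascii() else placeholder)
    return "".join(out)


def _label(ch: Optional[str]) -> Optional[str]:
    """Format a character as U+XXXX, or None when one string has ended."""
    return None if ch is None else f"U+{ord(ch):04X}"


def compare(a: str, b: str) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """List (index, code point in a, code point in b) wherever the strings differ."""
    return [
        (i, _label(x), _label(y))
        for i, (x, y) in enumerate(zip_longest(a, b))
        if x != y
    ]
```

Here compare("paypal", "p\u0430ypal") returns [(1, "U+0061", "U+0430")], pinpointing the substituted letter. The position-by-position comparison is a deliberate simplification; it does not realign the strings after an insertion or deletion.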

UTS#39 and the Unicode Confusables Data

Unicode Technical Standard #39 (Unicode Security Mechanisms) defines the confusable-detection algorithm that browser engines draw on when evaluating internationalized domain names (IDNs). The source data at unicode.org/Public/security/latest/confusables.txt is updated with each Unicode version. This tool implements a curated, security-focused subset covering the characters most frequently observed in real-world attacks.
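
For readers who want to work with the source data directly, a minimal parser might look like the sketch below. It assumes the semicolon-separated layout of confusables.txt (source code points, target code points, a type field, then a # comment) and runs on a small illustrative SAMPLE written in that layout rather than on the downloaded file; the name parse_confusables is hypothetical.

```python
from typing import Dict, Iterable

# Illustrative lines in the assumed layout, not a verbatim excerpt of the file.
SAMPLE = """\
# source ; target ; type # comment
0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C
03BF ;\t006F ;\tMA\t# GREEK SMALL LETTER OMICRON -> LATIN SMALL LETTER O
006D ;\t0072 006E ;\tMA\t# LATIN SMALL LETTER M -> r, n

"""


def _decode(field: str) -> str:
    """Turn space-separated hex code points into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusables(lines: Iterable[str]) -> Dict[str, str]:
    """Build a source -> target mapping from confusables-style lines."""
    mapping = {}
    for raw in lines:
        # Drop a leading byte order mark and anything after '#'.
        line = raw.lstrip("\ufeff").split("#", 1)[0].strip()
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) < 2:
            continue
        mapping[_decode(fields[0])] = _decode(fields[1])
    return mapping


table = parse_confusables(SAMPLE.splitlines())
```

A target may span several code points, which is why the mapping values are strings rather than single characters. To use the published data, the same function could be fed the lines of a locally saved copy of the file opened with UTF-8 encoding.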

Related tools

Homoglyph Detector — interactive detection and cleaning tool
Cyrillic Homoglyph Reference — per-character Cyrillic table
Phishing Text Checker — check suspicious URLs
Invisible Character Detector — zero-width and BiDi attacks
Unicode Lookup — look up any codepoint by name or number