jarvisbox

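Invisible Unicode Character Detector

Instantly detect hidden zero-width spaces, BiDi control characters, and other invisible Unicode characters in text. The tool shows each character's risk level and code point, and outputs cleaned text.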

100% client-side · no upload

How to use

  1. Paste or type the text you want to inspect into the input area above.
  2. Click Analyze. The tool scans every Unicode code point and lists any invisible characters it finds, along with their names and risk ratings.
  3. Review the risk table. Tag characters (Critical) and BiDi control characters (High) carry the highest risk. Copy the Cleaned text output to get the same content with all invisible characters stripped.
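
The scan-and-clean flow in the steps above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code (the tool itself runs as JavaScript in the browser). The `INVISIBLE` set is built from the code points named in the FAQ, and the names `scan` and `clean` are hypothetical.

```python
import unicodedata

# Code points taken from the FAQ list on this page; the tool's real table may differ.
INVISIBLE = (
    {0x00A0, 0x00AD, 0x200B, 0x200C, 0x200D, 0x200E, 0x200F, 0x2060, 0xFEFF}
    | set(range(0x202A, 0x202F))    # BiDi embeddings and overrides (U+202A-U+202E)
    | set(range(0x2066, 0x206A))    # BiDi isolates (U+2066-U+2069)
    | set(range(0xE0001, 0xE0080))  # Unicode tag characters (U+E0001-U+E007F)
)


def scan(text: str) -> list:
    """Return (index, 'U+XXXX', name) for each invisible character found."""
    findings = []
    for index, char in enumerate(text):
        if ord(char) in INVISIBLE:
            name = unicodedata.name(char, "UNNAMED")
            findings.append((index, f"U+{ord(char):04X}", name))
    return findings


def clean(text: str) -> str:
    """Drop every character that scan() would report."""
    return "".join(char for char in text if ord(char) not in INVISIBLE)


sample = "pass\u200bword\u202e"
for finding in scan(sample):
    print(finding)
print(clean(sample))
```

Note that this sketch deletes non-breaking spaces outright, which joins the words on either side. Replacing them with a regular space may be the better choice for prose.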

Why invisible Unicode characters matter

Text copied from web pages, documents, or AI-generated output often contains invisible Unicode characters that cause subtle bugs. Zero-width spaces (U+200B) and zero-width joiners (U+200D) break string comparisons, YAML parsing, and code identifiers. Non-breaking spaces (U+00A0) look identical to regular spaces but suppress line breaks in HTML and are not treated as whitespace by tools such as shells and YAML parsers. Most dangerously, BiDi control characters (U+202A–U+202E, U+2066–U+2069) can reorder how source code is displayed, so that what a reviewer sees differs from what the compiler processes. This is the "Trojan Source" attack, CVE-2021-42574, which affected many compilers, code editors, and code-review tools.
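
As a small illustration of the string-comparison problem, the following Python snippet compares two strings that render identically. The variable names are hypothetical and chosen only for the example.

```python
key_from_config = "api_key"
key_from_paste = "api\u200b_key"  # a zero-width space hides after "api"

# Both strings look the same on screen, but they are not equal.
print(key_from_config == key_from_paste)

# The pasted string is one character longer.
print(len(key_from_config), len(key_from_paste))

# ascii() escapes non-ASCII characters, which makes the hidden one visible.
print(ascii(key_from_paste))

# A dictionary lookup with the pasted key silently misses.
settings = {key_from_config: "secret"}
print(settings.get(key_from_paste))
```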

Risk levels explained

Critical: Unicode tag characters (U+E0001–U+E007F). Completely invisible and used in prompt-injection attacks against AI systems. Remove immediately.
High: BiDi control characters. Can reorder displayed text to deceive code reviewers (Trojan Source, CVE-2021-42574).
Medium: Zero-width joiners/non-joiners and BOM. Alter text rendering and can break string matching in source code.
Low: Non-breaking spaces, soft hyphens, and typographic spaces. Usually benign but may cause unexpected behavior in YAML, Markdown, or shell scripts.
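
One way to express these tiers in code is a range table, sketched below in Python. The function names `risk_level` and `highest_risk` are hypothetical, and the range used for "typographic spaces" (U+2000–U+200A) is an assumption, since the page does not spell it out. Characters the tool detects but this section does not assign to a tier (for example U+200B) are left unclassified here.

```python
from typing import Optional

# (level, first code point, last code point), following the tiers described above.
RISK_RANGES = [
    ("Critical", 0xE0001, 0xE007F),  # Unicode tag characters
    ("High", 0x202A, 0x202E),        # BiDi embeddings and overrides
    ("High", 0x2066, 0x2069),        # BiDi isolates
    ("Medium", 0x200C, 0x200D),      # zero-width non-joiner and joiner
    ("Medium", 0xFEFF, 0xFEFF),      # BOM / zero-width no-break space
    ("Low", 0x00A0, 0x00A0),         # non-breaking space
    ("Low", 0x00AD, 0x00AD),         # soft hyphen
    ("Low", 0x2000, 0x200A),         # typographic spaces (range is an assumption)
]

SEVERITY = ["Low", "Medium", "High", "Critical"]


def risk_level(char: str) -> Optional[str]:
    """Return the risk tier for one character, or None if it is not listed."""
    code_point = ord(char)
    for level, first, last in RISK_RANGES:
        if first <= code_point <= last:
            return level
    return None


def highest_risk(text: str) -> Optional[str]:
    """Return the most severe tier present in text, or None if nothing matches."""
    levels = [level for level in map(risk_level, text) if level is not None]
    return max(levels, key=SEVERITY.index, default=None)


print(risk_level("\u202e"))
print(highest_risk("a\u00a0b\u202ec"))
```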

Frequently asked questions

What invisible Unicode characters does this tool detect?
It detects zero-width spaces (U+200B), zero-width joiners (U+200D), zero-width non-joiners (U+200C), BiDi control characters (U+202A–U+202E, U+2066–U+2069), left-to-right and right-to-left marks (U+200E, U+200F), non-breaking spaces (U+00A0), soft hyphens (U+00AD), word joiners (U+2060), BOM/ZWNBSP (U+FEFF), and Unicode tag characters (U+E0001–U+E007F) used in invisible prompt-injection attacks.
Does my text leave my device?
No. All analysis runs entirely in your browser using JavaScript. Your text is never sent to any server.
Why are BiDi override characters marked as high risk?
Bidirectional (BiDi) control characters (U+202A–U+202E, U+2066–U+2069), which include embeddings, overrides, and isolates, can make source code appear to do one thing while actually doing another. This is the CVE-2021-42574 "Trojan Source" vulnerability. Because editors and code-review tools render the reordered text, reviewers may miss logic hidden in comments or strings.
What are Unicode tag characters and why are they dangerous?
Tag characters (U+E0001–U+E007F) are completely invisible in most editors. Each tag character from U+E0020 to U+E007E mirrors a printable ASCII character, so arbitrary ASCII text can be encoded invisibly. They have been used in "prompt injection" attacks against AI systems: hidden instructions embedded in text that language models process but humans cannot see. This tool flags them with a Critical risk rating.
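
To show why tag characters are risky, the Python sketch below hides an ASCII message in tag characters and then recovers it. It relies on the fact that U+E0020–U+E007E mirror ASCII 0x20–0x7E at a fixed offset of 0xE0000. The helper names `hide_in_tags` and `reveal_tags` are hypothetical, and the sample payload is made up for the example.

```python
TAG_BASE = 0xE0000  # offset between an ASCII character and its tag counterpart


def hide_in_tags(message: str) -> str:
    """Encode printable ASCII as tag characters (used here to build test input)."""
    return "".join(
        chr(TAG_BASE + ord(char)) for char in message if 0x20 <= ord(char) <= 0x7E
    )


def reveal_tags(text: str) -> str:
    """Decode any tag characters in text back to the ASCII they mirror."""
    return "".join(
        chr(ord(char) - TAG_BASE) for char in text if 0xE0020 <= ord(char) <= 0xE007E
    )


visible = "Please summarize this article."
payload = visible + hide_in_tags("ignore prior instructions")

# The payload renders like the visible text in most editors, yet it is longer.
print(len(visible), len(payload))

# Decoding the tag characters exposes the hidden instruction.
print(reveal_tags(payload))
```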
What does the cleaned text output contain?
The cleaned version strips every detected invisible character while keeping all visible characters and standard printable spaces intact.
