You copy text from a PDF, a website or a Word document — and then paste it somewhere else. Instead of clean text, you get weird symbols, double spaces, random line breaks in the middle of sentences and formatting that refuses to cooperate. This happens to everyone, and it's not your fault. It's a technical mismatch between different text encoding and formatting systems.
Here's what's actually going on — and how to fix it.
Text is more than the letters you can see. Documents carry invisible formatting codes alongside every character — spacing rules, font information, line-ending conventions and special character encodings. When you copy text from one system and paste it into another, those invisible codes either conflict with the new system or appear as garbled characters.
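To see those invisible codes for yourself, a small Python sketch can list every character in a string that is not plain printable ASCII. The function name `describe_hidden` and the sample string are made up for illustration; only the standard `unicodedata` module is assumed.

```python
import unicodedata

def describe_hidden(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for every character
    outside the printable ASCII range (space through tilde)."""
    report = []
    for i, ch in enumerate(text):
        if not (" " <= ch <= "~"):
            # Control characters such as \r and \n have no Unicode name,
            # so a fallback label is supplied.
            name = unicodedata.name(ch, "UNNAMED CONTROL CHARACTER")
            report.append((i, f"U+{ord(ch):04X}", name))
    return report

# Hypothetical sample: a non-breaking space, curly quotes and a
# Windows-style line ending hiding in what looks like ordinary text.
sample = "Price:\u00a0\u201c10\u201d\r\n"
for row in describe_hidden(sample):
    print(row)
```

Running it on pasted text that "looks fine" is often the quickest way to find out which of the problems below you are dealing with.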
The specific problems differ depending on where the text came from:
PDFs don't store text linearly — they store character positions on a page. When you copy-paste from a PDF, line breaks are inserted at each visual line end, even in the middle of sentences. Hyphenated words that were split across lines stay split. Columns merge awkwardly. Non-standard fonts may map characters incorrectly, turning certain letters into symbols or question marks.
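The line-break and hyphenation problems can be approximated with a few regular expressions. This is a minimal sketch, not a full PDF extractor: the name `unwrap_pdf_text` is hypothetical, it assumes paragraphs are separated by blank lines, and it will wrongly merge genuine compounds such as "well-known" when they happen to break at a line end. It does nothing for merged columns or badly mapped fonts.

```python
import re

def unwrap_pdf_text(text: str) -> str:
    """Rejoin lines that a PDF copy-paste broke mid-sentence."""
    # Treat Windows and old Mac line endings the same as "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # "encod-\ning" -> "encoding": drop a hyphen that sits at a line end
    # between two word characters.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # A lone newline is a visual line end, so turn it into a space.
    # Blank lines (two or more newlines) are kept as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Joining lines can leave doubled spaces behind.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text.strip()

pasted = "Text encod-\ning is a\ncommon problem.\n\nNew paragraph."
print(unwrap_pdf_text(pasted))
```

The order matters: hyphenated words are rejoined before the remaining line breaks become spaces, otherwise "encod- ing" would survive with a stray space in it.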
Microsoft Word uses "smart quotes" (curly “ ” ‘ ’) rather than straight ones (" and '). When pasted into plain-text editors, databases or code, smart quotes often appear as â€œ or similar. That garbage is the quote's UTF-8 bytes being read as if they were Windows-1252, a classic encoding mismatch. Word also inserts non-breaking spaces (hard to detect visually) and em-dashes, which are different characters from ordinary spaces and plain hyphens.
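Both Word problems can be sketched in a few lines of Python. The names `fix_mojibake` and `straighten` are illustrative, and mapping dashes to a plain hyphen is an editorial choice rather than a rule. The mojibake repair assumes the damage came from UTF-8 bytes read as Windows-1252. It gives up and returns the text unchanged if the round trip fails, which also happens when garbled and intact special characters are mixed in the same string.

```python
# Curly quotes, non-breaking space and dashes mapped to ASCII stand-ins.
SMART_TO_PLAIN = str.maketrans({
    "\u201c": '"', "\u201d": '"',   # left / right double quotation marks
    "\u2018": "'", "\u2019": "'",   # left / right single quotation marks
    "\u00a0": " ",                  # non-breaking space
    "\u2014": "-", "\u2013": "-",   # em dash, en dash
})

def fix_mojibake(text: str) -> str:
    """Undo UTF-8 text that was mistakenly decoded as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not this kind of damage (or only partly damaged): leave as is.
        return text

def straighten(text: str) -> str:
    """Repair mojibake where possible, then replace typographic characters."""
    return fix_mojibake(text).translate(SMART_TO_PLAIN)

# "\u00e2\u20ac\u2122" is what a curly apostrophe looks like after the
# wrong decoding (it displays as a-circumflex, euro sign, trademark sign).
print(straighten("It\u00e2\u20ac\u2122s here"))                # It's here
print(straighten("\u201cfine\u201d\u00a0now \u2014 ok"))      # "fine" now - ok
```

For heavily damaged text, a dedicated library is a safer bet than this round trip, since some garbled sequences contain bytes that Python's `cp1252` codec cannot encode back.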
Copying from web pages brings along HTML structure: paragraph tags become double line breaks, list items gain their own line break, hyperlinks lose their URL but retain their anchor text, and styled text (bold, italic) may paste as plain or retain markdown-style asterisks depending on the receiving editor.
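If you have the page's HTML rather than the pasted result, you can control that conversion yourself. The sketch below uses Python's standard `html.parser` module. The class name `TextExtractor`, the function `html_to_text` and the choice to print a link's URL in parentheses after its anchor text are all assumptions made for illustration. It handles only paragraphs, line breaks, list items and links, and silently drops bold and italic markup.

```python
import re
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect plain text, keeping paragraph breaks, list items and link URLs."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "br"):
            self.parts.append("\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a" and self.href:
            # Keep the URL that a normal copy-paste would lose.
            self.parts.append(f" ({self.href})")
            self.href = None
        elif tag == "p":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Never leave more than one blank line between blocks.
    return re.sub(r"\n{3,}", "\n\n", text).strip()

page = '<p>See <a href="https://example.com">the docs</a>.</p><ul><li>One</li><li>Two</li></ul>'
print(html_to_text(page))
```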
Autocorrect and autocapitalise run silently on mobile text input. Pasted text from phone keyboards often has incorrectly capitalised words mid-sentence, smart apostrophes, and extra spaces inserted around punctuation.
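The spacing part of that is easy to automate; the capitalisation part is not, because a script cannot reliably tell a stray capital from a proper noun. A minimal sketch for the spacing, with the hypothetical name `tidy_punctuation_spacing`, assuming English punctuation conventions (French, for example, deliberately puts a space before "?" and "!"):

```python
import re

def tidy_punctuation_spacing(text: str) -> str:
    """Remove spaces before punctuation and collapse repeated spaces."""
    # "word ," -> "word,"  (spaces and tabs only, so line breaks survive)
    text = re.sub(r"[ \t]+([,.;:!?])", r"\1", text)
    # "a   b" -> "a b"
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text

print(tidy_punctuation_spacing("Hello , world !  How are you ?"))
# Hello, world! How are you?
```

Be careful running this on technical text: a name that starts with a dot, such as " .NET", would be pulled up against the previous word.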
For a handful of problematic characters, find-and-replace works fine. For large volumes of text (a 50-page PDF report, a scraped webpage, a client's Word document), manual cleaning is too slow and error-prone. A text cleaner processes the entire document at once, applying multiple fixes in sequence: removing extra spaces, repairing broken line breaks, straightening smart quotes and fixing encoding issues.
The result is clean, portable plain text that will paste cleanly anywhere — a CMS, a database, a code editor or another document.
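The "fixes in sequence" idea can be sketched as a list of named steps applied one after another. This is an illustration of the pattern, not how any particular text cleaner is implemented: `DEFAULT_STEPS` and `clean` are made-up names, and the five whitespace steps are only a sample of what a real tool would run.

```python
import re
from typing import Callable

Step = tuple[str, Callable[[str], str]]

# Order matters: non-breaking spaces become ordinary spaces first, so the
# later whitespace steps can see and remove them.
DEFAULT_STEPS: list[Step] = [
    ("line endings", lambda t: t.replace("\r\n", "\n").replace("\r", "\n")),
    ("non-breaking spaces", lambda t: t.replace("\u00a0", " ")),
    ("trailing whitespace", lambda t: re.sub(r"[ \t]+$", "", t, flags=re.MULTILINE)),
    ("repeated spaces", lambda t: re.sub(r"[ \t]{2,}", " ", t)),
    ("extra blank lines", lambda t: re.sub(r"\n{3,}", "\n\n", t)),
]

def clean(text: str, steps: list[Step] = DEFAULT_STEPS) -> tuple[str, list[str]]:
    """Run every step in order; also report which steps changed the text."""
    applied = []
    for name, step in steps:
        result = step(text)
        if result != text:
            applied.append(name)
        text = result
    return text, applied

cleaned, applied = clean("A\u00a0 B  \r\n\r\n\r\n\r\nC")
print(repr(cleaned))   # 'A B\n\nC'
print(applied)
```

Returning the list of steps that actually fired makes it easier to check what was changed before you trust the output on a long document.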
Paste messy text and get clean text back in one click — removes extra spaces, broken line breaks, smart quotes and encoding issues.