Remove Duplicate Lines from Any Text
Paste a list, log, or any block of text and strip out repeated lines in one click. Line order is preserved by default, and toggles control case-insensitive comparison, whitespace trimming, and whether the first or last copy is kept. Every keystroke is processed locally in your browser.
How it works
The tool is built around three standard-library primitives: String.prototype.split to break the input into lines, Set to remember which canonical keys have already been seen, and Array.prototype.join to stitch the kept lines back together. Nothing else is on the critical path — no third-party library, no network call, no server.
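Put together, the three primitives give a minimal sketch. The function name dedupeExact is hypothetical, and the sketch covers only exact-match, keep-first deduplication. It always re-joins with LF and ignores the toggles and the trailing-newline handling.

```typescript
// Hypothetical minimal sketch: exact-match, keep-first dedupe.
function dedupeExact(text: string): string {
  const seen = new Set<string>(); // lines already emitted
  const kept: string[] = [];
  for (const line of text.split(/\r?\n/)) { // split on LF or CRLF
    if (!seen.has(line)) {
      seen.add(line);
      kept.push(line);
    }
  }
  return kept.join("\n"); // this sketch always re-joins with LF
}

console.log(dedupeExact("b\na\nb\nc")); // prints b, a, c on separate lines
```

Because a Set iterates in insertion order, `Array.from(new Set(lines))` would give the same result here. The explicit loop is shown because the full tool looks up a transformed comparison key while emitting the original line.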
- Split. The input is split on the regular expression /\r?\n/, which matches LF (Unix/macOS) and CRLF (Windows) line endings, including any mix of the two. A lone CR (classic Mac OS) is not treated as a line break. A single trailing newline is detected and stripped before splitting, so a file like "a\nb\n" is reported as two lines, not three.
- Canonicalise. Each line is turned into a comparison key. If the Trim toggle is on, leading and trailing whitespace is removed from the key. If the Case-sensitive toggle is off, the key is lower-cased with ECMA-262 String.prototype.toLowerCase, which applies the locale-independent Unicode lowercase mapping (lower-casing, not full case folding). The original line text is kept unchanged; only the key used for comparison is transformed.
- Walk. The lines are scanned once in the chosen direction (left-to-right for Keep first, right-to-left for Keep last). For each line the key is looked up in a Set: if absent, the key is added and the original line is recorded as kept; if present, the line is dropped. With Keep blanks enabled, blank lines bypass this check, so they never enter the set and are always kept. Set lookups and insertions take constant time on average, so the single pass is linear in the size of the input; a two-million-character paste finishes in tens of milliseconds.
- Re-emit. The kept lines are joined with a single line terminator: CRLF if any CRLF appeared in the input, otherwise LF. A single trailing newline is re-added if the input had one. When no duplicates are found and the input uses one line-ending style throughout, the output is byte-identical to the input, which keeps diffs minimal for round-trip editing.
- Cross-check. A second algorithm, a frequency Map that counts how often each canonical key occurs, computes the removed count independently (a key seen k times contributes k - 1 removals). The Verified badge stays green only when both algorithms agree. Agreement between two independent computations is a strong signal that the result you see is correct.
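The five steps can be sketched end to end in TypeScript. This is not the tool's source: the names dedupeLines, canonicalKey, removedByFrequency, isBlank, and the DedupeOptions fields are hypothetical. Two details are assumptions as well: a line counts as blank when it is empty or whitespace-only, and under Keep last the surviving lines are emitted in their original order.

```typescript
// Hypothetical option names mirroring the page's toggles.
interface DedupeOptions {
  caseSensitive: boolean; // off: keys are lower-cased with toLowerCase()
  trim: boolean;          // on: keys ignore leading/trailing whitespace
  keepLast: boolean;      // on: scan right-to-left so the last copy survives
  keepBlanks: boolean;    // on: blank lines bypass the Set and are always kept
}

interface DedupeResult {
  output: string;
  removed: number;   // counted by the Set walk
  verified: boolean; // true when the frequency-Map count agrees
}

// Assumption: "blank" means empty or whitespace-only.
function isBlank(line: string): boolean {
  return line.trim() === "";
}

// Canonicalise: transform only the comparison key, never the line itself.
function canonicalKey(line: string, opts: DedupeOptions): string {
  let key = opts.trim ? line.trim() : line;
  if (!opts.caseSensitive) key = key.toLowerCase();
  return key;
}

// Cross-check: a key seen k times contributes k - 1 removals.
function removedByFrequency(lines: string[], opts: DedupeOptions): number {
  const counts = new Map<string, number>();
  for (const line of lines) {
    if (opts.keepBlanks && isBlank(line)) continue;
    const key = canonicalKey(line, opts);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  let removed = 0;
  counts.forEach((k) => { removed += k - 1; });
  return removed;
}

function dedupeLines(input: string, opts: DedupeOptions): DedupeResult {
  // Split: drop one trailing newline, then split on LF or CRLF.
  const hadTrailingNewline = /\r?\n$/.test(input);
  const body = hadTrailingNewline ? input.replace(/\r?\n$/, "") : input;
  const lines = body.split(/\r?\n/);

  // Walk: visit indices in the chosen direction, remembering seen keys.
  const seen = new Set<string>();
  const kept = lines.map(() => false);
  const order = lines.map((_, i) => i);
  if (opts.keepLast) order.reverse();
  for (const i of order) {
    if (opts.keepBlanks && isBlank(lines[i])) {
      kept[i] = true; // blank lines never enter the Set
      continue;
    }
    const key = canonicalKey(lines[i], opts);
    if (!seen.has(key)) {
      seen.add(key);
      kept[i] = true;
    }
  }
  // Marking by index keeps survivors in their original order.
  const keptLines = lines.filter((_, i) => kept[i]);

  // Re-emit: CRLF if any CRLF appeared in the input, otherwise LF.
  const eol = input.includes("\r\n") ? "\r\n" : "\n";
  const output = keptLines.join(eol) + (hadTrailingNewline ? eol : "");

  const removed = lines.length - keptLines.length;
  return { output, removed, verified: removed === removedByFrequency(lines, opts) };
}
```

For example, with case-insensitive comparison and trimming on, the input "apple\nBanana\n apple\nbanana\ncherry\n" becomes "apple\nBanana\ncherry\n" under Keep first and " apple\nbanana\ncherry\n" under Keep last, with removed equal to 2 and verified true in both cases. Recording survivors by index rather than pushing them during a reversed scan is one simple way to keep Keep last order-preserving without a second reversal.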
One deliberate non-feature: the tool does not apply Unicode normalisation (NFC) before comparing. If you paste the letter "é" once as the precomposed code point U+00E9 and once as the decomposed pair U+0065 U+0301, the lines remain distinct, even though they usually render identically. That matches byte-level tools such as diff, and it avoids silently collapsing intentionally different encodings, which would be the wrong default for source code and data-processing tasks.
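The difference is easy to see in a few lines of TypeScript. The variable names are illustrative. String.prototype.normalize is the standard ECMAScript method, used here only to show the step the tool deliberately skips.

```typescript
const precomposed: string = "caf\u00E9"; // é as the single code point U+00E9
const decomposed: string = "cafe\u0301"; // e followed by U+0301 COMBINING ACUTE ACCENT

// Plain string equality, which is also how a Set compares string keys.
console.log(precomposed === decomposed);            // false
console.log(precomposed.length, decomposed.length); // 4 5

// A Set therefore keeps both spellings as separate keys.
console.log(new Set([precomposed, decomposed]).size); // 2

// Normalising to NFC first would merge them; the tool does not do this.
const nfcKeys = [precomposed, decomposed].map((s) => s.normalize("NFC"));
console.log(new Set(nfcKeys).size); // 1
```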
One thing that is Unicode-aware: case-insensitive matching works across scripts. Comparing "CAFÉ" against "café" with case-insensitive mode on correctly treats them as duplicates, because toLowerCase maps the accented capital É to é just as it maps ASCII C to c.
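A quick check of what lower-casing does and does not merge, using a hypothetical sameIgnoringCase helper that compares toLowerCase() results in the way the Canonicalise step describes for the case-insensitive mode. The last line shows a limit worth knowing: lower-casing is not full Unicode case folding, so "STRASSE" and "straße" stay distinct.

```typescript
// Hypothetical helper: two lines match when their lower-cased keys are equal.
const sameIgnoringCase = (a: string, b: string): boolean =>
  a.toLowerCase() === b.toLowerCase();

console.log(sameIgnoringCase("CAF\u00C9", "caf\u00E9")); // true: É (U+00C9) lower-cases to é (U+00E9)
console.log(sameIgnoringCase("МОСКВА", "москва"));       // true: Cyrillic letters have case too
console.log(sameIgnoringCase("STRASSE", "straße"));      // false: "ß" is already lowercase, "SS" becomes "ss"
```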
Sources & references
- ECMA-262 — Set objects (insertion-ordered iteration)
- ECMA-262 — String.prototype.split (regex separator semantics)
- ECMA-262 — String.prototype.toLowerCase (Unicode case mapping)
- WHATWG Encoding Standard — line terminators (LF, CR, CRLF)
- Unicode TR-10 — Unicode Collation Algorithm (informative)
- MDN — Set reference (Set.prototype.add, has, size)
The dedupe semantics and line-ending handling on this page were last cross-checked on 2026-05-11. The page is reviewed whenever a TC39 proposal changes the relevant String or Set algorithms.