Free Tool

Duplicate Content Deduplicator

Upload your Screaming Frog Near Duplicates or Semantically Similar CSV. Mirror pairs are removed, URLs are clustered, and you get a clean export in seconds. Everything runs in your browser.

Your CSV is processed entirely in your browser. Nothing is uploaded to any server.

What This Tool Does

Screaming Frog exports every duplicate pair twice, once in each direction. This tool removes the redundant rows and groups related URLs into clusters.

Removes Mirror Pairs

If page A is similar to B, Screaming Frog creates rows for both A→B and B→A. This tool keeps only one, cutting your spreadsheet in half.
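As a rough illustration of the idea (the tool itself runs as JavaScript in the browser, so this Python sketch is not its actual code), mirror removal can be done by treating each pair as unordered. The function name `remove_mirror_pairs` and the `(url_a, url_b, score)` tuple layout are assumptions made for this example.

```python
def remove_mirror_pairs(rows):
    """Keep one row per unordered URL pair.

    `rows` is a list of (url_a, url_b, score) tuples (a hypothetical layout).
    """
    seen = set()
    unique = []
    for url_a, url_b, score in rows:
        # A->B and B->A produce the same frozenset, so they share one key.
        key = frozenset((url_a, url_b))
        if key in seen:
            continue
        seen.add(key)
        unique.append((url_a, url_b, score))
    return unique


rows = [
    ("https://example.com/a", "https://example.com/b", 95.0),
    ("https://example.com/b", "https://example.com/a", 95.0),  # mirror of row 1
    ("https://example.com/a", "https://example.com/c", 91.0),
]
deduped = remove_mirror_pairs(rows)
```

In this sketch the first row seen for a pair is the one kept, so the output order follows the input order.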

Clusters by Primary URL

URLs with the most similarity connections become cluster primaries. All their similar pages are grouped beneath them, so you see the full picture at a glance.
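One way to picture this grouping is a greedy pass over the deduplicated pairs: pick the URL that appears in the most remaining pairs, collect its partners, and repeat. The Python sketch below is illustrative only and is not the tool's own code. The alphabetical tie-break, the descending sort by score, and the names `cluster_pairs`, `primary`, and `similar` are all assumptions.

```python
from collections import defaultdict


def cluster_pairs(pairs):
    """Greedily group (url_a, url_b, score) pairs under the best-connected URL."""
    remaining = list(pairs)
    clusters = []
    while remaining:
        # Degree = number of remaining pairs each URL takes part in.
        degree = defaultdict(int)
        for a, b, _ in remaining:
            degree[a] += 1
            degree[b] += 1
        # Highest degree wins; ties are broken alphabetically (an assumption).
        primary = min(degree, key=lambda u: (-degree[u], u))
        members, leftover = [], []
        for a, b, score in remaining:
            if a == primary:
                members.append((b, score))
            elif b == primary:
                members.append((a, score))
            else:
                leftover.append((a, b, score))
        members.sort(key=lambda m: -m[1])  # most similar first (assumed order)
        clusters.append({"primary": primary, "similar": members})
        remaining = leftover
    return clusters


pairs = [
    ("/a", "/b", 91.0),
    ("/a", "/c", 97.0),
    ("/d", "/a", 93.0),
    ("/e", "/f", 99.0),
]
clusters = cluster_pairs(pairs)
```

Here `/a` takes part in three pairs, so it becomes the first primary; the remaining `/e` and `/f` pair then forms its own cluster.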

Clean CSV Export

Export the deduplicated results as a CSV with one row per cluster: primary URL, total similar count, and each similar URL with its similarity score.
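The one-row-per-cluster layout could look something like the Python sketch below. The exact column names and ordering of the real export are not documented here, so the headers (`Primary URL`, `Similar Count`, `Similar URL 1`, `Similarity 1`, and so on) and the `clusters_to_csv` helper are hypothetical.

```python
import csv
import io


def clusters_to_csv(clusters):
    """Write one row per cluster; column names are illustrative assumptions."""
    widest = max((len(c["similar"]) for c in clusters), default=0)
    header = ["Primary URL", "Similar Count"]
    for i in range(1, widest + 1):
        header += [f"Similar URL {i}", f"Similarity {i}"]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for c in clusters:
        row = [c["primary"], len(c["similar"])]
        for url, score in c["similar"]:
            row += [url, score]
        # Pad shorter clusters so every row has the same number of columns.
        row += [""] * (len(header) - len(row))
        writer.writerow(row)
    return out.getvalue()


clusters = [
    {"primary": "/a", "similar": [("/b", 95.0), ("/c", 91.0)]},
    {"primary": "/d", "similar": [("/e", 99.0)]},
]
csv_text = clusters_to_csv(clusters)
```

Because clusters differ in size, this sketch widens the header to fit the largest cluster and leaves trailing cells empty for the smaller ones.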

How It Works

Three Steps to Clean Data

1. Export from Screaming Frog

In Screaming Frog, go to Content → Near Duplicates or Semantically Similar. Export the report as CSV.

2. Upload the CSV

Drop the file onto this page or click to browse. The tool auto-detects which report type it is.
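How the tool detects the report type is not spelled out, but a plausible approach is to check the CSV header against the column names listed in the FAQ for each export. The Python sketch below assumes exactly that; `detect_report_type`, `REPORT_COLUMNS`, and the returned labels are hypothetical names.

```python
import csv
import io

# Column names as listed in the FAQ for each Screaming Frog export.
REPORT_COLUMNS = {
    "near_duplicates": (
        "Address",
        "Near Duplicate Address",
        "Closest Similarity Match",
    ),
    "semantically_similar": (
        "Address",
        "Closest Semantically Similar Address",
        "Semantic Similarity Score",
    ),
}


def detect_report_type(csv_text):
    """Return the report type whose columns all appear in the header, else None."""
    # Drop a leading byte-order mark, which some CSV exports include.
    reader = csv.reader(io.StringIO(csv_text.lstrip("\ufeff")))
    header = {name.strip() for name in next(reader, [])}
    for report_type, columns in REPORT_COLUMNS.items():
        if all(col in header for col in columns):
            return report_type
    return None


sample = "Address,Near Duplicate Address,Closest Similarity Match\n/a,/b,95\n"
report_type = detect_report_type(sample)
```

Returning `None` for an unrecognized header lets the caller show a clear error instead of guessing at the file's layout.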

3. Review & Export

See your clusters, search or sort them, and download the clean CSV with one click.

FAQ

Frequently Asked Questions

The tool supports two export types: Near Duplicates (columns: Address, Near Duplicate Address, Closest Similarity Match) and Semantically Similar (columns: Address, Closest Semantically Similar Address, Semantic Similarity Score). Both come from the content analysis reports in Screaming Frog SEO Spider.

When Screaming Frog finds that pages A and B are similar, it creates two rows: one showing that A is similar to B, and another showing that B is similar to A. This bidirectional reporting doubles the row count and makes analysis harder. The tool removes the mirrored rows and groups the remaining pairs into clusters.

To build the clusters, the tool constructs a graph of unique URL pairs. The URL with the most connections (the highest degree) becomes the cluster primary. All of its similar URLs are grouped beneath it, sorted by similarity score. This repeats until every pair is assigned to a cluster.

No data is uploaded. Everything runs entirely in your browser using JavaScript, so your CSV data never leaves your machine. There are no server calls, no tracking, and no data storage.

Once you have the results, review each cluster to decide which URL should be the canonical version. For near-duplicate clusters, consider consolidating content, setting up redirects, or adding canonical tags. For semantically similar content, evaluate whether the pages are cannibalizing each other's keywords and, if so, differentiate them.
