CSV Diff

Compare two CSV files by rows or a key column.

Overview

The CSV diff tool compares two CSV files and reports what changed: added rows, removed rows, and modified rows. Rows are matched either by exact row equality or by joining on a key column. It answers "what changed between yesterday's export and today's?" without loading both files into a spreadsheet and comparing them side by side by eye.

Sales operations teams reconciling CRM exports, data engineers validating pipeline output, and QA testers checking fixture changes reach for this every time a wide table evolves.

How it works

The parser follows the RFC 4180 dialect: comma delimiters, optional double quoting, and "" escaping inside quoted values. With a key column selected, rows from the left and right files are matched on that key; rows that exist on only one side are reported as added or removed, and rows that match by key but differ in any other field are reported as modified with the per-column changes highlighted.
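
The key-based matching described above can be sketched in a few lines of Python. This is a minimal illustration rather than the tool's actual implementation: the function names index_by_key and diff_by_key and the sample data are hypothetical, and the standard csv module stands in for the parser because its default dialect also uses comma delimiters and doubled-quote escaping.

```python
import csv
import io


def index_by_key(text, key):
    """Map each key value to its row, keeping the first occurrence."""
    rows = {}
    for row in csv.DictReader(io.StringIO(text)):
        rows.setdefault(row[key], row)
    return rows


def diff_by_key(left_text, right_text, key):
    """Return (added, removed, modified) for two CSV strings joined on key."""
    left = index_by_key(left_text, key)
    right = index_by_key(right_text, key)
    # Keys present on only one side are added or removed rows.
    added = [row for k, row in right.items() if k not in left]
    removed = [row for k, row in left.items() if k not in right]
    # Keys present on both sides are compared column by column.
    modified = {}
    for k in left.keys() & right.keys():
        columns = set(left[k]) | set(right[k])
        changes = {
            c: (left[k].get(c), right[k].get(c))
            for c in columns
            if left[k].get(c) != right[k].get(c)
        }
        if changes:
            modified[k] = changes
    return added, removed, modified


# Hypothetical sample data; the quoted field contains a comma.
yesterday = 'customer_id,name,plan\n1,Ada,free\n2,Bob,pro\n3,"Lee, Cy",free\n'
today = 'customer_id,name,plan\n1,Ada,pro\n3,"Lee, Cy",free\n4,Dee,free\n'
added, removed, modified = diff_by_key(yesterday, today, "customer_id")
print(added, removed, modified)
```

In this sample, customer 4 is reported as added, customer 2 as removed, and customer 1 as modified with the plan column changing from free to pro.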

Without a key column, the comparison falls back to whole-row equality: any row in the right file that is missing from the left is "added", and any row in the left file that is missing from the right is "removed". The headers are compared too. A column that appears on only one side is surfaced explicitly so you do not miss a schema change.
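
A rough sketch of the keyless mode, again in Python with a hypothetical function name (diff_whole_rows). It assumes each file starts with a header row and, for simplicity, compares rows as tuples in file order using set membership; how the tool itself treats repeated identical rows is not stated here, so that part is an assumption.

```python
import csv
import io


def diff_whole_rows(left_text, right_text):
    """Compare two CSV strings by whole-row equality and report header changes."""
    left_header, *left_rows = [tuple(r) for r in csv.reader(io.StringIO(left_text))]
    right_header, *right_rows = [tuple(r) for r in csv.reader(io.StringIO(right_text))]
    left_set, right_set = set(left_rows), set(right_rows)
    return {
        # Rows found on only one side.
        "added": [r for r in right_rows if r not in left_set],
        "removed": [r for r in left_rows if r not in right_set],
        # Columns found on only one side signal a schema change.
        "columns_only_in_left": [c for c in left_header if c not in right_header],
        "columns_only_in_right": [c for c in right_header if c not in left_header],
    }


# Hypothetical sample data with a renamed header.
old = "id,email_address\n1,a@x.io\n2,b@x.io\n"
new = "id,email\n1,a@x.io\n3,c@x.io\n"
print(diff_whole_rows(old, new))
```

With this sample, the row for id 3 is added, the row for id 2 is removed, and the email_address to email rename shows up as one column on each side.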

Examples

Diff customers_yesterday.csv and customers_today.csv on customer_id to see new signups and churned accounts.
Compare a fixture file against its updated version to confirm a test data change is intentional.
Catch a renamed header (email_address → email) before it breaks a downstream consumer.
Verify that a deduplication script removed exactly the duplicate rows you expected.

FAQ

Do the two files need identical column orders?
No. Columns are matched by header name, not position, so re-ordered columns are not reported as a diff.
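
As an illustration of matching by header name, the Python sketch below reads each row into a dictionary keyed by column name, so position no longer matters. The helper name rows_as_dicts and the sample data are hypothetical, and this is not necessarily how the tool does it internally.

```python
import csv
import io


def rows_as_dicts(text):
    """Read CSV text into a list of {header: value} dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


# Same data, different column order (hypothetical sample).
left = "id,name,plan\n1,Ada,free\n"
right = "plan,id,name\nfree,1,Ada\n"

# Dictionary equality ignores key order, so the rows compare as equal.
print(rows_as_dicts(left) == rows_as_dicts(right))
```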

What if the key column has duplicate values?
The first occurrence wins on each side; later rows with the same key are ignored. If your "key" is not actually unique, consider a composite key, or deduplicate both files before diffing.
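
One way to check whether a candidate key is unique before diffing is to count key values. The Python sketch below is only a pre-check you might run yourself; duplicate_keys and the sample orders data are hypothetical and not part of the tool.

```python
import csv
import io
from collections import Counter


def duplicate_keys(text, key_columns):
    """Return the key tuples that occur more than once in the CSV text."""
    counts = Counter(
        tuple(row[c] for c in key_columns)
        for row in csv.DictReader(io.StringIO(text))
    )
    return [key for key, n in counts.items() if n > 1]


orders = (
    "customer_id,order_date,total\n"
    "7,2024-01-01,10\n"
    "7,2024-01-02,15\n"
    "8,2024-01-01,20\n"
)
# customer_id alone repeats, so it is not a safe key for this file.
print(duplicate_keys(orders, ["customer_id"]))
# The composite key (customer_id, order_date) is unique.
print(duplicate_keys(orders, ["customer_id", "order_date"]))
```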

Are leading and trailing spaces significant?
Yes, by default. A field with a trailing space is treated as different from the same field without it — this catches subtle data-quality issues that would otherwise stay hidden.
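
If surrounding whitespace is noise in your data rather than a signal, one option is to trim both files before diffing. A minimal Python sketch of such a pre-processing step, with a hypothetical helper name (strip_fields):

```python
import csv
import io


def strip_fields(text):
    """Return the CSV text with leading and trailing spaces removed from every field."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([field.strip() for field in row])
    return out.getvalue()


# "Ada " and " Bob" lose their stray spaces; everything else is unchanged.
print(strip_fields("id,name\n1,Ada \n2, Bob\n"))
```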

Does it handle very large files?
Both files are read into memory for the diff, so very large exports (millions of rows) are better handled with a streaming command-line tool. Typical CRM and analytics exports comfortably fit.

What about different delimiters?
The diff assumes commas. Convert semicolon- or tab-delimited files to standard CSV first, or normalise them with a separate dialect-conversion step.
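
A conversion step of this kind can be sketched with Python's csv module. The function name to_comma_csv is hypothetical, and the sketch assumes the input uses double quotes for quoting, as standard CSV does.

```python
import csv
import io


def to_comma_csv(text, delimiter):
    """Re-write delimited text as comma-separated CSV, re-quoting where needed."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(csv.reader(io.StringIO(text), delimiter=delimiter))
    return out.getvalue()


semicolon_text = 'id;name;note\n1;Ada;"likes; semicolons"\n2;Bob;a,b\n'
# Fields that contain a comma are quoted in the output so they stay one field.
print(to_comma_csv(semicolon_text, ";"))
print(to_comma_csv("id\tname\n1\tAda\n", "\t"))
```

Going through a real CSV reader and writer, rather than a plain find-and-replace on the delimiter, keeps fields that contain commas or the old delimiter intact.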
