CSV Column Statistics
Per-column distinct count, blanks, min / max / average, most common value.
Overview
CSV column statistics summarises every column in a pasted CSV: the number of distinct values, the number of blanks, the min, max, and average for numeric columns, and the most common value. It's a one-glance profile of a dataset before you load it into a database or pivot it.
Analysts, data engineers, and anyone wrangling a fresh export reach for a CSV profiler to catch surprises early: a column that's supposed to be numeric but has free-text entries, a join key with unexpected duplicates, or a date column with several formats. It saves the back-and-forth of opening the file in Excel just to scan a few columns.
How it works
The tool parses the CSV per RFC 4180, treating the first row as headers. For each column it tallies blank and non-blank values, builds a frequency map of the non-blank values (its size is the distinct count), and tracks min/max/sum/count for numeric data. Blank cells are counted separately and do not enter the distinct count or the average. A value is treated as numeric if double.TryParse succeeds under invariant culture.
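The per-column tally described above can be sketched in Python. This is a minimal illustration, not the tool's own code: the names profile and parse_number and the layout of the stats dictionary are made up for the example, Python's csv module is only close to RFC 4180, and float() is a stand-in for double.TryParse that differs in edge cases (for instance, it rejects "1,000"). Skipping non-finite values such as NaN is also an assumption.

```python
import csv
import io
import math


def parse_number(text):
    """Return text as a float, or None if it does not parse.

    Stand-in for the numeric check; float() is not identical to
    double.TryParse under invariant culture.
    """
    try:
        value = float(text)
    except ValueError:
        return None
    # Assumption: NaN and infinities are not treated as numeric data.
    return value if math.isfinite(value) else None


def profile(csv_text):
    """Profile each column of a CSV string; the first row is the header."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = next(reader)
    stats = {
        name: {"blanks": 0, "freq": {}, "min": None, "max": None,
               "sum": 0.0, "count": 0}
        for name in headers
    }
    for row in reader:
        # zip() stops at the shorter sequence, so cells missing from a
        # short row are simply skipped in this sketch.
        for name, cell in zip(headers, row):
            col = stats[name]
            if cell.strip() == "":
                col["blanks"] += 1
                continue
            # Frequency map of non-blank values; its size is the distinct count.
            col["freq"][cell] = col["freq"].get(cell, 0) + 1
            number = parse_number(cell)
            if number is not None:
                col["min"] = number if col["min"] is None else min(col["min"], number)
                col["max"] = number if col["max"] is None else max(col["max"], number)
                col["sum"] += number
                col["count"] += 1
    return stats
```

For a column, the distinct count is len(stats[name]["freq"]) and the average is sum divided by count whenever count is greater than zero. How the tool itself reports a column that mixes numeric and text values is not stated here; the sketch just accumulates over the values that parse.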
The most common value is the mode of the frequency map. Ties are broken by first-seen order so the output is stable across reruns of the same input.
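One way to get that stable tie-breaking, sketched in Python under the assumption that the frequency map is an insertion-ordered dictionary (mode_first_seen is an illustrative name, not part of the tool):

```python
def mode_first_seen(values):
    """Return (value, count) for the most common value, or None if empty.

    Ties go to the value that appeared first in the input.
    """
    freq = {}
    for value in values:
        freq[value] = freq.get(value, 0) + 1
    if not freq:
        return None
    # dicts preserve insertion order and max() returns the first maximal
    # item it encounters, so ties resolve to the first-seen value.
    return max(freq.items(), key=lambda item: item[1])
```

Because the result depends only on the order of the input rows, rerunning the same file gives the same mode even when several values share the top count.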
Examples
Input:
country,age,score
US,30,87.5
US,42,91.0
CA,35,
US,30,76.2
Output:
country distinct=2 blanks=0 mode=US (3)
age distinct=3 blanks=0 min=30 max=42 avg=34.25 mode=30 (2)
score distinct=3 blanks=1 min=76.2 max=91.0 avg=84.9
Input:
status
active
active
churned
trial
Output:
status distinct=3 blanks=0 mode=active (2)
FAQ
How are blank cells defined?
A cell counts as blank if it's an empty string or whitespace only. Explicit NULL text is treated as a value, not a blank. Convert it to an empty cell first if you want it counted as missing.
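A rough Python sketch of that rule and of the suggested conversion step. Both function names and the markers parameter are hypothetical, and str.strip() may not classify exactly the same characters as whitespace that the tool does:

```python
def is_blank(cell):
    """True for an empty string or a whitespace-only string."""
    return cell.strip() == ""


def null_to_blank(cell, markers=("NULL",)):
    """Replace explicit missing-value markers with an empty string.

    The match is case-sensitive; extend markers (for example with "N/A")
    to suit the export being cleaned.
    """
    return "" if cell.strip() in markers else cell
```

Running every cell through null_to_blank before profiling makes NULL entries count as blanks instead of as a distinct value.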
Can it handle very large CSVs?
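The full file is processed in memory, so distinct counts are exact. Multi-gigabyte files will usually exhaust the browser tab's memory; in that case cut the file down first, for example with a head command, and profile the slice. The first rows are only a representative slice if the file is not sorted or grouped, so check the row order before relying on them.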
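If a head command is not to hand, the same cut can be sketched in Python. The helper name head_lines is illustrative. It works line by line, so a quoted field containing a line break could be split at the cut point.

```python
import itertools


def head_lines(lines, data_rows):
    """Return the header line plus the first data_rows lines.

    lines can be any iterable of text lines, such as an open file;
    only the lines that are kept are ever read.
    """
    return list(itertools.islice(lines, data_rows + 1))
```

With an open file handle src, "".join(head_lines(src, 10000)) gives text that is small enough to paste into the tool. The row count is an arbitrary example.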
Does it detect column types automatically?
It infers numeric vs. text by trying to parse each value. For a richer breakdown (date, integer, decimal, boolean) use the CSV type sniffer.