Sentence Splitter
Split paragraphs into sentences.
Overview
Split paragraphs into individual sentences, one per line. The splitter handles common end-of-sentence punctuation (period, question mark, exclamation mark) and is careful around abbreviations ("Dr.", "U.S.A.", "etc.") that end in a period but do not necessarily end a sentence.
NLP practitioners pre-processing a corpus, transcribers cleaning up auto-captioned speech, researchers building training data, and content writers reformatting long-form prose into more readable lines all reach for it. It's a small but surprisingly hard task once you account for abbreviations and quoted speech.
How it works
A naive splitter just breaks on ., !, and ?. A real one has to handle many edge cases: titles ("Mr. Smith"), initials ("J.R.R. Tolkien"), abbreviations ("Inc.", "vs.", "i.e."), ellipses, quotation marks around full sentences, and dates ("Sept. 10, 1999"). The tool combines lookup tables for known abbreviations, capitalisation hints (a period followed by whitespace and a capitalised word usually marks a sentence end), and language-specific heuristics.
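The combination of lookup table and capitalisation hint can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the tool's implementation: the table `NON_TERMINAL`, the `INITIALS` pattern, and the functions `is_boundary` and `split_sentences` are hypothetical. A candidate boundary is terminal punctuation plus optional closing quotes followed by whitespace; it is accepted only if the next word starts with a capital letter and, for a single period, the word before it is neither a listed abbreviation nor an initial.

```python
import re

# Hypothetical lookup table (lowercase, final period dropped). These words are
# almost always followed by more of the same sentence.
NON_TERMINAL = {"dr", "mr", "mrs", "ms", "prof", "st", "vs", "e.g", "i.e"}

# Initials such as "J" or "J.R.R" (the final period is not part of the word).
INITIALS = re.compile(r"^(?:[A-Za-z]\.)*[A-Za-z]$")

# Candidate boundary: terminal punctuation, optional closing quotes/brackets,
# then at least one whitespace character.
CANDIDATE = re.compile(r"([.!?]+)[\"')\]]*\s+")


def is_boundary(text: str, start: int, m: re.Match) -> bool:
    """Decide whether the candidate match m really ends a sentence."""
    # Capitalisation hint: skip opening quotes, then expect a capital letter.
    following = text[m.end():].lstrip("\"'(")
    if not following[:1].isupper():
        return False
    if m.group(1) == ".":
        # The word that the period is attached to, e.g. "Dr" or "J.R.R".
        preceding = text[start:m.start()].split()
        word = preceding[-1].lstrip("\"'(") if preceding else ""
        if word.lower() in NON_TERMINAL or INITIALS.match(word):
            return False
    return True


def split_sentences(text: str) -> list[str]:
    """Split text into sentences using the two heuristics above."""
    sentences, start = [], 0
    for m in CANDIDATE.finditer(text):
        if is_boundary(text, start, m):
            sentences.append(text[start:m.end()].strip())
            start = m.end()
    # Whatever follows the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences('Dr. Smith arrived. He said "Hello!" then sat down. Was it cold?'))
```

On the first example under Examples this prints the same three sentences. "etc." is deliberately left out of the table, so it ends a sentence when a capitalised word follows, as in the second example. One known weakness of the sketch: the initials pattern also matches "U.S.A", so a sentence ending in "U.S.A." is never split there.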
Examples
Input: Dr. Smith arrived. He said "Hello!" then sat down. Was it cold?
Output:
Dr. Smith arrived.
He said "Hello!" then sat down.
Was it cold?
Input: I love coffee, tea, etc. Especially in winter.
Output:
I love coffee, tea, etc.
Especially in winter.
FAQ
Will it split on every period?
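No. Periods in decimal numbers (3.14), the dots of an ellipsis, and periods in known abbreviations are recognized and are not treated as sentence boundaries on their own. An abbreviation can still close a sentence when the next word clearly starts a new one, as "etc." does in the second example.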
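To make these cases concrete, here is a hypothetical helper that labels a single period by looking at its neighbours. It is a sketch that assumes a small made-up abbreviation list; `classify_period`, `KNOWN_ABBREVIATIONS`, and the label strings are illustrative names, not part of the tool. Only a period labelled "candidate" (or an "abbreviation" followed by a clear sentence start) would go on to be considered as a boundary.

```python
import re

# Hypothetical abbreviation list for this sketch (lowercase, final period dropped).
KNOWN_ABBREVIATIONS = {"dr", "mr", "mrs", "etc", "inc", "vs", "e.g", "i.e", "sept"}


def classify_period(text: str, i: int) -> str:
    """Label the period at text[i] as decimal, ellipsis, inside-token, abbreviation, or candidate."""
    if text[i] != ".":
        raise ValueError("text[i] must be a period")
    prev_char = text[i - 1] if i > 0 else ""
    next_char = text[i + 1] if i + 1 < len(text) else ""
    # Digits on both sides: a decimal point, as in 3.14.
    if prev_char.isdigit() and next_char.isdigit():
        return "decimal"
    # An adjacent period: one dot of an ellipsis ("...").
    if prev_char == "." or next_char == ".":
        return "ellipsis"
    # Followed directly by a letter or digit: inside a token such as "e.g." or "U.S.A.".
    if next_char.isalnum():
        return "inside-token"
    # The word ending at this period, e.g. "etc" or "e.g".
    word = re.search(r"[A-Za-z.]*$", text[:i]).group().lower()
    if word in KNOWN_ABBREVIATIONS:
        return "abbreviation"
    return "candidate"


print(classify_period("3.14", 1))
```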
What about languages other than English?
The default heuristics are tuned for English. Other languages follow different punctuation conventions, such as the inverted marks (¿, ¡) that open Spanish questions and exclamations, or the Chinese full stop (。), which is normally not followed by a space; language-specific modes handle these where available.
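As a rough illustration of why a separate mode is needed, the sketch below splits Chinese text. It assumes, hypothetically, that 。, ！, and ？ always end a sentence and that a closing quotation mark or bracket directly after them belongs to the same sentence; `split_chinese` is an illustrative name, not the tool's API. There is no whitespace or capitalisation test, since Chinese has no letter case and no space between sentences. The trade-off is that this simple rule cannot tell when a quoted sentence continues after its closing mark.

```python
import re

# Hypothetical rule: one or more sentence-final marks, plus any closing
# quotes or brackets that immediately follow them.
CHINESE_BOUNDARY = re.compile(r"[。！？]+[」』”）]*")


def split_chinese(text: str) -> list[str]:
    """Split Chinese text after 。！？ without needing spaces or capitals."""
    sentences, start = [], 0
    for m in CHINESE_BOUNDARY.finditer(text):
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    # Keep any trailing text that lacks final punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_chinese("今天很冷。你去哪儿？我不知道！"))
```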
How does it handle quotes?
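Sentence-final punctuation just inside a closing quotation mark can end a sentence, and the closing quote stays attached to that sentence rather than starting the next one. When the text after the quote continues the same sentence, as with "Hello!" then sat down in the first example, the tool tries to keep the quote inside its enclosing sentence.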