HTML Table Extractor
Extract tabular data from pasted HTML tables.
Overview
The HTML table extractor pulls tabular data from HTML markup and emits CSV. Paste the raw <table> markup from a web page — or the page itself — and get back a clean rectangular dataset ready for a spreadsheet or database.
It's a quick alternative to scraping with a library or using Excel's web-import feature when you just want one table from one page. Data analysts, journalists, and anyone copying league standings, financial tables, or reference data out of a webpage can use an HTML-table-to-CSV converter to skip the manual cell-by-cell copy.
How it works
The extractor parses the HTML with a tolerant parser (similar to what browsers do), so missing closing tags and other malformed markup don't stop it. Every <table> element is processed; if there's more than one, each becomes its own CSV.
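As a rough illustration of tolerant parsing, the sketch below uses Python's standard html.parser module and treats a new <tr>, <td>, or <th> as implicitly closing the previous one. It is not the tool's actual implementation: the names TableCollector and extract_tables are hypothetical, and nested tables are not handled here.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>, in document order
        self._row = None   # row being filled, or None outside a row
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row:  # skip rows that ended up with no cells
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])
        elif not self.tables:
            return  # ignore row and cell tags outside any table
        elif tag == "tr":
            self._close_row()  # an unclosed previous <tr> ends here
            self._row = []
        elif tag in ("td", "th"):
            self._close_cell()  # an unclosed previous cell ends here
            if self._row is None:
                self._row = []  # cell without an enclosing <tr>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()

    def handle_data(self, data):
        # Text inside inline tags such as <b> still arrives here as plain data.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html):
    """Return all tables found in the markup, tolerating missing end tags."""
    collector = TableCollector()
    collector.feed(html)
    collector.close()
    collector._close_row()  # flush a table left open at end of input
    return collector.tables
```

Calling extract_tables on a page with two tables returns a list with two entries, one per table, in document order.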
Row spans (rowspan) and column spans (colspan) are flattened so the output is rectangular: a cell that spans three rows is repeated in each of those rows, and a cell that spans two columns is repeated across both. Nested tables and inline formatting (<b>, <i>, <span>) are stripped to their text content. Embedded <br> tags become newlines inside the CSV cell, preserved with proper RFC 4180 quoting.
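The span flattening described above can be sketched as follows. This is a minimal illustration, not the tool's own code: it assumes each row has already been parsed into (text, rowspan, colspan) tuples, the name flatten_spans is hypothetical, and overlapping spans in malformed markup are not handled.

```python
def flatten_spans(rows):
    """Expand (text, rowspan, colspan) cells into a rectangular grid of strings."""
    grid = []
    carried = {}  # column index -> (text, remaining rows) for open rowspans
    for cells in rows:
        pending = list(cells)
        out = []
        col = 0
        while pending or any(c >= col for c in carried):
            if col in carried:
                # Slot is covered by a rowspan from an earlier row: repeat it.
                text, remaining = carried.pop(col)
                if remaining > 1:
                    carried[col] = (text, remaining - 1)
                out.append(text)
                col += 1
            elif pending:
                text, rowspan, colspan = pending.pop(0)
                for _ in range(colspan):
                    # Repeat the value in every column the cell spans.
                    out.append(text)
                    if rowspan > 1:
                        carried[col] = (text, rowspan - 1)
                    col += 1
            else:
                # Gap before a carried column further right: emit an empty cell.
                out.append("")
                col += 1
        grid.append(out)
    # Pad short rows so every row has the same number of columns.
    width = max((len(r) for r in grid), default=0)
    return [r + [""] * (width - len(r)) for r in grid]


rows = [
    [("Region", 2, 1), ("Q1", 1, 1), ("Q2", 1, 1)],
    [("10", 1, 1), ("20", 1, 1)],
    [("Total", 1, 2), ("30", 1, 1)],
]
grid = flatten_spans(rows)
```

Here "Region" spans two rows, so it appears at the start of the first two output rows, and "Total" spans two columns, so it fills both of them in the third.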
Examples
<table>
<thead><tr><th>City</th><th>Pop</th></tr></thead>
<tbody>
<tr><td>Berlin</td><td>3,769,495</td></tr>
<tr><td>Lisbon</td><td>545,796</td></tr>
</tbody>
</table>
Output CSV:
City,Pop
Berlin,"3,769,495"
Lisbon,"545,796"
Row whose first cell has colspan="2" in the source (the value is repeated across both columns):
Total Q1,Total Q1,$5000
FAQ
What if the page has multiple tables?
Each <table> is extracted as its own CSV. The output lists them in document order with a separator line so you can copy the one you need.
Are commas inside cell values handled?
Yes — embedded commas force the cell to be quoted per RFC 4180, so a value like 3,769,495 is wrapped in double quotes and reads back correctly in any spreadsheet.
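For readers who want to reproduce this quoting themselves, Python's standard csv module applies the same rules. The sketch below is illustrative only: the helper name rows_to_csv is hypothetical and is not part of the tool.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL quotes fields containing commas, double quotes, or newlines.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\r\n")
    writer.writerows(rows)
    return buf.getvalue()


text = rows_to_csv([["City", "Pop"], ["Berlin", "3,769,495"]])
```

Fields without special characters are written bare, while 3,769,495 is wrapped in double quotes, matching the example output shown earlier.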
Does it follow links or fetch live URLs?
No. Paste the HTML directly. For URL-based extraction, fetch the page yourself first — the tool intentionally avoids outbound network calls.