HTML Table Extractor

Extract tabular data from pasted HTML tables.

Overview

The HTML table extractor pulls tabular data from HTML markup and emits CSV. Paste the raw <table> markup from a web page, or the page's full HTML source, and get back a clean, rectangular dataset ready for a spreadsheet or database.

It's a quick alternative to writing a scraper or using Excel's web-import feature when you just want one table from one page. Data analysts, journalists, and anyone copying league standings, financial tables, or reference data out of a webpage can use it to skip copying the table cell by cell.

How it works

The extractor parses the HTML with a tolerant parser (similar to what browsers do), so missing closing tags and other malformed markup don't stop it. Every <table> element is processed; if there's more than one, each becomes its own CSV.
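The extractor's own parser isn't documented here. As a rough illustration of tolerant table collection, here is a minimal Python sketch built on the standard-library html.parser module; TableCollector and extract_tables are hypothetical names. It tolerates missing </td>, </tr>, and </table> tags by closing the open cell or row when the next one starts (or when input ends), and returns one list of rows per table in document order. It deliberately ignores rowspan/colspan and does not support nested tables.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Hypothetical sketch: collect each <table> as a list of rows of cell text.

    A new <td>, <th>, or <tr> implicitly closes the previous cell, so missing
    closing tags are tolerated. Spans are ignored; nested tables unsupported.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. in text
        self.tables = []    # finished tables, in document order
        self._rows = None   # rows of the table currently being read
        self._cell = None   # text fragments of the currently open cell

    def _close_cell(self):
        if self._cell is not None and self._rows:
            self._rows[-1].append("".join(self._cell).strip())
        self._cell = None

    def _close_table(self):
        self._close_cell()
        self.tables.append([row for row in self._rows if row])  # drop empty rows
        self._rows = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self._rows is not None:   # treat as end of an unclosed table
                self._close_table()
            self._rows = []
        elif self._rows is None:
            return                       # ignore markup outside tables
        elif tag == "tr":
            self._close_cell()
            self._rows.append([])
        elif tag in ("td", "th"):
            self._close_cell()
            if not self._rows:           # cell appearing before any <tr>
                self._rows.append([])
            self._cell = []

    def handle_endtag(self, tag):
        if self._rows is None:
            return
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "table":
            self._close_table()

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def close(self):
        super().close()                  # flush any buffered text
        if self._rows is not None:       # input ended inside a <table>
            self._close_table()


def extract_tables(html):
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables
```

For example, extract_tables("<table><tr><td>a<td>b</table>") returns [[["a", "b"]]] even though the </td> and </tr> tags are missing.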

Row spans (rowspan) and column spans (colspan) are flattened so the output is rectangular: a cell that spans three rows is repeated in each of those rows, and a cell that spans two columns is repeated across both. Nested tables and inline formatting (<b>, <i>, <span>) are stripped to their text content. Embedded <br> tags become newlines inside the CSV cell, preserved with proper RFC 4180 quoting.
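To make the flattening rule concrete, here is a minimal Python sketch of a grid-filling pass. It assumes each parsed cell has already been reduced to a (text, rowspan, colspan) tuple with positive integer spans; flatten_spans is a hypothetical name, and this follows the repeat-the-value behavior described above rather than reproducing the extractor's actual code. It walks each row left to right, skips grid slots already claimed by a rowspan from an earlier row, and writes the cell's text into every slot the cell covers.

```python
def flatten_spans(rows):
    """Expand rowspan/colspan into a rectangular grid (hypothetical sketch).

    rows: list of table rows; each cell is a (text, rowspan, colspan) tuple
    with positive integer spans. A spanning cell's text is copied into every
    slot it covers. Rowspans that run past the last row are clipped, and
    short rows are padded with "" so every output row has the same width.
    """
    grid = {}                          # (row, col) -> cell text
    for r, row in enumerate(rows):
        c = 0
        for text, rowspan, colspan in row:
            while (r, c) in grid:      # skip slots taken by a rowspan from above
                c += 1
            for dr in range(rowspan):
                if r + dr >= len(rows):
                    break              # clip spans that run off the table
                for dc in range(colspan):
                    grid[(r + dr, c + dc)] = text
            c += colspan
    width = max((col for _, col in grid), default=-1) + 1
    return [[grid.get((r, col), "") for col in range(width)]
            for r in range(len(rows))]
```

Keying the grid by (row, column) makes it cheap to detect slots already filled by an earlier span without preallocating a fixed-size matrix. With a first cell of colspan 2, the row [("Total Q1", 1, 2), ("$5000", 1, 1)] flattens to ["Total Q1", "Total Q1", "$5000"].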

Examples

Input HTML:
<table>
  <thead><tr><th>City</th><th>Pop</th></tr></thead>
  <tbody>
    <tr><td>Berlin</td><td>3,769,495</td></tr>
    <tr><td>Lisbon</td><td>545,796</td></tr>
  </tbody>
</table>

Output CSV:
City,Pop
Berlin,"3,769,495"
Lisbon,"545,796"
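The quoting in this output is what minimal RFC 4180 quoting produces, and Python's standard csv module reproduces it exactly. This is an illustration of the rule, not the tool's own code: rows_to_csv is a hypothetical helper, and the rows are typed in by hand rather than parsed from the HTML.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize rows to CSV text, quoting only the fields that need it."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes a field only if it contains the
    # delimiter, the quote character, or a line-terminator character.
    # lineterminator="\n" keeps the output readable here; RFC 4180 itself
    # specifies CRLF ("\r\n"), which is also csv.writer's default.
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [["City", "Pop"], ["Berlin", "3,769,495"], ["Lisbon", "545,796"]]
print(rows_to_csv(rows), end="")
```

Only the population values are quoted, because they are the only fields containing a comma.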

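Row with colspan="2" on the first cell in source:
<tr><td colspan="2">Total Q1</td><td>$5000</td></tr>

Output CSV (the spanned value is repeated in both columns):
Total Q1,Total Q1,$5000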

FAQ

What if the page has multiple tables?

Each <table> is extracted as its own CSV. The output lists them in document order with a separator line so you can copy the one you need.

Are commas inside cell values handled?

Yes — embedded commas force the cell to be quoted per RFC 4180, so a value like 3,769,495 is wrapped in double quotes and reads back correctly in any spreadsheet.
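Commas aren't the only characters that trigger quoting: under RFC 4180, a field containing double quotes (escaped by doubling them) or a line break, such as one produced from a <br> tag, is also wrapped in double quotes. The following Python sketch uses only the standard csv module, not the extractor itself, to show that such values survive a write-then-read round trip; the sample cell values are made up for illustration.

```python
import csv
import io

# Values that each need quoting under RFC 4180: an embedded comma, embedded
# double quotes (escaped by doubling), and a line break (e.g. from <br>).
cells = ["3,769,495", 'The "Big" One', "line one\nline two", "plain"]

buf = io.StringIO()
csv.writer(buf).writerow(cells)   # default dialect: minimal quoting, CRLF ending
encoded = buf.getvalue()
print(repr(encoded))
# '"3,769,495","The ""Big"" One","line one\nline two",plain\r\n'

# csv.reader reassembles the quoted field even though it spans two lines.
decoded = next(csv.reader(io.StringIO(encoded)))
print(decoded == cells)  # True
```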

Does it follow links or fetch live URLs?

No. Paste the HTML directly. For URL-based extraction, fetch the page yourself first — the tool intentionally avoids outbound network calls.
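One way to fetch a page on your own machine before pasting is Python's standard urllib. This runs outside the tool; fetch_html is a hypothetical helper, and falling back to UTF-8 when the server sends no charset is an assumption (some pages declare their encoding only in a <meta> tag).

```python
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper).

    Decodes with the charset from the Content-Type header, falling back to
    UTF-8 when none is given; undecodable bytes become U+FFFD.
    """
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Save or copy the returned text, then paste it into the extractor as usual.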
