Convert · Table extraction

Tabular PDF in, clean CSV out.

Pull tables out of any text-based PDF — bank statements, ledgers, GST returns, vendor invoices — into accounting-software-ready CSV. Auto-detected rows and columns; live preview so you can tune row / column tolerance before exporting.

No signup, ever · 100% local, nothing uploaded · Live preview
Auto · Column detection
4 · Delimiter formats
Local · 100% in browser
Free · Always, no signup

01 — What you create

Tabular PDF in, spreadsheet CSV out.

Auto-detect rows and columns from any text-based PDF. Tune row / column tolerance with a live preview before exporting. Choose your delimiter (comma / semicolon / tab / pipe) and encoding for instant import into your accounting tool.

CSV Form · All pages · UTF-8
Source PDF: vendor-ledger-q1.pdf · 7 pages
Page mode: All pages, one CSV
Row tolerance: Normal (4pt)
Column tolerance: Normal (10pt)
Header row: First detected row
Delimiter: Comma (CSV)
Encoding: UTF-8
Output base: vendor-ledger-q1
Output: 142 rows · 6 cols
OUTPUT.CSV · Import-ready

vendor-ledger-q1.csv

Extracted from 7-page PDF · 142 rows · 6 columns

UTF-8 · comma-delimited

| Date | Vendor | Invoice # | Amount | GST | Total |
|---|---|---|---|---|---|
| 02-Apr-26 | Westline Hardware | WL-2604-022 | 1,42,200 | 25,596 | 1,67,796 |
| 04-Apr-26 | BlueDart Surface | BD-0408-117 | 4,500 | 810 | 5,310 |
| 08-Apr-26 | Crossword Books | CW-0418-088 | 2,240 | 0 | 2,240 |
| 12-Apr-26 | IndiGo Airlines | IG-7741 | 8,420 | 1,180 | 9,600 |
| 15-Apr-26 | Trident Hotels | TR-2025-44 | 18,900 | 2,268 | 21,168 |
| 18-Apr-26 | Adobe Inc | ADOBE-4421 | 1,240 | 223 | 1,463 |
| 22-Apr-26 | AWS Marketplace | AWS-MAY-26 | 11,200 | 2,016 | 13,216 |

+ 135 more rows across the full CSV, ready to import into Xero / Tally / QuickBooks

Need more power?

When this tool isn't enough, pdfFiller takes over.

Scanned invoices, multi-page batches, multi-currency stacks, and direct push into your accounting system. Free for 30 days, no card required.


02 — How it works

From tabular PDF to accounting-ready CSV.

Most accounting tools accept CSV imports, but most data lives in PDFs. This tool bridges the gap: auto-detected rows and columns, a live preview to verify, then a one-click clean export. Use it for monthly statement imports, GST reconciliations, vendor-ledger transfers, and any "PDF only, sorry" data source.

01

Drop the PDF

Drag a bank statement, ledger, GST return, or vendor invoice into the picker. The tool reads its text layer in the browser — nothing uploads.

02

Tune the detection

A live preview shows the detected rows and columns. Bump row / column tolerance up for sparse layouts, down for dense ones. Pick where the header row sits.

03

Export the CSV

One click writes a CSV (or TSV / pipe / semicolon-delimited file) in UTF-8, with optional BOM for Excel. Imports straight into Xero, QuickBooks, Tally, or any spreadsheet.

03 — Built for accounting

Extract tables — properly.

Auto row & column detection

Groups text items by y-coordinate (rows) and clusters x-positions (columns) per page. Works on most text-based PDFs without manual column anchors.
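The row half of that grouping can be sketched as follows. This is a minimal illustration, not the tool's actual code: the `TextItem` shape and the `groupIntoRows` name are hypothetical, and it assumes a row is anchored on the first item placed in it.

```typescript
// Hypothetical shape for a positioned text item (not the tool's real type).
interface TextItem {
  text: string;
  x: number; // left edge, in PDF points
  y: number; // baseline, in PDF points
}

// Put an item in the current row when its y lies within `rowTolerance`
// points of the first item placed in that row; otherwise start a new row.
// PDF y grows upward, so sorting by descending y yields rows top to bottom.
function groupIntoRows(items: TextItem[], rowTolerance: number): TextItem[][] {
  const sorted = items.slice().sort((a, b) => b.y - a.y);
  const rows: TextItem[][] = [];
  let anchorY = 0;
  for (const item of sorted) {
    const current = rows[rows.length - 1];
    if (current !== undefined && Math.abs(anchorY - item.y) <= rowTolerance) {
      current.push(item);
    } else {
      rows.push([item]);
      anchorY = item.y;
    }
  }
  // Order each row left to right.
  for (const row of rows) {
    row.sort((a, b) => a.x - b.x);
  }
  return rows;
}

const sampleRows = groupIntoRows(
  [
    { text: "Vendor", x: 120, y: 700 },
    { text: "Date", x: 40, y: 701.5 },
    { text: "Westline Hardware", x: 120, y: 684 },
    { text: "02-Apr-26", x: 40, y: 685 },
  ],
  4,
);
```

With a 4pt tolerance, the two header items (1.5pt apart vertically) land in one row and the two data items in the next.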

Live preview

See the first page rendered as a table the moment you tune tolerances. No need to export to find out the detection went wrong.

4 delimiter formats

Comma (CSV), semicolon (EU CSV), tab (TSV), pipe — pick whatever your downstream tool expects. The file extension auto-matches.
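One way to keep the delimiter and the file extension in step is a single lookup table. The sketch below is illustrative only: the page does not list which extension each format gets, so the `extension` values (especially for semicolon and pipe) and all names here are assumptions.

```typescript
type DelimiterName = "comma" | "semicolon" | "tab" | "pipe";

interface DelimiterFormat {
  char: string;
  extension: string;
}

// Assumed mapping; the real tool may choose different extensions.
const DELIMITER_FORMATS: Record<DelimiterName, DelimiterFormat> = {
  comma: { char: ",", extension: "csv" },
  semicolon: { char: ";", extension: "csv" },
  tab: { char: "\t", extension: "tsv" },
  pipe: { char: "|", extension: "txt" },
};

// Build the output file name from the "Output base" field and the format.
function outputFileName(base: string, delimiter: DelimiterName): string {
  return base + "." + DELIMITER_FORMATS[delimiter].extension;
}
```

For example, `outputFileName("vendor-ledger-q1", "tab")` gives `vendor-ledger-q1.tsv` under this mapping.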

UTF-8 + BOM option

Default is plain UTF-8. Flip on the BOM if you're opening the file in older Excel on Windows so non-ASCII characters render correctly.
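The BOM itself is just the character U+FEFF at the start of the text, which becomes the bytes EF BB BF once the file is encoded as UTF-8. A minimal sketch of the toggle, with a hypothetical function name:

```typescript
const UTF8_BOM = "\uFEFF";

// Prepend the BOM only when requested, and never twice.
function withOptionalBom(csvText: string, includeBom: boolean): string {
  if (!includeBom || csvText.charCodeAt(0) === 0xfeff) {
    return csvText;
  }
  return UTF8_BOM + csvText;
}
```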

Page-mode flexibility

All pages into one CSV (default), per-page sections with headings, or a custom range like "1-3, 5, 9-end". Column anchors are computed separately for each page.
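A range string such as "1-3, 5, 9-end" can be expanded into page numbers with a small parser. This is a sketch under assumptions: the function names are hypothetical, and rejecting out-of-range pages (rather than clamping them) is a design choice made here, not documented behaviour.

```typescript
// Convert one token ("7" or "end") into a 1-based page number.
function toPageNumber(token: string | undefined, pageCount: number): number {
  if (token === "end") return pageCount;
  if (token === undefined || !/^\d+$/.test(token)) {
    throw new Error("Invalid page: " + String(token));
  }
  const page = parseInt(token, 10);
  if (page < 1 || page > pageCount) {
    throw new Error("Page out of range: " + token);
  }
  return page;
}

// Expand a spec like "1-3, 5, 9-end" into a sorted list of unique pages.
function parsePageRange(spec: string, pageCount: number): number[] {
  const pages: number[] = [];
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim().toLowerCase();
    if (part === "") continue;
    const bounds = part.split("-").map((s) => s.trim());
    if (bounds.length > 2) throw new Error("Invalid range: " + rawPart);
    const start = toPageNumber(bounds[0], pageCount);
    const end = toPageNumber(bounds.length === 2 ? bounds[1] : bounds[0], pageCount);
    if (start > end) throw new Error("Descending range: " + rawPart);
    for (let p = start; p <= end; p++) {
      if (pages.indexOf(p) === -1) pages.push(p);
    }
  }
  return pages.sort((a, b) => a - b);
}
```

On a 12-page document, `parsePageRange("1-3, 5, 9-end", 12)` yields pages 1, 2, 3, 5, 9, 10, 11, 12.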

100% in browser

PDFs, text items, and the generated CSV all stay on your machine. Extraction runs via pdfjs locally — no upload, no third-party APIs, no logging.
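In pdf.js, each entry returned by a page's `getTextContent()` carries a `str` and a six-number `transform` whose last two values are the x and y translation. A rough sketch of turning those entries into positioned text follows; the `PositionedText` shape and the function name are hypothetical, and only the two pdf.js fields used here are declared.

```typescript
// Subset of the fields pdf.js exposes on each text-content item.
interface PdfJsTextItem {
  str: string;
  transform: number[]; // [a, b, c, d, e, f]; e and f are the x / y translation
}

// Hypothetical output shape used for row and column detection.
interface PositionedText {
  text: string;
  x: number;
  y: number;
}

// Keep non-empty strings together with their page coordinates.
function toPositionedText(items: PdfJsTextItem[]): PositionedText[] {
  const result: PositionedText[] = [];
  for (const item of items) {
    const text = item.str.trim();
    const x = item.transform[4];
    const y = item.transform[5];
    if (text === "" || x === undefined || y === undefined) continue;
    result.push({ text, x, y });
  }
  return result;
}
```

Everything here operates on in-memory objects, which is consistent with the page's claim that nothing needs to leave the machine.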

pdfFiller · 30-Day Free Trial

When one-off documents aren't enough.

Bulk OCR, batch invoicing, multi-party e-signing, redaction, audit logs — pdfFiller picks up where Sonchoy ends. Free for 30 days, no credit card.


Batch & bulk

Run 100+ invoices, statements, or conversions in one go.

OCR scanned PDFs

Turn paper invoices into searchable, exportable data.

E-sign & request

Multi-party signatures with full audit trails.

Redact & approve

Mask sensitive ledger lines before sending to auditors.

04 — Common questions

Everything about extracting tables.

01 Does this work on scanned PDFs?

No — this tool needs a text layer to extract. If the PDF was created by scanning paper documents (image-only PDF), the extractor will flag it as scanned and stop. For scans, you need an OCR step first; the pdfFiller premium tier handles that. Most modern PDFs (statements emailed by banks, invoices generated by accounting tools) have text layers and work fine.
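How the extractor decides a PDF is scanned is not spelled out. One plausible heuristic, shown purely as an assumption, is to flag the document when no page yields any non-whitespace text:

```typescript
// Hypothetical check: `pageTexts` holds the extracted strings for each page.
// A document with no visible text on any page is treated as image-only.
function looksScanned(pageTexts: string[][]): boolean {
  for (const page of pageTexts) {
    for (const text of page) {
      if (text.trim() !== "") return false;
    }
  }
  return true;
}
```

A stricter variant could require a minimum number of text items per page, since some scans carry a few stray characters in a header or stamp.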

02 How does the column detection actually work?

For each page, the tool collects every text item with its (x, y) coordinates. Items with similar y are grouped into rows. Then x-positions across the whole page are clustered (using your column tolerance) to discover where columns sit. Each item in each row is bucketed into its nearest column. This works on most uniformly-laid-out tables; weirdly-formatted PDFs may need tolerance tweaks.
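The column step described above can be sketched like this. It is an approximation under stated assumptions: a simple gap-based clustering stands in for whatever clustering the tool really uses, each column is represented by the mean x of its cluster, and all names are hypothetical.

```typescript
interface Cell {
  text: string;
  x: number;
}

// Cluster x-positions: a new column starts when the gap to the previous
// sorted x exceeds the tolerance. Each anchor is the mean x of its cluster.
function findColumnAnchors(xs: number[], columnTolerance: number): number[] {
  const sorted = xs.slice().sort((a, b) => a - b);
  const anchors: number[] = [];
  let sum = 0;
  let count = 0;
  let last = 0;
  for (const x of sorted) {
    if (count > 0 && x - last > columnTolerance) {
      anchors.push(sum / count);
      sum = 0;
      count = 0;
    }
    sum += x;
    count += 1;
    last = x;
  }
  if (count > 0) anchors.push(sum / count);
  return anchors;
}

// Index of the anchor closest to x.
function nearestColumn(x: number, anchors: number[]): number {
  let best = 0;
  let bestDistance = Infinity;
  anchors.forEach((anchor, index) => {
    const distance = Math.abs(anchor - x);
    if (distance < bestDistance) {
      bestDistance = distance;
      best = index;
    }
  });
  return best;
}

// Bucket one row's items into columns; empty columns stay as "".
function rowToCells(row: Cell[], anchors: number[]): string[] {
  const cells: string[] = anchors.map(() => "");
  for (const item of row) {
    const column = nearestColumn(item.x, anchors);
    const existing = cells[column] || "";
    cells[column] = existing === "" ? item.text : existing + " " + item.text;
  }
  return cells;
}

const anchors = findColumnAnchors([40, 120, 300, 41, 121.5, 298], 10);
const cells = rowToCells(
  [
    { text: "02-Apr-26", x: 41 },
    { text: "1,42,200", x: 298 },
  ],
  anchors,
);
```

Here three anchors are found (near x = 40.5, 120.75, and 299), and the row with no vendor item keeps an empty middle cell instead of shifting the amount left.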

03 My table came out with extra rows that should be cells in a single row. What now?

That usually means row tolerance is too tight — items that should share a row are landing on separate rows because they're a couple of points off vertically. Bump row tolerance from Tight to Normal, or Normal to Loose. The live preview updates instantly.

04 My table came out with too many columns. What now?

Same problem on the other axis: column tolerance is too tight, so single columns are getting split. Move column tolerance up from Tight to Normal or Loose. If two visually separate columns then start merging, drop it back down.

05 What's the difference between comma, semicolon, and TSV?

Comma (CSV) is the standard. Semicolon (EU CSV) is the usual choice in locales that use a comma as the decimal separator (Germany, France, etc.), where Excel parses semicolon-delimited files by default. TSV (tab) is great when your data contains lots of commas (vendor names, addresses) since tabs are far less likely to appear in values. Pipe works on the same principle and is rarer still in real data.
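The practical effect of the delimiter choice shows up in quoting. The sketch below uses RFC 4180-style quoting (wrap a field in double quotes when it contains the delimiter, a quote, or a line break, and double any embedded quotes). Whether the tool quotes exactly this way is an assumption, and the function names are hypothetical.

```typescript
// Quote a field only when it would otherwise be ambiguous.
function quoteField(value: string, delimiter: string): string {
  const needsQuotes =
    value.indexOf(delimiter) !== -1 ||
    value.indexOf('"') !== -1 ||
    value.indexOf("\n") !== -1 ||
    value.indexOf("\r") !== -1;
  if (!needsQuotes) return value;
  return '"' + value.replace(/"/g, '""') + '"';
}

// Serialise one row with the chosen delimiter.
function toDelimitedLine(fields: string[], delimiter: string): string {
  return fields.map((field) => quoteField(field, delimiter)).join(delimiter);
}

const ledgerRow = ["Westline Hardware", "1,42,200"];
const commaLine = toDelimitedLine(ledgerRow, ","); // Westline Hardware,"1,42,200"
const semicolonLine = toDelimitedLine(ledgerRow, ";"); // Westline Hardware;1,42,200
```

With a comma delimiter the amount has to be quoted; with semicolon, tab, or pipe it passes through untouched, which is why those formats suit comma-heavy data.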

06 Does my data leave the browser?

Never. The PDF is read into memory, parsed by pdfjs locally, tabularised in JavaScript, serialised to CSV in your browser, and saved via the standard file-download mechanism. No upload, no third-party API, no logging.
