PDF to CSV — Free Online Tool

01 — What you create

Tabular PDF in → spreadsheet CSV out.

Auto-detect rows and columns from most text-based PDFs. Tune row / column tolerance with a live preview before exporting. Choose your delimiter (comma / semicolon / tab / pipe) and encoding for instant import into your accounting tool.

CSV Form

All pages · UTF-8

Source PDF

vendor-ledger-q1.pdf · 7 pages

Page mode

All pages, one CSV

Row tolerance

Normal (4pt)

Column tolerance

Normal (10pt)

Header row

First detected row

Delimiter

Comma (CSV)

Encoding

UTF-8

Output base

vendor-ledger-q1

Output · 142 rows · 6 cols

OUTPUT.CSV

Import-ready

vendor-ledger-q1.csv

Extracted from 7-page PDF · 142 rows · 6 columns

UTF-8 · comma-delimited

Date	Vendor	Invoice #	Amount	GST	Total
02-Apr-26	Westline Hardware	WL-2604-022	1,42,200	25,596	1,67,796
04-Apr-26	BlueDart Surface	BD-0408-117	4,500	810	5,310
08-Apr-26	Crossword Books	CW-0418-088	2,240	0	2,240
12-Apr-26	IndiGo Airlines	IG-7741	8,420	1,180	9,600
15-Apr-26	Trident Hotels	TR-2025-44	18,900	2,268	21,168
18-Apr-26	Adobe Inc	ADOBE-4421	1,240	223	1,463
22-Apr-26	AWS Marketplace	AWS-MAY-26	11,200	2,016	13,216

+ 135 more rows across the full CSV, ready to import into Xero / Tally / QuickBooks

02 — How it works

From tabular PDF to accounting-ready CSV.

Most accounting tools accept CSV imports, but most data lives in PDFs. This tool bridges the gap: auto-detected rows and columns, a live preview to verify, then a one-click clean export. Use it for monthly statement imports, GST reconciliations, vendor-ledger transfers, and any "PDF only, sorry" data source.

Drop the PDF

Drag a bank statement, ledger, GST return, or vendor invoice into the picker. The tool reads its text layer in the browser — nothing uploads.

Tune the detection

A live preview shows the detected rows and columns. Bump row / column tolerance up for sparse layouts, down for dense ones. Pick where the header row sits.

Export the CSV

One click writes a CSV (or TSV / pipe / semicolon-delimited file) in UTF-8, with optional BOM for Excel. Imports straight into Xero, QuickBooks, Tally, or any spreadsheet.

03 — Built for accounting

Extract tables — properly.

Auto row & column detection

Groups text items by y-coordinate (rows) and clusters x-positions (columns) per page. Works on most text-based PDFs without manual column anchors.
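
To make the row step concrete, here is a minimal sketch of grouping text items by y-coordinate. It is an illustration under stated assumptions, not the tool's actual source: the `TextItem` shape and the `groupIntoRows` name are hypothetical, and the 4pt default simply mirrors the "Normal (4pt)" row tolerance shown in the form above.

```typescript
// Hypothetical shape of a positioned text item (names are illustrative).
interface TextItem {
  text: string;
  x: number; // left edge, in points
  y: number; // baseline, in points
}

// Items whose y lies within `rowTolerance` points of the row's first item
// are treated as belonging to the same row.
function groupIntoRows(items: TextItem[], rowTolerance = 4): TextItem[][] {
  // PDF y grows upward, so sort descending to read the page top to bottom.
  const sorted = [...items].sort((a, b) => b.y - a.y);
  const rows: TextItem[][] = [];
  let current: TextItem[] = [];
  let anchorY = Number.NaN;
  for (const item of sorted) {
    if (current.length > 0 && Math.abs(anchorY - item.y) <= rowTolerance) {
      current.push(item);
    } else {
      current = [item];
      anchorY = item.y;
      rows.push(current);
    }
  }
  // Order each row left to right.
  for (const row of rows) row.sort((a, b) => a.x - b.x);
  return rows;
}
```

With a tolerance that is too small, items sitting a point or two apart vertically start new rows, which is the symptom a looser row tolerance fixes.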

Live preview

See the first page rendered as a table the moment you tune tolerances. No need to export to find out the detection went wrong.

4 delimiter formats

Comma (CSV), semicolon (EU CSV), tab (TSV), pipe — pick whatever your downstream tool expects. The file extension auto-matches.
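
As a rough sketch of what serialising with a chosen delimiter involves: a field needs quoting whenever it contains the delimiter, a double quote, or a line break. The `Delimiter` type and the `escapeField` / `toDelimited` names are hypothetical. The quoting follows the common RFC 4180 convention, which is an assumption about the output rather than something the tool documents.

```typescript
type Delimiter = "," | ";" | "\t" | "|";

// Quote a field when it contains the delimiter, a double quote, or a line
// break; embedded double quotes are doubled.
function escapeField(value: string, delimiter: Delimiter): string {
  const needsQuoting =
    value.includes(delimiter) ||
    value.includes('"') ||
    value.includes("\n") ||
    value.includes("\r");
  return needsQuoting ? `"${value.replace(/"/g, '""')}"` : value;
}

// Join cells with the delimiter and rows with CRLF.
function toDelimited(rows: string[][], delimiter: Delimiter = ","): string {
  return rows
    .map((row) => row.map((cell) => escapeField(cell, delimiter)).join(delimiter))
    .join("\r\n");
}
```

An amount such as `1,42,200` from the preview above has to be quoted in comma mode, but passes through unquoted when the delimiter is a tab or a pipe.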

UTF-8 + BOM option

Default is plain UTF-8. Flip on the BOM if you're opening the file in older Excel on Windows so non-ASCII characters render correctly.
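
For illustration, the BOM option amounts to prepending the character U+FEFF to the CSV text. The `withOptionalBom` helper below is a hypothetical name and only a sketch of the idea, not the tool's code.

```typescript
// U+FEFF becomes the bytes EF BB BF when the text is encoded as UTF-8.
const UTF8_BOM = "\uFEFF";

// Prepend the BOM when requested, without adding it twice.
function withOptionalBom(csv: string, includeBom: boolean): string {
  if (!includeBom || csv.startsWith(UTF8_BOM)) return csv;
  return UTF8_BOM + csv;
}
```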

Page-mode flexibility

All pages into one CSV (default), per-page sections with headings, or a custom range like "1-3, 5, 9-end". Column anchors are computed separately for each page.
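
A custom range such as "1-3, 5, 9-end" could be parsed along these lines. This is a sketch under assumptions: `parsePageRange` is a hypothetical name, and clipping out-of-range pages while rejecting malformed parts is one reasonable policy, not necessarily the tool's.

```typescript
// Parse a spec like "1-3, 5, 9-end" into sorted, de-duplicated 1-based page
// numbers. Pages outside 1..pageCount are dropped; malformed parts throw.
function parsePageRange(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  const toPage = (token: string): number =>
    token === "end" ? pageCount : Number(token);
  for (const rawPart of spec.split(",")) {
    const part = rawPart.trim().toLowerCase();
    if (part === "") continue;
    const match = /^(\d+|end)(?:\s*-\s*(\d+|end))?$/.exec(part);
    if (match === null) {
      throw new Error(`Invalid page range part: "${rawPart.trim()}"`);
    }
    const startToken = match[1] ?? "";
    const endToken = match[2] ?? startToken;
    const start = Math.max(1, toPage(startToken));
    const end = Math.min(pageCount, toPage(endToken));
    // A range that starts past its end (or past the last page) adds nothing.
    for (let page = start; page <= end; page++) pages.add(page);
  }
  return Array.from(pages).sort((a, b) => a - b);
}
```

On a 12-page file the example spec selects pages 1, 2, 3, 5, and 9 through 12; on the 7-page ledger shown above, the `9-end` part selects nothing.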

100% in browser

PDFs, text items, and the generated CSV all stay on your machine. Extraction runs via pdfjs locally — no upload, no third-party APIs, no logging.

pdfFiller · 30-Day Free Trial

When one-off documents aren't enough.

Bulk OCR, batch invoicing, multi-party e-signing, redaction, audit logs — pdfFiller picks up where Sonchoy ends. Free for 30 days, no credit card.

Batch & bulk

Run 100+ invoices, statements, or conversions in one go.

OCR scanned PDFs

Turn paper invoices into searchable, exportable data.

E-sign & request

Multi-party signatures with full audit trails.

Redact & approve

Mask sensitive ledger lines before sending to auditors.

04 — Common questions

Everything about extracting tables.

01. Does this work on scanned PDFs?

No — this tool needs a text layer to extract. If the PDF was created by scanning paper documents (image-only PDF), the extractor will flag it as scanned and stop. For scans, you need an OCR step first; the pdfFiller premium tier handles that. Most modern PDFs (statements emailed by banks, invoices generated by accounting tools) have text layers and work fine.
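
How the extractor decides a file is scanned is not documented here. One plausible check, sketched below as an assumption, is that no page yields any non-blank text item. The `looksScanned` name and the input shape (one array of extracted strings per page) are hypothetical.

```typescript
// pages[i] holds the text strings extracted from page i.
// A document with pages but no non-blank text is treated as image-only.
function looksScanned(pages: string[][]): boolean {
  const hasText = pages.some((items) =>
    items.some((text) => text.trim() !== "")
  );
  return pages.length > 0 && !hasText;
}
```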

02. How does the column detection actually work?

For each page, the tool collects every text item with its (x, y) coordinates. Items with similar y are grouped into rows. Then x-positions across the whole page are clustered (using your column tolerance) to discover where columns sit. Each item in each row is bucketed into its nearest column. This works on most uniformly laid-out tables; irregularly formatted PDFs may need tolerance tweaks.
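
The column step described above can be sketched as follows. This is a simplified illustration, not the tool's implementation: the function names are hypothetical, the clustering is a plain gap-based pass over sorted x-positions (one of several ways to cluster), and the 10pt default mirrors the "Normal (10pt)" column tolerance shown in the form.

```typescript
// Cluster x-positions: a new column starts whenever the gap to the previous
// sorted x exceeds `columnTolerance` points. Each anchor is the smallest x
// in its cluster.
function findColumnAnchors(xs: number[], columnTolerance = 10): number[] {
  const sorted = [...xs].sort((a, b) => a - b);
  const anchors: number[] = [];
  let previous = Number.NEGATIVE_INFINITY;
  for (const x of sorted) {
    if (x - previous > columnTolerance) anchors.push(x);
    previous = x;
  }
  return anchors;
}

// Index of the anchor closest to x.
function nearestColumn(x: number, anchors: number[]): number {
  let best = 0;
  let bestDistance = Number.POSITIVE_INFINITY;
  anchors.forEach((anchor, index) => {
    const distance = Math.abs(anchor - x);
    if (distance < bestDistance) {
      best = index;
      bestDistance = distance;
    }
  });
  return best;
}

// Bucket one row's items (assumed ordered left to right) into cells.
// Items landing in the same column are joined with a space.
function rowToCells(
  row: { text: string; x: number }[],
  anchors: number[]
): string[] {
  const cells: string[] = anchors.map(() => "");
  for (const item of row) {
    const col = nearestColumn(item.x, anchors);
    cells[col] = cells[col] === "" ? item.text : `${cells[col]} ${item.text}`;
  }
  return cells;
}
```

With x-positions 10, 12, 80, 83, and 200, a 10pt tolerance yields three anchors, while a 1pt tolerance yields five. That is the "too many columns" symptom of a tolerance set too tight.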

03. My table came out with rows that should be cells. What now?

That usually means row tolerance is too tight — items that should share a row are landing on separate rows because they're a couple of points off vertically. Bump row tolerance from Tight to Normal, or Normal to Loose. The live preview updates instantly.

04. My table came out with too many columns. What now?

Opposite problem — column tolerance is too tight, so single columns are getting split. Move column tolerance up from Tight to Normal or Loose. If two visually-separate columns keep merging together, drop it back down.

05. What's the difference between comma, semicolon, and TSV?

Comma (CSV) is the standard. Semicolon (EU CSV) is the usual choice in locales that use a comma as the decimal separator (Germany, France, etc.), where Excel expects semicolon-delimited files by default. TSV (tab) is great when your data contains lots of commas (vendor names, addresses), since tabs are far less likely to appear in values. Pipe works on the same idea and is rarer still in real data.

06. Does my data leave the browser?

Never. The PDF is read into memory, parsed by pdfjs locally, tabularised in JavaScript, serialised to CSV in your browser, and saved via the standard file-download mechanism. No upload, no third-party API, no logging.

Tabular PDF in, clean CSV out.