Pull tables out of any text-based PDF — bank statements, ledgers, GST returns, vendor invoices — into accounting-software-ready CSV. Auto-detected rows and columns; live preview so you can tune row / column tolerance before exporting.
01 — What you create
Auto-detect rows and columns from any text-based PDF. Tune row / column tolerance with a live preview before exporting. Choose your delimiter (comma / semicolon / tab / pipe) and encoding for instant import into your accounting tool.
vendor-ledger-q1.csv
Extracted from 7-page PDF · 142 rows · 6 columns
UTF-8 · comma-delimited
| Date | Vendor | Invoice # | Amount | GST | Total |
|---|---|---|---|---|---|
| 02-Apr-26 | Westline Hardware | WL-2604-022 | 1,42,200 | 25,596 | 1,67,796 |
| 04-Apr-26 | BlueDart Surface | BD-0408-117 | 4,500 | 810 | 5,310 |
| 08-Apr-26 | Crossword Books | CW-0418-088 | 2,240 | 0 | 2,240 |
| 12-Apr-26 | IndiGo Airlines | IG-7741 | 8,420 | 1,180 | 9,600 |
| 15-Apr-26 | Trident Hotels | TR-2025-44 | 18,900 | 2,268 | 21,168 |
| 18-Apr-26 | Adobe Inc | ADOBE-4421 | 1,240 | 223 | 1,463 |
| 22-Apr-26 | AWS Marketplace | AWS-MAY-26 | 11,200 | 2,016 | 13,216 |
+ 135 more rows across the full CSV, ready to import into Xero / Tally / QuickBooks
Scanned invoices, multi-page batches, multi-currency stacks, and direct push into your accounting system. Free for 30 days, no card required.
Try Premium Free. Free for 30 days · no credit card · cancel anytime
02 — How it works
Most accounting tools accept CSV imports but most data lives in PDFs. This tool bridges the gap — auto-detected rows and columns, live preview to verify, then a one-click clean export. Use it for monthly statement imports, GST reconciliations, vendor-ledger transfers, and any "PDF only, sorry" data source.
Drag a bank statement, ledger, GST return, or vendor invoice into the picker. The tool reads its text layer in the browser — nothing uploads.
A live preview shows the detected rows and columns. Bump row / column tolerance up for sparse layouts, down for dense ones. Pick where the header row sits.
One click writes a CSV (or TSV / pipe / semicolon-delimited file) in UTF-8, with optional BOM for Excel. Imports straight into Xero, QuickBooks, Tally, or any spreadsheet.
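As a rough illustration of the optional BOM, the sketch below encodes CSV text as UTF-8 bytes and, when asked, prefixes the three-byte UTF-8 byte order mark. The function name `encodeCsv` and its `withBom` flag are hypothetical and are not taken from the tool itself.

```javascript
// Sketch only: `encodeCsv` and `withBom` are illustrative names, not the tool's API.
function encodeCsv(csvText, withBom = false) {
  // TextEncoder always produces UTF-8.
  const body = new TextEncoder().encode(csvText);
  if (!withBom) return body;

  // U+FEFF encoded as UTF-8 is the byte sequence EF BB BF.
  const bom = Uint8Array.of(0xef, 0xbb, 0xbf);
  const out = new Uint8Array(bom.length + body.length);
  out.set(bom, 0);
  out.set(body, bom.length);
  return out;
}
```

In a browser, the resulting bytes could be wrapped in a `Blob` and offered through the usual file-download mechanism.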
03 — Built for accounting
Groups text items by y-coordinate (rows) and clusters x-positions (columns) per page. Works on most text-based PDFs without manual column anchors.
See the first page rendered as a table the moment you tune tolerances. No need to export to find out the detection went wrong.
Comma (CSV), semicolon (EU CSV), tab (TSV), pipe — pick whatever your downstream tool expects. The file extension auto-matches.
Default is plain UTF-8. Flip on the BOM if you're opening the file in older Excel on Windows so non-ASCII characters render correctly.
Export all pages into one CSV (default), per-page sections with headings, or a custom range like "1-3, 5, 9-end". Column anchors are computed separately for each page.
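A custom range such as "1-3, 5, 9-end" could be expanded into page numbers along the following lines. This is a minimal sketch: `parsePageRange` is a hypothetical helper, and the choice to reject ranges that run past the last page is an assumption, not documented behaviour of the tool.

```javascript
// Sketch only: `parsePageRange` is an illustrative helper, not the tool's parser.
// Expands a spec like "1-3, 5, 9-end" into sorted, de-duplicated 1-based page numbers.
function parsePageRange(spec, pageCount) {
  const pages = new Set();
  const toPage = (s) => (s === "end" ? pageCount : Number(s));

  for (const part of spec.split(",")) {
    const token = part.trim().toLowerCase();
    if (token === "") continue; // tolerate stray commas

    // Either a single page ("5", "end") or a range ("1-3", "9-end").
    const m = /^(\d+|end)(?:\s*-\s*(\d+|end))?$/.exec(token);
    if (!m) throw new Error(`Unrecognised page range: "${part.trim()}"`);

    const start = toPage(m[1]);
    const stop = m[2] === undefined ? start : toPage(m[2]);
    // Assumption: out-of-bounds or reversed ranges are rejected rather than clamped.
    if (start < 1 || stop < start || stop > pageCount) {
      throw new Error(`Invalid page range: "${part.trim()}"`);
    }
    for (let p = start; p <= stop; p++) pages.add(p);
  }
  return [...pages].sort((a, b) => a - b);
}
```

For a 12-page document, `parsePageRange("1-3, 5, 9-end", 12)` returns `[1, 2, 3, 5, 9, 10, 11, 12]`.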
PDFs, text items, and the generated CSV all stay on your machine. Extraction runs via pdfjs locally — no upload, no third-party APIs, no logging.
Bulk OCR, batch invoicing, multi-party e-signing, redaction, audit logs — pdfFiller picks up where Sonchoy ends. Free for 30 days, no credit card.
Run 100+ invoices, statements, or conversions in one go.
Turn paper invoices into searchable, exportable data.
Multi-party signatures with full audit trails.
Mask sensitive ledger lines before sending to auditors.
04 — Common questions
Scanned PDFs are not supported: this tool needs a text layer to extract. If the PDF was created by scanning paper documents (an image-only PDF), the extractor flags it as scanned and stops. Scans need an OCR step first; the pdfFiller premium tier handles that. Most modern PDFs (statements emailed by banks, invoices generated by accounting tools) have text layers and work fine.
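One plausible way to flag an image-only PDF is to check whether its pages yield any text at all. The sketch below assumes the text items have already been collected per page (with pdfjs these would typically come from `page.getTextContent()`); the helper name `looksScanned` and the `minChars` threshold are illustrative assumptions, not the tool's actual check.

```javascript
// Sketch only: `looksScanned` and `minChars` are illustrative, not the tool's logic.
// `pagesItems` holds one array of { str } text items per page.
function looksScanned(pagesItems, minChars = 1) {
  let total = 0;
  for (const items of pagesItems) {
    for (const item of items) {
      total += item.str.trim().length; // ignore whitespace-only items
    }
  }
  // No usable text anywhere suggests there is no text layer.
  return total < minChars;
}
```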
For each page, the tool collects every text item with its (x, y) coordinates. Items with similar y are grouped into rows. Then x-positions across the whole page are clustered (using your column tolerance) to discover where columns sit. Each item in each row is bucketed into its nearest column. This works on most uniformly-laid-out tables; weirdly-formatted PDFs may need tolerance tweaks.
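The steps above can be sketched as a simple gap-based clustering. This is a minimal illustration, not the tool's implementation: the item shape `{ str, x, y }`, the function names, and the numeric tolerances are assumptions, and it assumes PDF-style coordinates where y grows upwards.

```javascript
// Sketch only: names, item shape and tolerance units are illustrative assumptions.

// 1-D clustering: sort the values, then start a new cluster whenever the gap
// to the previous value exceeds the tolerance. Returns each cluster's mean.
function clusterValues(values, tolerance) {
  const sorted = [...values].sort((a, b) => a - b);
  const clusters = [];
  for (const v of sorted) {
    const last = clusters[clusters.length - 1];
    if (last && v - last.max <= tolerance) {
      last.sum += v;
      last.count += 1;
      last.max = v;
    } else {
      clusters.push({ sum: v, count: 1, max: v });
    }
  }
  return clusters.map((c) => c.sum / c.count);
}

// Index of the centre closest to `value`.
function nearestIndex(centres, value) {
  let best = 0;
  for (let i = 1; i < centres.length; i++) {
    if (Math.abs(centres[i] - value) < Math.abs(centres[best] - value)) best = i;
  }
  return best;
}

// Turn one page's text items into a 2-D array of cell strings.
function tabularisePage(items, rowTolerance, columnTolerance) {
  if (items.length === 0) return [];
  const rowCentres = clusterValues(items.map((it) => it.y), rowTolerance);
  const colCentres = clusterValues(items.map((it) => it.x), columnTolerance);
  const table = rowCentres.map(() => colCentres.map(() => ""));

  for (const it of items) {
    // Centres are sorted ascending; the largest y is the top of the page,
    // so the row index is flipped to put that row first.
    const r = rowCentres.length - 1 - nearestIndex(rowCentres, it.y);
    const c = nearestIndex(colCentres, it.x);
    // If two items land in the same cell, join them with a space.
    table[r][c] = table[r][c] ? `${table[r][c]} ${it.str}` : it.str;
  }
  return table;
}
```

In this sketch, a row tolerance smaller than the vertical jitter between items makes each item start its own row, which is the symptom a too-tight row tolerance produces.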
If cells from a single row are spilling onto separate rows, row tolerance is usually too tight: items that should share a row are a couple of points off vertically. Bump row tolerance from Tight to Normal, or Normal to Loose. The live preview updates instantly.
If a single column is getting split into several, column tolerance is too tight. Move column tolerance up from Tight to Normal or Loose. If two visually separate columns keep merging together, drop it back down.
Comma (CSV) is the standard. Semicolon (EU CSV) is the usual choice in locales that use a comma as the decimal separator (Germany, France, etc.), where Excel uses the semicolon as its default list separator. TSV (tab) works well when your data contains lots of commas (vendor names, addresses), since tabs are far less likely to appear in values. Pipe follows the same idea and is rarer still in real data.
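Whichever delimiter is chosen, values that contain it need quoting. Amounts written with digit-group commas, such as 1,42,200, are a common case in comma-delimited output. The sketch below applies RFC 4180-style quoting to any single-character delimiter; `toDelimited` is a hypothetical helper name, and the CRLF line ending is an assumption.

```javascript
// Sketch only: `toDelimited` is an illustrative helper, not the tool's serialiser.
function toDelimited(rows, delimiter = ",") {
  // A cell needs quoting if it contains the delimiter, a quote, or a line break.
  const needsQuoting = (cell) =>
    cell.includes(delimiter) ||
    cell.includes('"') ||
    cell.includes("\n") ||
    cell.includes("\r");

  const escapeCell = (value) => {
    const cell = String(value);
    // Inside a quoted cell, a literal quote is written as two quotes.
    return needsQuoting(cell) ? `"${cell.replace(/"/g, '""')}"` : cell;
  };

  // Assumption: CRLF line endings, as RFC 4180 describes.
  return rows.map((row) => row.map(escapeCell).join(delimiter)).join("\r\n") + "\r\n";
}
```

With a comma delimiter, the row `["Westline Hardware", "1,42,200"]` serialises as `Westline Hardware,"1,42,200"`. With a tab delimiter the same row needs no quotes at all, which is the practical advantage of TSV for comma-heavy data.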
Your files are never uploaded. The PDF is read into memory, parsed by pdfjs locally, tabularised in JavaScript, serialised to CSV in your browser, and saved via the standard file-download mechanism. No upload, no third-party API, no logging.
05 — Related tools
Structured statement extractor with debit / credit columns.
Line items, totals, tax columns from any invoice.
Every table from any PDF, column types preserved.
Strip the password first, then extract to CSV.