Drop any text-based PDF — ledger, statement, GST return, vendor invoice — and the tool detects rows and columns, coerces numbers to number cells, and exports a proper .xlsx workbook. One sheet per page, or all pages flattened.
01 — What you create
Auto-detect rows and columns from any text-based PDF, then export a real .xlsx — not just a CSV with a different extension. Numbers come through as number cells; one sheet per page or all pages flattened, your choice.
vendor-ledger-q1.xlsx
Extracted from 7-page PDF · 142 rows · 6 columns · numbers as numbers
Sheet: Extracted
| Date | Vendor | Invoice # | Amount | GST | Total |
|---|---|---|---|---|---|
| 02-Apr-26 | Westline Hardware | WL-2604-022 | 142,200 | 25,596 | 167,796 |
| 04-Apr-26 | BlueDart Surface | BD-0408-117 | 4,500 | 810 | 5,310 |
| 08-Apr-26 | Crossword Books | CW-0418-088 | 2,240 | 0 | 2,240 |
| 12-Apr-26 | IndiGo Airlines | IG-7741 | 8,420 | 1,180 | 9,600 |
| 15-Apr-26 | Trident Hotels | TR-2025-44 | 18,900 | 2,268 | 21,168 |
| 18-Apr-26 | Adobe Inc | ADOBE-4421 | 1,240 | 223 | 1,463 |
| 22-Apr-26 | AWS Marketplace | AWS-MAY-26 | 11,200 | 2,016 | 13,216 |
+ 135 more rows · numeric cells are real numbers (SUM, AVERAGE, sort all work)
Premium adds scanned invoices, multi-page batches, multi-currency stacks, and direct push into your accounting system. Free for 30 days, no card required.
02 — How it works
The difference between a CSV export and a proper .xlsx is whether the recipient can immediately SUM a column. This tool coerces numbers to number cells so the workbook is genuinely editable — pivot it, sort it, chart it without manual re-typing.
Drag a ledger, statement, report, or vendor invoice in. The tool reads its text layer locally with pdfjs — nothing uploads.
Live preview shows the table the moment you change settings. Cells flagged in green become real number cells in the output.
One click writes a proper .xlsx — one sheet per page or all pages combined — with a _meta sheet recording how the extraction was configured.
03 — Built for accounting
Currency-prefixed, comma-thousands, EU-decimal, accounting-negative, and percentage values are all detected and upgraded to number-typed cells. SUM and AVERAGE work without manual cleanup.
Render every page as its own named sheet (great for multi-section reports), or flatten everything into one sheet (great for long tables that span pages).
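As a rough illustration of the two layouts, the sketch below arranges already-extracted rows either as one sheet per page or as a single flattened sheet. It is not the tool's actual code: `planSheets`, `SheetPlan`, and the "Page N" naming scheme are assumptions made for this example, and only the sheet name "Extracted" comes from the sample output above.

```typescript
// Illustrative sketch only; names here are assumptions, not the tool's real code.
type Cell = string | number;
type Row = Cell[];

interface SheetPlan {
  name: string;
  rows: Row[];
}

type SheetMode = "per-page" | "flatten";

function planSheets(pages: Row[][], mode: SheetMode): SheetPlan[] {
  if (mode === "per-page") {
    // One named sheet per PDF page ("Page 1", "Page 2", ... is an assumed scheme).
    return pages.map((rows, i) => ({ name: `Page ${i + 1}`, rows }));
  }
  // Flatten: concatenate every page's rows, in page order, into a single sheet.
  const all: Row[] = [];
  for (const rows of pages) {
    for (const row of rows) all.push(row);
  }
  return [{ name: "Extracted", rows: all }];
}

// Example: two pages, one row on the first and two on the second.
const samplePages: Row[][] = [
  [["02-Apr-26", "Westline Hardware", 142200]],
  [
    ["04-Apr-26", "BlueDart Surface", 4500],
    ["08-Apr-26", "Crossword Books", 2240],
  ],
];
const perPage = planSheets(samplePages, "per-page"); // two sheets
const flat = planSheets(samplePages, "flatten"); // one sheet, three rows
```

With SheetJS, each planned sheet could then be passed to `XLSX.utils.aoa_to_sheet` and attached with `XLSX.utils.book_append_sheet`; whether the tool does exactly that is not stated here.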
See the first page rendered as a table the moment you tune tolerances. Cells in green will become real numbers in the output — adjust before exporting.
Skip the cover or appendices by extracting just "2-5, 7". Column anchors are computed separately for each page, so each section's columns line up.
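A page-range string such as "2-5, 7" can be expanded into a list of page numbers with a few lines of code. The sketch below is one plausible way to do it, not the tool's own parser; `parsePageRanges` and its error messages are hypothetical, and pages are assumed to be numbered from 1.

```typescript
// Hypothetical helper: expands "2-5, 7" into [2, 3, 4, 5, 7].
function parsePageRanges(spec: string, pageCount: number): number[] {
  const pages = new Set<number>();
  for (const part of spec.split(",")) {
    const token = part.trim();
    if (token === "") continue; // tolerate stray commas
    const m = /^(\d+)(?:\s*-\s*(\d+))?$/.exec(token);
    if (!m) throw new Error(`Invalid page range: "${token}"`);
    const start = Number(m[1]);
    const end = m[2] !== undefined ? Number(m[2]) : start;
    if (start < 1 || end < start || end > pageCount) {
      throw new Error(`Page range out of bounds: "${token}"`);
    }
    for (let p = start; p <= end; p++) pages.add(p);
  }
  // De-duplicated and sorted, so overlapping ranges are harmless.
  return Array.from(pages).sort((a, b) => a - b);
}

const selected = parsePageRanges("2-5, 7", 7);
console.log(selected.join(","));
```

Using a set means overlapping ranges like "2-5, 4-6" simply collapse into 2 through 6 rather than extracting a page twice.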
Every output workbook includes a tiny _meta sheet recording the source file, mode, tolerance, header setting, and number-coercion choice. Reproducibility for free.
PDFs and the assembled workbook never touch the network. Extraction runs via pdfjs, output assembled via SheetJS — entirely locally.
Bulk OCR, batch invoicing, multi-party e-signing, redaction, audit logs — pdfFiller picks up where Sonchoy ends. Free for 30 days, no credit card.
Run 100+ invoices, statements, or conversions in one go.
Turn paper invoices into searchable, exportable data.
Multi-party signatures with full audit trails.
Mask sensitive ledger lines before sending to auditors.
04 — Common questions
PDF to CSV and PDF to Excel use the same table detector under the hood. CSV is a flat text format; .xlsx is a real workbook with typed cells, multiple sheets, column widths, and number formats. Use PDF to CSV when the downstream tool expects CSV, as most accounting systems do. Use PDF to Excel when you want a real spreadsheet you can open, sort, pivot, and edit in Excel, Numbers, or Google Sheets without further cleanup.
Green cells in the preview are the ones the number-coercion step will upgrade to actual number cells in the output. Currency-prefixed values ("INR 4,521.50"), accounting negatives ("(1,200.00)"), and percentages ("12.5%") all get detected and converted. Cells in the default colour stay as strings.
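To make the coercion rules concrete, here is a minimal sketch of how such a step might work. It is an assumption-laden illustration, not the tool's implementation: `coerceNumber` is a hypothetical name, the currency symbols it recognises are an arbitrary short list, and storing "12.5%" as the fraction 0.125 is a choice made for this example.

```typescript
// Hypothetical sketch: returns a number when the text looks numeric,
// otherwise returns the original string unchanged.
function coerceNumber(raw: string, euDecimal: boolean = false): number | string {
  let s = raw.trim();
  if (s === "") return raw;

  // Accounting negative: "(1,200.00)" means -1200.
  let negative = false;
  const paren = /^\((.*)\)$/.exec(s);
  if (paren) {
    negative = true;
    s = paren[1].trim();
  }

  // Percentage: "12.5%" is stored as 0.125 (an assumption of this sketch).
  let percent = false;
  if (s.endsWith("%")) {
    percent = true;
    s = s.slice(0, -1).trim();
  }

  // Leading currency code followed by a space ("INR 4,521.50") or a symbol.
  s = s.replace(/^(?:[A-Z]{3}\s+|[₹$€£]\s*)/, "");

  if (s.startsWith("-")) {
    negative = !negative;
    s = s.slice(1).trim();
  }

  // Normalise separators: EU style "1.234,56" vs comma-thousands "1,234.56".
  s = euDecimal ? s.replace(/\./g, "").replace(",", ".") : s.replace(/,/g, "");

  // Anything that is not plain digits at this point stays a string.
  if (!/^\d+(\.\d+)?$/.test(s)) return raw;

  let n = Number(s);
  if (percent) n = n / 100;
  return negative ? -n : n;
}

console.log(coerceNumber("INR 4,521.50"), coerceNumber("(1,200.00)"), coerceNumber("WL-2604-022"));
```

Dates and invoice numbers such as "02-Apr-26" or "WL-2604-022" fail the final digit check and stay as text, which matches the behaviour described for cells in the default colour. A production version would likely validate thousands grouping more strictly than this sketch does.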
Scanned PDFs are not supported: this tool needs a text layer. Scanned (image-only) PDFs need an OCR step first; the pdfFiller premium tier handles that. Most modern PDFs (statements, invoices, GST returns) have text layers and work fine.
The _meta sheet is a small sheet at the end of the workbook that records the source PDF name, page count, mode, tolerance, header setting, and number-coercion choice. It is useful for audit trails ("how was this extracted?") and for sharing with colleagues who need to know whether to trust the numbers.
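The shape of such a sheet might look like the sketch below, which turns a settings object into two-column rows. The field names, labels, and the split into separate row and column tolerances are assumptions for illustration; the actual layout of the _meta sheet is not specified here.

```typescript
// Hypothetical settings shape; field names are assumptions, not the tool's schema.
interface ExtractionSettings {
  sourceFile: string;
  pageCount: number;
  mode: "per-page" | "flatten";
  rowTolerance: number;
  columnTolerance: number;
  firstRowIsHeader: boolean;
  coerceNumbers: boolean;
}

type MetaCell = string | number | boolean;

// Builds a simple key/value table that could be written as the _meta sheet.
function buildMetaRows(s: ExtractionSettings): MetaCell[][] {
  return [
    ["Setting", "Value"],
    ["Source PDF", s.sourceFile],
    ["Page count", s.pageCount],
    ["Mode", s.mode],
    ["Row tolerance", s.rowTolerance],
    ["Column tolerance", s.columnTolerance],
    ["First row is header", s.firstRowIsHeader],
    ["Number coercion", s.coerceNumbers],
  ];
}

// Example values only; the file name and tolerances are made up.
const metaRows = buildMetaRows({
  sourceFile: "vendor-ledger-q1.pdf",
  pageCount: 7,
  mode: "flatten",
  rowTolerance: 2,
  columnTolerance: 5,
  firstRowIsHeader: true,
  coerceNumbers: true,
});
```

An array of arrays like this is the input shape that SheetJS's `XLSX.utils.aoa_to_sheet` accepts, so it could be appended to the workbook as a sheet named _meta.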
Extraction is tuned with two knobs: row tolerance (if too tight, items that should share a row land on separate rows; loosen it) and column tolerance (if too tight, single columns get split into many; loosen it). The live preview shows the result instantly so you can iterate without exporting.
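The effect of the two tolerances can be sketched with a simple position-clustering approach: text items whose coordinates fall within the tolerance of a cluster's first value share a row or column. This is one common technique and not necessarily what the tool does. `TextItem`, `clusterPositions`, and `buildTable` are hypothetical names, and `y` is assumed to grow downward (raw PDF coordinates grow upward and would need flipping first).

```typescript
// Simplified text item: text plus the position of its left edge.
interface TextItem {
  text: string;
  x: number;
  y: number;
}

// Returns sorted cluster anchors; a new cluster starts whenever a value
// is more than `tolerance` away from the current cluster's first value.
function clusterPositions(values: number[], tolerance: number): number[] {
  const sorted = values.slice().sort((a, b) => a - b);
  const anchors: number[] = [];
  for (const v of sorted) {
    if (anchors.length === 0 || v - anchors[anchors.length - 1] > tolerance) {
      anchors.push(v);
    }
  }
  return anchors;
}

// Index of the cluster a value belongs to: the last anchor not greater than it.
function clusterIndex(anchors: number[], v: number): number {
  let idx = 0;
  for (let i = 0; i < anchors.length; i++) {
    if (anchors[i] <= v) idx = i;
  }
  return idx;
}

function buildTable(items: TextItem[], rowTol: number, colTol: number): string[][] {
  const rowAnchors = clusterPositions(items.map((i) => i.y), rowTol);
  const colAnchors = clusterPositions(items.map((i) => i.x), colTol);
  const table: string[][] = rowAnchors.map(() => colAnchors.map(() => ""));
  for (const item of items) {
    const r = clusterIndex(rowAnchors, item.y);
    const c = clusterIndex(colAnchors, item.x);
    // Items landing in the same cell are joined with a space.
    table[r][c] = table[r][c] === "" ? item.text : table[r][c] + " " + item.text;
  }
  return table;
}

// Two ledger rows whose baselines jitter by up to 1.5 units.
const sampleItems: TextItem[] = [
  { text: "02-Apr-26", x: 40, y: 100 },
  { text: "Westline Hardware", x: 120, y: 101.5 },
  { text: "142,200", x: 300, y: 100 },
  { text: "04-Apr-26", x: 40, y: 120 },
  { text: "BlueDart Surface", x: 121, y: 120 },
  { text: "4,500", x: 302, y: 121 },
];
const looseTable = buildTable(sampleItems, 2, 5); // 2 rows, 3 columns
const tightTable = buildTable(sampleItems, 1, 5); // row tolerance too tight: 3 rows
```

With a row tolerance of 2 the jittered baseline of "Westline Hardware" stays on the first row; at 1 it splits onto a row of its own, which is the symptom that loosening the knob fixes.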
Your PDF is never uploaded. It is read locally with pdfjs, tabularised in JavaScript, and serialised to .xlsx locally with SheetJS. The browser triggers the download. No upload, no third-party API, no logging.
05 — Related tools