Drop a JPG / PNG of a printed invoice. Tesseract OCR runs in your browser, then a field detector pulls out invoice number, PO ref, dates, vendor, buyer, tax IDs, subtotal, tax, and total — all into a clean three-sheet .xlsx (Summary · Amounts · Raw text).
01 — What you create
Tesseract OCR runs locally, a field detector pulls out invoice number / PO / dates / vendor / buyer / tax IDs / subtotal / tax / total, and the result is a three-sheet .xlsx workbook (Summary, Amounts, Raw text) — ready for an accountant to review.
Example output: invoice-INV-2026-0042.xlsx · 3 sheets (Summary · Amounts · Raw text) · field-fill 82% · OCR 91%
Scanned invoices, multi-page batches, multi-currency stacks, and direct push into your accounting system. Free for 30 days, no card required.
02 — How it works
Most vendor invoices arrive as PDFs you can copy text from — the standard Invoice PDF → Excel tool is the right fit. This tool is for the harder case: a scanned or photographed paper invoice with no text layer. OCR plus a field detector gets you 80% of the way; spot-check the remaining 20%.
JPG / PNG of a printed invoice — phone photo or scan. The image stays on your machine.
Tesseract reads the text locally; a field detector then pulls out invoice #, dates, vendor, buyer, tax IDs, and subtotal / tax / total. A confidence score is shown for each pass.
One click writes a three-sheet .xlsx: Summary (header fields), Amounts (every detected currency value), Raw text (full OCR output, one line per row).
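The three-sheet layout described in the steps above can be sketched as plain row arrays. This is a minimal illustration, not the tool's actual code: the `Sheet`, `Amount`, and `buildSheets` names, the column headers, and the input shapes are all assumptions, and writing the rows to a real .xlsx file is left to whatever spreadsheet library you use.

```typescript
// Hypothetical shapes for illustration; the tool's internal types are not documented.
type Cell = string | number;
type Sheet = { name: string; rows: Cell[][] };

interface Amount {
  currency: string;
  value: number;
  line: number; // 1-based line in the raw OCR text
}

function buildSheets(
  fields: Record<string, Cell | undefined>,
  amounts: Amount[],
  ocrText: string,
): Sheet[] {
  // Summary: one "field, value" row per header field (empty string if missing).
  const summary: Cell[][] = [["Field", "Value"]];
  for (const [name, value] of Object.entries(fields)) {
    summary.push([name, value ?? ""]);
  }

  // Amounts: every detected currency value with its source line.
  const amountRows: Cell[][] = [["Currency", "Value", "Raw text line"]];
  for (const a of amounts) {
    amountRows.push([a.currency, a.value, a.line]);
  }

  // Raw text: the full OCR output, one line per row.
  const rawRows: Cell[][] = ocrText.split(/\r?\n/).map((line) => [line]);

  return [
    { name: "Summary", rows: summary },
    { name: "Amounts", rows: amountRows },
    { name: "Raw text", rows: rawRows },
  ];
}
```

Keeping the sheets as arrays of rows makes the structure easy to inspect before handing it to an export step.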
03 — Built for AP teams
Tesseract runs locally via WebAssembly. Your invoice image bytes never touch a server.
Regex passes for invoice number, PO ref, issue date, due date, vendor, buyer, GST / VAT / EIN / TIN / PAN tax IDs, contact info, subtotal, tax, and total.
Summary (header fields ready to import), Amounts (every detected currency value with position), Raw text (one row per OCR line for audit).
Subtotal, tax, and total land as real number cells in the Summary sheet, so SUM and AVERAGE formulas just work.
Two scores are shown: OCR confidence (how confident Tesseract is about the recognition) and field-fill confidence (how many of the key invoice fields were populated).
Image handling, OCR, field detection, and workbook assembly all run locally. Tesseract.js loads from a public CDN on first use; that's the only network step.
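One plausible way to compute a field-fill score is the share of key fields that came back non-empty. The sketch below assumes exactly that; the `KEY_FIELDS` list and the `fieldFillConfidence` name are hypothetical, and the tool's own formula may weight fields differently.

```typescript
// Hypothetical list of key fields; the tool's real list may differ.
const KEY_FIELDS = [
  "invoiceNumber",
  "issueDate",
  "vendor",
  "buyer",
  "subtotal",
  "tax",
  "total",
] as const;

// Returns a value between 0 and 1: populated key fields / all key fields.
function fieldFillConfidence(
  fields: Record<string, string | number | undefined>,
): number {
  const filled = KEY_FIELDS.filter((key) => {
    const value = fields[key];
    return value !== undefined && String(value).trim() !== "";
  }).length;
  return filled / KEY_FIELDS.length;
}
```

A result of 1 means every key field was populated; it says nothing about whether the populated values are correct, which is why the review step still matters.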
Bulk OCR, batch invoicing, multi-party e-signing, redaction, audit logs — pdfFiller picks up where Sonchoy ends. Free for 30 days, no credit card.
Run 100+ invoices, statements, or conversions in one go.
Turn paper invoices into searchable, exportable data.
Multi-party signatures with full audit trails.
Mask sensitive ledger lines before sending to auditors.
04 — Common questions
Invoice PDF → Excel works on PDFs with a text layer (most digitally-created invoices). It's fast and accurate. This OCR tool works on images and scanned invoices that have no text layer — it has to read the pixels first. OCR is slower and less accurate than reading a text layer, so use the PDF tool whenever possible and only fall back to this one for true paper invoices.
On clean printed invoices, 80–90% of key fields land correctly on first pass. Invoice number, issue date, vendor, and total are usually right. Buyer, tax IDs, and PO refs depend heavily on the invoice layout; some templates put them in places the regex passes don't look. Always review the Summary sheet before importing into accounting.
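To show why labelled fields are found reliably while unusual layouts are missed, here is a minimal sketch of a label-driven regex pass. The patterns, the `HeaderFields` shape, and the `detectFields` name are illustrative assumptions, not the tool's actual rules, and they cover only three fields.

```typescript
interface HeaderFields {
  invoiceNumber?: string;
  issueDate?: string;
  total?: string;
}

// Hypothetical patterns: each one expects a label followed by the value.
const PATTERNS: Record<keyof HeaderFields, RegExp> = {
  invoiceNumber: /invoice\s*(?:#|no\.?|number)\s*:?\s*([A-Z0-9][A-Z0-9\-\/]*)/i,
  // Anchored to the start of a line so "Due date:" is not mistaken for the issue date.
  issueDate:
    /^\s*(?:invoice\s+|issue\s+)?date\s*:?\s*(\d{4}-\d{2}-\d{2}|\d{1,2}[\/.-]\d{1,2}[\/.-]\d{2,4})/im,
  // Anchored to the start of a line so "Subtotal:" does not match.
  total: /^\s*(?:grand\s+)?total\s*:?\s*(.+)$/im,
};

function detectFields(ocrText: string): HeaderFields {
  const out: HeaderFields = {};
  for (const key of Object.keys(PATTERNS) as (keyof HeaderFields)[]) {
    const match = PATTERNS[key].exec(ocrText);
    if (match) out[key] = match[1].trim();
  }
  return out;
}
```

An invoice that prints its number under a heading such as "Reference" would leave `invoiceNumber` undefined here, which is the kind of miss the Summary review is meant to catch.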
Clean printed invoices with labelled fields ("Invoice #:", "Date:", "Total:") work very well. Highly stylised "designy" invoices with non-standard labels score lower. Scanned faxed invoices score lower still. Phone-photo invoices under good lighting are fine; tilted, blurry, or shadow-heavy phone photos significantly hurt OCR accuracy.
Yes. The Summary sheet stores them as number-typed cells, so SUM and AVERAGE formulas work. The currency is captured in the adjacent cell as text. If the detector got confused (e.g., picked up a line-item total instead of the grand total), the cell will be wrong but still typed correctly.
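Splitting an OCR'd amount into a numeric value and a separate currency string could look like the sketch below. It assumes US-style formatting (comma thousands separator, dot decimal point) and a small set of currency markers; the `parseAmount` name and the accepted symbols are hypothetical.

```typescript
interface ParsedAmount {
  currency: string; // kept as text, e.g. "$" or "USD"
  value: number; // a real number, suitable for a number-typed cell
}

// Assumes "1,234.56" style formatting; European "1.234,56" is not handled.
function parseAmount(raw: string): ParsedAmount | null {
  const match =
    /^\s*([$€£₹]|[A-Z]{3})?\s*(\d{1,3}(?:,\d{3})*(?:\.\d+)?|\d+(?:\.\d+)?)\s*$/.exec(raw);
  if (!match) return null;
  return {
    currency: match[1] ?? "",
    value: Number(match[2].replace(/,/g, "")),
  };
}
```

Returning `null` for unparseable text keeps a misread string out of the number column, so formulas never silently skip a cell that looks numeric but is stored as text.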
The Amounts sheet lists every currency-prefixed value the OCR detected, with its source position in the raw text. It is useful for cross-checking: if the Summary sheet shows the wrong total, look at the Amounts sheet to find the correct one and copy it over.
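A scan that records each currency-prefixed value together with where it was found might be sketched as follows. The `AmountHit` shape, the `findAmounts` name, the set of currency symbols, and the line/column position format are assumptions made for illustration.

```typescript
interface AmountHit {
  currency: string;
  value: number;
  line: number; // 1-based line in the raw OCR text
  column: number; // 1-based column of the currency symbol
}

function findAmounts(ocrText: string): AmountHit[] {
  const hits: AmountHit[] = [];
  // Hypothetical pattern: a currency symbol followed by a US-style number.
  const pattern = /([$€£₹])\s?(\d+(?:,\d{3})*(?:\.\d+)?)/g;

  ocrText.split(/\r?\n/).forEach((text, index) => {
    pattern.lastIndex = 0; // restart the global regex for each line
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(text)) !== null) {
      hits.push({
        currency: match[1],
        value: Number(match[2].replace(/,/g, "")),
        line: index + 1,
        column: match.index + 1,
      });
    }
  });
  return hits;
}
```

With positions attached, a wrong total in the Summary can be traced back to the exact raw text line it came from.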
Tesseract.js runs entirely in your browser via WebAssembly. The OCR engine itself is loaded from a public CDN (jsDelivr) on first use — that's a one-time engine download, not an image upload. Your invoice image, the recognition pass, the field detection, and the output workbook all stay on your machine.
05 — Related tools