Any PDF goes in, typed JSON comes out.
One REST call turns any invoice, receipt, or statement into typed JSON: fields, tables, and relations with per-field confidence and pixel-level traceability, read by our own proprietary extraction engine.
- Proprietary engine
- Enterprise encryption
- US data residency
// measurement: Precision = exact field match on an internal benchmark of 12.4k labeled documents (invoices, contracts, forms). p50 latency = median latency per page under mixed loads. SLA observed · 12m = uptime measured across all tiers over the last 12 months; the contractual SLA is set per plan (99% to 99.99%). Request the full methodology.
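As a rough illustration of the two metrics named above, the sketch below computes an exact-match field precision and a p50 (median) latency. The function names and data shapes are hypothetical, and the actual benchmark methodology may differ from this simplification.

```python
from statistics import median


def exact_match_precision(predicted: dict, expected: dict) -> float:
    """Share of predicted fields whose value exactly equals the labeled value."""
    if not predicted:
        return 0.0
    correct = sum(
        1 for name, value in predicted.items()
        if name in expected and expected[name] == value
    )
    return correct / len(predicted)


def p50_latency(latencies_ms: list) -> float:
    """Median (50th percentile) of per-page latencies."""
    return median(latencies_ms)
```

Under this definition a field counts only when the extracted value matches the label exactly, so a date that is off by one day scores zero for that field.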
Three steps, one REST call.
Send a document, the engine reads it, you get typed data back. No templates to maintain, no manual review in the loop.
Upload PDF
Drag & drop via dashboard or POST via API. We support PDFs, images, and Word docs with automatic format detection.
Engine reads it
Our proprietary engine identifies tables, key-value pairs, and entities with 98.4% field-level precision, measured on a 12.4k-document benchmark.
Structured output
Receive clean JSON, Markdown, or CSV, ready to drop straight into your database or LLM context.
Three operations cover most document work.
Parsift exposes a small, well-defined set of semantic primitives that you compose into your product's flow. There are no models to train, no prompts to write, and no brittle templates to maintain.
KIE: Key Information Extraction
Identifies fields by meaning, not by position. Issuer, dates, amounts, identifiers, and clauses, returned as typed JSON with a per-field confidence score. Works on unseen layouts, mixed languages, and noisy scans.
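One way to use a per-field confidence score is to accept high-confidence values automatically and flag the rest. The sketch below assumes a hypothetical response shape in which each field is an object with `value` and `confidence` keys; the real field layout and a sensible threshold are not specified here.

```python
def split_by_confidence(fields: dict, threshold: float = 0.9):
    """Separate extracted fields into accepted and low-confidence groups."""
    accepted, low_confidence = {}, {}
    for name, field in fields.items():
        # Assumed shape: {"value": ..., "confidence": float between 0 and 1}
        target = accepted if field["confidence"] >= threshold else low_confidence
        target[name] = field["value"]
    return accepted, low_confidence
```

The threshold of 0.9 is only a placeholder; a stricter value may suit amounts and identifiers, a looser one free-text fields.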
Tables: Table Structure Recognition
Reconstructs tables with merged cells, hierarchical headers, and multi-page breaks. Output in JSON, CSV, or Markdown with traceability back to the original pixels, ready for validation or direct ingestion into your database.
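To show what handling merged cells can look like on the consuming side, the sketch below expands cells that span several rows or columns into a full grid and writes it as CSV. The cell shape (`row`, `col`, optional `rowspan` and `colspan`, `text`) is an assumption made for illustration, not the documented output format, and the function expects at least one cell.

```python
import csv
import io


def cells_to_csv(cells: list) -> str:
    """Expand cells with row/column spans into a full grid and render CSV."""
    n_rows = max(c["row"] + c.get("rowspan", 1) for c in cells)
    n_cols = max(c["col"] + c.get("colspan", 1) for c in cells)
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        # Repeat a merged cell's text in every grid position it covers.
        for r in range(c["row"], c["row"] + c.get("rowspan", 1)):
            for k in range(c["col"], c["col"] + c.get("colspan", 1)):
                grid[r][k] = c["text"]
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(grid)
    return buffer.getvalue()
```

Repeating the merged header text in each covered column keeps every CSV column labeled, which is usually easier to load into a database than blank cells.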
Layout: Layout metadata
Detects paragraphs, lists, signatures, and stamps while preserving reading order. Every element carries coordinates, page, and original order, ideal for reprocessing, auditing, and composing into downstream LLMs.
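Because every element carries its page and original order, composing text for a downstream LLM can be a simple sort and join. The sketch below assumes hypothetical element keys (`page`, `order`, `type`, `text`); the actual key names may differ.

```python
def to_llm_context(elements: list) -> str:
    """Join layout elements in page and reading order, tagging each with its type."""
    ordered = sorted(elements, key=lambda e: (e["page"], e["order"]))
    return "\n".join(f"[p{e['page']} {e['type']}] {e['text']}" for e in ordered)
```

Keeping the page and element type in each line lets a model, or an auditor, refer back to where a passage came from.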
From PDF to typed JSON in one call.
No custom regex parsers to maintain. A documented REST API you can call from any language. Rotation, skew, and low-quality scans are handled for you.
# Turn an invoice PDF into typed JSON
curl -X POST https://api.parsift.com/v1/extract \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]" \
  -F "schema_type=invoice"

{
  "status": "success",
  "data": {
    "vendor": "AWS Inc.",
    "total": 4500.00
  }
}
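On the client side, handling a response like the sample above might look like the following sketch. It relies only on the `status` and `data` keys shown in the sample; the helper name and the error handling are assumptions, and real error responses may carry more detail.

```python
import json


def parse_extract_response(body: str) -> dict:
    """Return the "data" object, or raise if the call did not succeed."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"extraction failed: {payload.get('status')!r}")
    return payload["data"]


sample = '{"status": "success", "data": {"vendor": "AWS Inc.", "total": 4500.00}}'
invoice = parse_extract_response(sample)
print(invoice["vendor"], invoice["total"])  # AWS Inc. 4500.0
```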
Typed JSON, every call
Fields, tables, and relations come back with per-field confidence, ready to validate against your own schema.
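Validating against your own schema can start as a plain type check before you reach for a full schema library. The sketch below is a minimal version using only the standard library; the schema format (field name mapped to a Python type) is an assumption chosen for brevity.

```python
def validate_types(data: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the data matches."""
    problems = []
    for name, expected_type in schema.items():
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], expected_type):
            problems.append(
                f"{name}: expected {expected_type.__name__}, "
                f"got {type(data[name]).__name__}"
            )
    return problems
```

For example, `validate_types({"vendor": "AWS Inc."}, {"vendor": str, "total": float})` reports the missing `total` field instead of letting it fail later at insert time.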
Webhooks for batch jobs
Submit large documents and get a callback when results are ready, so a long batch never blocks a request.
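A receiver for such a callback could be sketched as below. The payload shape (`job_id`, `status`, `data`) is entirely hypothetical, since the callback format is not described here, and the in-memory dictionary stands in for whatever storage your service uses.

```python
import json

results = {}  # in-memory store keyed by job id; a real service would persist this


def handle_callback(raw_body: bytes) -> str:
    """Record a finished job once and report what was done with the callback."""
    event = json.loads(raw_body)  # assumed keys: job_id, status, data
    job_id = event["job_id"]
    if job_id in results:
        # Stay idempotent in case the same callback is delivered more than once.
        return "duplicate"
    if event["status"] != "success":
        return "failed"
    results[job_id] = event["data"]
    return "stored"
```

Returning quickly and doing heavier work elsewhere keeps the receiving endpoint responsive during a long batch.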
Pixel-level traceability
Every value carries its coordinates on the page, so you can show users exactly where a number came from.
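To highlight a value on a rendered preview, its page coordinates usually need scaling to the preview's size. The sketch below assumes a bounding box given as `(x0, y0, x1, y1)` in page pixels alongside the page dimensions; the actual coordinate format may differ.

```python
def scale_bbox(bbox, page_size, display_size):
    """Map a box from page pixel coordinates to a rendered preview's coordinates."""
    x0, y0, x1, y1 = bbox
    sx = display_size[0] / page_size[0]  # horizontal scale factor
    sy = display_size[1] / page_size[1]  # vertical scale factor
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)
```

With a 1000 by 2000 pixel page shown at 500 by 1000, a box at `(100, 200, 300, 400)` maps to `(50.0, 100.0, 150.0, 200.0)`.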
Pick the plan that matches your document volume.
For developers and small projects shipping their first extractions
- 1,000 pages/month
- Proprietary extraction engine
- Built-in templates for common documents
- REST API access
- JSON export
- Proprietary engine, processing you control
- Email support
For growing teams with steady extraction volume
- 3,000 pages/month
- Everything in Starter
- Add-on page packs: 1,000 pages / $10
- Unlimited templates
- REST API + webhooks
- JSON, CSV and Markdown export
- Schema validation
- 90-day extraction history
- Priority support
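To estimate add-on spend on this plan, the arithmetic can be sketched as follows. It uses the 3,000 included pages and the 1,000 pages / $10 pack size listed above, and assumes that packs are sold whole and apply to a single month; the base plan price is not included.

```python
import math

INCLUDED_PAGES = 3_000   # monthly allowance from the plan list
PACK_PAGES = 1_000       # pages per add-on pack
PACK_PRICE_USD = 10      # price per add-on pack


def addon_cost(pages_this_month: int) -> tuple:
    """Return (packs needed, add-on cost in USD) for a month's page volume."""
    overage = max(0, pages_this_month - INCLUDED_PAGES)
    packs = math.ceil(overage / PACK_PAGES)  # assumes packs are sold whole
    return packs, packs * PACK_PRICE_USD
```

A month with 5,500 pages leaves 2,500 pages over the allowance, which rounds up to 3 packs, or $30 in add-ons.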
For high-volume operations with compliance and deployment requirements
- Unlimited pages/month
- Everything in Pro
- Custom templates
- Guaranteed SLA
- SSO / SAML
- On-premise deployment
- Custom integrations
- Dedicated 24/7 support
All plans include SSL, data encryption, and a proprietary engine with processing you control.
Built for the teams that have to prove compliance.
We operate with auditable practices and provide the contractual instruments needed for deployment in fintechs, law firms, insurers, and healthcare providers.
LGPD · GDPR
Designed to comply with the LGPD and to be compatible with the GDPR. A data processing agreement (DPA) and standard contractual clauses are available on request.
Private processing
Extraction runs on our own engine, and you control how your documents are processed: they are never sent to external model providers unless you choose to send them.
Encryption
Encrypted in transit with TLS and at rest in our storage and database layers. Access to production data is restricted, authenticated, and logged.
Residency
Processing in US regions on Azure. Contractual guarantee that we never train on your documents.
Every control above is documented and verifiable. Read the full posture, or request the report and agreement your review needs.
Five hundred pages to test the API.
Parsift is in early access. Join the waitlist and we will send your invite when account creation opens. You get a one-time grant of 500 pages to test the API, separate from any plan, with no credit card required to start.