
Parsr

An open-source document parsing toolchain that converts PDFs, images and office files into structured JSON/Markdown/CSV.

Detailed Introduction

Parsr is an open-source, lightweight document parsing toolchain developed by AXA. It converts PDFs, images, DOCX, EML and similar document formats into immediately usable structured outputs (JSON, Markdown, CSV/Pandas DataFrame or plain text). The project focuses on document cleaning and hierarchy reconstruction, producing labeled text, paragraphs, tables and metadata for downstream analysis and automated pipelines.

Main Features

  • Multi-format support: handles PDFs, scanned images, office documents, and emails.
  • Cleaning and hierarchy reconstruction: restores lines, paragraphs and document structure; detects headings, lists, page numbers, headers/footers and links.
  • Table and list extraction: exports tables into structured CSV/DataFrame formats with support for complex layouts.
  • Deployment-friendly: provides REST API, CLI, Docker images and a visual viewer; suitable for private deployments.
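
To make the idea of structured output more concrete, here is a minimal sketch of how a downstream script might consume it. The JSON layout (`pages`, `elements`, `type`, `level`, `text`, `rows`) and the helper names `outline` and `tables_as_csv` are assumptions invented for this illustration; Parsr's actual output schema may differ and should be checked against its documentation.

```python
import csv
import io
import json

# Hypothetical, simplified document structure used only for illustration.
# It is NOT guaranteed to match the real Parsr JSON schema.
SAMPLE = json.loads("""
{
  "pages": [
    {"pageNumber": 1, "elements": [
      {"type": "heading", "level": 1, "text": "Annual Report"},
      {"type": "paragraph", "text": "Revenue grew in 2023."},
      {"type": "table", "rows": [["Year", "Revenue"], ["2023", "10"]]}
    ]},
    {"pageNumber": 2, "elements": [
      {"type": "heading", "level": 2, "text": "Outlook"},
      {"type": "paragraph", "text": "Growth is expected to continue."}
    ]}
  ]
}
""")


def outline(document):
    """Return (page number, heading level, text) for every heading element."""
    return [
        (page["pageNumber"], el["level"], el["text"])
        for page in document["pages"]
        for el in page["elements"]
        if el["type"] == "heading"
    ]


def tables_as_csv(document):
    """Serialize every table element to a CSV string, in reading order."""
    results = []
    for page in document["pages"]:
        for el in page["elements"]:
            if el["type"] == "table":
                buffer = io.StringIO()
                csv.writer(buffer).writerows(el["rows"])
                results.append(buffer.getvalue())
    return results


print(outline(SAMPLE))
print(tables_as_csv(SAMPLE))
```

Because headings, paragraphs and tables arrive already labeled, the consuming code can stay this small: it filters by element type instead of re-deriving structure from raw text.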

Use Cases

Parsr is useful for archiving and search preprocessing, invoice and report extraction, contract and compliance review, OCR-driven data extraction, and any ETL workload that converts unstructured documents into analyzable data. It can run locally or within controlled private environments to meet compliance and privacy needs.

Technical Features

  • Modular pipeline: composed of cleaning, layout analysis, OCR integration, table parsing and export modules for easy extension and component replacement.
  • Multi-engine compatibility: integrates with third-party tools such as Tesseract (OCR), PDF.js (PDF text extraction) and Camelot (table extraction) to improve recognition.
  • Programmable interfaces: offers a REST API and a Python client for integration with data-science and pipeline tooling.
  • Open-source license: released under Apache-2.0, which permits enterprise private deployments and customization.
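
The modular pipeline idea from the list above can be sketched as a chain of interchangeable stages. This is a generic illustration, not Parsr's implementation: the stage names (`clean_lines`, `label_headings`, `export_markdown`), the `run_pipeline` helper and the all-uppercase heading heuristic are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
Stage = Callable[[Document], Document]


def clean_lines(doc: Document) -> Document:
    """Cleaning stage: trim whitespace and drop empty lines."""
    lines = [line.strip() for line in doc["lines"] if line.strip()]
    return {**doc, "lines": lines}


def label_headings(doc: Document) -> Document:
    """Layout stage: naively label all-uppercase lines as headings."""
    elements = [
        {"type": "heading" if line.isupper() else "paragraph", "text": line}
        for line in doc["lines"]
    ]
    return {**doc, "elements": elements}


def export_markdown(doc: Document) -> Document:
    """Export stage: render the labeled elements as Markdown."""
    parts = [
        f"# {el['text']}" if el["type"] == "heading" else el["text"]
        for el in doc["elements"]
    ]
    return {**doc, "markdown": "\n\n".join(parts)}


def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    """Apply each stage in order; every stage returns a new document dict."""
    for stage in stages:
        doc = stage(doc)
    return doc


raw = {"lines": ["  INTRODUCTION ", "", "Parsr converts documents.  "]}
result = run_pipeline(raw, [clean_lines, label_headings, export_markdown])
print(result["markdown"])
```

Since every stage shares the same signature, swapping one component (for example, a different heading detector or an exporter for another format) only means changing the list passed to `run_pipeline`.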