This article systematically compares major open source PDF to Markdown tools including Dolphin, MarkItDown, MinerU, and Marker. It focuses on structure fidelity, image/table extraction, AI capabilities, and usability to help technical readers quickly select and understand the best tool for their scenario.
Overview: Feature Comparison of Tools
When choosing a PDF to Markdown tool, structure fidelity, image/table handling, AI parsing, and usability are key considerations. The table below summarizes the main feature differences of four popular tools for quick comparison.
Feature Dimension | ByteDance Dolphin | Microsoft MarkItDown | OpenDataLab MinerU | Datalab Marker |
---|---|---|---|---|
Table of Contents | Basic section retention, occasional order errors | Not retained, plain text only | Retained, supports heading classification | Retained, precise hierarchy recognition |
Image Content | Detects and outputs images | Placeholder only, no image export | Exports images with captions | Automatically exports image files |
Table Styles | Markdown tables, complex tables may lose fidelity | Simple tables or plain text, styles lost | HTML embed, preserves styles | Markdown tables, LLM optimizes complex tables |
Hyperlink Retention | Text only, link targets missing | May lose links, text only | Link targets not explicitly exported | Recognizes and outputs Markdown hyperlinks |
Figure Caption Linking | Recognizes and binds captions | Not retained | Smart matching of captions and figures | Detects captions and references, outputs reference links |
AI Parsing | Vision model OCR, two-stage parsing | Optional Azure Document Intelligence or GPT | OCR + multi-model pipeline, auto recognition | OCR/layout model, optional LLM |
Usage | Local CLI, no GUI | CLI/Docker, no web UI | CLI/Python API/Web demo/App | CLI/GUI/API/Online platform |
Free/Open Source | MIT license, free | MIT license, free | Open-source code, models under AGPL | GPL/research license, commercial use requires authorization |
Installation/Deploy | Clone code + dependencies + model download | pip install/Docker | pip/uv/Docker, auto model download | pip install, supports GUI/server |
Underlying Tech | Vision Transformer OCR | PDFMiner + rule conversion | Layout detection + OCR + table + formula multi-model | Lightweight model + rules + LLM assist |
Project Background | ByteDance research team, ACL paper | Microsoft AutoGen team, active community | Tsinghua & Shanghai Institute, frequent updates | Datalab startup, commercial support |
Extensibility | Limited output formats, needs code changes | Plugin mechanism, easy to extend | Customizable pipeline, rich config | Supports custom logic and LLM prompt |
MinerU: High-Fidelity Parsing with Multi-Model Fusion
MinerU, open sourced by OpenDataLab, integrates multiple AI models to maximize document structure and content restoration:
- Automatically detects heading levels, outputs clear Markdown structure.
- Extracts images, tables, and formulas completely; complex tables embedded as HTML.
- Supports OCR for 84 languages, auto-detects scanned documents.
- High formula recognition rate, LaTeX-friendly output.
- Installation via pip/uv/Docker, auto-downloads models on first run.
- High resource usage, GPU recommended.

MinerU suits academic papers and complex reports that require high fidelity. Deployment is relatively complex and resource-intensive, but parsing quality approaches that of commercial tools. The project is well documented and has an active community, so support is easy to find. MinerU also provides a client and a web interface for non-technical users.
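Because complex tables arrive as embedded HTML rather than pipe tables, downstream scripts that expect plain Markdown may need to read those tables back out. The following is a minimal sketch using only the Python standard library; the sample HTML is made up for illustration, the helper names are hypothetical, and merged cells (`colspan`/`rowspan`) are not handled.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace so multi-line cells become one line.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_rows(html):
    """Return the table cells found in an HTML fragment as a list of rows."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


# Illustrative input, not real MinerU output.
sample = """
<table>
  <tr><th>Model</th><th>Score</th></tr>
  <tr><td>A</td><td>0.91</td></tr>
</table>
"""
print(html_table_rows(sample))
```

The rows can then be written out as CSV or as a pipe table when the table turns out to be simple enough.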
Marker: Efficient and Versatile Modern Parsing
Marker, developed by Datalab, balances speed and structure fidelity:
- Retains sections, paragraphs, lists, footnotes, and more; logical reading order.
- Automatically exports images and tables, supports LLM optimization for complex tables and formulas.
- Preserves hyperlinks and references, supports multiple formats and languages.
- Offers CLI, GUI, API, and online service; highly user-friendly.
- GPL/research license, commercial use requires authorization.

Marker is ideal for batch conversion, complex documents, and multilingual scenarios. It is fast and feature-rich, but its licensing restrictions should be checked before adoption. In the author's testing, Marker excelled at image handling, preserving high-quality originals, while its support for complex tables was weaker. The author has also used Marker for ebook translation.
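For scripted or batch use, the CLI can be driven from Python. The sketch below assumes the `marker_single` command and the `--output_dir` and `--use_llm` flags as described in Marker's README; flag names can change between releases, so confirm them with `marker_single --help`. The function names and file paths are hypothetical.

```python
import shutil
import subprocess


def marker_command(pdf_path, output_dir, use_llm=False):
    """Build the argument list for a single-file Marker conversion."""
    # Flag names are assumptions taken from the Marker README.
    cmd = ["marker_single", str(pdf_path), "--output_dir", str(output_dir)]
    if use_llm:
        # LLM-assisted mode needs an LLM service configured separately.
        cmd.append("--use_llm")
    return cmd


def convert_with_marker(pdf_path, output_dir, use_llm=False):
    """Run Marker on one PDF, failing early if the CLI is not installed."""
    if shutil.which("marker_single") is None:
        raise RuntimeError("marker_single not found on PATH")
    subprocess.run(marker_command(pdf_path, output_dir, use_llm), check=True)


# Only prints the command; nothing is executed here.
print(marker_command("book.pdf", "out", use_llm=True))
```

Keeping command construction separate from execution makes it easy to log or dry-run a batch before spending time on the conversion itself.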
Dolphin: Structure Restoration Driven by Vision Models
Dolphin, open sourced by ByteDance, uses Vision Transformer OCR and layout understanding to restore PDF layout and output structured Markdown/JSON. Its strengths include:
- Automatically retains sections, paragraphs, tables, formulas, images, and headings.
- Embeds images and formulas in Markdown, formulas support LaTeX.
- Outputs tables as Markdown, complex tables may lose fidelity.
- Hyperlinks retained as text only, URLs not restored.
- Relies on deep learning two-stage parsing, suitable for complex layouts and scanned documents.
- Runs locally via CLI, no internet required, model weights must be downloaded.
Dolphin is suitable for scenarios requiring high layout fidelity and local self-hosting, but complex tables and heading order may need manual post-processing.
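Part of that manual post-processing can be narrowed down automatically. As a rough aid, the sketch below flags headings that skip a level (for example a `###` directly under a `#`), which is one common symptom of a misread outline. It assumes the output uses `#`-style Markdown headings, it does not detect sections that are out of order, and the function name is hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")
FENCE = "`" * 3  # code-fence marker, used to skip fenced code blocks


def heading_jumps(markdown):
    """Return (line_number, level, title) for headings that skip a level."""
    problems = []
    previous = 0
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' inside code is a comment, not a heading
        match = HEADING.match(line)
        if not match:
            continue
        level = len(match.group(1))
        if level > previous + 1:
            problems.append((number, level, match.group(2)))
        previous = level
    return problems


sample = "# Title\n### Methods\n## Results\n#### Details\n"
print(heading_jumps(sample))
```

Each flagged entry points at a line worth checking against the original PDF; an empty list does not prove the outline is correct.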
MarkItDown: Multi-Format Support and Plugin Extensibility
MarkItDown, open sourced by Microsoft, is a general-purpose file-to-Markdown tool focused on multi-format support and ease of use:
- Supports PDF, Word, PPT, Excel, images, and more.
- PDF conversion extracts plain text only, no heading levels or layout.
- Tables are mostly plain text, complex styles lost; images output as placeholders.
- Plugin mechanism allows extension for new formats and custom processing.
- Optional Azure Document Intelligence for conversion, and an LLM such as GPT for image descriptions.
- Easy installation via pip, active community.
MarkItDown is suitable for quick text extraction or batch multi-format processing, but limited structure fidelity requires manual organization afterward.
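A quick text extraction of this kind can be scripted in a few lines. The sketch below uses the `MarkItDown().convert(...)` call and the `text_content` attribute documented in the project's README; the helper names and paths are hypothetical, and PDF support is assumed to have been installed with the package's optional extras.

```python
from pathlib import Path


def markdown_path_for(source, output_dir):
    """Map an input file to the .md file that will hold its conversion."""
    return Path(output_dir) / (Path(source).stem + ".md")


def convert_file(source, output_dir):
    """Convert one file with MarkItDown and write the result to disk."""
    # Imported lazily so the helper above works without the package installed.
    from markitdown import MarkItDown

    result = MarkItDown().convert(str(source))
    target = markdown_path_for(source, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(result.text_content, encoding="utf-8")
    return target


# Shows where the output would go; no conversion is run here.
print(markdown_path_for("reports/q3.pdf", "out"))
```

Calling `convert_file` in a loop over a directory gives a simple batch job for mixed PDF, Word, and PowerPoint inputs.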
Other Open Source Tools and Emerging AI Projects
Beyond the mainstream tools above, the following solutions are also worth considering:
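- Pandoc: The “Swiss Army knife” of document conversion with broad multi-format support. Pandoc does not read PDF as an input format, so it fits best once a well-structured PDF has been converted to an intermediate format such as HTML or DOCX.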
- pdf2md (Node.js): Lightweight CLI, suitable for batch processing and web integration.
- markitdown-go: Go-specific, efficient CLI, easy integration.
- olmOCR: Focused on scanned document OCR, ideal for image text recognition.
- pdf-to-markdown-gpt: AI-driven, suitable for lightweight projects.
- Docling, appjsonify, DocXChain: Emerging AI projects supporting structured parsing and custom workflows, suitable for academic and complex scenarios.
The table below summarizes the features and use cases of these emerging tools:
Tool Category | Typical Example | Best Use Case |
---|---|---|
General, well-structured | Pandoc | Structured docs with sections, formulas, footnotes |
Lightweight JS tools | pdf2md (Node.js) | Fast batch processing, web integration |
Go-specific | markitdown-go | Efficient CLI, Go project integration |
Scanned/complex image PDF | olmOCR + combo | Strong OCR, image text recognition |
AI-driven high fidelity | pdf-to-markdown-gpt, Docling | AI understands structure, preserves more formatting |
Academic PDF deep parsing | appjsonify, DocXChain | Paper layout and structure analysis |
How to Choose a PDF to Markdown Tool?
Based on hands-on testing, MinerU converts quickly and handles complex tables well by rendering them as HTML, but its image handling is less reliable and sometimes crops images incompletely. Marker performs well in structure fidelity and image handling and supports multiple usage modes, but has more licensing restrictions. Dolphin suits documents that need high layout fidelity but handles complex tables poorly. MarkItDown is good for quick text extraction but limited in structure fidelity.

All of these tools share a common weakness: PDF outline recognition is not accurate enough, especially for multi-level headings and section order, so manual adjustment may be required. Overall, Marker and MinerU are recommended as first choices, with Dolphin and MarkItDown as supplementary tools. Tools can also be combined as needed: Marker is recommended for book-structured documents, MinerU for more open and free-form documents.
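The recommendation above can be written down as a small decision helper. This is only a sketch of the article's reasoning, not a benchmark-backed selector; the function name and its three parameters are hypothetical simplifications.

```python
def recommend_tools(quick_text_only=False, book_like=False, marker_license_ok=True):
    """Return tool names in the order suggested by the discussion above."""
    if quick_text_only:
        # Plain text extraction without structure: MarkItDown is enough.
        return ["MarkItDown"]
    # Marker first for book-structured documents, MinerU first otherwise.
    ranked = ["Marker", "MinerU"] if book_like else ["MinerU", "Marker"]
    if not marker_license_ok:
        # Marker's license requires authorization for commercial use.
        ranked.remove("Marker")
    # Dolphin and MarkItDown act as supplementary fallbacks.
    return ranked + ["Dolphin", "MarkItDown"]


print(recommend_tools(book_like=True))
print(recommend_tools(marker_license_ok=False))
```

In practice the ordering is a starting point: running two candidates on a few representative pages and comparing the output is usually the deciding step.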
Summary
This article systematically reviews the features and use cases of major open source PDF to Markdown tools: Dolphin, MarkItDown, MinerU, and Marker. Each tool has strengths in structure fidelity, image/table extraction, AI parsing, and usability. When choosing, consider document complexity, deployment environment, and licensing requirements, and prioritize solutions with high structure fidelity and usability. For academic papers and complex reports, MinerU or Marker are recommended; for quick batch processing or multi-format support, Pandoc or MarkItDown are suitable. Looking ahead, AI-driven document parsing tools will continue to improve in quality and automation.