Deep Dive into Open Source PDF to Markdown Tools: Marker, MinerU, and Alternatives

This article systematically compares major open source PDF to Markdown tools including Dolphin, MarkItDown, MinerU, and Marker. It focuses on structure fidelity, image/table extraction, AI capabilities, and usability to help technical readers quickly select and understand the best tool for their scenario.

Overview: Feature Comparison of Tools

When choosing a PDF to Markdown tool, structure fidelity, image/table handling, AI parsing, and usability are key considerations. The table below summarizes the main feature differences of four popular tools for quick comparison.

Feature Dimension	ByteDance Dolphin	Microsoft MarkItDown	OpenDataLab MinerU	Datalab Marker
Table of Contents	Basic section retention, occasional order errors	Not retained, plain text only	Retained, supports heading classification	Retained, precise hierarchy recognition
Image Content	Detects and outputs images	Placeholder only, no image export	Exports images with captions	Automatically exports image files
Table Styles	Markdown tables, complex tables may lose fidelity	Simple tables or plain text, styles lost	HTML embed, preserves styles	Markdown tables, LLM optimizes complex tables
Hyperlink Retention	Text only, link targets missing	May lose links, text only	Link targets not explicitly exported	Recognizes and outputs Markdown hyperlinks
Figure Caption Linking	Recognizes and binds captions	Not retained	Smart matching of captions and figures	Detects captions and references, outputs reference links
AI Parsing	Vision model OCR, two-stage parsing	Optional Azure Document Intelligence or GPT	OCR + multi-model pipeline, auto recognition	OCR/layout model, optional LLM
Usage	Local CLI, no GUI	CLI/Docker, no web UI	CLI/Python API/Web demo/App	CLI/GUI/API/Online platform
Free/Open Source	MIT license, free	MIT license, free	Code friendly, models under AGPL	GPL/research license, commercial use requires authorization
Installation/Deploy	Clone code + dependencies + model download	pip install/Docker	pip/uv/Docker, auto model download	pip install, supports GUI/server
Underlying Tech	Vision Transformer OCR	PDFMiner + rule conversion	Layout detection + OCR + table + formula multi-model	Lightweight model + rules + LLM assist
Project Background	ByteDance research team, ACL paper	Microsoft Autogen team, active community	Tsinghua & Shanghai Institute, frequent updates	Datalab startup, commercial support
Extensibility	Limited output formats, needs code changes	Plugin mechanism, easy to extend	Customizable pipeline, rich config	Supports custom logic and LLM prompt

Table 1: Feature Comparison of Mainstream Open Source PDF to Markdown Tools

MinerU: High-Fidelity Parsing with Multi-Model Fusion

MinerU, open sourced by OpenDataLab, integrates multiple AI models to maximize document structure and content restoration:

Automatically detects heading levels, outputs clear Markdown structure.
Extracts images, tables, and formulas completely; complex tables embedded as HTML.
Supports OCR for 84 languages, auto-detects scanned documents.
High formula recognition rate, LaTeX-friendly output.
Installation via pip/uv/Docker, auto-downloads models on first run.
High resource usage, GPU recommended.

Figure 1: My favorite feature of MinerU is its precise recognition and use of HTML for table rendering.

MinerU is suitable for academic papers and complex reports that require high fidelity. Deployment is relatively complex, but parsing quality approaches that of commercial tools. Its documentation is thorough and its community is active, which makes it easy to get support. MinerU also provides a client and a web interface for non-technical users.
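Because complex tables arrive as embedded HTML rather than pipe tables, downstream scripts usually need to pull them back out of the Markdown. The following is a minimal sketch using only the Python standard library; the class and function names are hypothetical, and it assumes simple, non-nested tables with properly closed cells.

```python
from html.parser import HTMLParser


class TableRowCollector(HTMLParser):
    """Collect cell text from every <table> found in a Markdown/HTML string."""

    def __init__(self):
        super().__init__()
        self.tables = []   # each table is a list of rows; each row a list of cells
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside of a table cell (ordinary Markdown) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def extract_html_tables(markdown_text):
    """Return all embedded HTML tables as nested lists of cell strings."""
    parser = TableRowCollector()
    parser.feed(markdown_text)
    parser.close()
    return parser.tables
```

Calling `extract_html_tables(text)` on a converted file returns one list of rows per table, which can then be written to CSV or compared against the source PDF. Merged cells (`rowspan`/`colspan`) are not expanded in this sketch.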

Marker: Efficient and Versatile Modern Parsing

Marker, developed by Datalab, balances speed and structure fidelity:

Retains sections, paragraphs, lists, footnotes, and more; logical reading order.
Automatically exports images and tables, supports LLM optimization for complex tables and formulas.
Preserves hyperlinks and references, supports multiple formats and languages.
Offers CLI, GUI, API, and online service; highly user-friendly.
GPL/research license, commercial use requires authorization.

Figure 2: Marker preserves high-quality images from PDFs.

Marker is ideal for batch conversion, complex documents, and multilingual scenarios. It is fast and feature-rich, though its licensing restrictions should be noted. In testing, Marker excelled at image handling, preserving high-quality originals, but its support for complex tables was weaker. The author used Marker for ebook translation.
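When a converter exports images as separate files, a quick sanity check after batch conversion is to confirm that every image referenced in the Markdown actually exists on disk. The sketch below is tool-agnostic and uses only the standard library; `find_missing_images` is a hypothetical helper name, and it assumes inline `![alt](path)` syntax with paths relative to the output directory.

```python
import re
from pathlib import Path

# Matches inline Markdown images such as ![caption](images/fig1.png "title")
# and captures the target up to the first space or closing parenthesis.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)')

# Targets that start with a URL scheme (https:, data:, ...) are not local files.
SCHEME_PATTERN = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def find_missing_images(markdown_text, base_dir):
    """Return local image paths referenced in the Markdown that do not exist."""
    missing = []
    for target in IMAGE_PATTERN.findall(markdown_text):
        if SCHEME_PATTERN.match(target):
            continue  # remote or embedded image; nothing to check on disk
        if not (Path(base_dir) / target).is_file():
            missing.append(target)
    return missing
```

An empty result means every local image reference resolves. Reference-style images (`![alt][id]`) are not covered by this pattern.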

Dolphin: Structure Restoration Driven by Vision Models

Dolphin, open sourced by ByteDance, uses Vision Transformer OCR and layout understanding to restore PDF layout and output structured Markdown/JSON. Its strengths include:

Automatically retains sections, paragraphs, tables, formulas, images, and headings.
Embeds images and formulas in Markdown, formulas support LaTeX.
Outputs tables as Markdown, complex tables may lose fidelity.
Hyperlinks retained as text only, URLs not restored.
Relies on deep learning two-stage parsing, suitable for complex layouts and scanned documents.
Runs locally via CLI, no internet required, model weights must be downloaded.

Dolphin is suitable for scenarios requiring high layout fidelity and local self-hosting, but complex tables and heading order may need manual post-processing.
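One inexpensive way to find tables that lost fidelity is to look for pipe-table rows whose cell count differs from the header row. This is a rough sketch under stated assumptions: the function names are hypothetical, every table row is assumed to start with `|`, and escaped pipes inside cells are not handled.

```python
def count_cells(line):
    """Count the cells in one pipe-table row, ignoring one outer pipe on each side."""
    stripped = line.strip()
    if stripped.startswith("|"):
        stripped = stripped[1:]
    if stripped.endswith("|"):
        stripped = stripped[:-1]
    return len(stripped.split("|"))


def find_ragged_table_rows(markdown_text):
    """Return (line_number, expected, found) for rows that do not match their header."""
    problems = []
    expected = None  # cell count of the current table's first row
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.strip().startswith("|"):
            found = count_cells(line)
            if expected is None:
                expected = found
            elif found != expected:
                problems.append((number, expected, found))
        else:
            expected = None  # any non-table line ends the current table
    return problems
```

Rows flagged here are good candidates for manual review against the source PDF, since merged or spanning cells are a common cause of mismatched column counts.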

MarkItDown: Multi-Format Support and Plugin Extensibility

MarkItDown, open sourced by Microsoft, is a general-purpose file-to-Markdown tool focused on multi-format support and ease of use:

Supports PDF, Word, PPT, Excel, images, and more.
PDF conversion extracts plain text only, no heading levels or layout.
Tables are mostly plain text, complex styles lost; images output as placeholders.
Plugin mechanism allows extension for new formats and custom processing.
Optional Azure Document Intelligence for conversion, or GPT for image descriptions.
Easy installation via pip, active community.

MarkItDown is suitable for quick text extraction or batch multi-format processing, but limited structure fidelity requires manual organization afterward.
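Part of that manual organization can be automated when the source document uses numbered section titles. The sketch below is a heuristic, not part of MarkItDown: `promote_numbered_headings` is a hypothetical helper that promotes lines such as "2.1 Related Work" to Markdown headings, assuming titles begin with a dotted number and a capital letter and do not end in sentence punctuation.

```python
import re

# A line like "2.1 Related Work": dotted section number, then a capitalized title.
NUMBERED_TITLE = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+([A-Z][^\n]{0,80})$")

# Lines ending with these characters are more likely list items or prose.
SENTENCE_ENDINGS = (".", ":", ";", ",")


def promote_numbered_headings(text):
    """Turn lines that look like numbered section titles into Markdown headings."""
    out = []
    for line in text.splitlines():
        candidate = line.strip()
        match = NUMBERED_TITLE.match(candidate)
        if match and not candidate.endswith(SENTENCE_ENDINGS):
            depth = match.group(1).count(".") + 1  # "2.1" becomes level 2
            out.append("#" * min(depth, 6) + " " + candidate)
        else:
            out.append(line)
    return "\n".join(out)
```

The heuristic will misfire on lines such as "2024 Annual Report", so the output is best treated as a first draft of the outline to be reviewed by hand.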

Other Open Source Tools and Emerging AI Projects

Beyond the mainstream tools above, the following solutions are also worth considering:

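Pandoc: The “Swiss Army knife” of document conversion, supports multi-format conversion. Note that Pandoc does not read PDF as an input format, so it fits best once a well-structured PDF has been converted to an intermediate format such as HTML or DOCX.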
pdf2md (Node.js): Lightweight CLI, suitable for batch processing and web integration.
markitdown-go: Go-specific, efficient CLI, easy integration.
olmOCR: Focused on scanned document OCR, ideal for image text recognition.
pdf-to-markdown-gpt: AI-driven, suitable for lightweight projects.
Docling, appjsonify, DocXChain: Emerging AI projects supporting structured parsing and custom workflows, suitable for academic and complex scenarios.

The table below summarizes the features and use cases of these emerging tools:

Tool Category	Typical Example	Best Use Case
General, well-structured	Pandoc	Structured docs with sections, formulas, footnotes
Lightweight JS tools	pdf2md (Node.js)	Fast batch processing, web integration
Go-specific	markitdown-go	Efficient CLI, Go project integration
Scanned/complex image PDF	olmOCR + combo	Strong OCR, image text recognition
AI-driven high fidelity	pdf-to-markdown-gpt, Docling	AI understands structure, preserves more formatting
Academic PDF deep parsing	appjsonify, DocXChain	Paper layout and structure analysis

Table 2: PDF to Markdown Tool Selection Recommendations

How to Choose a PDF to Markdown Tool?

Based on hands-on testing, MinerU converts quickly and recognizes complex tables well, rendering them as HTML, but its image handling is less reliable and sometimes crops images incompletely. Marker performs well in structure fidelity and image/table handling and supports multiple usage modes, but it has more licensing restrictions. Dolphin suits documents that need high layout fidelity but handles complex tables poorly. MarkItDown is good for quick text extraction but offers limited structure fidelity.

All of these tools share a common issue: recognition of the PDF document outline is not accurate enough, especially for multi-level headings and section order, so the result may require manual adjustment.

Overall, Marker and MinerU are recommended as first choices, with Dolphin and MarkItDown as supplementary tools. You can also combine tools as needed: Marker is recommended for book-structured documents, and MinerU for more open, free-form documents.
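Whichever tool is chosen, the outline problem can at least be detected automatically before manual adjustment. The following is a small sketch, independent of any of the tools above, that flags headings which skip a level (for example, a `#` heading followed directly by `###`). The function name is hypothetical, and only ATX-style `#` headings outside fenced code blocks are considered.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")
FENCE = "`" * 3  # marker that opens and closes a fenced code block


def find_heading_jumps(markdown_text):
    """Return (line_number, previous_level, level) where a heading skips a level."""
    jumps = []
    previous = 0      # level of the last heading seen; 0 before any heading
    in_fence = False  # True while inside a fenced code block
    for number, line in enumerate(markdown_text.splitlines(), start=1):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' lines inside code are comments, not headings
        match = HEADING.match(line)
        if match:
            level = len(match.group(1))
            if level > previous + 1:
                jumps.append((number, previous, level))
            previous = level
    return jumps
```

Running this over the output of two different converters for the same PDF gives a rough, comparable count of outline defects. A document whose first heading is deeper than level 1 is also reported, with a previous level of 0.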

Summary

This article systematically reviews the features and use cases of major open source PDF to Markdown tools: Dolphin, MarkItDown, MinerU, and Marker. Each tool has strengths in structure fidelity, image/table extraction, AI parsing, and usability. When choosing, consider document complexity, deployment environment, and licensing requirements, and prioritize solutions with high structure fidelity and usability. For academic papers and complex reports, MinerU or Marker are recommended; for quick batch processing or multi-format support, Pandoc or MarkItDown are suitable. Looking ahead, AI-driven document parsing tools will continue to improve in quality and automation.