Marker converts documents to Markdown, JSON, chunks, and HTML quickly and accurately.
Tool Features
Marker converts the following kinds of input:
- PDF files
- Image files
- PPTX, DOCX, XLSX files
- HTML files
- EPUB files
- Documents in all languages
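As a rough illustration of the list above, a pre-flight check could map a file extension to one of the listed formats before handing the file to Marker. This is only a sketch: `FORMAT_BY_EXTENSION` and `detect_format` are hypothetical names, not part of Marker, and the image extensions are an assumption because the list only says "image files".

```python
from pathlib import Path

# Hypothetical mapping built from the format list above. The image
# extensions are an assumption; the list only mentions "image files".
FORMAT_BY_EXTENSION = {
    ".pdf": "pdf",
    ".png": "image",
    ".jpg": "image",
    ".jpeg": "image",
    ".pptx": "pptx",
    ".docx": "docx",
    ".xlsx": "xlsx",
    ".html": "html",
    ".epub": "epub",
}


def detect_format(path):
    """Return the format label for a path, or None if the extension is not listed."""
    return FORMAT_BY_EXTENSION.get(Path(path).suffix.lower())
```

The lookup is case-insensitive on the extension, so `report.PDF` and `report.pdf` are treated the same way.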
Formatting Capabilities
Marker handles various document elements:
- Formats tables, forms, equations, and inline math
- Extracts links, references, and code blocks
- Extracts and saves images
- Removes headers/footers and other artifacts
Extensibility
Marker can be extended and configured in several ways:
- Extensible with your own formatting and logic
- Performs structured extraction, given a JSON schema (beta)
- Optionally boosts accuracy with LLMs (and your own prompt)
- Works on GPU, CPU, or MPS
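To make the structured-extraction item more concrete, here is a minimal sketch of the kind of JSON schema one might write for it. The invoice fields are purely hypothetical, and how the schema is passed to Marker is not covered in this text, so that step is left out rather than guessed.

```python
import json

# Hypothetical schema for an invoice; none of these field names come from Marker.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "issue_date": {"type": "string", "format": "date"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["invoice_number", "total"],
}


def schema_as_json(schema):
    """Serialize a schema dict to a JSON string with stable key order."""
    return json.dumps(schema, indent=2, sort_keys=True)
```

Keeping the schema as a plain dict makes it easy to version alongside code and to serialize whenever a JSON string is needed.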
Use Cases
Marker suits workflows that convert documents into structured text, such as:
- Converting PDF documents to editable Markdown
- Extracting structured data from documents
- Preparing training data for machine learning projects
- Document digitization and archiving
- Automating document processing workflows
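The workflow-automation use case can be sketched as a small batch loop. The example below is an assumption-laden outline, not Marker's own batch tooling: `convert_directory` is a hypothetical helper, and the actual conversion is supplied by the caller as a `convert` callable, so the sketch runs without Marker installed.

```python
from pathlib import Path


def convert_directory(input_dir, output_dir, convert, pattern="*.pdf"):
    """Convert every matching file in input_dir and write one .md file per input.

    `convert` is any callable that takes a file path string and returns
    Markdown text. In practice it would wrap a call into Marker; that
    wiring is deliberately left to the caller.
    Returns the list of written output paths in sorted order.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(Path(input_dir).glob(pattern)):
        target = output_dir / (source.stem + ".md")
        target.write_text(convert(str(source)), encoding="utf-8")
        written.append(target)
    return written
```

Passing the converter in as a parameter keeps the loop testable with a stand-in function and lets the same code drive a different backend later.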