Overview
PDFMathTranslate translates scientific PDFs while preserving formulas, charts, the table of contents, and annotations where possible. It supports multiple translation services and is available as a command-line interface, a GUI, Docker images, and integrations such as a Zotero plugin.
Key features
- Layout preservation: retains mathematical formulas, tables and figures as much as possible to minimize post-editing.
- Multiple backends: supports Google, DeepL, OpenAI, Ollama, and custom backends, with translation caching.
- Flexible deployment: CLI, GUI, Docker images, and plugins for different workflows.
Use cases
- Batch translating academic papers while keeping readable layouts for reviewers and collaborators.
- Generating bilingual documents for comparison, teaching, or accessible reading.
- Deploying in air-gapped or enterprise environments via Docker or local installs.
Technical details
- Uses document parsing libraries (e.g., PyMuPDF, pdfminer.six) and layout recognition modules to handle complex formatting.
- Supports concurrent translation, chunking, and caching to improve throughput and reliability.
- Exposes a Python API and HTTP endpoints for integration into downstream systems such as literature managers and automated summarizers.
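
The chunking, caching, and concurrency described above can be illustrated with a small standalone sketch. This is not PDFMathTranslate's actual code or API: the names chunk_paragraphs, cache_key, and translate_chunks, the in-memory dict cache, and the backend callable are all hypothetical, and the sketch uses only the Python standard library. It assumes that text has already been extracted as a list of paragraphs and that a backend is any function taking (text, target_lang) and returning the translated text.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical backend type: (text, target_lang) -> translated text.
Backend = Callable[[str, str], str]


def chunk_paragraphs(paragraphs: List[str], max_chars: int) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def cache_key(text: str, target_lang: str) -> str:
    """Derive a stable cache key from the target language and source text."""
    return hashlib.sha256(f"{target_lang}\x00{text}".encode("utf-8")).hexdigest()


def translate_chunks(
    chunks: List[str],
    target_lang: str,
    backend: Backend,
    cache: Dict[str, str],
    max_workers: int = 4,
) -> List[str]:
    """Translate chunks concurrently, calling the backend only on cache misses."""
    keys = [cache_key(chunk, target_lang) for chunk in chunks]

    # Collect each uncached chunk once, even if it appears several times.
    pending: Dict[str, str] = {}
    for key, chunk in zip(keys, chunks):
        if key not in cache:
            pending.setdefault(key, chunk)

    # pool.map preserves input order, so results line up with pending's keys.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda text: backend(text, target_lang), pending.values())
        for key, translated in zip(pending, results):
            cache[key] = translated

    return [cache[key] for key in keys]


if __name__ == "__main__":
    # Stand-in backend for demonstration; a real one would call a translation service.
    def demo_backend(text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"

    demo_chunks = chunk_paragraphs(
        ["Let x be real.", "Then x^2 >= 0.", "Let x be real."], max_chars=20
    )
    for line in translate_chunks(demo_chunks, "zh", demo_backend, cache={}):
        print(line)
```

Keying the cache on both the target language and the source text means a repeated chunk, or a rerun over the same document, skips the backend entirely. A thread pool suits this workload on the assumption that backend calls are network-bound; a persistent store could replace the dict without changing the interface.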