DeepSeek-OCR

An open-source OCR model and toolkit built on Contexts Optical Compression, aimed at LLM-centric multimodal inference.

DeepSeek · Since 2025-10-17

Detailed Introduction

DeepSeek-OCR is an open-source OCR model and toolkit that introduces the “Contexts Optical Compression” approach, which aims to improve visual-text compression and understanding from an LLM-centric perspective. The repository includes inference examples for vLLM and Transformers, training and evaluation scripts, and pipelines for batch recognition of documents and images. It supports multiple input resolutions and dynamic cropping strategies, so users can trade throughput against recognition quality.

Main Features

An implementation of the Contexts Optical Compression methodology.
Adapters and examples for vLLM and Hugging Face Transformers inference.
Multiple native-resolution modes (tiny/small/base/large) plus a dynamic-resolution mode, to trade off accuracy and performance.
Scripts for batch evaluation and for streaming inference over PDFs and images, usable for deployment and for benchmarking on GPUs such as the A100.
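
The resolution modes listed above can be pictured as a small lookup table that maps a mode name to the settings passed to inference. The following is a minimal sketch, not the repository's own code: the names `ResolutionMode`, `MODES`, and `mode_settings` are hypothetical, and the numeric values (`base_size`, `image_size`, `crop_mode`) are assumptions recalled from the project's documentation that should be checked against the repository before use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResolutionMode:
    """Hypothetical container for one resolution preset."""
    base_size: int    # side length of the global view, in pixels
    image_size: int   # side length of each tile (equals base_size when not cropping)
    crop_mode: bool   # True when the page is split into tiles dynamically


# ASSUMPTION: these values are illustrative and should be verified
# against the DeepSeek-OCR repository; they are not taken from this page.
MODES = {
    "tiny": ResolutionMode(512, 512, False),
    "small": ResolutionMode(640, 640, False),
    "base": ResolutionMode(1024, 1024, False),
    "large": ResolutionMode(1280, 1280, False),
    "dynamic": ResolutionMode(1024, 640, True),
}


def mode_settings(name: str) -> dict:
    """Return the keyword settings for a named mode (case-insensitive)."""
    try:
        mode = MODES[name.lower()]
    except KeyError:
        raise ValueError(
            f"unknown mode {name!r}; expected one of {sorted(MODES)}"
        ) from None
    return {
        "base_size": mode.base_size,
        "image_size": mode.image_size,
        "crop_mode": mode.crop_mode,
    }
```

Smaller presets cost fewer vision tokens per page and so favor throughput, while larger or dynamic presets favor accuracy on dense pages. Keeping the presets in one table makes that trade-off a single configuration choice.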

Use Cases

DeepSeek-OCR suits scenarios that require high-throughput multimodal understanding and structured outputs, such as large-scale PDF/document OCR pipelines, research and benchmarking, image-text retrieval combined with LLMs, and integration testing within the vLLM inference stack.

Technical Features

LLM-centric design emphasizing the benefits of visual encoders and context compression for downstream understanding.
Support for loading through Transformers with trust_remote_code enabled, and safetensors weights for faster weight I/O.
Compatibility with vLLM pipelines and sampling-parameter configurations, with examples that demonstrate concurrent requests and streaming output.
MIT-licensed, accompanied by an arXiv paper and community model releases.
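
The loading and sampling points above can be sketched as follows. This is an illustration under stated assumptions rather than the repository's own script: the model id `deepseek-ai/DeepSeek-OCR`, the helper names, and the default `max_tokens` value are assumptions. The library imports are deferred into the functions so the configuration helpers can be inspected without `transformers` or `vllm` installed.

```python
# ASSUMPTION: Hugging Face repo id for the released weights.
MODEL_ID = "deepseek-ai/DeepSeek-OCR"


def transformers_load_kwargs() -> dict:
    """Keyword arguments for AutoModel.from_pretrained.

    trust_remote_code lets Transformers run the model class shipped in
    the model repo; use_safetensors asks for safetensors weight files.
    """
    return {"trust_remote_code": True, "use_safetensors": True}


def load_with_transformers(model_id: str = MODEL_ID):
    """Load tokenizer and model (downloads weights; a GPU is advisable)."""
    from transformers import AutoModel, AutoTokenizer  # deferred import

    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModel.from_pretrained(model_id, **transformers_load_kwargs())
    return tokenizer, model.eval()


def sampling_kwargs(max_tokens: int = 8192) -> dict:
    """Greedy-decoding settings for OCR; the default length is an assumption."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return {
        "temperature": 0.0,            # deterministic output suits transcription
        "max_tokens": max_tokens,      # upper bound on generated tokens per page
        "skip_special_tokens": False,  # keep any layout markers the model emits
    }


def make_sampling_params(max_tokens: int = 8192):
    """Build a vLLM SamplingParams object from the settings above."""
    from vllm import SamplingParams  # deferred import

    return SamplingParams(**sampling_kwargs(max_tokens))
```

Because `trust_remote_code=True` executes Python code from the model repository, it is worth pinning a known revision and reviewing that code before enabling it in a deployment.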

Related Resources

FlashMLA

DeepGEMM

DeepEP