UniLM

UniLM is a unified pre-training paradigm and project collection from Microsoft Research. It covers both language understanding and generation, and has given rise to multiple foundation-model and multimodal subprojects.

Author: Microsoft Research

Added Date: 2025-10-03

Open Source Since: 2019-01-01


Overview

UniLM is both Microsoft Research’s unified pre-training approach and the project repository built around it. It supports understanding and generation tasks and has produced foundation models and multimodal projects such as MiniLM, LayoutLM and BEiT, which are widely used in research and production.

Key Features

Unified pre-training objectives that cover both understanding and generation, facilitating transfer to diverse downstream tasks.
A broad collection of subprojects addressing text, document, vision and speech, plus engineering-ready implementations and model checkpoints.
Tooling, examples and pretrained weights that simplify reproduction and deployment.
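
As an illustration of how one model can serve both understanding and generation: the original UniLM paper shares a single Transformer across objectives and switches only the self-attention mask. The sketch below builds such masks in plain Python. The function name `build_attention_mask` and the mode strings are hypothetical names chosen for this example, not identifiers from the UniLM repository.

```python
from typing import List


def build_attention_mask(src_len: int, tgt_len: int, mode: str) -> List[List[int]]:
    """Return an n x n mask; mask[i][j] == 1 means token i may attend to token j.

    Hypothetical helper for illustration only (not part of the UniLM codebase).
    Tokens 0..src_len-1 form the source segment, the rest form the target segment.
    """
    n = src_len + tgt_len
    mask = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if mode == "bidirectional":
                # Understanding-style objective: every token sees every token.
                allowed = True
            elif mode == "left_to_right":
                # Generation-style objective: a token sees itself and earlier tokens.
                allowed = j <= i
            elif mode == "seq2seq":
                if j < src_len:
                    # The whole source segment is visible to every token.
                    allowed = True
                else:
                    # Target tokens are visible only to target tokens at or after them.
                    allowed = i >= src_len and j <= i
            else:
                raise ValueError(f"unknown mode: {mode}")
            mask[i][j] = int(allowed)
    return mask


if __name__ == "__main__":
    # Two source tokens followed by two target tokens.
    for row in build_attention_mask(2, 2, "seq2seq"):
        print(row)
```

In a real model the mask would be applied to the attention scores before the softmax; here it is only constructed, so the three objectives can be compared side by side.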

Use Cases

Researchers reproducing papers and comparing models; engineering teams building downstream applications and fine-tuning pipelines.
Document understanding, OCR, vision+language tasks, text generation and multilingual applications.

Technical Details

Integrates efficient architectures and pre-training methods (e.g., MiniLM, BEiT, X-MoE), emphasizing scalability and practical efficiency.
Open-source licensing and extensive documentation enable community collaboration and engineering adoption.

Related Resources

Agent Lightning

Nano-vLLM

DeepSeek-OCR