IPEX-LLM

IPEX-LLM is Intel's XPU acceleration library for PyTorch, designed to speed up inference and fine-tuning of large language models on Intel hardware.

Author: Intel

Since: 2016-08-29

GitHub

Introduction

IPEX-LLM is Intel’s XPU acceleration library for PyTorch. It provides optimizations for running LLMs efficiently on Intel XPUs (integrated GPUs, Arc discrete GPUs, and NPUs) as well as CPUs.

Key Features

Broad compatibility: integrates with llama.cpp, Ollama, vLLM, HuggingFace, LangChain, LlamaIndex and more.
Low-bit & mixed precision: supports INT4/FP4/FP8 and mixed-precision optimizations to improve throughput and reduce memory footprint.
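
To illustrate why low-bit formats reduce memory footprint, here is a minimal pure-Python sketch of symmetric INT4 quantization with one scale per block. It is a conceptual illustration only and not IPEX-LLM's actual kernels or API; the function names (`quantize_int4`, `pack_nibbles`, and so on) and the per-block scaling scheme are assumptions made for this example.

```python
# Conceptual sketch of symmetric INT4 block quantization.
# All names here are hypothetical; this is not IPEX-LLM code.

def quantize_int4(block):
    """Map floats to integers in [-8, 7] using a single scale for the block."""
    max_abs = max(abs(x) for x in block)
    scale = max_abs / 7.0 if max_abs > 0 else 1.0
    q = [max(-8, min(7, round(x / scale))) for x in block]
    return q, scale

def dequantize_int4(q, scale):
    """Recover approximate float values from the quantized integers."""
    return [v * scale for v in q]

def pack_nibbles(q):
    """Store two 4-bit values per byte (values are offset by 8 to be unsigned)."""
    padded = q + [0] if len(q) % 2 else list(q)
    return bytes(
        ((padded[i] + 8) << 4) | (padded[i + 1] + 8)
        for i in range(0, len(padded), 2)
    )

def unpack_nibbles(data, n):
    """Inverse of pack_nibbles; n is the original number of values."""
    out = []
    for byte in data:
        out.append((byte >> 4) - 8)
        out.append((byte & 0x0F) - 8)
    return out[:n]

weights = [0.7, -0.35, 0.1, 0.0, -0.7, 0.2, 0.5, -0.1]
q, scale = quantize_int4(weights)
packed = pack_nibbles(q)
restored = dequantize_int4(unpack_nibbles(packed, len(weights)), scale)

fp16_bytes = 2 * len(weights)   # 2 bytes per weight in FP16
int4_bytes = len(packed) + 2    # packed nibbles plus one FP16 scale
print("FP16 bytes:", fp16_bytes, "INT4 bytes:", int4_bytes)
```

Each weight costs half a byte plus a small per-block overhead for the scale, at the price of a rounding error of at most half the scale per value. Real low-bit formats differ in block size, scale encoding, and rounding details.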

Use Cases

High-performance inference and fine-tuning on Intel-based local and cloud deployments.
Optimized LLM inference on resource-constrained devices (integrated GPUs or NPUs).

Technical Highlights

Deep integration with PyTorch, with hardware-specific optimizations and support for multi-GPU inference through pipeline parallelism and tensor parallelism (e.g., DeepSpeed AutoTP).
Production tooling: Docker/Helm deployment guides and benchmarking tools for performance evaluation.
