CUTLASS

CUTLASS (CUDA Templates for Linear Algebra Subroutines) is a template library from NVIDIA for high-performance matrix computation.

NVIDIA · Since 2017-11-30



Introduction

CUTLASS is a CUDA C++ template library from NVIDIA for linear algebra subroutines such as GEMM (general matrix multiplication). It helps developers build high-performance, reusable matrix computation kernels, and it includes optimization strategies and examples for getting efficient computation on different GPU architectures.

Key Features

Templated GEMM and linear algebra building blocks for easy customization and extension.
Performance optimizations and example implementations targeting multiple GPU architectures.
Comprehensive documentation and examples for easy integration and tuning.

Use Cases

Implementing custom high-performance matrix multiplication and linear algebra operators.
Quickly building hardware-specific kernels using CUTLASS templates and examples.
Replacing default operators in training and inference pipelines for better performance.

Technical Highlights

Highly customizable operator building blocks implemented with CUDA and template metaprogramming.
Optimized paths for different data types and memory layouts.
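
To make the idea of template-based building blocks concrete, here is a minimal CPU-only C++ sketch of a GEMM whose element type, memory layouts, and tile size are all compile-time template parameters. It is not CUTLASS code and does not use the CUTLASS API: the names RowMajor, ColumnMajor, and tiled_gemm are hypothetical and chosen only for illustration, and a real CUTLASS kernel would run on the GPU with far more elaborate tiling.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (not CUTLASS types): each maps a
// (row, col) coordinate to a linear offset in a flat buffer.
struct RowMajor {
    static std::size_t offset(std::size_t r, std::size_t c,
                              std::size_t rows, std::size_t cols) {
        (void)rows;
        return r * cols + c;
    }
};

struct ColumnMajor {
    static std::size_t offset(std::size_t r, std::size_t c,
                              std::size_t rows, std::size_t cols) {
        (void)cols;
        return c * rows + r;
    }
};

// Hypothetical tiled GEMM: C = alpha * A * B + beta * C.
// A is M x K, B is K x N, C is M x N. Element type, layouts, and
// tile size are fixed at compile time, so each combination gets
// its own specialized instantiation.
template <typename Element, typename LayoutA, typename LayoutB,
          typename LayoutC, std::size_t Tile = 4>
void tiled_gemm(std::size_t M, std::size_t N, std::size_t K,
                Element alpha,
                const std::vector<Element>& A,
                const std::vector<Element>& B,
                Element beta,
                std::vector<Element>& C) {
    static_assert(Tile > 0, "tile size must be positive");
    assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);

    // Apply beta once, then accumulate alpha * A * B tile by tile.
    for (Element& c : C) c *= beta;

    for (std::size_t i0 = 0; i0 < M; i0 += Tile)
        for (std::size_t j0 = 0; j0 < N; j0 += Tile)
            for (std::size_t k0 = 0; k0 < K; k0 += Tile) {
                const std::size_t i1 = std::min(i0 + Tile, M);
                const std::size_t j1 = std::min(j0 + Tile, N);
                const std::size_t k1 = std::min(k0 + Tile, K);
                for (std::size_t i = i0; i < i1; ++i)
                    for (std::size_t j = j0; j < j1; ++j) {
                        Element acc = Element(0);
                        for (std::size_t k = k0; k < k1; ++k)
                            acc += A[LayoutA::offset(i, k, M, K)] *
                                   B[LayoutB::offset(k, j, K, N)];
                        C[LayoutC::offset(i, j, M, N)] += alpha * acc;
                    }
            }
}
```

A call such as tiled_gemm<float, RowMajor, ColumnMajor, RowMajor, 8>(M, N, K, 1.0f, A, B, 0.0f, C) selects one combination of data type, layouts, and tile size. Switching to another combination changes only the template arguments, which is the kind of customization the highlights above describe.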

Related Resources

PersonaPlex

KAI Scheduler

NVIDIA GPU Operator