
TileLang

TileLang is a domain-specific language (DSL) that simplifies writing high-performance AI kernels for GPUs, CPUs, and accelerators.

Overview

TileLang (tile-lang) is a DSL designed for implementing high-performance operators (e.g., GEMM, FlashAttention) on GPUs and CPUs. Built on top of TVM, it provides concise Pythonic syntax and tooling for performance engineering.
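To make "tile" concrete, here is a plain-Python sketch of how a GEMM (C = A @ B) is split into output tiles and K-blocks. It does not use TileLang's API. The function name tiled_matmul and the tile sizes block_m, block_n and block_k are illustrative assumptions. A real kernel would run the tile loops in parallel on the device rather than sequentially.

```python
# Plain-Python illustration of tiling a GEMM (C = A @ B). This is not TileLang
# syntax; tiled_matmul and block_m/block_n/block_k are hypothetical names.

def tiled_matmul(a, b, block_m=2, block_n=2, block_k=2):
    """Multiply list-of-lists matrices a (M x K) and b (K x N) tile by tile."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions must match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]
    # One iteration per output tile; a GPU kernel would map these to thread blocks.
    for i0 in range(0, m, block_m):
        for j0 in range(0, n, block_n):
            rows = range(i0, min(i0 + block_m, m))
            cols = range(j0, min(j0 + block_n, n))
            # Per-tile accumulator, analogous to a register fragment on a GPU.
            acc = {(i, j): 0.0 for i in rows for j in cols}
            # Walk the K dimension one block at a time; a GPU kernel would stage
            # each A/B tile in shared memory before these multiply-adds.
            for k0 in range(0, k, block_k):
                for i in rows:
                    for j in cols:
                        for kk in range(k0, min(k0 + block_k, k)):
                            acc[(i, j)] += a[i][kk] * b[kk][j]
            # Write the finished tile back to C.
            for (i, j), value in acc.items():
                c[i][j] = value
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(tiled_matmul(a, b))  # [[58.0, 64.0], [139.0, 154.0]]
```

Tile sizes that do not divide the matrix dimensions are handled by clamping each range with min(), which is the same edge-case handling a device kernel needs for ragged tiles.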

Key features

  • Concise DSL and Python API for expressing operators and annotating data layouts.
  • Multi-backend support (CUDA, HIP, CPU) with device-specific optimizations and examples.
  • Comprehensive examples and benchmark suites, including MLA (multi-head latent attention) decoding, FlashMLA, and dequantize GEMM.
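The idea behind a dequantize GEMM can be sketched in plain Python: the weight matrix is stored as small integers plus a floating-point scale and is expanded back to floats inside the multiply loop. The int8, per-column symmetric format and the names quantize_per_column and dequant_gemm are assumptions for illustration; TileLang's own examples may use other bit widths, group sizes, or layouts.

```python
# Sketch of a weight-only "dequantize GEMM". The int8, per-column symmetric
# format and both function names are illustrative assumptions.

def quantize_per_column(w, bits=8):
    """Symmetrically quantize a K x N matrix; returns (int matrix, per-column scales)."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    k, n = len(w), len(w[0])
    scales = []
    for j in range(n):
        col_max = max(abs(w[i][j]) for i in range(k))
        scales.append(col_max / qmax if col_max > 0 else 1.0)
    # Round to the nearest integer step and clamp into [-qmax, qmax].
    q = [[max(-qmax, min(qmax, round(w[i][j] / scales[j]))) for j in range(n)]
         for i in range(k)]
    return q, scales


def dequant_gemm(a, q, scales):
    """C = A @ (Q * scales): each weight is dequantized as it is consumed."""
    m, k, n = len(a), len(q), len(q[0])
    c = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kk in range(k):
                weight = q[kk][j] * scales[j]  # dequantize on the fly
                acc += a[i][kk] * weight
            c[i][j] = acc
    return c


w = [[0.5, -1.0], [0.25, 2.0]]
q, scales = quantize_per_column(w)
print(dequant_gemm([[1.0, 2.0]], q, scales))  # close to the exact [[1.0, 3.0]]
```

Because the scale here is per column, a faster variant could accumulate a[i][kk] * q[kk][j] in the inner loop and multiply by scales[j] once at the end; the version above keeps the dequantization step explicit.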

Use cases

  • Implementing and optimizing kernels for deep learning workloads.
  • Performance tuning on cloud GPUs and accelerators (H100, A100, MI300X, etc.).
  • Research and engineering workflows connecting high-level models to low-level, optimized kernels.
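Performance tuning often comes down to trying several tile configurations and keeping the fastest one that still produces correct results. The sketch below shows that loop generically. It is not TileLang's autotuner API: the build_kernel(config) interface and the toy chunked-sum "kernel" are hypothetical stand-ins for compiling one kernel variant per tile size.

```python
import time


def autotune(build_kernel, configs, args, expected, repeats=3):
    """Time every config whose output equals `expected`; return (best, timings).

    build_kernel(config) -> callable is a hypothetical interface standing in
    for "compile one kernel variant per tile configuration".
    """
    timings = {}
    for cfg in configs:
        kernel = build_kernel(cfg)
        if kernel(*args) != expected:  # skip variants that compute a wrong result
            continue
        best = float("inf")
        for _ in range(repeats):  # keep the fastest run to damp timing noise
            start = time.perf_counter()
            kernel(*args)
            best = min(best, time.perf_counter() - start)
        timings[cfg] = best
    if not timings:
        raise RuntimeError("no configuration produced a correct result")
    return min(timings, key=timings.get), timings


def build_chunked_sum(block):
    """Toy 'kernel' whose tuning knob is the chunk length used to sum a list."""
    def kernel(xs):
        total = 0
        for start in range(0, len(xs), block):
            total += sum(xs[start:start + block])
        return total
    return kernel


data = list(range(100_000))
best, timings = autotune(build_chunked_sum, [1, 64, 4096], (data,), sum(data))
print(best, timings)
```

Checking correctness before timing matters because aggressive tile shapes can silently break a kernel; a variant that is fast but wrong should never win.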

Technical details

  • The core is implemented in C++ and Python and relies on TVM for compilation and just-in-time (JIT) workflows.
  • Offers source build instructions, pip packages and nightly builds for quick experimentation.
  • Includes benchmark scripts and device-specific examples to reproduce reported performance results.
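When reproducing GEMM benchmark numbers, measured time is usually converted to throughput using the standard count of 2·M·N·K floating-point operations (one multiply and one add per term). The helper names gemm_tflops and benchmark below are illustrative, not functions from TileLang's benchmark scripts, and the timing loop is a CPU-side sketch.

```python
import statistics
import time


def gemm_tflops(m, n, k, seconds):
    """Throughput of an M x N x K GEMM: 2*M*N*K flops (one multiply + one add per term)."""
    return 2 * m * n * k / seconds / 1e12


def benchmark(fn, warmup=3, iters=10):
    """Median wall-clock seconds per call of fn() after some warmup calls.

    For asynchronous GPU launches the device must be synchronized before the
    clock is read; this CPU-only sketch omits that step.
    """
    for _ in range(warmup):  # warm caches and any lazy (e.g. JIT) compilation
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# A 4096 x 4096 x 4096 GEMM finishing in 1 ms corresponds to about 137.4 TFLOPS.
print(round(gemm_tflops(4096, 4096, 4096, 1e-3), 1))  # 137.4
print(benchmark(lambda: sum(range(10_000))))
```

Using the median rather than the mean keeps a single slow outlier (for example, a first-call compilation that escaped the warmup) from skewing the reported number.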

Resource Info
🌱 Open Source 🏗️ Framework 📊 Benchmark