Apache Iceberg

A high-performance table format for huge analytic tables, offering snapshots, transactions and multi-engine compatibility.

Apache · Since 2018-11-19

Introduction

Apache Iceberg is a high-performance table format for large analytic datasets. It brings ACID transactions, snapshot-based time travel, partition evolution, and a stable metadata layer to data lakes, so that multiple engines (Spark, Flink, Trino, etc.) can safely read and write the same tables.
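To make the snapshot and time travel ideas concrete, here is a minimal conceptual sketch in Python. It is not the Iceberg API: ToyTable, Snapshot, CommitConflict and commit_append are hypothetical names invented for illustration. It assumes a simplified model in which every commit produces an immutable snapshot listing the visible data files, and a commit succeeds only if the writer's base snapshot is still the current one. In Iceberg itself, that atomic swap is performed by the catalog on the table's metadata pointer.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the table: the full set of data files visible at one point."""
    snapshot_id: int
    parent_id: Optional[int]
    data_files: frozenset


class CommitConflict(Exception):
    """Raised when another writer committed after this writer read the table."""


@dataclass
class ToyTable:
    # Hypothetical in-memory stand-in for table metadata; not an Iceberg class.
    snapshots: dict = field(default_factory=dict)
    current_id: Optional[int] = None

    def files(self, snapshot_id: Optional[int] = None) -> frozenset:
        """Files visible in the given snapshot, or in the current one if omitted."""
        sid = self.current_id if snapshot_id is None else snapshot_id
        if sid is None:
            return frozenset()
        return self.snapshots[sid].data_files

    def commit_append(self, base_id: Optional[int], new_files) -> int:
        """Append files on top of base_id; fail if the table has moved on since."""
        if base_id != self.current_id:
            raise CommitConflict(f"base {base_id} is stale, current is {self.current_id}")
        new_id = len(self.snapshots) + 1
        merged = self.files(base_id) | frozenset(new_files)
        self.snapshots[new_id] = Snapshot(new_id, base_id, merged)
        self.current_id = new_id  # the single "pointer swap" that publishes the commit
        return new_id


table = ToyTable()
s1 = table.commit_append(None, ["a.parquet"])
s2 = table.commit_append(s1, ["b.parquet"])

print(sorted(table.files()))    # ['a.parquet', 'b.parquet']
print(sorted(table.files(s1)))  # ['a.parquet']  (time travel to the first snapshot)

try:
    # A second writer that still believes s1 is current is rejected, not merged silently.
    table.commit_append(s1, ["c.parquet"])
except CommitConflict as exc:
    print("conflict:", exc)
```

Because old snapshots are never modified, reading an earlier snapshot is just a lookup, and a rejected writer can re-read the current state and retry its commit.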

Key features

Standardized table format with versioned snapshots and atomic commits.
Engine interoperability across Spark, Flink, Trino and more.
Support for Parquet, ORC and Avro data files, with a metadata layout optimized for fast reads.
Strong community governance under the Apache Software Foundation.
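One reason the metadata layout helps read performance is that per-file column statistics are kept in table metadata, so a query planner can skip files without opening them. The following Python sketch illustrates that idea under simplifying assumptions: DataFileEntry and plan_files are hypothetical names, only a single integer column is tracked, and the filter is a closed range. It is not the planning code of any real engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFileEntry:
    """Hypothetical metadata record: a file path plus min/max of one integer column."""
    path: str
    lower: int
    upper: int


def plan_files(entries, lo, hi):
    """Return paths of files whose [lower, upper] range can overlap the filter [lo, hi].

    Files whose range lies entirely outside the filter are pruned using metadata alone.
    """
    return [e.path for e in entries if e.upper >= lo and e.lower <= hi]


entries = [
    DataFileEntry("f1.parquet", 0, 99),
    DataFileEntry("f2.parquet", 100, 199),
    DataFileEntry("f3.parquet", 200, 299),
]

# Filter: 150 <= value <= 220. Only f2 and f3 can contain matching rows.
print(plan_files(entries, 150, 220))  # ['f2.parquet', 'f3.parquet']
```

The check is conservative: a file that passes may still contain no matching rows, but a file that is pruned cannot contain any.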

Use cases

Data lake governance and reliable table management.
Multi-engine analytics where different compute frameworks share data.
Building cloud-native data warehousing architectures.

Technical characteristics

Reference Java implementation with modular components and integrations.
Well-documented spec and production-tested implementations.
Compatible with S3, HDFS, GCS and other storage backends.

Related Resources

Apache Doris

Gravitino

3FS