Paimon

A table format for real-time lakehouse architectures, enabling unified streaming and batch storage and querying with Flink and Spark.

Author: Apache

Since: 2022-01-12

Paimon is a table format designed for real-time lakehouse architectures. It supports unified streaming and batch workloads and integrates with engines such as Flink and Spark, offering transactional semantics, low-latency write paths, and query performance tuned for hybrid workloads.

Key features

Unified streaming and batch: Simplifies pipelines that require both real-time ingestion and analytical queries.
Transactional guarantees: Supports versioning and atomic operations to ensure consistency.
Multi-engine compatibility: Works with Flink, Spark and other ecosystem tools.
Community and documentation: An active community and documentation support production adoption.
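
The versioning and unified read behavior listed above can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not Paimon's API or on-disk format: the `ToyTable` class and its `commit`, `batch_read`, and `stream_read` methods are hypothetical names chosen for illustration. It shows one way a single snapshot log could serve both a batch query (merged state at a version) and a streaming query (only the rows committed after a version).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Toy model only: hypothetical names, NOT Paimon's API or storage layout.
Row = Tuple[str, int]  # (key, value)


@dataclass
class ToyTable:
    # snapshots[i] holds the rows committed by snapshot id i + 1
    snapshots: List[List[Row]] = field(default_factory=list)

    def commit(self, rows: List[Row]) -> int:
        """Publish a batch of rows as one new snapshot and return its id."""
        # A single append makes the whole batch visible at once.
        self.snapshots.append(list(rows))
        return len(self.snapshots)

    def batch_read(self, snapshot_id: Optional[int] = None) -> Dict[str, int]:
        """Merge snapshots up to snapshot_id (latest by default); last write per key wins."""
        upto = len(self.snapshots) if snapshot_id is None else snapshot_id
        merged: Dict[str, int] = {}
        for rows in self.snapshots[:upto]:
            merged.update(rows)
        return merged

    def stream_read(self, after_snapshot: int = 0) -> List[Row]:
        """Return only the rows committed after the given snapshot id."""
        return [row for rows in self.snapshots[after_snapshot:] for row in rows]


table = ToyTable()
first = table.commit([("a", 1), ("b", 2)])
second = table.commit([("a", 10)])

print(table.batch_read())        # merged state at the latest snapshot
print(table.batch_read(first))   # state as of the first snapshot
print(table.stream_read(first))  # incremental rows after the first snapshot
```

In this sketch a reader that pins a snapshot id sees the same result regardless of later commits, which is the property the "transactional guarantees" item refers to in general terms.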

Use cases

Real-time analytics: Serves as a storage layer for low-latency ingestion and consistent queries.
Lakehouse modernization: Gives existing data lakes a table format to migrate to that supports both streaming and batch workloads.

Technical highlights

Table-centric metadata and a storage layout designed to reduce write amplification and improve read performance.
Tooling for data migration and version management to ease operations.

Related Resources

Apache Doris

Apache Hudi

Gravitino