Apache Kafka

Apache Kafka is a distributed streaming platform for publishing/subscribing, storing, and processing high-throughput real-time data streams.

Apache Software Foundation · Since 2011-08-15

Detailed Introduction

Apache Kafka is a high-throughput, low-latency distributed streaming platform designed for publishing/subscribing, durable storage, and processing of real-time data streams. Originally developed at LinkedIn and now maintained by the Apache Software Foundation, Kafka is widely used for log aggregation, event-driven architectures, and real-time analytics. Data is organized into topics and partitions, enabling horizontal scaling and strong interoperability with common stream processing frameworks.
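The topic and partition layout can be pictured with a small in-memory model. This is only a sketch: the `Topic` class, the `choose_partition` helper, and the CRC32 hash are assumptions made for illustration, not Kafka's client API or its actual partitioner. It shows that records with the same key land in the same partition and receive increasing offsets there.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index.

    CRC32 is an illustrative stand-in, not the hash a real Kafka client uses.
    """
    return zlib.crc32(key) % num_partitions


class Topic:
    """Hypothetical in-memory topic: one append-only list per partition."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: str):
        """Append a record and return its (partition, offset) position."""
        partition = choose_partition(key, len(self.partitions))
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1


topic = Topic("page-views", num_partitions=3)
records = [(b"user-1", "home"), (b"user-2", "search"), (b"user-1", "cart")]
placements = [topic.append(key, value) for key, value in records]
```

Because ordering is tracked per partition in this model, the two `user-1` records keep their relative order, while records for other keys may be handled in parallel on other partitions.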

Main Features

High throughput and low latency: Sequential disk writes and parallel reads and writes across partitions sustain high message throughput on commodity hardware.
Durability and fault tolerance: Messages are persisted to disk and replicated across brokers for data reliability.
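Horizontal scalability: Adding brokers and partitions spreads load across the cluster, so capacity grows with cluster size.
Rich ecosystem: Kafka Connect integrates Kafka with external systems such as databases and storage, and Kafka Streams is a library for building stream processing applications.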

Use Cases

Kafka is used for log aggregation, metrics collection, event-driven microservices, stream ETL, real-time analytics, and asynchronous communication between applications. In large data platforms it often serves as an event bus and buffer, decoupling producers and consumers and supporting replay of historical data.
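The decoupling and replay described above can be sketched with a toy append-only log. The `Log` and `Consumer` classes below are hypothetical stand-ins, not Kafka client classes; the point is that each consumer tracks its own read position, so one can follow new events while another re-reads history from the beginning.

```python
class Log:
    """Hypothetical append-only log standing in for a single partition."""

    def __init__(self):
        self.records = []

    def append(self, value):
        """Store a record and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def read(self, offset, max_records=100):
        """Return up to max_records records starting at the given offset."""
        return self.records[offset:offset + max_records]


class Consumer:
    """Hypothetical consumer that owns its read position in the log."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.position = start_offset

    def poll(self):
        """Fetch records after the current position and advance past them."""
        batch = self.log.read(self.position)
        self.position += len(batch)
        return batch


log = Log()
for event in ["signup", "login", "purchase"]:
    log.append(event)

live = Consumer(log, start_offset=len(log.records))  # sees only new events
replayer = Consumer(log, start_offset=0)             # re-reads full history

log.append("logout")
live_events = live.poll()
replayed_events = replayer.poll()
```

The producer never needs to know which consumers exist or how far each has read, which is the sense in which the log acts as a buffer between them.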

Technical Characteristics

Kafka’s architecture is based on topics, partitions, offsets, and brokers. It uses sequential I/O and zero-copy transfer to maximize disk and network throughput; replication and leader election provide fault tolerance; and the offset-based consumer model supports at-least-once or exactly-once processing semantics, depending on how the application commits offsets and whether it uses Kafka’s idempotent producer and transactions. See the official site for details: Apache Kafka.
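At-least-once processing follows from committing an offset only after the record has been handled. The function below is a minimal simulation under that assumption; `consume_at_least_once` and its `crash_before_commit_at` parameter are invented for illustration and do not correspond to any Kafka client call.

```python
def consume_at_least_once(log, committed, handler, crash_before_commit_at=None):
    """Handle records from the committed offset onward, committing after each.

    Returns the committed offset. If a crash is simulated at an offset, the
    record at that offset has been handled but its commit is lost.
    """
    for offset in range(committed, len(log)):
        handler(log[offset])
        if offset == crash_before_commit_at:
            return committed  # simulated crash: work done, commit not recorded
        committed = offset + 1
    return committed


log = ["a", "b", "c"]
seen = []

# First run crashes after handling offset 1 but before committing it.
committed = consume_at_least_once(log, 0, seen.append, crash_before_commit_at=1)
after_crash = committed

# The restart resumes from the last committed offset, so "b" is handled again.
committed = consume_at_least_once(log, committed, seen.append)
```

After the restart, `seen` contains `"b"` twice: no record is lost, but duplicates are possible, so handlers in this style are usually written to be idempotent.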

Related Resources

Apache Superset

Apache Hadoop

Apache Spark