Data Engineering

Apache Kafka

Java · Open Source · Self-hosted · Cloud

The dominant distributed event streaming platform. Kafka handles high-throughput, durable message queues for real-time data pipelines and event-driven architectures. The backbone of modern data infrastructure at scale.

License

Apache 2.0

Language

Java / Scala

Trust Score: 92 (Excellent)

Why Apache Kafka?

You need high-throughput, durable event streaming between services

You're building a real-time data pipeline or event-driven architecture

You need replay capability — consumers can re-read historical events
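The replay point above is worth making concrete: because Kafka retains events on disk, a consumer can rewind its assigned partitions to the earliest retained offset and re-process history. A minimal sketch, assuming the consumer object exposes kafka-python's `KafkaConsumer` API (`poll`, `seek_to_beginning`, iteration); `replay_from_start` is a hypothetical helper name, not part of any client library:

```python
def replay_from_start(consumer, handler):
    """Rewind every assigned partition and re-process retained history.

    `consumer` is assumed to behave like kafka-python's KafkaConsumer:
    poll() forces group join / partition assignment, seek_to_beginning()
    rewinds all assigned partitions, and iteration yields records with
    a .value attribute.
    """
    consumer.poll(timeout_ms=0)   # trigger partition assignment
    consumer.seek_to_beginning()  # offsets back to the earliest retained event
    for record in consumer:
        handler(record.value)
```

Note that this replays within the topic's retention window; events older than the configured retention are gone.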

Signal Breakdown

What drives the Trust Score

Maven downloads: 28M / mo
Commits (90d): 342
GitHub stars: 28k ★
Stack Overflow: 39k questions
Community: Very High
Weighted Trust Score: 92 / 100

Download Trend

[Chart: downloads over the last 12 months]

Tradeoffs & Caveats

Know before you commit. Reconsider Kafka if:

You need simple task queuing (BullMQ or Celery are much simpler)

Your team can't manage brokers — use Confluent Cloud or Redpanda Cloud

Message volume is low — a simple queue is sufficient

Pricing

Free tier & paid plans

Free tier

Open-source self-host free · Confluent: $400 free credits

Paid

Confluent Cloud: $0.11/GB ingested

MSK (AWS): ~$0.21/hr per broker
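For a rough sense of scale, the two quoted prices can be compared on a hypothetical workload. The workload numbers below (2 TB ingested per month, a 3-broker MSK cluster running around the clock) are assumptions for illustration, not measurements:

```python
# Back-of-envelope cost sketch using the prices quoted above.
# Workload assumptions (hypothetical): 2 TB/month ingest, 3 MSK brokers 24/7.
GB_PER_MONTH = 2000            # assumed ingest volume
CONFLUENT_PER_GB = 0.11        # $/GB ingested (price quoted above)
MSK_PER_BROKER_HOUR = 0.21     # ~$/hr per broker (price quoted above)
BROKERS = 3                    # common minimum for replication factor 3
HOURS_PER_MONTH = 730          # average hours in a month

confluent_cost = GB_PER_MONTH * CONFLUENT_PER_GB
msk_cost = BROKERS * MSK_PER_BROKER_HOUR * HOURS_PER_MONTH

print(f"Confluent ingest: ${confluent_cost:,.2f}/mo")
print(f"MSK brokers:      ${msk_cost:,.2f}/mo")
```

The comparison ignores egress, storage, and partition fees on both sides, so treat it only as a starting point for a real estimate.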

Often Used Together

Complementary tools that pair well with Apache Kafka

Apache Airflow · Data Engineering · 93 (Excellent)

Elasticsearch · Search & Indexing · 98 (Excellent)

Docker · DevOps & Infra · 93 (Excellent)

Kubernetes · DevOps & Infra · 99 (Excellent)

dbt · Data Engineering · 52 (Limited)

Learning Resources

Docs, videos, tutorials, and courses

Get Started

Repository and installation options

View on GitHub

github.com/apache/kafka

npm: npm install kafkajs
pip: pip install kafka-python
Docker: docker run -p 9092:9092 apache/kafka

Quick Start

Copy and adapt to get going fast

from kafka import KafkaProducer, KafkaConsumer
import json

# Produce JSON-encoded events to the 'user-events' topic
producer = KafkaProducer(
    bootstrap_servers='localhost:9092',
    value_serializer=lambda v: json.dumps(v).encode()
)
producer.send('user-events', {'user_id': 123, 'event': 'signup'})
producer.flush()  # block until buffered messages are delivered

# Consume the same topic as part of the 'my-group' consumer group
consumer = KafkaConsumer(
    'user-events',
    bootstrap_servers='localhost:9092',
    group_id='my-group',
    auto_offset_reset='earliest',  # start from the beginning if no committed offset
    value_deserializer=lambda m: json.loads(m.decode())
)
for message in consumer:
    process_event(message.value)  # process_event is your handler
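One detail in the quick start worth making explicit: the consumer's deserializer must exactly invert the producer's serializer, or every message will fail to decode. The pair factored out as plain functions so the round-trip property can be checked in isolation:

```python
import json

# Producer-side and consumer-side transforms from the quick start,
# written as a pair: deserialize must be the inverse of serialize.
def serialize(value):
    return json.dumps(value).encode()

def deserialize(raw):
    return json.loads(raw.decode())

event = {'user_id': 123, 'event': 'signup'}
assert deserialize(serialize(event)) == event
```

Keeping both sides defined next to each other (or in a shared module) avoids the common failure mode where one service changes its encoding and silently breaks every consumer downstream.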

Code Examples

Common usage patterns

Transactional producer

Guarantee exactly-once delivery with transactions

const { Kafka } = require('kafkajs');
const kafka = new Kafka({ clientId: 'my-app', brokers: ['localhost:9092'] });

// A transactionalId enables exactly-once semantics; kafkajs requires
// idempotence and a single in-flight request for transactional producers.
const producer = kafka.producer({
  transactionalId: 'my-transactional-producer',
  maxInFlightRequests: 1,
  idempotent: true,
});
await producer.connect();

const transaction = await producer.transaction();
try {
  await transaction.send({
    topic: 'orders',
    messages: [{ value: JSON.stringify(order) }], // `order` is your payload object
  });
  await transaction.commit();
} catch (err) {
  await transaction.abort(); // roll back so consumers never see partial writes
  throw err;
}

Docker Compose Kafka cluster

Run a local Kafka + Zookeeper setup

services:
  zookeeper:
    image: confluentinc/cp-zookeeper:7.6.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
  kafka:
    image: confluentinc/cp-kafka:7.6.0
    depends_on: [zookeeper]
    ports: ["9092:9092"]
    environment:
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      # advertise localhost so clients on the host machine can connect
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      # single-broker cluster: internal topics can't replicate beyond 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

Dead letter queue pattern

Route failed messages to a DLQ for inspection

await consumer.run({
  eachMessage: async ({ topic, message }) => {
    try {
      await processMessage(JSON.parse(message.value!.toString()));
    } catch (err) {
      await producer.send({
        topic: `${topic}.dlq`,
        messages: [{
          key: message.key,
          value: message.value,
          headers: { error: err.message, originalTopic: topic },
        }],
      });
    }
  },
});
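The routing decision in the handler above is independent of Kafka itself, so it can be isolated and unit-tested without a broker. A sketch in Python mirroring the try/catch logic (`route_or_dlq` is a hypothetical name, not part of any Kafka client):

```python
def route_or_dlq(topic, process, message):
    """Try to process a message; on failure, describe the DLQ publish.

    Returns None on success, or (dlq_topic, payload) where payload keeps
    the original message and error so the failure can be inspected or
    replayed later.
    """
    try:
        process(message)
        return None  # processed successfully, nothing to re-route
    except Exception as err:
        return (f"{topic}.dlq", {
            "message": message,
            "error": str(err),
            "originalTopic": topic,
        })
```

As in the JavaScript version, the original payload is forwarded untouched alongside the error metadata, which is what makes later replay from the DLQ possible.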

Community Notes

Real experiences from developers who've used this tool