# 🧱 Building Block: Apache Kafka
## Overview
Kafka is a distributed event streaming platform capable of handling trillions of events a day. It is used for building real-time data pipelines and streaming applications.
## Key Concepts
- Topic: A category or feed name to which records are published.
- Partition: Topics are divided into partitions for parallelism and scalability.
- Producer: Publishes records to one or more Kafka topics.
- Consumer: Subscribes to one or more topics and processes the stream of records.
- Consumer Group: A set of consumers that cooperate to consume a set of topics; each partition is consumed by exactly one member of the group at a time.
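Keys tie these concepts together: a producer hashes each record's key to pick a partition, so all records for one key stay in order. A minimal sketch, using CRC32 as a stand-in for Kafka's murmur2-based default partitioner (the real broker and client behavior is not reproduced here; the partition count is hypothetical):

```python
import zlib

NUM_PARTITIONS = 3  # hypothetical partition count for a topic

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition. Records with the same key always
    land in the same partition, which is what gives per-key ordering."""
    return zlib.crc32(key) % num_partitions

# Same key, same partition, every time:
assert partition_for(b"user-42") == partition_for(b"user-42")
```

Because ordering is only guaranteed within a partition, choosing the key (e.g., a user ID) is how you choose what gets ordered.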
## Delivery Guarantees
- At-most-once: Messages may be lost but are never redelivered.
- At-least-once: Messages are never lost but may be redelivered.
- Exactly-once: Each message is processed exactly once. In Kafka this requires the idempotent producer plus transactions, and holds end-to-end for Kafka-to-Kafka pipelines (e.g., Kafka Streams); sinks outside Kafka still need their own idempotence.
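On the consumer side, the difference between the first two guarantees comes down to when the offset is committed relative to processing. A minimal in-memory sketch, with no real Kafka client (a plain list stands in for a partition, and the crash/restart is simulated):

```python
def consume(messages, crash_index=None, commit_first=False):
    """Simulate a consumer that 'crashes' once between the two steps
    (process, commit) for message `crash_index`, then restarts from
    the last committed offset. Returns what was processed."""
    processed = []
    offset = 0                        # last committed offset
    crash_pending = crash_index is not None
    while offset < len(messages):
        i = offset
        if commit_first:              # at-most-once: commit BEFORE processing
            offset = i + 1
            if crash_pending and i == crash_index:
                crash_pending = False
                continue              # crashed after commit: message i is LOST
            processed.append(messages[i])
        else:                         # at-least-once: process BEFORE committing
            processed.append(messages[i])
            if crash_pending and i == crash_index:
                crash_pending = False
                continue              # crashed before commit: i is REDELIVERED
            offset = i + 1
    return processed

# At-least-once: a crash duplicates "b"; at-most-once: the same crash loses it.
assert consume(["a", "b", "c"], crash_index=1) == ["a", "b", "b", "c"]
assert consume(["a", "b", "c"], crash_index=1, commit_first=True) == ["a", "c"]
```

This is why at-least-once consumers are usually paired with idempotent processing: redelivery is the price of never losing a message.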
## When to Use
- Event Sourcing: Storing every change to the state of an application as a sequence of events.
- Messaging: Decoupling microservices.
- Metrics/Logging: High-throughput ingestion of telemetry data.
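The event-sourcing case rests on one property: current state is a pure fold over the ordered event log, so replaying the topic from offset zero reconstructs state deterministically. A sketch with a plain list standing in for a Kafka topic (the account domain and event names are illustrative, not part of Kafka):

```python
def replay(events):
    """Fold an ordered stream of (type, amount) events into a balance.
    Replaying the same log from the start always yields the same state."""
    balance = 0
    for kind, amount in events:
        if kind == "deposit":
            balance += amount
        elif kind == "withdraw":
            balance -= amount
    return balance

events = [("deposit", 100), ("withdraw", 30), ("deposit", 5)]
assert replay(events) == 75  # state derived purely from the log
```

Storing the events rather than the balance is what lets new consumers (audit, analytics) derive their own views from the same log later.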
## Trade-offs
- Pros: Massive throughput, high durability, strong ordering within partitions.
- Cons: High operational complexity; per-message latency is typically higher than with dedicated message queues such as RabbitMQ, especially when batching is tuned for throughput.
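The durability/latency trade-off surfaces directly in producer configuration. A sketch assuming the confluent-kafka Python client's librdkafka-style keys (`localhost:9092` is a placeholder broker address):

```python
# Producer settings that favor durability over latency.
# Keys are librdkafka-style, as used by the confluent-kafka client;
# "localhost:9092" is a placeholder, not a real deployment.
durable_producer_conf = {
    "bootstrap.servers": "localhost:9092",
    "acks": "all",               # wait for all in-sync replicas: max durability
    "enable.idempotence": True,  # broker de-duplicates producer retries
    "linger.ms": 5,              # small batching delay: throughput over latency
}
```

Relaxing `acks` to `1` (or `0`) and `linger.ms` to `0` moves the dial back toward low latency at the cost of durability.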