MMS • Alex Soto
Article originally posted on InfoQ.
The way data is processed and consumed today differs from how it was done in the past. Previously, data was stored in a database and batch processed to produce analytics. Although this approach is still valid, more modern platforms let you process data in real time, as it arrives in the system.
Apache Kafka (or Kafka) is a distributed event store and stream-processing platform for storing, consuming, and processing data streams.
One of the key aspects of Apache Kafka is that it was created with scalability and fault tolerance in mind, making it appropriate for high-performance applications. Kafka can be considered a replacement for some conventional messaging systems, such as those based on Java Message Service (JMS) and Advanced Message Queuing Protocol (AMQP).
Apache Kafka has integrations with most of the programming languages in common use today, but in this article series, we’ll cover its integration with Java.
The Kafka Streams project helps you consume real-time streams of events as they are produced, apply transformations, join streams, and optionally write new data representations back to a topic.
Kafka Streams is suited to both stateless and stateful streaming applications, implements time-based operations (for example, grouping events around a given time period), and is designed with the scalability, reliability, and maintainability that are always present in the Kafka ecosystem.
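To make the idea of grouping events around a given time period more concrete, here is a minimal sketch in plain Java. It does not use the Kafka Streams API, which provides its own windowing operators. The class name `TumblingWindowSketch`, the methods `windowStart` and `countPerWindow`, and the sample timestamps are all hypothetical, chosen only for illustration. The sketch assigns each event timestamp to a fixed-size, non-overlapping window and counts the events in each one.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, not the Kafka Streams API.
public class TumblingWindowSketch {

    // Returns the start (in epoch millis) of the fixed-size window that
    // contains the given timestamp. floorMod keeps the result correct
    // for negative timestamps as well.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Counts how many events fall into each window, keyed by window start
    // and sorted in ascending order.
    static Map<Long, Long> countPerWindow(List<Long> timestampsMs, long windowSizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            counts.merge(windowStart(ts, windowSizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical event times, in milliseconds.
        List<Long> timestamps = List.of(1_000L, 59_999L, 60_000L, 125_000L);

        // Group into one-minute windows.
        System.out.println(countPerWindow(timestamps, 60_000L));
        // prints {0=2, 60000=1, 120000=1}
    }
}
```

The first two events fall into the window starting at 0, the third into the window starting at 60,000 ms, and the last into the window starting at 120,000 ms. A streaming platform applies the same idea continuously as events arrive, rather than over a list held in memory.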
But Apache Kafka is much more than an event store and a stream-processing platform. It’s a vast ecosystem of projects and tools that helps solve some of the problems we might encounter when developing microservices. One of these is the dual writes problem, which arises when data needs to be stored transactionally in two systems. Kafka Connect and Debezium are open-source projects for change data capture that use the log scanner approach to avoid dual writes and to communicate persisted data correctly between services.
In the last part of this series of articles, we’ll see how to provision, configure and secure an Apache Kafka cluster on a Kubernetes cluster.