Getting Ready for Event-Driven Architecture with Kafka
by Christian Mülhaupt, Head of Architecture
Event-driven architecture (EDA) decouples business services so they can change frequently and independently. Since we deploy changes continuously, EDA has become our primary integration paradigm. The underlying messaging system is a foundational piece, and this article describes how we use Kafka in conjunction with our business services.
While it’s relatively easy to get started with Kafka, business services require additional work. Fortunately, Apache Kafka and its ecosystem have matured since Kafka’s release 10 years ago. Today, there are brilliant production-grade offerings and standards to choose from. Let’s take a look at how we became comfortable with Kafka at Jimdo.
The cluster
Since Kafka was to become the backbone of our business services, we went for a managed service offering. As most of our resources live inside Amazon Web Services, we were happy to give Amazon Managed Streaming for Apache Kafka (MSK) a try, and we really like the service. It’s easy to set up a redundant deployment across three availability zones, and MSK provides Open Monitoring endpoints for Prometheus. To our delight, MSK Labs had put together a Grafana dashboard that was ready to be used. We enjoyed an unexpected bonus when we realized that the major JVM distributions (OpenJDK at the time) included the Amazon Root CA certificates. This made encrypting traffic with TLS, the basis for both SASL/SCRAM and AWS IAM authentication, a walk in the park. We use both mechanisms, since the AWS IAM client libraries are only available for the JVM.
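To make this concrete, here is a minimal sketch of the client-side security settings for connecting to MSK with SASL/SCRAM over TLS. The broker hostname, port, and credentials are placeholders, not our actual configuration:

```kotlin
import java.util.Properties

// Sketch: Kafka client security settings for MSK with SASL/SCRAM over TLS.
// Broker endpoint, username, and password below are placeholders.
fun mskClientProperties(): Properties = Properties().apply {
    put("bootstrap.servers", "b-1.example.kafka.eu-west-1.amazonaws.com:9096")
    put("security.protocol", "SASL_SSL")   // traffic encrypted with TLS
    put("sasl.mechanism", "SCRAM-SHA-512") // SCRAM mechanism supported by MSK
    put(
        "sasl.jaas.config",
        "org.apache.kafka.common.security.scram.ScramLoginModule required " +
            "username=\"alice\" password=\"secret\";"
    )
}
```

Because the Amazon Root CA is already in the JVM trust store, no extra truststore configuration is needed here.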
As a result, cluster operations - check! ✅😉
AWS MSK deployment diagram (source)
The clients (producer/consumer)
Jimdo uses different stacks for backend services, but our latest wave of services has been built with Spring Boot. The following examples are taken from those services.
When implementing messaging for distributed systems, we prefer to make message processing robust and to add tracing information for context and analysis. After a little research into current (best) practices, we settled on the following standards:
- We produce Avro messages integrated with Confluent’s Schema registry
- Message metadata adheres to the CloudEvents specification
- “Don’t break distributed tracing”. This isn’t really a standard. Fortunately, we didn’t have to worry about this, as we were already auto-instrumenting our services with the OpenTelemetry Java agent. We were especially relieved about this as there seems to be a lot to consider when instrumenting messaging.
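For a feel of what the CloudEvents standard means on the wire: in binary mode, the event attributes travel as Kafka record headers prefixed with ce_ (per the CloudEvents Kafka protocol binding), while the payload stays in the record value. A minimal sketch, with example attribute values:

```kotlin
// Sketch: the CloudEvents Kafka protocol binding (binary mode) maps each
// attribute to a record header named "ce_<attribute>". Values are examples.
fun cloudEventHeaders(id: String, type: String): Map<String, String> = mapOf(
    "ce_specversion" to "1.0",
    "ce_id" to id,
    "ce_type" to type,
    "ce_source" to "https://springbootkafkademo.jimdo-platform.net",
)

fun main() {
    val headers = cloudEventHeaders("42", "OrderCreated")
    println(headers["ce_type"]) // prints "OrderCreated"
}
```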
With these standards in place, we moved on to implementing clients.
The Producer
In order to send Avro messages the “Confluent way”, we deployed a Confluent Schema Registry from its Docker image. The setup was pretty straightforward. I recommend the article 17 Ways to Mess Up Self-Managed Schema Registry. After reading it, we began backing up the underlying _schemas topic 😉.
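As an illustration, starting the registry from the official Confluent image looks roughly like this (the bootstrap broker hostname is a placeholder; in production you would also configure the SASL/TLS settings):

```shell
# Sketch: Confluent Schema Registry from the official Docker image,
# pointed at placeholder MSK bootstrap brokers.
docker run -d -p 8081:8081 \
  -e SCHEMA_REGISTRY_HOST_NAME=schema-registry \
  -e SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS=b-1.example.kafka.eu-west-1.amazonaws.com:9092 \
  confluentinc/cp-schema-registry
```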
A bit more challenging was combining the de-/serialization of Avro messages with the CloudEvents envelope. While the specification was clear (we went for binary mode), both projects offer their own de-/serialization libraries, with multiple possible ways to mix and match them. Since only one de-/serializer can be configured per client, we chose Confluent’s KafkaAvro(De)Serializer, which seemed to better respect the principle of least astonishment. This meant we had to wire the writing of the envelope manually, which felt like a good use case for a Kafka ProducerInterceptor. The code below shows a Spring Boot configuration using a DefaultKafkaProducerFactoryCustomizer in conjunction with the CloudEvents Java SDK:
package com.jimdo.springbootkafkademo.config

import io.cloudevents.core.v1.CloudEventBuilder
import io.cloudevents.kafka.impl.KafkaSerializerMessageWriterImpl
import org.apache.avro.generic.GenericContainer
import org.apache.kafka.clients.producer.ProducerConfig
import org.apache.kafka.clients.producer.ProducerInterceptor
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.clients.producer.RecordMetadata
import org.springframework.boot.autoconfigure.kafka.DefaultKafkaProducerFactoryCustomizer
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import java.net.URI
import java.time.OffsetDateTime
import java.time.ZoneOffset
import java.util.UUID

@Configuration
class KafkaSerializerConfig {

    // Register the interceptor on every producer created by Spring's factory
    @Bean
    fun kafkaProducerFactoryCustomizer(): DefaultKafkaProducerFactoryCustomizer {
        return DefaultKafkaProducerFactoryCustomizer {
            it.updateConfigs(
                mapOf(ProducerConfig.INTERCEPTOR_CLASSES_CONFIG to WrapWithCloudEventsProducerInterceptor::class.java.name)
            )
        }
    }
}

// Writes the CloudEvents envelope (binary mode) into the record headers,
// leaving the Avro payload untouched for the KafkaAvroSerializer
class WrapWithCloudEventsProducerInterceptor : ProducerInterceptor<String, GenericContainer> {

    override fun configure(configs: MutableMap<String, *>?) {
        // nothing to do
    }

    override fun onSend(record: ProducerRecord<String, GenericContainer>): ProducerRecord<String, GenericContainer> {
        KafkaSerializerMessageWriterImpl(record.headers()).writeBinary(
            CloudEventBuilder()
                .withId(UUID.randomUUID().toString())
                .withType(record.value().schema.name)
                .withDataContentType("application/vnd.${record.value().schema.name}.v1+avro")
                .withSubject("something specific to the event that can't be derived from type (schema.name) or event payload")
                .withSource(URI.create("https://springbootkafkademo.jimdo-platform.net"))
                .withTime(OffsetDateTime.now(ZoneOffset.UTC))
                .build()
        )
        return record
    }

    override fun onAcknowledgement(metadata: RecordMetadata?, exception: Exception?) {
        // nothing to do
    }

    override fun close() {
        // nothing to do
    }
}
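For completeness, a sketch of the producer-side Spring Boot configuration that wires in Confluent's serializers; the registry URL is a placeholder:

```yaml
# Sketch: Spring Boot producer configuration using Confluent's Avro serializer.
# The schema.registry.url is a placeholder, not our actual endpoint.
spring:
  kafka:
    producer:
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
      properties:
        schema.registry.url: https://schema-registry.example.internal:8081
```

With this in place, the interceptor above runs on every send and adds the CloudEvents headers before the Avro value is serialized.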
The Consumer
In order to consume the above messages, we simply configured Confluent’s KafkaAvroDeserializer, which takes care of fetching the Avro schemas from the Schema Registry to deserialize the messages. While not specific to our setup, error handling during Kafka message consumption deserves a closer look. The good news is that, with Kafka, we can re-read previously consumed messages (within the configured retention, seven days by default) to compensate for potential errors. Besides basic retries, Spring Kafka’s documentation also describes more sophisticated error handling patterns.
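As a simple example of such error handling, a retry policy with exponential back-off can be wired in as a Spring bean. This is a sketch using Spring Kafka's DefaultErrorHandler; the intervals are illustrative, not our production values:

```kotlin
import org.springframework.context.annotation.Bean
import org.springframework.context.annotation.Configuration
import org.springframework.kafka.listener.DefaultErrorHandler
import org.springframework.util.backoff.ExponentialBackOff

// Sketch: retry failed records with exponential back-off before giving up.
// Intervals below are illustrative placeholders.
@Configuration
class KafkaErrorHandlingConfig {

    @Bean
    fun errorHandler(): DefaultErrorHandler =
        DefaultErrorHandler(
            ExponentialBackOff(1_000L, 2.0).apply {
                // cap the delay between attempts at 30 seconds
                maxInterval = 30_000L
            }
        )
}
```

Spring Boot picks this bean up for listener containers, so failing records are retried with growing delays instead of being retried in a tight loop.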
Summary
In this article, we described how we’ve gone about integrating Kafka to build a central messaging system for our business services:
- AWS MSK as a managed Kafka cluster
- Confluent Schema Registry to store Avro schemas
- Spring Boot to implement Kafka clients
- CloudEvents specification for message metadata
- Confluent KafkaAvro(De)Serializer to integrate with the Schema Registry
- KafkaProducerInterceptor to wrap CloudEvents metadata around our Avro messages
We’ve been running this setup for more than a year now and we’re very satisfied with its overall performance and maintainability. Our Engineering teams rely on it heavily to build out event-driven architectures, and this better supports product growth and ultimately increases customer value.
Related articles:
Derek Power, Head of Engineering, explains how we set up our on-call strategy and how we improved it to meet the business needs without burning out people.