An Intro to Apache Kafka

I’ve been evaluating queues and storage for an event-sourced system lately, and I seem to have found what I’m looking for in Apache Kafka. Kafka turns up in a surprising number of places. I only learned today, for example, that Azure Event Hubs exposes a Kafka-compatible endpoint, so it can stand in as a Kafka cluster itself.

I have the following requirements:

  • Topic-based subscriptions
  • Event-based
  • Infinite storage duration
  • Schema validation
  • MQTT connection for web and mobile clients

I’ve tried out a number of different solutions, but the one I’m considering right now is based on Confluent Cloud, a managed Kafka service. It’s relatively easy to set up a cluster that meets the above requirements: Confluent has a clear option to turn on infinite storage retention, and it provides a schema registry that supports multiple definition languages, including JSON Schema, Avro, and Protobuf. A schema registry is a nice way to keep your event streams clean, preventing buggy messages from ever entering the topic.
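To illustrate what registry-backed validation buys you, here’s a minimal pure-Python sketch of the idea. The `OrderPlaced` event type and its fields are hypothetical examples of mine; in a real deployment the schema would live in the registry as JSON Schema, Avro, or Protobuf, and the serializer would enforce it:

```python
import json

# Hypothetical schema for an "OrderPlaced" event. In practice this
# definition lives in the schema registry, not in application code.
ORDER_PLACED_SCHEMA = {
    "orderId": str,
    "customerId": str,
    "totalCents": int,
}

def validate_event(raw: str, schema: dict) -> dict:
    """Reject malformed events before they ever reach the topic."""
    event = json.loads(raw)
    for field, expected_type in schema.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        if not isinstance(event[field], expected_type):
            raise ValueError(f"bad type for field: {field}")
    return event

good = '{"orderId": "o-1", "customerId": "c-9", "totalCents": 1250}'
print(validate_event(good, ORDER_PLACED_SCHEMA)["orderId"])  # o-1
```

The registry does this centrally for every producer, so a buggy client can’t quietly poison the stream.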

I say “event-based”, but really we just need to be able to identify the schema type and use JSON. That’s fairly standard, but it’s worth stating explicitly.
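Concretely, “identify the schema type and use JSON” can be as little as a small envelope around each payload. The field names here are my own convention, not a standard:

```python
import json
import uuid
from datetime import datetime, timezone

def make_envelope(event_type: str, payload: dict) -> str:
    """Wrap a payload with just enough metadata to route and deserialize it."""
    return json.dumps({
        "type": event_type,          # lets consumers pick the right schema
        "id": str(uuid.uuid4()),     # unique event id, useful for dedup
        "occurredAt": datetime.now(timezone.utc).isoformat(),
        "payload": payload,
    })

msg = make_envelope("OrderPlaced", {"orderId": "o-1"})
print(json.loads(msg)["type"])  # OrderPlaced
```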

MQTT looks like a challenge, but it turns out not to be. I’d recommend checking out CloudMQTT, a simple service for deploying cloud-hosted Mosquitto instances. Setting up the MQTT broker took two minutes, and then it was off to Kafka Connect to hook it up. Adding the MQTT source is as easy as expected: provide the URL and credentials, and the rest happens automatically. You can also subscribe to Kafka topics and push them back out to MQTT. This works perfectly for web and mobile clients, whose jobs are to publish events and receive notifications. MQTT also enables a very nice asynchronous request/response mechanism that doesn’t use HTTP and doesn’t suffer from timeouts.
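For reference, the Kafka Connect configuration for an MQTT source ends up looking roughly like this. The property names follow Confluent’s MQTT source connector, but double-check them against the current connector docs; the URI, credentials, and topic names below are placeholders of mine:

```json
{
  "name": "mqtt-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "mqtt.server.uri": "ssl://example.cloudmqtt.com:8883",
    "mqtt.username": "<username>",
    "mqtt.password": "<password>",
    "mqtt.topics": "events/#",
    "kafka.topic": "mqtt-events",
    "tasks.max": "1"
  }
}
```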

Finally, as I mentioned, Azure Event Hubs speaks the Kafka protocol, so you can push certain topics (auditing, for example) out to Event Hubs to eventually make their way into the SIEM. There are a number of useful connectors for Kafka, but I haven’t really looked at them yet, except to note that there’s one for MongoDB.
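Pointing a Kafka client at Event Hubs is mostly a matter of connection properties. This sketch follows Azure’s documented pattern of authenticating with `$ConnectionString` as the SASL username and the namespace connection string as the password; the namespace name and connection string here are placeholders:

```python
# Kafka client configuration for Azure Event Hubs' Kafka-compatible
# endpoint (port 9093). Any Kafka producer/consumer library that accepts
# these standard properties should work; values below are placeholders.
event_hub_config = {
    "bootstrap.servers": "my-namespace.servicebus.windows.net:9093",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "PLAIN",
    "sasl.username": "$ConnectionString",
    "sasl.password": "<Event Hubs namespace connection string>",
}
print(event_hub_config["bootstrap.servers"])
```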

Kafka is a publish/subscribe event broker that includes storage. This makes it ideal for storing DDD aggregates: a topic holds the full event history from which an aggregate’s state can be rebuilt. Having the broker and the database in the same place simplifies the infrastructure, and it’s a natural role for the broker to fill.
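A quick sketch of what “broker as database” means in practice: an aggregate’s current state is just a fold over its ordered event stream. The `Deposited`/`Withdrawn` events and the account aggregate are invented examples, not anything from my actual system:

```python
def replay(events: list[dict]) -> dict:
    """Fold an ordered event stream into the aggregate's current state."""
    state = {"balance": 0, "version": 0}
    for event in events:
        if event["type"] == "Deposited":
            state["balance"] += event["amount"]
        elif event["type"] == "Withdrawn":
            state["balance"] -= event["amount"]
        state["version"] += 1  # version tracks how many events were applied
    return state

history = [
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
]
print(replay(history))  # {'balance': 70, 'version': 2}
```

With infinite retention turned on, the topic itself is the system of record, so this replay is always possible.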

The net result of this architecture is that we no longer need to talk HTTP once the Blazor WASM SPA has loaded. All communication with the back-end system is done via event publishing and subscribing over MQTT.
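The request/response pattern over pub/sub hinges on correlation ids: the client subscribes to a reply topic keyed by a unique id, then publishes the request carrying that id. Here’s a self-contained sketch with a tiny in-memory stand-in for the MQTT broker (the topic names and the doubling service are made up for illustration):

```python
import uuid

class PubSub:
    """Tiny in-memory stand-in for an MQTT broker, just to show the pattern."""
    def __init__(self):
        self.subscribers = {}
    def subscribe(self, topic, handler):
        self.subscribers.setdefault(topic, []).append(handler)
    def publish(self, topic, message):
        for handler in self.subscribers.get(topic, []):
            handler(message)

broker = PubSub()
replies = {}

# "Server" side: handle requests, publish the reply on a per-request topic.
def handle_request(msg):
    broker.publish(f"replies/{msg['correlationId']}", {"result": msg["n"] * 2})

broker.subscribe("requests/double", handle_request)

# "Client" side: subscribe to a reply topic keyed by correlation id,
# then publish the request. No HTTP connection is held open, so there
# is nothing to time out.
correlation_id = str(uuid.uuid4())
broker.subscribe(f"replies/{correlation_id}",
                 lambda msg: replies.update({correlation_id: msg}))
broker.publish("requests/double", {"correlationId": correlation_id, "n": 21})

print(replies[correlation_id]["result"])  # 42
```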

I’m happy with this architecture. Confluent seems reasonably priced for what you get (the options I’ve chosen run about US$3 per hour). CloudMQTT is almost not worth mentioning price-wise, and Kafka Connect leaves open a lot of integration possibilities with other event streams. Since it’s a WASM application, much of the processing is offloaded to the client, and the HTTP back end stays quiet. The microservices subscribe to their topics and react accordingly, and everything that ever touches the system is stored indefinitely.

