
Kafka Mirror Making

Last month the team was asked to find a way to make some of the production data flowing through Kafka available in a development environment, so Data teams could use it to test their models, AI agents, etc. For this we needed to copy messages from production topics into our development cluster. We found that Kafka Mirror Maker fit our requirements nicely and was pretty straightforward to set up.

There are currently two different versions of Mirror Maker available when we install Kafka, and in my opinion both are valid and suit different purposes.

Mirror Maker 1

This is the legacy implementation and it works in a really simple way: it creates a Kafka consumer that consumes from a source cluster and a producer that produces to a target cluster. This setup can run on a dedicated machine, but it is usually started on the target machine in a remote-consume/local-produce pattern.
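
Conceptually (and only as an illustration; Mirror Maker uses proper consumer and producer clients with batching and offset commits, not the console tools), the pattern is not far from piping a console consumer pointed at the remote cluster into a console producer pointed at the local one. Host names and the topic below are placeholders:

$ kafka-console-consumer.sh --bootstrap-server production-cluster:9092 --topic some-topic \
    | kafka-console-producer.sh --broker-list localhost:9092 --topic some-topic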

This way of mirror making, although really straightforward, can be a bit too simplistic for most production-grade use cases. It has shortcomings that are hard to work around if we want a resilient and interchangeable replica: clients won't be able to switch clusters seamlessly, since topic and partition information will not be consistent between them. However, it is good enough for situations where we only need to copy data from topic to topic and don't need perfect topic replication.

Mirror Maker 2

Mirror Maker 2 was released with Kafka 2.4.0 and was built specifically to solve the limitations that the previous version had for many use cases, including backup, disaster recovery, and fail-over scenarios.

This new solution is based on the Kafka Connect framework, which offers increased reliability, scalability and performance. At its core it can be viewed as a combination of a Kafka source and sink connector, and it brings a lot of new features that make replicating clusters easier, more secure and more resilient.

  • It gives us the ability to run multiple replication flows and prevents replication cycles by introducing the notion of remote topics. In the previous version we were limited to active/passive synchronization because the replicated topic is created with the exact name of the source topic, which in an active/active setup would generate an infinite loop.

  • Topic configuration is synchronized from source to target clusters, and new topics and partitions are detected and created automatically. This lets us change configurations dynamically and removes the need to build tooling (or do manual work) to keep the clusters consistent. Moreover, consumer rebalances caused by topic changes, which can have a major performance impact, are reduced to a minimum.

  • In the previous version __consumer_offsets are not tracked at all, which is critical for disaster recovery and fail-over situations: how would a consumer find the correct offset in the replicated cluster? Mirror Maker 2 tracks and maps consumer group offsets using offset sync and checkpoint topics, allowing consumer migration between clusters to be virtually seamless.

  • It periodically checks, through a dedicated MirrorHeartbeatConnector, whether the connection between clusters is healthy or something is wrong.

With all these improvements, active/active clusters and disaster recovery are now supported out of the box, and Mirror Maker is much more appealing for production use.
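
To give an idea of what this looks like in practice, a minimal Mirror Maker 2 setup is little more than a single properties file handed to the dedicated Connect driver that ships with Kafka. The sketch below uses placeholder cluster aliases and addresses:

# mm2.properties (minimal sketch, placeholder addresses)
clusters = prd, dev
prd.bootstrap.servers = prd-broker:9092
dev.bootstrap.servers = dev-broker:9092
# enable a single prd -> dev replication flow for all topics starting with prd-
prd->dev.enabled = true
prd->dev.topics = prd-.*

$ connect-mirror-maker.sh mm2.properties

Replicated topics show up on the target cluster as remote topics prefixed with the source cluster alias, e.g. prd.prd-topic-name.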

Note that the old version still works, and there are plans for a legacy mode in Mirror Maker 2 that would behave just like the original Mirror Maker.

Check KIP-382 for more details.

What did we choose and why?

We were running Kafka 2.4.1 so we could go with either, but after considering our options we ended up choosing Mirror Maker 1. Our use case called for something really simple: we didn't actually need the new features, and their benefits would not make up for the added complexity they bring. The only thing we wanted was to copy all messages from a subset of our production topics to a different cluster, and for that we did not need to keep track of consumer offsets, topic configurations or anything else. Summing up, we did not want a replica for consumers to fail over to; the two clusters were completely independent and we wanted them to stay that way.

Besides, we did not need production-grade reliability: this was going to run on a dev machine and the messages were only going to feed machine learning models under test, so losing a couple of messages was not critical.

Ultimately, Mirror Maker 1 offered everything we needed, and there was no reason to pollute our production cluster with information we were never going to use (heartbeats, configurations, consumer offsets).

How did we set it up?

We use Chef to provision our machines and wanted to follow the recommended remote-consume/local-produce pattern, so the only thing we needed to do was update the application cookbooks to write a couple of files on the target development machines and run Mirror Maker on startup.

We were already prefixing topics with the cluster environment (dev-topic-name, prd-topic-name), so we did not need to worry about a topic existing with the same name in both clusters, which could otherwise cause name clashes on the destination broker.

To start Mirror Maker we use the script that comes with the Kafka installation.

$ kafka-mirror-maker.sh --consumer.config consumer.properties --producer.config producer.properties --num.streams 3 --whitelist="prd-topic-name"

To run Mirror Maker we pass some configuration: two property files, one for the consumer and another for the producer, the number of consumer stream threads, and a whitelist of the topics we want to replicate.

# /opt/kafka/config/consumer.properties
bootstrap.servers=production-cluster-vip.prd:9092
group.id=mm-consumer
exclude.internal.topics=true
enable.auto.commit=false
auto.offset.reset=earliest
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
# /opt/kafka/config/producer.properties
bootstrap.servers=development-cluster01.dev:9092,development-cluster02.dev:9092
acks=all
enable.idempotence=true
client.id=mm-producer
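
With Mirror Maker running against these files, a quick way to confirm that messages are actually flowing is to consume a few of them from the mirrored topic on the development cluster:

$ kafka-console-consumer.sh --bootstrap-server development-cluster01.dev:9092 \
    --topic prd-topic-name --from-beginning --max-messages 5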

Finally, we decided to wrap the command in a systemd service. With this approach it is automatically restarted when it fails for some unexpected reason and resumes consumption from where it left off, giving us at-least-once delivery. To do that we define a file with some configuration variables for the service, plus the actual mirrormaker.service file.

# /etc/sysconfig/mirrormaker
export JAVA_HOME="/usr/java/latest"

KAFKA_BASEDIR="/opt/kafka"
KAFKA_CFG_DIR="$KAFKA_BASEDIR/config"
MM_WHITELIST="prd-(source01|source02|source03)-data"
MM_NUM_STREAMS=3
# /etc/systemd/system/mirrormaker.service
[Unit]
Description=Kafka Mirror Maker
After=kafka.service
[Service]
Type=simple
Restart=on-failure
User=kafka
Group=ldap_logbot
ExecStart=/bin/sh -c 'source /etc/sysconfig/mirrormaker && $KAFKA_BASEDIR/bin/kafka-mirror-maker.sh  --consumer.config $KAFKA_CFG_DIR/consumer.properties --producer.config $KAFKA_CFG_DIR/producer.properties --num.streams $MM_NUM_STREAMS --whitelist=$MM_WHITELIST'
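
Since we also want the service to come up on boot, the unit can additionally carry an [Install] section (assuming the usual multi-user target) and then be enabled with systemctl:

# appended to /etc/systemd/system/mirrormaker.service
[Install]
WantedBy=multi-user.target

$ sudo systemctl daemon-reload
$ sudo systemctl enable --now mirrormaker.service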