Overview
We are covering how Kafka MirrorMaker operates, how to set it up, and how to test mirror data.
MirrorMaker 2.0 is the new replication feature of Kafka 2.4, defined as part of the Kafka Improvement Process - KIP 382. Kafka MirrorMaker 2 is designed to replicate or mirror topics from one Kafka cluster to another. It uses the Kafka Connect framework to simplify the configuration and scaling. MirrorMaker dynamically detects changes to source topics and ensures source and target topic properties are synchronized, including topic data, offsets, and partitions. The topic, together with topic data, offsets, and partitions, is replicated in the target cluster when a new topic is created in the source cluster.
Use Cases
Disaster Recovery
Though Kafka is highly distributed and provides a high level of fault tolerance, disasters can still happen, and data can still become temporarily unavailable—or lost altogether. The best way to mitigate the risks is to have a copy of your data in another Kafka cluster in a different data center. MirrorMaker translates and syncs consumer offsets to the target cluster. That way, we can switch clients to it relatively seamlessly, moving to an alternative deployment on the fly with minor or no service interruptions.
Closer Read / Writes
Kafka producer clients often prefer to write locally to achieve low latency, but business requirements demand the data be read by different consumers, often deployed in multiple regions. This can easily make deployments complex due to VPC peering. MirrorMaker can handle all complex replication, making it easier to write and read local mechanisms.
Data Analytics
Aggregation is also a factor in data pipelines, which might require the consolidation of data from regional Kafka clusters into a single one. That aggregate cluster then broadcasts that data to other clusters and/or data systems for analysis and visualization.
Supported Topologies
- Active/Passive or Active/Standby high availability deployments - (ClusterA => ClusterB)
- Active/Active HA Deployment - (ClusterA => ClusterB and ClusterB => ClusterA)
- Aggregation (e.g., from many clusters to one): (ClusterA => ClusterK, ClusterB => ClusterK, ClusterC => ClusterK)
- Fan-out (opposite of Aggregation): (ClusterK => ClusterA, ClusterK => ClusterB, ClusterK => ClusterC)
- Forwarding: (ClusterA => ClusterB, ClusterB => ClusterC, ClusterC => ClusterD)
Salient Features of MirrorMaker 2
- Mirrors Topic and Topic Configuration - Detects and mirrors new topics and config changes automatically, including the number of partitions and replication factors.
- Mirrors ACLs - Mirrors Topic ACLs as well, though we found issues in replicating WRITE permission. Also, replicated topics often contain source cluster names as a prefix, which means existing ACLs need to be tweaked, or ACL replication may need to be managed externally if the topologies are more complex.
- Mirrors Consumer Groups and Offsets - Seamlessly translates and syncs Consumer Group Offsets to target clusters to make it easier to switch from one cluster to another in case of disaster.
- Ability to Update MM2 Config Dynamically - MirrorMaker is backed by Kafka Connect Framework, which provides REST APIs through which MirrorMaker configurations like replicating new topics, stopping replicating certain topics, etc. can be updated without restarting the cluster.
- Fault-Tolerant and Horizontally Scalable Operations - The number of processes can be scaled horizontally to increase performance.
How Kafka MirrorMaker 2 Works
MirrorMaker uses a set of standard Kafka connectors. Each connector has its own role. The listing of connectors and their functions is provided below.
- MirrorSourceConnector: Replicates topics, topic ACLs, and configs from the source cluster to the target cluster.
- MirrorCheckpointConnector: Syncs consumer offsets, emits checkpoints, and enables failover.
- MirrorHeartBeatConnector: Checks connectivity between the source and target clusters.
MirrorMaker Running Modes
There are three ways to run MirrorMaker:
- As a dedicated MirrorMaker cluster (can be distributed with multiple replicas having the same config): In this mode, MirrorMaker does not require an existing Connect cluster. Instead, a high-level driver manages a collection of Connect workers.
- As a standalone Connect worker: In this mode, a single Connect worker runs MirrorSourceConnector. This does not support multi-clusters, but it’s useful for small workloads or for testing.
- In legacy mode, using existing MirrorMaker scripts: After legacy MirrorMaker is deprecated, the existing ./bin/kafka-mirror-maker.sh scripts will be updated to run MM2 in legacy mode:
Setting up MirrorMaker 2
We recommend running MirrorMaker as a dedicated MirrorMaker cluster since it does not require an existing Connect cluster. Instead, a high-level driver manages a collection of Connect workers. The cluster can be easily converted to a distributed cluster just by adding multiple replicas of the same configuration. A distributed cluster is required to reduce the load on a single node cluster and also to increase MirrorMaker throughput.
Prerequisites
Steps to Set Up MirrorMaker 2
Set up a single node source, target Kafka cluster, and a MirrorMaker node to run MirrorMaker 2.
1. Clone repository:
https://gitlab.com/velotio/kafka-mirror-maker.git
2. Run the below command to start the Kafka clusters and the MirrorMaker Docker container:
CODE: https://gist.github.com/raihavelotio/894ba7ca50c7ea26ff11b96209ddbf9d.js
3. Login to the mirror-maker docker container:
CODE: https://gist.github.com/raihavelotio/8bf2f6d9913165288513b696e98c4180.js
4. Start MirrorMaker:
CODE: https://gist.github.com/raihavelotio/beb9cf69cff4338e4a55120c7f4b0131.js
5. Monitor the logs of the MirrorMaker container—it should be something like this:
- [2024-02-05 04:07:39,450] INFO [MirrorCheckpointConnector|task-0] sync idle consumer group offset from source to target took 0 ms (org.apache.kafka.connect.mirror.Scheduler:95)
- [2024-02-05 04:07:49,246] INFO [MirrorCheckpointConnector|worker] refreshing consumer groups took 1 ms (org.apache.kafka.connect.mirror.Scheduler:95)
- [2024-02-05 04:07:49,337] INFO [MirrorSourceConnector|worker] refreshing topics took 3 ms (org.apache.kafka.connect.mirror.Scheduler:95)
- [2024-02-05 04:07:49,450] INFO [MirrorCheckpointConnector|task-0] refreshing idle consumers group offsets at target cluster took 2 ms (org.apache.kafka.connect.mirror.Scheduler:95)
6. Create a topic at the source cluster:
CODE: https://gist.github.com/raihavelotio/26e2c978316773c5c8b253c4089fd6c5.js
7. List topics and validate the topic:
CODE: https://gist.github.com/raihavelotio/a07d378ee7464e9ec93df0c57e491742.js
8. Produce 100 messages on the topic:
CODE: https://gist.github.com/raihavelotio/e7126eaaf53f7f8cef485813d6113cc0.js
9. Check whether the topic is mirrored in the target cluster.
Note: The mirrored topic will have a source cluster name prefix to be able to identify which source cluster the topic is mirrored from.
CODE: https://gist.github.com/raihavelotio/c16c129c1ce0ce35967f70fd7ca00d03.js
10. Consume 5 messages from the source kafka cluster:
CODE: https://gist.github.com/raihavelotio/10e41efa972fe8f3f7e0922911556437.js
11. Describe the consumer group at the source and destination to verify that consumer offsets are also mirrored:
CODE: https://gist.github.com/raihavelotio/f2e95dbd5ac7d6d13b18ad9ea24d8128.js
CODE: https://gist.github.com/raihavelotio/b723adca39a24611daeb8b76eaab7410.js
12. Consume five messages from the target Kafka cluster. The messages should start from the committed offset in the source cluster. In this case, the message offset will start at 6.
CODE: https://gist.github.com/raihavelotio/86f2ee21861f7752729b7fc392e457df.js
Conclusion
We’ve seen how to set up MirrorMaker 2.0 in a dedicated instance. This running mode does not need a running Connect cluster as it leverages a high-level driver that creates a set of Connect workers based on the MirrorMaker properties configuration file.