MongoDB Kafka Connector Docker Example¶
This guide provides an end-to-end setup of MongoDB and Kafka Connect to demonstrate the functionality of the MongoDB Kafka Source and Sink Connectors.
In this example, we create the following Kafka Connectors:
| Connector | Data Source | Destination |
| --- | --- | --- |
| Confluent Connector: Datagen | Avro random generator | Kafka topic: pageviews |
| Sink Connector: mongo-sink | Kafka topic: pageviews | MongoDB collection: test.pageviews |
| Source Connector: mongo-source | MongoDB collection: test.pageviews | Kafka topic: mongo.test.pageviews |
- The Datagen Connector creates random data using the Avro random generator and publishes it to the Kafka topic "pageviews".
- The mongo-sink connector reads data from the "pageviews" topic and writes it to MongoDB in the "test.pageviews" collection.
- The mongo-source connector produces change events for the "test.pageviews" collection and publishes them to the "mongo.test.pageviews" Kafka topic.
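The source connector's output topic name combines a configurable prefix (here, "mongo") with the watched database and collection names. A minimal sketch of that naming convention (the `topic_name` helper is illustrative, not part of the connector API):

```python
def topic_name(prefix: str, database: str, collection: str) -> str:
    """Sketch of how the source connector derives its output topic:
    <prefix>.<database>.<collection>."""
    return ".".join(part for part in (prefix, database, collection) if part)

# The test.pageviews collection with the prefix "mongo" maps to:
print(topic_name("mongo", "test", "pageviews"))  # mongo.test.pageviews
```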
Requirements¶
Linux/Unix-based OS¶
- Docker 18.09 or later
- Docker Compose 1.24 or later
MacOS¶
- Docker Desktop Community Edition (Mac) 2.1.0.1 or later
Windows¶
- Docker Desktop Community Edition (Windows) 2.1.0.1 or later
How to Run the Example¶
Clone the mongo-kafka repository from GitHub:
git clone https://github.com/mongodb/mongo-kafka.git
Change to the docker directory:
cd mongo-kafka/docker/
Start the shell script, run.sh:
./run.sh
The shell script executes the following sequence of commands:
- Run the docker-compose up command. The docker-compose command installs and starts the following applications, each in its own Docker container:
  - Zookeeper
  - Kafka
  - Confluent Schema Registry
  - Confluent Kafka Connect
  - Confluent Control Center
  - Confluent KSQL Server
  - Kafka Rest Proxy
  - Kafka Topics UI
  - MongoDB replica set (three nodes: mongo1, mongo2, and mongo3)
- Wait for MongoDB, Kafka, and Kafka Connect to become ready
- Register the Confluent Datagen Connector
- Register the MongoDB Kafka Sink Connector
- Register the MongoDB Kafka Source Connector
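For reference, the sink connector that run.sh registers is configured with JSON resembling the following. This is a sketch based on this example's topic, database, and collection names; the exact values used live in the docker directory of the repository:

```json
{
  "name": "mongo-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "pageviews",
    "connection.uri": "mongodb://mongo1:27017,mongo2:27017,mongo3:27017",
    "database": "test",
    "collection": "pageviews"
  }
}
```

Registration happens by POSTing this JSON to the Kafka Connect REST API, which the script does on your behalf.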
If the script fails, you may need to increase the RAM resource limits for Docker. If the script did not complete successfully, use the docker-compose stop command to stop any running containers before retrying.
Once the shell script has started the services, the Datagen Connector publishes new events to Kafka at short intervals, triggering the following cycle:
- The Datagen Connector publishes new events to Kafka
- The Sink Connector writes the events into MongoDB
- The Source Connector writes the change stream messages back into Kafka
To view the Kafka topics, open the Kafka Control Center at http://localhost:9021/ and navigate to the cluster topics.
The pageviews topic should contain documents added by the Datagen Connector that resemble the following:

{ "viewtime": { "$numberLong": "81" }, "pageid": "Page_1", "userid": "User_8" }

The mongo.test.pageviews topic should contain change events that resemble the following:

{
  "_id": { "_data": "<resumeToken>" },
  "operationType": "insert",
  "clusterTime": { "$timestamp": { "t": 1563461814, "i": 4 } },
  "fullDocument": {
    "_id": { "$oid": "5d3088b6bafa7829964150f3" },
    "viewtime": { "$numberLong": "81" },
    "pageid": "Page_1",
    "userid": "User_8"
  },
  "ns": { "db": "test", "coll": "pageviews" },
  "documentKey": { "_id": { "$oid": "5d3088b6bafa7829964150f3" } }
}
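As a quick sanity check of the change event shape, the fields above can be unpacked with Python's standard json module (a minimal sketch; the event string is copied from the sample above):

```python
import json

# Sample change event as published to the mongo.test.pageviews topic.
event = json.loads("""
{ "_id": { "_data": "<resumeToken>" },
  "operationType": "insert",
  "clusterTime": { "$timestamp": { "t": 1563461814, "i": 4 } },
  "fullDocument": { "_id": { "$oid": "5d3088b6bafa7829964150f3" },
    "viewtime": { "$numberLong": "81" },
    "pageid": "Page_1", "userid": "User_8" },
  "ns": { "db": "test", "coll": "pageviews" },
  "documentKey": { "_id": { "$oid": "5d3088b6bafa7829964150f3" } } }
""")

# The "ns" field identifies the watched namespace (database and collection)...
assert event["ns"] == {"db": "test", "coll": "pageviews"}
# ...and "fullDocument" carries the inserted document itself.
print(event["operationType"], event["fullDocument"]["pageid"])  # insert Page_1
```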
Next, explore the collection data in the MongoDB replica set:
In your local shell, navigate to the docker directory from which you ran the docker-compose commands and connect to the mongo1 MongoDB instance using the following command:

docker-compose exec mongo1 /usr/bin/mongo

If you insert or update a document in the test.pageviews collection, the Source Connector publishes a change event document to the mongo.test.pageviews Kafka topic.
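For example, once connected to mongo1, an interactive session like the following (illustrative sample values, not part of the script) inserts a document and should produce a corresponding change event on the mongo.test.pageviews topic:

```
// Run inside the mongo shell started by docker-compose exec.
use test
db.pageviews.insertOne({ viewtime: NumberLong(99), pageid: "Page_2", userid: "User_3" })
```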
To stop the docker containers and all the processes running in them, press Ctrl-C in the shell running the script, or use the following command:
docker-compose stop
To remove the docker containers and images completely, use the following command:
docker-compose down