In this guide, you can learn about change streams and how they are used in a MongoDB Kafka Connector source connector.
Change streams are a MongoDB feature that allows you to receive real-time updates on data changes. Change streams return change event documents. A change event document is a document in your oplog that contains idempotent instructions to recreate a change that occurred in your MongoDB deployment, as well as the metadata related to that change.
The oplog is a special collection in MongoDB that records all changes within a MongoDB replica set. Change streams let you use the change event data stored in the oplog without having to learn how the oplog works.
Change streams are available for replica sets and sharded clusters. A standalone MongoDB instance cannot produce a change stream.
To view a list of all configuration options for change streams, see the Change Stream Properties page.
To learn more about change streams, see the following resources:
To learn more about the oplog, see the MongoDB manual entry on the Replica Set Oplog.
Use an aggregation pipeline to configure your source connector's change stream. For example, you can use a pipeline to:
- Filter change events by operation type
- Project specific fields
- Update the value of fields
- Add fields
- Trim the amount of data generated by the change stream
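As a minimal sketch of the first and last items in the list above, the following Python builds a pipeline that keeps only insert and update events and projects a few fields, then serializes it for the source connector's `pipeline` property, which takes the stages as a JSON array string. The chosen fields are illustrative assumptions, and the `passes_match_stage` helper is hypothetical: it only imitates the `$match` stage locally so you can see its effect, whereas the real filtering happens inside MongoDB.

```python
import json

# Illustrative pipeline (field choices are assumptions): keep only insert and
# update events, then project a few fields to trim each event. An inclusion
# $project keeps _id (the resume token) by default, which the connector relies on.
pipeline = [
    {"$match": {"operationType": {"$in": ["insert", "update"]}}},
    {"$project": {"operationType": 1, "ns": 1, "documentKey": 1, "fullDocument": 1}},
]

# The source connector reads its pipeline setting as a JSON array string.
connector_config = {
    "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
    "pipeline": json.dumps(pipeline),
}


def passes_match_stage(event: dict) -> bool:
    """Hypothetical local imitation of the $match stage above, for illustration only."""
    return event.get("operationType") in {"insert", "update"}


sample_events = [
    {"_id": {"_data": "EXAMPLE_TOKEN_1"}, "operationType": "insert"},
    {"_id": {"_data": "EXAMPLE_TOKEN_2"}, "operationType": "delete"},
]
kept = [event for event in sample_events if passes_match_stage(event)]
print(connector_config["pipeline"])
print([event["operationType"] for event in kept])
```

Keeping `_id` matters here: if a pipeline stage removes or alters it, the change stream can no longer produce a usable resume token.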
To learn which aggregation operators you can use with a change stream, see the Modify Change Stream Output guide in the MongoDB manual.
To view examples that use an aggregation pipeline to modify a change stream, see the following pages:
Change Event Structure
Find the complete structure of change event documents, including descriptions of all fields, in the MongoDB manual.
If you want Kafka Connect to receive just the document created or modified by your change operation, use the corresponding source connector option. For more information, see the Change Stream Properties page.
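To illustrate the difference between a full change event and just the document it carries, here is a small Python sketch. It is not the connector's own code: the `document_only` function, the `shop.orders` namespace, and all field values are hypothetical, and the events are trimmed to a few of the fields a real change event contains.

```python
from typing import Any, Optional

# Trimmed change events with made-up values. Real events carry more fields.
insert_event = {
    "_id": {"_data": "EXAMPLE_TOKEN"},  # resume token placeholder
    "operationType": "insert",
    "ns": {"db": "shop", "coll": "orders"},
    "documentKey": {"_id": 42},
    "fullDocument": {"_id": 42, "item": "pen", "qty": 3},
}
delete_event = {
    "_id": {"_data": "EXAMPLE_TOKEN_2"},  # resume token placeholder
    "operationType": "delete",
    "ns": {"db": "shop", "coll": "orders"},
    "documentKey": {"_id": 42},
}


def document_only(event: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the created or modified document, dropping the event metadata.

    Insert and replace events always carry fullDocument. Update events carry
    it only if the change stream is set to look it up (for example, with the
    fullDocument option "updateLookup"). Delete events never carry it, so the
    function returns None for them.
    """
    return event.get("fullDocument")


print(document_only(insert_event))  # {'_id': 42, 'item': 'pen', 'qty': 3}
print(document_only(delete_event))  # None
```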
A MongoDB Kafka Connector source connector works by opening a single change stream with MongoDB and sending data from that change stream to Kafka Connect. Your source connector maintains its change stream for the duration of its runtime, and your connector closes its change stream when you stop it.
To view the available options to configure your source connector's change stream, see the Change Stream Properties page.
Your connector uses a resume token as its offset. An offset is a value your connector stores in an Apache Kafka topic to keep track of what source data it has processed. Your connector uses its offset value when it must recover from a restart or crash. A resume token is a piece of data that references the _id field of a change event document in your MongoDB oplog.
If your source connector does not have an offset, such as when you start the connector for the first time, your connector starts a new change stream. Once your connector receives its first change event document and publishes that document to Apache Kafka, your connector stores the resume token of that document as its offset.
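The offset behavior described above can be modeled in a few lines of Python. This is a simplified, in-memory sketch under stated assumptions: `FakeDeployment`, `SketchSourceConnector`, and the `token-N` strings are hypothetical stand-ins for your deployment, the connector, and real resume tokens, and a plain dict replaces the Kafka topic that stores offsets. It ignores details such as when Kafka Connect actually commits offsets.

```python
from typing import Any, Optional


class FakeDeployment:
    """Hypothetical in-memory stand-in for a replica set's oplog."""

    def __init__(self) -> None:
        self.oplog: list[dict[str, Any]] = []

    def insert(self, doc: dict[str, Any]) -> None:
        token = f"token-{len(self.oplog)}"  # stand-in for a real resume token
        self.oplog.append({"_id": token, "operationType": "insert", "fullDocument": doc})


class SketchSourceConnector:
    """Hypothetical model of the offset logic; not the connector's real code."""

    def __init__(self, deployment: FakeDeployment, offset_store: dict[str, Any]) -> None:
        self.deployment = deployment
        self.offset_store = offset_store  # stands in for the Kafka offsets topic
        self.position: Optional[int] = None  # None means no open change stream
        self.published: list[dict[str, Any]] = []

    def start(self) -> None:
        token = self.offset_store.get("resume_token")
        tokens = [event["_id"] for event in self.deployment.oplog]
        if token is None:
            # No offset: open a new change stream that sees only future changes.
            self.position = len(tokens)
        else:
            # Offset found: resume right after the event it references.
            # A token missing from the oplog raises ValueError here.
            self.position = tokens.index(token) + 1

    def poll(self) -> None:
        assert self.position is not None, "call start() first"
        for event in self.deployment.oplog[self.position:]:
            self.published.append(event)  # send the event on to Kafka
            self.offset_store["resume_token"] = event["_id"]  # then record the offset
        self.position = len(self.deployment.oplog)

    def stop(self) -> None:
        self.position = None  # close the change stream; the stored offset persists


deployment = FakeDeployment()
deployment.insert({"_id": 1})  # happens before the connector starts
offsets: dict[str, Any] = {}
connector = SketchSourceConnector(deployment, offsets)
connector.start()  # no offset yet: new change stream
deployment.insert({"_id": 2})
connector.poll()  # publishes _id 2 and stores token-1 as the offset
connector.stop()  # simulate a restart
deployment.insert({"_id": 3})
connector.start()  # resumes after token-1
connector.poll()
print([e["fullDocument"]["_id"] for e in connector.published])  # [2, 3]
```

The document with `_id` 1 is never published because it was written before the first change stream opened, while the document with `_id` 3 is picked up after the restart because the stored offset tells the connector where to resume.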
If the resume token value of your source connector's offset does not correspond to any entry in your MongoDB deployment's oplog, your connector has an invalid resume token. To learn how to recover from an invalid resume token, see the invalid token troubleshooting guide.
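One way to picture an invalid resume token: the oplog has a bounded size and eventually removes its oldest entries, so an offset saved long ago can reference an entry that no longer exists. The Python sketch below models the oplog as a fixed-size deque of placeholder tokens. The capacity and the `check_offset` helper are hypothetical and only illustrate the classification; they are not how the connector or MongoDB performs this check.

```python
from collections import deque
from typing import Iterable, Optional

OPLOG_CAPACITY = 3  # hypothetical tiny oplog; real oplogs are far larger


def check_offset(offset_token: Optional[str], oplog_tokens: Iterable[str]) -> str:
    """Classify a stored offset against the entries still present in the oplog."""
    if offset_token is None:
        return "no offset: start a new change stream"
    if offset_token in oplog_tokens:
        return "valid: resume after this token"
    return "invalid resume token: the referenced entry is no longer in the oplog"


oplog = deque(maxlen=OPLOG_CAPACITY)
for i in range(5):
    oplog.append(f"token-{i}")  # oldest entries drop off once the deque is full
# The oplog now holds only token-2, token-3, and token-4.

print(check_offset("token-3", oplog))
print(check_offset("token-0", oplog))
```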
To learn more about resume tokens, see the following resources:
To learn more about offsets, see the following resources: