Data Partitioning with Chunks¶
MongoDB uses the shard key associated to the collection to partition the data into chunks. A chunk consists of a subset of sharded data. Each chunk has a inclusive lower and exclusive upper range based on the shard key.
mongos routes writes to the appropriate chunk based on the
shard key value. MongoDB splits chunks when they grows beyond the
configured chunk size. Both inserts and updates
can trigger a chunk split.
The smallest range a chunk can represent is a single unique shard key value. A chunk that only contains documents with a single shard key value cannot be split.
- Small chunks lead to a more even distribution of data at the
expense of more frequent migrations. This creates expense at the
query routing (
- Large chunks lead to fewer migrations. This is more efficient both from the networking perspective and in terms of internal overhead at the query routing layer. But, these efficiencies come at the expense of a potentially uneven distribution of data.
- Chunk size affects the
Maximum Number of Documents Per Chunk to Migrate.
- Chunk size affects the maximum collection size when sharding an
existing collection. Post-sharding, chunk size does not constrain collection size.
For many deployments, it makes sense to avoid frequent and potentially spurious migrations at the expense of a slightly less evenly distributed data set.
Changing the chunk size affects when chunks split but there are some limitations to its effects.
- Automatic splitting only occurs during inserts or updates. If you lower the chunk size, it may take time for all chunks to split to the new size.
- Splits cannot be “undone”. If you increase the chunk size, existing chunks must grow through inserts or updates until they reach the new size.
Splitting is a process that keeps chunks from growing too large. When a chunk
grows beyond a specified chunk size, or if the
number of documents in the chunk exceeds
Maximum Number of Documents
Per Chunk to Migrate, MongoDB splits the chunk based on the shard key values
the chunk represent. A chunk may be split into multiple chunks where necessary.
Inserts and updates may trigger splits. Splits are an efficient meta-data
change. To create splits, MongoDB does not migrate any data or affect the
Splits may lead to an uneven distribution of the chunks for a collection
across the shards. In such cases, a
initiates a round of migrations to redistribute chunks across shards. See
Cluster Balancer for more details on balancing chunks across
MongoDB migrates chunks in a sharded cluster to distribute the chunks of a sharded collection evenly among shards. Migrations may be either:
- Manual. Only use manual migration in limited cases, such as to distribute data during bulk inserts. See Migrating Chunks Manually for more details.
- Automatic. The balancer process automatically migrates chunks when there is an uneven distribution of a sharded collection’s chunks across the shards. See Migration Thresholds for more details.
The balancer is a background process that manages chunk migrations. If the difference in number of chunks between the largest and smallest shard exceed the migration thresholds, the balancer begins migrating chunks across the cluster to ensure an even distribution of data.
The balancer can run from any of the
mongos instances in a cluster.
In some cases, chunks can grow beyond the specified chunk size but cannot undergo a split. The most common scenario is when a chunk represents a single shard key value. Since the chunk cannot split, it continues to grow beyond the chunk size, becoming a jumbo chunk. These jumbo chunks can become a performance bottleneck as they continue to grow, especially if the shard key value occurs with high frequency.
The addition of new data or new shards can result in data distribution imbalances within the cluster. A particular shard may acquire more chunks than another shard, or the size of a chunk may grow beyond the configured maximum chunk size.
MongoDB ensures a balanced cluster using two processes: chunk splitting and the balancer.
Starting in MongoDB 2.6,
enabled by default. With
enabled, the source shard archives the documents in the migrated chunks
in a directory named after the collection namespace under the
moveChunk directory in the
If some error occurs during a migration, these files may be helpful in recovering documents affected during the migration.
Once the migration has completed successfully and there is no need to recover documents from these files, you may safely delete these files. Or, if you have an existing backup of the database that you can use for recovery, you may also delete these files after migration.