OPTIONS

Sharded Cluster Balancer

The MongoDB balancer is a background process that monitors the number of chunks on each shard. When the number of chunks on a given shard reaches specific migration thresholds, the balancer attempts to automatically migrate chunks between shards and reach an equal number of chunks per shard.

The balancing procedure for sharded clusters is entirely transparent to the user and application layer, though there may be some performance impact while the procedure takes place.

Diagram of a collection distributed across three shards. For this collection, the difference in the number of chunks between the shards reaches the *migration thresholds* (in this case, 2) and triggers migration.

Cluster Balancer

The balancer process is responsible for redistributing the chunks of a sharded collection evenly among the shards for every sharded collection. By default, the balancer process is always enabled.

Any mongos instance in the cluster can start a balancing round. When a balancer process is active, the responsible mongos acquires a “lock” by modifying a document in the lock collection in the Config Database.

Changed in version 3.2: With replica set config servers, clock skew does not affect distributed lock management. If you are using mirrored config servers, large differences in timekeeping can lead to failed distributed locks. With mirrored config servers, minimize clock skew by running the network time protocol (NTP) ntpd on your servers.

To address uneven chunk distribution for a sharded collection, the balancer migrates chunks from shards with more chunks to shards with a fewer number of chunks. The balancer migrates the chunks, one at a time, until there is an even distribution of chunks for the collection across the shards. For details about chunk migration, see Chunk Migration Procedure.

Changed in version 2.6: Chunk migrations can have an impact on disk space. Starting in MongoDB 2.6, the source shard automatically archives the migrated documents by default. For details, see moveChunk directory.

Chunk migrations carry some overhead in terms of bandwidth and workload, both of which can impact database performance. The balancer attempts to minimize the impact by:

  • Moving only one chunk at a time. See also Asynchronous Chunk Migration Cleanup.
  • Starting a balancing round only when the difference in the number of chunks between the shard with the greatest number of chunks for a sharded collection and the shard with the lowest number of chunks for that collection reaches the migration threshold.

You may disable the balancer temporarily for maintenance. See Disable the Balancer for details.

You can also limit the window during which the balancer runs to prevent it from impacting production traffic. See Schedule the Balancing Window for details.

Note

The specification of the balancing window is relative to the local time zone of all individual mongos instances in the cluster.

Adding and Removing Shards from the Cluster

Adding a shard to a cluster creates an imbalance, since the new shard has no chunks. While MongoDB begins migrating data to the new shard immediately, it can take some time before the cluster balances. See the Add Shards to a Cluster tutorial for instructions on adding a shard to a cluster.

Removing a shard from a cluster creates a similar imbalance, since chunks residing on that shard must be redistributed throughout the cluster. While MongoDB begins draining a removed shard immediately, it can take some time before the cluster balances. Do not shutdown the servers associated to the removed shard during this process.

When you remove a shard in a cluster with an uneven chunk distribution, the balancer first removes the chunks from the draining shard and then balances the remaining uneven chunk distribution.

See the Remove Shards from an Existing Sharded Cluster tutorial for instructions on safely removing a shard from a cluster.

Chunk Migration Procedure

All chunk migrations use the following procedure:

  1. The balancer process sends the moveChunk command to the source shard.

  2. The source starts the move with an internal moveChunk command. During the migration process, operations to the chunk route to the source shard. The source shard is responsible for incoming write operations for the chunk.

  3. The destination shard builds any indexes required by the source that do not exist on the destination.

  4. The destination shard begins requesting documents in the chunk and starts receiving copies of the data.

  5. After receiving the final document in the chunk, the destination shard starts a synchronization process to ensure that it has the changes to the migrated documents that occurred during the migration.

  6. When fully synchronized, the source shard connects to the config database and updates the cluster metadata with the new location for the chunk.

  7. After the source shard completes the update of the metadata, and once there are no open cursors on the chunk, the source shard deletes its copy of the documents.

    Note

    If the balancer needs to perform additional chunk migrations from the source shard, the balancer can start the next chunk migration without waiting for the current migration process to finish this deletion step. See Asynchronous Chunk Migration Cleanup.

    Changed in version 2.6: The source shard automatically archives the migrated documents by default. For more information, see moveChunk directory.

The migration process ensures consistency and maximizes the availability of chunks during balancing.

Migration Thresholds

To minimize the impact of balancing on the cluster, the balancer only begins balancing after the distribution of chunks for a sharded collection has reached certain thresholds. The thresholds apply to the difference in number of chunks between the shard with the most chunks for the collection and the shard with the fewest chunks for that collection. The balancer has the following thresholds:

Number of Chunks Migration Threshold
Fewer than 20 2
20-79 4
80 and greater 8

The balancer stops running on the target collection when the difference between the number of chunks on any two shards for that collection is less than two, or a chunk migration fails.

Asynchronous Chunk Migration Cleanup

To migrate multiple chunks from a shard, the balancer migrates the chunks one at a time. However, the balancer does not wait for the current migration’s delete phase to complete before starting the next chunk migration. See Chunk Migration for the chunk migration process and the delete phase.

This queuing behavior allows shards to unload chunks more quickly in cases of heavily imbalanced cluster, such as when performing initial data loads without pre-splitting and when adding new shards.

This behavior also affects the moveChunk command, and migration scripts that use the moveChunk command may proceed more quickly.

In some cases, the delete phases may persist longer. If multiple delete phases are queued but not yet complete, a crash of the replica set’s primary can orphan data from multiple migrations.

The _waitForDelete, available as a setting for the balancer as well as the moveChunk command, can alter the behavior so that the delete phase of the current migration blocks the start of the next chunk migration. The _waitForDelete is generally for internal testing purposes. For more information, see Wait for Delete.

Chunk Migration and Replication

Changed in version 3.0: The default value secondaryThrottle became true for all chunk migrations.

The new writeConcern field in the balancer configuration document allows you to specify a write concern semantics with the _secondaryThrottle option.

By default, each document operation during chunk migration propagates to at least one secondary before the balancer proceeds with the next document, which is equivalent to a write concern of { w: 2 }. You can set the writeConcern option on the balancer configuration to set different write concern semantics.

To override this behavior and allow the balancer to continue without waiting for replication to a secondary, set the _secondaryThrottle parameter to false. See Change Replication Behavior for Chunk Migration to update the _secondaryThrottle parameter for the balancer.

For the moveChunk command, the secondaryThrottle parameter is independent of the _secondaryThrottle parameter for the balancer.

Independent of the secondaryThrottle setting, certain phases of the chunk migration have the following replication policy:

  • MongoDB briefly pauses all application writes to the source shard before updating the config servers with the new location for the chunk, and resumes the application writes after the update. The chunk move requires all writes to be acknowledged by majority of the members of the replica set both before and after committing the chunk move to config servers.
  • When an outgoing chunk migration finishes and cleanup occurs, all writes must be replicated to a majority of servers before further cleanup (from other outgoing migrations) or new incoming migrations can proceed.

Maximum Number of Documents Per Chunk to Migrate

MongoDB cannot move a chunk if the number of documents in the chunk exceeds either 250000 documents or 1.3 times the result of dividing the configured chunk size by the average document size. db.collection.stats() includes the avgObjSize field, which represents the average document size in the collection.

Shard Size

By default, MongoDB attempts to fill all available disk space with data on every shard as the data set grows. To ensure that the cluster always has the capacity to handle data growth, monitor disk usage as well as other performance metrics.

See the Change the Maximum Storage Size for a Given Shard tutorial for instructions on setting the maximum size for a shard.