Troubleshoot Sharded Clusters¶
On this page
- Application Servers or mongos Instances Become Unavailable
- A Single mongod Becomes Unavailable in a Shard
- All Members of a Shard Become Unavailable
- A Config Server Replica Set Member Becomes Unavailable
- Renaming Mirrored Config Servers and Cluster Availability
- Cursor Fails Because of Stale Config Data
- Shard Keys and Cluster Availability
- Config Database String Error
- Avoid Downtime when Moving Config Servers
- moveChunk commit failed Error
This page describes common strategies for troubleshooting sharded cluster deployments.
Renaming Mirrored Config Servers and Cluster Availability¶
If the sharded cluster is using mirrored config servers instead of a
replica set and the name or address that a sharded cluster uses to
connect to a config server changes, you must restart every
mongos instance in the sharded cluster.
To avoid downtime when renaming config servers, identify them with
CNAMEs: DNS names that are unrelated to the physical or virtual
hostnames of the machines.
Generally, refer to each config server by its DNS alias (e.g. a
CNAME record), and use these names when specifying the config server
connection string to mongos. These records make it possible to
change the IP address of a config server, or rename it, without changing
the connection string and without having to restart the entire cluster.
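As a rough illustration of why aliases help, the sketch below builds a mirrored config server list from DNS aliases and shows that the string stays the same when the address behind one alias changes. The hostnames, IP addresses, and the dict standing in for DNS are all hypothetical; port 27019 is assumed as the config server port and may differ in your deployment.

```python
# Hypothetical DNS aliases (CNAMEs) for three mirrored config servers.
CONFIG_ALIASES = ["cfg0.example.net", "cfg1.example.net", "cfg2.example.net"]
CONFIG_PORT = 27019  # assumed config server port; adjust as needed


def mirrored_configdb_string(aliases, port=CONFIG_PORT):
    """Build the comma-separated config server list given to mongos."""
    return ",".join(f"{alias}:{port}" for alias in aliases)


def resolve(alias, dns_table):
    """Look up an alias in a stand-in DNS table (a plain dict, for illustration)."""
    return dns_table[alias]


# Stand-in for DNS: alias -> current IP address (made-up values).
dns_before = {
    "cfg0.example.net": "10.0.0.10",
    "cfg1.example.net": "10.0.0.11",
    "cfg2.example.net": "10.0.0.12",
}
# cfg0 moves to a new machine: only the DNS record changes.
dns_after = dict(dns_before, **{"cfg0.example.net": "10.0.1.99"})

# The string handed to mongos is built from aliases only, so it is unaffected.
configdb = mirrored_configdb_string(CONFIG_ALIASES)
print(configdb)
print(resolve("cfg0.example.net", dns_before), "->", resolve("cfg0.example.net", dns_after))
```

Because the connection string never contains an IP address or machine hostname, moving a config server only requires updating the DNS record.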
Cursor Fails Because of Stale Config Data¶
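A query can return the following warning when one or more mongos instances have not yet refreshed their cached copy of the cluster metadata:
could not initialize cursor across all shards because : stale config detected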
This warning should not propagate back to your application. The
warning will repeat until all the
mongos instances refresh
their caches. To force an instance to refresh its cache, run the
Config Database String Error¶
Changed in version 3.2: Starting in MongoDB 3.2, config servers are deployed as replica sets
by default. All mongos instances for the sharded cluster
must specify the same config server replica set name, but each can specify
the hostname and port of different members of the replica set.
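The rule above can be checked mechanically before you start or restart mongos instances. The following is a minimal sketch, assuming each config database string has the form `<replSetName>/<host:port>,<host:port>,...`; the function names, replica set name, and hostnames are hypothetical and not part of MongoDB.

```python
def parse_configdb(configdb):
    """Split '<replSetName>/<host:port>,...' into (name, [members])."""
    name, sep, hosts = configdb.partition("/")
    members = [h.strip() for h in hosts.split(",") if h.strip()]
    if not sep or not name.strip() or not members:
        raise ValueError(
            f"expected '<replSetName>/<host:port>,...', got {configdb!r}"
        )
    return name.strip(), members


def same_config_replica_set(configdb_strings):
    """Return True if every string names the same config server replica set.

    The member lists may differ; only the replica set name must match.
    """
    names = {parse_configdb(s)[0] for s in configdb_strings}
    return len(names) == 1


# Hypothetical strings for two mongos instances: same set name, different members.
mongos_a = "configReplSet/cfg0.example.net:27019,cfg1.example.net:27019"
mongos_b = "configReplSet/cfg2.example.net:27019"
print(same_config_replica_set([mongos_a, mongos_b]))
```

A mismatch in the replica set name, rather than in the member list, is what this check flags.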
Avoid Downtime when Moving Config Servers¶
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.
moveChunk commit failed Error¶
At the end of a chunk migration, the shard must connect to the config database to update the chunk’s record in the cluster metadata. If the shard fails to connect to the config database, MongoDB reports the following errors:
ERROR: moveChunk commit failed: version is at <n>|<nn> instead of <N>|<NN>
ERROR: TERMINATING
When this happens, the primary member of the shard’s replica set terminates to protect data consistency. If a secondary member can access the config database, data on the shard becomes accessible again after an election.
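When scanning shard logs for this failure, a small parser can pull out the two version pairs from the message. This is a sketch only: it assumes the placeholders in the message are non-negative integers, and the sample log line and the function name are hypothetical.

```python
import re

# Matches the message format shown above, assuming numeric placeholders.
MOVECHUNK_RE = re.compile(
    r"moveChunk commit failed: version is at (\d+)\|(\d+) instead of (\d+)\|(\d+)"
)


def parse_movechunk_failure(line):
    """Return the version the shard found and the one it expected, or None."""
    match = MOVECHUNK_RE.search(line)
    if match is None:
        return None
    found_a, found_b, expected_a, expected_b = map(int, match.groups())
    return {"actual": (found_a, found_b), "expected": (expected_a, expected_b)}


# Hypothetical log line in the documented format.
sample = "ERROR: moveChunk commit failed: version is at 5|1 instead of 6|0"
print(parse_movechunk_failure(sample))
```

Lines that do not match, such as the accompanying `ERROR: TERMINATING` line, yield `None`, so the function can be applied to every line of a log.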