Sharded Cluster High Availability
A production cluster has no single point of failure. This section introduces the availability concerns for MongoDB deployments in general and highlights potential failure scenarios and available resolutions.
Renaming Config Servers and Cluster Availability
If the name or address that a sharded cluster uses to connect to a config server changes, you must restart every mongod and mongos instance in the sharded cluster.
To avoid downtime when renaming config servers, use DNS names (for example, CNAME records) unrelated to physical or virtual hostnames to refer to your config servers.
Generally, refer to each config server using a DNS alias (e.g. a CNAME record). When specifying the config server connection string to mongos, use these names. These records make it possible to change the IP address or rename config servers without changing the connection string and without having to restart the entire cluster.
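As a minimal sketch of the idea above, the following Python snippet assembles a config server connection string from DNS aliases rather than machine hostnames. The alias names (cfg0.example.net and so on), the port 27019, and the helper build_configdb_string are assumptions made for illustration. The comma-separated host:port form shown is the one passed to the --configdb option of mongos for mirrored config servers; deployments that run config servers as a replica set prefix the list with the replica set name.

```python
# Hypothetical DNS aliases (CNAME records) that point at the config servers.
# These names are illustrative and do not come from the original page.
CONFIG_SERVER_ALIASES = [
    "cfg0.example.net",
    "cfg1.example.net",
    "cfg2.example.net",
]


def build_configdb_string(aliases, port=27019):
    """Join alias names into a comma-separated host:port list."""
    if not aliases:
        raise ValueError("at least one config server alias is required")
    return ",".join(f"{alias}:{port}" for alias in aliases)


# The resulting string refers only to aliases, never to machine hostnames.
configdb = build_configdb_string(CONFIG_SERVER_ALIASES)
print(configdb)
```

Because the string names only aliases, repointing a CNAME record at a different host leaves the string itself unchanged.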
Shard Keys and Cluster Availability
The most important considerations when choosing a shard key are:
- to ensure that MongoDB will be able to distribute data evenly among shards,
- to scale writes across the cluster, and
- to ensure that mongos can isolate most queries to a specific mongod.
Furthermore:
- Each shard should be a replica set. If a specific mongod instance fails, the remaining replica set members will elect another member to be primary and continue operation. However, if an entire shard is unreachable or fails for some reason, the data on that shard will be unavailable.
- If the shard key allows mongos to isolate most operations to a single shard, then the failure of a single shard will only render some data unavailable.
- If your shard key distributes data required for every operation throughout the cluster, then the failure of an entire shard will render the entire cluster unavailable.
In essence, this concern for reliability simply underscores the importance of choosing a shard key that isolates query operations to a single shard.
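The effect of the shard key on availability can be illustrated with a toy model. The sketch below is not MongoDB code: the shard names, the available_fraction helper, and the rule that an operation fails whenever any shard it touches is down are all simplifying assumptions introduced here.

```python
def available_fraction(operations, down_shards):
    """Return the share of operations that touch no unavailable shard."""
    down = set(down_shards)
    ok = sum(1 for shards in operations if not (set(shards) & down))
    return ok / len(operations)


SHARDS = ["shard0", "shard1", "shard2", "shard3"]  # hypothetical shard names

# Targeted workload: the shard key routes each operation to exactly one shard.
targeted = [[shard] for shard in SHARDS for _ in range(25)]

# Scatter-gather workload: every operation needs data from every shard.
scattered = [list(SHARDS) for _ in range(100)]

# Simulate the loss of one entire shard.
down = ["shard2"]
targeted_ok = available_fraction(targeted, down)
scattered_ok = available_fraction(scattered, down)
print(targeted_ok, scattered_ok)
```

In this model, losing one of four shards leaves three quarters of the targeted operations unaffected, while none of the scatter-gather operations can complete.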