The shard key determines the distribution of the collection’s documents among the cluster’s shards. The shard key is either an indexed field or an indexed compound field that exists in every document in the collection.
MongoDB partitions data in the collection using ranges of shard key values. Each range, or chunk, defines a non-overlapping range of shard key values. MongoDB distributes the chunks, and their documents, among the shards in the cluster.
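The range-to-chunk mapping described above can be pictured with a small sketch. The chunk table, split points, and shard names are hypothetical, and `-Infinity` and `Infinity` stand in for the minimum and maximum key boundaries. Each chunk covers a half-open range (lower bound inclusive, upper bound exclusive), so every shard key value falls into exactly one chunk.

```javascript
// Hypothetical chunk table: each chunk covers [min, max) of shard key values.
var chunkTable = [
  { min: -Infinity, max: 100,      shard: "shardA" },
  { min: 100,       max: 200,      shard: "shardB" },
  { min: 200,       max: Infinity, shard: "shardC" }
];

// Return the single chunk whose range contains the given shard key value.
function findChunk(chunks, keyValue) {
  for (var i = 0; i < chunks.length; i++) {
    if (keyValue >= chunks[i].min && keyValue < chunks[i].max) {
      return chunks[i];
    }
  }
  return null; // not reached when the ranges cover the whole key space
}
```

For instance, `findChunk(chunkTable, 150)` returns the chunk assigned to `shardB`, and a value equal to a split point (such as 100) belongs to the chunk that starts there.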
Shard keys are immutable and cannot be changed after insertion. See the system limits for sharded clusters for more information.
The index on the shard key cannot be a multikey index.
Hashed Shard Keys
New in version 2.4.
The field you choose as your hashed shard key should have good cardinality, that is, a large number of different values. Hashed keys work well with fields that increase monotonically, like ObjectId values or timestamps.
If you shard an empty collection using a hashed shard key, MongoDB
will automatically create and migrate chunks so that each shard has
two chunks. You can control how many chunks MongoDB will create with
the numInitialChunks parameter to shardCollection, or
by manually creating chunks on the empty collection using the
split command.
To shard a collection using a hashed shard key, see Shard a Collection Using a Hashed Shard Key.
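As a rough illustration, the command document for sharding on a hashed key might be assembled as follows. The helper name `hashedShardCommand`, the namespace `records.events`, and the field `ts` are placeholders invented for this sketch. It also assumes that sharding is already enabled for the database and that the shell is connected to a mongos.

```javascript
// Build a shardCollection command document that uses a hashed shard key.
// hashedShardCommand is a hypothetical helper, not part of MongoDB.
function hashedShardCommand(namespace, field, numInitialChunks) {
  var key = {};
  key[field] = "hashed";
  var cmd = { shardCollection: namespace, key: key };
  if (numInitialChunks !== undefined) {
    cmd.numInitialChunks = numInitialChunks; // only meaningful for an empty collection
  }
  return cmd;
}

// Placeholder namespace and field name.
var shardCmd = hashedShardCommand("records.events", "ts", 8);

// In the mongo shell, connected to a mongos, this could then be run as:
//   db.adminCommand(shardCmd)
```

Leaving out the third argument omits numInitialChunks, so the default initial chunk layout described above applies.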
MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.
Impacts of Shard Keys on Cluster Operations
The shard key affects write and query performance by determining how
MongoDB partitions data in the cluster and how effectively the
mongos instances can direct operations to the
cluster. Consider the following operational impacts of shard key
selection.
Some possible shard keys will allow your application to take advantage of the increased write capacity that the cluster can provide, while others do not. Consider the following example where you shard by the values of the default _id field, which is ObjectId.
MongoDB generates ObjectId values upon document creation to
produce a unique identifier for the object. However, the most
significant bits of data in this value represent a time stamp, which
means that they increment in a regular and predictable pattern. Even
though this value has high cardinality, when you use it, a date, or any
other monotonically increasing number as the shard key, all insert
operations store data in a single chunk and, therefore, a
single shard. As a result, the write capacity of this shard will
define the effective write capacity of the cluster.
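The hot-spot effect can be seen in a small simulation. The chunk boundaries, shard names, and key values below are made up for illustration, and the sketch ignores chunk splitting and balancing. It only shows that ever-increasing keys keep landing in the chunk with the open-ended upper bound.

```javascript
// Hypothetical range chunks over a monotonically increasing key.
var hotChunks = [
  { min: -Infinity, max: 1000,     shard: "shardA" },
  { min: 1000,      max: 2000,     shard: "shardB" },
  { min: 2000,      max: Infinity, shard: "shardC" }
];

// Find the shard that owns the chunk containing the key.
function shardFor(key) {
  for (var i = 0; i < hotChunks.length; i++) {
    if (key >= hotChunks[i].min && key < hotChunks[i].max) {
      return hotChunks[i].shard;
    }
  }
  return null;
}

// Simulate 100 new inserts whose keys keep growing past the existing data.
var insertsPerShard = { shardA: 0, shardB: 0, shardC: 0 };
for (var key = 2500; key < 2600; key++) {
  insertsPerShard[shardFor(key)]++;
}
```

After the loop, every simulated insert has been counted against `shardC`, while the other two shards received none.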
A shard key that increases monotonically will not hinder performance
if you have a very low insert rate, or if most of your write
operations are updates distributed through your entire data set. Generally, choose shard keys
that both have high cardinality and distribute write operations
across the entire cluster.
Typically, a computed shard key that has some amount of “randomness,” such as one that includes a cryptographic hash (e.g. MD5 or SHA1) of other content in the document, will allow the cluster to scale write operations. However, random shard keys do not typically provide query isolation, which is another important characteristic of shard keys.
New in version 2.4: MongoDB makes it possible to shard a collection on a hashed index. This can greatly improve write scaling. See Shard a Collection Using a Hashed Shard Key.
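To see why hashing helps with write scaling, the following sketch hashes a run of consecutive timestamp-like values and counts how they spread over four buckets standing in for shards. It uses Node's `crypto` module with MD5 purely as a stand-in; it is not MongoDB's internal hashed-index function, and the bucket count and values are arbitrary.

```javascript
var crypto = require("crypto");

// Map a value to one of bucketCount buckets using the first 4 bytes
// of its MD5 digest. Illustrative only, not MongoDB's own hash.
function hashBucket(value, bucketCount) {
  var digest = crypto.createHash("md5").update(String(value)).digest();
  return digest.readUInt32BE(0) % bucketCount;
}

// Consecutive, monotonically increasing values (e.g. second-resolution timestamps).
var bucketCounts = [0, 0, 0, 0];
for (var ts = 1500000000; ts < 1500001000; ts++) {
  bucketCounts[hashBucket(ts, bucketCounts.length)]++;
}
```

Although the inputs increase steadily, neighbouring values hash to unrelated buckets, so the simulated writes end up spread across all four buckets rather than concentrated in one.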
mongos provides an interface for applications to
interact with sharded clusters that hides the complexity of data
partitioning. A mongos receives queries from
applications, and uses metadata from the config server to route queries to the
mongod instances with the appropriate data. While the
mongos succeeds in making all querying operational in sharded environments,
the shard key you select can have a profound effect on query
performance.
Generally, the fastest queries in a sharded environment are those that
mongos will route to a single shard, using the
shard key and the cluster metadata from the config server. For queries that don’t include the shard
key, mongos must query all shards, wait for their responses
and then return the result to the application. These “scatter/gather”
queries can be long running operations.
If your query includes the first component of a compound shard
key, the mongos can route the
query directly to a single shard, or a small number of shards, which
provides better performance. Even if you query values of the shard
key that reside in different chunks, the mongos will route
queries directly to specific shards.
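The difference between targeted and scatter/gather routing can be sketched with a toy router. This is a simplified model, not how mongos is implemented: it handles only equality matches on a single-field shard key, and the field name `userId`, the chunk ranges, and the shard names are hypothetical.

```javascript
// Hypothetical chunk metadata, as a router might cache it from the config server.
var routingChunks = [
  { min: -Infinity, max: 100,      shard: "shardA" },
  { min: 100,       max: 200,      shard: "shardB" },
  { min: 200,       max: Infinity, shard: "shardC" }
];

// Return the shards a query must visit. Queries that carry the shard key
// are targeted at one shard; all others fan out to every shard.
function targetShards(chunks, shardKeyField, query) {
  if (!Object.prototype.hasOwnProperty.call(query, shardKeyField)) {
    var all = [];
    chunks.forEach(function (c) {
      if (all.indexOf(c.shard) === -1) all.push(c.shard);
    });
    return all; // scatter/gather
  }
  var value = query[shardKeyField];
  for (var i = 0; i < chunks.length; i++) {
    if (value >= chunks[i].min && value < chunks[i].max) {
      return [chunks[i].shard]; // targeted
    }
  }
  return [];
}
```

With this table, `targetShards(routingChunks, "userId", { userId: 150 })` visits only `shardB`, while a query such as `{ name: "x" }` that lacks the shard key must visit all three shards.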
To select a shard key for a collection:
- determine the most commonly included fields in queries for a given application
- find which of these operations are most performance dependent.
If the chosen field has low cardinality (i.e., it is not sufficiently selective), you should add a second field to the shard key, making a compound shard key. The data may become more splittable with a compound shard key.
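A quick way to get a feel for splittability is to count the distinct shard key values a data set would produce, since documents that share one shard key value cannot be separated into different chunks. The sample documents and the field names `status` and `userId` below are invented for this sketch.

```javascript
// Hypothetical sample documents with a low-cardinality field (status).
var sampleDocs = [
  { status: "active",   userId: 1 },
  { status: "active",   userId: 2 },
  { status: "active",   userId: 3 },
  { status: "inactive", userId: 4 },
  { status: "inactive", userId: 5 },
  { status: "inactive", userId: 6 }
];

// Count the distinct values of a (possibly compound) key over the documents.
function distinctKeyCount(docs, fields) {
  var seen = {};
  docs.forEach(function (doc) {
    var keyValue = JSON.stringify(fields.map(function (f) { return doc[f]; }));
    seen[keyValue] = true;
  });
  return Object.keys(seen).length;
}

var singleFieldCount = distinctKeyCount(sampleDocs, ["status"]);           // 2
var compoundCount    = distinctKeyCount(sampleDocs, ["status", "userId"]); // 6
```

With only two distinct values, a key on `status` alone allows at most two non-empty key ranges; adding `userId` gives six distinct key values and therefore more places where the data could be split.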
See Sharded Cluster Query Routing for more information on query operations in the context of sharded clusters.
In many ways, you can think of the shard key as a cluster-wide index. However, be aware that sharded systems cannot enforce cluster-wide unique indexes unless the unique field is in the shard key. See the Index Concepts page for more information on indexes and compound indexes.
© MongoDB, Inc 2008-2017. MongoDB, Mongo, and the leaf logo are registered trademarks of MongoDB, Inc.