Hashed Sharding
Hashed sharding uses a hashed index to partition data across your sharded cluster. Hashed indexes compute the hash value of a single field as the index value; this value is used as your shard key. [1]
Hashed sharding provides more even data distribution across the sharded
cluster at the cost of fewer Targeted Operations and more Broadcast
Operations. Post-hash, documents with “close” shard key values are unlikely
to be on the same chunk or shard, so the mongos is more likely to perform
Broadcast Operations to fulfill a given ranged query. mongos can still
target queries with equality matches to a single shard.
Tip
MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.
Warning
MongoDB hashed indexes truncate floating point numbers to 64-bit integers
before hashing. For example, a hashed index would store the same value for
a field that held a value of 2.3, 2.2, and 2.9.
To prevent collisions, do not use a hashed index for floating point numbers
that cannot be reliably converted to 64-bit integers (and then back to
floating point). MongoDB hashed indexes do not support floating point
values larger than 2^53.
To see what the hashed value would be for a key, see convertShardKeyToHashed().
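The truncation described in the warning can be illustrated in plain JavaScript. This is a minimal sketch, not MongoDB's implementation: Math.trunc is assumed here as a stand-in for the integer conversion, and the hash itself is not reproduced. The guarded call at the end uses convertShardKeyToHashed(), which is only defined inside a MongoDB shell session; console.log is assumed to be available (Node.js or mongosh).

```javascript
// Stand-in for the conversion step: drop the fractional part before hashing.
// (Assumption for illustration; the real hashing happens inside MongoDB.)
const samples = [2.3, 2.2, 2.9];
const truncated = samples.map((value) => Math.trunc(value));
console.log(truncated); // [ 2, 2, 2 ]: all three would share one hashed value

// Doubles above 2^53 cannot represent every integer, so the
// float -> integer -> float round trip is no longer reliable.
const limit = 2 ** 53;
const collidesAboveLimit = limit + 1 === limit;
console.log(collidesAboveLimit); // true

// Inside a connected MongoDB shell (4.0+), print the hashed values in use.
if (typeof convertShardKeyToHashed === "function") {
  samples.forEach((value) => console.log(value, convertShardKeyToHashed(value)));
}
```

Outside a MongoDB shell the last block is skipped, so only the two stand-in checks run.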
[1] Starting in version 4.0, the mongo shell provides the method
convertShardKeyToHashed(). This method uses the same hashing function as
the hashed index and can be used to see what the hashed value would be for
a key.
Hashed Sharding Shard Key
The field you choose as your hashed shard key should have high
cardinality, that is, a large number of different values.
Hashed keys are ideal for shard keys with fields that change
monotonically, like ObjectId values or timestamps. A good example of this
is the default _id field, assuming it only contains ObjectId values.
To shard a collection using a hashed shard key, see Shard a Collection.
Hashed vs Ranged Sharding
Given a collection using a monotonically increasing value X as the
shard key, using ranged sharding results in a distribution of incoming
inserts similar to the following:
Since the value of X is always increasing, the chunk with an upper bound
of maxKey receives the majority of incoming writes. This restricts insert
operations to the single shard containing this chunk, which reduces or
removes the advantage of distributed writes in a sharded cluster.
By using a hashed index on X, the distribution of inserts is similar
to the following:
Since the data is now distributed more evenly, inserts are efficiently distributed throughout the cluster.
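The contrast between the two strategies can be sketched with a small simulation. Everything in it is an assumption for illustration: four chunks with fixed boundaries, keys increasing from 3000 to 3999, and mix32, a generic 32-bit integer mixer standing in for MongoDB's own hash function.

```javascript
// Stand-in 32-bit integer mixer; MongoDB's hashed index uses its own hash.
function mix32(x) {
  x = Math.imul(x ^ (x >>> 16), 0x45d9f3b);
  x = Math.imul(x ^ (x >>> 16), 0x45d9f3b);
  return (x ^ (x >>> 16)) >>> 0; // unsigned 32-bit result
}

const CHUNKS = 4;

// Ranged: chunk i holds [i * 1000, (i + 1) * 1000); the last chunk's
// upper bound plays the role of maxKey.
function rangedChunk(x) {
  return Math.min(Math.floor(x / 1000), CHUNKS - 1);
}

// Hashed: split the 32-bit hash space into CHUNKS equal ranges.
function hashedChunk(x) {
  return Math.floor(mix32(x) / (2 ** 32 / CHUNKS));
}

const ranged = new Array(CHUNKS).fill(0);
const hashed = new Array(CHUNKS).fill(0);

// New inserts: X keeps increasing past the existing data.
for (let x = 3000; x < 4000; x++) {
  ranged[rangedChunk(x)] += 1;
  hashed[hashedChunk(x)] += 1;
}

console.log("ranged:", ranged); // every insert lands in the last chunk
console.log("hashed:", hashed); // inserts land in several chunks
```

With ranged partitioning every new key falls into the chunk that ends at maxKey, while the hashed variant scatters consecutive keys over the hash space.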
Shard the Collection
Use the sh.shardCollection() method, specifying the full namespace
of the collection and the target hashed index to use as the shard key.
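A minimal sketch of such a call follows. The database and collection names (records, events) are hypothetical, and the sh.enableSharding() call assumes a deployment where sharding must first be enabled for the database. The sh helper only exists inside a MongoDB shell connected to a mongos, so the calls are guarded.

```javascript
// Hypothetical database and collection names, for illustration only.
const dbName = "records";
const namespace = "records.events"; // full namespace: <database>.<collection>

// The value "hashed" requests a hashed shard key on the field.
const shardKey = { _id: "hashed" };

// `sh` is only defined inside a MongoDB shell connected to a mongos,
// so the calls are skipped when this file is run elsewhere.
if (typeof sh !== "undefined") {
  sh.enableSharding(dbName); // assumed prerequisite on this deployment
  sh.shardCollection(namespace, shardKey);
}
```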
Important
Once you shard a collection, the selection of the shard key is immutable; i.e. you cannot select a different shard key for that collection.
Starting in MongoDB 4.2, you can update a document’s shard key value unless the shard key field is the immutable _id field. For details on updating the shard key, see Change a Document’s Shard Key Value. Before MongoDB 4.2, a document’s shard key field value is immutable.
Shard a Populated Collection
If you shard a populated collection using a hashed shard key:
- The sharding operation creates the initial chunk(s) to cover the entire range of the shard key values. The number of chunks created depends on the configured chunk size.
- After the initial chunk creation, the balancer migrates these initial chunks across the shards as appropriate and manages the chunk distribution going forward.
Shard an Empty Collection
If you shard an empty collection using a hashed shard key:
- With no zones and zone ranges specified for the empty or non-existing collection:
  - The sharding operation creates empty chunks to cover the entire range of the shard key values and performs an initial chunk distribution. By default, the operation creates 2 chunks per shard and migrates them across the cluster. You can use the numInitialChunks option to specify a different number of initial chunks. This initial creation and distribution of chunks allows for faster setup of sharding.
  - After the initial distribution, the balancer manages the chunk distribution going forward.
- With zones and zone ranges specified for the empty or non-existing collection (available starting in MongoDB 4.0.3):
  - The sharding operation creates empty chunks for the defined zone ranges as well as any additional chunks to cover the entire range of the shard key values and performs an initial chunk distribution based on the zone ranges. This initial creation and distribution of chunks allows for faster setup of zoned sharding.
  - After the initial distribution, the balancer manages the chunk distribution going forward.
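As a rough sketch of the numInitialChunks option, the snippet below builds a shardCollection admin command for an empty collection. The shard count, namespace, and chosen chunk count are hypothetical values, and the command is only sent when a shell db object is present (that is, inside a MongoDB shell connected to a mongos).

```javascript
// Hypothetical values, for illustration only.
const shardCount = 3;
const defaultInitialChunks = 2 * shardCount; // default: 2 chunks per shard

// Request a different number of initial chunks for an empty collection.
const command = {
  shardCollection: "records.events", // full namespace (hypothetical)
  key: { _id: "hashed" },
  numInitialChunks: 12,
};

console.log(defaultInitialChunks); // 6

// `db` is only defined inside a MongoDB shell; run this against a mongos.
if (typeof db !== "undefined") {
  db.adminCommand(command);
}
```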
See also
To learn how to deploy a sharded cluster and implement hashed sharding, see Deploy a Sharded Cluster.