OPTIONS

Tag Aware Sharding

In sharded clusters, you can tag specific ranges of the shard key and associate those tags with a shard or subset of shards. MongoDB routes reads and writes that fall into a tagged range only to those shards configured for the tag.

Some of the deployment patterns that can be implemented through tag aware sharding are as follows:

  • Isolate a specific subset of data on a specific set of shards.
  • Ensure that the most relevant data reside on shards that are geographically closest to the application servers.
  • Route data to shards based on the hardware / performance of the shard hardware.

Each tag represents one or more ranges of shard key values. Tag ranges are always inclusive of the lower boundary and exclusive of the upper boundary. You can assign multiple tags to a shard, as long as the ranges do not conflict.

The following image illustrates a sharded cluster with three shards and two tags. The A tag represents a range with a lower boundary of 1 and an upper bound of 10. The B tag represents a range with a lower boundary of 10 and an upper boundary of 20. Shard Alpha and Beta have the A tag. Shard Beta also has the B tag. Shard Charlie has no tags. The cluster is in a steady state and no chunks violate any of the tags.

Diagram of tag-aware data distribution

Behavior and Operations

The balancer attempts to evenly distribute a sharded collection’s chunks across all shards in the cluster.

With tag aware sharding, the balancer respects tags and tag ranges during chunk migrations. For each chunk marked for migration, the balancer checks for destination shards with valid tag ranges. The balancer ensures tag ranges align with chunk boundaries by splitting chunks where necessary. Chunks that are not covered by a tag range can exist on any shard in the cluster and are migrated normally.

During balancing rounds, if the balancer detects that any chunks violate the configured tags for a given shard, the balancer migrates those chunks to a shard where no conflict exists.

After configuring a tag with a shard key range and associating it with a shard or shards, the cluster may take some time to balance the data among the shards. This depends on the division of chunks and the current distribution of data in the cluster. Once balancing completes, reads and writes for documents within the tag range are routed only to the tagged shard or shards.

Once configured, the balancer respects tag ranges during future balancing rounds.

Considerations

Shard Key

Tags must use fields contained in the shard key. If using a compound shard key, tag ranges must include the prefix of the shard key.

For example, given a shard key { a : 1, b : 2, c : 3 }, to create a tag range based on b, the tag must include a as the prefix. Creating a tag range based on c requires including a and b as the prefix.

You cannot create tag ranges on fields not included in the shard key. For example, if you wanted to use tag aware sharding to partition data based on geographic location, the shard key would need at least one field that contained geographic data.

When choosing a shard key for a collection, consider what fields you might want to use for tag aware sharding. After sharding, you cannot change the shard key. See Choosing a Shard Key for considerations in choosing a shard key.

Shard Tags vs Replica Set Tags

Shard key range tags are distinct from replica set member tags.

Hashed Shard Keys and Shard Tagging

When using tags on a hashed shard key, tagged ranges represent the hashed shard key. Given a shard key { a : 1 } and a tag with a lower bound of 1 and an upper bound of 5, the bounds represent the hashed value of a, and not the actual value. As of such, there is no guarantee that MongoDB routes documents with a shard key of 1 to 5 to the tagged shard or shards. MongoDB routes any document where the hashed shard key falls within the range of 1 or 5 to the tagged shard or shards. In general, range-based shard tags on a hashed shard key may exhibit unexpected behavior.

It is possible to tag the entire range of shard key values using minkey and maxkey to guarantee that MongoDB restricts all the data for a specific collection to the shard or shards with the same tag.

Shard Tag Boundaries

Shard ranges are always inclusive of the lower value and exclusive of the upper boundary.