Segmenting Data by Application or Customer

MongoDB allows you to associate ranges of shard key values with one or more shards using tags. When MongoDB routes data to a shard, it respects any configured tags.

This tutorial shows you how to segment data using tag-aware sharding.

Consider the following scenarios where segmenting data by application or customer may be necessary:

  • A database serving multiple applications
  • A database serving multiple customers
  • A database that requires isolating ranges or subsets of application or customer data
  • A database that requires resource allocation for ranges or subsets of application or customer data

The following diagram illustrates a sharded cluster that uses tags to segment data by application or customer. Segmenting data this way isolates it on specific shards. Additionally, each shard can be given hardware suited to the performance requirements of the data it stores.

Diagram of Data Segmentation tags using Tag Aware Sharding

Scenario

An application tracks the score of a user along with a client field, storing scores in the gamify database under the users collection. Each possible value of client requires its own tag to allow for data segmentation. Segmenting by client also allows the administrator to optimize the hardware of the shards associated with each client for performance and cost.

The following documents represent a partial view of two users:

{
  "_id" : ObjectId("56f08c447fe58b2e96f595fa"),
  "client" : "robot",
  "userid" : 123,
  "high_score" : 181,
  ...,
}
{
  "_id" : ObjectId("56f08c447fe58b2e96f595fb"),
  "client" : "fruitos",
  "userid" : 456,
  "high_score" : 210,
  ...,
}

Shard Key

The users collection uses the { client : 1, userid : 1 } compound index as the shard key.

The client field in each document allows creating a tag range on each distinct client value.

The userid field provides a high cardinality and low frequency component to the shard key relative to client.

See Choosing a Shard Key for more general instructions on selecting a shard key.
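
As a minimal sketch of how this shard key might be applied, the helper below enables sharding on the gamify database and then shards gamify.users on { client : 1, userid : 1 }. The function name shardUsersCollection is hypothetical, and the shell's sh helper is passed in as a parameter; the sketch assumes a connection to a mongos.

```javascript
// Hypothetical helper: `sh` is the sharding helper object that the mongo
// shell provides when connected to a mongos.
function shardUsersCollection(sh) {
  // Enable sharding on the database that holds the collection.
  sh.enableSharding("gamify");
  // Shard the collection on the compound key described in this section.
  return sh.shardCollection("gamify.users", { client: 1, userid: 1 });
}
```

In the shell, this would be invoked as shardUsersCollection(sh).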

Architecture

The application requires tagging each shard in the cluster for a specific client.

The sharded cluster deployment currently consists of four shards.

Diagram of Data Segmentation Architecture using Tag Aware Sharding

Tags

For this application, there are two client tags.

Diagram of Data Segmentation tags using Tag Aware Sharding

  • Robot client (“robot”): This tag represents all documents where client : robot.
  • FruitOS client (“fruitos”): This tag represents all documents where client : fruitos.

Write Operations

With tag-aware sharding, if an inserted or updated document matches a configured tag range, it can only be written to a shard with the related tag.

MongoDB can write documents that do not match a configured tag range to any shard in the cluster.

Note

The behavior described above requires the cluster to be in a steady state with no chunks violating a configured tag range. See the following section on the balancer for more information.
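
The write rules above can be modeled with a short, self-contained sketch. It is not how mongos is implemented; it assumes the shard-to-tag assignments made later in the procedure, and it simplifies tag ranges to "one tag per client value". The names shardTags, taggedClients, and eligibleShards are hypothetical.

```javascript
// Shard-to-tag assignments used later in this tutorial's procedure.
const shardTags = {
  shard0000: ["robot"],
  shard0001: ["robot"],
  shard0002: ["fruitos"],
  shard0003: ["fruitos"],
};

// Simplification: each tag range covers exactly one client value, and the
// tag name equals that value, so the tag can be read from the document.
const taggedClients = new Set(["robot", "fruitos"]);

function eligibleShards(doc) {
  const allShards = Object.keys(shardTags);
  if (!taggedClients.has(doc.client)) {
    // No tag range matches: any shard may receive the document.
    return allShards;
  }
  // A tag range matches: only shards carrying that tag qualify.
  return allShards.filter((shard) => shardTags[shard].includes(doc.client));
}
```

For example, eligibleShards({ client : "robot", userid : 123 }) yields the two robot shards, while a document with an untagged client value yields all four shards.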

Read Operations

MongoDB can route queries to a specific shard if the query includes at least the client field.

For example, MongoDB can attempt a targeted read operation on the following query:

gamifyDB = db.getSiblingDB("gamify")
gamifyDB.users.find( { "client" : "robot" , "userid" : 123 } )

Queries without the client field perform broadcast operations.
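
One way to observe the difference between targeted and broadcast reads is to inspect the explain output of a query run through a mongos. The helper below is a sketch under assumptions: the name shardsTouched is hypothetical, and it assumes the sharded explain output exposes queryPlanner.winningPlan with a stage field and a shards array whose entries carry shardName.

```javascript
// Hypothetical helper: `users` is a collection handle obtained through a
// mongos, for example db.getSiblingDB("gamify").users.
function shardsTouched(users, query) {
  const plan = users.find(query).explain().queryPlanner.winningPlan;
  // Assumption: on a sharded collection the winning plan lists one entry
  // per shard the query was sent to, each with a shardName field.
  const shards = (plan.shards || []).map((s) => s.shardName);
  return { stage: plan.stage, shards: shards };
}
```

Comparing shardsTouched(users, { client : "robot", userid : 123 }) with a query that omits client should show the first query reaching fewer shards once the cluster is balanced.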

Balancer

The balancer migrates tagged chunks to the appropriate shards. Until the migration completes, shards may contain chunks that violate the configured tags and tag ranges. Once balancing completes, each shard should contain only chunks whose ranges do not violate its assigned tags and tag ranges.

Adding or removing tags or tag ranges can result in chunk migrations. Depending on the size of your data set and the number of chunks a tag range affects, these migrations may impact cluster performance. Consider running your balancer during specific scheduled windows. See Schedule the Balancing Window for a tutorial on how to set a scheduling window.
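
As a rough sketch of what scheduling a window involves, the helper below stores an activeWindow on the balancer document in the config database. The function name setBalancingWindow and the example times are hypothetical; the helper assumes it receives a handle to the config database from a mongos connection, and that times are given as HH:MM strings. The linked tutorial remains the reference for the exact procedure.

```javascript
// Hypothetical helper: `configDB` is db.getSiblingDB("config") on a mongos.
function setBalancingWindow(configDB, start, stop) {
  // Accept times such as "6:00", "06:00", or "23:30".
  const hhmm = /^([01]?\d|2[0-3]):[0-5]\d$/;
  if (!hhmm.test(start) || !hhmm.test(stop)) {
    throw new Error("start and stop must be HH:MM strings");
  }
  // Upsert so the balancer settings document is created if it is missing.
  return configDB.settings.updateOne(
    { _id: "balancer" },
    { $set: { activeWindow: { start: start, stop: stop } } },
    { upsert: true }
  );
}
```

For instance, setBalancingWindow(db.getSiblingDB("config"), "23:00", "6:00") would limit balancing to an overnight window.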

Security

For sharded clusters running with Role-Based Access Control, authenticate as a user with at least the clusterManager role on the admin database.

Procedure

You must be connected to a mongos associated with the target sharded cluster to proceed. You cannot create tags by connecting directly to a shard.

1

Disable the Balancer

The balancer must be disabled on the collection to ensure no migrations take place while configuring the new tags.

Use sh.disableBalancing(), specifying the namespace of the collection, to disable balancing for that collection.

sh.disableBalancing("gamify.users")

Use sh.isBalancerRunning() to check if the balancer process is currently running. Wait until any current balancing rounds have completed before proceeding.
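
Waiting for the current round to finish can be scripted with a small polling loop. The sketch below makes two assumptions: that sh.isBalancerRunning() returns either a boolean (legacy mongo shell) or a status document with an inBalancerRound field (mongosh), and that the caller supplies a blocking pause function such as the shell's sleep(). The names balancerInRound and waitForBalancerIdle are hypothetical.

```javascript
// Normalizes the return value of sh.isBalancerRunning(). Assumption: the
// legacy shell returns a boolean, while mongosh returns a status document
// with an inBalancerRound field.
function balancerInRound(sh) {
  const status = sh.isBalancerRunning();
  return typeof status === "boolean" ? status : Boolean(status.inBalancerRound);
}

// Polls until no balancing round is in progress. `pause` is a caller-supplied
// function that blocks for the given number of milliseconds.
function waitForBalancerIdle(sh, pause, maxAttempts = 60) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    if (!balancerInRound(sh)) return true;
    pause(1000);
  }
  return false; // a round was still in progress after maxAttempts polls
}
```

In the shell, waitForBalancerIdle(sh, sleep) would return true once no round is in progress, or false if the attempts run out.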

2

Tag each shard

Tag shard0000 with the robot tag.

sh.addShardTag("shard0000", "robot")

Tag shard0001 with the robot tag.

sh.addShardTag("shard0001", "robot")

Tag shard0002 with the fruitos tag.

sh.addShardTag("shard0002", "fruitos")

Tag shard0003 with the fruitos tag.

sh.addShardTag("shard0003", "fruitos")

Run sh.status() to review the tags configured for the sharded cluster.
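
The four sh.addShardTag() calls above can also be driven from a single mapping, which may be easier to maintain as shards are added. This is only a sketch: tagAssignments and tagShards are hypothetical names, and sh is passed in as a parameter.

```javascript
// Shard name -> tag, mirroring the assignments made in this step.
const tagAssignments = {
  shard0000: "robot",
  shard0001: "robot",
  shard0002: "fruitos",
  shard0003: "fruitos",
};

// Hypothetical helper: applies every assignment with sh.addShardTag().
function tagShards(sh, assignments) {
  Object.keys(assignments).forEach(function (shard) {
    sh.addShardTag(shard, assignments[shard]);
  });
}
```

In the shell, tagShards(sh, tagAssignments) issues the same four calls shown above.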

3

Define ranges for each tag

Define the range for the robot client and associate it with the robot tag using the sh.addTagRange() method.

This method requires:

  • The full namespace of the target collection
  • The inclusive lower bound of the range
  • The exclusive upper bound of the range
  • The name of the tag

sh.addTagRange(
  "gamify.users",
  { "client" : "robot", "userid" : MinKey },
  { "client" : "robot", "userid" : MaxKey },
  "robot"
)

Define the range for the fruitos client and associate it with the fruitos tag using the sh.addTagRange() method.

This method requires:

  • The full namespace of the target collection
  • The inclusive lower bound of the range
  • The exclusive upper bound of the range
  • The name of the tag

sh.addTagRange(
  "gamify.users",
  { "client" : "fruitos", "userid" : MinKey },
  { "client" : "fruitos", "userid" : MaxKey },
  "fruitos"
)

The MinKey and MaxKey values are reserved special values for comparisons. MinKey always compares as lower than every other possible value, while MaxKey always compares as higher than every other possible value. The configured ranges capture every user for each client.
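
To see why these two ranges capture every user of each client, the following self-contained sketch models the comparison. It is a simplified stand-in for the server's key ordering, not the real implementation: MIN_KEY and MAX_KEY are hypothetical sentinels for MinKey and MaxKey, and ordinary values are assumed to share a type within each field.

```javascript
// Hypothetical sentinels standing in for MinKey and MaxKey.
const MIN_KEY = Symbol("MinKey");
const MAX_KEY = Symbol("MaxKey");

// Orders two field values, treating the sentinels as the extremes.
function compareValues(a, b) {
  if (a === b) return 0;
  if (a === MIN_KEY || b === MAX_KEY) return -1;
  if (a === MAX_KEY || b === MIN_KEY) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Compares compound shard keys field by field, in shard key order.
function compareKeys(a, b) {
  for (const field of ["client", "userid"]) {
    const c = compareValues(a[field], b[field]);
    if (c !== 0) return c;
  }
  return 0;
}

// The two ranges configured in this step.
const tagRanges = [
  { min: { client: "robot", userid: MIN_KEY }, max: { client: "robot", userid: MAX_KEY }, tag: "robot" },
  { min: { client: "fruitos", userid: MIN_KEY }, max: { client: "fruitos", userid: MAX_KEY }, tag: "fruitos" },
];

// Returns the tag whose range contains the key: lower bound inclusive,
// upper bound exclusive. Returns null when no range matches.
function tagFor(key) {
  const hit = tagRanges.find(
    (r) => compareKeys(r.min, key) <= 0 && compareKeys(key, r.max) < 0
  );
  return hit ? hit.tag : null;
}
```

Under this model, any numeric userid for client robot falls inside the robot range, and a key for a client with no configured range matches no tag.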

4

Enable the Balancer

Re-enable the balancer to rebalance the cluster.

Use sh.enableBalancing(), specifying the namespace of the collection, to re-enable balancing for that collection.

sh.enableBalancing("gamify.users")

Use sh.isBalancerRunning() to check if the balancer process is currently running.

5

Review the changes

The next time the balancer runs, it splits and migrates chunks across the shards respecting the tag ranges and tags.

Once balancing finishes, the shards tagged as robot only contain documents with client : robot, while shards tagged as fruitos only contain documents with client : fruitos.

You can confirm the chunk distribution by running sh.status().
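
For a more compact view than sh.status(), the tag configuration can also be read from the config database. The helper below is a sketch: summarizeTagging is a hypothetical name, and it assumes that shard documents in config.shards carry a tags array and that each configured range is stored in config.tags with ns, min, max, and tag fields.

```javascript
// Hypothetical helper: `configDB` is db.getSiblingDB("config") on a mongos.
function summarizeTagging(configDB, ns) {
  // One entry per shard, with the tags assigned to it (if any).
  const shards = configDB.shards.find().toArray().map((s) => ({
    shard: s._id,
    tags: s.tags || [],
  }));
  // One entry per tag range configured for the given namespace.
  const ranges = configDB.tags.find({ ns: ns }).toArray().map((r) => ({
    tag: r.tag,
    min: r.min,
    max: r.max,
  }));
  return { shards: shards, ranges: ranges };
}
```

In the shell, summarizeTagging(db.getSiblingDB("config"), "gamify.users") would return the shard tags and the ranges defined in this procedure.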