Production Checklist¶

The following checklists provide recommendations that will help you avoid issues in your production MongoDB deployment.

Operations Checklist¶

Filesystem¶

Align your disk partitions with your RAID configuration.
Avoid using NFS drives for your dbPath. Using NFS drives can result in degraded and unstable performance. See: Remote Filesystems for more information.
- VMware users should use VMware virtual drives over NFS.
Linux/Unix: format your drives into XFS or EXT4. If possible, use XFS as it generally performs better with MongoDB.
- With the WiredTiger storage engine, use of XFS is strongly recommended to avoid performance issues found when using EXT4 with WiredTiger.
- If using RAID, you may need to configure XFS with your RAID geometry.
Windows: use the NTFS file system. Do not use any FAT file system (i.e. FAT 16/32/exFAT).

Replication¶

Verify that all non-hidden replica set members are identically provisioned in terms of their RAM, CPU, disk, network setup, etc.
Configure the oplog size to suit your use case:
- The replication oplog window should cover normal maintenance and downtime windows to avoid the need for a full resync.
- The replication oplog window should cover the time needed to restore a replica set member, either by an initial sync or by restoring from the last backup.
Ensure that your replica set includes at least three data-bearing nodes with w:majority write concern. Three data-bearing nodes are required for replica set-wide data durability.
Use hostnames when configuring replica set members, rather than IP addresses.
Ensure full bidirectional network connectivity between all mongod instances.
Ensure that each host can resolve itself.
Ensure that your replica set contains an odd number of voting members.
Ensure that mongod instances have 0 or 1 votes.
For high availability, deploy your replica set into a minimum of three data centers.
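Several of the replication checks above can be expressed against a replica set configuration document. The following is a minimal sketch, not part of MongoDB or any driver: check_replica_set_config is a hypothetical helper that assumes a document shaped like the output of rs.conf(), where each member has a host in host:port form and optionally votes and arbiterOnly, and where a missing votes value is treated as 1.

```python
import ipaddress


def check_replica_set_config(config):
    """Return a list of warnings for a replica set config document.

    Hypothetical helper: `config` is assumed to look like the output of
    rs.conf(), i.e. a dict with a "members" list whose entries have "host"
    (in host:port form) and optionally "votes" and "arbiterOnly".
    """
    warnings = []
    members = config.get("members", [])

    # Members that are not arbiters hold data.
    data_bearing = [m for m in members if not m.get("arbiterOnly", False)]
    if len(data_bearing) < 3:
        warnings.append("fewer than three data-bearing members")

    # Each member should have 0 or 1 votes; a missing value is treated as 1.
    votes = [m.get("votes", 1) for m in members]
    if any(v not in (0, 1) for v in votes):
        warnings.append("a member has a vote count other than 0 or 1")

    # The number of voting members should be odd.
    voting = sum(1 for v in votes if v > 0)
    if voting % 2 == 0:
        warnings.append("even number of voting members")

    # Members should be identified by hostname rather than IP address.
    for m in members:
        host = m.get("host", "").rsplit(":", 1)[0]
        try:
            ipaddress.ip_address(host.strip("[]"))
        except ValueError:
            continue  # not an IP literal, so presumably a hostname
        warnings.append(f"member {m.get('host')} is configured by IP address")

    return warnings
```

An empty list means none of these particular checks fired. It does not cover provisioning, network connectivity, or data center placement, which need to be verified separately.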

Sharding¶

Place your config servers on dedicated hardware for optimal performance in large clusters. Ensure that the hardware has enough RAM to hold the data files entirely in memory and that it has dedicated storage.
Use NTP to synchronize the clocks on all components of your sharded cluster.
Ensure full bidirectional network connectivity between mongod, mongos and config servers.
Use CNAMEs to identify your config servers to the cluster so that you can rename and renumber your config servers without downtime.

Journaling: MMAPv1 Storage Engine¶

Ensure that all instances use journaling.
Place the journal on its own low-latency disk for write-intensive workloads. Note that this will affect snapshot-style backups as the files constituting the state of the database will reside on separate volumes.

Hardware¶

Use RAID10 and SSD drives for optimal performance.
SAN and Virtualization:
- Ensure that each mongod has provisioned IOPS for its dbPath, or has its own physical drive or LUN.
- Avoid dynamic memory features, such as memory ballooning, when running in virtual environments.
- Avoid placing all replica set members on the same SAN, as the SAN can be a single point of failure.

Deployments to Cloud Hardware¶

Windows Azure: Adjust the TCP keepalive (tcp_keepalive_time) to 100-120 seconds. The default TTL for TCP connections on Windows Azure load balancers is too slow for MongoDB’s connection pooling behavior.
Use MongoDB version 2.6.4 or later on systems with high-latency storage, such as Windows Azure, as these versions include performance improvements for those systems. See: Azure Deployment Recommendations for more information.

Operating System Configuration¶

Linux¶

Turn off transparent hugepages and defrag. See Transparent Huge Pages Settings for more information.
Adjust the readahead settings on the devices storing your database files to suit your use case. If your working set is bigger than the available RAM, and the document access pattern is random, consider lowering the readahead to 32 or 16. Evaluate different settings to find an optimal value that maximizes the resident memory and lowers the number of page faults.
Use the noop or deadline disk schedulers for SSD drives.
Use the noop disk scheduler for virtualized drives in guest VMs.
Disable NUMA or set vm.zone_reclaim_mode to 0 and run mongod instances with node interleaving. See: MongoDB and NUMA Hardware for more information.
Adjust the ulimit values on your hardware to suit your use case. If multiple mongod or mongos instances are running under the same user, scale the ulimit values accordingly. See: UNIX ulimit Settings for more information.
Use noatime for the dbPath mount point.
Configure sufficient file handles (fs.file-max), kernel pid limit (kernel.pid_max), and maximum threads per process (kernel.threads-max) for your deployment. For large systems, the following values provide a good starting point:
- fs.file-max value of 98000,
- kernel.pid_max value of 64000, and
- kernel.threads-max value of 64000
Ensure that your system has swap space configured. Refer to your operating system’s documentation for details on appropriate sizing.
Ensure that the system default TCP keepalive is set correctly. A value of 300 seconds often provides better performance for replica sets and sharded clusters. See: Does TCP keepalive time affect MongoDB Deployments? in the Frequently Asked Questions for more information.
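One way to verify the kernel limits listed above is to compare the live values under /proc/sys with the suggested starting points. The sketch below is illustrative only: read_sysctl and check_minimums are hypothetical helper names, and the thresholds are the starting-point values from this checklist, not universal requirements.

```python
from pathlib import Path

# Starting-point minimums taken from the checklist above.
RECOMMENDED_MINIMUMS = {
    "fs/file-max": 98000,
    "kernel/pid_max": 64000,
    "kernel/threads-max": 64000,
}


def read_sysctl(name, root="/proc/sys"):
    """Return the integer value of a sysctl, or None if it cannot be read."""
    try:
        return int(Path(root, name).read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def check_minimums(values, minimums=RECOMMENDED_MINIMUMS):
    """Return {name: (actual, minimum)} for settings below the minimum.

    Settings whose value is missing or None are skipped, not reported.
    """
    return {
        name: (values.get(name), minimum)
        for name, minimum in minimums.items()
        if values.get(name) is not None and values[name] < minimum
    }


if __name__ == "__main__":
    current = {name: read_sysctl(name) for name in RECOMMENDED_MINIMUMS}
    for name, (actual, minimum) in check_minimums(current).items():
        print(f"{name}: {actual} is below the suggested {minimum}")
```

Run on a Linux host, the script prints one line per setting that falls below the suggested value and prints nothing otherwise. On systems without /proc/sys, every value reads as None and is skipped.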

Windows¶

Consider disabling NTFS “last access time” updates. This is analogous to disabling atime on Unix-like systems.

Backups¶

Schedule periodic tests of your backup and restore process to have time estimates on hand, and to verify its functionality.

Monitoring¶

Use MongoDB Cloud Manager, Ops Manager (an on-premise solution available in MongoDB Enterprise Advanced), or another monitoring system to monitor key database metrics and set up alerts for them. Include alerts for the following metrics:
- lock percent (for the MMAPv1 storage engine)
- replication lag
- replication oplog window
- assertions
- queues
- page faults
Monitor hardware statistics for your servers. In particular, pay attention to the disk use, CPU, and available disk space.

In the absence of disk space monitoring, or as a precaution:
- Create a dummy 4GB file on the storage.dbPath drive to ensure available space if the disk becomes full.
- A combination of cron+df can alert when disk space hits a high-water mark, if no other monitoring tool is available.
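As an alternative to parsing df output, the same high-water-mark check can be sketched with the Python standard library and scheduled from cron. This is a minimal illustration under assumptions: the 80% threshold and the function names are hypothetical, and the path argument should be the storage.dbPath of your deployment.

```python
import shutil
import sys


def disk_usage_percent(path):
    """Return the percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # pseudo-filesystems can report a zero size
    return 100.0 * usage.used / usage.total


def over_high_water_mark(path, threshold_percent=80.0):
    """Return True if usage at `path` is at or above the threshold."""
    return disk_usage_percent(path) >= threshold_percent


if __name__ == "__main__":
    # Pass the dbPath as the first argument; "/" is only a fallback.
    target = sys.argv[1] if len(sys.argv) > 1 else "/"
    if over_high_water_mark(target):
        print(f"WARNING: disk usage for {target} has reached the high-water mark")
```

When run from cron, any printed warning is typically mailed to the job owner, which gives a basic alert without a monitoring system. The percentage here is used space over total space and may differ slightly from the figure df reports.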

Load Balancing¶

Configure load balancers to enable “sticky sessions” or “client affinity”, with a sufficient timeout for existing connections.
Avoid placing load balancers between MongoDB cluster or replica set components.

Development¶

Data Durability¶

Ensure that your replica set includes at least three data-bearing nodes with w:majority write concern. Three data-bearing nodes are required for replica set-wide data durability.
Ensure that all instances use journaling.

Schema Design¶

Ensure that your schema design does not rely on indexed arrays that grow in length without bound. Typically, best performance can be achieved when such indexed arrays have fewer than 1000 elements.

Replication¶

Do not use secondary reads to scale overall read throughput. See: Can I use more replica nodes to scale for an overview of read scaling. For information about secondary reads, see: Read Preference.

Sharding¶

Ensure that your shard key distributes the load evenly on your shards. See: Considerations for Selecting Shard Keys for more information.
Use targeted queries for workloads that need to scale with the number of shards.
Always read from primary nodes for non-targeted queries that may be sensitive to stale or orphaned data.
Pre-split and manually balance chunks when inserting large data sets into a new non-hashed sharded collection. Pre-splitting and manually balancing enables the insert load to be distributed among the shards, increasing performance for the initial load.
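To make pre-splitting concrete, the sketch below computes evenly spaced split points for a range of shard key values. It assumes an integer shard key whose values are spread roughly uniformly over a known range, which may not match your data; even_split_points is a hypothetical helper, not a MongoDB API. Each resulting point could then be applied with the sh.splitAt() shell helper and the chunks distributed with sh.moveChunk().

```python
def even_split_points(low, high, chunks):
    """Return up to chunks - 1 integer split points dividing [low, high) evenly.

    Assumes an integer shard key distributed roughly uniformly over the range.
    Duplicate points, which occur when the range is narrower than the number
    of chunks, are dropped.
    """
    if chunks < 1 or high <= low:
        raise ValueError("need chunks >= 1 and high > low")
    step = (high - low) / chunks
    points = {low + round(step * i) for i in range(1, chunks)}
    return sorted(points)


# Example: four chunks over keys 0..999 need three split points.
print(even_split_points(0, 1000, 4))  # [250, 500, 750]
```

If the key distribution is skewed, evenly spaced points will produce uneven chunks; in that case, split points derived from a sample of the data to be loaded are likely a better fit.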

Drivers¶

Make use of connection pooling. Most MongoDB drivers support connection pooling. Adjust the connection pool size to suit your use case, beginning at 110-115% of the typical number of concurrent database requests.
Ensure that your applications handle transient write and read errors during replica set elections.
Ensure that your applications handle failed requests and retry them if applicable. Drivers do not automatically retry failed requests.
Use exponential backoff logic for database request retries.
Use cursor.maxTimeMS() for reads and wtimeout for writes if you need to cap execution time for database operations.
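The retry and backoff recommendations above can be sketched as a small, driver-independent wrapper. This is one possible approach under assumptions, not a driver feature: retry_with_backoff and its parameters are hypothetical, the delay values are arbitrary starting points, and random jitter is added so that many clients do not retry in lockstep. Only operations that are safe to repeat should be wrapped this way.

```python
import random
import time


def retry_with_backoff(operation, retryable=(Exception,), max_attempts=5,
                       base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call operation(), retrying on `retryable` errors with exponential backoff.

    The upper bound on the delay doubles after each failed attempt, starting
    at base_delay seconds and capped at max_delay seconds. The actual sleep is
    drawn uniformly between 0 and that bound. The last error is re-raised once
    max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            bound = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, bound))
```

With PyMongo, for example, the retryable tuple might contain pymongo.errors.AutoReconnect so that errors raised during a replica set election are retried while other failures propagate immediately; which exceptions are safe to retry depends on the driver and the operation.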
