Design Notes
This page details features of MongoDB that may be important to keep in mind when developing applications.
Schema Considerations
Dynamic Schema
Data in MongoDB has a dynamic schema. Collections do not enforce document structure. This facilitates iterative development and polymorphism. Nevertheless, collections often hold documents with highly homogeneous structures. See Data Modeling Concepts for more information.
Some operational considerations include:
- the exact set of collections to be used;
- the indexes to be used: with the exception of the `_id` index, all indexes must be created explicitly;
- shard key declarations: choosing a good shard key is very important, as the shard key cannot be changed once set.
Avoid importing unmodified data directly from a relational database. In general, you will want to “roll up” certain data into richer documents that take advantage of MongoDB’s support for embedded documents and nested arrays.
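As an illustration, a rolled-up order document might embed what a relational schema would keep in separate tables (all field names here are invented for this sketch):

```javascript
// Hypothetical "rolled up" order document: the customer address is an
// embedded document and the line items are a nested array, rather than
// rows joined in from separate relational tables.
const order = {
  _id: 1001,
  customer: { name: "Ada", city: "London" }, // embedded document
  items: [                                   // nested array
    { sku: "A-1", qty: 2, price: 9.99 },
    { sku: "B-7", qty: 1, price: 24.5 },
  ],
  total: 2 * 9.99 + 24.5,
};
console.log(order.items.length); // → 2
```

Because the whole logical object lives in one document, it can be read or modified in a single operation.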
Case Sensitive Strings
MongoDB strings are case sensitive, so a search for "joe" will not find "Joe".
Consider:
- storing data in a normalized case format, or
- using regular expressions ending with the `i` option, and/or
- using `$toLower` or `$toUpper` in the aggregation framework.
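The first two approaches can be sketched in plain JavaScript (an illustration of the matching semantics, not the mongo shell):

```javascript
// Case-sensitive comparison misses "Joe" when searching for "joe".
console.log("Joe" === "joe"); // → false

// Option 1: normalize case before storing or comparing.
console.log("Joe".toLowerCase() === "joe"); // → true

// Option 2: a regular expression with the `i` (case-insensitive) option,
// analogous to a query condition like { name: /^joe$/i }.
console.log(/^joe$/i.test("Joe")); // → true
```

Note that unanchored or case-insensitive regular expression queries generally cannot use an index efficiently, which is why normalizing case at write time is often preferable.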
Type Sensitive Fields
MongoDB data is stored in the BSON format, a binary-encoded serialization of JSON-like documents. BSON encodes additional type information. See bsonspec.org for more information.
Consider a document with a field x holding the string value "123". A query that looks for the number value 123 will not return that document, because the string and the number are distinct BSON types.
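A strict, type-aware comparison in plain JavaScript (a sketch, not the mongo shell) mirrors why the query misses:

```javascript
// BSON distinguishes the string "123" from the number 123, much like
// strict equality does in JavaScript.
const doc = { _id: 1, x: "123" }; // x stored as a string
console.log(doc.x === 123);       // query { x: 123 }   → false
console.log(doc.x === "123");     // query { x: "123" } → true
```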
General Considerations
By Default, Updates Affect One Document
To update multiple documents that meet your query criteria, set the `multi` option of the `update` method to `true` or `1`. See: Update Multiple Documents.
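The default-versus-multi behavior can be sketched in plain JavaScript (an illustration of the semantics only, not the driver or shell API):

```javascript
// Apply `changes` to matching docs; stop after the first match unless
// options.multi is true -- mirroring update()'s default behavior.
function applyUpdate(docs, query, changes, options = {}) {
  let updated = 0;
  for (const doc of docs) {
    if (doc.status === query.status) {
      Object.assign(doc, changes);
      updated += 1;
      if (!options.multi) break; // default: affect only one document
    }
  }
  return updated;
}

const docs = [{ status: "A" }, { status: "A" }, { status: "B" }];
console.log(applyUpdate(docs, { status: "A" }, { seen: true }));                  // → 1
console.log(applyUpdate(docs, { status: "A" }, { seen: true }, { multi: true })); // → 2
```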
Prior to MongoDB 2.2, you would specify the `upsert` and `multi` options in the `update` method as positional boolean options. See: the `update` method reference documentation.
BSON Document Size Limit
The BSON Document Size limit is currently set at 16MB per document. If you require larger documents, use GridFS.
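As a rough client-side sanity check, you might compare a document's serialized size against the limit before inserting. This is only a sketch: the JSON text length merely approximates the size of the BSON encoding.

```javascript
// Rough check against the 16MB BSON document size limit. JSON byte
// length approximates, but does not equal, the BSON encoding's size.
const LIMIT = 16 * 1024 * 1024; // 16MB

function roughlyFits(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8") <= LIMIT;
}

console.log(roughlyFits({ note: "small document" })); // → true
```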
No Fully Generalized Transactions
MongoDB does not have fully generalized transactions. If you model your data using rich documents that closely resemble your application's objects, each logical object will be in one MongoDB document. MongoDB allows you to modify a document in a single atomic operation. This kind of data modification pattern covers most common uses of transactions in other systems.
Replica Set Considerations
Use an Odd Number of Replica Set Members
Replica sets perform consensus elections. To ensure that elections will proceed successfully, either use an odd number of members, typically three, or else use an arbiter to ensure an odd number of votes.
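The arithmetic behind this advice can be sketched: electing a primary requires a majority of votes, and adding a fourth voting member raises the required majority without raising the number of failures the set can tolerate.

```javascript
// Majority of votes needed to win an election among n voting members.
const majority = (n) => Math.floor(n / 2) + 1;
// Members that can fail while an election can still succeed.
const tolerableFailures = (n) => n - majority(n);

console.log(majority(3), tolerableFailures(3)); // → 2 1
console.log(majority(4), tolerableFailures(4)); // → 3 1 (no gain from the 4th vote)
console.log(majority(5), tolerableFailures(5)); // → 3 2
```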
Keep Replica Set Members Up-to-Date
MongoDB replica sets support automatic failover. It is important for your secondaries to be up-to-date. There are various strategies for assessing consistency:
- Use monitoring tools to alert you to lag events. See Monitoring for MongoDB for a detailed discussion of MongoDB’s monitoring options.
- Specify appropriate write concern.
- If your application requires manual failover, you can configure your secondaries as priority 0. Priority 0 secondaries require manual action for a failover. This may be practical for a small replica set, but large deployments should fail over automatically.
Sharding Considerations
Pick your shard keys carefully. You cannot choose a new shard key for a collection that is already sharded.
Shard key values are immutable.
When enabling sharding on an existing collection, MongoDB imposes a maximum size on that collection to ensure that it is possible to create chunks. For a detailed explanation of this limit, see the sharded cluster documentation.
To shard larger amounts of data, create a new empty sharded collection, and ingest the data from the source collection using an application-level import operation.
Unique indexes are not enforced across shards except for the shard key itself. See Enforce Unique Keys for Sharded Collections.
Consider pre-splitting a sharded collection before a massive bulk import.
Analyze Performance
As you develop and operate applications with MongoDB, you may want to analyze the performance of the database as well as the application. Consider the following as you begin to investigate the performance of MongoDB.
Overview
Degraded performance in MongoDB is typically a function of the relationship between the quantity of data stored in the database, the amount of system RAM, the number of connections to the database, and the amount of time the database spends in a locked state.
In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability of hardware on the host system for virtualized environments. Some users also experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other situations, performance issues may indicate that the database may be operating at capacity and that it is time to add additional capacity to the database.
The following are some causes of degraded performance in MongoDB.
Locks
MongoDB uses a locking system to ensure data set consistency. However, if certain operations are long-running, or a queue forms, performance will slow as requests and operations wait for the lock. Lock-related slowdowns can be intermittent. To see if the lock has been affecting your performance, look to the data in the globalLock section of the serverStatus output. If `globalLock.currentQueue.total` is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that may be affecting performance.
If `globalLock.totalTime` is high relative to `uptime`, the database has existed in a lock state for a significant amount of time.
Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.
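The two lock checks above could be scripted against a serverStatus-shaped document. This is a sketch with invented sample values, assuming `globalLock.totalTime` is reported in microseconds and uptime in milliseconds:

```javascript
// Inspect lock pressure from a serverStatus-shaped document.
// All sample numbers below are invented for illustration.
const status = {
  uptimeMillis: 3_600_000, // 1 hour of uptime, in milliseconds
  globalLock: {
    totalTime: 900_000_000,        // time in a lock state, in microseconds
    currentQueue: { total: 42 },   // operations queued waiting for a lock
  },
};

const queued = status.globalLock.currentQueue.total;
// Convert microseconds to milliseconds before comparing with uptime.
const lockRatio = status.globalLock.totalTime / 1000 / status.uptimeMillis;

console.log(queued > 10);          // → true (a possible concurrency issue)
console.log(lockRatio.toFixed(2)); // → "0.25"
```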
Memory Use
MongoDB uses memory mapped files to store data. Given a data set of sufficient size, the MongoDB process will allocate all available memory on the system for its use. While this is part of the design, and affords MongoDB superior performance, the memory mapped files make it difficult to determine if the amount of RAM is sufficient for the data set.
The memory usage status metrics of the serverStatus output can provide insight into MongoDB's memory use. Check the resident memory use (i.e. `mem.resident`): if this exceeds the amount of system memory and there is a significant amount of data on disk that isn't in RAM, you may have exceeded the capacity of your system.
You should also check the amount of mapped memory (i.e. `mem.mapped`). If this value is greater than the amount of system memory, some operations will require disk access page faults to read data from virtual memory and negatively affect performance.
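Both comparisons are simple threshold checks; a sketch with invented sample values (serverStatus reports the `mem` metrics in megabytes):

```javascript
// Compare serverStatus-style memory metrics against system RAM.
// The sample numbers are invented for illustration; units are MB.
const systemRamMB = 16 * 1024;                  // a host with 16GB of RAM
const mem = { resident: 15000, mapped: 48000 }; // from serverStatus `mem`

console.log(mem.resident > systemRamMB); // resident exceeds RAM? → false
console.log(mem.mapped > systemRamMB);   // mapped exceeds RAM?   → true
```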
Page Faults
Page faults triggered by MongoDB are reported as the total number of page faults in one second. To check for page faults, see the `extra_info.page_faults` value in the serverStatus output.
MongoDB on Windows counts both hard and soft page faults.
The MongoDB page fault counter may increase dramatically in moments of poor performance and may correlate with limited physical memory environments. Page faults also can increase while accessing much larger data sets, for example, scanning an entire collection. Limited and sporadic MongoDB page faults do not necessarily indicate a problem or a need to tune the database.
A single page fault completes quickly and is not problematic. However, in aggregate, large volumes of page faults typically indicate that MongoDB is reading too much data from disk. In many situations, MongoDB’s read locks will “yield” after a page fault to allow other processes to read and avoid blocking while waiting for the next page to read into memory. This approach improves concurrency, and also improves overall throughput in high volume systems.
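One way to reason about aggregate page-fault volume is as a rate computed between two serverStatus samples. A sketch with invented numbers:

```javascript
// Page faults per second between two serverStatus samples.
// The timestamps and counter values are invented for illustration.
const earlier = { seconds: 100, extra_info: { page_faults: 2000 } };
const later = { seconds: 160, extra_info: { page_faults: 14000 } };

const faultsPerSecond =
  (later.extra_info.page_faults - earlier.extra_info.page_faults) /
  (later.seconds - earlier.seconds);

console.log(faultsPerSecond); // → 200
```

A sustained rate like this, rather than a brief spike, is what suggests MongoDB is reading too much data from disk.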
Increasing the amount of RAM accessible to MongoDB may help reduce the frequency of page faults. If this is not possible, you may want to consider deploying a sharded cluster or adding shards to your deployment to distribute load among mongod instances.
See What are page faults? for more information.
Number of Connections
In some cases, the number of connections between the application layer (i.e. clients) and the database can overwhelm the ability of the server to handle requests. This can produce performance irregularities. The following fields in the serverStatus document can provide insight:
- `globalLock.activeClients` contains a counter of the total number of clients with active operations in progress or queued.
- `connections` is a container for the following two fields:
  - `current`, the number of incoming connections from clients to the database server;
  - `available`, the number of unused incoming connections still available.
If requests are high because there are numerous concurrent application requests, the database may have trouble keeping up with demand. If this is the case, then you will need to increase the capacity of your deployment. For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members. For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.
Spikes in the number of connections can also be the result of application or driver errors. All of the officially supported MongoDB drivers implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without a corresponding workload, is often indicative of a driver or other configuration error.
Unless constrained by system-wide limits, MongoDB has no limit on incoming connections. You can modify system limits using the `ulimit` command, or by editing your system's `/etc/sysctl` file. See UNIX ulimit Settings for more information.
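For example, to inspect the current per-process limit on open files (each incoming connection consumes a file descriptor, so this caps the connections a mongod process can accept):

```shell
# Show the per-process open-file (file descriptor) limit.
ulimit -n
```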
Database Profiling
MongoDB's "Profiler" is a database profiling system that can help identify inefficient queries and operations.
The following profiling levels are available:
| Level | Setting |
| --- | --- |
| 0 | Off. No profiling. |
| 1 | On. Only includes "slow" operations. |
| 2 | On. Includes all operations. |
Enable the profiler by setting the profile level from the mongo shell.
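The enabling command did not survive in this copy of the page; the mongo shell helper for this (referenced as db.setProfilingLevel() later on this page) looks like the following, where the optional second argument is the slow-operation threshold in milliseconds:

```javascript
// Run in the mongo shell: enable level 1 profiling, treating
// operations slower than 100 milliseconds as "slow".
db.setProfilingLevel(1, 100)
```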
The `slowOpThresholdMs` setting defines what constitutes a "slow" operation. To set the threshold above which the profiler considers operations "slow" (and thus, included in the level 1 profiling data), you can configure `slowOpThresholdMs` at runtime as an argument to the db.setProfilingLevel() operation.
See the documentation of db.setProfilingLevel() for more information about this command.
By default, mongod records all "slow" queries to its log, as defined by `slowOpThresholdMs`.
Note
Because the database profiler can negatively impact performance, only enable profiling for strategic intervals and as minimally as possible on production systems.
You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or sharded cluster.
You can view the output of the profiler in the system.profile collection of your database by issuing the show profile command in the mongo shell, or by querying the system.profile collection directly for all operations that lasted longer than some number of milliseconds (100, for example). Ensure that the value specified is above the `slowOpThresholdMs` threshold.
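The direct query takes the form `db.system.profile.find( { millis : { $gt : 100 } } )`. Its filtering semantics can be sketched in plain JavaScript (an illustration, not the shell):

```javascript
// Filter profiler-style documents the way the query condition
// { millis: { $gt: 100 } } would on the server.
const profileDocs = [
  { op: "query", ns: "test.orders", millis: 240 },
  { op: "update", ns: "test.orders", millis: 12 },
];
const slowOps = profileDocs.filter((doc) => doc.millis > 100);
console.log(slowOps.length); // → 1
```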
See also
Optimization Strategies for MongoDB addresses strategies that may improve the performance of your database queries and operations.