FAQ: MongoDB for Application Developers¶
On this page
- When does MongoDB write updates to disk?
- How do I do transactions and locking in MongoDB?
- How do you aggregate data with MongoDB?
- Why does MongoDB log so many “Connection Accepted” events?
- Does MongoDB run on Amazon EBS?
- Why are MongoDB’s data files so large?
- How do I optimize storage use for small documents?
- When should I use GridFS?
- How does MongoDB address SQL or Query injection?
- How does MongoDB provide concurrency?
- What is the compare order for BSON types?
- When multiplying values of mixed types, what type conversion rules apply?
- How do I query for fields that have null values?
- How do I isolate cursors from intervening write operations?
- When should I embed documents within other documents?
- Where can I learn more about data modeling in MongoDB?
- Can I manually pad documents to prevent moves during updates?
This document answers common questions about application development using MongoDB.
If you don’t find the answer you’re looking for, check the complete list of FAQs or post your question to the MongoDB User Mailing List.
When does MongoDB write updates to disk?¶
MongoDB flushes writes to disk on a regular interval.
- MMAPv1
In the default configuration for the MMAPv1 storage engine, MongoDB writes to the data files on disk every 60 seconds and writes to the journal files roughly every 100 milliseconds.
To change the interval for writing to the data files, use the storage.syncPeriodSecs setting. For the journal files, see the storage.journal.commitIntervalMs setting.
These values represent the maximum amount of time between the completion of a write operation and when MongoDB writes to the data files or to the journal files. In many cases MongoDB and the operating system flush data to disk more frequently, so the above values represent a theoretical maximum.
While MongoDB writes to journal files promptly, MongoDB writes to the data files lazily. MongoDB may wait to write data to the data files for as much as one minute by default. This does not affect durability, as the journal has enough information to ensure crash recovery.
This “lazy” strategy provides advantages in situations where the database receives a thousand increments to an object within one second. With this “lazy” strategy, MongoDB only needs to flush this data to disk once. To modify this strategy, you can use fsync and write concern (see Write Concern Reference), as well as modify the aforementioned configuration options.
- WiredTiger
For the data files, MongoDB creates checkpoints (i.e. writes the snapshot data to disk) at intervals of 60 seconds or 2 gigabytes of data to write, depending on which occurs first. For the journal data:
- WiredTiger sets checkpoints for journal data at intervals of 60 seconds or 2 GB of data, depending on which occurs first.
- Because MongoDB uses a log file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100MB of data. When WiredTiger creates a new journal file, WiredTiger syncs the previous journal file.
- If the write operation includes a write concern of j:true, WiredTiger forces a sync on commit of that operation as well as anything that has happened before.
How do I do transactions and locking in MongoDB?¶
MongoDB does not support traditional locking or complex transactions with rollback. MongoDB aims to be lightweight, fast, and predictable in its performance. This is similar to the MySQL MyISAM autocommit model. By keeping transaction support extremely simple, MongoDB can provide greater performance, especially for partitioned or replicated systems with a number of database server processes.
MongoDB does have support for atomic operations within a single document. Given the possibilities provided by nested documents, this feature provides support for a large number of use-cases.
See also
The Atomicity and Transactions page.
How do you aggregate data with MongoDB?¶
In version 2.1 and later, you can use the new aggregation framework, with the aggregate command.
MongoDB also supports map-reduce with the mapReduce command, as well as basic aggregation with the group, count, and distinct commands.
See also
The Aggregation page.
Why does MongoDB log so many “Connection Accepted” events?¶
If you see a very large number of connection and reconnection messages in your MongoDB log, then clients are frequently connecting to and disconnecting from the MongoDB server. This is normal behavior for applications that do not use request pooling, such as CGI. Consider using FastCGI, an Apache Module, or some other kind of persistent application server to decrease the connection overhead.
If these connections do not impact your performance, you can use the run-time quiet option or the command-line option --quiet to suppress these messages from the log.
Does MongoDB run on Amazon EBS?¶
Yes.
MongoDB users of all sizes have had a great deal of success using MongoDB on the EC2 platform using EBS disks.
Why are MongoDB’s data files so large?¶
MongoDB aggressively preallocates data files to reserve space and
avoid file system fragmentation. You can use the storage.smallFiles
setting to modify the file preallocation strategy.
How do I optimize storage use for small documents?¶
Each MongoDB document contains a certain amount of overhead. This overhead is normally insignificant but becomes significant if all documents are just a few bytes, as might be the case if the documents in your collection only have one or two fields.
Consider the following suggestions and strategies for optimizing storage utilization for these collections:
Use the _id field explicitly.
MongoDB clients automatically add an _id field to each document and generate a unique 12-byte ObjectId for the _id field. Furthermore, MongoDB always indexes the _id field. For smaller documents this may account for a significant amount of space.
To optimize storage use, users can specify a value for the _id field explicitly when inserting documents into the collection. This strategy allows applications to store a value in the _id field that would have occupied space in another portion of the document.
You can store any value in the _id field, but because this value serves as a primary key for documents in the collection, it must uniquely identify them. If the field’s value is not unique, then it cannot serve as a primary key as there would be collisions in the collection.
Use shorter field names.
MongoDB stores all field names in every document. For most documents, this represents a small fraction of the space used by a document; however, for small documents the field names may represent a proportionally large amount of space. Consider a collection of small documents that each contain a last_name field and a best_score field.
If you shorten the field named last_name to lname and the field named best_score to score, you could save 9 bytes per document.
Shortening field names reduces expressiveness and does not provide considerable benefit for larger documents or where document overhead is not of significant concern. Shorter field names do not reduce the size of indexes, because indexes have a predefined structure.
In general it is not necessary to use short field names.
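The 9-byte figure can be checked with a little arithmetic. The sketch below is plain JavaScript that runs without a server; the document values are made up, and it assumes ASCII field names, each of which BSON stores as a NUL-terminated string (length plus one byte).

```javascript
// Hypothetical small documents; the field names come from the text above,
// the values are invented for illustration.
const longNames = { last_name: "Smith", best_score: 3.9 };
const shortNames = { lname: "Smith", score: 3.9 };

// BSON stores every field name as a NUL-terminated string, so each name costs
// its length plus one byte. Assumes ASCII names (one byte per character).
function fieldNameBytes(doc) {
  return Object.keys(doc).reduce((total, name) => total + name.length + 1, 0);
}

const savedBytes = fieldNameBytes(longNames) - fieldNameBytes(shortNames);
console.log(savedBytes); // 9
```

The saving comes entirely from the names (4 bytes for last_name to lname, 5 bytes for best_score to score); the values are stored identically in both documents.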
Embed documents.
In some cases you may want to embed documents in other documents and save on the per-document overhead.
When should I use GridFS?¶
For documents in a MongoDB collection, you should always use GridFS for storing files larger than 16 MB.
In some situations, storing large files may be more efficient in a MongoDB database than on a system-level filesystem.
- If your filesystem limits the number of files in a directory, you can use GridFS to store as many files as needed.
- When you want to keep your files and metadata automatically synced and deployed across a number of systems and facilities. When using geographically distributed replica sets MongoDB can distribute files and their metadata automatically to a number of mongod instances and facilities.
- When you want to access information from portions of large files without having to load whole files into memory, you can use GridFS to recall sections of files without reading the entire file into memory.
Do not use GridFS if you need to update the content of the entire file atomically. As an alternative you can store multiple versions of each file and specify the current version of the file in the metadata. You can update the metadata field that indicates “latest” status in an atomic update after uploading the new version of the file, and later remove previous versions if needed.
Furthermore, if your files are all smaller than the 16 MB BSON Document Size limit, consider storing the file manually within a single document. You may use the BinData data type to store the binary data. See your driver's documentation for details on using BinData.
For more information on GridFS, see GridFS.
How does MongoDB address SQL or Query injection?¶
BSON¶
As a client program assembles a query in MongoDB, it builds a BSON object, not a string. Thus traditional SQL injection attacks are not a problem. More details and some nuances are covered below.
MongoDB represents queries as BSON objects. Typically client libraries provide a convenient, injection-free process to build these objects. Consider a C++ client that builds a query object named my_query from a user-supplied name.
Here, my_query will have a value such as { name : "Joe" }. If the user-supplied value contained special characters, for example the comma, :, and {, the query simply wouldn’t match any documents. For example, users cannot hijack a query and convert it to a delete.
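The same idea can be sketched in JavaScript rather than C++. The function name below is hypothetical, and the String() coercion is an extra precaution assumed for this sketch, not something the text above prescribes.

```javascript
// Build the query as an object; the user-supplied value is only ever a value.
function buildNameQuery(userSuppliedName) {
  // String() coercion is an assumption of this sketch: it keeps an
  // attacker-supplied object such as { $ne: null } from being read as an operator.
  return { name: String(userSuppliedName) };
}

const benign = buildNameQuery("Joe");
const hostile = buildNameQuery('Joe" }, { $where: "true');

console.log(Object.keys(hostile)); // the only key is still "name"
```

Because the hostile input never passes through a string-based query parser, it ends up as an odd-looking name that is unlikely to match any document.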
JavaScript¶
Note
You can disable all server-side execution of JavaScript by passing the --noscripting option on the command line or by setting security.javascriptEnabled to false in a configuration file.
All of the following MongoDB operations permit you to run arbitrary JavaScript expressions directly on the server:
- $where
- mapReduce
- group
You must exercise care in these cases to prevent users from submitting malicious JavaScript.
Fortunately, you can express most queries in MongoDB without
JavaScript and for queries that require JavaScript, you can mix
JavaScript and non-JavaScript in a single query. Place all the
user-supplied fields directly in a BSON field and pass
JavaScript code to the $where
field.
If you need to pass user-supplied values in a $where
clause,
you may escape these values with the CodeWScope
mechanism. When you
set user-submitted values as variables in the scope document, you can
avoid evaluating them on the database server.
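A minimal sketch of mixing JavaScript and non-JavaScript conditions in one query follows. The field names owner, credits, and debits are hypothetical, and the sketch only builds the query document; it does not contact a server.

```javascript
// Hypothetical field names (owner, credits, debits), used only for illustration.
function buildOverdrawnQuery(userSuppliedOwner) {
  return {
    // User input is matched as plain data and is never parsed as code.
    owner: String(userSuppliedOwner),
    // The JavaScript is a constant written by the developer.
    $where: "this.credits < this.debits",
  };
}

const query = buildOverdrawnQuery('x"; while (true) {} //');
console.log(query.$where); // unchanged by the user input
```

The point of the design is that no part of the $where string is assembled from user input.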
Driver-Specific Issues¶
See the “PHP MongoDB Driver Security Notes” page in the PHP driver documentation for more information.
How does MongoDB provide concurrency?¶
MongoDB implements a readers-writer lock. This means that at any one time, only one client may be writing or any number of clients may be reading, but that reading and writing cannot occur simultaneously.
In standalone and replica sets the lock’s scope
applies to a single mongod
instance or primary
instance. In a sharded cluster, locks apply to each individual shard,
not to the whole cluster.
For more information, see FAQ: Concurrency.
What is the compare order for BSON types?¶
MongoDB permits documents within a single collection to have fields with different BSON types. For instance, one document may store a string in a field while another document in the same collection stores a number in that field.
When comparing values of different BSON types, MongoDB uses the following comparison order, from lowest to highest:
- MinKey (internal type)
- Null
- Numbers (ints, longs, doubles)
- Symbol, String
- Object
- Array
- BinData
- ObjectId
- Boolean
- Date
- Timestamp
- Regular Expression
- MaxKey (internal type)
MongoDB treats some types as equivalent for comparison purposes. For instance, numeric types undergo conversion before comparison.
Changed in version 3.0.0: Date objects sort before Timestamp objects. Previously Date and Timestamp objects sorted together.
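To make the ordering concrete, here is a rough JavaScript sketch that ranks native JavaScript values by their position in the list above. It is not how the server is implemented: it covers only types with an obvious JavaScript counterpart, compares types rather than values within a type, and deliberately leaves out arrays, which follow their own element-based rules.

```javascript
// Rank = position in the comparison order above (1 = MinKey ... 13 = MaxKey).
function bsonTypeRank(value) {
  if (value === null) return 2;               // Null
  if (typeof value === "number") return 3;    // Numbers
  if (typeof value === "string") return 4;    // Symbol, String
  if (typeof value === "boolean") return 9;   // Boolean
  if (value instanceof Date) return 10;       // Date
  if (value instanceof RegExp) return 12;     // Regular Expression
  if (Array.isArray(value)) throw new TypeError("arrays are not covered by this sketch");
  if (typeof value === "object") return 5;    // Object
  throw new TypeError("type not covered by this sketch");
}

// Orders values of different types only; same-type values compare as equal.
function compareAcrossTypes(a, b) {
  return bsonTypeRank(a) - bsonTypeRank(b);
}

const values = [true, "abc", new Date(0), 42, null, { a: 1 }, /x/];
const sorted = values.slice().sort(compareAcrossTypes);
// Order by type: null, number, string, object, boolean, date, regular expression
```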
The comparison treats a non-existent field as it would an empty BSON Object. As such, a sort on the a field in documents { } and { a: null } would treat the documents as equivalent in sort order.
With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
MongoDB sorts BinData
in the following order:
- First, the length or size of the data.
- Then, by the BSON one-byte subtype.
- Finally, by the data, performing a byte-by-byte comparison.
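The three steps above can be sketched as a comparator. The { subtype, bytes } shape is a hypothetical in-memory stand-in for a BinData value, and treating bytes as unsigned is an assumption of this sketch.

```javascript
// Hypothetical stand-in for a BinData value:
// { subtype: <number 0-255>, bytes: <Uint8Array> }.
function compareBinData(a, b) {
  // 1. Shorter data sorts first.
  if (a.bytes.length !== b.bytes.length) return a.bytes.length - b.bytes.length;
  // 2. Then the one-byte subtype.
  if (a.subtype !== b.subtype) return a.subtype - b.subtype;
  // 3. Then the data, byte by byte (compared as unsigned values here).
  for (let i = 0; i < a.bytes.length; i++) {
    if (a.bytes[i] !== b.bytes[i]) return a.bytes[i] - b.bytes[i];
  }
  return 0;
}

const items = [
  { subtype: 0, bytes: Uint8Array.of(1, 2, 3) },
  { subtype: 5, bytes: Uint8Array.of(9) },
  { subtype: 0, bytes: Uint8Array.of(9, 9) },
  { subtype: 0, bytes: Uint8Array.of(1, 1) },
  { subtype: 0, bytes: Uint8Array.of(7) },
];
const ordered = items.slice().sort(compareBinData);
```

Note that length dominates: the one-byte value with subtype 5 sorts before every two-byte value, whatever their subtypes or contents.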
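For example, in the mongo shell, an ascending sort on a field that holds numbers in some documents, a Boolean in another, and a Date in another returns the numbers first, then the Boolean, then the Date.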
The $type operator provides access to BSON type comparison in the MongoDB query syntax. See the documentation on BSON types and the $type operator for additional information.
Warning
Data models that associate a field name with different data types within a collection are strongly discouraged.
The lack of internal consistency complicates application code and can lead to unnecessary complexity for application developers.
When multiplying values of mixed types, what type conversion rules apply?¶
The $mul operator multiplies the numeric value of a field by a number. For multiplication with values of mixed numeric types (32-bit integer, 64-bit integer, float), the following type conversion rules apply:
| | 32-bit Integer | 64-bit Integer | Float |
|---|---|---|---|
| 32-bit Integer | 32-bit or 64-bit Integer | 64-bit Integer | Float |
| 64-bit Integer | 64-bit Integer | 64-bit Integer | Float |
| Float | Float | Float | Float |
Note
- If the product of two 32-bit integers exceeds the maximum value for a 32-bit integer, the result is a 64-bit integer.
- Integer operations of any type that exceed the maximum value for a 64-bit integer produce an error.
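The table and notes can be expressed as a small function. This is a sketch of the rules only, not of the server's code: the tagged-value shape is hypothetical, integers are held as BigInt so 64-bit products stay exact, and checking the lower bound as well as the maximum is an assumption added here.

```javascript
const INT32_MIN = -(2n ** 31n), INT32_MAX = 2n ** 31n - 1n;
const INT64_MIN = -(2n ** 63n), INT64_MAX = 2n ** 63n - 1n;

// Operands are hypothetical tagged values:
// { type: "int32" | "int64" | "double", value }, with BigInt values for integers.
function multiply(a, b) {
  // Any float operand makes the result a float.
  if (a.type === "double" || b.type === "double") {
    return { type: "double", value: Number(a.value) * Number(b.value) };
  }
  const product = a.value * b.value;
  // Integer results outside the 64-bit range are an error.
  if (product > INT64_MAX || product < INT64_MIN) {
    throw new RangeError("integer overflow: product does not fit in 64 bits");
  }
  // Two 32-bit integers stay 32-bit only if the product still fits.
  const fitsInt32 = product >= INT32_MIN && product <= INT32_MAX;
  const type = a.type === "int32" && b.type === "int32" && fitsInt32 ? "int32" : "int64";
  return { type, value: product };
}

const small = multiply({ type: "int32", value: 2n }, { type: "int32", value: 3n });
const promoted = multiply({ type: "int32", value: 2n ** 30n }, { type: "int32", value: 4n });
console.log(small.type, promoted.type); // int32 int64
```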
How do I query for fields that have null values?¶
Different query operators treat null
values differently.
Consider the collection test with two documents: one whose cancelDate field is null, and one that does not contain the cancelDate field.
Comparison with Null¶
The { cancelDate : null } query matches documents that either contain the cancelDate field whose value is null or that do not contain the cancelDate field. If the queried index is sparse, however, then the query will only match null values, not missing fields.
Changed in version 2.6: If using the sparse index results in an incomplete result, MongoDB will not use the index unless a hint() explicitly specifies the index. See Sparse Indexes for more information.
Given the test collection, a query on { cancelDate : null } returns both documents.
Type Check¶
The { cancelDate : { $type: 10 } } query matches only documents that contain the cancelDate field with a value of null; i.e. the value of the cancelDate field is of BSON Type Null (i.e. 10).
The query returns only the document that contains the null value.
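The difference between the two queries can be mimicked in plain JavaScript over in-memory objects. The _id values are assumed for illustration, and the predicates only imitate the matching rules described above (without a sparse index); they are not the server's query engine.

```javascript
// In-memory stand-ins for the two documents in the test collection.
const docs = [{ _id: 1, cancelDate: null }, { _id: 2 }];

// Mimics { cancelDate: null }: matches a null value or a missing field.
function matchesNullComparison(doc) {
  return !("cancelDate" in doc) || doc.cancelDate === null;
}

// Mimics { cancelDate: { $type: 10 } }: matches only a stored null value.
function matchesNullType(doc) {
  return "cancelDate" in doc && doc.cancelDate === null;
}

const byComparison = docs.filter(matchesNullComparison).map((d) => d._id);
const byType = docs.filter(matchesNullType).map((d) => d._id);
console.log(byComparison, byType); // [ 1, 2 ] [ 1 ]
```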
How do I isolate cursors from intervening write operations?¶
With the MMAPv1 storage engine, MongoDB cursors can return the same document more than once in some situations. [1] You can use the snapshot() method on a cursor to isolate the operation for a very specific case. snapshot() traverses the index on the _id field and guarantees that the query will return each document (with respect to the value of the _id field) no more than once. [2]
The snapshot() method does not guarantee that the data returned by the query will reflect a single moment in time, nor does it provide isolation from insert or delete operations.
Warning
- You cannot use snapshot() with sharded collections.
- You cannot use snapshot() with sort() or hint() cursor methods.
As an alternative, if your collection has a field or fields that are never modified, you can use a unique index on this field or these fields to achieve a result similar to that of snapshot(). Query with hint() to explicitly force the query to use that index.
[1] As a cursor returns documents, other operations may interleave with the query. With the MMAPv1 storage engine, if some of these operations are updates that cause the document to move (in the case of a table scan, caused by document growth) or that change the indexed field on the index used by the query, then the cursor will return the same document more than once.
[2] MongoDB does not permit changes to the value of the _id field; it is not possible for a cursor that traverses this index to pass the same document more than once.
When should I embed documents within other documents?¶
When modeling data in MongoDB, embedding is frequently the choice for:
- “contains” relationships between entities.
- one-to-many relationships when the “many” objects always appear with or are viewed in the context of their parents.
You should also consider embedding for performance reasons if you have a collection with a large number of small documents. Nevertheless, if small, separate documents represent the natural model for the data, then you should maintain that model.
If, however, you can group these small documents by some logical relationship and you frequently retrieve the documents by this grouping, you might consider “rolling-up” the small documents into larger documents that contain an array of embedded documents. Keep in mind that if you often only need to retrieve a subset of the documents within the group, then “rolling-up” the documents may not provide better performance.
“Rolling up” these small documents into logical groupings means that queries to retrieve a group of documents involve sequential reads and fewer random disk accesses.
Additionally, “rolling up” documents and moving common fields to the larger document benefit the index on these fields. There would be fewer copies of the common fields and there would be fewer associated key entries in the corresponding index. See Index Concepts for more information on indexes.
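As a rough illustration of “rolling up”, the sketch below groups small documents by a shared key and embeds the remaining fields in an array. The comment and post data, the rollUp function, and the field names are all hypothetical; in practice the grouping would follow whatever logical relationship your queries use.

```javascript
// Hypothetical small documents: one per comment, each repeating the post key.
const comments = [
  { postId: "p1", author: "ann", text: "First" },
  { postId: "p2", author: "bob", text: "Nice" },
  { postId: "p1", author: "cy", text: "Second" },
];

// Group the small documents by a shared key and embed the rest as an array.
function rollUp(docs, groupKey, arrayField) {
  const groups = new Map();
  for (const doc of docs) {
    // Separate the grouping key from the fields that will be embedded.
    const { [groupKey]: key, ...rest } = doc;
    if (!groups.has(key)) groups.set(key, { _id: key, [arrayField]: [] });
    groups.get(key)[arrayField].push(rest);
  }
  return Array.from(groups.values());
}

const posts = rollUp(comments, "postId", "comments");
console.log(posts.length); // 2
```

The common field (postId here) is stored once per rolled-up document instead of once per comment, which is the source of the index savings described above.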
Where can I learn more about data modeling in MongoDB?¶
Begin by reading the documents in the Data Models section. These documents contain a high-level introduction to data modeling considerations in addition to practical examples of data models targeted at particular issues.
Additionally, consider the following external resources that provide additional examples:
- Schema Design by Example
- Dynamic Schema Blog Post
- MongoDB Data Modeling and Rails
- Ruby Example of Materialized Paths
- Sean Cribbs Blog Post, which was the source for much of the Model Tree Structures in MongoDB content.
Can I manually pad documents to prevent moves during updates?¶
Changed in version 3.0.0.
With the MMAPv1 storage engine, an update can cause a document to move on disk if the document grows in size. To minimize document movements, MongoDB uses padding.
You should not have to pad manually because by default, MongoDB uses Power of 2 Sized Allocations to add padding automatically. The Power of 2 Sized Allocations ensures that MongoDB allocates document space in sizes that are powers of 2, which helps ensure that MongoDB can efficiently reuse free space created by document deletion or relocation as well as reduce the occurrences of reallocations in many cases.
However, if you must pad a document manually, you can add a temporary field to the document and then $unset the field.
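A minimal sketch of that insert-then-$unset sequence follows. The function name and paddingField are hypothetical, and collection is assumed to be any object with insert() and update() methods shaped like the mongo shell's collection helpers; a recording stub stands in for it here so the sketch runs without a server.

```javascript
// `collection` is assumed to expose insert() and update() like a shell collection;
// paddingField is a hypothetical name for the throwaway field.
function insertWithPadding(collection, doc, paddingBytes) {
  // Insert the document with a temporary field so it is stored with extra room.
  const padded = Object.assign({}, doc, { paddingField: "a".repeat(paddingBytes) });
  collection.insert(padded);
  // Remove the temporary field, leaving the document itself small again.
  collection.update({ _id: doc._id }, { $unset: { paddingField: "" } });
}

// Stub collection that records calls instead of talking to a server.
const calls = [];
const stubCollection = {
  insert: (d) => calls.push(["insert", d]),
  update: (q, u) => calls.push(["update", q, u]),
};
insertWithPadding(stubCollection, { _id: 5 }, 64);
```

Later updates that grow the document can then reuse the space that the temporary field occupied, which is the effect manual padding is after under MMAPv1.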
Warning
Do not manually pad documents in a capped collection. Applying manual padding to a document in a capped collection can break replication. Also, the padding is not preserved if you re-sync the MongoDB instance.