Bulk Inserts in MongoDB
In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk inserts have some special considerations that are different from other write operations.
Use the insert() Method
The insert() method, when passed an array of documents, performs a bulk insert. Each document is inserted atomically, but the batch as a whole is not atomic. Bulk inserts can significantly increase performance by amortizing write concern costs across the batch.
In the drivers, you can configure write concern for batches rather than on a per-document level.
Drivers have a ContinueOnError option in their insert operation, so that the bulk operation will continue to insert remaining documents in a batch even if an insert fails.
Note
If multiple errors occur during a bulk insert, clients only receive the last error generated.
See also
Driver documentation for details on performing bulk inserts in your application. Also see Import and Export MongoDB Data.
Bulk Inserts on Sharded Clusters
While ContinueOnError is optional on unsharded clusters, all bulk operations to a sharded collection run with ContinueOnError, which cannot be disabled.
Large bulk insert operations, including initial data inserts or routine data import, can affect sharded cluster performance. For bulk inserts, consider the following strategies:
Pre-Split the Collection
If the sharded collection is empty, then the collection has only one initial chunk, which resides on a single shard. MongoDB must then take time to receive data, create splits, and distribute the split chunks to the available shards. To avoid this performance cost, you can pre-split the collection, as described in Split Chunks in a Sharded Cluster.
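As a rough illustration of choosing split points before a load, the sketch below computes evenly spaced boundaries. It assumes a hypothetical shard key that is an unsigned 32-bit integer with roughly uniform values; `even_split_points` is an illustrative helper, not part of MongoDB or its drivers. Each returned value would then be used as a split point by following the procedure in Split Chunks in a Sharded Cluster.

```cpp
#include <cstdint>
#include <vector>

// Return (chunks - 1) evenly spaced split points over the key range [0, 2^32).
// Assumption: the shard key is a uniformly distributed unsigned 32-bit integer.
inline std::vector<std::uint32_t> even_split_points(std::uint32_t chunks) {
    std::vector<std::uint32_t> points;
    if (chunks < 2) {
        return points;  // zero or one chunk needs no split point
    }
    const std::uint64_t range = 1ULL << 32;  // size of the key space
    for (std::uint32_t i = 1; i < chunks; ++i) {
        // 64-bit arithmetic avoids overflow when multiplying range by i.
        points.push_back(static_cast<std::uint32_t>(range * i / chunks));
    }
    return points;
}
```

For example, asking for four chunks yields three boundaries that divide the key space into quarters. Real key distributions are rarely uniform, so split points are better derived from a sample of the data when one is available.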
Insert to Multiple mongos
To parallelize import processes, send insert operations to more than one mongos instance. Pre-split empty collections first as described in Split Chunks in a Sharded Cluster.
Avoid Monotonic Throttling
If your shard key increases monotonically during an insert, then all inserted data goes to the last chunk in the collection, which will always end up on a single shard. Therefore, the insert capacity of the cluster will never exceed the insert capacity of that single shard.
If your insert volume is larger than what a single shard can process, and if you cannot avoid a monotonically increasing shard key, then consider the following modifications to your application:
- Reverse the binary bits of the shard key. This preserves the information and avoids correlating insertion order with increasing sequence of values.
- Swap the first and last 16-bit words to “shuffle” the inserts.
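The first modification can be sketched as follows, assuming the shard key is a 64-bit unsigned integer; `reverse_bits` is a hypothetical helper name, not a MongoDB API. Because the transformation is its own inverse, the original key can always be recovered.

```cpp
#include <cstdint>

// Reverse the bit order of a 64-bit key: bit 0 becomes bit 63, bit 1 becomes
// bit 62, and so on. Applying the function twice returns the original value,
// so no information is lost.
inline std::uint64_t reverse_bits(std::uint64_t key) {
    std::uint64_t out = 0;
    for (int i = 0; i < 64; ++i) {
        out = (out << 1) | (key & 1u);  // move the lowest bit of key into out
        key >>= 1;
    }
    return out;
}
```

With this mapping, consecutive keys such as 1, 2, and 3 become 0x8000000000000000, 0x4000000000000000, and 0xC000000000000000, so sequential inserts land in widely separated parts of the key range instead of the last chunk.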
Example
The following example, in C++, swaps the leading and trailing 16-bit words of generated BSON ObjectIds so that they are no longer monotonically increasing.
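A minimal, self-contained sketch of that swap is shown here. It assumes the ObjectId is available as 12 raw bytes rather than as a driver type; `ObjectIdBytes` and `swap_outer_words` are hypothetical names and do not belong to any MongoDB driver.

```cpp
#include <array>
#include <cstdint>
#include <utility>

// Stand-in for a BSON ObjectId: 12 raw bytes (an assumption for this sketch,
// not the C++ driver's own type).
using ObjectIdBytes = std::array<std::uint8_t, 12>;

// Swap the leading 16-bit word (bytes 0-1) with the trailing 16-bit word
// (bytes 10-11). The leading bytes come from the timestamp and change slowly;
// the trailing bytes come from the counter and change with every new id.
inline ObjectIdBytes swap_outer_words(ObjectIdBytes id) {
    std::swap(id[0], id[10]);
    std::swap(id[1], id[11]);
    return id;
}
```

The swapped value would be used as the `_id` of a document before it is inserted into the sharded collection. Applying the swap a second time restores the original ObjectId.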
See also
Shard Keys for information on choosing a shard key. Also see Shard Key Internals (in particular, Choosing a Shard Key).