To provide durability in the event of a failure, MongoDB uses write ahead logging to on-disk journal files.
Journaling and the WiredTiger Storage Engine¶
The log mentioned in this section refers to the WiredTiger write-ahead log (i.e. the journal) and not the MongoDB log file.
WiredTiger uses checkpoints to provide a consistent view of data on disk and allow MongoDB to recover from the last checkpoint. However, if MongoDB exits unexpectedly in between checkpoints, journaling is required to recover writes that occurred after the last checkpoint.
With journaling, the recovery process:
- Looks in the data files to find the identifier of the last checkpoint.
- Searches in the journal files for the record that matches the identifier of the last checkpoint.
- Applies the operations in the journal files since the last checkpoint.
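The three recovery steps above can be sketched as a small, purely illustrative Python model. The `JournalRecord` class, the `recover` function, and the dictionary standing in for the data files are hypothetical names invented for this sketch; they are not WiredTiger's actual structures.

```python
from dataclasses import dataclass


@dataclass
class JournalRecord:
    """Hypothetical journal record: an identifier plus key/value changes."""
    record_id: int
    changes: dict


def recover(data_files: dict, last_checkpoint_id: int, journal: list) -> dict:
    """Replay journal records written after the last checkpoint.

    `data_files` models the on-disk state as of the last checkpoint.
    """
    # Step 1: the checkpoint identifier is read from the data files
    # (here it is simply passed in as `last_checkpoint_id`).
    # Step 2: find the journal record that matches that identifier.
    ids = [rec.record_id for rec in journal]
    start = ids.index(last_checkpoint_id)
    # Step 3: apply every operation recorded after the checkpoint.
    recovered = dict(data_files)
    for rec in journal[start + 1:]:
        recovered.update(rec.changes)
    return recovered


journal = [
    JournalRecord(1, {"a": 1}),
    JournalRecord(2, {"b": 2}),   # the last checkpoint was taken here
    JournalRecord(3, {"a": 10}),
    JournalRecord(4, {"c": 3}),
]
state = recover({"a": 1, "b": 2}, last_checkpoint_id=2, journal=journal)
print(state)  # {'a': 10, 'b': 2, 'c': 3}
```

Records 3 and 4 were written after the checkpoint, so only they are replayed on top of the checkpointed state.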
Changed in version 3.2.
With journaling, WiredTiger creates one journal record for each client initiated write operation. The journal record includes any internal write operations caused by the initial write. For example, an update to a document in a collection may result in modifications to the indexes; WiredTiger creates a single journal record that includes both the update operation and its associated index modifications.
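As a rough illustration of the "one record per client write" idea, the following Python sketch groups an update and the index modifications it causes into a single record. The `JournalRecord` layout and the `journal_update` helper are assumptions made for this example only and do not reflect the real on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class JournalRecord:
    """Hypothetical record: one client operation plus its internal writes."""
    client_op: str
    internal_ops: list = field(default_factory=list)


def journal_update(doc_id, changes, indexed_fields):
    """Build a single record for an update and its index modifications."""
    record = JournalRecord(client_op=f"update {doc_id}")
    for field_name in changes:
        # A change to an indexed field also modifies that index; the
        # modification is stored in the same record as the update.
        if field_name in indexed_fields:
            record.internal_ops.append(f"index {field_name}")
    return record


record = journal_update(1, {"name": "x", "age": 3}, indexed_fields={"name"})
print(record.internal_ops)  # ['index name']
```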
MongoDB configures WiredTiger to use in-memory buffering for storing the journal records. Threads coordinate to allocate and copy into their portion of the buffer. All journal records up to 128 kB are buffered.
WiredTiger syncs the buffered journal records to disk according to the following intervals or conditions:
- New in version 3.2: Every 50 milliseconds.
- MongoDB sets checkpoints to occur in WiredTiger on user data at an interval of 60 seconds or when 2 GB of journal data has been written, whichever occurs first.
- If the write operation includes a write concern of j: true, WiredTiger forces a sync of the WiredTiger journal files.
- Because MongoDB uses a journal file size limit of 100 MB, WiredTiger creates a new journal file approximately every 100 MB of data. When WiredTiger creates a new journal file, WiredTiger syncs the previous journal file.
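The conditions above can be summarized as two simple predicates. This is a minimal sketch, not MongoDB code: the function names are hypothetical, and treating "2 GB" and "100 MB" as binary units is an assumption.

```python
SYNC_INTERVAL_MS = 50                      # periodic journal sync
CHECKPOINT_INTERVAL_S = 60                 # checkpoint interval
CHECKPOINT_JOURNAL_BYTES = 2 * 1024**3     # 2 GB of journal data (assumed binary)
JOURNAL_FILE_LIMIT_BYTES = 100 * 1024**2   # ~100 MB per journal file (assumed binary)


def journal_sync_reasons(ms_since_sync, write_concern_j, current_file_bytes):
    """Return the reasons, if any, that would trigger a journal sync."""
    reasons = []
    if ms_since_sync >= SYNC_INTERVAL_MS:
        reasons.append("interval")
    if write_concern_j:
        reasons.append("j:true")
    if current_file_bytes >= JOURNAL_FILE_LIMIT_BYTES:
        # Rolling over to a new journal file syncs the previous one.
        reasons.append("new journal file")
    return reasons


def checkpoint_due(s_since_checkpoint, journal_bytes_since_checkpoint):
    """A checkpoint is due after 60 s or 2 GB of journal data, whichever is first."""
    return (s_since_checkpoint >= CHECKPOINT_INTERVAL_S
            or journal_bytes_since_checkpoint >= CHECKPOINT_JOURNAL_BYTES)


print(journal_sync_reasons(10, True, 0))   # ['j:true']
print(checkpoint_due(30, 2 * 1024**3))     # True
```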
In between write operations, while the journal records remain in the WiredTiger buffers, updates can be lost following a hard shutdown.
The serverStatus command returns information on the WiredTiger journal statistics.
For the journal files, MongoDB creates a subdirectory under the dbPath directory. WiredTiger journal file names include a <sequence> component, which is a zero-padded number.
Journal files contain one record for each write operation. Each record has a unique identifier.
MongoDB configures WiredTiger to use snappy compression for the journaling data.
The minimum log record size for WiredTiger is 128 bytes. If a log record is 128 bytes or smaller, WiredTiger does not compress that record.
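The size threshold can be illustrated with a short sketch. Note the substitution: `zlib` stands in for snappy here because snappy is not part of the Python standard library, and `encode_record` is a hypothetical helper, not a WiredTiger function.

```python
import zlib

MIN_LOG_RECORD_BYTES = 128  # threshold taken from the text


def encode_record(payload: bytes):
    """Return (was_compressed, stored_bytes) following the 128-byte rule.

    zlib is used only as a stand-in for snappy.
    """
    if len(payload) <= MIN_LOG_RECORD_BYTES:
        # Records of 128 bytes or fewer are stored uncompressed.
        return False, payload
    return True, zlib.compress(payload)


small = encode_record(b"x" * 128)
large = encode_record(b"x" * 129)
print(small[0], large[0])  # False True
```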
WiredTiger journal files for MongoDB have a maximum size limit of approximately 100 MB. Once the file exceeds that limit, WiredTiger creates a new journal file.
WiredTiger automatically removes old journal files to maintain only the files needed to recover from the last checkpoint.
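A toy model of file rollover and cleanup might look like the following. The `JournalDirectory` class is hypothetical, and it makes a simplifying assumption that is not stated in the text: after a checkpoint, only the current journal file is still needed for recovery.

```python
JOURNAL_FILE_LIMIT = 100 * 1024 * 1024  # ~100 MB, from the text (assumed binary)


class JournalDirectory:
    """Hypothetical model of a journal directory."""

    def __init__(self):
        self.files = {1: 0}   # sequence number -> bytes written
        self.current = 1

    def append(self, nbytes):
        self.files[self.current] += nbytes
        # Once the file exceeds the limit, start a new journal file.
        if self.files[self.current] > JOURNAL_FILE_LIMIT:
            self.current += 1
            self.files[self.current] = 0

    def checkpoint(self):
        # Simplifying assumption: files older than the current one hold
        # only records from before the checkpoint and can be removed.
        for seq in [s for s in self.files if s < self.current]:
            del self.files[seq]


jd = JournalDirectory()
for _ in range(5):
    jd.append(60 * 1024 * 1024)   # five writes of 60 MB each
before = sorted(jd.files)
jd.checkpoint()
after = sorted(jd.files)
print(before, after)  # [1, 2, 3] [3]
```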
WiredTiger will pre-allocate journal files.
Journaling and the MMAPv1 Storage Engine¶
With MMAPv1, when a write operation occurs, MongoDB updates the in-memory view. With journaling enabled, MongoDB writes the in-memory changes first to on-disk journal files. If MongoDB should terminate or encounter an error before committing the changes to the data files, MongoDB can use the journal files to apply the write operation to the data files and maintain a consistent state.
With journaling, MongoDB’s storage layer has two internal views of the data set: the private view, used to write to the journal files, and the shared view, used to write to the data files:
- MongoDB first applies write operations to the private view.
- MongoDB then applies the changes in the private view to the on-disk journal files in the journal directory roughly every 100 milliseconds. MongoDB records the write operations to the on-disk journal files in batches called group commits. Grouping the commits helps minimize the performance impact of journaling, since these commits must block all writers during the commit. Writes to the journal are atomic, ensuring the consistency of the on-disk journal files.
- Upon a journal commit, MongoDB applies the changes from the journal to the shared view.
- Finally, MongoDB applies the changes in the shared view to the data files. More precisely, at default intervals of 60 seconds, MongoDB asks the operating system to flush the shared view to the data files. The operating system may choose to flush the shared view to disk more often than every 60 seconds, particularly if the system is low on free memory. The interval for writing to the data files is configurable.
If the mongod instance were to crash without having applied the writes to the data files, the journal could replay the writes to the shared view for eventual write to the data files.
When MongoDB flushes write operations to the data files, MongoDB notes which journal writes have been flushed. Once a journal file contains only flushed writes, it is no longer needed for recovery and MongoDB can recycle it for a new journal file.
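The four steps and the crash case described above can be modeled with a small in-memory simulation. This is a sketch under stated assumptions: the `MMAPv1Sketch` class and its method names are invented for illustration, dictionaries stand in for the memory-mapped views and data files, and timers are replaced by explicit method calls.

```python
class MMAPv1Sketch:
    """Hypothetical model of the private view, shared view, and journal."""

    def __init__(self):
        self.private_view = {}
        self.shared_view = {}
        self.data_files = {}
        self.journal = []      # on-disk journal: a list of group commits
        self._pending = []     # writes not yet journaled

    def write(self, key, value):
        # Step 1: apply the write to the private view.
        self.private_view[key] = value
        self._pending.append((key, value))

    def group_commit(self):
        # Step 2: write pending changes to the journal as one batch
        # (roughly every 100 ms in the text).
        if not self._pending:
            return
        batch = list(self._pending)
        self.journal.append(batch)
        self._pending.clear()
        # Step 3: apply the journaled changes to the shared view.
        for key, value in batch:
            self.shared_view[key] = value

    def flush(self):
        # Step 4: flush the shared view to the data files (every 60 s by
        # default); journaled writes are now flushed, so the journal
        # space can be recycled.
        self.data_files.update(self.shared_view)
        self.journal.clear()

    def crash_and_recover(self):
        # In-memory state is lost; replay the journal into the shared view.
        self._pending = []
        self.shared_view = dict(self.data_files)
        for batch in self.journal:
            for key, value in batch:
                self.shared_view[key] = value
        self.private_view = dict(self.shared_view)


db = MMAPv1Sketch()
db.write("a", 1)
db.group_commit()
db.flush()
db.write("b", 2)
db.group_commit()       # journaled but not yet flushed
db.write("c", 3)        # never journaled
db.crash_and_recover()
print(db.shared_view)   # {'a': 1, 'b': 2}
```

In this model the write to `b` survives the crash because it reached the journal, while the write to `c` was still waiting for the next group commit and is not recovered.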
Once the journal operations have been applied to the shared view and flushed to disk (i.e. pages in the shared view and private view are in sync), MongoDB asks the operating system to remap the shared view to the private view in order to save physical RAM. MongoDB makes this request routinely. Upon a new remapping, the operating system knows that physical memory pages can be shared between the shared view and the private view mappings.
The interaction between the shared view and the on-disk data files is similar to how MongoDB works without journaling. Without journaling, MongoDB asks the operating system to flush in-memory changes to the data files every 60 seconds.
With journaling enabled, MongoDB creates a subdirectory named journal under the dbPath directory. The journal directory contains journal files, whose names include a <sequence> integer starting from 0, and a “last sequence number” file, lsn.
Journal files contain the write ahead logs; each journal entry describes the bytes the write operation changed in the data files. Journal files are append-only files. When a journal file holds 1 gigabyte of data, MongoDB creates a new journal file. If you use the storage.smallFiles option, you limit the size of each journal file to 128 megabytes.
The lsn file contains the last time MongoDB flushed the changes to the data files.
Once MongoDB applies all the write operations in a particular journal file to the data files, MongoDB can recycle it for a new journal file.
Unless you write many bytes of data per second, the journal directory should contain only two or three journal files.
A clean shutdown removes all the files in the journal directory. A dirty shutdown (crash) leaves files in the journal directory; these are used to automatically recover the database to a consistent state when the mongod process is restarted.
To speed the frequent sequential writes that occur to the current journal file, you can ensure that the journal directory is on a different filesystem from the database data files.
If you place the journal on a different filesystem from your data files, you cannot use a filesystem snapshot alone to capture valid backups of a dbPath directory. In this case, use fsyncLock() to ensure that database files are consistent before the snapshot and fsyncUnlock() once the snapshot is complete.
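One way to make sure the unlock always follows the lock is to wrap the pair in a context manager. The sketch below assumes a pymongo-style client object and uses the `fsync` (with `lock: true`) and `fsyncUnlock` database commands, which correspond to the fsyncLock() and fsyncUnlock() shell helpers; `locked_for_snapshot` is a hypothetical helper name, and the snapshot step itself is left to the caller.

```python
from contextlib import contextmanager


@contextmanager
def locked_for_snapshot(client):
    """Hold the fsync lock while the caller takes a filesystem snapshot.

    `client` is assumed to behave like a pymongo.MongoClient.
    """
    # Flush pending writes and block further writes (fsyncLock equivalent).
    client.admin.command("fsync", lock=True)
    try:
        yield
    finally:
        # Always release the lock, even if the snapshot step fails.
        client.admin.command("fsyncUnlock")
```

A caller would place the snapshot command for its own filesystem inside the `with locked_for_snapshot(client):` block.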
MongoDB may preallocate journal files if it determines that it is more efficient to preallocate journal files than to create new journal files as needed.
Depending on your filesystem, you might experience a preallocation lag the first time you start a mongod instance with journaling enabled. Preallocating the files might take several minutes; during this time, you will not be able to connect to the database. This is a one-time preallocation and does not recur.
To avoid preallocation lag, see Avoid Preallocation Lag for MMAPv1.
Journaling and the In-Memory Storage Engine¶
Starting in MongoDB Enterprise version 3.2.6, the In-Memory
Storage Engine is part of general availability (GA).
Because its data is kept in memory, there is no separate journal. Write operations with a write concern of j: true are immediately acknowledged.
If any voting member of a replica set runs without journaling (i.e. either runs an in-memory storage engine or runs with journaling disabled), you must set writeConcernMajorityJournalDefault to false. With writeConcernMajorityJournalDefault set to false, MongoDB will not wait for w: "majority" writes to be durable before acknowledging the writes. As such, write operations could possibly roll back in the event of a loss of a replica set member.