GridFS

GridFS is a specification for storing and retrieving files that exceed the BSON document size limit of 16 MB.

Instead of storing a file in a single document, GridFS divides the file into parts, or chunks [1], and stores each chunk as a separate document. By default, GridFS uses a chunk size of 255 KB; that is, GridFS divides a file into chunks of 255 KB each, except for the last chunk, which is only as large as necessary. Similarly, a file that is no larger than the chunk size has only a final chunk, which uses only as much space as needed plus some additional metadata.
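The chunking arithmetic described above can be sketched in plain JavaScript. This is only an illustration, not driver code: the helper name chunkLayout is hypothetical, 255 KB is taken as 255 * 1024 bytes, and the sketch assumes an empty file produces no chunks.

```javascript
// Default chunk size from the text above, taken here as 255 * 1024 bytes.
const DEFAULT_CHUNK_SIZE = 255 * 1024;

// Hypothetical helper (not part of any driver): reports how a file of
// `fileLength` bytes would be divided into chunks of `chunkSize` bytes.
function chunkLayout(fileLength, chunkSize = DEFAULT_CHUNK_SIZE) {
  if (!Number.isInteger(fileLength) || fileLength < 0) {
    throw new RangeError("fileLength must be a non-negative integer");
  }
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  // Every chunk is full except possibly the last one.
  const chunkCount = Math.ceil(fileLength / chunkSize);
  // Assumption in this sketch: an empty file has no chunks at all.
  const lastChunkSize =
    chunkCount === 0 ? 0 : fileLength - (chunkCount - 1) * chunkSize;
  return { chunkCount, lastChunkSize };
}

console.log(chunkLayout(1000000));
console.log(chunkLayout(100));
```

Under these assumptions, a 1,000,000-byte file occupies four chunks, the last of which holds 216,640 bytes, while a 100-byte file occupies a single 100-byte chunk.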

GridFS uses two collections to store files. One collection stores the file chunks, and the other stores file metadata. For more information, refer to GridFS Collections.

When you query a GridFS store for a file, the driver or client reassembles the chunks as needed. You can perform range queries on files stored through GridFS. You can also access information from arbitrary sections of files, which allows you to “skip” into the middle of a video or audio file.
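To see why skipping into the middle of a file does not require reading the whole file, the following sketch maps a byte range onto chunk sequence numbers. The helper name chunksForRange is hypothetical, the range is assumed to be half-open ([start, end)), and the default chunk size is taken as 255 * 1024 bytes; real drivers expose their own range or seek options.

```javascript
// Hypothetical helper (not a driver API): finds which chunk sequence
// numbers cover the byte range [start, end) of a stored file.
function chunksForRange(start, end, chunkSize = 255 * 1024) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end <= start) {
    throw new RangeError("expected integers with 0 <= start < end");
  }
  const firstN = Math.floor(start / chunkSize);     // chunk holding the first byte
  const lastN = Math.floor((end - 1) / chunkSize);  // chunk holding the last byte
  const offsetInFirst = start % chunkSize;          // bytes to skip inside the first chunk
  return { firstN, lastN, offsetInFirst };
}

// Only the chunks from firstN through lastN need to be fetched, for example
// with a shell query of this shape (myFileID is a placeholder):
//   db.fs.chunks.find({ files_id: myFileID, n: { $gte: firstN, $lte: lastN } }).sort({ n: 1 })
console.log(chunksForRange(600000, 700000));
```

Here, bytes 600,000 through 699,999 all fall inside chunk 2, starting 77,760 bytes into that chunk, so a reader would need only one chunk document.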

GridFS is useful not only for storing files that exceed 16 MB but also for storing any files that you want to access without having to load the entire file into memory. For more information on when to use GridFS, see When should I use GridFS?

[1]	The use of the term chunks in the context of GridFS is not related to the use of the term chunks in the context of sharding.

Changed in version 2.4.10: The default chunk size changed from 256 KB to 255 KB.

Implement GridFS

To store and retrieve files using GridFS, use either of the following:

A MongoDB driver. See the drivers documentation for information on using GridFS with your driver.
The mongofiles command-line tool, which you run from the system command line rather than from within the mongo shell. See the mongofiles reference for complete documentation.

GridFS Collections

GridFS stores files in two collections:

chunks stores the binary chunks. For details, see The chunks Collection.
files stores the file’s metadata. For details, see The files Collection.

GridFS places the collections in a common bucket by prefixing each with the bucket name. By default, GridFS uses two collections in a bucket named fs:

fs.files
fs.chunks

You can choose a bucket name other than fs, and you can create multiple buckets in a single database.
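The naming rule above is simple enough to sketch directly. The helper bucketCollectionNames and the bucket name photos are hypothetical and serve only to illustrate how a bucket name becomes a prefix for the two collections.

```javascript
// Hypothetical helper: derives the two collection names for a bucket by
// prefixing "files" and "chunks" with the bucket name, as described above.
function bucketCollectionNames(bucketName = "fs") {
  if (typeof bucketName !== "string" || bucketName.length === 0) {
    throw new TypeError("bucketName must be a non-empty string");
  }
  return {
    files: `${bucketName}.files`,   // file metadata
    chunks: `${bucketName}.chunks`, // binary chunks
  };
}

console.log(bucketCollectionNames());         // default bucket
console.log(bucketCollectionNames("photos")); // a second, custom bucket
```

Because each bucket has its own pair of collections, several buckets can coexist in one database without their files or chunks mixing.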

Each document in the chunks collection represents a distinct chunk of a file stored in GridFS. Each chunk is identified by the unique ObjectId stored in its _id field.

For descriptions of all fields in the chunks and files collections, see GridFS Reference.

GridFS Index

GridFS uses a unique, compound index on the chunks collection for the files_id and n fields. The files_id field contains the _id of the chunk’s “parent” document in the files collection. The n field contains the sequence number of the chunk. GridFS numbers all chunks, starting with 0. For descriptions of the documents and fields in the chunks collection, see GridFS Reference.

The GridFS index allows efficient retrieval of chunks using the files_id and n values, as shown in the following example:

cursor = db.fs.chunks.find({files_id: myFileID}).sort({n:1});
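Once the chunks of a file have been fetched, they are concatenated in order of n to rebuild the file. The following is a minimal in-memory sketch of that step and not a driver implementation: the function reassemble and the sample documents are hypothetical stand-ins for documents in fs.chunks, and the data field is modeled as a Uint8Array.

```javascript
// Hypothetical helper: rebuilds a file's bytes from its chunk documents.
function reassemble(chunks) {
  // Order by sequence number, as the sort({ n: 1 }) query does.
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  // Sequence numbers start at 0 and must be contiguous; the unique
  // { files_id, n } index rules out duplicates within one file.
  ordered.forEach((chunk, i) => {
    if (chunk.n !== i) {
      throw new Error(`missing or duplicate chunk at n=${i}`);
    }
  });
  const total = ordered.reduce((sum, chunk) => sum + chunk.data.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of ordered) {
    out.set(chunk.data, offset);
    offset += chunk.data.length;
  }
  return out;
}

// Sample chunk documents, deliberately out of order.
const encoder = new TextEncoder();
const sampleChunks = [
  { files_id: 1, n: 2, data: encoder.encode("FS") },
  { files_id: 1, n: 0, data: encoder.encode("Gr") },
  { files_id: 1, n: 1, data: encoder.encode("id") },
];
const text = new TextDecoder().decode(reassemble(sampleChunks));
console.log(text); // prints "GridFS"
```

The contiguity check is a design choice of this sketch: it fails loudly when a chunk is missing instead of silently returning a truncated file.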

See the relevant driver documentation for the specific behavior of your GridFS application. If your driver does not create this index, issue the following operation using the mongo shell:

db.fs.chunks.createIndex( { files_id: 1, n: 1 }, { unique: true } );
