
$out

$out takes the documents returned by the aggregation pipeline and writes them to a specified collection. The $out stage must be the last stage in the aggregation pipeline. In Atlas Data Lake, $out can write either to an S3 bucket on which you have read and write permissions, or to an Atlas cluster namespace.
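For context, here is a minimal sketch of a complete pipeline that ends in $out, as you might run it from mongosh. The sales collection and its status field are hypothetical; the Atlas target reuses the names from the Atlas example later on this page:

db.sales.aggregate([
  // Earlier stages shape the output documents as usual.
  { "$match": { "status": "complete" } },
  // $out must be the last stage in the pipeline.
  { "$out": {
      "atlas": {
        "clusterName": "myTestCluster",
        "db": "sampleDB",
        "coll": "mySampleData"
      }
  } }
])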

Permissions Required

You must be a database user with one of the following roles:

Syntax

{
  "$out": {
    "s3": {
      "bucket": "<bucket-name>",
      "region": "<aws-region>",
      "filename": "<file-name>",
      "format": {
        "name": "json|json.gz|bson|bson.gz",
        "maxFileSize": "<file-size>"
      }
    }
  }
}
{
  "$out": {
    "atlas": {
      "projectId": "<atlas-project-ID>",
      "clusterName": "<atlas-cluster-name>",
      "db": "<atlas-database-name>",
      "coll": "<atlas-collection-name>"
    }
  }
}

Fields

The following fields apply to the S3 syntax:

s3 (object, Required)
    Location to write the documents from the aggregation pipeline.

s3.bucket (string, Required)
    Name of the S3 bucket to write the documents from the aggregation pipeline to.

    Important: The generated call to S3 inserts a / between s3.bucket and s3.filename. Don't append a / after the s3.bucket.

    Example: If you set s3.bucket to myBucket and s3.filename to myPath/myData, Atlas Data Lake writes the output as s3://myBucket/myPath/myData.

s3.region (string, Required)
    Name of the AWS region in which the bucket is hosted.

s3.filename (string, Required)
    Name of the file to write the documents from the aggregation pipeline to. The filename can be a constant, or it can be created dynamically from the fields in the documents that reach the $out stage. Any filename expression you provide must evaluate to a string data type. If any files on S3 have the same name and path as the newly generated files, $out overwrites the existing files with the newly generated files.

    Important: The generated call to S3 inserts a / between s3.bucket and s3.filename. Don't prepend a / before the s3.filename.

    Example: If you set s3.bucket to myBucket and s3.filename to myPath/myData, Atlas Data Lake writes the output as s3://myBucket/myPath/myData.

s3.format (object, Required)
    Details of the file in S3.

s3.format.name (enum, Required)
    Format of the file in S3. Value can be one of the following:
      • json
      • json.gz
      • bson
      • bson.gz

s3.format.maxFileSize (bytes, Optional)
    Maximum size of the file in S3. When the file size limit for the current file is reached, a new file is created in S3. The first file appends a 1 before the filename extension; for each subsequent file, Atlas Data Lake increments the appended number by one. For example:

    <filename>.1.<fileformat>
    <filename>.2.<fileformat>

    If a document is larger than maxFileSize, Data Lake writes the document to its own file. The following suffixes are supported:

      • Base 10, scaling in multiples of 1000: B, KB, MB, GB, TB, PB
      • Base 2, scaling in multiples of 1024: KiB, MiB, GiB, TiB, PiB

    If omitted, defaults to 200MiB.
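As an illustration of the splitting behavior described for s3.format.maxFileSize, the following sketch (the bucket name and filename here are hypothetical) caps each output file at 1GB using a Base-10 suffix:

{
  "$out": {
    "s3": {
      "bucket": "my-s3-bucket",
      "region": "us-east-1",
      "filename": "inventory",
      "format": {
        "name": "json",
        "maxFileSize": "1GB"
      }
    }
  }
}

If the pipeline returns more than 1GB of output, Data Lake writes inventory.1.json, then inventory.2.json, and so on, incrementing the counter for each new file.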
The following fields apply to the Atlas cluster syntax:

atlas (object, Required)
    Location to write the documents from the aggregation pipeline.

clusterName (string, Required)
    Name of the Atlas cluster.

coll (string, Required)
    Name of the collection on the Atlas cluster.

db (string, Required)
    Name of the database on the Atlas cluster that contains the collection.

projectId (string, Optional)
    Unique identifier of the project that contains the Atlas cluster. The project must be the one that contains your Data Lake. If omitted, defaults to the ID of the project that contains your Data Lake.

Examples

The following examples show the $out syntax for creating a filename from a constant string, and for creating filenames dynamically from document fields of the same or of different data types.

Simple String Example

example

You want to write 1 GiB of data as compressed BSON files to an S3 bucket named my-s3-bucket.

Using the following $out syntax:

{
  "$out": {
    "s3": {
      "bucket": "my-s3-bucket",
      "region": "us-east-1",
      "filename": "big_box_store/",
      "format": {
        "name": "bson.gz"
      }
    }
  }
}

$out writes five compressed BSON files:

  1. The first 200 MiB of data to a file that $out names big_box_store/1.bson.gz.

    note

    • The value of s3.filename serves as a constant in each filename. This value doesn't depend upon any document field or value.
    • Your s3.filename ends with a delimiter, so Atlas Data Lake appends the counter after the constant.
    • If it didn't end with a delimiter, Atlas Data Lake would have added a . between the constant and the counter, like big_box_store.1.bson.gz
    • As you didn't change the maximum file size using s3.format.maxFileSize, Atlas Data Lake uses the default value of 200 MiB.
  2. The second 200 MiB of data to a new file that $out names big_box_store/2.bson.gz.
  3. Three more files that $out names big_box_store/3.bson.gz through big_box_store/5.bson.gz.

Single Field from Documents

example

You want to write 90 MiB of data as JSON files to an S3 bucket named my-s3-bucket.

Using the following $out syntax:

{
  "$out": {
    "s3": {
      "bucket": "my-s3-bucket",
      "region": "us-east-1",
      "filename": { "$toString": "$sale-date" },
      "format": {
        "name": "json",
        "maxFileSize": "100MiB"
      }
    }
  }
}

$out writes 90 MiB of data to JSON files in the root of the bucket. Each JSON file contains all of the documents with the same sale-date value. $out names each file using the documents' sale-date value converted to a string.
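For instance, assuming the documents carry string sale-date values such as 2019-01-01 and 2019-01-02, and applying the counter convention described under s3.format.maxFileSize, the files would land at paths like:

s3://my-s3-bucket/2019-01-01.1.json
s3://my-s3-bucket/2019-01-02.1.json

Because the filename expression doesn't end with a delimiter, Data Lake inserts a . between the filename and the counter.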

Multiple Fields from Documents

example

You want to write 176 MiB of data as BSON files to an S3 bucket named my-s3-bucket.

Using the following $out syntax:

{
  "$out": {
    "s3": {
      "bucket": "my-s3-bucket",
      "region": "us-east-1",
      "filename": {
        "$concat": [
          "persons/", "$name", "/", "$unique-id", "/"
        ]
      },
      "format": {
        "name": "bson",
        "maxFileSize": "200MiB"
      }
    }
  }
}

$out writes 176 MiB of data to BSON files. To name each file, $out concatenates:

  • A constant string persons/ and, from the documents:

    • The string value of the name field,
    • A forward slash (/),
    • The string value of the unique-id field, and
    • A forward slash (/).

Each BSON file contains all of the documents with the same name and unique-id values. $out names each file using the documents' name and unique-id values.
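For instance, assuming a hypothetical document with a name of carlos and a unique-id of au54f, the concatenated filename is persons/carlos/au54f/; because it ends with a delimiter, the counter is appended directly, giving a path like:

s3://my-s3-bucket/persons/carlos/au54f/1.bson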

Multiple Types of Fields from Documents

example

You want to write 154 MiB of data as compressed JSON files to an S3 bucket named my-s3-bucket.

Consider the following $out syntax:

{
  "$out": {
    "s3": {
      "bucket": "my-s3-bucket",
      "region": "us-east-1",
      "filename": {
        "$concat": [
          "big-box-store/",
          { "$toString": "$store-number" },
          "/",
          { "$toString": "$sale-date" },
          "/",
          "$part-id",
          "/"
        ]
      },
      "format": {
        "name": "json.gz",
        "maxFileSize": "200MiB"
      }
    }
  }
}

$out writes 154 MiB of data to compressed JSON files, where each file contains all documents with the same store-number, sale-date, and part-id values. To name each file, $out concatenates:

  • A constant string value of big-box-store/,
  • A string value of a unique store number in the store-number field,
  • A forward slash (/),
  • A string value of the date from the sale-date field,
  • A forward slash (/),
  • A string value of part ID from the part-id field, and
  • A forward slash (/).

example

This $out syntax sends the aggregated data to a sampleDB.mySampleData collection in the Atlas cluster named myTestCluster. The syntax doesn't specify a project ID; $out uses the ID of the project that contains your Data Lake.

{
  "$out": {
    "atlas": {
      "clusterName": "myTestCluster",
      "db": "sampleDB",
      "coll": "mySampleData"
    }
  }
}

Limitations

Data Lake interprets empty strings ("") as null values when parsing filenames. If you want Data Lake to generate parseable filenames, wrap the field references that could have null values using $convert with an empty string onNull value.

example

This example shows how to handle null values in the year field when creating a filename from the field value.

{
  "$out": {
    "s3": {
      "bucket": "my-s3-bucket",
      "region": "us-east-1",
      "filename": {
        "$concat": [
          "big-box-store/",
          {
            "$convert": {
              "input": "$year",
              "to": "string",
              "onNull": ""
            }
          },
          "/"
        ]
      },
      "format": {
        "name": "json.gz",
        "maxFileSize": "200MiB"
      }
    }
  }
}

Errors

  • If the filename does not evaluate to a string, Data Lake writes the documents to a special error file in your bucket.
  • If the documents cannot be written to a file with the specified filename, Data Lake writes them to ordinally named files in the specified format and of the specified size.

    example

    • s3://<bucket-name>/atlas-data-lake-{<CORRELATION_ID>}/$out-error-docs/1.json
    • s3://<bucket-name>/atlas-data-lake-{<CORRELATION_ID>}/$out-error-docs/2.json

    Data Lake returns an error message that specifies the number of documents that had invalid filenames and the directory where these documents were written.
