Supported Aggregation Pipeline Stages and Operators

This page describes the MongoDB aggregation pipeline stages and operators that Atlas Data Lake supports.

Note

By default, Atlas Data Lake does not return documents in any specific order for queries on Data Lakes for S3 data stores. Atlas Data Lake reads the partitions concurrently, so the order in which the underlying storage responds determines which documents Atlas Data Lake returns first, unless you define an order using $sort in your query. For example, if you run the same findOne() query twice, you might see different documents. Likewise, if you use $skip without $sort, different documents might be skipped on each run.
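As an illustration of the note above, here is a minimal sketch of a pipeline that fixes the order before skipping, so that repeated runs page through the same documents. The collection name `events` and the field name `timestamp` are hypothetical, and using `_id` as a tiebreaker assumes it is unique in your data.

```javascript
// Hypothetical page size and page number, for illustration only.
const pageSize = 10;
const pageNumber = 3;

// $sort comes first so that $skip and $limit operate on a defined order.
// The field names "timestamp" and "_id" are assumptions, not from the source.
const pagedPipeline = [
  { $sort: { timestamp: 1, _id: 1 } },
  { $skip: (pageNumber - 1) * pageSize },
  { $limit: pageSize },
];

// In mongosh, connected to a Data Lake, this could be run as:
//   db.events.aggregate(pagedPipeline)
```

Sorting on a second, unique field is a common way to keep the order stable when several documents share the same value in the first sort field.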

Atlas Data Lake supports all the aggregation pipeline stages except the following:

For the following stages, Atlas Data Lake introduces an alternate syntax, includes a caveat, or deviates from the behavior of MongoDB Server. See the Description column for details.

Pipeline Stage
Description

$geoNear
Outputs documents in order of nearest to farthest from a specified point. Atlas Data Lake supports $geoNear in queries on Data Lake collections that are mapped to one or more Atlas collections. Atlas Data Lake doesn't support $geoNear for S3 or HTTP data stores.

See Querying Data in Your Atlas Cluster for more information.

$graphLookup
Performs a recursive search on a collection. Atlas Data Lake supports $graphLookup in queries on Data Lake collections that are mapped to one Atlas collection only. Atlas Data Lake doesn't support $graphLookup for:

  • S3 or HTTP data stores.
  • Queries on Data Lake collections that are mapped to multiple Atlas collections.

See Querying Data in Your Atlas Cluster for more information.

$lookup
Performs a left outer join to a collection in the same database. Atlas Data Lake also provides syntax for joining collections from different databases. See $lookup for more information.

$match
Filters the documents to pass only those that match the specified condition(s) to the next pipeline stage. Atlas Data Lake supports $match. Note that partition attributes for selecting specific files on S3 are optimized only for the following aggregation pipeline operators: $eq, $gt, $lt, $gte, $lte, $ne, $and, $or.

$out
Takes the documents returned by the aggregation pipeline and writes them to a specified collection. Atlas Data Lake provides alternate syntax for writing to S3 and to an Atlas cluster. See $out for more information.

$sample
Randomly selects the specified number of documents from its input. Atlas Data Lake supports $sample, but it does not provide a truly random sample; it returns the first set of documents that it finds.

$skip
Skips over the specified number of documents that pass into the stage and passes the remaining documents to the next stage in the pipeline. Atlas Data Lake supports $skip, but this does not reduce the amount of data scanned because Data Lake accesses all partitions that correspond to your query.
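The $match caveat on partition attributes can be checked before a query is sent. The sketch below uses a hypothetical helper, `nonOptimizedOperators`, that walks a filter document and reports any explicit operator outside the listed set. It is not part of Atlas Data Lake or any driver, it looks only at keys that start with `$`, and the field names in the sample filters are assumptions.

```javascript
// Operators the text lists as optimized for selecting files on S3.
const PARTITION_OPTIMIZED = new Set([
  "$eq", "$gt", "$lt", "$gte", "$lte", "$ne", "$and", "$or",
]);

// Hypothetical helper: walks a $match filter and returns every explicit
// operator (a key starting with "$") that is not in the set above.
function nonOptimizedOperators(filter) {
  const found = new Set();
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
    } else if (node !== null && typeof node === "object") {
      for (const [key, value] of Object.entries(node)) {
        if (key.startsWith("$") && !PARTITION_OPTIMIZED.has(key)) {
          found.add(key);
        }
        visit(value);
      }
    }
  };
  visit(filter);
  return [...found];
}

// Sample filters; "year", "month", and "status" are assumed field names.
const rangeFilter = {
  $and: [{ year: { $eq: 2020 } }, { month: { $gte: 1, $lt: 7 } }],
};
const inFilter = { status: { $in: ["open", "pending"] } };

console.log(nonOptimizedOperators(rangeFilter)); // []
console.log(nonOptimizedOperators(inFilter)); // [ '$in' ]
```

A filter that the helper flags is still valid in $match. According to the text, it just may not benefit from the partition optimization on S3.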

Atlas Data Lake supports all the aggregation pipeline operators. However, it supports the geospatial query operators and the following evaluation query operators only in queries on collections that are mapped to an Atlas cluster data store.

Note

Atlas Data Lake doesn't include a server-side JavaScript engine, so it doesn't support operators such as $where, $function, and $accumulator, which require server-side scripting to be enabled.

Operator
Description

$text
Performs a text search on the content of the fields indexed with a text index.

$where
Passes either a string containing a JavaScript expression or a full JavaScript function to the query system.
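Because of the note above on server-side JavaScript, it can be useful to scan a pipeline for such operators before running it. The following is a minimal sketch with a hypothetical helper, `findJavaScriptOperators`. The operator list covers only the three named in the note, which says "such as" and so may not be exhaustive. The field names in the sample pipeline are assumptions.

```javascript
// Operators the note names as requiring server-side JavaScript.
// The note says "such as", so this set may be incomplete.
const JS_OPERATORS = new Set(["$where", "$function", "$accumulator"]);

// Hypothetical helper: returns the JavaScript-dependent operators found
// anywhere in a pipeline (an array of stage documents), sorted by name.
function findJavaScriptOperators(pipeline) {
  const found = new Set();
  const visit = (node) => {
    if (Array.isArray(node)) {
      node.forEach(visit);
    } else if (node !== null && typeof node === "object") {
      for (const [key, value] of Object.entries(node)) {
        if (JS_OPERATORS.has(key)) {
          found.add(key);
        }
        visit(value);
      }
    }
  };
  visit(pipeline);
  return [...found].sort();
}

// Sample pipeline; "status", "price", and "tax" are assumed field names.
const suspectPipeline = [
  { $match: { status: "active" } },
  {
    $addFields: {
      total: {
        $function: {
          body: "function (a, b) { return a + b; }",
          args: ["$price", "$tax"],
          lang: "js",
        },
      },
    },
  },
];

console.log(findJavaScriptOperators(suspectPipeline)); // [ '$function' ]
```

In this sample, the same result could be computed without JavaScript by replacing the $function expression with `{ $add: ["$price", "$tax"] }`.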