Atlas Data Lake¶
About Atlas Data Lake¶
MongoDB Atlas Data Lake allows you to natively query, transform,
and move data across AWS S3 and MongoDB Atlas clusters. You can
query your richly structured data stored in JSON, BSON, CSV, TSV,
Avro, ORC, and Parquet formats using the
mongo shell, MongoDB
Compass, or any MongoDB driver.
You can use Atlas Data Lake to:
- Convert richly structured MongoDB data into columnar Parquet or CSV files.
- Query across multiple Atlas clusters to get a holistic view of your data.
- Materialize aggregations from MongoDB or S3 data.
- Automatically import data from your S3 bucket into an Atlas cluster.
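As an illustration of the first use case, a federated query can write its results out to S3 as Parquet with an $out stage. The sketch below shows the general shape of such a stage; the bucket name, region, and filename prefix are hypothetical, and the exact option set may vary by Atlas Data Lake version:

```javascript
// Hypothetical $out stage that writes aggregation results from a
// Data Lake query to an S3 bucket as Parquet files.
// Bucket name, region, and filename prefix are placeholder values.
const outToS3 = {
  $out: {
    s3: {
      bucket: "my-analytics-bucket",   // assumed bucket name
      region: "us-east-1",             // assumed S3 region
      filename: "exports/orders-",     // assumed object key prefix
      format: { name: "parquet", maxFileSize: "10GB" }
    }
  }
};

// In mongosh, appended to a pipeline run against a Data Lake collection:
// db.orders.aggregate([{ $match: { status: "shipped" } }, outToS3]);
console.log(outToS3.$out.s3.format.name);
```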
Data Lake Access¶
When you create a Data Lake, you grant Atlas either read-only or read-and-write access to S3 buckets in your AWS account. To access your Atlas clusters, Atlas uses your existing role-based access controls. You can view and edit the generated data storage configuration that maps data from your S3 buckets and Atlas clusters to virtual databases and collections.
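As a sketch of what such a generated storage configuration looks like (store, bucket, database, collection, and path names below are all hypothetical), a "stores" section declares the backing S3 bucket and a "databases" section maps its objects to virtual databases and collections:

```javascript
// Sketch of a Data Lake storage configuration; all names are placeholders.
// "stores" declares the backing S3 bucket; "databases" maps its objects
// to the virtual databases and collections that queries address.
const storageConfig = {
  stores: [
    {
      name: "s3store",                 // assumed store name
      provider: "s3",
      bucket: "my-analytics-bucket",   // assumed bucket
      region: "us-east-1"              // assumed region
    }
  ],
  databases: [
    {
      name: "sales",                   // virtual database
      collections: [
        {
          name: "orders",              // virtual collection
          dataSources: [
            // Map every Parquet file under /orders/ into this collection.
            { storeName: "s3store", path: "/orders/*.parquet" }
          ]
        }
      ]
    }
  ]
};

console.log(storageConfig.databases[0].collections[0].name);
```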
A database user must have one of the following roles to query an Atlas Data Lake:
Privilege actions define the operations that you can perform on your Data Lake. You can grant the following Atlas Data Lake privileges:
- When you create or modify custom roles from the Atlas User Interface
- In the actions.action request body parameter when you create or update a custom role from the Atlas API
Retrieve details about the queries that were run in the past 24 hours using $queryHistory.
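A minimal sketch of such a query, run database-less via db.aggregate on a Data Lake connection (the empty options document is an assumption):

```javascript
// Pipeline that retrieves details about queries run in the past
// 24 hours. The empty options document for $queryHistory is an
// assumption; the stage may accept options in some versions.
const historyPipeline = [{ $queryHistory: {} }];

// In mongosh against a Data Lake connection:
// db.aggregate(historyPipeline).forEach(q => printjson(q));
console.log(Object.keys(historyPipeline[0])[0]);
```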
Atlas Data Lake Regions¶
To prevent excessive charges on your bill, create your Atlas Data Lake in the same AWS region as your S3 data source.
Atlas Data Lake routes your Data Lake requests through one of the following regions:
Data Lake Regions:
- Northern Virginia, North America
- Oregon, North America
You will incur charges when running Atlas Data Lake queries. For more information, see Billing below.
Billing¶
You incur Atlas Data Lake costs for the following:
- Storage on the cloud object storage
- Data scanned by Data Lake
- Data returned by Data Lake
Total Data Processed¶
Atlas charges for the total number of bytes that Data Lake processes from your AWS S3 buckets, rounded up to the nearest megabyte. Atlas charges $5.00 per TB of processed data, with a minimum of 10 MB or $0.00005 per query.
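As a worked example of this pricing (a sketch using only the rates quoted above), a query that scans 2.5 GB of S3 data would be billed roughly as follows:

```javascript
// Estimate the processing charge for a single Data Lake query,
// using the rates above: $5.00 per TB, with a 10 MB minimum per query.
function estimateQueryCost(bytesProcessed) {
  const mb = Math.ceil(bytesProcessed / (1024 * 1024)); // round up to MB
  const billedMb = Math.max(mb, 10);                    // 10 MB minimum
  const PRICE_PER_TB = 5.0;
  return (billedMb / (1024 * 1024)) * PRICE_PER_TB;     // MB -> TB -> $
}

// A query scanning 2.5 GB (2560 MB) of data:
const cost = estimateQueryCost(2.5 * 1024 * 1024 * 1024);
console.log(cost.toFixed(5)); // ≈ $0.01221
```

Note that the 10 MB minimum works out to $0.00005 per query, matching the floor stated above.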
You can use partitioning strategies and compression in AWS S3 to reduce the amount of data processed.
Total Data Returned¶
Atlas charges for the total number of bytes returned by Data Lake. This total is the sum of the following data transfers:
- The number of bytes transferred between Data Lake service nodes
- The number of bytes transferred from Data Lake to the client
Returned data is billed as outlined in the Data Transfer Fees section of the Atlas pricing page. The cost of data transfer depends on the cloud service provider's charges for same-region, region-to-region, or region-to-internet data transfer.