Atlas Data Lake¶
The Atlas Data Lake is available as a Beta feature. The product and the corresponding documentation may change at any time during the Beta stage. For support, see Atlas Support.
About Atlas Data Lake¶
When you create a Data Lake, you can grant Atlas either read-only or read and write access to S3 buckets in your AWS account. You also create a data storage configuration file that maps data from your S3 buckets to your MongoDB databases and collections.
Atlas supports using any M10+ cluster, including Global Clusters, to connect to Data Lakes in the same project.
If you update your custom AWS role ARN, you must also update the AWS trust policy associated with the role. See the Configure a New Data Lake modal for instructions.
A database user must have one of the following roles to query an Atlas Data Lake:
To view, create, or modify Data Lakes in an Atlas project, click Data Lake in the left-hand navigation.
Verify that you meet the following prerequisites before you create a Data Lake:
- One or more AWS S3 buckets in the same AWS account.
- The AWS CLI, configured to access your AWS account. Alternatively, you must have access to the AWS Management Console with permission to create IAM roles.
Atlas Data Lake incurs costs for the amount of data scanned and returned by the service.
Total Data Scanned¶
Atlas charges for the total number of bytes that Data Lake scans from your AWS S3 buckets, rounded up to the nearest megabyte. Atlas charges $5.00 per TB of scanned data, with a minimum charge per query of 10 MB ($0.00005).
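As a rough illustration of the arithmetic above, the following sketch estimates the scan charge for a single query. It assumes decimal units (1 MB = 1,000,000 bytes, 1 TB = 1,000,000 MB), which is consistent with 10 MB costing $0.00005 at $5.00 per TB, and it assumes rounding is applied per query. The function names are hypothetical and are not part of any Atlas API.

```python
import math

PRICE_PER_TB_USD = 5.00    # from the pricing described above
BYTES_PER_MB = 1_000_000   # assumption: decimal megabytes
MB_PER_TB = 1_000_000      # assumption: decimal terabytes
MIN_BILLED_MB = 10         # per-query minimum described above


def billed_megabytes(bytes_scanned: int) -> int:
    """Round up to the nearest megabyte, then apply the 10 MB minimum."""
    megabytes = math.ceil(bytes_scanned / BYTES_PER_MB)
    return max(megabytes, MIN_BILLED_MB)


def scan_cost_usd(bytes_scanned: int) -> float:
    """Estimate the scan charge in USD for one query (illustrative only)."""
    return billed_megabytes(bytes_scanned) * PRICE_PER_TB_USD / MB_PER_TB


small_query = scan_cost_usd(1_500)          # billed at the 10 MB minimum
large_query = scan_cost_usd(2_500_000_000)  # 2,500 MB scanned
print(f"small query: ${small_query:.5f}")
print(f"large query: ${large_query:.4f}")
```

Under these assumptions, a query that scans only a few kilobytes is still billed for 10 MB, while a query that scans 2.5 GB is billed for 2,500 MB.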
You can use partitioning strategies and compression in AWS S3 to reduce the amount of data scanned.
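To see why partitioning can help, consider a minimal sketch that compares the bytes read with and without restricting a query to one partition. The object keys, sizes, and the `bytes_to_scan` helper are hypothetical, and the sketch assumes that a query limited to one partition only reads objects under the matching key prefix. Actual behavior depends on how your data storage configuration maps S3 paths to collections.

```python
# Hypothetical S3 listing: object key -> compressed size in bytes.
objects = {
    "logs/year=2020/month=01/part-0.json.gz": 40_000_000,
    "logs/year=2020/month=02/part-0.json.gz": 35_000_000,
    "logs/year=2020/month=03/part-0.json.gz": 45_000_000,
}


def bytes_to_scan(listing: dict, prefix: str = "") -> int:
    """Sum the sizes of all objects whose key starts with the given prefix."""
    return sum(size for key, size in listing.items() if key.startswith(prefix))


# Without partition pruning, every object is read.
full_scan = bytes_to_scan(objects)

# With pruning, only the objects for the requested month are read.
pruned_scan = bytes_to_scan(objects, "logs/year=2020/month=02/")

print(f"full scan:   {full_scan:,} bytes")
print(f"pruned scan: {pruned_scan:,} bytes")
```

Compression works in the same direction: smaller objects in S3 mean fewer bytes scanned for the same logical data.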
Total Data Returned¶
Atlas charges for the total number of bytes returned by Data Lake. This total is the sum of the following data transfers:
- The number of bytes transferred between Data Lake service nodes
- The number of bytes transferred from Data Lake to the client
Returned data is billed as outlined in the Data Transfer Fees section of the Atlas pricing page. The cost of data transfer depends on what the cloud service provider charges for same-region, region-to-region, or region-to-internet data transfer.