Create One Data Lake

Note

Groups and projects are synonymous terms. Your {GROUP-ID} is the same as your project ID. For existing groups, your group/project ID remains the same. The resource and corresponding endpoints use the term groups.

The Atlas API uses HTTP Digest Authentication. Provide a programmatic API public key and corresponding private key as the username and password when constructing the HTTP request.

For complete documentation on configuring API access for an Atlas project, see Configure Atlas API Access.
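
As a minimal sketch of the Digest Authentication setup described above, the following Python uses only the standard library's `urllib.request`. The function name `build_digest_opener` and the placeholder keys are illustrative assumptions, not part of the Atlas API. The code only prepares an opener and does not send a request.

```python
import urllib.request

# Base URL of the Atlas API, as used by the example request in this page.
BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def build_digest_opener(public_key: str, private_key: str) -> urllib.request.OpenerDirector:
    """Return an opener that answers HTTP Digest challenges for BASE_URL.

    The programmatic API public key acts as the username and the
    private key acts as the password.
    """
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under BASE_URL".
    password_mgr.add_password(None, BASE_URL, public_key, private_key)
    digest_handler = urllib.request.HTTPDigestAuthHandler(password_mgr)
    return urllib.request.build_opener(digest_handler)


# Hypothetical placeholder keys; substitute real programmatic API keys.
opener = build_digest_opener("PUBLIC-KEY", "PRIVATE-KEY")
```

Calling `opener.open(request)` with a `urllib.request.Request` would then perform the challenge and response exchange automatically.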

Base URL: https://cloud.mongodb.com/api/atlas/v1.0

Use this endpoint to create an Atlas Data Lake associated with an Atlas project. To create a Data Lake, specify a name for the Data Lake, the unique identifier of the role that Data Lake can use to access your AWS data store, and the S3 bucket where the data is stored.

POST /groups/{GROUP-ID}/dataLakes

| Path Element | Required/Optional | Description |
|---|---|---|
| GROUP-ID | Required | The unique identifier for the project. |

The following query parameters are optional:

| Query Parameter | Type | Description | Default |
|---|---|---|---|
| pretty | boolean | Displays response in a prettyprint format. | false |
| envelope | boolean | Specifies whether or not to wrap the response in an envelope. | false |
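
The path element and query parameters above combine into a single request URL. The following is a small sketch of that assembly; the helper name `data_lakes_url` is an assumption made for illustration, and only non-default query parameters are appended.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://cloud.mongodb.com/api/atlas/v1.0"


def data_lakes_url(group_id: str, pretty: bool = False, envelope: bool = False) -> str:
    """Build the URL for POST /groups/{GROUP-ID}/dataLakes."""
    # Percent-encode the project ID so it is safe as a single path segment.
    url = f"{BASE_URL}/groups/{quote(group_id, safe='')}/dataLakes"
    params = {}
    if pretty:
        params["pretty"] = "true"
    if envelope:
        params["envelope"] = "true"
    # Both parameters default to false, so omit the query string when unset.
    return f"{url}?{urlencode(params)}" if params else url


print(data_lakes_url("{GROUP-ID}".strip("{}"), pretty=True))
```

Omitting both flags yields the bare endpoint URL, which matches the documented defaults of `false`.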
| Field | Required/Optional | Description |
|---|---|---|
| name | Required | Name of the Atlas Data Lake. |
| cloudProviderConfig | Optional | Configuration information related to the cloud service where Atlas Data Lake source data is stored. |
| cloudProviderConfig.<provider> | Required | Name of the provider of the cloud service where Data Lake can access the S3 Bucket. Atlas Data Lake supports only aws. Required if specifying cloudProviderConfig. |
| cloudProviderConfig.aws.roleId | Required | Unique identifier of the role that Data Lake can use to access the data stores. If necessary, use the Atlas API to retrieve the role ID. You must also specify the testS3Bucket. Required if specifying cloudProviderConfig. |
| cloudProviderConfig.aws.testS3Bucket | Required | Name of the S3 data bucket that the provided role ID is authorized to access. You must also specify the roleId. Required if specifying cloudProviderConfig. |
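
The field rules above (a required name, and roleId and testS3Bucket that must be supplied together) can be checked on the client before sending the request. The following is a hedged sketch; `build_create_body` is a hypothetical helper, and the server remains the authority on validation.

```python
import json
from typing import Optional


def build_create_body(name: str,
                      role_id: Optional[str] = None,
                      test_s3_bucket: Optional[str] = None) -> str:
    """Return the JSON request body for creating a Data Lake."""
    if not name:
        raise ValueError("name is required")
    # roleId and testS3Bucket are documented as mutually dependent.
    if (role_id is None) != (test_s3_bucket is None):
        raise ValueError("roleId and testS3Bucket must be specified together")

    body = {"name": name}
    if role_id is not None:
        # "aws" is the only provider key the documentation lists.
        body["cloudProviderConfig"] = {
            "aws": {"roleId": role_id, "testS3Bucket": test_s3_bucket}
        }
    return json.dumps(body)


print(build_create_body("UserMetricData",
                        role_id="1a234bcd5e67f89a12b345c6",
                        test_s3_bucket="user-metric-data-bucket"))
```

Passing only the name produces a body without cloudProviderConfig, which the table marks as optional.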

| Name | Type | Description |
|---|---|---|
| cloudProviderConfig | object | Configuration information related to the cloud service where Atlas Data Lake source data is stored. |
| cloudProviderConfig.<provider> | object | Name of the provider of the cloud service where Data Lake can access the S3 Bucket data stores. Data Lake only supports aws. |
| cloudProviderConfig.aws.externalId | string | Unique identifier associated with the IAM Role that Data Lake assumes when accessing the data stores. |
| cloudProviderConfig.aws.iamAssumedRoleARN | string | Amazon Resource Name (ARN) of the IAM Role that Data Lake assumes when accessing S3 Bucket data stores. The IAM Role must support the following actions against each S3 bucket: s3:GetObject, s3:ListBucket, s3:GetObjectVersion. For more information on S3 actions, see Actions, Resources, and Condition Keys for Amazon S3. |
| cloudProviderConfig.aws.iamUserARN | string | Amazon Resource Name (ARN) of the user that Data Lake assumes when accessing S3 Bucket data stores. |
| cloudProviderConfig.aws.roleId | string | Unique identifier of the role that Data Lake uses to access the data stores. |
| dataProcessRegion | object | The cloud provider region to which Atlas Data Lake routes client connections for data processing. If null, Atlas Data Lake routes client connections to the region nearest to the client based on DNS resolution. |
| dataProcessRegion.cloudProvider | string | Name of the cloud service provider. Atlas Data Lake only supports AWS. |
| dataProcessRegion.region | string | Name of the region to which Atlas Data Lake routes client connections for data processing. Atlas Data Lake only supports the following regions: SYDNEY_AUS (ap-southeast-2), FRANKFURT_DEU (eu-central-1), DUBLIN_IRL (eu-west-1), LONDON_GBR (eu-west-2), VIRGINIA_USA (us-east-1), OREGON_USA (us-west-2). |
| groupId | string | The unique identifier for the project. |
| hostnames | array | The list of hostnames assigned to the Atlas Data Lake. Each string in the array is a hostname assigned to the Atlas Data Lake. |
| name | string | Name of the Atlas Data Lake. |
| state | string | Current state of the Atlas Data Lake. ACTIVE: The Data Lake is active and verified. You can query the data stores associated with the Atlas Data Lake. |
| storage | object | Configuration details for each data store and its mapping to MongoDB database(s) and collection(s). |
| storage.databases | array | Configuration details for mapping each data store to queryable databases and collections. For complete documentation on this field and its nested fields, see databases. An empty array indicates that the Data Lake has no mapping configuration for any data store. |
| storage.stores | array | Each object in the array represents a data store. Data Lake uses the storage.databases configuration details to map data in each data store to queryable databases and collections. For complete documentation on this object and its nested fields, see stores. An empty array indicates that the Data Lake has no configured data stores. |
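
The three S3 actions required of the IAM Role can be expressed as an AWS IAM policy document. The following sketch generates one; it is an illustration under stated assumptions, not the policy Atlas itself prescribes. The helper name `s3_read_policy` and the bucket name are hypothetical, and the policy grants the actions on both the bucket ARN and its objects because s3:ListBucket applies to the bucket while the object actions apply to its contents.

```python
import json
from typing import Dict, List

# Actions listed in the iamAssumedRoleARN description.
REQUIRED_S3_ACTIONS = ["s3:GetObject", "s3:ListBucket", "s3:GetObjectVersion"]


def s3_read_policy(bucket_names: List[str]) -> Dict:
    """Return an IAM policy document allowing the required actions per bucket."""
    resources = []
    for bucket in bucket_names:
        resources.append(f"arn:aws:s3:::{bucket}")    # the bucket (s3:ListBucket)
        resources.append(f"arn:aws:s3:::{bucket}/*")  # its objects (s3:GetObject*)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": REQUIRED_S3_ACTIONS,
                "Resource": resources,
            }
        ],
    }


# Hypothetical bucket name taken from the example request in this page.
print(json.dumps(s3_read_policy(["user-metric-data-bucket"]), indent=2))
```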

Example
Request
curl -u "{PUBLIC-KEY}:{PRIVATE-KEY}" --digest \
     --header "Accept: application/json" \
     --header "Content-Type: application/json" \
     --request POST "https://cloud.mongodb.com/api/atlas/v1.0/groups/{GROUP-ID}/dataLakes?pretty=true" \
     --data '{
       "name" : "UserMetricData",
       "cloudProviderConfig" : {
         "aws" : {
           "roleId" : "1a234bcd5e67f89a12b345c6",
           "testS3Bucket" : "user-metric-data-bucket"
         }
       }
     }'

The preceding request returns the following:

Example
Response
{
  "cloudProviderConfig": {
    "aws": {
      "externalId": "12a3bc45-de6f-7890-12gh-3i45jklm6n7o",
      "iamAssumedRoleARN": "arn:aws:iam::123456789012:role/ReadS3BucketRole",
      "iamUserARN": "arn:aws:iam::1234567890123:root",
      "roleId": "1a234bcd5e67f89a12b345c6"
    }
  },
  "dataProcessRegion": null,
  "groupId": "1ab23c4567def890gh12ij34",
  "hostnames": [
    "hardwaremetricdata.mongodb.example.net"
  ],
  "name": "UserMetricData",
  "state": "ACTIVE",
  "storage": {
    "databases": [],
    "stores": []
  }
}
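
A client typically reads a few fields from a response like the one above. The following sketch parses such a body and extracts the state, hostnames, role ID, and data processing region. The function name `summarize_data_lake` is hypothetical, and the code assumes the default `envelope=false`, so the fields sit at the top level of the JSON document.

```python
import json
from typing import Any, Dict


def summarize_data_lake(response_text: str) -> Dict[str, Any]:
    """Extract commonly used fields from a create-Data-Lake response body."""
    doc = json.loads(response_text)
    # cloudProviderConfig and dataProcessRegion may be absent or null.
    aws = (doc.get("cloudProviderConfig") or {}).get("aws") or {}
    region = (doc.get("dataProcessRegion") or {}).get("region")
    return {
        "name": doc["name"],
        "is_active": doc.get("state") == "ACTIVE",
        "hostnames": doc.get("hostnames", []),
        "role_id": aws.get("roleId"),
        # None means connections are routed to the region nearest the client.
        "region": region,
    }


sample = """{
  "cloudProviderConfig": {"aws": {"roleId": "1a234bcd5e67f89a12b345c6"}},
  "dataProcessRegion": null,
  "hostnames": ["hardwaremetricdata.mongodb.example.net"],
  "name": "UserMetricData",
  "state": "ACTIVE"
}"""
summary = summarize_data_lake(sample)
print(summary)
```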