Configuring Data Lake

On this page

  • Overview
  • Retrieve Data Lake Configuration
  • Set or Update Data Lake Configuration
  • Validate Data Lake Configuration
  • Generate Data Lake Configuration
  • Generate Wildcard Collections

You can configure Atlas Data Lake using the Data Lake Configuration. The configuration defines mappings between your data stores and Data Lake. To learn more about the configuration including the configuration fields and format, see Data Lake Configuration.

You can retrieve and update the Data Lake configuration by connecting a mongo shell to the Data Lake. You can also update your Data Lake from the Atlas UI. See Set or Update Data Lake Configuration for more information.

Note

Any MongoDB user in the Atlas project with the atlasAdmin role can retrieve and update the Data Lake configuration.

Once connected to the Data Lake, you can use the following database commands to retrieve the Data Lake configuration:

use admin
db.runCommand( { "storageGetConfig" : 1 } )

The command returns the current Data Lake configuration. For complete documentation on the configuration fields and format, see Configuration Format.
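If you want to retrieve the configuration from a script rather than interactively, a minimal sketch might look like the following. The function name getDataLakeConfig is hypothetical. The sketch uses db.getSiblingDB("admin") because the interactive use admin helper is not valid JavaScript inside a script, and it returns the whole command response because the exact response fields are not described on this page.

```javascript
// Retrieve the current Data Lake configuration from a script.
// `db` is the shell's database handle. getSiblingDB("admin") stands in for
// the interactive `use admin` helper, which only works at the shell prompt.
function getDataLakeConfig(db) {
  const result = db.getSiblingDB("admin").runCommand({ storageGetConfig: 1 });
  if (!result.ok) {
    // Surface the raw response rather than guessing at its error fields.
    throw new Error("storageGetConfig failed: " + JSON.stringify(result));
  }
  // The full response is returned; inspect it to locate the configuration.
  return result;
}

// In the shell, once connected to the Data Lake:
//   printjson(getDataLakeConfig(db));
```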

Once connected to the Data Lake, you can use the following database commands to set or update the Data Lake configuration:

use admin
db.runCommand( { "storageSetConfig" : <config> } )

Replace <config> with the Data Lake configuration. For complete documentation on the configuration fields and format, see Configuration Format. You can validate your configuration before setting or updating the Data Lake configuration by running the storageValidateConfig command.

To set or update the storage configuration through the Atlas UI:

  1. From the Atlas UI, select Data Lake from the left-hand navigation.
  2. Click Configuration for the Data Lake that you want to update.

  3. Make any necessary changes to the storage configuration.
  4. Click Save for the changes to take effect.

    Important

    Changes to your Data Lake configuration can take up to 30 seconds to take effect. Examples of changes include adding or removing users, or adding or removing IP addresses from the access list. This delay can affect your ability to connect. If you are already connected, you might have to disconnect and reconnect to use the most recent storage configuration.

You can run the following command to validate your Data Lake configuration.

use admin
db.runCommand( { "storageValidateConfig" : <config> } )

Replace <config> with the Data Lake configuration. For complete documentation on the configuration fields and format, see Configuration Format.

The command returns the following if your Data Lake configuration is valid:

{ "ok" : 1 }

The command returns the list of errors in the errs field if your Data Lake storage configuration is invalid:

{
  "ok" : 1,
  "errs" : [
    "<error>",
    "<error>",
    ...
  ]
}
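The validate-before-set workflow described on this page can be sketched as a single helper. This is an illustration, not an official utility: the function name validateThenSetConfig and the shape of its return value are assumptions. It treats a non-empty errs array, or a falsy ok value, as a reason not to apply the configuration.

```javascript
// Validate a configuration and apply it only when no errors are reported.
// `db` is the shell's database handle; `config` is the Data Lake configuration.
function validateThenSetConfig(db, config) {
  const admin = db.getSiblingDB("admin");

  // Step 1: ask Data Lake to validate the configuration.
  const check = admin.runCommand({ storageValidateConfig: config });
  const errs = Array.isArray(check.errs) ? check.errs : [];
  if (!check.ok || errs.length > 0) {
    // Nothing is applied; the raw response is included for troubleshooting.
    return { applied: false, errs: errs, response: check };
  }

  // Step 2: apply the configuration that passed validation.
  const result = admin.runCommand({ storageSetConfig: config });
  return { applied: Boolean(result.ok), errs: [], response: result };
}

// In the shell, once connected to the Data Lake:
//   printjson(validateThenSetConfig(db, myConfig));
```

Running the validation first means a configuration with reported errors never reaches storageSetConfig.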

You can run the storageGenerateConfig command to generate a Data Lake configuration. The command returns an automatically generated configuration, which you can then modify and upload. In the automatically generated configuration, Data Lake generates a database for each store.

As a result, the databases array in the generated configuration might be different from the databases array in your existing configuration.

Note

You must have the storageSetConfig privilege to run the storageGenerateConfig command. The atlasAdmin role has the storageSetConfig privilege by default.

To generate a Data Lake configuration, connect to the Data Lake and run the following database commands:

use admin
db.runCommand( { "storageGenerateConfig" : 1 } )

For complete documentation on the configuration fields and format, see Configuration Format.
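Because the generated databases array might differ from the one in your existing configuration, it can help to compare the two before uploading. The sketch below is one way to do that. The helper names are hypothetical, and it assumes each entry in databases has a name field, as in the Data Lake configuration format. Both arguments are configuration documents, so extract them from the command responses first.

```javascript
// Return the database names defined in a configuration document.
function databaseNames(config) {
  const dbs = config && Array.isArray(config.databases) ? config.databases : [];
  return dbs.map((d) => d.name);
}

// Report which database names appear only in the generated configuration
// (added) and which appear only in the existing configuration (removed).
function diffDatabases(existing, generated) {
  const before = new Set(databaseNames(existing));
  const after = new Set(databaseNames(generated));
  return {
    added: [...after].filter((n) => !before.has(n)),
    removed: [...before].filter((n) => !after.has(n)),
  };
}

// Example with placeholder configurations:
//   diffDatabases({ databases: [{ name: "a" }] }, { databases: [{ name: "b" }] })
//   returns { added: ["b"], removed: ["a"] }
```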

You can dynamically generate collection names that map to data in your S3 bucket or Atlas cluster. To dynamically generate collection names, specify the wildcard, *, as the value for the collection name setting in your Data Lake storage configuration. You can't dynamically generate collection names in your Data Lake storage configuration that map to data in your HTTP or HTTPS data store.

You can use the storageSetConfig command to configure the settings for generating wildcard (*) collections.
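As an illustration, the following sketch builds a database entry whose only collection uses the wildcard name, and lists the wildcard collections in a configuration. The helper names are hypothetical, and the dataSources fields (storeName, path) are assumed from the Data Lake configuration format, so check them against that reference before use.

```javascript
// Build a database entry whose single collection uses the wildcard name "*".
// dbName, storeName, and path are placeholders supplied by the caller.
// Assumption: dataSources entries use `storeName` and `path`, as for S3 stores.
function wildcardDatabase(dbName, storeName, path) {
  return {
    name: dbName,
    collections: [
      { name: "*", dataSources: [{ storeName: storeName, path: path }] },
    ],
  };
}

// List "<database>.*" for every wildcard collection in a configuration.
function listWildcards(config) {
  const found = [];
  for (const d of config.databases || []) {
    for (const c of d.collections || []) {
      if (c.name === "*") found.push(d.name + ".*");
    }
  }
  return found;
}

// Example with placeholder names:
//   const entry = wildcardDatabase("sales", "myS3Store", "/<path>");
//   listWildcards({ databases: [entry] });
```

The resulting entry would go in the databases array of the configuration that you pass to storageSetConfig.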

To learn more about the configuration settings for generating wildcard collections for your data store, see Data Lake Configuration.

© 2021 MongoDB, Inc.
