    Configuring Data Lake

    Overview

    You can configure Atlas Data Lake using the Data Lake Configuration File. The configuration file defines mappings between your data stores and Data Lake. To learn more about the configuration file, including its fields and file format, see Data Lake Configuration File.

    You can retrieve and update the Data Lake configuration by connecting a mongo shell to the Data Lake:

    1. From the Atlas UI, select Data Lake from the left-hand navigation.
    2. Click Connect for the Data Lake to which you want to connect.
    3. Click Connect with the Mongo Shell.
    4. Follow the instructions in the Connect modal. If you already have the mongo shell installed, ensure that it is version 3.6 or later and that you are running the latest stable release of that version.

    Note: Any MongoDB user in the Atlas project with the atlasAdmin role can retrieve and update the Data Lake configuration.

    Retrieve Data Lake Configuration

    Once connected to the Data Lake, you can use the following database commands to retrieve the Data Lake configuration:

    use admin
    db.runCommand( { "storageGetConfig" : 1 } )

    The command returns the current Data Lake configuration. For complete documentation on the configuration fields and file format, see Configuration File Format.
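The retrieval step can be wrapped in a small helper so that a failed command is not mistaken for a configuration. This is a minimal sketch, not an official utility: the function name `getDataLakeConfig` is hypothetical, and the check assumes the response carries the usual `ok` field of a database command.

```javascript
// Hypothetical helper: run storageGetConfig and return the raw response.
// `adminDb` is any object exposing runCommand(), for example the handle
// returned by db.getSiblingDB("admin") in the mongo shell.
function getDataLakeConfig(adminDb) {
  const response = adminDb.runCommand({ storageGetConfig: 1 });
  // Assumption: a successful command reports ok: 1.
  if (!response || response.ok !== 1) {
    throw new Error("storageGetConfig failed: " + JSON.stringify(response));
  }
  return response;
}
```

In a connected shell, `getDataLakeConfig(db.getSiblingDB("admin"))` would return the same document as the commands above, or throw if the command did not succeed.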

    Set or Update Data Lake Configuration

    Once connected to the Data Lake, you can use the following database commands to set or update the Data Lake configuration:

    use admin
    db.runCommand( { "storageSetConfig" : <config> } )

    Replace <config> with the Data Lake configuration file. For complete documentation on the configuration fields and file format, see Configuration File Format. You can validate your configuration before setting or updating the Data Lake configuration by running the storageValidateConfig command.
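The validate-before-set order described above can be sketched as a single function. The name `validateThenSet` and the shape of its return value are hypothetical; the only commands it issues are storageValidateConfig and storageSetConfig, and it treats a non-empty `errs` array as a failed validation.

```javascript
// Hypothetical helper: validate a configuration and apply it only when
// validation reports no errors. `adminDb` is any object exposing
// runCommand(), such as db.getSiblingDB("admin") in the mongo shell.
function validateThenSet(adminDb, config) {
  const check = adminDb.runCommand({ storageValidateConfig: config }) || {};
  const errors = Array.isArray(check.errs) ? check.errs : [];

  // Stop here if the command failed or listed any errors.
  if (check.ok !== 1 || errors.length > 0) {
    return { applied: false, errors: errors, response: check };
  }

  // Validation passed, so send the same configuration to storageSetConfig.
  const result = adminDb.runCommand({ storageSetConfig: config }) || {};
  return { applied: result.ok === 1, errors: [], response: result };
}
```

Returning the errors instead of throwing lets the caller decide whether to print them, fix the configuration, or abort.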

    To set or update the storage configuration through the Atlas UI:

    1. Click Configuration for your Data Lake to view the Data Lake storage configuration.
       [Image highlighting the Configuration button.]
    2. Make changes to your storage configuration and click Save.

    Validate Data Lake Configuration

    You can run the following commands to validate your Data Lake configuration:

    use admin
    db.runCommand( { "storageValidateConfig" : <config> } )

    Replace <config> with the Data Lake configuration file. For complete documentation on the configuration fields and file format, see Configuration File Format.

    The command returns the following if your Data Lake configuration is valid:

    { "ok" : 1 }

    The command returns the list of errors in the errs field if your Data Lake storage configuration is invalid:

    {
      "ok" : 1,
      "errs" : [
        "<error>",
        "<error>",
        ...
      ]
    }
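Because the outcome is signalled by the errs field, a script that reads the response should inspect that field rather than rely on `ok` alone. The following sketch turns a validation response into a readable summary; `summarizeValidation` is a hypothetical name and the message wording is illustrative.

```javascript
// Hypothetical helper: describe the result of storageValidateConfig.
// A response counts as valid only when ok is 1 and errs is absent or empty.
function summarizeValidation(response) {
  const errors = Array.isArray(response.errs) ? response.errs : [];
  if (response.ok === 1 && errors.length === 0) {
    return "Configuration is valid.";
  }
  if (errors.length === 0) {
    return "Validation did not succeed and returned no errs list.";
  }
  // Number each error so it is easy to refer to in a report.
  const lines = errors.map((message, index) => "  " + (index + 1) + ". " + message);
  return ["Configuration is invalid (" + errors.length + " error(s)):"]
    .concat(lines)
    .join("\n");
}
```

In the shell, the response of the validation command can be passed straight to this function and the result printed with `print()`.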

    Generate Data Lake Configuration

    You can run the storageGenerateConfig command to generate a Data Lake configuration. The command returns an automatically generated configuration, which you can then modify and upload. In the automatically generated configuration, Data Lake generates a database for each store. As a result, the databases array in the generated configuration might be different from the databases array in your existing configuration.

    Note: You must have the storageSetConfig privilege to run the storageGenerateConfig command. The atlasAdmin role has the storageSetConfig privilege by default.

    To generate a Data Lake configuration, connect to the Data Lake and run the following database commands:

    use admin
    db.runCommand( { "storageGenerateConfig" : 1 } )

    For complete documentation on the configuration fields and file format, see Configuration File Format.
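Since the generated databases array can differ from the one in your existing configuration, it may help to compare the two before uploading. The sketch below is illustrative only: `diffDatabaseNames` is a hypothetical name, and it assumes each entry in the databases array has a `name` field. It takes two configuration documents and does not assume where they sit inside a command response.

```javascript
// Hypothetical helper: list database names present in only one of two
// Data Lake configurations. Assumes each databases entry has a `name`.
function diffDatabaseNames(existingConfig, generatedConfig) {
  const names = (config) =>
    new Set(((config && config.databases) || []).map((database) => database.name));
  const existing = names(existingConfig);
  const generated = names(generatedConfig);
  return {
    onlyInExisting: [...existing].filter((name) => !generated.has(name)).sort(),
    onlyInGenerated: [...generated].filter((name) => !existing.has(name)).sort(),
  };
}
```

Names listed under `onlyInExisting` would be lost if the generated configuration were uploaded unchanged, so those are the entries to copy over before running storageSetConfig.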