create

The create command creates a collection for existing stores or a view on a collection in the Atlas Data Lake storage configuration.

The wildcard "*" can be used with the create command in two ways:

  • As the name of the collection, to dynamically create collections that map to files and folders in the specified path on the data store.
  • In the path parameter, to create a single collection that maps to multiple files and folders in the specified path on the data store.
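The two forms can be compared side by side. The command documents below are illustrative sketches only, using the store name and paths from the examples later on this page:

```javascript
// Illustrative only: the two ways "*" may appear in a create command.
// Store name and paths are borrowed from the examples on this page.

// 1. Wildcard as the collection name: Data Lake dynamically creates one
//    collection per file, named via the collectionName() function.
const wildcardCollectionName = {
  create: "*",
  dataSources: [{ storeName: "egS3Store", path: "/json/{collectionName()}" }]
};

// 2. Wildcard in the path: a single named collection that maps to every
//    file and folder matching the glob.
const wildcardPath = {
  create: "egCollection",
  dataSources: [{ storeName: "egS3Store", path: "/json/*" }]
};
```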

The following syntax and parameters are for creating a collection on an S3 data store; other data store types accept store-specific fields, as shown in the Multiple Data Sources Example below.

db.runCommand({
  "create" : "<collection-name>|*",
  "dataSources" : [{
    "storeName" : "<store-name>",
    "path" : "<path-to-files-or-folders>",
    "defaultFormat" : "<file-extension>"
  }]
})
Parameter: <collection-name>|*
Type: string
Required: yes

Either the name of the collection to which Data Lake maps the data contained in the data store, or the wildcard "*" to dynamically create collections.

You can generate collection names dynamically from file paths by specifying * for the collection name and the collectionName() function in the dataSources.path field. By default, Atlas Data Lake creates up to 100 wildcard collections. You can customize the maximum number of wildcard collections that Atlas Data Lake automatically generates using the databases.[n].maxWildcardCollections parameter. Note that each wildcard collection can contain only one dataSource.
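As a sketch of where that setting lives, a database entry in the storage configuration can carry maxWildcardCollections alongside its collections. Field placement follows the examples on this page; the value 50 is illustrative:

```json
{
  "databases" : [{
    "name" : "sampleDB",
    "maxWildcardCollections" : 50,
    "collections" : [{
      "name" : "*",
      "dataSources" : [{ "storeName" : "egS3Store", "path" : "/json/{collectionName()}" }]
    }]
  }]
}
```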
Parameter: dataSources
Type: object
Required: yes

Array of objects, where each object represents a data store in the stores array to map with the collection.

Parameter: dataSources.storeName
Type: string
Required: yes

The name of a data store to map to the collection. The value must match the name of an entry in the stores array.

Parameter: dataSources.path
Type: string
Required: yes

The path to the files and folders. Specify / to capture all files and folders from the prefix path. See Path Syntax for more information.

Parameter: dataSources.defaultFormat
Type: string
Required: no

The format that Data Lake defaults to if it encounters a file without an extension while querying the data store. The following values are valid:

.json, .json.gz, .bson, .bson.gz, .avro, .avro.gz, .orc, .tsv, .tsv.gz, .csv, .csv.gz, .parquet

If omitted, Data Lake attempts to detect the file type by processing a few bytes of the file.
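As a hypothetical illustration (this helper is not part of any Data Lake API), checking a value against the list above is a simple membership test:

```javascript
// Hypothetical helper (not a Data Lake API): checks whether a string is one
// of the defaultFormat values listed above. Compressed variants (.gz) are
// listed explicitly rather than derived.
const VALID_DEFAULT_FORMATS = [
  ".json", ".json.gz", ".bson", ".bson.gz", ".avro", ".avro.gz",
  ".orc", ".tsv", ".tsv.gz", ".csv", ".csv.gz", ".parquet"
];

function isValidDefaultFormat(ext) {
  return VALID_DEFAULT_FORMATS.includes(ext);
}
```

For example, ".json.gz" is a supported default, while ".txt" is not in the list above.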

The command returns the following output if it succeeds. You can verify the results by running the commands in Verify Collection. If it fails, see Troubleshoot Errors below for recommended solutions.

{ "ok" : 1 }

The following examples use the sample dataset, airbnb, in an AWS S3 store with the following settings:

Store Name: egS3Store
Region: us-east-2
Bucket: test-data-lake
Prefix: json
Delimiter: /
Sample Dataset: airbnb

Basic Example

The following command creates a collection named airbnb in the sampleDB database in the storage configuration.

The airbnb collection maps to the airbnb sample dataset in the json folder in the S3 store named egS3Store.

use sampleDB
db.runCommand({
  "create" : "airbnb",
  "dataSources" : [{
    "storeName" : "egS3Store",
    "path" : "/json/airbnb",
    "defaultFormat" : ".json"
  }]
})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
airbnb
> db.runCommand({"storageGetConfig" : 1 })
{
  "ok" : 1,
  "storage" : {
    "stores" : [{
      "name" : "egS3Store",
      "provider" : "s3",
      "region" : "us-east-2",
      "bucket" : "test-data-lake",
      "delimiter" : "/",
      "prefix" : ""
    }],
    "databases" : [{
      "name" : "sampleDB",
      "collections" : [{
        "name" : "airbnb",
        "dataSources" : [{
          "storeName" : "egS3Store",
          "path" : "/json/airbnb",
          "defaultFormat" : ".json"
        }]
      }]
    }]
  }
}
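The dataSources array you pass to create reappears unchanged under the matching collection entry in the storage configuration. A small sketch of that round trip in plain JavaScript (the variable names here are illustrative, not an API):

```javascript
// Illustrative sketch: the dataSources passed to "create" reappear under
// databases[].collections[].dataSources in the storage configuration.
const createCmd = {
  create: "airbnb",
  dataSources: [{
    storeName: "egS3Store",
    path: "/json/airbnb",
    defaultFormat: ".json"
  }]
};

// The shape of the collection entry that storageGetConfig echoes back.
const collectionEntry = {
  name: createCmd.create,
  dataSources: createCmd.dataSources
};
```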

Multiple Data Sources Example

The following command creates a collection named egCollection in the sampleDB database in the storage configuration. The egCollection collection maps to the following sample datasets:

  • airbnb dataset in the json folder in the S3 store named egS3Store
  • airbnb dataset in the sample_airbnb.listingsAndReviews collection on the Atlas cluster named myTestCluster
  • airbnb dataset in the URL https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json

use sampleDB
db.runCommand({
  "create" : "egCollection",
  "dataSources" : [
    { "storeName" : "egS3Store", "path" : "/json/airbnb" },
    { "storeName" : "egAtlasStore", "database" : "sample_airbnb", "collection" : "listingsAndReviews" },
    { "storeName" : "egHttpStore", "urls" : ["https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json"] }
  ]
})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
egCollection
> db.runCommand({ "storageGetConfig" : 1 })
{
  "ok" : 1,
  "storage" : {
    "stores" : [
      {
        "name" : "egS3Store",
        "provider" : "s3",
        "region" : "us-east-2",
        "bucket" : "test-data-lake",
        "delimiter" : "/",
        "prefix" : ""
      },
      {
        "name" : "egAtlasStore",
        "provider" : "atlas",
        "clusterName" : "myTestCluster",
        "projectId" : "<project-id>"
      },
      {
        "name" : "egHttpStore",
        "provider" : "http",
        "urls" : ["https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json"]
      }
    ],
    "databases" : [{
      "name" : "sampleDB",
      "collections" : [{
        "name" : "egCollection",
        "dataSources" : [
          {
            "storeName" : "egS3Store",
            "path" : "/json/airbnb"
          },
          {
            "storeName" : "egAtlasStore",
            "database" : "sample_airbnb",
            "collection" : "listingsAndReviews"
          },
          {
            "storeName" : "egHttpStore",
            "urls" : ["https://atlas-data-lake.s3.amazonaws.com/json/sample_airbnb/listingsAndReviews.json"]
          }
        ]
      }]
    }]
  }
}
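As the example shows, each dataSource is addressed with store-type-specific fields: an s3 store takes a path, an atlas store takes database and collection, and an http store takes urls. A toy checker capturing that rule (hypothetical, for illustration only):

```javascript
// Toy illustration (not a Data Lake API): the fields a dataSource needs
// depend on the provider of the store it references.
const REQUIRED_FIELDS = {
  s3:    ["storeName", "path"],
  atlas: ["storeName", "database", "collection"],
  http:  ["storeName", "urls"]
};

function hasRequiredFields(provider, dataSource) {
  return (REQUIRED_FIELDS[provider] || []).every(field => field in dataSource);
}
```

For instance, an atlas dataSource missing its collection field would fail this check, mirroring the shape the command expects.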

Wildcard Usage Examples

These examples show the two ways the wildcard "*" can be specified with the create command.

Collection Name Example

The following example uses the create command to dynamically create collections for the files in the path /json/ in the egS3Store data store. It uses the collectionName() function to name the collections after the filenames in the specified path.

use sampleDB
db.runCommand({
  "create" : "*",
  "dataSources" : [{
    "storeName" : "egS3Store",
    "path" : "/json/{collectionName()}"
  }]
})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
airbnb
> db.runCommand({"storageGetConfig" : 1 })
{
  "ok" : 1,
  "storage" : {
    "stores" : [{
      "name" : "egS3Store",
      "provider" : "s3",
      "region" : "us-east-2",
      "bucket" : "test-data-lake",
      "delimiter" : "/",
      "prefix" : ""
    }],
    "databases" : [{
      "name" : "sampleDB",
      "collections" : [{
        "name" : "*",
        "dataSources" : [{
          "storeName" : "egS3Store",
          "path" : "/json/{collectionName()}"
        }]
      }]
    }]
  }
}
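Conceptually, with a path template like "/json/{collectionName()}", each file under /json/ yields a collection named after the file, minus its extension. A toy sketch of that mapping (illustrative only; not how the server implements it):

```javascript
// Toy sketch: derive a collection name from a file key, the way a path
// template like "/json/{collectionName()}" names collections after files.
function collectionNameFor(fileKey, prefix) {
  if (!fileKey.startsWith(prefix)) return null; // file is outside the template path
  const rest = fileKey.slice(prefix.length);
  return rest.replace(/\.[^/]*$/, "");          // strip the file extension
}
```

Under this sketch, a file key of "/json/airbnb.json" with prefix "/json/" maps to a collection named airbnb, matching the show collections output above.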

Path Glob Example

The following example uses the create command to create a collection named egCollection that maps to a Data Lake store named egS3Store. The egS3Store store contains the sample dataset, airbnb, in a folder named json.

use sampleDB
db.runCommand({
  "create" : "egCollection",
  "dataSources" : [{
    "storeName" : "egS3Store",
    "path" : "/json/*"
  }]
})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
egCollection
> db.runCommand({"storageGetConfig" : 1 })
{
  "ok" : 1,
  "storage" : {
    "stores" : [{
      "name" : "egS3Store",
      "provider" : "s3",
      "region" : "us-east-2",
      "bucket" : "test-data-lake",
      "delimiter" : "/",
      "prefix" : ""
    }],
    "databases" : [{
      "name" : "sampleDB",
      "collections" : [{
        "name" : "egCollection",
        "dataSources" : [{
          "storeName" : "egS3Store",
          "path" : "/json/*"
        }]
      }]
    }]
  }
}

You can verify that the command successfully created the collection or view by running one of the following commands:

show collections
db.runCommand({ "storageGetConfig" : 1 })
db.runCommand({ "listCollections" : 1 })

If the command fails, it returns one of the following errors:

Store Name Does Not Exist

{
  "ok" : 0,
  "errmsg" : "store name does not exist",
  "code" : 9,
  "codeName" : "FailedToParse"
}

Solution: Ensure that the specified storeName matches the name of a store in the stores array. You can run the listStores command to retrieve the list of stores in your Data Lake storage configuration.

Collection Name Already Exists

{
  "ok" : 0,
  "errmsg" : "collection name already exists in the database",
  "code" : 9,
  "codeName" : "FailedToParse"
}

Solution: Ensure that the collection name is unique. You can run the show collections command to retrieve the list of existing collections.
