create

The create command creates a collection that maps to data in an existing data store, or a view on a collection, in the Atlas Data Lake storage configuration.

The wildcard "*" can be used with the create command in two ways:

  • As the collection name, to dynamically create collections that map to files and folders at the specified path in the data store.
  • In the path parameter, to create a single collection that maps to multiple files and folders at the specified path in the data store.
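
For illustration, assuming a hypothetical S3 store named egS3Store that contains JSON files under a /json/ prefix, the two usages might look like this (full walkthroughs appear in the Wildcard Usage Examples section):

```javascript
// Wildcard as the collection name: dynamically create one collection
// per file, named after the file via the collectionName() function.
db.runCommand({ "create" : "*", "dataSources" : [{ "storeName" : "egS3Store", "path" : "/json/{collectionName()}" }]})

// Wildcard in the path: create a single collection that maps to
// every file and folder under /json/.
db.runCommand({ "create" : "egCollection", "dataSources" : [{ "storeName" : "egS3Store", "path" : "/json/*" }]})
```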

The create command has three forms: creating a collection on an S3 data store, creating a collection on an Atlas cluster data store, and creating a view on a collection. The syntax and parameters for each form are described below.

Syntax

db.runCommand({ "create" : "<collection-name>|*", "dataSources" : [{ "storeName" : "<store-name>", "path" : "<path-to-files-or-folders>", "defaultFormat" : "<file-extension>" }]})
db.runCommand({ "create" : "<collection-name>|*", "dataSources" : [{ "storeName" : "<store-name>", "database" : "<atlas-database-name>", "collection" : "<atlas-collection-name>" }]})
db.runCommand({ "create" : "<view-name>", "viewOn" : "<collection-name>", "pipeline" : ["<stage1>", "<stage2>", ...] })

Parameters

<collection-name>|* (string, required)

  Either the name of the collection to which Data Lake maps the data contained in the data store, or the wildcard "*" to dynamically create collections.

  For an S3 data store, you can generate collection names dynamically from file paths by specifying * for the collection name and using the collectionName() function in the dataSources.path field.

  For an Atlas data store, you can generate collection names dynamically by specifying * for the collection name and omitting the dataSources.collection field.

dataSources (object, required)

  Array of objects where each object represents a data store in the stores array to map to the collection.

dataSources.storeName (string, required)

  The name of a data store to map to the collection. The value must match stores.[n].name in the stores array.

dataSources.path (string, required for S3 data stores)

  The path to the files and folders. Specify / to capture all files and folders from the prefix path. See Path Syntax Examples for more information.

dataSources.defaultFormat (string, optional; S3 data stores only)

  The format that Data Lake defaults to if it encounters a file without an extension while querying the data store. If omitted, Data Lake attempts to detect the file type by processing a few bytes of the file. The following values are valid: .json, .json.gz, .bson, .bson.gz, .avro, .avro.gz, .tsv, .tsv.gz, .csv, .csv.gz, .parquet

dataSources.database (string, required for Atlas data stores)

  The name of the database that contains the collection in the Atlas cluster.

dataSources.collection (string, required for Atlas data stores)

  The name of the collection in the Atlas database. Must be omitted when creating a wildcard (*) collection.
<view-name> (string, required)

  The name of the view. A view name must be unique: it cannot be the same as a collection name or any other view name in the same database.

viewOn (string, required)

  The name of the source collection on which to create the view.

pipeline (array of stages, required)

  The array of aggregation pipeline stages that defines the view. The view definition pipeline cannot include the $out or $merge stage, even inside nested pipeline stages such as $lookup or $facet.

Output

The command returns the following output if it succeeds. You can verify the results by running the commands in Verify Collection. If it fails, see Troubleshoot Errors below for recommended solutions.

{ ok: 1 }

Examples

The following examples use the sample dataset, airbnb, in an AWS S3 store with the following settings:

  Store Name: egS3Store
  Region: us-east-2
  Bucket: test-data-lake
  Prefix: json
  Delimiter: /
  Sample Dataset: airbnb

Use the procedure in the Getting Started with Atlas Data Lake Tutorial to prepare your S3 bucket and upload the sample dataset.

The following examples use the sample_airbnb.listingsAndReviews collection from the sample dataset on the Atlas cluster with the following settings:

  Store Name: egAtlasStore
  Sample Dataset: sample_airbnb.listingsAndReviews

Review Load Sample Data into Your Atlas Cluster to load the sample dataset in your Atlas cluster.

Basic Example

The following command creates a collection named airbnb in the sampleDB database in the storage configuration.

The airbnb collection maps to the airbnb sample dataset in the json folder in the S3 store named egS3Store.

use sampleDB
db.runCommand({ "create" : "airbnb", "dataSources" : [{ "storeName" : "egS3Store", "path" : "/json/airbnb", "defaultFormat" : ".json" }]})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
airbnb
> db.runCommand({"storageGetConfig" : 1 })
{
        "ok" : 1,
        "storage" : {
                "stores" : [{
                              "name" : "egS3Store",
                              "provider" : "s3",
                              "region" : "us-east-2",
                              "bucket" : "test-data-lake",
                              "delimiter" : "/",
                              "prefix" : ""
                }],
                "databases" : [{
                        "name" : "sampleDB",
                        "collections" : [{
                                "name" : "airbnb",
                                "dataSources" : [{
                                        "storeName" : "egS3Store",
                                        "path" : "/json/airbnb",
                                        "defaultFormat" : ".json"
                                }]
                        }]
                }]
        }
}

The airbnb collection maps to the listingsAndReviews sample collection in the sample_airbnb database on the Atlas cluster.

use sampleDB
db.runCommand({ "create" : "airbnb", "dataSources" : [{ "storeName" : "egAtlasStore", "database" : "sample_airbnb", "collection" : "listingsAndReviews" }]})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
airbnb
> db.runCommand({"storageGetConfig":1})
{
        "ok" : 1,
        "storage" : {
                "stores" : [{
                              "name" : "egAtlasStore",
                              "provider" : "atlas",
                              "clusterName" : "myTestCluster",
                              "projectId" : "<project-id>"
                      }],
                "databases" : [{
                        "name" : "sampleDB",
                        "collections" : [{
                                "name" : "airbnb",
                                "dataSources" : [{
                                        "storeName" : "egAtlasStore",
                                        "database" : "sample_airbnb",
                                        "collection" : "listingsAndReviews"
                                }]
                        }]
                }]
        }
}

Multiple Data Sources Example

The following command creates a collection named egCollection in the sampleDB database in the storage configuration. The egCollection collection maps to the following sample datasets:

  • airbnb dataset in the json folder in the S3 store named egS3Store
  • airbnb dataset in the sample_airbnb.listingsAndReviews collection on the Atlas cluster named myTestCluster
use sampleDB
db.runCommand({ "create" : "egCollection", "dataSources" : [{ "storeName" : "egS3Store", "path" : "/json/airbnb" },{ "storeName" : "egAtlasStore", "database": "sample_airbnb", "collection": "listingsAndReviews" }]})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
egCollection
> db.runCommand({"storageGetConfig":1})
{
        "ok" : 1,
        "storage" : {
                "stores" : [{
                              "name" : "egS3Store",
                              "provider" : "s3",
                              "region" : "us-east-2",
                              "bucket" : "test-data-lake",
                              "delimiter" : "/",
                              "prefix" : ""
                      },
                      {
                              "name" : "egAtlasStore",
                              "provider" : "atlas",
                              "clusterName" : "myTestCluster",
                              "projectId" : "<project-id>"
                      }],
                "databases" : [{
                        "name" : "sampleDB",
                        "collections" : [{
                                "name" : "egCollection",
                                "dataSources" : [
                                        {
                                          "storeName" : "egS3Store",
                                          "path" : "/json/airbnb"
                                        },
                                        {
                                          "storeName" : "egAtlasStore",
                                          "database" : "sample_airbnb",
                                          "collection" : "listingsAndReviews"
                                        }
                                ]
                        }]
                }]
        }
}

Wildcard Usage Examples

These examples show how the wildcard "*" can be specified with the create command.

Collection Name Example

The following example uses the create command to dynamically create collections for the files in the path /json/ in the egS3Store data store. It uses the collectionName() function to name the collections after the filenames in the specified path.

use sampleDB
db.runCommand({ "create" : "*", "dataSources" : [{ "storeName" : "egS3Store", "path": "/json/{collectionName()}"}]})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

 > show collections
 airbnb
 > db.runCommand({"storageGetConfig" : 1 })
 {
   "ok" : 1,
   "storage" : {
     "stores" : [{
       "name" : "egS3Store",
       "provider" : "s3",
       "region" : "us-east-2",
       "bucket" : "test-data-lake",
       "delimiter" : "/",
       "prefix" : ""
     }],
     "databases" : [{
       "name" : "sampleDB",
       "collections" : [{
         "name" : "*",
         "dataSources" : [{
           "storeName" : "egS3Store",
           "path" : "/json/{collectionName()}"
         }]
       }]
     }]
   }
 }

The following example uses the create command to dynamically create collections for the collections in the sample_airbnb database on the Atlas cluster named myTestCluster.

use sampleDB
db.runCommand({ "create" : "*", "dataSources" : [{ "storeName" : "egAtlasStore", "database": "sample_airbnb"}]})

The previous command returns the following output:

 { "ok" : 1 }

The following commands show that the collection was successfully created:

> db.runCommand({storageGetConfig:1})
{
  "ok" : 1,
  "storage" : {
    "stores" : [{
      "name" : "egAtlasStore",
      "provider" : "atlas",
      "clusterName" : "myTestCluster",
      "projectId" : "<project-id>"
    }],
    "databases" : [{
      "name" : "sampleDB",
      "collections" : [{
        "name" : "*",
        "dataSources" : [{
          "storeName" : "egAtlasStore",
          "database" : "sample_airbnb"
         }]
      }]
    }]
  }
}
> show collections
listingsAndReviews

Path Glob Example

The following example uses the create command to create a collection named egCollection that maps to a Data Lake store named egS3Store. The egS3Store contains the sample dataset, airbnb, in a folder named json.

use sampleDB
db.runCommand({ "create" : "egCollection", "dataSources" : [{ "storeName" : "egS3Store", "path" : "/json/*" }]})

The previous command returns the following output:

{ "ok" : 1 }

The following commands show that the collection was successfully created:

> show collections
egCollection
> db.runCommand({"storageGetConfig" : 1 })
{
        "ok" : 1,
        "storage" : {
                "stores" : [{
                  "name" : "egS3Store",
                  "provider" : "s3",
                  "region" : "us-east-2",
                  "bucket" : "test-data-lake",
                  "delimiter" : "/",
                  "prefix" : ""
                }],
                "databases" : [{
                  "name" : "sampleDB",
                  "collections" : [{
                          "name" : "egCollection",
                          "dataSources" : [{
                                  "storeName" : "egS3Store",
                                  "path" : "/json/*"
                                }]
                        }]
                }]
        }
}

The following command creates a view named listings on the airbnb collection in the sampleDB database with the name and property_type fields:

use sampleDB
db.runCommand({ "create" : "listings", "viewOn" : "airbnb", "pipeline" : [{$project: {"property_type":1, "name": 1}}] })

This command returns the following output:

{ "ok" : 1 }

The listCollections and storageGetConfig commands return the following output:

> db.runCommand({"listCollections":1})
{
        "ok" : 1,
        "cursor" : {
                "firstBatch" : [
                        {
                                "name" : "airbnb/",
                                "type" : "collection",
                                "info" : {
                                        "readOnly" : true
                                }
                        },
                        {
                                "name" : "listings",
                                "type" : "view",
                                "info" : {
                                        "readOnly" : true
                                }
                        }
                ],
                "id" : NumberLong(0),
                "ns" : "egS3Store.$cmd.listCollections"
        }
}
> db.runCommand({"storageGetConfig":1})
{
        "ok" : 1,
        "storage" : {
                "stores" : [
                        {
                                "name" : "egS3Store",
                                "provider" : "s3",
                                "region" : "us-east-2",
                                "bucket" : "test-data-lake",
                                "delimiter" : "/"
                        }
                ],
                "databases" : [
                        {
                                "name" : "sample",
                                "collections" : [
                                        {
                                                "name" : "airbnb/",
                                                "dataSources" : [
                                                        {
                                                                "storeName" : "egS3Store",
                                                                "path" : "json/airbnb/*"
                                                        }
                                                ]
                                        },
                                        {
                                                "name" : "*",
                                                "dataSources" : [
                                                        {
                                                                "storeName" : "egS3Store",
                                                                "path" : "json/{collectionName()}"
                                                        }
                                                ]
                                        }
                                ],
                                "views" : [
                                        {
                                                "name" : "listings",
                                                "source" : "airbnb",
                                                "pipeline" : "[{\"$project\":{\"property_type\":{\"$numberInt\":\"1\"},\"name\":{\"$numberInt\":\"1\"}}}]"
                                        }
                                ]
                        }
                ]
        }
}

Verify Collection

You can verify that the command successfully created the collection or view by running one of the following commands:

show collections
db.runCommand({ "storageGetConfig" : 1 })
db.runCommand({ "listCollections" : 1 })

Troubleshoot Errors

If the command fails, it returns one of the following errors:

Store Name Does Not Exist

{
        "ok" : 0,
        "errmsg" : "store name does not exist",
        "code" : 9,
        "codeName" : "FailedToParse"
}

Solution: Ensure that the specified storeName matches the name of a store in the stores array. You can run the listStores command to retrieve the list of stores in your Data Lake storage configuration.
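
For example, a quick check of the configured store names might look like the following sketch (output varies with your configuration):

```javascript
// List the stores in the Data Lake storage configuration, then confirm
// that the storeName in the create command matches one of the returned names.
db.runCommand({ "listStores" : 1 })
```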

Collection Name Already Exists

{
        "ok" : 0,
        "errmsg" : "collection name already exists in the database",
        "code" : 9,
        "codeName" : "FailedToParse"
}

Solution: Ensure that the collection name is unique. You can run the show collections command to retrieve the list of existing collections.

If the command to create a view fails, it returns the following error:

View Name Exists

{
        "ok" : 0,
        "errmsg" : "a view '<database>.<view>' already exists, correlationID = <1603aaffdbc91ba93de6364a>",
        "code" : 48,
        "codeName" : "NamespaceExists"
}

Solution: Ensure that the view name is unique. You can run the listCollections command to retrieve the list of existing collections and views in the database.
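
Because listCollections accepts a filter document, you can also list only the views. This is a sketch assuming standard listCollections filter behavior:

```javascript
// Return only namespaces of type "view" in the current database.
db.runCommand({ "listCollections" : 1, "filter" : { "type" : "view" } })
```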