Deploy a Data Lake

On this page

  • Prerequisites
  • Procedure
  • Next Steps

Estimated completion time: 15 minutes

This part of the tutorial will guide you through deploying an Atlas Data Lake.

To complete this part of the tutorial, you need a MongoDB Atlas account. If you do not have one already, create one before you begin.

1
2
3
  • For your first Data Lake, click Create a Data Lake.
  • For your subsequent Data Lakes, click Configure a New Data Lake.
4
Screenshot of the Data Lake Overview.
5
Path to Sample Dataset
Description
/airbnb/listingsAndReviews/{bedrooms string}/{review_scores.review_scores_rating int}/

This path references the airbnb dataset, which contains the vacation home listing details and customer reviews. To learn more about this dataset, see Sample AirBnB Listings Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the bedrooms and review_scores.review_scores_rating fields.

/analytics/accounts/{limit int}/

This path references the analytics dataset, which contains data for a typical financial services application. To learn more about this dataset, see Sample Analytics Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the limit field.

/analytics/customers/{birthdate isodate}/

This path references the analytics dataset, which contains data for a typical financial services application. To learn more about this dataset, see Sample Analytics Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the birthdate field.

/analytics/transactions/{account_id int}/

This path references the analytics dataset, which contains data for a typical financial services application. To learn more about this dataset, see Sample Analytics Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the account_id field.

/mflix/comments/{shortDate isodate}/{movie_id objectid}/

This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the date and movie_id fields.

/mflix/movies/{type string}/{year int}/

This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the type and year fields.

/mflix/sessions.json

This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset.

This path does not contain any partition attributes, so Data Lake searches all the files in the collection for queries against its data.

/mflix/theaters/{theaterId string}/{location.address.zipcode string}/

This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset.

For this path, Data Lake utilizes partitions optimized for queries on the theaterId and location.address.zipcode fields.

/mflix/users.json

This path references the mflix dataset, which contains data on movies and movie theaters. To learn more about this dataset, see Sample Mflix Dataset.

This path does not contain any partition attributes, so Data Lake searches all the files in the collection for queries against its data.

/nyc-yellow-cab-trips/{trip_start_isodate isodate}/{passenger_count int}/{fare_type string}/

This path references the nyc-yellow-cab-trips dataset, which contains data on taxi trips, including trip date, fare, and number of passengers.

For this path, Data Lake utilizes partitions optimized for queries on the trip_start_isodate, passenger_count, and fare_type fields.
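The paths listed above encode each partition attribute as a field name and a type inside braces, for example {year int}. As a rough illustration of how to read them, the following Python sketch extracts those attributes from a path string. The helper name partition_attributes and the regular expression are assumptions made for this example only; they are not part of Atlas or Data Lake, and the sketch does not reflect how Data Lake parses paths internally.

```python
import re

# Matches one "{field type}" partition attribute in a sample path.
# Assumption for illustration: the field name and the type are
# separated by whitespace and neither contains braces.
PARTITION_ATTR = re.compile(r"\{([^{}\s]+)\s+([^{}\s]+)\}")


def partition_attributes(path):
    """Return a list of (field, type) pairs found in the path (hypothetical helper)."""
    return PARTITION_ATTR.findall(path)


sample_paths = [
    "/airbnb/listingsAndReviews/{bedrooms string}/{review_scores.review_scores_rating int}/",
    "/mflix/movies/{type string}/{year int}/",
    "/mflix/sessions.json",
]

for path in sample_paths:
    attrs = partition_attributes(path)
    if attrs:
        fields = ", ".join(f"{field} ({type_})" for field, type_ in attrs)
        print(f"{path}: partitioned on {fields}")
    else:
        # No partition attributes: queries search every file in the collection.
        print(f"{path}: no partition attributes")
```

A path such as /mflix/sessions.json yields an empty list, which corresponds to the case described above where Data Lake searches all the files in the collection.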

6

You need not modify the database or collection name because the sample queries that you run against the sample datasets later in this tutorial use the default names.

7

Now that your Data Lake is deployed, proceed to Connect to Your Data Lake.

Screenshot of the deployed Data Lake at this point.