Run Queries Against Your Data Lake

Estimated completion time: 5 minutes

You can run operations using the MongoDB Query Language (MQL), which includes most, but not all, standard server commands. In particular, Atlas Data Lake is currently a read-only service, so write operations are not supported. MQL operations can run in parallel to improve performance, even for large and complex queries. However, Data Lake is designed for analytics workloads and is not intended for day-to-day operational workloads. To learn which MQL operations are supported, see the MQL Support documentation.


To complete this part of the tutorial, you must have completed the previous parts of the tutorial, and you must be connected to your Data Lake with the mongo shell before running the following queries.


Find instances in the weather data where pressure is higher than 900 millibars. Sort by timestamp and limit the number of documents returned:

{"pressure": {$gt: 900}}).limit(5).sort({ "ts": 1})
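
MongoDB cursors apply sort() before limit() regardless of the order in which the two are chained. Assuming Data Lake follows the same cursor semantics, the query above returns the five earliest matching readings rather than sorting an arbitrary five. The following plain JavaScript sketch mimics that behavior in memory; the sample documents and the helper name findSortLimit are hypothetical and are not part of the tutorial data.

```javascript
// Hypothetical weather readings; only the fields used by the query are shown.
const readings = [
  { ts: 5, pressure: 1012 },
  { ts: 2, pressure: 880 },
  { ts: 1, pressure: 1003 },
  { ts: 4, pressure: 950 },
  { ts: 3, pressure: 899 },
  { ts: 6, pressure: 1020 },
];

// In-memory analogue of find(filter).limit(n).sort({ ts: 1 }):
// filter first, then sort ascending by ts, then keep the first n documents.
function findSortLimit(docs, minPressure, n) {
  return docs
    .filter((doc) => doc.pressure > minPressure)
    .sort((a, b) => a.ts - b.ts)
    .slice(0, n);
}

const firstTwo = findSortLimit(readings, 900, 2);
console.log(firstTwo.map((doc) => doc.ts)); // [ 1, 4 ]
```

The sketch runs in Node or in the mongo shell without a connection, since it never touches db.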

Find AirBnB offerings in Porto with a high review score:

db.listingsAndReviews.find( { "" : "Porto", "review_scores.review_scores_rating": {$gt: 79}})

Find properties in New York for less than $200 per night, and sort the returned documents by customer review rating:

db.listingsAndReviews.find({ "" : "New York", "price": {$lt: NumberDecimal("200.00")} } ).sort({ "review_scores.review_scores_rating": -1})

Find the average cost of accommodation in Porto by accommodation type:

db.listingsAndReviews.aggregate([{ $match: { "" : "Porto" } },{ $group : {"_id" : "$property_type", avgPrice: {$avg: "$price"}}}])
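
To make the $group stage concrete, the following sketch computes the same per-type average in plain JavaScript over a few made-up listings. Prices are plain numbers here for simplicity, whereas the find query earlier uses NumberDecimal; the documents and the function name averagePriceByType are illustrative assumptions, not tutorial data.

```javascript
// Hypothetical listings that have already passed the $match stage.
const listings = [
  { property_type: "Apartment", price: 80 },
  { property_type: "Apartment", price: 120 },
  { property_type: "House", price: 200 },
];

// Analogue of { $group: { _id: "$property_type", avgPrice: { $avg: "$price" } } }.
function averagePriceByType(docs) {
  const totals = new Map(); // property_type -> { sum, count }
  for (const doc of docs) {
    const entry = totals.get(doc.property_type) || { sum: 0, count: 0 };
    entry.sum += doc.price;
    entry.count += 1;
    totals.set(doc.property_type, entry);
  }
  // Emit one result document per group, shaped like the aggregation output.
  return [...totals].map(([type, { sum, count }]) => ({
    _id: type,
    avgPrice: sum / count,
  }));
}

console.log(averagePriceByType(listings));
```

Each result document carries the group key in _id and the computed average in avgPrice, which is the shape the aggregation returns. The server does not guarantee any particular order of groups unless the pipeline includes a $sort stage.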

Find the average price per night of an apartment in Sydney:

db.listingsAndReviews.aggregate([{ $match: { "" : "Sydney", "property_type" : "Apartment" } }, { $group : {"_id" : "$property_type", avgPrice: {$avg: "$price"}}}])
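
The Porto and Sydney pipelines above differ only in their $match filter, so the shared part can be factored into a small helper. This is a sketch for the mongo shell rather than part of the tutorial: avgPricePipeline is a hypothetical name, and the filter is passed in whole so that the helper makes no assumption about field names.

```javascript
// Build a [$match, $group] pipeline that averages price per property type.
// `filter` is whatever $match document you would otherwise write inline.
function avgPricePipeline(filter) {
  return [
    { $match: filter },
    { $group: { _id: "$property_type", avgPrice: { $avg: "$price" } } },
  ];
}

// Example filter; add your own location condition alongside property_type.
const pipeline = avgPricePipeline({ property_type: "Apartment" });

// `db` exists only inside the mongo shell, so guard the call to keep the
// snippet loadable elsewhere.
if (typeof db !== "undefined") {
  printjson(db.listingsAndReviews.aggregate(pipeline).toArray());
}
```

Passing the filter as a single document keeps the helper independent of how the location is stored in your collections.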

Find the number of apartments available to rent in Barcelona:

db.listingsAndReviews.aggregate([{ $match: { "" : "Barcelona" , "property_type": "Apartment"} },{ $count: "numApartments"}])


Congratulations! You just set up an Atlas Data Lake, created a database and collections from data stored in an S3 bucket, and queried the data using MQL commands.

For more information, see the Atlas Data Lake documentation.

Screenshot of the Data Lake after running queries.


When you dynamically generate collections from filenames, the number of collections is not accurately reported in the Data Lake view.