Docs Menu

Read From MongoDB

Use the MongoSpark.load method to create an RDD representing a collection.

The following example loads the collection specified in the SparkConf:

val rdd = MongoSpark.load(sc)
println(rdd.count)
println(rdd.first.toJson)

To specify a different collection, database, and other read configuration settings, pass a ReadConfig to MongoSpark.load().

MongoSpark.load() can accept a ReadConfig object which specifies various read configuration settings, such as the collection or the Read Preference.

The following example reads from the spark collection with a secondaryPreferred read preference:

import com.mongodb.spark.config._
val readConfig = ReadConfig(Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"), Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc, readConfig)
println(customRdd.count)
println(customRdd.first.toJson)

SparkContext has an implicit helper method loadFromMongoDB() to load data from MongoDB.

For example, use the loadFromMongoDB() method without any arguments to load the collection specified in the SparkConf:

sc.loadFromMongoDB() // Uses the SparkConf for configuration

Call loadFromMongoDB() with a ReadConfig object to specify a different MongoDB server address, database and collection. See input configuration settings for available settings:

sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://example.com/database.collection"))) // Uses the ReadConfig
Give Feedback
© 2021 MongoDB, Inc.

About

  • Careers
  • Legal Notices
  • Privacy Notices
  • Security Information
  • Trust Center
© 2021 MongoDB, Inc.