Read From MongoDB

Use the MongoSpark.load method to create an RDD representing a collection.

The following example loads the collection specified in the SparkConf:

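import com.mongodb.spark._

// Load the collection named in the SparkConf as an RDD of documents
val rdd = MongoSpark.load(sc)

println(rdd.count)         // number of documents in the collection
println(rdd.first.toJson)  // first document, rendered as JSON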
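The example above assumes an existing SparkContext, sc, whose SparkConf already names the collection to read. As a minimal sketch of how that might be set up in a standalone application (not in spark-shell, where sc already exists), the following sets the spark.mongodb.input.uri property before creating the context. The object name, the buildConf helper, the local[*] master, and the URI mongodb://127.0.0.1/test.myCollection are illustrative assumptions, not part of the original example:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.mongodb.spark._

object ReadFromMongoSketch {
  // Hypothetical helper: builds a SparkConf whose input URI names the
  // database and collection to read (format: mongodb://host/database.collection)
  def buildConf(inputUri: String): SparkConf =
    new SparkConf()
      .setMaster("local[*]")               // assumption: run locally
      .setAppName("ReadFromMongoSketch")
      .set("spark.mongodb.input.uri", inputUri)

  def main(args: Array[String]): Unit = {
    // Assumed URI: local mongod, database "test", collection "myCollection"
    val sc = new SparkContext(buildConf("mongodb://127.0.0.1/test.myCollection"))
    try {
      val rdd = MongoSpark.load(sc)        // reads the collection named in the conf
      println(rdd.count)
    } finally {
      sc.stop()                            // always release the context
    }
  }
}
```

Running main requires a reachable MongoDB instance at the assumed address; buildConf on its own does not connect to anything.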

To specify a different collection or database, or to change other read configuration settings, pass a ReadConfig to MongoSpark.load().

Using a ReadConfig

MongoSpark.load() accepts a ReadConfig object that specifies read configuration settings, such as the collection or the read preference.

The following example reads from the spark collection with a secondaryPreferred read preference:

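import com.mongodb.spark._
import com.mongodb.spark.config._

// Override the collection and read preference; other settings come from the SparkConf
val readConfig = ReadConfig(
  Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"),
  Some(ReadConfig(sc)))
val customRdd = MongoSpark.load(sc, readConfig)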

println(customRdd.count)
println(customRdd.first.toJson)
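The second argument to ReadConfig supplies defaults for any setting the map does not override. The following sketch illustrates that layering without a SparkContext by building the defaults from a plain map. The object name, the URI, and the database and collection names are assumptions for illustration, and the field names databaseName, collectionName, and readPreferenceConfig reflect the connector's 2.x Scala API as best understood here, so check them against the version you use:

```scala
import com.mongodb.spark.config.ReadConfig

object ReadConfigLayering {
  // Assumed defaults; in the example above these come from ReadConfig(sc)
  val defaults: ReadConfig = ReadConfig(Map(
    "uri"        -> "mongodb://127.0.0.1/",
    "database"   -> "test",
    "collection" -> "myCollection"))

  // Only the collection and read preference are overridden;
  // the database and URI should be inherited from the defaults
  val overridden: ReadConfig = ReadConfig(
    Map("collection" -> "spark", "readPreference.name" -> "secondaryPreferred"),
    Some(defaults))

  def main(args: Array[String]): Unit = {
    println(overridden.databaseName)
    println(overridden.collectionName)
    println(overridden.readPreferenceConfig.name)
  }
}
```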

SparkContext Load Helper Methods

Importing com.mongodb.spark._ adds an implicit helper method, loadFromMongoDB(), to SparkContext for loading data from MongoDB.

For example, use the loadFromMongoDB() method without any arguments to load the collection specified in the SparkConf:

sc.loadFromMongoDB() // Uses the SparkConf for configuration

Call loadFromMongoDB() with a ReadConfig object to specify a different MongoDB server address, database, and collection (see the input configuration settings for the available options):

sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://example.com/database.collection"))) // Uses the ReadConfig
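The helper method and the ReadConfig defaults shown earlier can be combined. As a rough sketch, the hypothetical loadCollection function below (the object and function names are not part of the connector) reads a named collection while taking every other setting from the SparkConf. It assumes the SparkConf already sets spark.mongodb.input.uri, and that the resulting RDD type is MongoRDD[Document] from com.mongodb.spark.rdd:

```scala
import org.apache.spark.SparkContext
import org.bson.Document
import com.mongodb.spark._                  // brings loadFromMongoDB() into scope
import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.rdd.MongoRDD

object LoadHelpers {
  // Hypothetical helper: override only the collection name and
  // fall back to the SparkConf-derived ReadConfig for everything else
  def loadCollection(sc: SparkContext, collection: String): MongoRDD[Document] =
    sc.loadFromMongoDB(
      ReadConfig(Map("collection" -> collection), Some(ReadConfig(sc))))
}
```

A call such as LoadHelpers.loadCollection(sc, "spark") would then behave like the ReadConfig example above, minus the read preference override.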