
Write to MongoDB

When saving RDD data into MongoDB, the data must be convertible to a BSON document. If the RDD does not already contain Documents, include a map transformation that converts each element into a Document (or a BsonDocument or a DBObject).
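As a sketch of such a map transformation (the Fruit case class and its fields are hypothetical, and an active SparkContext sc is assumed):

```scala
import org.bson.Document

// Hypothetical domain type -- not part of the connector API.
case class Fruit(name: String, qty: Int)

val fruits = sc.parallelize(Seq(Fruit("apple", 3), Fruit("pear", 5)))

// Map each record to a BSON Document before saving.
// Document.append takes (String, Object), so the Int is auto-boxed.
val docs = fruits.map(f => new Document("name", f.name).append("qty", f.qty))

MongoSpark.save(docs)
```

The same pattern applies to tuples or any other element type: build one Document per RDD element, then save.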

The following example creates a 10-document RDD and saves it to the MongoDB collection specified in the SparkConf:

import org.bson.Document

val documents = sc.parallelize((1 to 10).map(i => Document.parse(s"{test: $i}")))

MongoSpark.save(documents) // Uses the SparkConf for configuration

Using a WriteConfig

MongoSpark.save() can accept a WriteConfig object which specifies various write configuration settings, such as the collection or the write concern.

For example, the following code saves data to the spark collection with a majority write concern:

import com.mongodb.spark.config._

val writeConfig = WriteConfig(Map("collection" -> "spark", "writeConcern.w" -> "majority"), Some(WriteConfig(sc)))
val sparkDocuments = sc.parallelize((1 to 10).map(i => Document.parse(s"{spark: $i}")))

MongoSpark.save(sparkDocuments, writeConfig)

RDD Save Helper Methods

RDDs have an implicit helper method saveToMongoDB() to write data to MongoDB.

For example, the following takes the documents RDD defined above and calls its saveToMongoDB() method without any arguments to save the documents to the collection specified in the SparkConf:

documents.saveToMongoDB() // Uses the SparkConf for configuration

Call saveToMongoDB() with a WriteConfig object to specify a different MongoDB server address, database, and collection. See write configuration settings for available settings:

documents.saveToMongoDB(WriteConfig(Map("uri" -> "mongodb://example.com/database.collection")))
// Uses the WriteConfig

Unsupported Types

Some native Scala types (e.g. Lists) are unsupported and must be converted to their Java equivalents before saving. The scala.collection.JavaConverters import provides the .asJava method for this conversion.

The following operation imports the JavaConverters, converts a Scala List to its Java equivalent, and saves the result to MongoDB:

import scala.collection.JavaConverters._
import org.bson.Document

val documents = sc.parallelize(
  Seq(new Document("fruits", List("apples", "oranges", "pears").asJava))
)
MongoSpark.save(documents)
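For illustration, the same .asJava conversion applies to other Scala collections, such as Maps, before they are embedded in a Document (a stdlib-only sketch; the value names here are hypothetical):

```scala
import scala.collection.JavaConverters._

// Scala Lists and Maps are not BSON-convertible as-is;
// convert them with .asJava before placing them in a Document.
val javaList = List("apples", "oranges", "pears").asJava
val javaMap  = Map("apples" -> 3, "pears" -> 5).asJava
```

The resulting java.util.List and java.util.Map can then be used as Document field values, as in the example above.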