Write to MongoDB

Writing to MongoDB from SparkR starts with a Spark DataFrame, and creating one requires an active SparkSession. The sparkR shell provides a default SparkSession object called spark.

Use the createDataFrame() function to convert an R data.frame to a Spark DataFrame, then use the write.df() method to save the DataFrame to MongoDB:

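# Build an R data.frame of characters; NA marks an unknown age.
charactersRdf <- data.frame(list(
  name = c("Bilbo Baggins", "Gandalf", "Thorin", "Balin", "Kili",
           "Dwalin", "Oin", "Gloin", "Fili", "Bombur"),
  age = c(50, 1000, 195, 178, 77, 169, 167, 158, 82, NA)))
# Convert it to a Spark DataFrame.
charactersSparkdf <- createDataFrame(charactersRdf)
# Write to MongoDB; the path argument ("") is left empty for this source.
write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "overwrite")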

Note

The empty string ("") is the path argument of write.df(), which names a file when the data source is file-based. Because the data source here is a MongoDB collection, the path is left empty.

The above operation writes to the MongoDB database and collection named in the spark.mongodb.output.uri option, which you set either in the sparkR shell arguments or in the SparkSession configuration.
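
As a minimal sketch of the SparkSession route, the option can be passed through the sparkConfig argument of sparkR.session() when a session is started from R rather than from the sparkR shell. The URI (a local MongoDB instance, database test, collection characters), the application name, and the connector package coordinates are assumptions for illustration; substitute values that match your deployment and your Spark and Scala versions.

```r
library(SparkR)

# Assumed target: local MongoDB, database "test", collection "characters".
outputUri <- "mongodb://127.0.0.1/test.characters"

# Start (or reuse) a SparkSession with the output URI set.
# sparkPackages loads the connector; the coordinates shown are an
# assumption and must match your Spark and Scala versions.
sparkR.session(
  appName = "mongo-write-example",
  sparkConfig = list(spark.mongodb.output.uri = outputUri),
  sparkPackages = "org.mongodb.spark:mongo-spark-connector_2.11:2.4.1"
)
```

With a session configured this way, a write.df() call that names no database or collection writes to test.characters.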

To read the first few rows of the DataFrame, use the head() method.

head(charactersSparkdf)

The operation prints the following output:

           name  age
1 Bilbo Baggins   50
2       Gandalf 1000
3        Thorin  195
4         Balin  178
5          Kili   77
6        Dwalin  169

The printSchema() method prints out the DataFrame's schema:

printSchema(charactersSparkdf)

In the sparkR shell, the operation prints the following output:

root
|-- name: string (nullable = true)
|-- age: double (nullable = true)
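
The age column is reported as double because R stores plain numeric vectors as doubles. If integer ages are preferred, one option is to cast the column before writing. The sketch below is self-contained and assumes only a local Spark installation; it uses a shortened two-row version of the data, and the name charactersIntdf is introduced here for illustration.

```r
library(SparkR)
sparkR.session()  # reuses the active session if one already exists

# Shortened copy of the example data, enough to show the cast.
charactersRdf <- data.frame(name = c("Bilbo Baggins", "Gandalf"),
                            age = c(50, 1000),
                            stringsAsFactors = FALSE)
charactersSparkdf <- createDataFrame(charactersRdf)

# Replace the double-typed age column with an integer-typed copy.
charactersIntdf <- withColumn(charactersSparkdf, "age",
                              cast(charactersSparkdf$age, "integer"))
printSchema(charactersIntdf)
```

The cast returns a new DataFrame and leaves charactersSparkdf unchanged, so either one can be passed to write.df().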

You can add arguments to the write.df() method to specify a MongoDB database and collection.

The following operation writes the charactersSparkdf data to a MongoDB collection called ages in a database called characters:

write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "overwrite", database = "characters", collection = "ages")
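
Because mode = "overwrite" replaces whatever the target collection already holds, adding rows to an existing collection calls for a different save mode. As a hedged sketch, the following uses mode = "append", another save mode that write.df() accepts, against the same database and collection. It assumes the connector is loaded and spark.mongodb.output.uri is already configured for the session; the two rows and their ages are placeholder values made up for illustration.

```r
library(SparkR)
sparkR.session()  # assumes the connector and output URI are already configured

# Placeholder rows for illustration only; the ages are not real data.
moreCharactersRdf <- data.frame(name = c("Ori", "Dori"),
                                age = c(100, 150),
                                stringsAsFactors = FALSE)
moreCharactersSparkdf <- createDataFrame(moreCharactersRdf)

# Append to characters.ages instead of replacing its contents.
write.df(moreCharactersSparkdf, "",
         source = "com.mongodb.spark.sql.DefaultSource",
         mode = "append", database = "characters", collection = "ages")
```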