Write to MongoDB

To create a DataFrame, you first need a SparkSession; the createDataFrame() function then uses that session. The sparkR shell provides a default SparkSession object called spark.

Use the createDataFrame() function to convert an R data.frame to a Spark DataFrame. To save the DataFrame to MongoDB, use the write.df() function:

charactersRdf <- data.frame(list(name=c("Bilbo Baggins", "Gandalf", "Thorin",
                      "Balin", "Kili", "Dwalin", "Oin", "Gloin", "Fili", "Bombur"),
                      age=c(50, 1000, 195, 178, 77, 169, 167, 158, 82, NA)))

charactersSparkdf <- createDataFrame(charactersRdf)
write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "overwrite")

Note

The empty string argument ("") is the path of a file to use as a data source. Because the data source here is a MongoDB collection, the path is left empty.

The operation above writes to the MongoDB database and collection named in the spark.mongodb.output.uri option, which is set in the sparkR shell arguments or in the SparkSession configuration.
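Outside the sparkR shell, the output URI can be supplied when the session is created. The following is a minimal sketch, assuming a MongoDB instance at 127.0.0.1 and placeholder database and collection names (test and characters). None of these values come from this guide, and the connector package must already be available to Spark.

```r
library(SparkR)

# Assumed connection string: the host, database ("test"), and collection
# ("characters") are placeholders, not values taken from this guide.
outputUri <- "mongodb://127.0.0.1/test.characters"

# Start (or reuse) a SparkSession with the connector's output URI set.
spark <- sparkR.session(
  appName = "write-to-mongodb-example",
  sparkConfig = list(spark.mongodb.output.uri = outputUri)
)
```

With the URI set this way, a write.df() call that names no database or collection should write to test.characters.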

To read the first few rows of the DataFrame, use the head() function, which returns the first six rows by default.

head(charactersSparkdf)

The operation prints the following output:

         name  age
Bilbo Baggins   50
      Gandalf 1000
       Thorin  195
        Balin  178
         Kili   77
       Dwalin  169

The printSchema() method prints out the DataFrame’s schema:

printSchema(charactersSparkdf)

In the sparkR shell, the operation prints the following output:

root
 |-- name: string (nullable = true)
 |-- age: double (nullable = true)
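The age column is reported as double because R stores numeric literals such as 50 as doubles, and SparkR maps R numeric vectors to Spark's double type. The base R sketch below needs no Spark session and only checks the R-side types. The as.integer() conversion is an illustration of how an integer column could be produced, not a step this guide requires.

```r
# Same ages as in charactersRdf above.
age <- c(50, 1000, 195, 178, 77, 169, 167, 158, 82, NA)

# Numeric literals without an L suffix are stored as doubles.
ageType <- typeof(age)

# Converting to integer keeps the missing value as NA.
ageInt <- as.integer(age)
ageIntType <- typeof(ageInt)

print(ageType)
print(ageIntType)
```

The missing age for Bombur stays NA in both vectors, which is consistent with the column being nullable in the schema.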

Writing with Options

You can add arguments to the write.df() method to specify a MongoDB database and collection.

The following operation writes the charactersSparkdf data to a MongoDB collection called ages in a database called characters.

write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "overwrite", database = "characters", collection = "ages")
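The mode argument controls what happens to data already in the collection. As a hedged sketch, the following uses SparkR's "append" mode to add rows instead of replacing the collection, then reads the collection back with read.df() to inspect the result. It assumes the charactersSparkdf DataFrame and session from the examples above, and that spark.mongodb.input.uri is also configured so the connector can read.

```r
library(SparkR)

# Add the rows to the existing collection instead of overwriting it.
# Assumes charactersSparkdf was created as shown earlier on this page.
write.df(charactersSparkdf, "", source = "com.mongodb.spark.sql.DefaultSource",
         mode = "append", database = "characters", collection = "ages")

# Read the collection back to check what was written. This assumes
# spark.mongodb.input.uri is set for the session.
agesSparkdf <- read.df("", source = "com.mongodb.spark.sql.DefaultSource",
                       database = "characters", collection = "ages")
head(agesSparkdf)
```

SparkR's other save modes are "overwrite", "error", and "ignore".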
