Write to MongoDB

To create a DataFrame, first create a SparkSession object, then use the object’s createDataFrame() function. In the following example, createDataFrame() takes a list of tuples containing names and ages, and a list of column names:

people = spark.createDataFrame([("Bilbo Baggins",  50), ("Gandalf", 1000), ("Thorin", 195), ("Balin", 178), ("Kili", 77),
   ("Dwalin", 169), ("Oin", 167), ("Gloin", 158), ("Fili", 82), ("Bombur", None)], ["name", "age"])
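
In the pyspark shell a SparkSession is already available as spark. For a standalone script, the session could be created along the following lines. This is a minimal sketch, not the connector's prescribed setup: the helper names mongo_uri and build_session, the host 127.0.0.1, and the test.myCollection target are illustrative assumptions, and the code assumes the MongoDB Spark Connector is already on the Spark classpath.

```python
def mongo_uri(host, database, collection):
    """Build a connection URI of the form mongodb://host/database.collection."""
    return "mongodb://{}/{}.{}".format(host, database, collection)


def build_session(output_uri, app_name="write-to-mongodb"):
    """Create (or reuse) a SparkSession whose default write target is output_uri."""
    # Imported lazily so the helpers above can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # The same option the pyspark shell receives at startup.
        .config("spark.mongodb.output.uri", output_uri)
        .getOrCreate()
    )


# Hypothetical values, for illustration only.
uri = mongo_uri("127.0.0.1", "test", "myCollection")
print(uri)

# With pyspark and the connector available:
# spark = build_session(uri)
```

Setting the URI on the session means later write calls can omit the database and collection entirely.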

Write the people DataFrame to the MongoDB database and collection specified in the spark.mongodb.output.uri option by using the DataFrame's write attribute:

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

Because the call names no database or collection, the connector uses the ones in the spark.mongodb.output.uri value that you supplied when you started the pyspark shell.

To display the contents of the DataFrame, use the show() method:

people.show()

In the pyspark shell, the operation prints the following output:

+-------------+----+
|         name| age|
+-------------+----+
|Bilbo Baggins|  50|
|      Gandalf|1000|
|       Thorin| 195|
|        Balin| 178|
|         Kili|  77|
|       Dwalin| 169|
|          Oin| 167|
|        Gloin| 158|
|         Fili|  82|
|       Bombur|null|
+-------------+----+

The printSchema() method prints out the DataFrame’s schema:

people.printSchema()

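In the pyspark shell, calling printSchema() on a DataFrame loaded back from the collection prints the following output. It includes the _id field that MongoDB adds to each document on insert; the in-memory people DataFrame has only the name and age columns: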

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

If you need to write to a different MongoDB collection, call the .option() method on the writer returned by people.write.

To write to a collection called contacts in a database called people, set the database option to people and the collection option to contacts:

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append") \
    .option("database", "people").option("collection", "contacts").save()
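
When several DataFrames go to different collections, the override options can be collected in one place. The following is a sketch under stated assumptions: mongo_write_options and write_to_mongo are hypothetical helper names rather than part of the connector, and the wrapper relies only on the format, mode, and save calls shown above plus PySpark's DataFrameWriter.options(), which accepts keyword arguments.

```python
MONGO_FORMAT = "com.mongodb.spark.sql.DefaultSource"


def mongo_write_options(database=None, collection=None):
    """Return only the overrides that were actually supplied."""
    options = {}
    if database is not None:
        options["database"] = database
    if collection is not None:
        options["collection"] = collection
    return options


def write_to_mongo(df, database=None, collection=None, mode="append"):
    """Write df through the connector, overriding the target only when asked."""
    writer = df.write.format(MONGO_FORMAT).mode(mode)
    # With no overrides the options dict is empty, so the connector falls
    # back to the database and collection in spark.mongodb.output.uri.
    writer.options(**mongo_write_options(database, collection)).save()


print(mongo_write_options("people", "contacts"))

# In the pyspark shell:
# write_to_mongo(people, database="people", collection="contacts")
```

Leaving both arguments unset reproduces the earlier write, which goes to the target named in the output URI.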