This version of the documentation is archived and no longer supported.

Write to MongoDB

To create a DataFrame, first create a SparkSession object, then use the object’s createDataFrame() function. In the following example, createDataFrame() takes a list of tuples containing names and ages, and a list of column names:

people = spark.createDataFrame([
    ("Bilbo Baggins", 50), ("Gandalf", 1000), ("Thorin", 195),
    ("Balin", 178), ("Kili", 77), ("Dwalin", 169), ("Oin", 167),
    ("Gloin", 158), ("Fili", 82), ("Bombur", None)
], ["name", "age"])

Write the people DataFrame to the MongoDB database and collection specified in the spark.mongodb.output.uri option by using the write method:

people.write.format("com.mongodb.spark.sql.DefaultSource").mode("append").save()

This operation writes to the MongoDB database and collection that you specified in the spark.mongodb.output.uri option when you started the pyspark shell.
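For reference, the output URI is typically set when you launch the pyspark shell. The following is a minimal sketch; the connection string, database and collection names (test.people), and the connector package version (2.3.1 for Scala 2.11) are assumptions you should adjust for your deployment:

```shell
# Launch the pyspark shell with the MongoDB Spark Connector on the classpath
# and a default output URI (assumed local mongod on the default port).
./bin/pyspark \
  --conf "spark.mongodb.output.uri=mongodb://127.0.0.1/test.people" \
  --packages org.mongodb.spark:mongo-spark-connector_2.11:2.3.1
```

With the shell started this way, save() calls that do not override the database and collection options write to test.people.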

To display the contents of the DataFrame, use the show() method:

people.show()

In the pyspark shell, the operation prints the following output:

+-------------+----+
|         name| age|
+-------------+----+
|Bilbo Baggins|  50|
|      Gandalf|1000|
|       Thorin| 195|
|        Balin| 178|
|         Kili|  77|
|       Dwalin| 169|
|          Oin| 167|
|        Gloin| 158|
|         Fili|  82|
|       Bombur|null|
+-------------+----+

The printSchema() method prints the DataFrame's schema:

people.printSchema()

In the pyspark shell, the operation prints the following output:

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- age: long (nullable = true)
 |-- name: string (nullable = true)

To write to a different MongoDB collection, chain the option() method to write().

For example, to write to a collection called contacts in a database called people, set the database and collection options:

people.write.format("com.mongodb.spark.sql.DefaultSource") \
    .mode("append") \
    .option("database", "people") \
    .option("collection", "contacts") \
    .save()