Navigation

Spark Connector Java Guide

Source Code

For the source code that combines all of the Java examples, see JavaIntroduction.java.

Prerequisites

  • Basic working knowledge of MongoDB and Apache Spark. Refer to the MongoDB documentation and Spark documentation for more details.
  • Running MongoDB instance (version 2.6 or later).
  • Spark 2.0.x.
  • Scala 2.11.x
  • Java 6 or later.

Getting Started

Dependency Management

Provide the Spark Core, Spark SQL, and MongoDB Spark Connector dependencies to your dependency management tool.

The following excerpt is from a Maven pom.xml file:

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.0.0</version>
  </dependency>
</dependencies>

Configuration

For the configuration classes, use the Java-friendly create methods instead of the native Scala apply methods.

The Java API provides a JavaSparkContext that takes a SparkContext object from the SparkSession.

When specifying the Connector configuration via SparkSession, you must prefix the settings appropriately. For details and other available MongoDB Spark Connector options, see the Configuration Options.

package com.mongodb.spark_examples;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

public final class GettingStarted {

  public static void main(final String[] args) throws InterruptedException {
    /* Create the SparkSession.
     * If config arguments are passed from the command line using --conf,
     * parse args for the values to set.
     */
    SparkSession spark = SparkSession.builder()
      .master("local")
      .appName("MongoSparkConnectorIntro")
      .config("spark.mongodb.input.uri", "mongodb://127.0.0.1/test.myCollection")
      .config("spark.mongodb.output.uri", "mongodb://127.0.0.1/test.myCollection")
      .getOrCreate();

    // Create a JavaSparkContext using the SparkSession's SparkContext object
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());

    // More application logic would go here...

    jsc.close();

  }
}
  • The spark.mongodb.input.uri specifies the MongoDB server address(127.0.0.1), the database to connect (test), and the collection (myCollection) from which to read data, and the read preference.
  • The spark.mongodb.output.uri specifies the MongoDB server address(127.0.0.1), the database to connect (test), and the collection (myCollection) to which to write data.

You can use a SparkSession object to write data to MongoDB, read data from MongoDB, create Datasets, and perform SQL operations.

MongoSpark Helper

To facilitate interaction between MongoDB and Spark, the MongoDB Spark Connector provides the com.mongodb.spark.api.java.MongoSpark helper.