FAQ

For any MongoDB deployment, the Mongo Spark Connector sets the preferred location for an RDD to be where the data is:

  • For a non-sharded system, it sets the preferred location to be the hostname(s) of the standalone or the replica set.
  • For a sharded system, it sets the preferred location to be the hostname(s) of the shards.

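To promote data locality, run a Spark worker on those hosts: on one of the hosts for a non-sharded system, or on one host per shard for a sharded system.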

Spark Streams can be considered a potentially infinite source of RDDs. Therefore, anything you can do with an RDD, you can also do with the results of a Spark Stream.

For an example, see SparkStreams.scala.

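In MongoDB deployments that mix versions of mongod, a read can fail with an Unrecognized pipeline stage name: '$sample' error. The $sample aggregation stage was introduced in MongoDB 3.2, so older mongod instances reject it, and the connector relies on sampling both in its default partitioner and when it infers a schema. To avoid the error, explicitly configure a partitioner that does not sample, and define the schema yourself when using DataFrames.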
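
The mitigation above can be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not the connector's definitive setup: the helper names mongo_read_options and load_people, the fields name and age, and the choice of MongoPaginateBySizePartitioner are all illustrative, and the option keys ("uri", "partitioner") and the "mongo" format name follow the 2.x/3.x connector, so check them against the version you run.

```python
# Hypothetical helper: collects the reader options in one place so the
# partitioner is always named explicitly instead of falling back to the
# sampling-based default.
def mongo_read_options(uri, partitioner="MongoPaginateBySizePartitioner"):
    """Return DataFrame reader options with an explicit partitioner."""
    return {"uri": uri, "partitioner": partitioner}


def load_people(spark, uri):
    """Load a collection with a declared schema so none has to be inferred.

    `spark` is an existing SparkSession; the field names below are
    assumptions made for illustration only.
    """
    # Imported lazily so mongo_read_options works without PySpark installed.
    from pyspark.sql.types import (
        IntegerType,
        StringType,
        StructField,
        StructType,
    )

    schema = StructType([
        StructField("name", StringType(), True),
        StructField("age", IntegerType(), True),
    ])
    return (
        spark.read.format("mongo")
        .options(**mongo_read_options(uri))
        .schema(schema)  # explicit schema: no sampling-based inference
        .load()
    )
```

With a running SparkSession, a call such as load_people(spark, "mongodb://localhost/test.people") would return a DataFrame; the URI here is a placeholder. Other non-sampling partitioners may suit a given deployment better, so treat the default above as one possible choice.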

© 2021 MongoDB, Inc.