Filters and SQL
Filters
Note
When using filters with DataFrames or the R API, the underlying MongoDB Connector code constructs an aggregation pipeline to filter the data in MongoDB before sending it to Spark.
Use filter() to read a subset of data from your MongoDB collection.
Consider a collection named fruit that contains documents like the following (sample data assumed for this walkthrough; the outputs shown later are based on these documents):
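```
{ "_id" : 1, "type" : "apple",  "qty" : 5 }
{ "_id" : 2, "type" : "orange", "qty" : 10 }
{ "_id" : 3, "type" : "banana", "qty" : 15 }
```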
First, set up a DataFrame that connects to your default MongoDB data source.
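A minimal sketch, assuming the connector's default source class and a spark.mongodb.input.uri already configured for the session:

```r
# Create a DataFrame from the default MongoDB data source.
# The source class name below is an assumption; match it to your
# connector version and deployment.
df <- read.df("", source = "com.mongodb.spark.sql.DefaultSource")
```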
Note
The empty argument ("") refers to the path of a file to use as a data source. In this case the data source is a MongoDB collection, not a file, so the file path is left empty.
The following operation filters the data and includes records where the qty field is greater than or equal to 10.
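One way to express this in SparkR (the variable name filteredDF is illustrative):

```r
# Keep only documents where qty >= 10; per the note above, this
# predicate is pushed down to MongoDB as an aggregation pipeline.
filteredDF <- filter(df, df$qty >= 10)
head(filteredDF)
```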
Given the sample documents above, the operation prints output similar to the following:
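```
  _id qty   type
1   2  10 orange
2   3  15 banana
```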
SQL
Before running SQL queries on your dataset, you must register a temporary view for the dataset.
The following example registers a temporary view named temp, then uses SQL to query for records in which the type field contains the letter e.
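A sketch using SparkR's createOrReplaceTempView() and sql() functions (the variable name result is illustrative):

```r
# Register the DataFrame as a temporary view named "temp" so that
# SQL queries can reference it.
createOrReplaceTempView(df, "temp")

# Query for documents whose type value contains the letter "e".
result <- sql("SELECT type, qty FROM temp WHERE type LIKE '%e%'")
head(result)
```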
In the sparkR shell, the operation prints output similar to the following, given the assumed sample data:
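```
    type qty
1  apple   5
2 orange  10
```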