Navigation

Map-ReduceΒΆ

Map-Reduce is a data processing paradigm for condensing large volumes of data into aggregated results.

Note

The aggregation framework provides better performance and usability than map-reduce operations, and should be preferred for new development.

A map-reduce operation is issued on a collection view, as obtained from Collection#find method, by calling the map_reduce method on the view. The map_reduce method takes three arguments: the mapper, the reducer and map-reduce options. The mapper and the reducer must be provided as strings containing JavaScript functions.

For example, given the following collection with values 1 through 10:

coll = client['foo']
10.times do |i|
  coll.insert_one(v: i)
end

The following invocation will sum up the values less than 6:

coll.find(v: {'$lt' => 6}).map_reduce(
  'function() { emit(null, this.v) }',
  'function(key, values) { return Array.sum(values) }',
).first['value']
# => 15.0

The map_reduce method returns an instance of Mongo::Collection::View::MapReduce - a map-reduce view which holds the parameters to be used for the operation. To execute the operation, either iterate the results (by using e.g. each, first or to_a on the view object) or invoke the execute method. The execute method issues the map-reduce operation but does not return the result set from the server, and is primarily useful for when the output of the operation is directed to a collection as follows:

coll.find(...).map_reduce(...).out('destination_collection').execute

Note that:

  • If the results of map-reduce are not directed to a collection, they are said to be retrieved inline. In this case the entire result set must fit in the 16 MiB BSON document size limit.
  • If the results of map-reduce are directed to a collection, and the map-reduce view is iterated, the driver automatically retrieves the entire collection and returns its contents as the result set. The collection is retrieved without sorting. If map-reduce is performed into a collection that is not empty, the driver will return the documents as they exist in the collection after the map-reduce operation completes, which may include the documents that were in the collection prior to the map-reduce operation.
coll.find(...).map_reduce(...).out('destination_collection').each do |doc|
  # ...
end

coll.find(...).map_reduce(...).out(replace: 'destination_collection', db: 'db_name').each do |doc|
  # ...
end

Given a map-reduce view, it can be configured using the following methods:

Method Description
js_mode Sets the jsMode flag for the operation.
out Directs the output to the specified collection, instead of returning the result set.
scope Sets the scope for the operation.
verbose Sets whether to include the timing information in the result.

The following accessor methods are defined on the view object:

Method Description
js_mode Returns the current jsMode flag value.
map_function Returns the map function as a string.
out Returns the current output location for the operation.
reduce_function Returns the reduce function as a string.
scope Returns the current scope for the operation.
verbose Returns whether to include the timing information in the result.
←   Aggregation Text Search  →