Navigation

Map/Reduce

Mongoid provides a DSL around MongoDB’s map/reduce framework, for performing custom map/reduce jobs or simple aggregations.

Execution

You can tell Mongoid off the class or a criteria to perform a map/reduce by calling map_reduce and providing map and reduce javascript functions.

map = %Q{
  function() {
    emit(this.name, { likes: this.likes });
  }
}

reduce = %Q{
  function(key, values) {
    var result = { likes: 0 };
    values.forEach(function(value) {
      result.likes += value.likes;
    });
    return result;
  }
}

Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)

Just like criteria, map/reduce calls are lazily evaluated. So nothing will hit the database until you iterate over the results, or make a call on the wrapper that would need to force a database hit.

Band.map_reduce(map, reduce).out(replace: "mr-results").each do |document|
  p document # { "_id" => "Tool", "value" => { "likes" => 200 }}
end

The only required thing you provide along with a map/reduce is where to output the results. If you do not provide this an error will be raised. Valid options to #out are:

  • inline: 1: Don’t store the output in a collection.
  • replace: "name": Store in a collection with the provided name, and overwrite any documents that exist in it.
  • merge: "name": Store in a collection with the provided name, and merge the results with the existing documents.
  • reduce: "name": Store in a collection with the provided name, and reduce all existing results in that collection.

Raw Results

Results of Map/Reduce execution can be retrieved via the execute method or its aliases raw and results:

mr = Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)

mr.execute
# => {"results"=>[{"_id"=>"Tool", "value"=>{"likes"=>200.0}}],
      "timeMillis"=>14,
      "counts"=>{"input"=>4, "emit"=>4, "reduce"=>1, "output"=>1},
      "ok"=>1.0,
      "$clusterTime"=>{"clusterTime"=>#<BSON::Timestamp:0x00005633c2c2ad20 @seconds=1590105400, @increment=1>, "signature"=>{"hash"=><BSON::Binary:0x12240 type=generic data=0x0000000000000000...>, "keyId"=>0}},
      "operationTime"=>#<BSON::Timestamp:0x00005633c2c2aaf0 @seconds=1590105400, @increment=1>}

Statistics

MongoDB servers 4.2 and lower provide Map/Reduce execution statistics. As of MongoDB 4.4, Map/Reduce is implemented via the aggregation pipeline and statistics described in this section are not available.

The following methods are provided on the MapReduce object:

  • counts: Number of documents read, emitted, reduced and output through the pipeline.
  • input, emitted, reduced, output: individual count methods. Note that emitted and reduced methods are named differently from hash keys in counts.
  • time: The time, in milliseconds, that Map/Reduce pipeline took to execute.

The following code illustrates retrieving the statistics:

mr = Band.where(:likes.gt => 100).map_reduce(map, reduce).out(inline: 1)

mr.counts
# => {"input"=>4, "emit"=>4, "reduce"=>1, "output"=>1}

mr.input
# => 4

mr.emitted
# => 4

mr.reduced
# => 1

mr.output
# => 1

mr.time
# => 14

Note

Each statistics method invocation re-executes the Map/Reduce pipeline. The results of execution are not stored by Mongoid. Consider using the execute method to retrieve the raw results and obtaining the statistics from the raw results if multiple statistics are desired.