## Map/Reduce Example

This is an example of how to use the mapReduce function to perform map/reduce style aggregation on your data.

### Setup

To start, we'll insert some example data which we can perform map/reduce queries on:

    $ ghci
    ...
    Prelude> :set prompt "> "
    > :set -XOverloadedStrings
    > import Database.MongoDB
    > pipe <- connect $ host "127.0.0.1"
    > let run act = access pipe master "test" act
    > let docs = [ ["x" =: 1, "tags" =: ["dog", "cat"]], ["x" =: 2, "tags" =: ["cat"]], ["x" =: 3, "tags" =: ["mouse", "cat", "dog"]] ]
    > run $ insertMany "mr1" docs

### Basic Map/Reduce

Now we'll define our map and reduce functions to count the number of occurrences for each tag in the tags array, across the entire collection.

Our map function just emits a single (key, 1) pair for each tag in the array:

    > let mapFn = Javascript [] "function() {this.tags.forEach (function(z) {emit(z, 1);});}"

The reduce function sums over all of the emitted values for a given key:

    > let reduceFn = Javascript [] "function (key, values) {var total = 0; for (var i = 0; i < values.length; i++) {total += values[i];} return total;}"
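
To see what these two JavaScript functions compute without a running server, here is a minimal pure-Haskell sketch of the same logic. The names `Tags`, `sampleDocs`, `mapTags`, `reduceCounts`, and `mapReducePure` are made up for this illustration and are not part of the driver. The grouping step stands in for what the server does between map and reduce.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map

-- | Stand-in for a document in "mr1": only the "tags" array matters here.
type Tags = [String]

-- | The tags of the three sample documents from the Setup section.
sampleDocs :: [Tags]
sampleDocs = [["dog", "cat"], ["cat"], ["mouse", "cat", "dog"]]

-- | Counterpart of mapFn: emit one (tag, 1) pair per tag.
mapTags :: Tags -> [(String, Int)]
mapTags tags = [(tag, 1) | tag <- tags]

-- | Counterpart of reduceFn: sum the values emitted for one key.
reduceCounts :: String -> [Int] -> Int
reduceCounts _key = sum

-- | Group the emitted pairs by key, then reduce each group.
mapReducePure :: [Tags] -> Map.Map String Int
mapReducePure docs = Map.mapWithKey reduceCounts grouped
  where
    grouped = Map.fromListWith (++) [(k, [v]) | (k, v) <- concatMap mapTags docs]

main :: IO ()
main = print (Map.toList (mapReducePure sampleDocs))
-- prints [("cat",3),("dog",2),("mouse",1)]
```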

Note: We can't just return values.length as the reduce function might be called iteratively on the results of other reduce steps.
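
The point of the note can be checked with a small pure-Haskell sketch. It assumes, purely for illustration, that the three values emitted for `"cat"` are reduced in two chunks and that the partial results are then reduced again. The names `reduceSum`, `reduceLength`, and `twoStage` are hypothetical.

```haskell
module Main (main) where

-- | Reduce by summing, as reduceFn does.
reduceSum :: [Int] -> Int
reduceSum = sum

-- | Tempting but wrong: count the values instead of summing them.
reduceLength :: [Int] -> Int
reduceLength = length

-- | Reduce in two stages: first each chunk, then the partial results.
twoStage :: ([Int] -> Int) -> [[Int]] -> Int
twoStage reduce chunks = reduce (map reduce chunks)

main :: IO ()
main = do
  let emitted = [1, 1, 1]      -- three emits for "cat"
      chunks  = [[1, 1], [1]]  -- a hypothetical split of the same values
  print (reduceSum emitted, twoStage reduceSum chunks)        -- prints (3,3)
  print (reduceLength emitted, twoStage reduceLength chunks)  -- prints (3,2)
```

Summing gives the same answer whether or not the values are reduced in stages. Counting does not, because the second stage counts partial results rather than the original emits.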

Finally, we run mapReduce. By default, the results are returned inline, as an array in the result document:

    > run $ runMR' (mapReduce "mr1" mapFn reduceFn)
    [ results: [[ _id: "cat", value: 3.0],[ _id: "dog", value: 2.0],[ _id: "mouse", value: 1.0]], timeMillis: 379, counts: [ input: 3, emit: 6, reduce: 2, output: 3], ok: 1.0]

Inlining only works if the result set is smaller than 16MB. The alternative is to output to a collection. If that collection already holds data from a previous run of the same MapReduce, the `MRMerge` data type offers three ways to handle it: `Replace`, `Merge`, and `Reduce`. See its documentation for details. To output to a collection, set the `rOut` field in `MapReduce`:

    > run $ runMR' (mapReduce "mr1" mapFn reduceFn) {rOut = Output Replace "mr1out" Nothing}
    [ result: "mr1out", timeMillis: 379, counts: [ input: 3, emit: 6, reduce: 2, output: 3], ok: 1.0]
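
As a rough mental model of the three `MRMerge` choices, the pure-Haskell sketch below treats the output collection as a map from key to count. It assumes the behavior MongoDB documents for these output modes, and it uses summation as the reduce step, as `reduceFn` does. The `MergeMode` type here is a local stand-in that only mirrors the constructor names. It is not the driver's type, and `applyOutput` is a hypothetical helper.

```haskell
module Main (main) where

import qualified Data.Map.Strict as Map

-- | Output collection modeled as key -> count.
type Results = Map.Map String Int

-- | Local stand-in mirroring the MRMerge constructor names.
data MergeMode = Replace | Merge | Reduce

-- | Combine the existing collection contents with a new result set.
applyOutput :: MergeMode -> Results -> Results -> Results
applyOutput Replace _old new = new                     -- discard old contents
applyOutput Merge   old  new = Map.union new old       -- new value wins on shared keys
applyOutput Reduce  old  new = Map.unionWith (+) old new -- re-reduce shared keys

main :: IO ()
main = do
  let old = Map.fromList [("bird", 1), ("cat", 3)]  -- made-up earlier run
      new = Map.fromList [("cat", 2), ("dog", 2)]   -- made-up new run
  print (Map.toList (applyOutput Replace old new))
  -- prints [("cat",2),("dog",2)]
  print (Map.toList (applyOutput Merge old new))
  -- prints [("bird",1),("cat",2),("dog",2)]
  print (Map.toList (applyOutput Reduce old new))
  -- prints [("bird",1),("cat",5),("dog",2)]
```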

You can now query the `mr1out` collection to see the result, or run another MapReduce on it. A shortcut for running the map-reduce and then querying the result collection right away is `runMR`:

    > run $ rest =<< runMR (mapReduce "mr1" mapFn reduceFn) {rOut = Output Replace "mr1out" Nothing}
    [[ _id: "cat", value: 3.0],[ _id: "dog", value: 2.0],[ _id: "mouse", value: 1.0]]
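
For comparison with the `runMR` shortcut, the output collection can also be queried like any other collection. The following is a minimal standalone sketch, assuming a server on `127.0.0.1`, the `test` database, and an `mr1out` collection already populated by one of the runs above. Sorting by `value` in descending order is an addition made for this illustration only.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Database.MongoDB

main :: IO ()
main = do
  -- Assumes a MongoDB server is listening on the default port of 127.0.0.1.
  pipe <- connect (host "127.0.0.1")
  -- Fetch every document in "mr1out", most frequent tag first.
  docs <- access pipe master "test" $
    rest =<< find (select [] "mr1out") { sort = ["value" =: (-1 :: Int)] }
  mapM_ print docs
  close pipe
```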