Aggregation
Docs
Aggregations pipeline stages
$match
$sort
$skip
$limit
$count
$facet
$graphLookup
Pipeline stages can appear multiple times in the pipeline with the exception of few like $out, $merge, and $geoNear
Tree structure data model
$lookup (for joins)
It's like left outer join of relational database (can be one to one, one to many)
SYNTAX{$lookup: {from: "otherCollection",localField: "x",foreignField: "y",as: "resultingArray"}}eg:db.products.aggregate([{$lookup:{from: 'masterData',localField:'pid',//field from the input document, so can be array or single keyforeignField:'pid',/*field from the documents of the "from" collection,so target can be foreign primary key/ normal column*/as: "orderdetails"}}]).pretty()//(.) check where localField is in object array (may be product_ids.id)
First Find then lookup
db.products.aggregate([{$match:{pid:{$exists :true},qty:{$lte :40}}},{$lookup:{from: 'masterData',localField:'pid',foreignField:'pid',as: "orderdetails"}}]).pretty()- First Find then lookup with custom condition ($match) & projection ($project)
- [docs](https://docs.mongodb.com/manual/reference/operator/aggregation/ lookup#specify-multiple-join-conditions-with-lookup) //$lookup-> pipeline-> $match - $expr
- $lookup - $match can be also used in sub query conceptNote: All except the $out, $merge, and $geoNear stages can appear multiple times in a pipeline.//so $lookup, $match..etc multiple times docs[join caveat if field doesn't exist](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/ #perform-a-single-equality-join-with-lookup) //also merging objects if we don't need resultingArray for one to one mapping (don't use - bad syntax)db.products.aggregate([{$match:{pid:{$exists :true},qty:{$lte :40}}},{$lookup:{from: 'masterData',let: { cust_pid: "$pid"},pipeline: [{ $match:{ $expr:{ $and:[{ $eq: [ "$pid", "$$cust_pid" ] },{ $eq: [ "$status", 1 ] }]}}},{ $project: { item: 1, _id: 0 } }],as: "orderdetails"}}]).pretty()
$group
Syntax & note points
{$group:{_id: <group by expression>, // single prop OR object with multi prop<field1>: { <accumulator1> : <expression1> },...}}Note :
- If _id = null, accumulation happens for all the input documents as a whole.
- some accumulators are $min, $max, $sum, $avg, $first, $last ..etc
- $group stage has a limit of 100 megabytes of RAM
- To allow for the handling of large datasets, set .aggregate() option [allowDiskUse](https://docs.mongodb.com/manual/ reference/method/db.collection.aggregate/#db.collection.aggregate) to true
Count the Number of Documents in a Collection
Also check collection stats
use samplesdb.companies.aggregate( [{$group: {_id: null,count: { $sum: 1 }}}] )o/p:{"_id": null,"count": 18801}Note : To just count - we can use $count. Use above if want to perform other accumulators too on all rows.
db.companies.aggregate( [{$count: 'total_rows'}] ){ "total_rows" : 18801 }Group by single field (distinct example)
db.companies.aggregate([{$group: {_id: "$name"}}])o/p:{ "_id" : "EnteGreat" }{ "_id" : "EnteGreat Solutions" }{ "_id" : "goBookmaker" }{ "_id" : "AfterLogic" }{ "_id" : "Embedster" }{ "_id" : "Geekdive" }{ "_id" : "Contemporary Computer Services" }{ "_id" : "Intergy" }{ "_id" : "Getyoo" }{ "_id" : "Eazeeloans" }...Note : $name = $ defines passed document to the group stage
Multi stage aggregation (group by having example)
db.companies.aggregate([// First Stage{$group :{_id : "$name",totalDayCapacity: { $sum: {$multiply: [ "$number_of_employees", 8 ]} }}},// Second Stage{$match: { "totalDayCapacity": { $gte: 100 } }}])o/p:{ "_id" : "Eazeeloans", "totalDayCapacity" : 200 }{ "_id" : "Atlantic BT", "totalDayCapacity" : 240 }{ "_id" : "UOL (Universo Online)", "totalDayCapacity" : 4000 }{ "_id" : "Tiny Speck", "totalDayCapacity" : 360 }{ "_id" : "VSE Corp", "totalDayCapacity" : 15360 }{ "_id" : "Vertro", "totalDayCapacity" : 400 }{ "_id" : "SERG", "totalDayCapacity" : 160 }...Multi stage aggregation (group by day/ date formatting example) Check date formatters & convert operators link in Project section
db.companies.aggregate([// First Stage{$match: {"created_at": {$gte: new ISODate("2007-01-01"),$lt: new ISODate("2015-01-01")}}},// Second Stage{$group: {_id: {$dateToString: {format: "%Y-%m-%d",date: "$created_at"}},totalNewEmployeesOnDay: {$sum: "$number_of_employees"},averagePerCompanyEmployees: {$avg: "$number_of_employees"},noOfCompanies: {$sum: 1}}},// Third Stage{$sort: {noOfCompanies: -1}}])o/p:{"_id" : "2007-05-25","totalNewEmployeesOnDay" : 647,"averagePerCompanyEmployees" : 323.5,"noOfCompanies" : 2}{"_id" : "2007-06-01","totalNewEmployeesOnDay" : 0,"averagePerCompanyEmployees" : null,"noOfCompanies" : 1}Note: In DB most created_at values are string, so less dataGroup by Field & add other column values in array
db.companies.aggregate([{$match: {"created_at": {$gte: new ISODate("2007-01-01"),$lt: new ISODate("2015-01-01")}}},{$group: {_id: {$dateToString: {format: "%Y-%m-%d",date: "$created_at"}},names: {$push: "$name"}}}])o/p:{ "_id" : "2007-06-01", "names" : [ "StumbleUpon" ] }{ "_id" : "2007-05-25", "names" : [ "Wetpaint", "AdventNet" ] }Group by Field & add entire row in array
db.companies.aggregate([{$group: {_id: '$category_code',names: {$push: "$$ROOT"}}},{$limit: 2}], {allowDiskUse: true})Group by Field & add extra fields in next stage
db.companies.aggregate([{$group: {_id: '$category_code',comps: {$push: "$$ROOT"}}},{$addFields: {totalEmp: {$sum: "$comps.number_of_employees"}}},{$limit: 2}], {allowDiskUse: true})Check this query in compass aggregation tab
Group by Multiple Field
db.companies.aggregate([{$group: {_id: {category_code: '$category_code',founded_year: '$founded_year'},count: {$sum: 1}}},{$sort: {"_id.category_code": -1}},{$limit: 10}], {allowDiskUse: true})o/p:{"_id" : {"category_code" : "web","founded_year" : 1880},"count" : 1}{"_id" : {"category_code" : "web","founded_year" : 1962},"count" : 1}{"_id" : {"category_code" : "web","founded_year" : 1990},"count" : 2}...
$project
Syntax & Notes Either specify exactly which has to be included/ which has to be excluded Find projection (MD)
{ $project: { "<field1>": 0, "<field2>": 0, ... } }// Return all but the specified fieldsConditionally Exclude Fields
db.books.aggregate( [{$project: {title: 1,"author.first": 1,"author.last" : 1,"author.middle": {$cond: {if: { $eq: [ "", "$author.middle" ] },then: "$$REMOVE",else: "$author.middle"}}}}] )Computed fields (Project new object example)
Consider following
{"_id" : 1,title: "abc123",isbn: "0001122223334",author: { last: "zzz", first: "aaa" },copies: 5}db.books.aggregate([{$project: {title: 1,isbn: {prefix: { $substr: [ "$isbn", 0, 3 ] },group: { $substr: [ "$isbn", 3, 2 ] },publisher: { $substr: [ "$isbn", 5, 4 ] },title: { $substr: [ "$isbn", 9, 3 ] },checkDigit: { $substr: [ "$isbn", 12, 1] }},lastName: "$author.last",copiesSold: "$copies"}}])o/p:{"_id" : 1,"title" : "abc123","isbn" : {"prefix" : "000","group" : "11","publisher" : "2222","title" : "333","checkDigit" : "4"},"lastName" : "zzz","copiesSold" : 5}Project New Array Fields
Consider following
{"_id" : 1,title: "abc123",isbn: "0001122223334",author: { last: "zzz", first: "aaa" },copies: 5}db.collection.aggregate( [ { $project: { myArray: [ "$title", "$isbn" ] } } ] )
$unwind
Deconstructs an array field from the input documents to output a document for each element.
Normal $push
consider{"_id": 3,"height": 480,"width": 640,"tags": ["kittens", "travel"]}...use sample_picturesdb.images.aggregate([{$limit:50},{$group: {_id: '$height',tags: {$push: '$tags'}}}])o/p:{"_id": 480,"tags": [ //array of arrays["kittens","travel"],["dogs","kittens","work"],["dogs","work"],]}...Just $unwind
use sample_picturesdb.images.aggregate([{$unwind: '$tags'}])o/p: (single document splitted like cross join){ "_id" : 3, "height" : 480, "width" : 640, "tags" : "kittens" }{ "_id" : 3, "height" : 480, "width" : 640, "tags" : "travel" }...$unwind then $push
use sample_picturesdb.images.aggregate([{$limit:50},{$unwind:'$tags'},{$group: {_id: '$height',tags: {$push: '$tags'}}}])o/p:{"_id": 480,"tags": ["kittens","travel","dogs","kittens","work","dogs","work"]}...$unwind then $addToSet (to avoid duplicate)
use sample_picturesdb.images.aggregate([{$limit:50},{$unwind:'$tags'},{$group: {_id: '$height',tags: {$addToSet: '$tags'}}}])o/p:{"_id": 480,"tags": ["work","kittens","travel","dogs"]}...Check docs for unwinding array with in object array example
In simple - say $items = objArr && tags = arr in that object// First Stage{ $unwind: "$items" },// Second Stage{ $unwind: "$items.tags" },
$bucket && $bucketAuto
Categorizes incoming documents into (groups) buckets based on a specified bucket boundaries.
Syntax
{$bucket: {groupBy: <expression>,boundaries: [ <lowerbound1>, <lowerbound2>, ... ],default: <literal>,output: {<output1>: { <$accumulator expression> },...<outputN>: { <$accumulator expression> }}}}Note :
- An array of [ 0, 5, 10 ] creates two buckets:
- [0, 5) with inclusive lower bound 0 and exclusive upper bound 5.
- [5, 10) with inclusive lower bound 5 and exclusive upper bound 10.
- default property is optional if all values fall under specified bucket boundaries.
Example:
use samplesdb.restaurants.aggregate([{$bucket: {groupBy: '$rating',boundaries: [0, 3, 5, 10],default: "Other",output: {"count": {$sum: 1},"ratings": {$addToSet: "$rating"}}}}])output:{ "_id" : 0, "count" : 23, "ratings" : [ 2, 1.5, 1, 2.5 ] }{ "_id" : 3, "count" : 706, "ratings" : [ 3.5, 4, 3, 4.5 ] }{ "_id" : 5, "count" : 1756, "ratings" : [ 5.5, 5, 6 ] }{ "_id" : "Other", "count" : 63, "ratings" : [ "Not yet rated" ] }
Bucket boundaries are automatically determined in an attempt to evenly distribute the documents.
db.restaurants.aggregate([{$bucketAuto: {groupBy: '$rating',buckets: 4,output: {"count": {$sum: 1},"ratings": {$addToSet: "$rating"}}}}])output:{"_id": {"min": 1,"max": 5},"count": 729,"ratings": [4.5, 3, 2.5, 3.5, 2, 1.5, 4, 1 ]} {"_id": {"min": 5,"max": 5.5},"count": 1107,"ratings": [ 5 ]} {"_id": {"min": 5.5,"max": "Not yet rated"},"count": 649,"ratings": [ 6, 5.5 ]} {"_id": {"min": "Not yet rated","max": "Not yet rated"},"count": 63,"ratings": [ "Not yet rated" ]}
$out (vs) $merge
(Not practiced) //check for scoped temporary tables (like #localTempTable of SQL, instead ##globalTempTable)
$out Documents returned by the aggregation pipeline are written to a specified collection.
Use $merge. It is $out + missing features. But MongoDB 4.2 version
Syntax
{ $merge: {into: <collection> -or- { db: <db>, coll: <collection> },on: <identifier field> -or- [ <identifier field1>, ...],//Optionallet: <variables>, //OptionalwhenMatched: <replace|keepExisting|merge|fail|pipeline>,//OptionalwhenNotMatched: <insert|discard|fail> //Optional} }Syntax 1 (If using all default options for $merge )
{ $merge: <collection> } // Output collection is in the same databaseExample:
{$merge: {into: "myOutput",on: "_id",whenMatched: "replace",whenNotMatched: "insert"}}For performance complex queries can run in night & create temp tables, let product query temp tables. Have TimeToLiveIndex so that previous day records get deleted.
Main Operators
Docs
$substr
$toUpper
$toLower
$trim //$ltrim and $rtrim also exists
$toDouble
$subtract
$round
$multiply
$strLenCP
$concat //can have expresion too (check word doc notes)
$dateToString
$dateFromString
$dateToParts
$convert
$cond //sql case expression
$slice
Array slice(startIndex, noOfElements)
Consider
{"_id": 2,"height": 480,"width": 640,"tags": ["dogs", "kittens", "work"]}Use sample_pictures
db.images.aggregate([{$match: {_id: 2}},{$project: {tags: {$slice: ["$tags", 1, 1]}}}]).pretty();o/p:{ "_id" : 2, "tags" : [ "kittens" ] }$filter
(Array filter)
consider{"_id": 2,"height": 480,"width": 640,"tags": ["dogs", "kittens", "work"]}{ $filter: { input: <array>, as: <string>, cond: <expression> } }as = local variable that represents individual element of the input array. default is 'this'
db.images.aggregate([{$match: {_id: 2}},{$project: {tags: {$filter: {input: "$tags",as: "myTag",cond: {$eq: ['$$myTag', 'dogs']} /*two $$ in '' to identify local/special variable value. one $ is field value.*/}}}}]).pretty();o/p:{ "_id" : 2, "tags" : [ "dogs" ] }
$size
use sample_picturesdb.images.aggregate([{$match: {_id: 2}},{$project: {tagCount: {$size: '$tags'}}}]).pretty();o/p:{ "_id" : 2, "tagCount" : 3 }
Accumulators
Docs
- $first
- Some accumulators are $min, $max, $sum, $avg, $first, $last, $addToSet, $push ..etc
Geo spatial
Docs
- Mongo DB tries to optimize aggregation pipeline
- Aggregation pipeline operators + stages
- Aggregation pipeline operators
- https://docs.mongodb.com/manual/aggregation/#map-reduce
Note : Aggregation returns a cursor