Aggregation

Docs

Aggregations pipeline stages

$match
$sort
$skip
$limit
$count
$facet
$graphLookup
Pipeline stages can appear multiple times in the pipeline with the exception of few like $out, $merge, and $geoNear
Tree structure data model
$lookup (for joins)
It's like left outer join of relational database (can be one to one, one to many)
SYNTAX

{
$lookup: {
from: "otherCollection",
localField: "x",
foreignField: "y",
as: "resultingArray"
}
}
eg:
db.products.aggregate([
{
$lookup:
{
from: 'masterData',
localField:'pid',
//field from the input document, so can be array or single key
foreignField:'pid',
/*field from the documents of the "from" collection,
so target can be foreign primary key/ normal column*/
as: "orderdetails"
}
}]).pretty()

//(.) check where localField is in object array (may be product_ids.id)
First Find then lookup
db.products.aggregate([
{
$match:{
pid:{
$exists :true
},
qty:{
$lte :40
}
}
},
{
$lookup:
{
from: 'masterData',
localField:'pid',
foreignField:'pid',
as: "orderdetails"
}
}]).pretty()
- First Find then lookup with custom condition ($match) & projection ($project)
- [docs](https://docs.mongodb.com/manual/reference/operator/aggregation/ lookup#specify-multiple-join-conditions-with-lookup) //$lookup-> pipeline-> $match - $expr
- $lookup - $match can be also used in sub query concept
  db.products.aggregate([
  {
  $match:{
  pid:{
  $exists :true
  },
  qty:{
  $lte :40
  }
  }
  },
  {
  $lookup:
  {
  from: 'masterData',
  let: { cust_pid: "$pid"},
  pipeline: [
  { $match:
  { $expr:
  { $and:
  [
  { $eq: [ "$pid", "$$cust_pid" ] },
  { $eq: [ "$status", 1 ] }
  ]
  }
  }
  },
  { $project: { item: 1, _id: 0 } }
  ],
  as: "orderdetails"
  }
  }]).pretty()
  Note: All except the $out, $merge, and $geoNear stages can appear multiple times in a pipeline.//so $lookup, $match..etc multiple times docs[join caveat if field doesn't exist](https://docs.mongodb.com/manual/reference/operator/aggregation/lookup/ #perform-a-single-equality-join-with-lookup) //also merging objects if we don't need resultingArray for one to one mapping (don't use - bad syntax)
$group
- Syntax & note points
  {
  $group:
  {
  _id: <group by expression>, // single prop OR object with multi prop
  <field1>: { <accumulator1> : <expression1> },
  ...
  }
  }
  Note :
  - If _id = null, accumulation happens for all the input documents as a whole.
  - some accumulators are $min, $max, $sum, $avg, $first, $last ..etc
  - $group stage has a limit of 100 megabytes of RAM
  - To allow for the handling of large datasets, set .aggregate() option [allowDiskUse](https://docs.mongodb.com/manual/ reference/method/db.collection.aggregate/#db.collection.aggregate) to true
- Count the Number of Documents in a Collection
  Also check collection stats
  use samples
  
  db.companies.aggregate( [
  {
  $group: {
  _id: null,
  count: { $sum: 1 }
  }
  }
  ] )
  o/p:
  
  {
  "_id": null,
  "count": 18801
  }
  Note : To just count - we can use $count. Use above if want to perform other accumulators too on all rows.
  db.companies.aggregate( [
  {
  $count: 'total_rows'
  }
  ] )
  
  { "total_rows" : 18801 }
- Group by single field (distinct example)
  db.companies.aggregate([{
  $group: {
  _id: "$name"
  }
  }])
  o/p:
  
  { "_id" : "EnteGreat" }
  { "_id" : "EnteGreat Solutions" }
  { "_id" : "goBookmaker" }
  { "_id" : "AfterLogic" }
  { "_id" : "Embedster" }
  { "_id" : "Geekdive" }
  { "_id" : "Contemporary Computer Services" }
  { "_id" : "Intergy" }
  { "_id" : "Getyoo" }
  { "_id" : "Eazeeloans" }
  ...
  Note : $name = $ defines passed document to the group stage
- Multi stage aggregation (group by having example)
  db.companies.aggregate(
  [
  // First Stage
  {
  $group :
  {
  _id : "$name",
  totalDayCapacity: { $sum: {
  $multiply: [ "$number_of_employees", 8 ]
  } }
  }
  },
  // Second Stage
  {
  $match: { "totalDayCapacity": { $gte: 100 } }
  }
  ]
  )
  o/p:
  { "_id" : "Eazeeloans", "totalDayCapacity" : 200 }
  { "_id" : "Atlantic BT", "totalDayCapacity" : 240 }
  { "_id" : "UOL (Universo Online)", "totalDayCapacity" : 4000 }
  { "_id" : "Tiny Speck", "totalDayCapacity" : 360 }
  { "_id" : "VSE Corp", "totalDayCapacity" : 15360 }
  { "_id" : "Vertro", "totalDayCapacity" : 400 }
  { "_id" : "SERG", "totalDayCapacity" : 160 }
  ...
- Multi stage aggregation (group by day/ date formatting example) Check date formatters & convert operators link in Project section
  db.companies.aggregate([
  // First Stage
  {
  $match: {
  "created_at": {
  $gte: new ISODate("2007-01-01"),
  $lt: new ISODate("2015-01-01")
  }
  }
  },
  // Second Stage
  {
  $group: {
  _id: {
  $dateToString: {
  format: "%Y-%m-%d",
  date: "$created_at"
  }
  },
  totalNewEmployeesOnDay: {
  $sum: "$number_of_employees"
  },
  averagePerCompanyEmployees: {
  $avg: "$number_of_employees"
  },
  noOfCompanies: {
  $sum: 1
  }
  }
  },
  // Third Stage
  {
  $sort: {
  noOfCompanies: -1
  }
  }
  ])
  o/p:
  
  {
  "_id" : "2007-05-25",
  "totalNewEmployeesOnDay" : 647,
  "averagePerCompanyEmployees" : 323.5,
  "noOfCompanies" : 2
  }
  {
  "_id" : "2007-06-01",
  "totalNewEmployeesOnDay" : 0,
  "averagePerCompanyEmployees" : null,
  "noOfCompanies" : 1
  }
  
  Note: In DB most created_at values are string, so less data
- Group by Field & add other column values in array
  db.companies.aggregate([{
  $match: {
  "created_at": {
  $gte: new ISODate("2007-01-01"),
  $lt: new ISODate("2015-01-01")
  }
  }
  },
  {
  $group: {
  _id: {
  $dateToString: {
  format: "%Y-%m-%d",
  date: "$created_at"
  }
  },
  names: {
  $push: "$name"
  }
  }
  }
  ])
  o/p:
  
  { "_id" : "2007-06-01", "names" : [ "StumbleUpon" ] }
  { "_id" : "2007-05-25", "names" : [ "Wetpaint", "AdventNet" ] }
- Group by Field & add entire row in array
  db.companies.aggregate([{
  $group: {
  _id: '$category_code',
  names: {
  $push: "$$ROOT"
  }
  }
  },
  {
  $limit: 2
  }
  ], {
  allowDiskUse: true
  })
- Group by Field & add extra fields in next stage
  db.companies.aggregate([{
  $group: {
  _id: '$category_code',
  comps: {
  $push: "$$ROOT"
  }
  }
  },
  {
  $addFields: {
  totalEmp: {
  $sum: "$comps.number_of_employees"
  }
  }
  },
  {
  $limit: 2
  }
  ], {
  allowDiskUse: true
  })
  
  Check this query in compass aggregation tab
- Group by Multiple Field
  db.companies.aggregate([{
  $group: {
  _id: {
  category_code: '$category_code',
  founded_year: '$founded_year'
  },
  count: {
  $sum: 1
  }
  }
  },
  {
  $sort: {
  "_id.category_code": -1
  }
  },
  {
  $limit: 10
  }
  ], {
  allowDiskUse: true
  })
  o/p:
  
  {
  "_id" : {
  "category_code" : "web",
  "founded_year" : 1880
  },
  "count" : 1
  }
  {
  "_id" : {
  "category_code" : "web",
  "founded_year" : 1962
  },
  "count" : 1
  }
  {
  "_id" : {
  "category_code" : "web",
  "founded_year" : 1990
  },
  "count" : 2
  }
  ...
$project
- Syntax & Notes Either specify exactly which has to be included/ which has to be excluded Find projection (MD)
  { $project: { "<field1>": 0, "<field2>": 0, ... } }
  // Return all but the specified fields
- Conditionally Exclude Fields
  db.books.aggregate( [
  {
  $project: {
  title: 1,
  "author.first": 1,
  "author.last" : 1,
  "author.middle": {
  $cond: {
  if: { $eq: [ "", "$author.middle" ] },
  then: "$$REMOVE",
  else: "$author.middle"
  }
  }
  }
  }
  ] )
- Computed fields (Project new object example)
  Consider following
  {
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5
  }
  db.books.aggregate(
  [
  {
  $project: {
  title: 1,
  isbn: {
  prefix: { $substr: [ "$isbn", 0, 3 ] },
  group: { $substr: [ "$isbn", 3, 2 ] },
  publisher: { $substr: [ "$isbn", 5, 4 ] },
  title: { $substr: [ "$isbn", 9, 3 ] },
  checkDigit: { $substr: [ "$isbn", 12, 1] }
  },
  lastName: "$author.last",
  copiesSold: "$copies"
  }
  }
  ]
  )
  o/p:
  {
  "_id" : 1,
  "title" : "abc123",
  "isbn" : {
  "prefix" : "000",
  "group" : "11",
  "publisher" : "2222",
  "title" : "333",
  "checkDigit" : "4"
  },
  "lastName" : "zzz",
  "copiesSold" : 5
  }
- Project New Array Fields
  Consider following
  {
  "_id" : 1,
  title: "abc123",
  isbn: "0001122223334",
  author: { last: "zzz", first: "aaa" },
  copies: 5
  }
  db.collection.aggregate( [ { $project: { myArray: [ "$title", "$isbn" ] } } ] )
$unwind
Deconstructs an array field from the input documents to output a document for each element.
- Normal $push
  consider
  {
  "_id": 3,
  "height": 480,
  "width": 640,
  "tags": ["kittens", "travel"]
  }
  ...
  use sample_pictures
  
  db.images.aggregate([
  {
  $limit:50
  },
  {
  $group: {
  _id: '$height',
  tags: {
  $push: '$tags'
  }
  }
  }])
  o/p:
  {
  "_id": 480,
  "tags": [ //array of arrays
  [
  "kittens",
  "travel"
  ],
  [
  "dogs",
  "kittens",
  "work"
  ],
  [
  "dogs",
  "work"
  ],
  ]
  }
  ...
- Just $unwind
  use sample_pictures
  
  db.images.aggregate([
  {
  $unwind: '$tags'
  }])
  o/p: (single document splitted like cross join)
  { "_id" : 3, "height" : 480, "width" : 640, "tags" : "kittens" }
  { "_id" : 3, "height" : 480, "width" : 640, "tags" : "travel" }
  ...
- $unwind then $push
  use sample_pictures
  
  db.images.aggregate([
  {
  $limit:50
  },
  {
  $unwind:'$tags'
  },
  {
  $group: {
  _id: '$height',
  tags: {
  $push: '$tags'
  }
  }
  }])
  o/p:
  {
  "_id": 480,
  "tags": [
  
  "kittens",
  "travel",
  "dogs",
  "kittens",
  "work",
  "dogs",
  "work"
  ]
  }
  ...
- $unwind then $addToSet (to avoid duplicate)
  use sample_pictures
  
  db.images.aggregate([
  {
  $limit:50
  },
  {
  $unwind:'$tags'
  },
  {
  $group: {
  _id: '$height',
  tags: {
  $addToSet: '$tags'
  }
  }
  }])
  o/p:
  {
  "_id": 480,
  "tags": [
  "work",
  "kittens",
  "travel",
  "dogs"
  ]
  }
  ...
- Check docs for unwinding array with in object array example
  In simple - say $items = objArr && tags = arr in that object
  
  // First Stage
  { $unwind: "$items" },
  
  // Second Stage
  { $unwind: "$items.tags" },
$bucket && $bucketAuto
- Categorizes incoming documents into (groups) buckets based on a specified bucket boundaries.
- $bucket
  - Syntax
    {
    $bucket: {
    groupBy: <expression>,
    boundaries: [ <lowerbound1>, <lowerbound2>, ... ],
    default: <literal>,
    output: {
    <output1>: { <$accumulator expression> },
    ...
    <outputN>: { <$accumulator expression> }
    }
    }
    }
  - Note :
    - An array of [ 0, 5, 10 ] creates two buckets:
    - [0, 5) with inclusive lower bound 0 and exclusive upper bound 5.
    - [5, 10) with inclusive lower bound 5 and exclusive upper bound 10.
    - default property is optional if all values fall under specified bucket boundaries.
  - Example:
    use samples
    
    db.restaurants.aggregate([{
    $bucket: {
    groupBy: '$rating',
    boundaries: [0, 3, 5, 10],
    default: "Other",
    output: {
    "count": {
    $sum: 1
    },
    "ratings": {
    $addToSet: "$rating"
    }
    }
    }
    }])
    output:
    { "_id" : 0, "count" : 23, "ratings" : [ 2, 1.5, 1, 2.5 ] }
    { "_id" : 3, "count" : 706, "ratings" : [ 3.5, 4, 3, 4.5 ] }
    { "_id" : 5, "count" : 1756, "ratings" : [ 5.5, 5, 6 ] }
    { "_id" : "Other", "count" : 63, "ratings" : [ "Not yet rated" ] }
- $bucketAuto
  Bucket boundaries are automatically determined in an attempt to evenly distribute the documents.
  db.restaurants.aggregate([{
  $bucketAuto: {
  groupBy: '$rating',
  buckets: 4,
  output: {
  "count": {
  $sum: 1
  },
  "ratings": {
  $addToSet: "$rating"
  }
  }
  }
  }])
  output:
  {
  "_id": {
  "min": 1,
  "max": 5
  },
  "count": 729,
  "ratings": [4.5, 3, 2.5, 3.5, 2, 1.5, 4, 1 ]
  } {
  "_id": {
  "min": 5,
  "max": 5.5
  },
  "count": 1107,
  "ratings": [ 5 ]
  } {
  "_id": {
  "min": 5.5,
  "max": "Not yet rated"
  },
  "count": 649,
  "ratings": [ 6, 5.5 ]
  } {
  "_id": {
  "min": "Not yet rated",
  "max": "Not yet rated"
  },
  "count": 63,
  "ratings": [ "Not yet rated" ]
  }
$out (vs) $merge
(Not practiced) //check for scoped temporary tables (like #localTempTable of SQL, instead ##globalTempTable)
- $out Documents returned by the aggregation pipeline are written to a specified collection.
- Use $merge. It is $out + missing features. But MongoDB 4.2 version
  - Syntax
    { $merge: {
    into: <collection> -or- { db: <db>, coll: <collection> },
    on: <identifier field> -or- [ <identifier field1>, ...],//Optional
    let: <variables>, //Optional
    whenMatched: <replace|keepExisting|merge|fail|pipeline>,//Optional
    whenNotMatched: <insert|discard|fail> //Optional
    } }
  - Syntax 1 (If using all default options for $merge )
    { $merge: <collection> } // Output collection is in the same database
  - Example:
    {
    $merge: {
    into: "myOutput",
    on: "_id",
    whenMatched: "replace",
    whenNotMatched: "insert"
    }
    }
  - Check Restrictions
  - For performance complex queries can run in night & create temp tables, let product query temp tables. Have TimeToLiveIndex so that previous day records get deleted.

Main Operators

Docs

$substr
$toUpper
$toLower
$trim //$ltrim and $rtrim also exists
$toDouble
$subtract
$round
$multiply
$strLenCP
$concat //can have expresion too (check word doc notes)
$dateToString
$dateFromString
$dateToParts
$convert
$cond //sql case expression
$slice
Array slice(startIndex, noOfElements)
Consider
{
"_id": 2,
"height": 480,
"width": 640,
"tags": ["dogs", "kittens", "work"]
}
Use sample_pictures

db.images.aggregate([{
$match: {
_id: 2
}
},
{
$project: {
tags: {
$slice: ["$tags", 1, 1]
}
}
}
]).pretty();
o/p:
{ "_id" : 2, "tags" : [ "kittens" ] }
$filter
(Array filter)
consider

{
"_id": 2,
"height": 480,
"width": 640,
"tags": ["dogs", "kittens", "work"]
}
{ $filter: { input: <array>, as: <string>, cond: <expression> } }
- as = local variable that represents individual element of the input array. default is 'this'
  db.images.aggregate([{
  $match: {
  _id: 2
  }
  },
  {
  $project: {
  tags: {
  $filter: {
  input: "$tags",
  as: "myTag",
  cond: {
  $eq: ['$$myTag', 'dogs']
  } /*two $$ in '' to identify local/
  special variable value. one $ is field value.*/
  }
  }
  }
  }
  ]).pretty();
  o/p:
  { "_id" : 2, "tags" : [ "dogs" ] }
$size
use sample_pictures

db.images.aggregate([{
$match: {
_id: 2
}
},
{
$project: {
tagCount: {
$size: '$tags'
}
}
}
]).pretty();
o/p:
{ "_id" : 2, "tagCount" : 3 }

Accumulators

Docs

$first
Some accumulators are $min, $max, $sum, $avg, $first, $last, $addToSet, $push ..etc

Geo spatial

Docs

Note : Aggregation returns a cursor

#Aggregations pipeline stages

#Pipeline stages can appear multiple times in the pipeline with the exception of few like $out, $merge, and $geoNear

#$bucket && $bucketAuto

#$out (vs) $merge

#Main Operators

#$trim //$ltrim and $rtrim also exists

#$concat //can have expresion too (check word doc notes)

#$cond //sql case expression

#Accumulators

#Geo spatial

Aggregations pipeline stages

Pipeline stages can appear multiple times in the pipeline with the exception of few like $out, $merge, and $geoNear

$bucket && $bucketAuto

$out (vs) $merge

Main Operators

$trim //$ltrim and $rtrim also exists

$concat //can have expresion too (check word doc notes)

$cond //sql case expression

Accumulators

Geo spatial