# What is MongoDB Sharding?


Let us understand the need for and importance of database sharding with the help of an example.

Assume there is a database collection (with no sharding) of 50,000 employees working in a reputed Multi-National Company. The company maintains a database record of all the freshers and experienced employees in a single database.

Your task is to retrieve the profile of one employee from this 50,000-employee collection. Without sharding (and without a suitable index), the search is slow: it may have to step through the records one at a time and examine up to 50,000 documents just to return the single one you are looking for.

Going back to the same example, we now divide the employees into sub-divisions such as freshers, experienced staff, and job profiles to make the search easier. For instance, if 15,000 software developers work in the company and you want the details of a particular developer, the database looks only through the sub-division of 15,000 developers instead of the entire collection.

## 1. Spot the difference?

The database in the second scenario is more organized and easier to search. That is the idea behind database sharding: split one large data set into smaller divisions so that each request touches only the division that can contain the answer. In practice, sharding narrows the search to the relevant subset of the data, which saves time.
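The difference between the two scenarios can be sketched in a few lines of Python. This is only an illustration of the counting argument, not how MongoDB stores data; the `employees` list, the `profile` field, and both function names are made up for the example, and indexes are ignored.

```python
from collections import defaultdict

def full_scan(records, employee_id):
    """Check every record in order; return (record, records_examined)."""
    examined = 0
    for record in records:
        examined += 1
        if record["id"] == employee_id:
            return record, examined
    return None, examined

def partition_by_profile(records):
    """Group records by job profile, mimicking the sub-divisions in the example."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record["profile"]].append(record)
    return partitions

# Hypothetical data: 50,000 employees, the last 15,000 of them developers.
employees = [
    {"id": i, "profile": "developer" if i >= 35_000 else "other"}
    for i in range(50_000)
]
partitions = partition_by_profile(employees)

# Worst case: the employee we want is the last record.
_, scanned_all = full_scan(employees, 49_999)
_, scanned_partition = full_scan(partitions["developer"], 49_999)
print(scanned_all, scanned_partition)
```

Searching only the developer sub-division examines 15,000 records instead of 50,000 in this worst case.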

## 2. What is Sharding?

Database sharding is a data distribution process that stores a single data set across multiple databases. Its purpose is to improve the scalability of applications by spreading both data and load over several machines. In MongoDB, sharding is achieved by breaking a large data set into smaller subsets that are distributed across multiple MongoDB instances.

Attention: MongoDB uses sharding to support deployments with very large data sets and high-throughput operations. It is also important to note that each shard is an independent database, and together all the shards make up a single logical database.

## 3. Sharded Cluster

A sharded cluster is a group of MongoDB instances, that is, the set of nodes that make up a sharded MongoDB deployment. A sharded cluster has three main components:

• Shard: A shard holds a subset of the sharded data. Each shard is a set of mongod processes, typically deployed as a replica set.
• Config servers: Config servers store the metadata for a sharded cluster. This includes the set of chunks on each shard and the ranges that define the chunks.
• Mongos instances: Mongos instances cache the cluster metadata and use it to route read and write operations to the right shards. They also update the cache when the metadata for the cluster changes.
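How a router can use cached chunk metadata is easier to see in code. The following is a minimal sketch under stated assumptions, not the mongos implementation: the `Router` class, its method names, and the shard names are hypothetical, and chunks are modeled as half-open ranges over a numeric shard key.

```python
import bisect

class Router:
    """Toy stand-in for a mongos router: caches chunk metadata and routes by shard key."""

    def __init__(self, chunk_upper_bounds, chunk_owners):
        # chunk_upper_bounds[i] is the exclusive upper bound of chunk i.
        # The last chunk is unbounded, so there is one more owner than bounds.
        self.bounds = list(chunk_upper_bounds)
        self.owners = list(chunk_owners)

    def refresh(self, chunk_upper_bounds, chunk_owners):
        """Replace the cached metadata, as a router would after a chunk migration."""
        self.bounds = list(chunk_upper_bounds)
        self.owners = list(chunk_owners)

    def route(self, shard_key_value=None):
        """Return the shards that must be contacted for this operation."""
        if shard_key_value is None:
            # No shard key in the query: every shard has to be asked.
            return sorted(set(self.owners))
        index = bisect.bisect_right(self.bounds, shard_key_value)
        return [self.owners[index]]

router = Router([1000, 2000], ["shardA", "shardB", "shardC"])
print(router.route(500))    # targeted at a single shard
print(router.route())       # broadcast to all shards
```

A query that includes the shard key is sent to one shard, while a query without it has to be broadcast, which is one reason the choice of shard key matters.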

## 4. Shard Keys

MongoDB does not pick a shard key for you: when you shard a collection, you choose the shard key yourself. The shard key is an indexed field or a set of indexed compound fields that exists in the documents of the collection, and MongoDB uses it to distribute the collection's documents across all the shards.

MongoDB divides the span of shard key values into non-overlapping ranges, where every range is associated with a chunk. MongoDB then tries to distribute the chunks evenly among the shards in the cluster.

A shard key can be used to distribute data in the following ways:

• Hashed
• Range
• Zones

## 5. Balancer and Even Chunk Distribution

The balancer is a background process responsible for distributing chunks evenly among the shards. Each cluster has one balancer. It monitors how chunks are spread across the shards and migrates chunks from more heavily loaded shards to less loaded ones. The resulting state, in which every shard holds a similar share of the chunks, is known as even chunk distribution.
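The idea of moving chunks until the shards are even can be sketched as a simple loop. This is a deliberately simplified model, not MongoDB's balancer: the `balance` function and its `threshold` parameter are hypothetical, and details such as migration thresholds or balancing by data size rather than chunk count are not modeled.

```python
def balance(chunks_per_shard, threshold=1):
    """Move one chunk at a time from the fullest shard to the emptiest one.

    chunks_per_shard maps shard name -> list of chunk ids. The loop stops when
    the difference in chunk counts is at most `threshold`.
    Returns the migrations performed as (chunk, source, target) tuples.
    """
    if threshold < 1:
        # A gap of 1 cannot be reduced further, so smaller values would loop forever.
        raise ValueError("threshold must be at least 1")
    migrations = []
    while True:
        fullest = max(chunks_per_shard, key=lambda s: len(chunks_per_shard[s]))
        emptiest = min(chunks_per_shard, key=lambda s: len(chunks_per_shard[s]))
        gap = len(chunks_per_shard[fullest]) - len(chunks_per_shard[emptiest])
        if gap <= threshold:
            return migrations
        chunk = chunks_per_shard[fullest].pop()
        chunks_per_shard[emptiest].append(chunk)
        migrations.append((chunk, fullest, emptiest))

# Hypothetical cluster: nine chunks, almost all of them on shardA.
cluster = {"shardA": list(range(8)), "shardB": [8], "shardC": []}
moves = balance(cluster)
print({shard: len(chunks) for shard, chunks in cluster.items()})
```

Each migration moves exactly one chunk, so the total number of chunks never changes; only their placement does.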

The fundamental idea behind database sharding is to split a large data set into smaller parts that can be stored and queried independently. Sharding a database has the following advantages:

### 1. Increased Storage capacity

When data is distributed across the shards in a cluster, each shard contains only a subset of the total data. As the data volume grows, you can add more shards, which expands the storage capacity of the cluster.

### 2. High Availability

With an unsharded database, an outage of the database can degrade the entire application or stop it altogether. With a sharded database, if one or more shards become completely unavailable, only some parts of the application or website are unavailable to some users. The other shards continue to operate normally.

### 3. Increased read/write throughput

In MongoDB, the read and write workloads are distributed across the shards in the sharded cluster, so each shard processes only a subset of the cluster's operations. Both read and write performance can be scaled horizontally across the cluster by increasing the number of shards.

### 4. Facilitates horizontal scaling

One more reason programmers like database sharding is that it facilitates horizontal scaling (also known as scaling out). That means you can run several backends in parallel and have them carry out tasks simultaneously. Whether the workload is dominated by reads or by writes, scaling out improves performance by adding machines instead of replacing one machine with a bigger one.

### 5. Speedier query response

When you submit a query to an unsharded database, it may have to look through every record of the data set until it finds a match. For low-volume data the cost is insignificant, but it becomes a problem with a high-volume database. A sharded database splits the data into sub-sections, so a query has to examine fewer records and returns its results faster.

## 7. Sharded and Non-Sharded Collections

A database in a sharded cluster is not always uniform. It can contain a mixture of sharded and unsharded collections.

Sharded Collection: A collection whose data is partitioned and distributed across the shards in the cluster is called a sharded collection.

Non-Sharded Collection: A collection that is not partitioned is called a non-sharded collection. It is stored entirely on the primary shard, which is the shard that holds all the unsharded collections of a database.

## 8. Connecting to a Sharded Cluster

To work with a sharded cluster, connect to the mongos router rather than to the shards. Through the mongos router you can reach every collection in the cluster, sharded or unsharded. Never connect to an individual shard to perform read and write operations.

## 9. Sharding Strategy

Data can be distributed across a sharded cluster using the following strategies. MongoDB natively supports hashed and ranged sharding, and zones can be used for location-aware distribution:

1. Hash-based Sharding
2. Range-based Sharding
3. Directory-based Sharding
4. Geo-based Sharding

## 10. Hash-based Sharding

Hash-based sharding is also known as key-based sharding. A value taken from each newly inserted record, such as its ID, is passed through a hash function. The resulting hash value acts as the shard ID that determines where the incoming data is stored. The same hash function must be applied consistently so that a given value always maps to the same shard.
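A minimal sketch of the idea follows. It assumes MD5 as the hash function and a simple modulo over the number of shards, both chosen for illustration; MongoDB's hashed sharding instead divides the range of hash values into chunks, which this sketch does not model. The `hash_shard` name is hypothetical.

```python
import hashlib

def hash_shard(key, shard_count):
    """Map a key to a shard index using a stable hash (MD5 chosen for illustration)."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# Sequential keys, such as auto-incrementing ids, end up spread over the shards.
placement = [hash_shard(employee_id, 4) for employee_id in range(1000)]
counts = [placement.count(shard) for shard in range(4)]
print(counts)
```

Because the hash is deterministic, a lookup for a given key always goes to the same shard. The trade-off is that neighbouring keys are scattered, so a range query usually has to visit several shards.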

## 11. Range-based sharding

Ranged sharding distributes data based on ranges of the shard key values. For instance, in a collection that stores inventory details, each product is placed on a shard according to the range that its shard key value falls into, so documents with nearby values tend to live on the same shard. The biggest drawback of range-based sharding is that data can be distributed unevenly: if most reads and writes target one range, for example because the key keeps increasing, a single shard becomes a hotspot and application performance suffers.
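The following sketch shows both the benefit and the drawback. The split points and function names are hypothetical, and a real cluster keeps these boundaries in its config metadata rather than in a constant.

```python
import bisect

# Hypothetical split points: shard 0 holds keys < 1000, shard 1 holds
# 1000 <= key < 2000, and shard 2 holds everything from 2000 upward.
SPLIT_POINTS = [1000, 2000]

def range_shard(key, split_points=SPLIT_POINTS):
    """Return the index of the range (and so the shard) that contains key."""
    return bisect.bisect_right(split_points, key)

def shards_for_range_query(low, high, split_points=SPLIT_POINTS):
    """Shards that must be read for low <= key <= high: a contiguous run."""
    first = range_shard(low, split_points)
    last = range_shard(high, split_points)
    return list(range(first, last + 1))

# A range query touches only the shards whose ranges overlap it.
print(shards_for_range_query(900, 1100))

# New, ever-increasing ids all fall into the last range: a hotspot.
hot = {range_shard(new_id) for new_id in range(5000, 5100)}
print(hot)
```

Range queries stay local to a few shards, but a monotonically increasing key sends every new write to the same shard.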

## 12. Directory-based Sharding

Directory-based sharding keeps an explicit record of where each piece of data lives. A lookup table (also called a location service) stores the shard key together with the shard that holds the matching data. For every request, the client consults the location service with the key and is then directed to the specific shard that holds the data.
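A lookup table of this kind can be sketched as a small class. The `ShardDirectory` name, its methods, and the keys are hypothetical; a production location service would also need persistence and caching, which are left out.

```python
class ShardDirectory:
    """Toy location service: an explicit key -> shard lookup table."""

    def __init__(self):
        self._locations = {}

    def assign(self, key, shard):
        """Register where the data for key is stored."""
        self._locations[key] = shard

    def locate(self, key):
        """Return the shard for key, or raise KeyError if it was never registered."""
        return self._locations[key]

    def move(self, key, new_shard):
        """Relocating data only needs a directory update; no keys are rehashed."""
        if key not in self._locations:
            raise KeyError(key)
        self._locations[key] = new_shard

directory = ShardDirectory()
directory.assign("emp-101", "shardA")
directory.assign("emp-102", "shardB")
directory.move("emp-101", "shardC")
print(directory.locate("emp-101"))
```

The flexibility comes at a price: every read and write first consults the directory, so the lookup table can become a bottleneck or a single point of failure.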

## 13. Geo-based Sharding

Geo-based sharding is similar to range-based sharding, with the difference that the data is partitioned geographically. Requests are processed by the shard that corresponds to the user's region, for example everything within a range of 100 miles. A well-known example is Tinder, a dating app that uses geo-based sharding to balance the production load across its geo-shards.
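One simple way to map a user to a regional shard is to pick the nearest shard center. The sketch below assumes three made-up regions with approximate coordinates and uses the haversine formula for distance; it is an illustration of the principle and says nothing about how any particular application implements its geo-shards.

```python
import math

# Hypothetical regional shards with approximate center coordinates (lat, lon).
GEO_SHARDS = {
    "us-east": (40.7, -74.0),   # near New York
    "eu-west": (51.5, -0.1),    # near London
    "ap-south": (19.1, 72.9),   # near Mumbai
}

def haversine_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * math.asin(math.sqrt(h))

def geo_shard(location):
    """Pick the shard whose center is closest to the user's location."""
    return min(GEO_SHARDS, key=lambda name: haversine_miles(location, GEO_SHARDS[name]))

print(geo_shard((42.4, -71.1)))   # a user near Boston
```

Users who are close to each other land on the same shard, so location-based queries rarely need to cross shards.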

## 14. Considerations Before Sharding

The perks of sharding may impress you. However, several factors need attention; otherwise you may pay the price in lost or damaged data. Focus on the following considerations before sharding a database:

1. Before sharding, think through planning, execution, and maintenance. Make sure you have a bird's-eye view of the infrastructure requirements of a sharded cluster and the complexity involved.
2. Be cautious when deciding which collections to shard. In MongoDB versions before 8.0, a sharded collection cannot be unsharded; MongoDB 8.0 added an `unshardCollection` command, so check what your version supports before you rely on undoing the change.
3. The shard key you choose plays a significant role in cluster behavior, overall efficiency, and performance. Check the cardinality, frequency, and monotonicity of the shard key carefully, and do not forget to check the shard key limitations.
4. The operational requirements and restrictions of a sharded cluster also need to be taken into account.
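The three shard key properties named above can be estimated from a sample of values before committing to a key. The helper below is a rough sketch; the `shard_key_report` name, the returned fields, and the sample data are assumptions made for the example, and no threshold for a "good" key is implied.

```python
from collections import Counter

def shard_key_report(values):
    """Summarize a sample of candidate shard key values, given in insertion order."""
    if not values:
        raise ValueError("need at least one sample value")
    counts = Counter(values)
    increasing = all(a <= b for a, b in zip(values, values[1:]))
    decreasing = all(a >= b for a, b in zip(values, values[1:]))
    return {
        # Cardinality: how many distinct values exist to split chunks on.
        "cardinality": len(counts),
        # Frequency: share of documents holding the single most common value.
        "max_frequency": max(counts.values()) / len(values),
        # Monotonicity: values that only grow (or only shrink) as documents arrive.
        "monotonic": len(counts) > 1 and (increasing or decreasing),
    }

country_sample = ["IN", "IN", "US", "IN", "DE", "IN"]
order_id_sample = [1001, 1002, 1003, 1004, 1005, 1006]
print(shard_key_report(country_sample))
print(shard_key_report(order_id_sample))
```

In this sample the country field has low cardinality and one dominant value, while the order ID is unique per document but strictly increasing, which concentrates inserts on one range under ranged sharding.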

## 15. Zones in Sharded Clusters

Before we dive into the MongoDB zone, let us focus on understanding a zone.

A zone is a group of shards that share a particular tag. In a sharded cluster, zones let you associate ranges of shard key values with specific shards, so that chunks are distributed according to those ranges. All reads and writes for documents that fall within a zone's range are handled by the shards belonging to that zone. When you create zones, you can associate each zone with one or more shards in the cluster, and a shard can be associated with any number of zones. In a balanced cluster, MongoDB migrates the chunks covered by a zone only to shards associated with that zone.

Attention: MongoDB routes reads and writes that fall into a zone range only to the shards inside that zone. Zones are easy to manage: all the basic operations, such as creating a zone, adding a shard to a zone or removing it, and viewing the existing zones, are available.
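The relationship between zone ranges and shards can be modeled with two small tables. The zone names, ranges, and shard names below are hypothetical, and the functions only mimic the placement rule; in mongosh, zones are managed with helpers such as `sh.addShardToZone()` and `sh.updateZoneKeyRange()`.

```python
# Hypothetical zone definitions: each zone covers a half-open range of shard
# key values [low, high) and is associated with one or more shards.
ZONE_RANGES = [
    ("EU", 0, 1000),
    ("US", 1000, 2000),
]
ZONE_SHARDS = {
    "EU": ["shardA"],
    "US": ["shardB", "shardC"],
}
ALL_SHARDS = ["shardA", "shardB", "shardC", "shardD"]

def zone_for(key):
    """Return the zone whose range covers key, or None if no zone does."""
    for zone, low, high in ZONE_RANGES:
        if low <= key < high:
            return zone
    return None

def eligible_shards(key):
    """Shards allowed to hold the chunk that contains key."""
    zone = zone_for(key)
    # Data outside every zone range may live on any shard in the cluster.
    return ZONE_SHARDS[zone] if zone is not None else ALL_SHARDS

print(eligible_shards(250))
print(eligible_shards(1500))
print(eligible_shards(9999))
```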

## 16. Collations in Sharding

In MongoDB, a collation specifies language-specific rules for string comparison, such as rules for letter case and accent marks. A collection can have a default collation that applies to the operations run against it.

To shard a collection that has a default collation, use the `shardCollection` command with the `collation: { locale: "simple" }` option, so that the shard key is compared using simple binary comparison.
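As a rough illustration, the command document for this case can be assembled as a plain dictionary. The `build_shard_collection_command` helper and the `hr.employees` namespace are hypothetical; the field names follow the `shardCollection` admin command, and with a driver such as PyMongo the resulting document would typically be passed to `client.admin.command(...)` on a connection to a mongos router.

```python
def build_shard_collection_command(namespace, key, simple_collation=False):
    """Build the document for the shardCollection admin command.

    namespace is "<database>.<collection>"; key maps field names to 1
    (ranged) or "hashed".
    """
    command = {"shardCollection": namespace, "key": dict(key)}
    if simple_collation:
        # Required when the collection was created with a default collation.
        command["collation"] = {"locale": "simple"}
    return command

command = build_shard_collection_command(
    "hr.employees", {"employee_id": "hashed"}, simple_collation=True
)
print(command)
```

The command name is kept as the first key of the dictionary because the server reads the first field of a command document as the command to run.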

### 1. Change Streams

Without a notification mechanism, it is difficult for applications to respond to data changes as they happen.

Starting with MongoDB 3.6, change streams let applications subscribe to real-time data changes using built-in MongoDB functionality. Applications can access these changes without the cost and complexity of tailing the operations log (oplog). Change streams also provide total ordering, which means applications receive changes in the same order in which they were applied to the database.
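The ordering guarantee can be illustrated with a toy in-memory feed. This is not the MongoDB change streams API (in PyMongo, change streams are opened with `Collection.watch()`); the `ChangeFeed` class, its methods, and the event fields are invented for the example.

```python
import itertools

class ChangeFeed:
    """Toy ordered change feed; not the MongoDB change streams API."""

    def __init__(self):
        self._sequence = itertools.count(1)
        self._subscribers = []

    def subscribe(self, callback):
        """Register a function that is called once for every change."""
        self._subscribers.append(callback)

    def record(self, operation, document_id):
        """Stamp the change with the next sequence number and push it to subscribers."""
        event = {"seq": next(self._sequence), "op": operation, "id": document_id}
        for callback in self._subscribers:
            callback(event)
        return event

received = []
feed = ChangeFeed()
feed.subscribe(received.append)
feed.record("insert", 101)
feed.record("update", 101)
feed.record("delete", 101)
print([event["op"] for event in received])
```

The subscriber sees the insert before the update and the update before the delete, which is the property an application relies on when it replays changes.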

### 2. Transactions

A transaction is an organized way of representing a change of state as a single unit of work. Ideally, a transaction has four properties, known together as ACID:

1. Atomic – The overall transaction gets committed, or there is no transaction at all.
2. Consistent – The database must be consistent before and after the transaction.
3. Isolated – No-one gets to see any part of the transaction until it is committed.
4. Durable – Even if there is a system failure or a restart, there is no change on the saved data.

MongoDB supports multi-document transactions. MongoDB 4.0 supports multi-document transactions on replica sets, whereas MongoDB 4.2 supports them on both replica sets and sharded clusters.
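The all-or-nothing behavior of a multi-document transaction can be sketched with an in-memory example. It only models atomicity and isolation on a Python dictionary; the `transfer` function, the `InsufficientFunds` exception, and the account data are hypothetical and unrelated to MongoDB's transaction API.

```python
import copy

class InsufficientFunds(Exception):
    """Raised when a transfer would make the source balance negative."""

def transfer(accounts, source, target, amount):
    """Apply both balance updates or neither, working on a copy until commit."""
    staged = copy.deepcopy(accounts)      # staged changes stay invisible to others
    staged[source] -= amount
    staged[target] += amount
    if staged[source] < 0:
        # Abort: the staged copy is discarded, so nothing was changed.
        raise InsufficientFunds(source)
    accounts.clear()
    accounts.update(staged)               # commit both writes together

accounts = {"alice": 100, "bob": 20}
transfer(accounts, "alice", "bob", 30)    # succeeds
try:
    transfer(accounts, "bob", "alice", 500)   # fails and leaves balances untouched
except InsufficientFunds:
    pass
print(accounts)
```

The failed transfer changes neither balance, which is the atomic property from the list above: the whole transaction is committed, or none of it is.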

Wrapping Up

Database sharding facilitates horizontal scaling and is an effective way to improve operational efficiency. It can also simplify data management and maintenance. However, not all databases support sharding, and in older MongoDB versions a sharded collection cannot be unsharded. The biggest challenges arise with complex data, especially when data has to be pulled from multiple sources. Be careful, and keep the considerations listed above in mind. As a gentle reminder, sharding works to your advantage only if you know how to use it effectively; done the wrong way, it can lead to corrupted or lost data.

### Abhresh Sugandhi

Author

Abhresh is a corporate trainer with a decade of experience in technical training, including virtual webinars and instructor-led sessions. He has created courses, tutorials, and articles for organizations. He is also the founder of Nikasio.com, which offers multiple services in technical training, project consulting, content development, etc.
