# What Is Replication in MongoDB


## What is Replication?

Replication is the process of storing data in multiple places instead of just one. Data is stored on multiple servers across different physical sites in order to improve data availability and ensure uninterrupted access to data even if one of the sites goes down due to a failure.

Simply put, replication involves copying data from one server to another. As changes occur on the primary server, they are propagated to the other servers as well. This keeps all of the servers in sync, and read operations can be performed on any of the available sites. The result is a distributed database in which users can access data relevant to their tasks without interfering with the work of others.

## Replication in MongoDB

In MongoDB, mongod is the primary daemon process that handles data requests, manages data access, and performs various background management operations. For replication to work, multiple mongod instances that maintain the same dataset must be running.

A replica set is the foundation of replication: the server instances that maintain the same dataset form a replica set in MongoDB. A replica set ensures redundancy and high availability, and is the basis for all production deployments.

Each replica set contains multiple data-bearing nodes and, optionally, an arbiter node. Of the data-bearing nodes, one and only one member is designated as the primary node, while the others are secondary nodes. The primary node receives all write operations from clients and records every write and other change to its dataset in its operation log, the oplog; the secondaries replay the oplog to stay in sync.
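The flow described above can be sketched in a few lines of plain JavaScript (runnable with Node.js). The class and method names here are illustrative, not MongoDB internals: the primary applies each write and appends it to its oplog, and a secondary replays any entries it has not yet applied.

```javascript
// Toy model of oplog-based replication; names are illustrative only.
class ReplicaNode {
  constructor() {
    this.data = {};   // this node's copy of the dataset
    this.oplog = [];  // ordered log of applied operations
    this.applied = 0; // how many oplog entries this node has replayed
  }
  // Primary side: apply a write locally and record it in the oplog.
  applyWrite(op) {
    this.data[op.key] = op.value;
    this.oplog.push(op);
  }
  // Secondary side: replay any oplog entries not applied yet.
  syncFrom(primary) {
    while (this.applied < primary.oplog.length) {
      const op = primary.oplog[this.applied++];
      this.data[op.key] = op.value;
    }
  }
}

const primary = new ReplicaNode();
const secondary = new ReplicaNode();

primary.applyWrite({ key: "name", value: "Ishan" });
primary.applyWrite({ key: "role", value: "admin" });
secondary.syncFrom(primary); // secondary catches up by replaying the oplog

console.log(JSON.stringify(secondary.data)); // prints {"name":"Ishan","role":"admin"}
```

Real secondaries tail the oplog continuously rather than on demand, but the invariant is the same: after replay, the secondary's dataset reflects the primary's.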

To enable replication in MongoDB, a minimum of three nodes is required.

• MongoDB designates one node of the replica set as the primary node; the remaining nodes are secondaries.
• Data is replicated from the primary node to the secondary nodes.
• A new primary is elected automatically in the event of failover or maintenance.

## Redundancy and Data Availability

If replication is in place, there will be multiple copies of the same data on different database servers. This ensures high data availability and redundancy. High availability indicates a system designed for durability, redundancy, and automatic failover, such that the applications supported by the system can operate continuously and without downtime for long periods. Due to redundancy, replication provides fault tolerance against the loss of one or more database servers, up to a point.

In certain cases of data replication, clients can send read operations not just to one server, but to different servers. This results in increased read capacity and faster responses to requests from clients. Maintaining copies of data in different servers increases data locality and availability for distributed applications. These duplicate copies of data can be used for various recoveries, reporting or backup purposes as well.
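For example, a client can direct reads at secondaries to spread read load. The following is a hedged mongosh sketch, assuming a running replica set and a hypothetical orders collection:

```javascript
// Prefer a secondary for this query to spread read load; the driver falls
// back to the primary if no secondary is available.
// ("orders" and the filter are hypothetical examples.)
db.orders.find({ status: "A" }).readPref("secondaryPreferred")
```

Note that secondaries apply changes asynchronously, so reads routed to them may briefly lag behind the primary.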

## Enabling Replication in MongoDB

As we already know, a replica set is a group of mongod instances that maintain the same data set. A replica set contains several data-bearing nodes and, optionally, one arbiter node. Of the data-bearing nodes, one and only one member is designated as the primary node, while all the others are secondary nodes. The point of an arbiter is to break the deadlock when an election needs to be held for a primary.

If you have an odd number of nodes, the election process is simple when all nodes are up, and in a failover, one of the other nodes will simply be elected.

If you have an even number of nodes in a replica set, an arbiter may be required. An example is a case where you do not want to commit the same level of hardware to, say, a fifth data-bearing node. Here you could run an arbiter on a lower-specification machine in order to avoid a deadlock in elections. An arbiter is also useful if you want to give preference to certain nodes to be elected as the primary.

The primary node receives all write operations. A replica set can have only one primary capable of confirming writes with { w: "majority" } write concern. The primary records all changes to its data sets in its operation log, i.e. the oplog.
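A majority write can be requested per operation. This is a hedged mongosh sketch, assuming a running replica set and a hypothetical orders collection:

```javascript
// The write is acknowledged only after a majority of data-bearing members
// have recorded it; wtimeout bounds how long we wait for that majority.
// ("orders" and the document are hypothetical examples.)
db.orders.insertOne(
  { item: "notebook", qty: 5 },
  { writeConcern: { w: "majority", wtimeout: 5000 } }
)
```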

The secondaries replicate the primary’s oplog entries and apply the operations to their datasets such that the secondaries’ datasets completely reflect the primary’s dataset. If the primary is unavailable, the eligible secondaries hold an election to choose a new primary.

In some cases, for example when cost constraints allow only a primary and a single secondary, an arbiter is used. An arbiter node does not hold any data at all; it only participates in elections. Hence, it does not provide any data redundancy.

An arbiter will always be an arbiter, whereas a primary may step down and become a secondary, and a secondary may become the primary during an election.

Follow these steps to enable replication and create a replica set.

• Ensure that all servers can access each other over the network. For now, consider that we have three servers: ServerA, ServerB and ServerC.
• Assuming ServerA is the primary and the only server running so far, verify connectivity by issuing the following commands on ServerA.
mongo --host ServerB --port 27017
mongo --host ServerC --port 27017
Execute the same commands on the remaining servers as well.
• Start each mongod.exe instance with the --replSet option. This option provides a grouping for all servers which will be part of this replica set.
mongod --replSet "Replica1"

• The first server is automatically added to the replica set. Next, let’s initiate the replica set.
rs.initiate()

• To add more servers to the replica set, issue the following commands.
rs.add("ServerB")
rs.add("ServerC")

• You’re done! Run the rs.status() command to get the status of the replica set. By default, the members send each other "heartbeat" messages, which simply indicate that a server is alive and working. rs.status() reports on these messages and shows if there are any issues with any members of the replica set.
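The steps above can be consolidated into a single mongosh session run on ServerA. The hostnames and the set name "Replica1" come from the example; the port and the explicit member `_id`s are assumptions. Each server must already have been started with mongod --replSet "Replica1" for this to work.

```javascript
// Initiate the replica set with an explicit configuration document
// listing all three members up front:
rs.initiate({
  _id: "Replica1",
  members: [
    { _id: 0, host: "ServerA:27017" },
    { _id: 1, host: "ServerB:27017" },
    { _id: 2, host: "ServerC:27017" }
  ]
})

// Alternatively, initiate on ServerA alone and add members afterwards:
//   rs.initiate()
//   rs.add("ServerB:27017")
//   rs.add("ServerC:27017")

rs.status() // verify that all members are healthy and exchanging heartbeats
```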

### Benefits of Replication

We already know that Replication allows us to increase data availability by creating multiple copies of the data across servers. This is especially useful if a server crashes or if we experience service interruptions or hardware failure. Let’s have a look at some other advantages of Data Replication.

1. Replication helps in disaster recovery and backup of data. In case of a disaster, secondary nodes ensure that the data is always available without service interruptions.
2. Replication ensures that data is always available to every client.
3. Replication keeps the data safe and protected through this redundant backup approach.
4. Replication minimizes downtime for maintenance.

### Asynchronous Replication

Asynchronous replication is a replication technique in which data is backed up periodically, after some delay, rather than immediately as it is written to the primary storage. This kind of replication results in good performance and lower bandwidth requirements, but up-to-date backups are not readily available if something happens to the primary storage.

In an asynchronous replication system, the data is written to the primary storage first and then it is copied over to the secondary nodes. The copying or replication is done at predetermined intervals. How and when this is done, depends on the settings and the type of implementation of asynchronous replication.

This method allows for good read/write performance without adversely affecting bandwidth usage, since data is not replicated to remote backups in real time as in a synchronous replication system; the system is therefore not under heavy load at any given point in time. Data is only backed up periodically or at predetermined times. This does not guarantee a 100% complete backup, so it should be used for less sensitive data or information that has some tolerance for loss. If a disaster or failure occurs right after data is written to the primary storage, that data will not yet have been copied to the secondary nodes, causing data loss and affecting availability.
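The trade-off described above can be sketched in plain JavaScript (runnable with Node.js; the function names are illustrative, not a real replication protocol): writes are acknowledged as soon as the primary has them, replication runs on a schedule, and a crash between runs loses the not-yet-replicated writes.

```javascript
// Toy model of asynchronous replication; names are illustrative only.
const primary = [];
const secondary = [];

// A write is acknowledged as soon as the primary stores it.
function write(doc) {
  primary.push(doc);
}

// Periodic job: copy over everything the secondary is missing.
function replicate() {
  while (secondary.length < primary.length) {
    secondary.push(primary[secondary.length]);
  }
}

write({ id: 1 });
write({ id: 2 });
replicate();      // scheduled sync runs: secondary now has ids 1 and 2
write({ id: 3 }); // written after the last sync...

// ...if the primary fails here, { id: 3 } exists nowhere else:
console.log(primary.length, secondary.length); // prints 3 2
```

Synchronous replication would instead block the acknowledgement of each write until the copy exists, trading latency for a guarantee.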

### Replication vs Sharding

Sharding is a process where the scaling is done horizontally by partitioning data across multiple servers using a special key called Shard Key. A sharded environment does add more complexity because MongoDB now has to manage distributing data and requests between shards -- additional configuration and routing processes are added to manage those aspects.

Replication, on the other hand, creates additional copies of the data that allows for better availability and read performance. Typically, replication and sharding are used in combination. In these situations, each shard is supported by a replica set.

Shards in MongoDB are just replica sets with a router in front of them. The client application connects to the router, issues queries, and the router decides which replica set (shard) to forward the request to. It is significantly more complex than a single replica set because we have the router and configuration servers to deal with.

Sharding is done with the objective of scaling the database horizontally.
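The routing idea can be sketched in plain JavaScript (runnable with Node.js). The hash function and field names below are illustrative, not MongoDB's actual hashed sharding: the shard key value decides which shard receives a document, and every document with the same key lands on the same shard.

```javascript
// Toy model of shard-key routing; each shard array stands in for what would
// be a full replica set in a production deployment.
const shards = [[], [], []];

// Illustrative hash-based router: sum of character codes, modulo shard count.
function route(shardKey) {
  let h = 0;
  for (const c of String(shardKey)) h += c.charCodeAt(0);
  return h % shards.length;
}

// "customerId" plays the role of the shard key in this sketch.
function insert(doc) {
  shards[route(doc.customerId)].push(doc);
}

insert({ customerId: "c101", total: 25 });
insert({ customerId: "c102", total: 70 });
insert({ customerId: "c101", total: 50 });

// Every "c101" document is on the same shard, so a query on the shard key
// can be routed to a single shard instead of broadcast to all of them.
console.log(shards.map(s => s.length));
```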

### Transactions

Multi-document transactions are available for replica sets (starting from version 4.0). Multi-document transactions that contain read operations must use read preference primary. All operations in a given transaction must route to the same member.

The data changes made in the transaction are not visible outside the transaction until a transaction commits. Once the transaction commits, changes are then available to be read by all secondaries and clients.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards.
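A transaction in mongosh might look like the following hedged sketch, which assumes a running replica set on MongoDB 4.0+ and hypothetical "bank.accounts" documents: all operations go through one session, and nothing is visible to other clients until commitTransaction() succeeds.

```javascript
// Hypothetical example: move 100 between two account documents atomically.
const session = db.getMongo().startSession();
const accounts = session.getDatabase("bank").accounts;

session.startTransaction({
  readConcern: { level: "snapshot" },
  writeConcern: { w: "majority" }
});
try {
  accounts.updateOne({ _id: "A" }, { $inc: { balance: -100 } });
  accounts.updateOne({ _id: "B" }, { $inc: { balance: 100 } });
  session.commitTransaction(); // both updates become visible atomically
} catch (e) {
  session.abortTransaction();  // neither update is applied
  throw e;
} finally {
  session.endSession();
}
```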

### Change streams

In replication, as we have read above, the secondary nodes replicate the primary node’s oplog entries and end up with exactly the same dataset as the primary. An alternative way to consume these changes is to have interested parties notified whenever there is a write to the data, so that they can update themselves accordingly. This is possible with the help of change streams.

Change streams allow applications to subscribe to all data changes on a collection or a set of collections. This way all the apps are notified of the changes to the data.
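A subscription looks like the following hedged mongosh sketch, assuming a replica set and a hypothetical inventory collection: watch() opens a change stream cursor, and each data change on the collection arrives as an event document.

```javascript
// Open a change stream on a hypothetical "inventory" collection and
// print each change event as it arrives.
const cursor = db.inventory.watch();
while (cursor.hasNext()) {          // blocks waiting for the next change
  const change = cursor.next();
  print(change.operationType);      // e.g. "insert", "update", "delete"
  printjson(change.fullDocument);   // present for inserts by default
}
```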

Replica sets provide a number of options to support various application needs, including data backup, recovery, and increased availability. They improve performance and data availability. Replication also ensures that downtime, if any, is kept to a minimum in case of disaster or any other event that interrupts access to data.

### KnowledgeHut

Author

KnowledgeHut is an outcome-focused global ed-tech company. We help organizations and professionals unlock excellence through skills development. We offer training solutions under the people and process, data science, full-stack development, cybersecurity, future technologies and digital transformation verticals.
Website : https://www.knowledgehut.com
