
How to Use MongoDB Data Modeling to Improve Throughput Operation


Last updated on
12th Jan, 2022
Published
11th Jan, 2022

Learn how to use data modeling in MongoDB while working with different types of data. Understand the importance of data models and how they are used.

The main challenge in data modelling is balancing the application's needs, the database engine's performance characteristics, and the data retrieval patterns. Always consider the application usage of the data (i.e. queries, updates, and data processing) as well as the inherent structure of the data itself when designing data models.

What is Data Modeling?

In general, data modelling is the analysis of data items in a database and how they relate to other objects in that database.

For example, we can have a users collection and a profile collection in MongoDB. The users collection contains a list of usernames for a specific application, whereas the profile collection contains information about each user's profile settings.

In data modelling, we must create a relationship that connects each user to the appropriate profile. In a nutshell, data modelling is the first step in database design, as well as the foundation for object-oriented programming. It also provides an indication of how the physical application will appear as development progresses. An example of an application-database integration architecture is shown below.
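The users/profile relationship described above can be sketched in the mongo shell. This is a minimal illustrative sketch; the collection names (users, profiles) and field names here are assumptions, not part of the original example:

```javascript
// Hypothetical sketch: a user document and a matching profile document,
// linked by storing the user's _id in the profile.
db.users.insertOne({ _id : 1, username : "ishan_jain" })
db.profiles.insertOne({ userId : 1, theme : "dark", language : "en" })

// Retrieve a user's profile by following the reference:
var user = db.users.findOne({ username : "ishan_jain" })
db.profiles.findOne({ userId : user._id })
```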

[Image: application-database integration architecture]

The Process of Data Modeling in MongoDB

Data modelling boosts database performance, but it requires balancing several factors, such as:

  • Patterns of data retrieval
  • Balancing application requirements such as queries, updates, and data processing
  • The chosen database engine's performance characteristics
  • The data's own inherent structure

Why use data models?

Data modelling may appear to be an unorthodox process, far removed from the data analytics projects that generate measurable value for the organisation. However, data modelling is necessary foundational work that allows data to be stored more easily in a database and has a positive impact on data analytics.

These are some of the key advantages of data modelling and why organisations will continue to use it:

Higher data quality:

The visual representation of requirements and business rules enables developers to anticipate what could lead to large-scale data corruption before it occurs. Data models enable developers to define rules that monitor data quality, reducing the possibility of errors.

Better internal communication about data and data processes:

Creating data models forces the business to define how data is generated and moved across applications.

Lower development and maintenance costs:

Data modelling exposes errors and inconsistencies early in the process, making it easier and less expensive to fix.

Better performance:

An organised database operates more efficiently; a well-designed data model spares the database from endless searching and returns results more quickly.

Key Considerations and Resources for Data Modeling

When deciding on the best data model, there are several factors to consider. These aspects differ depending on the stage of the Data Lifecycle for which we are designing. These elements are as follows:

  • Data Creation and Modification Speed and Frequency - Small amounts of data should be captured more quickly while maintaining consistency.
  • Data Retrieval Speed - The ability to retrieve small or large amounts of data for reporting and analysis.
  • ACID Properties - Atomicity, Consistency, Isolation, and Durability of transactions.
  • Business scope - involves one or more departments or business functions.
  • Access to the Finest Grain of Data - Different data use-cases may necessitate access to the finest level of detail or various levels of aggregation.

Other factors may exist, but the ones mentioned above have a significant impact on the decision-making process for selecting the best data model.

MongoDB Document Structure

Documents in MongoDB play a significant role in determining which technique to use for a given set of data. In general, there are two ways to model relationships between data:

  • Embedded Data
  • Reference Data

Embedded Data

In this case, related data is stored as a field value or an array within a single document. The main benefit of this method is that data is denormalized, making it possible to manipulate related data in a single database operation. As a result, the efficiency of CRUD operations improves, and fewer queries are required. Consider the following document as an example:

{
    "_id" : ObjectId("5b98bfe7e8b9ab9875e4c80c"),
    "StudentName" : "Ishan Jain",
    "Settings" : {
        "location" : "Embassy",
        "ParentPhone" : 123987456,
        "bus" : "KAZ 450G",
        "distance" : "4",
        "placeLocation" : {
            "lat" : -0.376252,
            "lng" : 36.937389
        }
    }
}

In this set of data, we have a student's name and some additional information. The Settings field holds an embedded object, and its placeLocation field in turn embeds an object with latitude and longitude coordinates. All of this student's data has been compiled into a single document, so if we need all of his information, we simply run:

db.students.findOne({StudentName : "Ishan Jain"})
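Because the related data lives in a single document, it can also be modified in a single operation. A minimal sketch, assuming the sample document above, using dot notation to reach the embedded fields (the new values here are hypothetical):

```javascript
// One update touches both the embedded Settings object and the
// doubly-nested placeLocation object via dot notation.
db.students.updateOne(
    { StudentName : "Ishan Jain" },
    { $set : {
        "Settings.bus" : "KBZ 120H",
        "Settings.placeLocation.lat" : -0.376301
    } }
)
```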

Reference Data

The related data is stored in separate documents in this case, but there is a reference link between them. The sample data can be reassembled as follows:

User Document:

{
    "_id" : xyz,
    "StudentName" : "Ishan Jain",
    "ParentPhone" : 123987456
}

Settings Document:

{
    "id" : xyz,
    "location" : "Embassy",
    "bus" : "KAZ 450G",
    "distance" : "4",
    "lat" : -0.376252,
    "lng" : 36.937389
}

Although the documents are separate, they are linked through the matching _id and id fields; the data model has been normalised as a result. We must, however, issue additional queries to access information from a related document, which increases execution time.
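The referenced data can be reassembled either with two queries or with a server-side join. A sketch, assuming the documents above live in hypothetical users and settings collections:

```javascript
// Option 1: two round trips — fetch the user, then its settings.
var user = db.users.findOne({ StudentName : "Ishan Jain" })
var settings = db.settings.findOne({ id : user._id })

// Option 2: join server-side with the $lookup aggregation stage.
db.users.aggregate([
    { $match : { StudentName : "Ishan Jain" } },
    { $lookup : {
        from : "settings",
        localField : "_id",
        foreignField : "id",
        as : "Settings"
    } }
])
```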

MongoDB Schema

For a given set of data, a schema is essentially a skeleton of the fields and the data type each field should contain. In SQL, all rows must have the same columns, and each column must hold its declared data type. MongoDB, however, comes with a flexible schema that doesn't require all documents to conform to the same standards.

Flexible Schema

In MongoDB, a flexible schema specifies that documents do not have to have the same fields or data types, and that a field can vary between documents within a collection. The main benefit of this concept is that it allows you to add new fields, delete existing ones, or change field values to a different type, resulting in a new structure for the document.
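A minimal sketch of this flexibility: two documents inserted into the same (hypothetical) clients collection, with different fields and types, and MongoDB accepts both:

```javascript
// Documents in one collection need not share fields or data types.
db.clients.insertMany([
    { name : "Abhresh", phone : "+91 123987456", city : "Pune" },
    { name : "Bhavesh", city : "Kota", tags : [ "new", "priority" ] }  // no phone, extra array field
])
```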

Rigid Schema

You may decide to create a rigid schema even though these documents may differ from one another. A rigid schema specifies that all documents in a collection have the same structure, allowing you to create document validation rules to ensure data integrity during insert and update operations.

Schema Validation Levels

There are three levels of validation:

  • Strict: This is MongoDB's default validation level; it applies the validation rules to all inserts and updates.
  • Moderate: Validation rules are applied to inserts and to updates of existing documents that already satisfy the validation criteria; existing documents that do not satisfy the criteria can be updated without being checked.
  • Off: Disables validation entirely, so no checks are performed on any document.

Example: 

Insert the below data into a client collection.

db.clients.insert([
    { "_id" : 1, "name" : "Abhresh", "phone" : "+91 123987456", "city" : "Pune", "status" : "Married" },
    { "_id" : 2, "name" : "Bhavesh", "city" : "Kota" }
]);

If we apply the moderate validation level using:

db.runCommand( {
   collMod: "clients",
   validator: { $jsonSchema: {
      bsonType: "object",
      required: [ "phone", "name" ],
      properties: {
         phone: {
            bsonType: "string",
            description: "must be a string and is required"
         },
         name: {
            bsonType: "string",
            description: "must be a string and is required"
         }
      }
   } },
   validationLevel: "moderate"
} )

On subsequent updates, the validation rules will be applied only to the document with _id 1, because it already meets all the criteria.

The second document will not be validated, because it does not satisfy the existing validation criteria.
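The behaviour of the moderate level can be sketched with two hypothetical updates against the documents inserted above:

```javascript
// _id 1 already satisfies the schema, so this update is validated —
// it should be rejected because phone must be a string.
db.clients.updateOne({ _id : 1 }, { $set : { phone : 123987456 } })

// _id 2 never satisfied the schema (it has no phone field), so this
// update skips validation and should succeed.
db.clients.updateOne({ _id : 2 }, { $set : { city : "Jaipur" } })
```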

Schema Validation Actions

There may be some documents that violate the validation rules after they have been validated. When this occurs, there is always a need to act.

MongoDB offers two actions for documents that fail to pass the validation rules:

  • Error: This is the default MongoDB action; it rejects any insert or update that fails the validation criteria.
  • Warn: This action logs the violation in the MongoDB log but allows the insert or update operation to proceed. As an example:
db.createCollection("students", {
   validator: { $jsonSchema: {
         bsonType: "object",
         required: [ "name", "gpa" ],
         properties: {
            name: {
               bsonType: "string",
               description: "must be a string and is required"
            },
            gpa: {
               bsonType: [ "double" ],
               minimum: 0,
               description: "must be a double and is required"
            }
         }
      } },
   validationAction: "warn"
})

If we try to insert a document like this:

db.students.insert( { name: "Ishan", status: "Updated" } );

Because the validation action is set to warn, the document will be saved even though gpa is a required field in the schema design, and a warning will be recorded in the MongoDB log.
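A quick check in the same hypothetical session confirms that, despite the logged warning, the document was inserted and can be read back:

```javascript
// The insert went through even though "gpa" is missing.
db.students.findOne({ name : "Ishan" })
// e.g. { "_id" : ObjectId("..."), "name" : "Ishan", "status" : "Updated" }
```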

Data Modeling Trend

You now understand how data modelling in MongoDB differs from relational DBMSs, particularly in terms of schema. Organizations are increasingly involving business users to solve the problem of data quality. Modern data preparation platforms now enable business users to prepare data for specific analytic initiatives themselves, rather than burdening developers with the task of building data models and resolving all data quality issues.

Profile

Abhresh Sugandhi

Author

Abhresh specializes as a corporate trainer. He has a decade of experience in technical training, blending virtual webinars with instructor-led sessions, and has created courses, tutorials, and articles for organizations. He is also the founder of Nikasio.com, which offers services in technical training, project consulting, content development, and more.