What are the Data-types in MongoDB

Last updated on 12th Mar, 2021
Published 14th Feb, 2021

In this article, you will learn about one of the leading NoSQL databases - MongoDB. You will understand the basics of MongoDB and how data is stored in a NoSQL database, but most of the article will focus on the data types that are supported by MongoDB.
MongoDB is a cross-platform, document-oriented NoSQL database. It is known for horizontal scalability and high availability, and for workloads that fit the document model it can perform better than a comparable SQL database such as MySQL.

In a document database such as MongoDB, data is stored as a set of key-value pairs. Here is an example.

"name":  "Knowledgehut"

When we store related key-value pairs together in a set of key-value pairs, the set is known as a document. Here is an example of a document that contains data about an employee.

{
  "employee_name": "John Doe",
  "employee_skills": "UI Design",
  "employee_salary": 40000,
  "employee_status": true
}

Introduction to Data Types

In the document above, you can see that we have stored multiple values for an employee. This is very similar to how data is stored within a row in a typical RDBMS. Similar documents are stored together within a collection. You can think of a collection as the NoSQL equivalent of an RDBMS table, with some key differences that we are not going to discuss in this article.

In the above document, you can see that we have four key-value pairs. The values can be of different types: in this case, employee_name and employee_skills hold values of the String type, employee_salary holds a number, and employee_status holds a boolean.
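
As a small illustration, the same document can be written as a plain JavaScript object and the type of each value inspected. This sketch runs in Node.js without MongoDB. It shows only how JavaScript sees the values; the BSON type chosen at insert time depends on the shell or driver.

```javascript
// The employee document from above, written as a plain JavaScript object.
const employee = {
  employee_name: "John Doe",
  employee_skills: "UI Design",
  employee_salary: 40000,
  employee_status: true,
};

// Map each key to the JavaScript type of its value.
const fieldTypes = Object.fromEntries(
  Object.entries(employee).map(([key, value]) => [key, typeof value])
);

console.log(fieldTypes);
```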

Having these (and more) data types in MongoDB allows us to store the data in a more efficient format and also perform highly efficient and robust queries on the stored data.

Using the correct data type for storing the data fields in a document is crucial to the success of the database system. Here are some of the most used data types available in MongoDB.

  • String
  • Integer
  • Boolean
  • Double
  • Date
  • Min/Max keys
  • Arrays
  • Timestamp
  • Object
  • Null
  • Symbol
  • Regular Expressions

We will have a look at all of these with examples but before that, let’s have a look at JSON and BSON to understand how MongoDB stores the data.

JSON and BSON

JSON stands for JavaScript Object Notation. It is a very common format used by APIs and web services to return data to the client. This format is widely used because of its simplicity and ease of parsing. Most modern programming languages can parse JSON with their standard library or a small, widely available package.

JSON objects are simple associative containers, where the data is stored as a set of key-value pairs. Each key is mapped to a value, which can be a string, a number, a boolean, null, an array, or another object.

MongoDB also represents data as JSON-like documents, but the documents are binary encoded. The resulting format is BSON, which simply stands for Binary JSON. BSON’s binary structure encodes type and length information, which allows it to be traversed much more quickly than text and therefore delivers better performance.

In a nutshell, MongoDB stores data in BSON format both internally, and over the network, but that doesn’t mean you can’t think of MongoDB as a JSON database. Anything you can represent in JSON can be natively stored in MongoDB, and retrieved just as easily in JSON.
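
One practical reason a typed format is useful is that plain JSON has no date type. The following sketch uses only built-in JavaScript (no MongoDB driver), and the field names and date are made up for illustration. It shows a Date value turning into a string after a JSON round trip.

```javascript
// A document with a date value, as application code might build it.
const original = {
  name: "Knowledgehut",
  created: new Date("2021-02-14T00:00:00.000Z"),
};

// Serialize to JSON text and parse it back.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.created instanceof Date); // true
console.log(typeof roundTripped.created);      // "string"
console.log(roundTripped.created);             // "2021-02-14T00:00:00.000Z"
```

A format that records the type of each value alongside the value itself avoids this loss of type information.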

Different MongoDB data types

Let’s have a look at each data type offered by MongoDB with examples and understand the best use-cases for them.

  • String - This is one of the simplest and most used data types. The String type is used to represent text. Strings in BSON are UTF-8 encoded, which allows most international characters to be represented without any problems. Here are some examples of string values in a document.
{
  "employee_name": "John Doe",
  "employee_skills": "UI Design",
  "employee_salary": 40000,
  "employee_status": true
}

The above document has two keys whose values are of the String type: employee_name and employee_skills. These are the simplest values and represent a sequence of characters.
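
Because strings are UTF-8 encoded, the number of bytes a string occupies can differ from its number of characters. A minimal sketch, using the built-in TextEncoder and a hypothetical helper named utf8ByteLength:

```javascript
// TextEncoder always encodes to UTF-8.
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a string occupies when UTF-8 encoded.
function utf8ByteLength(text) {
  return encoder.encode(text).length;
}

console.log(utf8ByteLength("John Doe"));  // 8 characters, 8 bytes
console.log(utf8ByteLength("Jos\u00e9")); // 4 characters, 5 bytes ("é" takes two bytes)
```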

  • Integer - The integer data type is used to store whole numbers. BSON provides a 32-bit and a 64-bit integer type; which one is used depends on the value and on how the shell or driver maps numbers (the mongo shell, for example, provides NumberInt() and NumberLong() for this). Here is an example of an integer stored in a document.
{
  "employee_name": "John Doe",
  "employee_skills": "UI Design",
  "employee_salary": 40000,
  "employee_status": true
}

The key employee_salary stores a whole number, so it can be stored as an integer.
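
To make the 32-bit versus 64-bit distinction concrete, here is a rough sketch in plain JavaScript. The helper integerWidth and its return labels are made up for this illustration and are not part of MongoDB; the sketch only checks which range a whole number falls into.

```javascript
const INT32_MIN = -(2 ** 31);
const INT32_MAX = 2 ** 31 - 1;

// Hypothetical helper: report the smallest integer width that can hold a value.
function integerWidth(value) {
  if (!Number.isInteger(value)) return "not an integer";
  if (value >= INT32_MIN && value <= INT32_MAX) return "int32";
  if (Number.isSafeInteger(value)) return "int64";
  // Beyond 2^53 - 1 a JavaScript number can no longer represent every integer exactly.
  return "unsafe";
}

console.log(integerWidth(40000));      // "int32"
console.log(integerWidth(3000000000)); // "int64"
console.log(integerWidth(97.67));      // "not an integer"
console.log(integerWidth(2 ** 53));    // "unsafe"
```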

  • Double - The double datatype is used to store floating-point values as 8-byte (64-bit) IEEE 754 numbers. Here is an example of a document that contains a double value in the field employee_score.
{
  "employee_name": "John Doe",
  "employee_skills": "UI Design",
  "employee_score": 97.67,
  "employee_status": true
}
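
JavaScript numbers are also 64-bit IEEE 754 values, so the usual floating-point behaviour can be seen without a database. A minimal sketch:

```javascript
// Binary floating point cannot represent 0.1 or 0.2 exactly.
const sum = 0.1 + 0.2;
console.log(sum === 0.3); // false

// Comparing with a small tolerance is the usual workaround.
const closeEnough = Math.abs(sum - 0.3) < Number.EPSILON;
console.log(closeEnough); // true

// Rounding is best left to display time.
const score = 97.67;
console.log(score.toFixed(1)); // "97.7"
```
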
  • Boolean - The boolean datatype is used to store boolean (true or false) values. In the example below, the field employee_status stores the value true, hence this field is of the type boolean.
{
  "employee_name": "John Doe",
  "employee_skills": "UI Design",
  "employee_score": 97.67,
  "employee_status": true
}

Booleans use less storage than an integer or string and avoid any unexpected side effects of comparison.
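
The comparison pitfall can be sketched in plain JavaScript. The two objects below are hypothetical documents, one storing the status as text and one as a real boolean.

```javascript
const storedAsString = { employee_status: "false" };
const storedAsBoolean = { employee_status: false };

// Any non-empty string is truthy, so the text "false" passes a truthiness check.
const stringLooksActive = Boolean(storedAsString.employee_status);
const booleanLooksActive = Boolean(storedAsBoolean.employee_status);

console.log(stringLooksActive);  // true
console.log(booleanLooksActive); // false
```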

  • Arrays - Arrays are used to store multiple values under a single key. The values usually share one type, although a BSON array may hold values of different types. Here is an example of an array stored within a document.
{
  "employee_name": "John Doe",
  "employee_skills": ["UI Design", "Graphic Design", "2D Animation"],
  "employee_score": 97.67,
  "employee_status": true
}

In the above example, the employee_skills field contains an array in which each value is a String.

Here is another example where instead of an array of a simple type (String), documents are embedded within the array.

{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_stock": [
    {
      "warehouse": "Warehouse A",
      "qty": 1200
    },
    {
      "warehouse": "Warehouse B",
      "qty": 900
    }
  ]
}

In the above document, the field item_stock contains an array of embedded documents.
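
Once such a document has been read into application code, the embedded documents can be processed like any other array. A minimal sketch in plain JavaScript, using the item document from above:

```javascript
const item = {
  item_code: "1234-ABCD",
  item_price: 49.99,
  item_stock: [
    { warehouse: "Warehouse A", qty: 1200 },
    { warehouse: "Warehouse B", qty: 900 },
  ],
};

// Add up the quantity held in every warehouse.
const totalQty = item.item_stock.reduce((sum, entry) => sum + entry.qty, 0);
console.log(totalQty); // 2100

// Collect just the warehouse names.
const warehouses = item.item_stock.map((entry) => entry.warehouse);
console.log(warehouses); // [ 'Warehouse A', 'Warehouse B' ]
```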

  • Date - The date datatype is used to store a date and time as Unix time. Such values can be easily converted to and from the JavaScript Date object. A Date is a signed 64-bit integer that represents the number of milliseconds since the Unix epoch (Jan 1, 1970). This results in a representable date range of about 290 million years into the past and future.

Here is an example of how a date is stored in a document.

{
  "student_name": "Bob Stan",
  "student_dob": ISODate("2006-02-10T10:50:42.389Z"),
  "student_marks": 78.98
}

In the above example, the stored date can be converted to a readable format using JavaScript's new Date("2006-02-10T10:50:42.389Z") constructor. Printed in an environment set to India Standard Time, it gives the following output.

Fri Feb 10 2006 16:20:42 GMT+0530 (India Standard Time)
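
The millisecond representation and the size of the 64-bit range can be checked with a short sketch in plain JavaScript. The constant and variable names are made up for illustration, and the year length of 365.25 days is an approximation.

```javascript
const dob = new Date("2006-02-10T10:50:42.389Z");

// Milliseconds since the Unix epoch: the kind of value a date field stores.
const millis = dob.getTime();
console.log(millis);

// Converting back from the millisecond count gives the same instant.
const restored = new Date(millis);
console.log(restored.toISOString()); // "2006-02-10T10:50:42.389Z"

// Rough size of the range a signed 64-bit millisecond counter covers, in million years.
const MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000;
const rangeInMillionYears = Math.round(2 ** 63 / MS_PER_YEAR / 1e6);
console.log(rangeInMillionYears); // 292
```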

  • Min/Max keys - MinKey and MaxKey are both internal data types. MinKey compares lower than every other BSON value, and MaxKey compares higher than every other BSON value.

  • Object - This data type is used to store embedded documents within a document. Let’s look at an example to understand it better.
{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_dimensions": {
    "item_height": 1200,
    "item_width": 100,
    "item_depth": 900
  },
  "item_availability": true
}

In the above example, the item_dimensions field is an embedded document as it contains its own set of key-value pairs. This field therefore is of the type Object.
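
MongoDB queries address fields of an embedded document with dot notation, for example "item_dimensions.item_height". The sketch below imitates that lookup in plain JavaScript. The helper getByPath is hypothetical and is not a MongoDB function.

```javascript
const item = {
  item_code: "1234-ABCD",
  item_price: 49.99,
  item_dimensions: { item_height: 1200, item_width: 100, item_depth: 900 },
  item_availability: true,
};

// Hypothetical helper: follow a dot-separated path into nested objects.
// Returns undefined as soon as any step along the path is missing.
function getByPath(doc, path) {
  return path
    .split(".")
    .reduce((current, key) => (current == null ? undefined : current[key]), doc);
}

console.log(getByPath(item, "item_dimensions.item_height")); // 1200
console.log(getByPath(item, "item_dimensions.item_weight")); // undefined
```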

  • Timestamp - The timestamp type is a special type for internal MongoDB use and is not associated with the regular Date type. This internal timestamp type is a 64-bit value where the most significant 32 bits are seconds since the Unix epoch and the least significant 32 bits are an incrementing ordinal for operations within a given second.

Here is what the timestamp value looks like in a document when it is queried.

{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_created": Timestamp(1412180887, 1),
  "item_availability": true
}

Because this type is meant for internal use, application code should normally record creation and update times with the Date type instead. If a document is inserted with an empty new Timestamp() in a top-level field, the server automatically fills in the current timestamp value.
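
The 64-bit layout (seconds in the most significant 32 bits, ordinal in the least significant 32 bits) can be sketched with BigInt arithmetic. The helper names packTimestamp and unpackTimestamp are made up for this illustration and are not part of any MongoDB API.

```javascript
// Hypothetical helper: combine seconds and ordinal into one 64-bit value.
function packTimestamp(seconds, ordinal) {
  return (BigInt(seconds) << 32n) | BigInt(ordinal);
}

// Hypothetical helper: split a 64-bit value back into its two halves.
function unpackTimestamp(packed) {
  return {
    seconds: Number(packed >> 32n),
    ordinal: Number(packed & 0xffffffffn),
  };
}

const packed = packTimestamp(1412180887, 1);
console.log(packed.toString());

const parts = unpackTimestamp(packed);
console.log(parts); // { seconds: 1412180887, ordinal: 1 }
```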

  • Null - The null datatype is used to store null or non-existent values. Here is what a field with a null value looks like in a document when queried.
{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_color": null,
  "item_availability": true
}

A document in which the field is completely absent, like the following one, behaves similarly in many queries. For example, the filter { item_color: null } matches both documents.

{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_availability": true
}
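
The difference between a null field and a missing field can be made concrete with a small sketch. The two helpers below are hypothetical and only imitate, in plain JavaScript, how an equality match on null and an existence check would treat the two documents.

```javascript
const withNull = { item_code: "1234-ABCD", item_price: 49.99, item_color: null, item_availability: true };
const withoutField = { item_code: "1234-ABCD", item_price: 49.99, item_availability: true };

// Imitates an equality match on null: true when the field is null or missing.
function matchesNullFilter(doc, field) {
  return !(field in doc) || doc[field] === null;
}

// Imitates an existence check: true only when the field is present.
function fieldExists(doc, field) {
  return field in doc;
}

console.log(matchesNullFilter(withNull, "item_color"));     // true
console.log(matchesNullFilter(withoutField, "item_color")); // true
console.log(fieldExists(withNull, "item_color"));           // true
console.log(fieldExists(withoutField, "item_color"));       // false
```
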
  • ObjectID - This datatype is used to store a document’s unique ID. No two documents in a collection can have the same _id value. An ObjectID is a 12-byte value made up of a 4-byte timestamp, a 5-byte random value, and a 3-byte incrementing counter, combined to generate a unique ID.

Here is an example.

{
  "_id": ObjectId("5349b4ddd2781d08c09890f3"),
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_availability": true
}

If you insert a document without an _id field, one is added automatically with a value of the ObjectID type.
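
Since an ObjectID starts with a timestamp, the creation time can be recovered from its hexadecimal form. The sketch below assumes a layout of a 4-byte timestamp (in seconds), a 5-byte random value, and a 3-byte counter. The helper objectIdParts is hypothetical and works on the hex string only, without a MongoDB driver.

```javascript
// Hypothetical helper: split a 24-character hex ObjectID into its parts.
function objectIdParts(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new TypeError("expected 24 hexadecimal characters");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // first 4 bytes
  return {
    createdAt: new Date(seconds * 1000),         // seconds to milliseconds
    random: hex.slice(8, 18),                    // next 5 bytes
    counter: parseInt(hex.slice(18), 16),        // last 3 bytes
  };
}

const parts = objectIdParts("5349b4ddd2781d08c09890f3");
console.log(parts.createdAt.toISOString());
console.log(parts.random);  // "d2781d08c0"
console.log(parts.counter);
```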

  • Binary - This datatype is used to store binary data in a field. This data type corresponds to the Blob type in a Relational DBMS. There is, however, a limit of 16MB per document in MongoDB, so if the binary data plus other fields have a total size less than 16MB, then binary data can be embedded within the document using the Binary data type.

Here is an example.

{
  "_id": ObjectId("5349b4ddd2781d08c09890f3"),
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_availability": true,
  "item_picture": BinData(1, "wekud3298eyx2398ey293...")
}

In BinData, the first argument is the binary subtype and the second is the base64 representation of the binary content.
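
The base64 form is easy to produce and reverse in application code. The sketch below assumes Node.js, whose built-in Buffer type handles the conversion. The size check helper fitsInOneDocument is hypothetical and deliberately rough, since it ignores the overhead of field names and other BSON bookkeeping.

```javascript
// Assumption: Node.js, where Buffer is available as a global.
const pngSignature = Buffer.from([0x89, 0x50, 0x4e, 0x47]); // first four bytes of a PNG file

// Bytes to base64 text.
const encoded = pngSignature.toString("base64");
console.log(encoded); // "iVBORw=="

// Base64 text back to the same bytes.
const decoded = Buffer.from(encoded, "base64");
console.log(decoded.equals(pngSignature)); // true

// Hypothetical helper: rough check of a payload against the 16MB document limit.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;
function fitsInOneDocument(payloadBytes, otherFieldsBytes = 0) {
  return payloadBytes + otherFieldsBytes < MAX_DOCUMENT_BYTES;
}

console.log(fitsInOneDocument(5 * 1024 * 1024));  // true
console.log(fitsInOneDocument(20 * 1024 * 1024)); // false
```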

  • Undefined - This datatype is used to store the undefined value in a field. Note that MongoDB differentiates between null and undefined, but the shell casts both to null. This behavior can, however, be changed.
{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_color": undefined,
  "item_availability": true
}

The Undefined type is now deprecated and should be avoided in new documents.

  • Regular Expression - This datatype is used to store regular expressions (regexes) in a field. These can be used for pattern matching on string values. Here is an example.
{
  "item_code": "1234-ABCD",
  "item_price": 49.99,
  "item_color": undefined,
  "item_prefix": /%_Y675%/
}
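
As a sketch of pattern matching with a regular expression, the snippet below filters a few item documents by a prefix pattern in plain JavaScript. The pattern /^1234-/ and the sample item codes are assumptions made for illustration; a MongoDB query would apply the pattern on the server instead.

```javascript
// Hypothetical pattern: item codes that start with "1234-".
const codePattern = /^1234-/;

const items = [
  { item_code: "1234-ABCD" },
  { item_code: "9876-WXYZ" },
  { item_code: "1234-EFGH" },
];

// Keep only the item codes that match the pattern.
const matchingCodes = items
  .filter((item) => codePattern.test(item.item_code))
  .map((item) => item.item_code);

console.log(matchingCodes); // [ '1234-ABCD', '1234-EFGH' ]
```
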
  • JavaScript with Scope - It is possible to store JavaScript code, such as a function, within a field in MongoDB. Functions with closures can also be stored. They bind to the scope of the MongoDB session when they are executed.

In BSON, there are two different types for code: JavaScript, for functions without closures, and JavaScript with Scope, for functions with closures. JavaScript with Scope is deprecated as of MongoDB 4.4.

So these are the key and most prominent datatypes in MongoDB. BSON supports more datatypes than JSON. Over time, some older and less often used datatypes are deprecated or removed, and support for newer types is improved. This is an ongoing process.

Author: KnowledgeHut