When retrieving documents from a collection, you may not always know the exact field value to look for. Regular expressions help here: they let you retrieve data by matching search values against a pattern.
A regular expression is a generalised method for matching patterns in character sequences. MongoDB supports Perl compatible regular expressions (PCRE) version 8.42 with UTF-8 support; starting in version 6.1, MongoDB uses the PCRE2 library instead. We can do pattern matching in MongoDB in two ways:
- Using the $regex operator for pattern matching
- Pattern matching without the $regex operator
Using the $regex Operator for Pattern Matching
In MongoDB, the $regex operator is used to search a collection for specific strings. It is useful when we do not know the exact field value we are searching for in a document. The following example demonstrates how this works.
Assume we have an employee collection whose documents have the fields "employeeid" and "employeename", and that it contains the following documents:
Employee id | Employee name
---|---
13 | Ishan
18 | Madan
9 | Abhresh
25 | Piyush
11 | Chinmay
19 | Ishan123
20 | Ishan12
Let’s see how to add these documents to the database.
For this tutorial, you need to have MongoDB and MongoDB Compass installed on your system.
- Step 1: Create a database “regx” and a collection “employee” using MongoDB Compass.
- Step 2: Click Add Data, select Insert Document, and add the following documents to the collection:
[
  { "employeeid": 13, "employeename": "Ishan" },
  { "employeeid": 18, "employeename": "Madan" },
  { "employeeid": 9, "employeename": "Abhresh" },
  { "employeeid": 25, "employeename": "Piyush" },
  { "employeeid": 11, "employeename": "Chinmay" },
  { "employeeid": 19, "employeename": "Ishan123" },
  { "employeeid": 20, "employeename": "Ishan12" }
]
To check the created documents from the shell, open a command prompt and run mongosh (or the legacy mongo shell on older installations), then run the following:
use regx
db.employee.find().pretty()
The output lists all seven documents, each with the _id field that MongoDB added automatically on insert.
Note:
- You cannot use $regex operator expressions inside an $in operator; use regular expression objects such as /pattern/ there instead.
- To include a regular expression in a comma-separated list of query conditions for the same field, use the $regex operator.
- To use the x and s options, you must use the $regex operator expression together with $options.
- Starting with MongoDB 4.0.7, the $not operator can also be combined with $regex operator expressions (earlier versions accepted only regular expression objects).
- If an index exists on the queried field, MongoDB matches case-sensitive regular expressions against the values in the index, which is faster than scanning the whole collection. Case-insensitive regular expression queries generally cannot use indexes efficiently.
- To use PCRE features that JavaScript regular expressions do not support, you must use the $regex operator and specify the pattern as a string.
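Two of these notes can be illustrated without a database. The sketch below uses plain JavaScript (Node.js) arrays and the built-in RegExp object to mimic how MongoDB would filter the sample names; the commented query shapes use the MongoDB operators named above, but the filtering itself is only a local approximation, not MongoDB's query engine.

```javascript
// Sample employeename values from the tutorial's employee collection.
const names = ["Ishan", "Madan", "Abhresh", "Piyush", "Chinmay", "Ishan123", "Ishan12"];

// Roughly { employeename: { $regex: /^Ish/, $nin: ["Ishan123"] } }:
// both conditions on the same field must hold.
const regexAndNin = names.filter((name) => /^Ish/.test(name) && !["Ishan123"].includes(name));

// Roughly { employeename: { $not: /^Ish/ } }: keep the values the pattern does NOT match.
// (In MongoDB, $not also returns documents that lack the field; every sample has it.)
const notIsh = names.filter((name) => !/^Ish/.test(name));

console.log(regexAndNin); // [ 'Ishan', 'Ishan12' ]
console.log(notIsh); // [ 'Madan', 'Abhresh', 'Piyush', 'Chinmay' ]
```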
Syntax:
{ <field>: { $regex: /pattern/, $options: '<options>' } }
{ <field>: { $regex: 'pattern', $options: '<options>' } }
The following query finds every employee whose name contains the characters "Is":
db.employee.find({employeename: {$regex: "Is"}}).forEach(printjson)
Only the documents whose employeename contains the characters 'Is' are returned: Ishan, Ishan123 and Ishan12.
Because we want every employee name that contains 'Is', we pass "Is" to the $regex operator as the search criteria. printjson prints each document returned by the query in a readable JSON format.
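To make the matching rule concrete, here is a minimal sketch in plain JavaScript (Node.js) that applies the same "Is" pattern to the sample documents with the built-in RegExp object. It does not talk to MongoDB: findByRegex is a hypothetical helper, and JavaScript regular expressions differ from MongoDB's PCRE engine in some advanced features, though not for simple patterns like this one.

```javascript
// Sample documents from the employee collection used in this tutorial.
const employees = [
  { employeeid: 13, employeename: "Ishan" },
  { employeeid: 18, employeename: "Madan" },
  { employeeid: 9, employeename: "Abhresh" },
  { employeeid: 25, employeename: "Piyush" },
  { employeeid: 11, employeename: "Chinmay" },
  { employeeid: 19, employeename: "Ishan123" },
  { employeeid: 20, employeename: "Ishan12" },
];

// Hypothetical helper: keeps the documents whose string field matches the pattern,
// roughly like { <field>: { $regex: pattern, $options: options } }.
function findByRegex(docs, field, pattern, options = "") {
  const re = new RegExp(pattern, options);
  return docs.filter((doc) => typeof doc[field] === "string" && re.test(doc[field]));
}

// The pattern matches anywhere in the value and is case-sensitive.
const matches = findByRegex(employees, "employeename", "Is");
console.log(matches.map((doc) => doc.employeename)); // [ 'Ishan', 'Ishan123', 'Ishan12' ]
```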
Example:
The collection contains both "Ishan12" and "Ishan123". If you used "Ishan12" as the search criteria, the query would also return the document with "Ishan123", because "Ishan12" appears inside it.
But what if we only wanted the document with the name "Ishan12"? Then we can use exact pattern matching with the ^ and $ characters: the ^ character goes at the beginning of the pattern and the $ character at the end.
db.employee.find({employeename : {$regex: "^Ishan12$"}}).forEach(printjson)
In the search criteria, the ^ symbol ensures that the value begins with the given characters, and the $ symbol ensures that it ends with them. As a result, the query only returns the document whose employeename is exactly "Ishan12".
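The difference between an unanchored and an anchored pattern can be checked locally. This sketch uses plain JavaScript (Node.js) regular expressions on the sample names rather than a MongoDB query; for a simple pattern like this one, the ^ and $ anchors behave the same way in both.

```javascript
// Sample employeename values from the tutorial's employee collection.
const names = ["Ishan", "Madan", "Abhresh", "Piyush", "Chinmay", "Ishan123", "Ishan12"];

// Unanchored: "Ishan12" may appear anywhere in the value.
const loose = names.filter((name) => /Ishan12/.test(name));

// Anchored: ^ pins the match to the start and $ to the end,
// so the whole value must be exactly "Ishan12".
const exact = names.filter((name) => /^Ishan12$/.test(name));

console.log(loose); // [ 'Ishan123', 'Ishan12' ]
console.log(exact); // [ 'Ishan12' ]
```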
Example:
Displaying the details of employees whose names end with ‘h’.
db.employee.find({employeename:{$regex:"h$"}}).pretty()
We display the documents of employees whose names end with the letter 'h' (here, Abhresh and Piyush). So, for the employeename field in the find() method, we pass a regular expression using the $regex operator (i.e. $regex: "h$").
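A quick local check shows why the position of the anchor matters. The sketch below filters the sample names with plain JavaScript (Node.js) regular expressions as a stand-in for the MongoDB query; "h$" requires the value to end with "h", while "^h" would require it to start with a lowercase "h".

```javascript
// Sample employeename values from the tutorial's employee collection.
const names = ["Ishan", "Madan", "Abhresh", "Piyush", "Chinmay", "Ishan123", "Ishan12"];

// Roughly { employeename: { $regex: "h$" } }: the value must END with "h".
const endsWithH = names.filter((name) => /h$/.test(name));

// For contrast, "^h" requires the value to START with a lowercase "h".
const startsWithH = names.filter((name) => /^h/.test(name));

console.log(endsWithH); // [ 'Abhresh', 'Piyush' ]
console.log(startsWithH); // []
```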
Pattern Matching Without the $regex Operator
We can also do pattern matching in MongoDB without the $regex operator, by specifying the pattern as a regular expression object. Unlike $regex operator expressions, regular expression objects can also be used inside the $in operator.
Syntax:
{ <field>: /pattern/<options> }
The forward slashes delimit the pattern: write the search criteria between them, followed by any options.
Example:
db.employee.find({employeename: /Ish/}).forEach(printjson)
Here we display the documents of employees whose names contain the string "Ish". In the find() method, we pass the regular expression /Ish/ for the employeename field; the slashes mark the start and end of the pattern.
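The sketch below mirrors two regular expression object queries in plain JavaScript (Node.js): a /Ish/ match and an $in list that mixes a regular expression object with a literal string. The MongoDB query shapes in the comments use real operators, but matchesIn is a hypothetical helper that only approximates $in semantics for a single string field.

```javascript
// Sample employeename values from the tutorial's employee collection.
const names = ["Ishan", "Madan", "Abhresh", "Piyush", "Chinmay", "Ishan123", "Ishan12"];

// Roughly db.employee.find({ employeename: /Ish/ }): case-sensitive "contains".
const containsIsh = names.filter((name) => /Ish/.test(name));

// Hypothetical helper modelling $in for a string field: the value matches if it
// equals a literal in the list or is matched by a RegExp in the list.
function matchesIn(value, candidates) {
  return candidates.some((c) => (c instanceof RegExp ? c.test(value) : c === value));
}

// Roughly db.employee.find({ employeename: { $in: [/^Ch/, "Madan"] } })
const inResult = names.filter((name) => matchesIn(name, [/^Ch/, "Madan"]));

console.log(containsIsh); // [ 'Ishan', 'Ishan123', 'Ishan12' ]
console.log(inResult); // [ 'Madan', 'Chinmay' ]
```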
Using Case-Insensitive Regular Expressions
To make a regular expression case-insensitive, pass $options with the value 'i'. The following example shows how this works: the query returns every value containing "ish", whether its letters are lowercase or capitalised.
The following options are available for use with regular expression in MongoDB:
- i: Case insensitivity; matches both lower- and upper-case letters.
- m: For patterns that contain the anchors ^ and $, matches at the beginning or end of each line in a multiline value. Without this option, these anchors match only at the beginning or end of the whole string.
- x: Ignores all white space characters in the $regex pattern unless they are escaped or inside a character class.
- s: Allows the dot character "." to match all characters, including newline characters.
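Three of these options have direct JavaScript equivalents, so their effect can be tried locally in Node.js; JavaScript has no x flag, so that option is not shown. This is a sketch of the option semantics using JavaScript regular expression flags, not MongoDB itself; in a query the same options would be written as, for example, $options: "m".

```javascript
// A value spanning two lines, to show how m and s change matching.
const multiline = "first line\nsecond line";

// i: case-insensitive matching.
const iResult = /ISHAN/i.test("Ishan"); // true

// m: ^ and $ match at each line boundary, not only at the string ends.
const withoutM = /^second/.test(multiline); // false
const withM = /^second/m.test(multiline); // true

// s: "." also matches the newline character.
const withoutS = /line.second/.test(multiline); // false
const withS = /line.second/s.test(multiline); // true

console.log({ iResult, withoutM, withM, withoutS, withS });
```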
We will continue with the same example used in the previous section.
For example, suppose you wanted to find all documents whose Employee Name contains 'ish', regardless of case. To get that result, we must use $options with the case-insensitivity parameter.
Run the following query:
db.employee.find({employeename:{$regex: "ish",$options:'i'}}).forEach(printjson)
In the above query, we search for ‘ish’ in employeename irrespective of case. Without the option, the pattern would match nothing, because the names that contain these letters spell them with a capital 'I' ("Ish"); with it, the query returns Ishan, Ishan123 and Ishan12. This is the power of $options.
The $options parameter 'i' (case insensitivity) specifies that we want to find the letters 'ish' whether they are written in lower or upper case.
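The effect of the 'i' option on this exact query can be reproduced locally. In the sketch below, toRegExp is a hypothetical helper that turns the $regex and $options strings into a JavaScript RegExp; it accepts only i, m and s (which JavaScript also supports) and rejects anything else, such as x, rather than silently ignoring it. It runs in plain Node.js and only approximates MongoDB's PCRE matching.

```javascript
// Sample employeename values from the tutorial's employee collection.
const names = ["Ishan", "Madan", "Abhresh", "Piyush", "Chinmay", "Ishan123", "Ishan12"];

// Hypothetical helper: builds a RegExp from $regex / $options style strings.
function toRegExp(pattern, options = "") {
  for (const option of options) {
    if (!"ims".includes(option)) {
      throw new Error(`option not supported by JavaScript RegExp: ${option}`);
    }
  }
  return new RegExp(pattern, options);
}

// Roughly { employeename: { $regex: "ish" } } versus { $regex: "ish", $options: "i" }
const caseSensitive = names.filter((name) => toRegExp("ish").test(name));
const caseInsensitive = names.filter((name) => toRegExp("ish", "i").test(name));

console.log(caseSensitive); // []
console.log(caseInsensitive); // [ 'Ishan', 'Ishan123', 'Ishan12' ]
```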
Optimizing Regular Expression Queries
As with relational database queries, we can optimise MongoDB regular expression queries. Some of the techniques are as follows:
- Create an index on the queried field so that MongoDB can match the regular expression against the indexed values. Compared with scanning every document in the collection, this speeds up search and data retrieval.
- Write the regular expression as a prefix expression, i.e. one that starts with ^ followed by literal characters, so that all matches must begin with a specific string.
- For example, if the regex is ^example, the query only looks for strings that begin with "example", and MongoDB can limit the index scan to the range of values with that prefix.
- Combine the regex with a text search. A plain search against a text index returns all documents whose indexed text contains the words we are looking for. This is too broad on its own, but it already narrows the collection down to a superset of the desired result.
- A regex query combined with the text search through a logical AND then only has to examine the documents that the text search returns.
- If the text search returns no results, the regex is not evaluated at all.
- This will significantly reduce CPU load and speed up your queries, especially for large datasets. In my tests, queries ran 10 times faster while returning the same results as regex queries alone.
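The sketch below models, in plain JavaScript (Node.js), why a case-sensitive prefix expression such as ^Ish can use an index efficiently. A sorted array stands in for an ascending index on employeename, and a binary search finds the range of values that can start with the prefix, so only that range is tested against the regex. This is a simplified illustration under those assumptions, not how MongoDB's B-tree indexes are implemented; lowerBound and prefixScan are hypothetical helpers. In the shell, the real counterpart would be db.employee.createIndex({ employeename: 1 }) followed by a query using { $regex: "^Ish" }, and explain("executionStats") reports how many index keys were examined.

```javascript
// Sample values; sorting them stands in for an ascending index on employeename.
const names = ["Ishan", "Madan", "Abhresh", "Piyush", "Chinmay", "Ishan123", "Ishan12"];
const index = [...names].sort();

// Binary search: first position whose value is >= key.
function lowerBound(sorted, key) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < key) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Hypothetical helper: every value starting with `prefix` sorts in the range
// [prefix, upper), where `upper` is the prefix with its last character incremented.
// Assumes a non-empty prefix containing no regex metacharacters.
function prefixScan(sorted, prefix) {
  const last = prefix.charCodeAt(prefix.length - 1);
  const upper = prefix.slice(0, -1) + String.fromCharCode(last + 1);
  const candidates = sorted.slice(lowerBound(sorted, prefix), lowerBound(sorted, upper));
  const re = new RegExp("^" + prefix);
  return { examined: candidates.length, matches: candidates.filter((v) => re.test(v)) };
}

const result = prefixScan(index, "Ish");
console.log(result.examined, "of", index.length, "values examined"); // 3 of 7 values examined
console.log(result.matches); // [ 'Ishan', 'Ishan12', 'Ishan123' ]
```

A case-insensitive pattern such as /ish/i has no single literal prefix range to search, which is why such queries generally cannot use the index this way.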
Doing More With Regular Expressions
We learned about MongoDB regular expressions for pattern matching using $regex and $options.
- The $regex operator can be used to match patterns. This operator can be used to search the collection for specific strings.
- The ^ and $ symbols can be used for exact text searches, with ^ ensuring that the string begins with specific characters and $ ensuring that it ends with specific characters.
- The i option, used with the $regex operator through $options, makes the match case-insensitive, so strings are found whether they are in lower or upper case.
- Regular expression objects delimited by slashes (/pattern/) can also be used to match patterns.
- A case-sensitive regular expression can use an index on the queried field by matching against the values in the index. If the regular expression is a prefix expression, MongoDB only needs to check the indexed values that begin with that prefix.