
A regular expression, also known as a regex, is a sequence of characters that defines a search pattern. Regular expressions are used in search algorithms, in the search-and-replace dialogs of text editors, in lexical analysis, and for input validation. The technique originated in theoretical computer science and formal language theory.
Different syntaxes are used for writing regular expressions. One is the POSIX standard and another, widely used, is the Perl syntax.
Manipulating textual data plays an important role in data science projects that require large-scale text processing. Many programming languages, including Python, provide regex capabilities, either built in or via libraries. Python's standard library provides the 're' module for this purpose.
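The most common applications of regular expressions are searching for a pattern in a string, finding all occurrences of a pattern, splitting a string into substrings, and replacing part of a string.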
Functions in the re module are conventionally given raw strings as the pattern argument. A raw string is a normal string literal prefixed with 'r' or 'R'.
>>> normal="computer"
>>> print (normal)
computer
>>> raw=r"computer"
>>> print (raw)
computer
Both strings appear identical. The difference becomes evident when the string literal embeds escape sequences ('\n', '\t', etc.).
>>> normal="Hello\nWorld"
>>> print (normal)
Hello
World
>>> raw=r"Hello\nWorld"
>>> print (raw)
Hello\nWorld
In a normal string literal, Python interprets the escape sequence, so '\n' becomes a newline character and the output is printed on two lines. With the raw string prefix 'r', the escape sequence is not translated: the backslash and the 'n' are kept as two ordinary characters, so the output shows the string exactly as it was written.
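This distinction matters for regular expressions because many regex sequences begin with a backslash. The following is a minimal sketch, using a sample sentence chosen for illustration: in a normal literal, "\b" is the backspace character, so only the raw string passes the word-boundary sequence \b to the re module unchanged.

```python
import re

text = "Simple is better than complex."

# In a normal literal, "\b" is the backspace character, not a word boundary.
normal_pattern = "\bis\b"
# In a raw literal, the backslash and the 'b' are kept as two characters.
raw_pattern = r"\bis\b"

print(len(normal_pattern))               # 4
print(len(raw_pattern))                  # 6
print(re.findall(normal_pattern, text))  # []
print(re.findall(raw_pattern, text))     # ['is']
```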
Regular expressions use two types of characters in the pattern string. Metacharacters have a special meaning, similar to * in a wildcard. Literals, such as alphanumeric characters, simply match themselves.
The following characters are metacharacters:
. ^ $ * + ? { } [ ] \ | ( )
The square brackets [ and ] are used to specify a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters separated by a '-'.
[abc] | Matches any of the characters a, b, or c. |
[a-c] | Uses a range to express the same set of characters. |
[a-z] | Matches any lowercase letter. |
[0-9] | Matches any digit. |
'^' | Complements the character set when it is the first character inside []. For example, [^5] matches any character except '5'. |
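The character sets above can be tried out with re.findall(), which is described later in this article. This is only an illustrative sketch, and the sample text is made up.

```python
import re

sample = "cab 2015 Zebra"  # hypothetical sample text

listed = re.findall(r"[abc]", sample)     # any of a, b, or c
ranged = re.findall(r"[a-c]", sample)     # the same set written as a range
digits = re.findall(r"[0-9]", sample)     # digits only
not_five = re.findall(r"[^5]", "2015")    # every character except '5'

print(listed)    # ['c', 'a', 'b', 'b', 'a']
print(ranged)    # ['c', 'a', 'b', 'b', 'a']
print(digits)    # ['2', '0', '1', '5']
print(not_five)  # ['2', '0', '1']
```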
'\' is the escaping metacharacter. It is followed by various characters to signal special sequences, and it also removes the special meaning of a metacharacter: to match a literal [ or \, precede it with a backslash, as in \[ or \\.
Some of the special sequences beginning with '\' represent predefined sets of characters.
\d | Matches any decimal digit; this is equivalent to the class [0-9]. |
\D | Matches any non-digit character; this is equivalent to the class [^0-9]. |
\s | Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v]. |
\S | Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v]. |
\w | Matches any alphanumeric character or underscore; for ASCII text this is equivalent to the class [a-zA-Z0-9_]. |
\W | Matches any character that is not alphanumeric or an underscore; for ASCII text this is equivalent to the class [^a-zA-Z0-9_]. |
. | Matches any single character except the newline '\n'. |
? | Matches 0 or 1 occurrence of the pattern to its left. |
+ | Matches 1 or more occurrences of the pattern to its left. |
* | Matches 0 or more occurrences of the pattern to its left. |
\b | Matches at a word boundary, that is, between a word character and a non-word character (or the start or end of the string); \B is the opposite of \b. |
[..] | Matches any single character listed in the square brackets; [^..] matches any single character not listed. |
\ | Escapes characters that have a special meaning, for example \. to match a period or \+ to match a plus sign. |
{n,m} | Matches at least n and at most m occurrences of the pattern to its left. |
a|b | Matches either a or b. |
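A few of the sequences and quantifiers from the table can be seen in action with re.findall(), which is described later in this article. The sample strings are invented for illustration.

```python
import re

# Sample strings below are made up for illustration.
digits = re.findall(r"\d+", "Order 42 shipped on 2024-01-15")  # runs of digits
spellings = re.findall(r"colou?r", "color colour")             # optional 'u'
whole_words = re.findall(r"\bcat\b", "cat concat cat.")        # word boundaries
answers = re.findall(r"yes|no", "yes no maybe")                # alternation
periods = re.findall(r"\.", "a.b.c")                           # escaped metacharacter
runs = re.findall(r"\d{2,3}", "1 22 333 4444")                 # 2 to 3 digits

print(digits)       # ['42', '2024', '01', '15']
print(spellings)    # ['color', 'colour']
print(whole_words)  # ['cat', 'cat']
print(answers)      # ['yes', 'no']
print(periods)      # ['.', '.']
print(runs)         # ['22', '333', '444']
```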
The re module has the following functions:
The re.match() function finds a match for the pattern only if it occurs at the start of the string.
re.match(pattern, string)
It returns None if no match is found. If the match succeeds, a match object is returned, containing information about the match: where it starts and ends, the substring it matched, and so on.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.match(r"Simple", string)
>>> obj
<_sre.SRE_Match object; span=(0, 6), match='Simple'>
>>> obj.start()
0
>>> obj.end()
6
The match object's start() method returns the starting position of the match in the string, and end() returns the position just past its last character.
If the pattern is not found at the start of the string, re.match() returns None instead of a match object.
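Because the result may be None, it is usually checked before start() or end() is called. The sketch below assumes a hypothetical helper named describe_match; group() and span() are standard match object methods that return the matched text and the (start, end) pair.

```python
import re

string = "Simple is better than complex."

def describe_match(pattern, text):
    """Hypothetical helper: summarize the result of re.match()."""
    obj = re.match(pattern, text)
    if obj is None:
        return "no match at the start"
    # group() returns the matched substring, span() the (start, end) pair.
    return f"{obj.group()!r} at {obj.span()}"

print(describe_match(r"Simple", string))  # 'Simple' at (0, 6)
print(describe_match(r"better", string))  # no match at the start
```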
The re.search() function scans the string for the pattern from any position, but it returns only the first occurrence.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.search(r"is", string)
>>> obj.start()
7
>>> obj.end()
9
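The difference between matching and searching can be sketched as follows. The longer sample string contains 'is' twice; re.finditer(), a standard function of the re module that this article does not otherwise cover, is used here only to show where every occurrence lies.

```python
import re

string = "Simple is better than complex. Complex is better than complicated."

at_start = re.match(r"is", string)   # the string does not start with 'is'
first = re.search(r"is", string)     # first occurrence anywhere in the string
all_spans = [m.span() for m in re.finditer(r"is", string)]

print(at_start)      # None
print(first.span())  # (7, 9)
print(all_spans)     # [(7, 9), (39, 41)]
```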
The re.findall() function returns a list of all non-overlapping matches of the pattern in the string.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.findall(r"ple", string)
>>> obj
['ple', 'ple']
To obtain a list of all word characters (letters, digits, and underscores) in the string:
>>> obj=re.findall(r"\w", string)
>>> obj
['S', 'i', 'm', 'p', 'l', 'e', 'i', 's', 'b', 'e', 't', 't', 'e', 'r', 't', 'h', 'a', 'n', 'c', 'o', 'm', 'p', 'l', 'e', 'x']
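The pattern \w* can be used to obtain the words, but because * also matches zero characters, the result includes an empty string wherever a non-word character occurs. Using \w+ instead returns only the words.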
>>> obj=re.findall(r"\w*", string)
>>> obj
['Simple', '', 'is', '', 'better', '', 'than', '', 'complex', '', '']
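The empty strings in this result come from the * quantifier, which also matches zero characters. A minimal sketch of the same extraction with + in its place:

```python
import re

string = "Simple is better than complex."

# '+' requires at least one word character, so no empty matches are produced.
words = re.findall(r"\w+", string)
print(words)  # ['Simple', 'is', 'better', 'than', 'complex']
```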
The re.split() function splits the string at each occurrence of the given pattern and returns the list of resulting substrings.
>>> import re
>>> string="Simple is better than complex."
>>> obj=re.split(r' ',string)
>>> obj
['Simple', 'is', 'better', 'than', 'complex.']
The string is split at each occurrence of a space character ' ', returning a list of slices, each corresponding to a word. Note that the output is the same as that of the split() method of the built-in str object.
>>> string.split(' ')
['Simple', 'is', 'better', 'than', 'complex.']
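re.split() becomes more useful than str.split() when the separator varies. The sketch below uses a made-up string with mixed separators, and also shows the optional maxsplit argument of re.split().

```python
import re

messy = "Simple,is;  better   than complex."  # hypothetical input with mixed separators

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r"[,;\s]+", messy)
print(parts)  # ['Simple', 'is', 'better', 'than', 'complex.']

# maxsplit limits the number of splits that are performed.
head_and_rest = re.split(r"\s+", "Simple is better than complex.", maxsplit=1)
print(head_and_rest)  # ['Simple', 'is better than complex.']
```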
The re.sub() function returns a new string in which each occurrence of the pattern is replaced by the substitute string. Its usage is:
re.sub(pattern, replacement, string)
In the example below, every occurrence of 'is' is replaced by 'was' in the target string. Note that the pattern is not limited to whole words; it would also match 'is' inside a longer word.
>>> string="Simple is better than complex. Complex is better than complicated."
>>> obj=re.sub(r'is', r'was',string)
>>> obj
'Simple was better than complex. Complex was better than complicated.'
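The pattern 'is' happens to work here because no other word in the sentence contains those two letters. As an illustrative sketch with a made-up sentence, the \b boundary restricts the replacement to the whole word:

```python
import re

text = "This is his list."  # hypothetical sentence with 'is' inside other words

naive = re.sub(r"is", "was", text)           # replaces every 'is', even inside words
whole_word = re.sub(r"\bis\b", "was", text)  # replaces only the standalone word

print(naive)       # Thwas was hwas lwast.
print(whole_word)  # This was his list.
```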
The re.compile() function compiles a regular expression pattern into a regular expression object, whose methods mirror the module-level functions. This is useful when you need to use an expression several times.
>>> string
'Simple is better than complex. Complex is better than complicated.'
>>> pattern=re.compile(r'is')
>>> obj=pattern.match(string)  # returns None: the string does not start with 'is'
>>> obj=pattern.search(string)
>>> obj.start()
7
>>> obj.end()
9
>>> obj=pattern.findall(string)
>>> obj
['is', 'is']
>>> obj=pattern.sub(r'was', string)
>>> obj
'Simple was better than complex. Complex was better than complicated.'
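A typical reason to compile is to apply the same pattern to many strings. The following sketch assumes a small list of sample lines invented for illustration; the pattern object is created once and reused.

```python
import re

# Compile once, then reuse the pattern object for every line.
word_is = re.compile(r"\bis\b")

# Sample lines are made up for illustration.
lines = [
    "Simple is better than complex.",
    "Flat is better than nested.",
    "Readability counts.",
]

matching = [line for line in lines if word_is.search(line)]
counts = [len(word_is.findall(line)) for line in lines]

print(matching)  # the first two lines
print(counts)    # [1, 1, 0]
```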
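Some useful applications of the re module follow. The first example finds all words that begin with a vowel; the second replaces the domain of each email address with gmail.com.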
>>> string='Errors should never pass silently. Unless explicitly silenced.'
>>> obj=re.findall(r'\b[aeiouAEIOU]\w+', string)
>>> obj
['Errors', 'Unless', 'explicitly']
>>> emails=['aa@xyz.com', 'bb@abc.com', 'cc@mnop.com']
>>> gmails=[re.sub(r'@\w+\.\w+','@gmail.com', x) for x in emails]  # \. matches a literal dot
>>> gmails
['aa@gmail.com', 'bb@gmail.com', 'cc@gmail.com']
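In a pattern like this the dot should be escaped, because an unescaped '.' matches any character. The sketch below is one possible way to write the replacement; the compiled pattern name DOMAIN and the malformed address are assumptions made for illustration.

```python
import re

emails = ["aa@xyz.com", "bb@abc.com", "cc@mnop.com"]

# Escaped dot: only a literal '.' may separate the domain name from its suffix.
DOMAIN = re.compile(r"@\w+\.\w+$")

gmails = [DOMAIN.sub("@gmail.com", address) for address in emails]
print(gmails)  # ['aa@gmail.com', 'bb@gmail.com', 'cc@gmail.com']

# An unescaped '.' matches any character, so it also accepts malformed input.
loose = bool(re.search(r"@\w+.\w+$", "aa@xyz#com"))
strict = bool(re.search(r"@\w+\.\w+$", "aa@xyz#com"))
print(loose)   # True
print(strict)  # False
```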