
Natural Language Processing (NLP) is an amalgamation of machine learning, computer science and linguistics that gives machines the ability to understand natural language in the manner it is spoken and written.
The study of NLP has been around for over 50 years. Before powerful computers were available, NLP implementations were limited to heuristic-based rules, which often led to inaccurate results and limited the scope of its use cases. Modern computing has made tasks like text summarization, language translation, chatbots, spelling correction, text auto-completion and image captioning possible.
Usage of Natural Language Processing can be seen all around us. Two very commonly occurring use cases are the spelling correction and auto-completion built into search engines and messaging apps, and voice assistants that understand and respond to spoken language.
Some of the commonly performed NLP tasks include tokenization, stopword removal, stemming, lemmatization, part-of-speech (POS) tagging and Named Entity Recognition (NER), all of which are covered later in this article.
Two of the most used NLP libraries, both demonstrated below, are NLTK and spaCy.
When performing feature engineering on text data, we can extract features such as Bag of Words (BOW) counts, TF-IDF scores, and n-gram or collocation statistics.
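To make the first of these concrete, here is a minimal Bag of Words sketch using scikit-learn's CountVectorizer; the two toy documents are invented for illustration:
from sklearn.feature_extraction.text import CountVectorizer
# Toy corpus, invented purely for illustration
docs = ["natural language processing is fun",
        "language models process natural text"]
# Bag of Words: one column per vocabulary term, one count per document
bow = CountVectorizer()
X = bow.fit_transform(docs)
print(bow.get_feature_names_out())
print(X.toarray())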
Collocations are phrases or expressions containing multiple words that are highly likely to co-occur, such as 'ice cream', 'machine learning' or 'natural language processing'. Collocations are different from plain bi-grams or tri-grams: any pair of adjacent words forms a bi-gram, but bi-grams do not always form meaningful phrases. The Pointwise Mutual Information (PMI) score is used for identifying collocations in text. It is calculated as below.
PMI(a, b) = log [ p(a, b) / (p(a) * p(b)) ]
In the above formula, p(a,b) is the probability that the two tokens 'a' and 'b' occur together in a piece of text, while p(a) and p(b) are the probabilities that tokens 'a' and 'b' occur individually in the same text. We can then choose a PMI threshold above which word pairs are treated as collocations and filtered from the document.
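As a sketch of how this can be done in practice, NLTK's collocations module can score bi-grams by PMI; the sample sentence here is an arbitrary choice for illustration:
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder
tokens = ("machine learning and natural language processing "
          "rely on machine learning models").split()
# Score every adjacent word pair by its PMI
finder = BigramCollocationFinder.from_words(tokens)
scored = finder.score_ngrams(BigramAssocMeasures().pmi)
# Inspect the scores, or keep only pairs above a chosen threshold
for pair, score in scored:
    print(pair, round(score, 2))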
Phonetic Hashing is a lexical processing technique used to reduce different pronunciations (and hence spellings) of the same word to a common base form. A common example of this problem occurs with the capital of India, New Delhi: 'Delhi' is also pronounced as 'Dilli', so it is not surprising to find both variants in an uncleaned text corpus.
Phonetic Hashing buckets words with a similar sound or pronunciation (i.e., made up of similar phonemes) into a single bucket and gives all these variations a single hash code, so the words 'Dilli' and 'Delhi' receive the same code. It is performed using the Soundex algorithm: keep the first letter of the word; map the remaining consonants to digits (b, f, p, v → 1; c, g, j, k, q, s, x, z → 2; d, t → 3; l → 4; m, n → 5; r → 6); drop vowels; collapse runs of identical adjacent digits; and pad or truncate the result to one letter plus three digits. Let's compute the Soundex hash code of the word "Mississippi": we keep 'M', the two runs of 's' map to 2 and 2 (separated by a vowel, so both are kept) and the run of 'p' maps to 1, giving M221.
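Below is a minimal sketch of Soundex in Python; it is a simplified variant that treats 'h', 'w' and 'y' like vowels, which is sufficient for these examples:
def soundex(word):
    # Soundex digit for each consonant; unmapped letters (vowels, h, w, y) separate runs
    mapping = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            mapping[ch] = digit
    word = word.lower()
    code = word[0].upper()
    prev = mapping.get(word[0], "")
    for ch in word[1:]:
        digit = mapping.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets the run, so repeated sounds after it count again
    return (code + "000")[:4]  # pad or truncate to one letter plus three digits

print(soundex("Delhi"), soundex("Dilli"))  # D400 D400 - same hash code
print(soundex("Mississippi"))              # M221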
The Zero Probability Problem occurs when an instance in the test dataset contains a category value that was absent from the training dataset. In such a case the conditional probability P(x|Ci) becomes 0, which in turn makes the overall probability estimate P(Ci|x) equal to zero, and we are unable to estimate any probability for the classes. A commonly used technique to overcome this problem is Laplace Smoothing, in which we add a small number such as 1 to each count in our dataset.
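Here is a toy sketch of Laplace (add-one) smoothing for the word-given-class probabilities of a Naive Bayes text classifier; the class names and counts are invented for illustration:
from collections import Counter
# Invented word counts per class
class_word_counts = {"spam": Counter({"offer": 3, "free": 5}),
                     "ham":  Counter({"meeting": 4, "offer": 1})}
vocabulary = {"offer", "free", "meeting", "prize"}  # "prize" never seen in training

def smoothed_prob(word, cls, alpha=1):
    counts = class_word_counts[cls]
    total = sum(counts.values())
    # Add alpha to every count so unseen words get a small non-zero probability
    return (counts[word] + alpha) / (total + alpha * len(vocabulary))

print(smoothed_prob("prize", "spam"))  # non-zero despite a zero training count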
Linear Discriminant Analysis (LDA) is a dimensionality reduction technique primarily used for supervised classification problems. It creates a linear combination of features such that the new features separate the two or more classes in the original data. The two criteria used by LDA are maximizing the distance between the means of the classes (the between-class variance) and minimizing the spread within each class (the within-class variance). In other words, it tries to find a lower dimensional space in which the ratio of between-class variance to within-class variance is maximized.
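As a quick sketch, scikit-learn's LinearDiscriminantAnalysis implements this; the Iris dataset is used here purely as a convenient example:
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes
# Supervised: fit_transform uses the class labels y, unlike PCA
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)  # (150, 2) - LDA yields at most (n_classes - 1) components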
There are 4 main types of RNN architectures, classified by how input sequences map to output sequences: one-to-one, one-to-many (e.g. image captioning), many-to-one (e.g. sentiment classification) and many-to-many (e.g. machine translation).
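The sketch below, using Keras as one possible framework choice, shows how a many-to-one layer differs from a many-to-many layer on a random input sequence:
import tensorflow as tf
# One random batch: 1 sequence, 10 timesteps, 8 features per timestep
x = tf.random.normal((1, 10, 8))

many_to_one = tf.keras.layers.SimpleRNN(16)                          # final state only
many_to_many = tf.keras.layers.SimpleRNN(16, return_sequences=True)  # one output per step

print(many_to_one(x).shape)   # (1, 16)
print(many_to_many(x).shape)  # (1, 10, 16)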
Named Entity Recognition (NER) is a subtask of Information Extraction. The term 'Named Entity' refers to a real-world object such as a person, location, organization or monetary value; Narendra Modi, India and KnowledgeHut are all examples of Named Entities. NER is the process of extracting these entities from documents and classifying them into pre-defined categories like person, location, quantity and organization. For example, in the sentence "Narendra Modi is the Prime Minister of India", NER would identify 'Narendra Modi' as a Person and 'India' as a Location.
spaCy is an open-source Python library which can be used to perform NER. Below is a code snippet which demonstrates how we can perform NER using spaCy.
import spacy
# Load the spaCy model to be used in the program
ner_model = spacy.load("en_core_web_sm")
# Sample text to run NER on
text = "Narendra Modi is the Prime Minister of India"
text_doc = ner_model(text)
# Iterate through each entity in the document and print the NER label
for entity in text_doc.ents:
    print(entity.text, entity.label_)
The output of the above code would look like this:
Narendra Modi PERSON
India GPE
NLTK, or the "Natural Language Toolkit", is an open-source Python library. We can perform a host of NLP tasks using this library, such as:
Tokenization
# Importing the dependencies
import nltk
from nltk.tokenize import word_tokenize
nltk.download("punkt")  # tokenizer models (first run only)
# Sample text to tokenize
text = "This is a sample text which is to be tokenized"
# Print the list of tokens generated
print(word_tokenize(text))
Stopword Removal
import nltk
from nltk.corpus import stopwords
nltk.download("stopwords")  # stopword lists (first run only)
# Print the stopwords available in the package
print(stopwords.words("english"))
Stemming
from nltk.stem.porter import PorterStemmer
from nltk.tokenize import word_tokenize
text = "This is a sample text which is to be stemmed"
text_tokens = word_tokenize(text)
# Reduce words to their stems
stemmed = [PorterStemmer().stem(w) for w in text_tokens]
print(stemmed)
Lemmatization
import nltk
from nltk.stem.wordnet import WordNetLemmatizer
from nltk.tokenize import word_tokenize
nltk.download("wordnet")  # lemmatizer data (first run only)
text = "This is a sample text which is to be lemmatized"
text_tokens = word_tokenize(text)
# Reduce words to their root form
lemmed = [WordNetLemmatizer().lemmatize(w) for w in text_tokens]
print(lemmed)
POS Tagging
import nltk
from nltk.tokenize import word_tokenize
text = "This sentence is used for performing POS Tagging using NLTK"
text_tokens = word_tokenize(text)
# pos_tag expects the full list of tokens, not one token at a time
print(nltk.pos_tag(text_tokens))
Just like NLTK, we can also use spaCy to perform a host of NLP-related tasks:
Tokenization
import spacy
nlp = spacy.load('en_core_web_sm')
# Create an nlp object
doc = nlp("This is a sample sentence which is to be tokenized.")
# Print the list of tokens generated
print([token for token in doc])
Stopword Identification
import spacy
nlp = spacy.load('en_core_web_sm')
# Create an nlp object
doc = nlp("This is a sample sentence which is to be tokenized.")
# Iterate over the tokens
for token in doc:
    # Print the token and whether it is a stopword
    print(token.text, token.is_stop)
Lemmatization
import spacy
nlp = spacy.load('en_core_web_sm')
# Create an nlp object
doc = nlp("This is a sample sentence which is to be tokenized.")
# Iterate over the tokens
for token in doc:
    # Print the token and its lemma
    print(token.text, token.lemma_)
Named Entity Recognition
import spacy
# Load the spaCy model to be used in the program
ner_model = spacy.load("en_core_web_sm")
# Sample text to run NER on
text = "Narendra Modi is the Prime Minister of India"
text_doc = ner_model(text)
# Iterate through each entity in the document and print the NER label
for entity in text_doc.ents:
    print(entity.text, entity.label_)
Principal Component Analysis (PCA) is a dimensionality reduction or feature extraction technique. It is a statistical process that converts observations containing correlated features into a set of orthogonal, uncorrelated features called "Principal Components" (PCs). If the original dataset contains 'n' features, then PCA will create 'n' Principal Components. Consider the example given below:
(Figure 1: the original features X1 and X2; Figure 2: the principal components Z1 and Z2.)
In the above example, Figure 1 shows the two features X1 and X2 in the original dataset. PCA tries to find directions that capture as much variance as possible from the original data; once the algorithm is run, it yields the two Principal Components Z1 and Z2 shown in Figure 2. These Principal Components have the following properties: each is a linear combination of the original features; they are orthogonal to one another, so the new features are uncorrelated; and they are ordered by the variance they capture, with Z1 capturing the maximum variance in the data and Z2 the next highest.
In NLP, feature extraction techniques like BOW and TF-IDF result in high-dimensional datasets. PCA can help us identify the Principal Components that carry the maximum amount of information; we can then project the original data onto these components to obtain a lower-dimensional dataset.
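As a sketch of this workflow on an invented three-document corpus (note that for large sparse TF-IDF matrices, a truncated SVD is the more common choice, since PCA requires a dense array):
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
# Toy corpus, invented purely for illustration
docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "dogs and cats are pets"]
# TF-IDF produces one feature per vocabulary term - high-dimensional for real corpora
tfidf = TfidfVectorizer().fit_transform(docs)
# Project onto the top 2 Principal Components
pca = PCA(n_components=2)
reduced = pca.fit_transform(tfidf.toarray())
print(reduced.shape)                  # (3, 2)
print(pca.explained_variance_ratio_)  # variance captured by each component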
Sequence data are data points in which the observations are ordered in a meaningful manner, such as time series data, where the observations are ordered by time. An audio clip is another example of sequence data, in which the words occur in the order in which they are spoken.
Sequence Models are machine learning models whose inputs or outputs are sequential data; Recurrent Neural Networks (RNNs) are a popular example. A couple of use cases of these models are speech recognition, where an audio clip is mapped to its transcript, and machine translation, where a sentence in one language is mapped to its equivalent in another.
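To make this concrete, here is a minimal many-to-one sequence model sketch in Keras; the vocabulary size and layer sizes are arbitrary illustration values:
import numpy as np
import tensorflow as tf
# Hypothetical many-to-one setup: a sequence of token ids -> one sentiment score
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=10000, output_dim=32),  # token ids -> vectors
    tf.keras.layers.SimpleRNN(64),                # reads the sequence, keeps last state
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# Run a dummy batch of 2 sequences of 10 random token ids through the model
dummy = np.random.randint(0, 10000, size=(2, 10))
print(model(dummy).shape)  # (2, 1) - one score per input sequence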