
Removing Stop Words with NLTK in Python

Updated on Aug 26, 2025

Data pre-processing is the step in which the sentences or words received as input from the user are cleaned and prepared before analysis. One of the most important parts of pre-processing is removing data that is useless or incomplete.

When working on Natural Language Processing problems, it is important that the pipeline does not spend effort processing words such as 'the', 'is', and 'there'. These words are known as stop words. If stop words are not ignored or removed, they take up additional space in the database or memory and reduce the efficiency of the code to a great extent.

The NLTK package has a separate corpus of stop words that can be downloaded. NLTK provides stop word lists for 16 languages. Once the corpus is downloaded, the list for a language can be loaded and used to tell your code which words to ignore.

Before getting into the Python code, let us look at a few comparisons of statements with stop words and the same statements without them.

Before removing stop words                          | After removing stop words
Hello my name is Bob. I am the king of my universe  | Hello name Bob. king universe
Can you fetch water?                                | Fetch water

Downloading stop words of the English language

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')  # fetch the stop word corpus (only needed once)
print(set(stopwords.words('english')))

Output:

{'a', 'about', 'above', 'after', 'again', 'against', 'ain', 'all', 'am', 'an', 'and', 'any',
 'are', 'aren', "aren't", 'as', 'at', 'be', 'because', 'been', 'before', 'being', 'below',
 'between', 'both', 'but', 'by', 'can', 'couldn', "couldn't", 'd', 'did', 'didn', "didn't",
 'do', 'does', 'doesn', "doesn't", 'doing', 'don', "don't", 'down', 'during', 'each', 'few',
 'for', 'from', 'further', 'had', 'hadn', "hadn't", 'has', 'hasn', "hasn't", 'have', 'haven',
 "haven't", 'having', 'he', 'her', 'here', 'hers', 'herself', 'him', 'himself', 'his', 'how',
 'i', 'if', 'in', 'into', 'is', 'isn', "isn't", 'it', "it's", 'its', 'itself', 'just', 'll',
 'm', 'ma', 'me', 'mightn', "mightn't", 'more', 'most', 'mustn', "mustn't", 'my', 'myself',
 'needn', "needn't", 'no', 'nor', 'not', 'now', 'o', 'of', 'off', 'on', 'once', 'only', 'or',
 'other', 'our', 'ours', 'ourselves', 'out', 'over', 'own', 're', 's', 'same', 'shan',
 "shan't", 'she', "she's", 'should', "should've", 'shouldn', "shouldn't", 'so', 'some',
 'such', 't', 'than', 'that', "that'll", 'the', 'their', 'theirs', 'them', 'themselves',
 'then', 'there', 'these', 'they', 'this', 'those', 'through', 'to', 'too', 'under', 'until',
 'up', 've', 'very', 'was', 'wasn', "wasn't", 'we', 'were', 'weren', "weren't", 'what',
 'when', 'where', 'which', 'while', 'who', 'whom', 'why', 'will', 'with', 'won', "won't",
 'wouldn', "wouldn't", 'y', 'you', "you'd", "you'll", "you're", "you've", 'your', 'yours',
 'yourself', 'yourselves'}

Explanation: The 'nltk' package was imported. The 'nltk' package has a folder named 'corpus' which contains stop words of different languages. Here we specifically loaded the stop words for the English language.
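Because the corpus holds stop word lists for several languages, you can also check which languages are available and load a list other than English. A minimal sketch (the choice of Spanish here is only for illustration):

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')

# List every language for which NLTK ships a stop word file
print(stopwords.fileids())

# Load the stop words of another language, e.g. Spanish
print(set(stopwords.words('spanish')))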

Now let us pass a string as input and remove the stop words from it:

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')   # stop word lists
nltk.download('punkt')       # tokenizer models used by word_tokenize

example = "Hello there, my name is Bob. I will tell you about Sam so that you know them properly. Sam is a hardworking person with a zealous heart. He is enthusiastic about sports as well as music. He composes his own music with the help of Apu. Apu loves and appreciates Sam's music"

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example)

# One-line version: keep every token that is not a stop word
filtered_sentence = [w for w in word_tokens if w not in stop_words]

# Equivalent loop version
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print("\n")
print(filtered_sentence)

Output:

['Hello', 'there', ',', 'my', 'name', 'is', 'Bob', '.', 'I', 'will', 'tell', 'you', 'about', 'Sam', 'so', 'that', 'you', 'know', 'them', 'properly', '.', 'Sam', 'is', 'a', 'hardworking', 'person', 'with', 'a', 'zealous', 'heart', '.', 'He', 'is', 'enthusiastic', 'about', 'sports', 'as', 'well', 'as', 'music', '.', 'He', 'composes', 'his', 'own', 'music', 'with', 'the', 'help', 'of', 'Apu', '.', 'Apu', 'loves', 'and', 'appreciates', 'Sam', "'s", 'music']
['Hello', ',', 'name', 'Bob', '.', 'I', 'tell', 'Sam', 'know', 'properly', '.', 'Sam', 'hardworking', 'person', 'zealous', 'heart', '.', 'He', 'enthusiastic', 'sports', 'well', 'music', '.', 'He', 'composes', 'music', 'help', 'Apu', '.', 'Apu', 'loves', 'appreciates', 'Sam', "'s", 'music']
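Notice that capitalized tokens such as 'I' and 'He' survive, because the stop word list is all lowercase, and that punctuation tokens like ',' and '.' are kept as well. An optional refinement, sketched here under the assumption that the stop_words and word_tokens variables from the snippet above are still in scope, handles both by lowercasing tokens before the membership test and dropping bare punctuation:

import string

cleaned_sentence = [
    w for w in word_tokens
    if w.lower() not in stop_words      # compare case-insensitively
    and w not in string.punctuation     # drop single punctuation tokens
]
print(cleaned_sentence)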

In addition to this, domain-specific stop words can also be removed by explicitly telling the code which words to drop. Below is a demonstration.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

example = "Hello there, my name is Bob. I will tell you about Sam so that you know them properly. Sam is a hardworking person with a zealous heart. He is enthusiastic about sports as well as music. He composes his own music with the help of Apu. Apu loves and appreciates Sam's music"

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example)

# Remove the standard English stop words first
filtered_sentence = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence.append(w)

print(word_tokens)
print("\n")

# Domain-specific stop words: names we also want to drop
more_stop_words = ['Bob', 'Sam', 'Apu']
for w in word_tokens:
    if w in more_stop_words:
        filtered_sentence.remove(w)

print(filtered_sentence)

Output:

['Hello', 'there', ',', 'my', 'name', 'is', 'Bob', '.', 'I', 'will', 'tell', 'you', 'about', 'Sam', 'so', 'that', 'you', 'know', 'them', 'properly', '.', 'Sam', 'is', 'a', 'hardworking', 'person', 'with', 'a', 'zealous', 'heart', '.', 'He', 'is', 'enthusiastic', 'about', 'sports', 'as', 'well', 'as', 'music', '.', 'He', 'composes', 'his', 'own', 'music', 'with', 'the', 'help', 'of', 'Apu', '.', 'Apu', 'loves', 'and', 'appreciates', 'Sam', "'s", 'music']
['Hello', ',', 'name', '.', 'I', 'tell', 'know', 'properly', '.', 'hardworking', 'person', 'zealous', 'heart', '.', 'He', 'enthusiastic', 'sports', 'well', 'music', '.', 'He', 'composes', 'music', 'help', '.', 'loves', 'appreciates', "'s", 'music']

Explanation: We provided a few sentences as input and wanted to remove certain names that we treated as stop words. These words were passed explicitly as a list and were removed from the filtered output using the list's remove() method.
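An alternative to calling remove() repeatedly is to merge the extra words into the stop word set up front and filter in a single pass. A minimal sketch, assuming the stop_words and word_tokens variables from the example above are still in scope:

# Extend the standard stop word set with the domain-specific names
custom_stop_words = stop_words.union({'Bob', 'Sam', 'Apu'})

# Filter once against the combined set
filtered_sentence = [w for w in word_tokens if w not in custom_stop_words]
print(filtered_sentence)

This avoids mutating the filtered list while looping and makes the intent of the extra stop words explicit in one place.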

Conclusion

In this post, we saw how to remove stop words with the help of the NLTK package in Python.
