Suman is a Data Scientist working for a Fortune Top 5 company. His expertise lies in the field of Machine Learning, Time Series & NLP. He has built scalable solutions for retail & manufacturing organisations.
Data Structures and Algorithms is one of the most important fields of study in computer science. Any app or software we use has thousands of lines of code written by leveraging several Data Structures and Algorithms. Like most programming languages, Python provides the flexibility to use both simple and complex data structures and algorithms. Refer to the Data Science Bootcamp Curriculum to get a deeper understanding of Data Structures and Algorithms using Python.
Python is one of the most used programming languages in Data Science. Its versatility makes it usable for various types of projects. It is simple, flexible, and provides a rich set of libraries and packages to perform various data operations.
In terms of Data Science, two aspects of Python are commonly used in practice: Data Structures and Algorithms.
Data Structures allow efficient storage and retrieval of data, which is highly desirable in an enterprise setting.
The significance of Data Structures in Python cannot be overstated. The Data Science course explains various built-in data structures and their functionalities. Python's built-in data structures are used in day-to-day programming. Two terms recur in the descriptions below: a mutable object can be changed after it is created, while an immutable object cannot. The built-in data structures are:
1. Tuple: Tuples are immutable data structures in Python. You cannot add, delete, or modify the items of a tuple once it has been defined.
2. List: A list is a sequential data structure that is mutable: items can be added, removed, or changed in place.
3. Dictionary: A dictionary stores data as key-value pairs. Keys must be hashable (in practice, immutable types such as strings, numbers, or tuples), while the values can be of any type and can be modified.
4. Set: Often you need to pass only unique data to your program. A set is an unordered collection of unique items stored in a single variable; duplicates are discarded automatically.
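The four built-in types described above can be tried out in a few lines. This is a minimal sketch; the variable names (`point`, `numbers`, `stock`, `unique_ids`) are made up for illustration.

```python
# Tuple: immutable, so assigning to an item raises TypeError
point = (3, 4)
try:
    point[0] = 10
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

# List: mutable sequence, items can be appended and overwritten
numbers = [1, 2, 3]
numbers.append(4)
numbers[0] = 10

# Dictionary: key-value pairs, values can be updated and new keys added
stock = {"apples": 5}
stock["apples"] = 7
stock["pears"] = 2

# Set: duplicates are dropped, only unique items remain
unique_ids = set([1, 2, 2, 3, 3, 3])
```

After running this, `numbers` holds four items with the first one replaced, and `unique_ids` holds only three items even though six were passed in.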
Apart from the built-in Data Structures we mentioned, several user-defined data structures are used extensively while building large-scale systems in various companies.
1. Arrays: You can use the array data structure to store elements in contiguous memory locations. An array stores elements of the same data type together, so adding an offset to the base address gives the position of an element. Each element is accessed by its index, which makes retrieval very fast when the index is known.
2. Stack: A linear data structure that follows the Last In First Out (LIFO) principle. A stack can be pictured as a pile of dishes stacked on top of one another: the dish placed last is the first one taken off.
3. Queues: Similar to a ticket queue outside any movie theater or railway station, the Python Queue data structure follows the First In First Out (FIFO) principle. The item inserted first is the one removed first. Inserting an element into a queue is termed enqueue, whereas removing one is referred to as dequeue.
4. Trees: A non-linear Data Structure consisting of nodes connected via edges. Notable terms for a tree data structure include node, edge, root, depth of a node, height of a node, and height of a tree.
class BinaryTreeNode:
    def __init__(self, data):
        self.data = data          # value stored in this node
        self.leftChild = None     # left child (None if absent)
        self.rightChild = None    # right child (None if absent)
5. Linked Lists: A linear data structure where a series of nodes are connected via addresses (references). Each node has data and a reference that points to the next node. The first node is generally termed the Head, and the last node generally points to NULL (None in Python). The types of Linked Lists are Singly, Doubly, and Circular.
class Node:
    def __init__(self, data, next=None):
        self.data = data    # value stored in this node
        self.next = next    # reference to the next node (None at the tail)
6. Graphs: In a Graph Data Structure, nodes hold data and are connected to one another by edges. E.g., on Facebook, everything is connected via a Graph Data Structure. Whenever you join a new group or like a page, a new edge is created to store that relationship within the graph. Some of the key terms of this data structure are Adjacency, Path, and Directed Graph.
def generate_edges(graph):
    # graph is a dict mapping each node to a list of its neighbours
    edges = []
    # for each node in graph
    for node in graph:
        # for each neighbour node of a single node
        for neighbour in graph[node]:
            # if edge exists then append
            edges.append((node, neighbour))
    return edges
7. Hash Maps: Like a Dictionary, a hash map stores elements in the form of key-value pairs; Python's built-in dict is itself a hash map. Typical applications include average constant-time data search, and hashing more generally also appears in cryptographic applications.
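The stack and queue behaviour described in the list above can be sketched with Python's built-in list and `collections.deque`. This is one possible approach, not the only one; the item names are placeholders.

```python
from collections import deque

# Stack (LIFO): append pushes onto the top, pop removes from the top
stack = []
stack.append("plate 1")
stack.append("plate 2")
stack.append("plate 3")
top = stack.pop()            # the last plate added comes off first

# Queue (FIFO): append enqueues at the back, popleft dequeues from the front
queue = deque()
queue.append("customer 1")
queue.append("customer 2")
queue.append("customer 3")
first = queue.popleft()      # the first customer in line is served first
```

A `deque` is used for the queue because removing from the front of a plain list shifts every remaining element, while `popleft()` on a deque does not.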
When working on a business problem, you need to devise a set of instructions or rules that could solve that problem. These sets of rules can be regarded as Algorithms. They are executed as a finite sequence of steps. An algorithm could solve problems ranging from simple sorting to recommending certain products in e-commerce.
Certain steps need to be followed while developing any algorithm.
While developing an algorithm, the end goal or the objective function should be clear. There are certain assumptions that validate a good algorithm.
To know more about the importance of Data Structures and algorithms, refer to KnowledgeHut’s best Data Science Bootcamps.
The algorithms could be divided into several classes.
Sorting is a technique often used in day-to-day work. Any Data Structures and Algorithms course teaches different sorting algorithms. Some of the most commonly used sorting algorithms in Python are:
1. Selection Sort: The algorithm repeatedly finds the minimum element in the unsorted part of the array and swaps it with the first unsorted element. Its time complexity is O(n^2) in the worst, average, and best cases. Selection sort can be used when the list of elements is small.
2. Bubble Sort: In this sorting algorithm, adjacent elements are compared and swapped repeatedly until all elements are in their intended order. The best-case time complexity is O(n) (for already sorted input, when the algorithm stops after a pass with no swaps), while both the average and worst cases are O(n^2). It is the simplest of the sorting algorithms.
3. Insertion Sort: Here the first element is assumed to be sorted, and each subsequent element is compared against the sorted part and inserted at its intended place. The time complexity of insertion sort is the same as that of bubble sort. It is mostly used when the number of elements is small.
4. Merge Sort: It is based on the principle of Divide and Conquer: the list is split into halves, each half is sorted recursively, and the sorted halves are merged. The time complexity of merge sort is O(n*logn) in all three cases.
5. Quick Sort: It is also based on the divide and conquer principle. A pivot element is selected first, and the elements smaller than the pivot are placed on its left while the larger ones are placed on its right. The left and right sub-arrays then follow the same approach until each has a single element. The worst-case time complexity is O(n^2), whereas the best and average cases are O(n*logn).
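6. Shell Sort: Elements that are far apart from each other are sorted first, and the interval (gap) between the compared elements is then progressively reduced. Its time complexity depends on the gap sequence: with Shell's original sequence the worst case is O(n^2) and the best case is O(n*logn).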
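To make two of the algorithms above concrete, here is a minimal sketch of bubble sort (with the early exit that gives the O(n) best case) and merge sort. Both functions return a sorted copy rather than sorting in place, which is a choice made here for simplicity; the function names are illustrative.

```python
def bubble_sort(items):
    """Return a sorted copy of items using bubble sort with an early exit."""
    result = list(items)
    n = len(result)
    for i in range(n - 1):
        swapped = False
        # After pass i, the last i elements are already in their final place
        for j in range(n - 1 - i):
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]
                swapped = True
        # No swaps in a full pass means the list is sorted
        if not swapped:
            break
    return result


def merge_sort(items):
    """Return a sorted copy of items using merge sort (divide and conquer)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    left = merge_sort(items[:mid])     # sort the left half
    right = merge_sort(items[mid:])    # sort the right half
    # Merge the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

In everyday code, Python's built-in `sorted()` and `list.sort()` are the usual choice; hand-written versions like these are mainly useful for learning how the algorithms work.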
Data Structures and Algorithms go together. Algorithms are generally created to retrieve, search, and sort elements stored in a Data Structure. While choosing the best algorithm, it is important to understand the software or system being built and its complexities.
If a project requires low latency, it is recommended to choose an algorithm with lower time complexity. On the other hand, if memory is a constraint, you need to understand which data structure keeps space complexity low. Thus, the option that best balances these trade-offs should be chosen when solving problems with algorithms and data structures in Python.
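One small illustration of this trade-off, offered as a sketch rather than a benchmark, is a membership check on a list versus a set. The list is scanned element by element, while the set uses hashing for average constant-time lookups at the cost of extra memory. The sizes reported by `sys.getsizeof` cover only the container itself and vary between Python versions.

```python
import sys

items = list(range(1000))

as_list = items          # membership test scans elements one by one: O(n)
as_set = set(items)      # membership test uses hashing: O(1) on average

found_in_list = 999 in as_list
found_in_set = 999 in as_set

# Size of each container object in bytes (excludes the stored integers)
list_bytes = sys.getsizeof(as_list)
set_bytes = sys.getsizeof(as_set)
```

Both checks give the same answer; on CPython the set typically reports a larger size than the list, which is the space paid for faster lookups.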
In this blog, we discussed various Data Structures and Algorithms in Python. Since building large-scale enterprise systems requires making the best use of DSA, it is important to learn Data Structures and Algorithms.
Python is an intuitive language to learn, and its functionalities make it easier to master DSA. This blog has provided an overview of some of the common data structures and algorithms in Python.
Python has 4 built-in Data Structures: list, tuple, set, and dictionary. There are other user-defined Data Structures as well.
The best way to learn data structures and algorithms is to pick problem statements across various DSA topics and practice them using Python. You can also find many tutorials on data structures in Python online.
No single Data Structure is the most important; usefulness depends on the problem statement. However, arrays, strings, stacks, queues, heaps, and trees are some of the most important ones.
Syntax is the arrangement of numbers, symbols, and English words used to write instructions for solving a problem.