# What Are Data Structures and Algorithms in Python?

2K
• by Abhresh S
• 30th Apr, 2021
• Last updated on 30th Apr, 2021

Data structures are a group of data elements put together under a single name. They represent a particular way of storing and organizing data in a computer to use them efficiently, whenever required. We can characterize data structures based on how we keep individual data and what operations are available for accessing and manipulating the data.

Data structures are of two types:

• Primitive Data Structures
• Non-Primitive Data Structures

### Primitive Data Structures

Primitive data structures are the most basic form of representing data. They contain pure and simple data. The four primitive data types in Python are:

1. Integer
2. Float
3. String
4. Boolean

Integer

Integer data types represents a whole number ranging from negative infinity to infinity – the same way we define integers in mathematics. You can store both positive and negative values, even zero. Here is an example of how to define integers in Python. Python automatically understands the data type defined; even when the data type name is not mentioned. The credit for this goes to the dynamic nature of the Python programming language.

my_int = 3
print(my_int)
print(type(my_int))
OUTPUT:
3
<class 'int'>

Float

The float data type is a floating-point number and is used to define rational numbers. Like int data type, you can define both positive and negative values, even zero. Here is an example of how to define integers in Python.

my_float = 3.6
print(my_float)
print(type(my_float))
OUTPUT:
3.6
<class 'float'>

String

String data type refers to the collection of characters. If you want to use text, use a string. In Python, you can define a String using a single or double quote. Here is an example of how to define integers in Python.

my_string = 'Goodbye, World!'
print(my_string)
print(type(my_string))
OUTPUT:
Goodbye, World!
<class 'str'>

Boolean

Boolean data type refers to truth statements. A variable of Boolean data type can have either of the two values – True or False. Here is an example of how to define integers in Python.

### In-built non-primitive data structures

Non-primitive data does not store a single value but a collection or group of values. The built-in non-primitive data types in Python are:

• List
• Tuples
• Dictionaries
• Sets

List

List is a versatile data type in Python. It is a sequence in which elements are present, separated by a comma.

my_List = [1, 2, 3, 4, 5]
print(my_List)
print(type(my_List))
OUTPUT:
[1, 2, 3, 4, 5]
<class 'list'>

Tuple

Similar to a list, a tuple is another built-in data type but differs in two things. Firstly, tuples are immutable; once you define values in a tuple, you cannot make any changes. Secondly, we use parentheses to define the set of values in a tuple.

my_Tuple = (1, 2, 3, 4, 5)
print(my_Tuple)
print(type(my_Tuple))
OUTPUT:
(1, 2, 3, 4, 5)
<class 'tuple'>

Dictionary

Dictionary is a data structure where we can store a pair of a key and value. Every key-value pair is separated by a colon (:), and consecutive items are stored by a comma.

my_Dictionary = {"Language" : "Python", "Version" : "3.8"}
print(my_Dictionary["Language"])
print(my_Dictionary["Version"])
print(type(my_Dictionary))
OUTPUT:
Python
3.8
<class 'dict'>

Set

A set comprises of unique and unordered elements. To make it simpler, even if the element is present more than once, it will be counted once in the set list. We use a flower or curly braces to define elements in a set.

my_Set = {1, 2, 3, 2, 4, 3, 5}
print(my_Set)
print(type(my_Set))
OUTPUT:
{1, 2, 3, 4, 5}
<class 'set'>

## User-defined data structures in Python

Many data structures are not available in Python. Through user-defined data structures, you can define those data structures and reflect their functionality. We can implement data structures directly in Python using:

A linked list is a linear data structure where elements are not linked explicitly in memory locations but linked with pointers. All linked list creates a series of node or a chain like structure. It is second mode preferred technique after array.

Stack

A stack is a linear data structure that uses the last in first out (LIFO) principle. That means the element inserted last is the first to be taken out. Stack supports push, pop, and peep operations.

Queue

A stack is a data structure that follows the first-in-first-out (FIFO) principle. That means the element inserted first is the first to be taken out. Stack supports insert, delete, and peep operations.

Tree

A tree is a non-linear data structure consisting of roots and nodes. The topmost base node is the root, and the elements present at the end of the tree are leaves.

Graph

A graph is a non-linear data structure consisting of nodes and edges. Nodes are known as vertices, and the lines connecting two nodes are generally called nodes.

HashMap

Hash Maps are indexed data structures and perform the same function as that performed by dictionary in Python. A hash map uses a hash function to compute index-key values into arrays.

## What are Algorithms?

An algorithm is a step-by-step approach followed in a sequential order to solve problems. They are not language-oriented; there is no particular language used to write algorithms.

## How to write an algorithm?

There is no one way to write an algorithm—just as there is no one way to parent a child or roast a turkey. But there are incredible ways to do all three. Simply put, there is no hard and fast rule to write an algorithm. However, the following steps in the sequence are generally preferred by most of the programmers. Here are a few steps you must follow when writing an algorithm:

Step-1: Define the problem
Step-2: Decide where to start
Step-3: Decide where to stop
Step-4: Take care of intermediate steps
Step-5: Review and revise

### Algorithm Classes

Divide and Conquer: This class of algorithm involves dividing the problem into parts and calling the divided parts explicitly using recursive function until we obtain the desirable solution.

Dynamic Programming:  Dynamic Programming or simply DP is an algorithmic approach used for solving a problem by breaking it down into simpler subproblems where the overall problem depends upon the optimal solution to its subproblems.

Greedy Algorithms: As the name suggests, this involves building solutions piece by piece and choosing the most lucrative. Simply put, we choose the easiest-step first without thinking over the complexities of the later steps.

## Elements of a good algorithm?

• Clarity: Clear and easy to understand.
• Well-defined inputs: Must accept a set of defined inputs
• Output/s specified: Must produce outputs
• Finiteness: Must stop after a certain number of steps
• Programming fundamentals oriented: Must not be language-oriented.

## Tree Traversal Algorithms

Tree Traversal involves processing the data of a node exactly once in some order in a tree. Unlike an array or linked list, the tree is a non-linear data structure -- a tree can be transverse in many ways.

Tree Traversal Algorithms are of two types:

• Breadth-First Traversal or Level Order Traversal

As the name suggests, the transverse mode in a level-by-level fashion.

• Depth First Traversal

Pre-order Traversal –> <root><left><right>
In-order Traversal–> <left><root><right>
Post-order Traversal –> <left><right><root>

### Sorting Algorithms:

We use Sorting algorithms to sort data into some given order. The most common sorting algorithms include:

• Bubble Sort
• Selection Sort
• Insertion Sort
• Merge Sort

### Bubble Sort Algorithm:

Bubble sort is a comparison algorithm that first compares and then sorts adjacent elements if they are not in the specified order.

Time Complexity: Ω(n)

Step 1: Starting from the first element indexed at 0 and comparing the next item in the sequence.
Step 2: While comparing, check whether the elements you are comparing are in order or not. If not, start swapping.
Step 3: After each swap, keep moving to the next element.

def bubble_sort(num):
for i in range(len(num)-1, 0, -1):
for j in range(i):
if num[j] > num[j+1]:
temp = num[j]
num[j] = num[j+1]
num[j+1] = temp
num = [10, 6, 16, 12, 14, 4]
bubble_sort(num)
print(num)
OUTPUT:
[4, 6, 10, 12, 14, 16]

Selection Sort

The problem with bubble sort is that we have to swap multiple times, which is strenuous, time-consuming, and memory draining.

The Selection sort algorithm divides the given list into two halves – a sorted list and an unsorted list. The sorted list is empty, and all elements to sort are present in the unsorted list.

Time Complexity: Ω(n^2)

def selection_sort(num):
for i in range(5):
min_position = i
for j in range(i,6):
if num[j] < num[min_position]:
min_position = j
temp = num[i]
num[i] = num[min_position]
num[min_position] = temp
num = [10, 6, 16, 12, 14, 4]
selection_sort(num)
print(num)
OUTPUT:
[4, 6, 10, 12, 14, 16]

Insertion Sort: The insertion_sort() function starts by assuming that the first item is in its proper position. Next, an iteration is performed over the remaining items to insert each element into its right location within the sorted portion of the sequence.

It is not a fast-sorting algorithm because it uses nested loops to sort and is useful for only small data sets.

Time Complexity: Ω(n^2)

def insertion_sort(num):
for i in range(1, len(num)):
for j in range(i-1, -1, -1):
if num[j] > num[j+1]:
temp = num[j]
num[j] = num[j + 1]
num[j + 1] = temp
num = [10, 6, 16, 12, 14, 4]
insertion_sort(num)
print(num)
OUTPUT:
[4, 6, 10, 12, 14, 16]

### Merge Sort Algorithm

The merge sort algorithm uses the divide and conquer approach to sort the elements stored in a mutable sequence. The sequence of values is recursively divided into smaller sub-sequences until each value is present within its sub-sequences. The sub-sequences get merged back together to create a sorted sequence.

Time Complexity: Ω(nlogn)

def merge_sort(my_List, left, right):
if right - left > 1:
middle = (left + right) // 2
merge_sort(my_List, left, middle)
merge_sort(my_List, middle, right)
the_List(my_List, left, middle, right)
def the_List(my_List, left, middle, right):
leftlist = my_List[left:middle]
rightlist = my_List[middle:right]
k = left
i = 0
j = 0
while (left + i < middle and middle + j < right):
if (leftlist[i] <= rightlist[j]):
my_List[k] = leftlist[i]
i += 1
else:
my_List[k] = rightlist[j]
j += 1
k = k + 1
if left + i < middle:
while k < right:
my_List[k] = leftlist[i]
i += 1
k += 1
else:
while k < right:
my_List[k] = rightlist[j]
j += 1
k += 1
my_List = input('Please Enter the Values You Want to Sort: ').split()
my_List = [int(x) for x in my_List]
merge_sort(my_List, 0, len(my_List))
print('Hey! Your Sorted Items Are: ')
print(my_List)
OUTPUT:
Please Enter the Values You Want to Sort: 5 4 5 3 5 6
Hey! Your Sorted Items Are:
[3, 4, 5, 5, 5, 6]

### Searching Algorithms

When there is a need to find an element from a sequence, we use searching algorithms. The two most renowned searching algorithms are:

• Linear Search
• Binary Search

Linear Search

For finding elements within a list, we use a linear search algorithm. It checks each value presents in the sequence until it finds a match.

def search(the_List, n):
i = 0
while i<len(the_List):
if the_List[i] == n:
return True
i +=1
return False
the_List = [10, 20, 30, 40, 50, 60]
n = 40
if search(the_List, n):
print("Element Found: ", n)
else:
print("Oops! Not Found")

Binary Search

Make sure to sort all the elements. The value present at the first position is the lower bound, and the value present at the nth position is the upper bound.

If the value you are searching for is smaller than the mid-value, change the upper bound, and the mid-value becomes the upper bound.

If the value you are searching for is larger than the mid-value, change the lower bound, and the mid-value becomes the lower bound.

Time Complexity: O(logn)

pos = -1
def search(the_List, n):
lb = 0
ub = len(the_List)-1
while lb <= ub:
mid_value = (lb + ub) // 2
if the_List[mid_value] == n:
globals() ['pos'] = mid_value
return True
else:
if the_List[mid_value] < n:
lb = mid_value
else:
up = mid_value
the_List = [10, 20, 30, 40, 50, 60]
n = 40
if search(the_List, n):
print("Element Found at: ", pos+1 )
else:
print("Oop! Not Found")

### Algorithm Analysis

Algorithms help in solving problems in a straightforward and tech-savvy way. Of course, a problem can have many different solutions, but not all are effective. How then are we to decide which solution is the most efficient for that problem? One approach is to measure the execution time. We can implement the solution by constructing a computer program using any preferable programming language of our choice.

Algorithm execution time depends on the amount of data processed. With the increase in data size, the execution time also increases. Second, the execution times vary depending on the type of hardware. With the use of a multi-processor multi-user system, the execution time of the program differs. Finally, the preference of programming language and compiler used to implement an algorithm impacts the execution time. Some compilers are just better at optimizing than others, and some languages produce better-optimized code than others.

Conclusion

Data structures store a collection of values but differ in how they organize and are handled. The choice of a particular data structure depends on the problem at hand. Some data structures work better than others. The process becomes seamless with practice and experience.

### Abhresh S

Author

An Online Technical Trainer by profession! And Content writer by hobby! Interested in sharing quality knowledge to make the Industry grow better towards better success and better tomorrow! With a Guru Mantra of - "Keep Learning & Keep Practicing".

## Role of Unstructured Data in Data Science

5741
Role of Unstructured Data in Data Science

Data has become the new game changer for busines... Read More