HomeBlogProgrammingHashing in Data Structure: Types, Functions & Examples

Hashing in Data Structure: Types, Functions & Examples

Published
18th Jun, 2024
Views
view count loader
Read it in
14 Mins
In this article
    Hashing in Data Structure: Types, Functions & Examples

    Linear search involves scanning through an entire array to find a specific value. This can be time-consuming, especially for large datasets. On the other hand, Hashing in Data Structure offers a more efficient solution.  

    Hashing uses a hash function to convert each element in the data structure (often an array) into a unique fixed-length value called a hash code. This hash code then acts as an index for storing the element. When searching for a specific value, the hash function is again used to generate the corresponding hash code, which is then compared to the stored hash codes. This allows for quick retrieval of the desired element, significantly improving search times, making Hashing a valuable tool for various data structures and applications. 

    If you want to learn more about Hashing, you can consider enrolling in KnowledgeHut’s comprehensive skill-oriented online course on Python Programming and get hands on with the nitty gritty of data science. 

    What is Hashing in Data Structure?

    Hashing in data structure pertains to a method of breaking up a huge amount of data into smaller tables. Also known as the message digest function, it is a method for distinguishing one distinct object from a group of related ones. By condensing the original input strings and data assets to short alphanumeric hash keys, developers and programmers can reduce both time and file space. 

    Hashing assists in focusing a search for a particular item on a data map. Hash codes create an index to hold values in this scenario. Therefore, hashing is employed to index and retrieve data from a database since it speeds up the process; finding an item using a smaller hashed key than just its original value is much simpler. is employed in this situation to index and retrieve data from a database since it speeds up the process; it is much simpler to find an item using a smaller hashed key than just its original value. 

    Hash tables in data structures are used to store the data in an array format. There is a unique index number for each value within the array. Hash tables create these distinct index numbers for each value stored in an array format using a method that they use, called the hash technique. 

    Some real-life hashing example in data structure include - 

    • In Libraries: A library contains an endless supply of books. The librarian assigns each book a unique number. This distinctive number aids in locating the exact position of the books on the bookshelf. 
    • In Schools: Each student in a class is assigned a unique roll number for easy identification. The school authority, later on, uses the unique roll number to retrieve relevant information about the particular student. 
    • In diagnostic centers and laboratories - Medical laboratories use unique serial numbers to identify and distinguish patient samples and patient information. 
    • Security Purposes: One must take precautions to ensure that the account does not end up in the wrong hands when they first visit a website that requests authentication through "Sign Up" and where they submit the login information to access the personal accounts. As a result, the database stores the entered password as a hash.

    How Does Hashing in Data Structure Work?

    How Does Hashing in Data Structure Work
    technicalsandWhen it comes to data structures, hashing is a technique used to store and retrieve data in a database. It is fundamental to many data structures, such as hash tables and hash trees. Hashing involves mapping data to a unique value, called a hash code. The hash code is then used to index into an array, where the data is stored. To retrieve the data, the hash code is simply re-computed and used to index into the array.

    Hashing is an efficient way to store and retrieve data in a data structure because it avoids the need for comparisons between elements. It also allows duplicate values to be stored in the same structure without causing collisions. Hashes are typically generated using a hashing algorithm, which takes an input and produces a hash code. There are many different hashing algorithms, but they all share the same basic principle: map data to a value that can be used to index into an array. Some popular hashing algorithms include SHA-1, MD5, and Murmur Hash. Hashing is a fundamental technique that is used in many different applications. It is particularly useful for storing and retrieving data from large collections. is a fundamental technique that is used in many different applications. It is particularly useful for storing and retrieving data from large collections. 

    In a nutshell, hashing works by mapping data to a hash code, which is then stored in a hash table. When data needs to be retrieved, the hash code is used to look up the data in the hash table. Hash tables are often used for quick lookup data, such as in caches and databases. One key advantage of hashing is that it can store data of any type, including integers, strings, and objects. Furthermore, hashing is relatively simple to implement and efficient when done correctly. As a result, hashing data structure is frequently used in computer programming. is frequently used in computer programming. 

    Example

    Let's look at a hashing data structure example to get a better understanding of how it works - 

    For example, let's say we want to map a list of string keys to a list of string values to grasp the concept better. Map the capital cities of countries, for example. Let's say we wish to save the information from the map's Table 1.

    KeyValue
    ItalyRome
    FranceParis
    EnglandLondon
    AustraliaCanberra
    SwitzerlandBerne

    Now, assuming that the hash function is just to determine the length of the string, the hash table will look like this -

    Position (hash = key length)KeyValue
    1

    2

    3

    4

    5ItalyRome
    6FranceParis
    7EnglandLondon
    8

    9AustraliaCanberra
    10

    11SwitzerlandBerne

    As Italy's hash code (length) is 5, we placed Italy in the 5th position in the Keys array and Rome on the fifth index… and so on. 

    Types of Hashing in Data Structure

    There are two primary hashing techniques in a data structure.  

    • Open hashing / separate chaining / closed addressing 
    • Closed hashing / open addressing 

    1. Open Hashing / Separate Chaining

    Separate chaining is the most used collision hashing technique in data structures that uses a lined list. Any two or more components that meet at the same point are chained together to form a single-linked list known as a chain. Every linked list member that hashes is chained to the same position here. Also known as closed addressing, open hashing is used to avoid any hash collisions, using an array of linked lists in order to resolve the collision. 

    Example:

    • In a simple hash function, h(key) = key mod table size. 
    • Let us take a simple hash table with table size = 6 
    • Sequence keys = 50, 600, 74, 113, 14, 99 

    Therefore, 

    0600
    1
    2507414
    399
    4
    5113

    (As collision occurs after inserting keys 74 and 14, they are added to the chain) 

    Advantages of Open hashingDisadvantages
    Easy and basic to implementWastage of space
    As there are a lot of empty spaces in the hash table, we can add more keys to the table.With a long chain, the search time is increased
    Comparatively less sensitive to different load factorsPoor cache performance as compared to closed hashing.
    Mostly used when unsure about the amount and frequency of the keys to be implemented in the hash table.

    2. Closed hashing (Open addressing)

    Open addressing stores all entry records within the array itself, as opposed to linked lists. The phrase 'open addressing' refers to the notion that the hash value of an item does not identify its location or address. In order to insert a new entry, the array is first checked before computing the hash index of the hashed value, starting with the hashed index. If the space at the hashed index is empty, the entry value is inserted there; otherwise, some probing sequences are used until an empty slot is found. 

    The procedure used to navigate through entries is known as the probe sequence. You can vary the time between succeeding entry slots or probes in different probe sequences. 

    In Closed hashing, there are three techniques that are used to resolve the collision:

    A. Linear Probing 

    Linear probing involves systematically checking the hash table from its very beginning. A different site is searched if the one received is already occupied. In linear probing, the interval between the probes is usually fixed (generally, to a value of 1). 

    • The formula for linear probing: index = key % hashTableSize 
    • The hash(n) is the index computed using a hash function, and T is the table size.  
    • If slot index = ( hash(n) % T) is full, then the next slot index is calculated by adding 1 ((hash(n) + 1) % T).  

    The sequence goes as:

    • index = ( hash(n) % T) 
    • (hash(n) + 1) % T 
    • (hash(n) + 2) % T 
    • (hash(n) + 3) % T … and so on. 

    Example: 

    • For a hash table, Table Size = 20 
    • Keys = 3,2,46,6,11,13,53,12,70,90 

    Therefore,

    SL. NOKEYHASHINDEXINDEX  
    (AFTER LINEAR PROBING)
    133%2033
    222%2022
    34646%2066
    466%2067
    51111%201111
    61313%201313
    75353%201314
    81212%201212
    97070%201010
    109090%201011

    B. Quadratic Probing  

    The only distinction between linear and quadratic probing is the space between succeeding probes or entry slots. When a hashed index slot for an entry record is already taken, you must start traversing until you discover an open slot. The spacing between slots is calculated by adding each subsequent value of any arbitrary polynomial in the initial hashed index. 

    The formula for quadratic probing: index = index % hashTableSize 

    • The hash(n) is the index computed using a hash function, and T is the table size.  
    • If slot index = ( hash(n) % T) is full, then the next slot index is calculated by adding 1 (hash(n) + 1 x 1) % T  

    The sequence goes as - 

    • index = ( hash(n) % T) 
    • (hash(n) + 1 x 1) % T 
    • (hash(n) + 2 x 2) % T 
    • (hash(n) + 3 x 3) % T … and so on  

    Example:

    • For a hash table, Table Size = 7 
    • Keys = 22,30,50 

    Thus,

    SL. NOKEYHASHINDEXINDEX  
    (AFTER QUADRATIC PROBING)
    12222%711
    23030%722
    3



    4



    55050%711(1+2 x 2)
    6



    C. Double-Hashing in data structure 

    It is another hash function that determines the intervals between probes. An optimized method of reducing clustering is double hashing. An additional hash function is used to calculate the increments for the probing sequence. 

    The formula for double hashing - (first hash(key) + i * secondHash(key)) % size of the table 

    The sequence goes as follows -  

    • index = hash(x) % S 
    • (hash(x) + 1*hash2(x)) % S 
    • (hash(x) + 2*hash2(x)) % S 
    • (hash(x) + 3*hash2(x)) % S … an so on 

    Example:

    • For a hash table, Table Size = 7 
    • Keys = 27,43,98,72 

    Thus,

    SL. NOKEYHASHINDEXINDEX  
    (AFTER DOUBLE HASHING)
    14343%71
    29292%76[h1(92) + i * (h2(92)] % 7

    = [6 + 1 * (1 + 92 % 5)] % 7

    = 9 % 7

    = 2
    3



    4



    57272%72[h1(72) + i * (h2(72)] % 7

    = [2 + 1 * (1 + 72 % 5)] % 7

    = 5 % 7

    = 5
    62727%76

    Need for Hash Data Structure  

    The ever-expanding volume of data on the internet from different streams presents a significant challenge regarding efficient storage for organizations. Although the data might not seem overwhelmingly large in day-to-day programming, managing it effectively remains crucial. One of the go-to data structures for handling this task has been the Array data structure.  

    But why do we have to look for alternatives when we already have Arrays? The key lies in the pursuit of efficient storage and retrieval. While storing in Array takes a time of O(1), the search operation takes at least O(log n). Although seemingly inconsequential, this time complexity can lead to substantial issues when dealing with extensive datasets, making the Array structure inefficient.

    The hash based data structure is a solution born out of the need for constant time operations. In contrast to Arrays, Hashing allows for data storage and retrieval in O(1) time, providing a more efficient alternative. This becomes particularly crucial when dealing with large datasets where swift access to information is vital. Hashing, with its ability to facilitate constant storage and retrieval, has become a cornerstone in addressing the challenges posed by the exponential growth of internet data today.

    Components of Hashing Data Structure

    Hashing in data structure utilizes a mathematical algorithm to transform information into numerical hash keys. The system enables fast and efficient retrieval of information and has three components:  

    • Key: A Hash Key functions as a compact representation of a larger dataset. It acts as an input parameter for a hash function. The key, which can be either a string or an integer, is input to the hash function to determine an index or storage location within the data structure.
    • Hash Function: A Hash Function is a critical component in the hash system. It is used to reduce a sizable input, such as a large number, to a practical integer value. Within a C++ hash table, this resultant integer operates as an index for efficient data storage and retrieval. The hash function transforms a given key into a designated slot index to make a systematic mapping of every possible key to a unique index.
    • Hash Table: Hashing uses a c hash table to store the Key and value pairs. Employing a hash function, the hash table generates unique indices needed to execute insert, update, and search operations. Conceptually, the data structure hash table functions as a box and adopts an array format for data storage. Each data entity possesses a distinct index value, which optimizes the efficiency of data access when index values are known. This array-based structure streamlines the process, enabling swift access to stored data.

    Hash Function in Data Structure with Example 

    Here are some simple examples of hashing in data structure:  

    • Student Information: In schools and colleges, the administrator assigns each student a distinct roll number. Subsequently, the teacher utilizes this roll number as a reference to access specific information about that student.
    • Password Validation: In user authentication systems, passwords are commonly hashed for security. Instead of actual passwords, systems store their hash values generated by a hash function. A user inputs a password, and the system applies a hash function to generate a hash code, which is stored. During login, the system hashes the entered password and compares it with the stored hash for authentication.  
    • Library Book Codes: Within a library housing an extensive collection of books, each book is designated a unique identifier by the librarian. This distinctive identifier plays a crucial role in pinpointing the precise location of each book on the library shelves.
    • Hashing in Blockchain: In blockchain, hashing ensures data integrity and security. Each block contains a unique hash computed from its content and the previous block's hash. This cryptographic linkage creates an unchangeable chain, as altering any block would necessitate changing subsequent blocks.

    What is a Hash Table?
    Hash table

    The hash table is an associative data structure that stores data as associated keys. An array format is used to store hash table data, where each value is assigned its unique index. By knowing the index of the desired data, we can access it very quickly. 

    As a result, inserting and searching data is very fast, regardless of data size. In a Hash Table, elements are stored in an array, and an index is generated using hashing techniques in a data structure. 

    What is the Hash Function in DAA?

    A key is transformed into a hash key through a fixed process in hash functions. Hash values are length-restricted values that can be derived from a key. Despite the fact that the hash value is usually less than the original, the original sequence of characters is still reflected in the hash value. Following the transfer of the digital signature, the recipient receives both the hash value and the digital signature. A comparison is made between the hash value generated by the receiver and the one received along with the message using the same hash algorithm. Messages are sent without error if their hash values match exactly. 

    A. Characteristics of the hash function in data structure

    • There is no limit to the length of a message 
    • Message digests are generated with a fixed length 
    • A message digest can be computed quickly (and easily) 
    • Message digests cannot be generated from the hash - they are irreversible 
    • Message values change dramatically when small changes are made 
    • The hash is collision-free because two different messages can't result in the same hash value 

    Some important characteristics of the hash function are described below -  

    • Collision free: A hash function in data structure is collision-free because it contains two key features. The function should, first and foremost, transfer equally likely inputs to all feasible outputs. Second, the function must be deterministic, guaranteeing that it will always yield the same results when given the same input. A collision-free hash function ensures that no two input hash maps out the same output hash. 
    • Property to be hidden: A hash function is used to map data of arbitrary size to fixed-size data. One key characteristic of a good hash function is that it should be hard to guess the input value from its output. In other words, it should be difficult to find two different inputs that produce the same output. The output should be evenly distributed across all possible values. This ensures that every possible input will have a unique output and that the outputs will be spread evenly throughout the range of possible values. 
    • Puzzle friendly: A hash function is ought to be suitable for puzzles. The choice of an input that yields a predetermined result ought to be challenging. As a result, it is best to choose the input from a range that is as diverse as feasible.

    B. Types of Hash Functions

    Many hash functions use alphanumeric or numeric keys. The main hash functions cover -  

    • Division Method.  
    • Mid Square Method.  
    • Folding Method.  
    • Multiplication Method.  

    Let's examine these methods in more detail. 

    1. Division Method

    The division method is the simplest and easiest method used to generate a hash value. In this hash function, the value of k is divided by M and uses the remainder as obtained. 

    Formula - h(K) = k mod M 

    (where k = key value and M = the size of the hash table) 

    AdvantagesDisadvantages
    This method works well for any value of MThis may lead to poor performance as consecutive keys are mapped to consecutive hash values in the hash table.
    The division approach is extremely quick because it only calls for one operation.There are situations when choosing the value of M requires particular caution.

     Example: 

    • k = 1320 
    • M = 11 
    • h (1320) = 1320 mod 11 
    • = 0

    2. Mid-Square Method

    The steps involved in computing this hash method include the following - 

    1. Squaring the value of k ( like k*k) 
    2. Extract the hash value from the middle r digits. 

    Formula - h(K) = h(k x k)

     (where k = key value ) 

    AdvantagesDisadvantages
    Since most or all of the key value's digits contribute to the outcome, this strategy performs well. The middle digits of the squared result are produced by a process in which all of the essential digits participate.One of this method's constraints is the size of the key; if the key is large, its square will have twice as many digits.
    The top or bottom digits of the original key value do not predominate in the outcome.Chance of repeated collisions.

     Example:

    Let's take the hash table with 200 memory locations and r = 2, as decided on the size of the mapping in the table.  

    • k = 50 
    • Therefore, 
    • k = k x k 
    • = 50 x 50 
    • = 2500 
    • Thus, 
    • h(50) = 50

    3. Folding Method

    There are two steps in this method:  

    • The key-value k should be divided into a specific number of parts, such as k1, k2, k3,..., kn, each having the very same number of digits aside from the final component, which may have fewer digits than the remaining parts. 
    • Add each component separately. The last carry, if any, is disregarded to determine the hash value. 

    Formula - k = k1, k2, k3, k4, ….., kn 

    s = k1+ k2 + k3 + k4 +….+ kn 

    h(K)= s

    (Where, s = addition of the parts of key k) 

    AdvantagesDisadvantages
    Breaks up the key value into precise equal-sized segments for an easy hash valueSometimes inefficient if there are too many collisions
    Independent of distribution in a hash table

    Example:

    • k = 54321 
    • k1 = 54 ; k2 = 32 ; k3 = 1 
    • Therefore, 
    • s = k1 + k2 + k3 
    • = 54 + 32+ 1 
    • = 87 
    • Thus, 
    • h (k) = 87

    4. Multiplication Method

    Steps to follow:

    • Pick up a constant value A (where 0 < A < 1) 
    • Multiply A with the key value 
    • Take the fractional part of kA 
    • Take the result of the previous step and multiply it by the size of the hash table, M. 

    Formula - h(K) = floor (M (kA mod 1)) 

    (Where, M = size of the hash table, k = key value and A = constant value) 

    AdvantagesDisadvantages
    It may be applied to any number between 0 and 1, albeit some numbers tend to produce better results than others.When the table size is a power of two, the multiplication technique is typically appropriate since multiplication hashing makes it possible to compute the index by key quickly.

    Example: 

    • k = 1234 
    • A = 0.35784 
    • M = 100 
    • So,  
    • h (1234) = floor [ 100(1234 x 0.35784 mod 1)] 
    • = floor [ 100 ( 441.57456 mod 1)] 
    • = floor [100 ( 0. 57456)] 
    • = floor [ 57.456] 
    • = 57 
    • Thus, 
    • h (1234) = 57

    C. Choosing a Good Hash Function  

    • Creating an effective hash function that distributes the added item's index value evenly across the database is important. 
    • Quick and easier to compute according to the requirements. 
    • An approach to successfully resolve collisions in hash tables is essential for generating an index for a key whose hash index corresponds to an existing spot. 

    What is a Hash Collision?

    A hash collision happens when the same hash value is produced for two different input values by a hash algorithm. But it's important to point out that collisions aren't a problem; they're a fundamental aspect of hashing algorithms. 

    Collisions occur because different hashing techniques in data structure convert every input into a fixed-length code, regardless of its length. Since there are an endless number of inputs and a limited number of outputs, the hashing algorithms will eventually produce repeating hashes.

    What is the 'Key' in Hashing?

    Hashing Key is the raw data that has to be hashed in a hash table. The hashing algorithm carries out a function to translate the hash key into the hash value. The outcome of feeding the hash key through the hashing algorithm is what is known as the hash value. 

    Hash Key = Key Value % Number of Slots in the Table 

    The different types of hashing keys are -  

    • Public key: Public key often termed 'asymmetric' key, is a type of key only used for data encryption. The mechanism is relatively slower because the public key is an open key. The common uses of public keys include the functions of cryptography, the transfer of bitcoins, and securing online sessions. However, Public key functions along with a set of private keys, and thus, the overall security is not compromised. 
    • Private key: The private key is employed in both encryption and decryption. Each party that sends or receives sensitive information that has been encrypted shares a key. Due to the fact that both parties share it, the private key is also referred to as "symmetric". A private key is typically a long, impossible-to-guess string of bits generated at random or artificially random. 
    • SSH public key: SSH uses both a public and a private key. SSH is a set of keys that can be used to authenticate and decrypt a communication sent from a distance. Both the distant servers and the stakeholders have access to the public key.

    Conclusion

    In conclusion, I must emphasize that hashing in data structures remains a fundamental and indispensable operation in our domain. It effectively streamlines the storage and retrieval of information through efficient algorithms. Having spent years exploring the intricate world of data structures and algorithm hashing, I can attest that its true power lies in its remarkable ability to transform complex data into easily manageable, fixed-size values. 

    From my own experience, I've witnessed the far-reaching applications of hashing in various fields. Whether it's enhancing password security or ensuring data integrity within blockchain technology, the impact of hashing is diverse and profound. If you're a professional looking to advance in this field, I strongly recommend learning more about hashing in DSA through KnowledgeHut's Full-Stack Web Development course online because it's a crucial skill. 

    Frequently Asked Questions (FAQs)

    1Why is hashing used in data structure?

    On average, hashing offers constant-time search, insert, and remove operations. Since unique elements, counting item frequencies, discovering duplicates, etc., are a few examples of problems that may be solved using hashing, it is one of the most popular data structures. 

    2What is the difference between hashing and hash tables?

    Hash tables are a type of data structure that makes use of hashing. Through the use of a key-value lookup system, hash tables store data in an associative manner. It really means that a hash table maps keys to specific values. This method of data organization makes it possible to locate data quickly and effectively. 

    3Does Python use hashing?

    In Python, the hash method is a method that is employed to retrieve an object's hash value. The hash technique is used in programming to return numerical values that are compared against dictionary keys utilizing a dictionary lookup function.

    Profile

    Ramulu Enugurthi

    Blog Author

    Ramulu Enugurthi, a distinguished computer science expert with an M.Tech from IIT Madras, brings over 15 years of software development excellence. Their versatile career spans gaming, fintech, e-commerce, fashion commerce, mobility, and edtech, showcasing adaptability in multifaceted domains. Proficient in building distributed and microservices architectures, Ramulu is renowned for tackling modern tech challenges innovatively. Beyond technical prowess, he is a mentor, sharing invaluable insights with the next generation of developers. Ramulu's journey of growth, innovation, and unwavering commitment to excellence continues to inspire aspiring technologists.

    Share This Article
    Ready to Master the Skills that Drive Your Career?

    Avail your free 1:1 mentorship session.

    Select
    Your Message (Optional)

    Upcoming Programming Batches & Dates

    NameDateFeeKnow more
    Course advisor icon
    Course Advisor
    Whatsapp/Chat icon