Indexing in DBMS: What You Need to Know

Published
05th Sep, 2023
Read it in
15 Mins


     Performance and efficiency are crucial in the area of database management systems (DBMS). The demand for speedy and effective data retrieval increases as databases get bigger and more complicated. Indexing in DBMS is a basic idea that speeds up data retrieval by offering a structured and organized means to access and locate particular data within a database. This blog will focus on indexing in DBMS and its types to help you optimize data retrieval and enhance overall performance.


    What is Indexing in DBMS? 

    In a DBMS, "indexing" refers to the process of building auxiliary data structures (indexes) that speed up data retrieval and query execution. An index maintains a logical mapping between key values and the physical locations of the corresponding records in the database.

    In a database, the data is typically stored in tables, and each table may contain a large number of records. Without an index, the DBMS would have to scan every row of a table sequentially to find a particular piece of information, which would take time and resources, especially when working with huge datasets. Indexing resolves this issue by maintaining a separate structure that makes quick data lookup possible.

    Structure of an Index

    In a DBMS, an index entry is conceptually a pair of two fields, the Search Key and the Data Reference:

    The search key column contains copies of selected columns, such as the primary key or candidate key, from the database table. These selected keys are often stored in a sorted manner to optimize query time by enabling binary search instead of linear search.

    The data reference column contains a set of pointers that store the addresses of disk blocks. Each disk block contains the actual data that corresponds to the search key. The data reference column is sometimes referred to as the block pointer since it utilizes block-based addressing.
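
    The search key and data reference pairing described above can be sketched in a few lines of Python. This is only an illustrative model, not how any particular DBMS lays out its index: the block names, the records, and the lookup function are all hypothetical, and a dictionary stands in for the disk.

```python
from bisect import bisect_left

# Hypothetical data file: block address -> records stored in that block.
DATA_BLOCKS = {
    "block_0": [{"id": 101, "name": "Asha"}, {"id": 205, "name": "Ravi"}],
    "block_1": [{"id": 150, "name": "Meera"}],
    "block_2": [{"id": 310, "name": "John"}],
}

# The index: (search key, data reference) pairs kept sorted by search key.
INDEX = sorted(
    (record["id"], block_address)
    for block_address, records in DATA_BLOCKS.items()
    for record in records
)
SEARCH_KEYS = [key for key, _ in INDEX]


def lookup(search_key):
    """Binary-search the sorted keys, then read only the block pointed to."""
    pos = bisect_left(SEARCH_KEYS, search_key)
    if pos == len(SEARCH_KEYS) or SEARCH_KEYS[pos] != search_key:
        return None  # the key is not in the index
    block_address = INDEX[pos][1]
    for record in DATA_BLOCKS[block_address]:
        if record["id"] == search_key:
            return record
    return None
```

    Because the keys are sorted, the lookup inspects a logarithmic number of index entries and then touches a single block, instead of reading every block in turn.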


    Tree Indexing in DBMS 

    Tree-based indexing in DBMS is a widely used technique for efficient data retrieval and storage. Two commonly used tree-based index structures are the B-tree and the B+ tree. Let us explore each of these types:

    1. B Tree Indexing in DBMS

    B-tree indexing in DBMS is a balanced tree-based indexing structure that organizes data in a hierarchical manner. It supports efficient search, insertion, and deletion, and is designed to manage large amounts of data. The B-tree index maintains a sorted order of the keys and allows for quick lookup by traversing the tree from the root towards the leaf nodes.

    The high branching factor of a B-tree keeps the depth of the tree to a minimum, so a lookup needs to visit only a few nodes. B-tree indexing is commonly used in scenarios where the data size is too large to fit in memory and where efficient range queries and updates are required.

    2. B+ Tree Indexing in DBMS 

    B+ tree indexing is an extension of the B-tree index structure that further optimizes the performance and storage characteristics. It is particularly suitable for disk-based storage systems. In a B+ tree, the data records (or pointers to them) are stored only in the leaf nodes, while the internal nodes hold only keys that guide the search.

    Leaf nodes are connected in a linked list, allowing for efficient range scans and sequential access. The B+ tree index structure provides better utilization of disk blocks, reduces disk I/O operations, and allows for faster range queries and ordered traversals. B+ tree indexing is commonly used in DBMS for handling large datasets and supporting efficient range-based queries.
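
    The linked leaves are what make range scans cheap, and a small sketch can show why. The Python below is a deliberately simplified, static model: it assumes a non-empty, pre-sorted input, uses a single root level, and has no insertion or node splitting, all of which a real B+ tree would need. The names Leaf, build, search, and range_scan are hypothetical.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys, values):
        self.keys = keys      # sorted search keys held by this leaf
        self.values = values  # record pointers (plain strings here)
        self.next = None      # link to the next leaf in key order


def build(sorted_items, leaf_size=2):
    """Pack sorted (key, value) pairs into linked leaves under one root."""
    leaves = []
    for i in range(0, len(sorted_items), leaf_size):
        chunk = sorted_items[i:i + leaf_size]
        leaves.append(Leaf([k for k, _ in chunk], [v for _, v in chunk]))
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    # The root holds only separator keys: the first key of every leaf but the first.
    separators = [leaf.keys[0] for leaf in leaves[1:]]
    return separators, leaves


def find_leaf(tree, key):
    """Use the separator keys to pick the one leaf that could hold `key`."""
    separators, leaves = tree
    return leaves[bisect_right(separators, key)]


def search(tree, key):
    leaf = find_leaf(tree, key)
    if key in leaf.keys:
        return leaf.values[leaf.keys.index(key)]
    return None


def range_scan(tree, low, high):
    """Descend once to the leaf for `low`, then follow the leaf links."""
    leaf = find_leaf(tree, low)
    result = []
    while leaf is not None:
        for key, value in zip(leaf.keys, leaf.values):
            if key > high:
                return result
            if key >= low:
                result.append((key, value))
        leaf = leaf.next
    return result


tree = build([(5, "r5"), (10, "r10"), (15, "r15"), (20, "r20"), (25, "r25")])
```

    Here range_scan(tree, 10, 20) descends from the root once and then walks sideways through the leaves, which is the access pattern that favours B+ trees for ordered traversals.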

    Types of Indexing in DBMS

    Effective database management requires a thorough understanding of the numerous indexing techniques. In this section, you will learn about the different indexing techniques in DBMS, which are divided into categories based on the attributes of the index, the structure of data files, and particular use cases.

    1. Based on Characteristics of the Index Attribute

    In DBMS, indexes can be categorized based on the characteristics of the index attribute they are created on. The three common types of indexes based on these characteristics are:

    a. Primary Index 

     The primary index is created based on the primary key of a table. It provides a unique, ordered, and one-to-one mapping between the primary key values and the physical locations of the corresponding data records. The primary index enables fast retrieval of specific records based on their primary key values, as it offers direct access to the desired data. Here are the characteristics of a Primary Index in DBMS:

    • Each search key value in the primary index is unique, as it is typically based on the primary key or candidate key of the table. This uniqueness ensures that each search key maps to a unique data record.
    • The search keys in primary indexing are arranged in sorted order, usually ascending or descending. This ordering facilitates efficient search and retrieval operations, as it allows for quick binary search or other optimized algorithms.
    • In primary indexing, the search keys must have valid values and cannot be null. This is because each primary key value must identify exactly one record, which a null value cannot do.
    • Primary indexing enables fast and efficient searching of data records. With the search keys being unique and in sorted order, primary indexing supports direct access to the desired data record based on its key value. This direct access minimizes the number of disk accesses and enhances search performance.

    b. Clustered Index

    A clustered index is employed when multiple related records are physically stored together. It is based on ordering the data file itself in a specific manner, so that records with the same or adjacent key values end up in the same or neighbouring disk blocks. The primary objective is to enhance retrieval speed by grouping records with similar key values. Because the data can be stored in only one physical order, a table can have at most one clustered index. Here are the characteristics of a clustered index in DBMS:

    • A clustered index is often created using the key attribute(s) of the table, such as primary or candidate keys, although it can also be built on a non-key column that determines the physical ordering of the data.
    • The search keys in clustered indexing are arranged in a specific sorted order, which determines the physical storage layout of the data.
    • Whether null values are allowed in the search keys depends on the DBMS and on the column chosen. When the clustered index is built on the primary key, every indexed record must have a valid, non-null value.
    • The search keys in a clustered index are not necessarily unique. When the index is built on a non-key column, several records share the same key value and are stored together as a cluster; when it is built on the primary key, each value is unique.
    • The creation of clustered indexing involves additional effort compared to other types of indexing. It requires reorganizing the data based on the selected attributes to form clusters, which may require additional storage or computational resources.

    c. Secondary Index 
     
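     A secondary index in DBMS, also known as a non-clustered index, is built on an attribute that does not determine the physical order of the data file. Because it needs an entry for every record or distinct value, it can grow large, so it is often organized in two or more levels to keep the top-level mapping small. Unlike primary indexing, where the actual data is sorted on the search key, secondary indexing points to the specific locations where the data is stored without requiring the data to be kept in sorted order. Here are the characteristics of Secondary Indexing: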

    • The search keys used in secondary indexing can be candidate keys, which uniquely identify records in a table, or non-key attributes, which may contain duplicate values.
    • While the search keys in secondary indexing are typically sorted, the actual data associated with these keys may or may not be stored in sorted order.
    • Secondary indexing generally requires more time for retrieval compared to primary indexing due to the additional level of indirection involved in locating the actual data.
    • Whether null values are indexed depends on the DBMS. Some systems store entries for null values in a secondary index, while others leave them out.
    • Creating a secondary index does not involve rearranging the physical order of data, so a table can have several secondary indexes, whereas it can have only one clustered index.
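
    To make the difference between the primary and secondary access paths concrete, here is a minimal Python sketch. It is a toy model under stated assumptions: the records, the column names id and dept, and both lookup functions are hypothetical, and a list position stands in for a disk address.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file, physically ordered by the primary key `id`.
RECORDS = [
    {"id": 1, "dept": "sales"},
    {"id": 2, "dept": "hr"},
    {"id": 3, "dept": "sales"},
    {"id": 4, "dept": "it"},
    {"id": 5, "dept": "hr"},
]
PRIMARY_KEYS = [record["id"] for record in RECORDS]

# Secondary index on the non-key column `dept`: sorted (dept, position) entries.
# The entries are sorted even though the data file is not ordered by dept.
SECONDARY = sorted((record["dept"], pos) for pos, record in enumerate(RECORDS))
SECONDARY_KEYS = [dept for dept, _ in SECONDARY]


def find_by_id(record_id):
    """Primary path: the data itself is sorted, so search it directly."""
    pos = bisect_left(PRIMARY_KEYS, record_id)
    if pos < len(RECORDS) and PRIMARY_KEYS[pos] == record_id:
        return RECORDS[pos]
    return None


def find_by_dept(dept):
    """Secondary path: search the index, then follow each pointer."""
    lo = bisect_left(SECONDARY_KEYS, dept)
    hi = bisect_right(SECONDARY_KEYS, dept)
    return [RECORDS[pos] for _, pos in SECONDARY[lo:hi]]
```

    A lookup by dept returns records that sit at scattered positions in the data file, which is the extra level of indirection that makes secondary lookups costlier than primary ones.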

    2. Based on Data File

    Based on the data file, indexes can be further classified into the following types:

    a. Dense Index: Dense index is a type of index in which an entry exists for every search key value in the data file. It provides a direct mapping between search key values and their corresponding disk block addresses. In a dense index, the index entries are typically sorted in the order of the search key values. This enables efficient lookup operations as the index allows for direct access to the desired data block based on the search key value. However, a dense index requires more space to store its entries than a sparse index.
     
    b. Sparse Index: Sparse index, on the other hand, does not have an entry for every search key value in the data file. Instead, it contains entries only for selected search key values, usually at specific intervals or predetermined points in the data file. These selected values are referred to as index key values. Sparse indexing helps reduce the size of the index, especially for large data files with a wide range of search key values.

    To locate a specific search key value, the sparse index directs the search to the nearest index key value that is less than or equal to the desired value, and then linearly scans the data blocks from that point. This only works when the data file itself is sorted on the search key. Sparse indexing requires less storage space for the index but may involve more disk accesses during the search process compared to dense indexing.
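
    The two lookup strategies can be compared side by side in a short Python sketch. The block contents and function names below are hypothetical, and the sparse index is assumed to hold one entry per block (the first key of each block), which is one common choice rather than the only one.

```python
from bisect import bisect_right

# Hypothetical data file, sorted by search key and split into blocks.
BLOCKS = [
    [10, 20, 30],
    [40, 50, 60],
    [70, 80, 90],
]

# Dense index: one entry for every search key value -> block number.
DENSE = {key: block_no for block_no, block in enumerate(BLOCKS) for key in block}

# Sparse index: one entry per block, holding that block's first key.
SPARSE_KEYS = [block[0] for block in BLOCKS]


def dense_lookup(key):
    """Return the block number straight from the index, or None."""
    return DENSE.get(key)


def sparse_lookup(key):
    """Find the largest index key <= key, then scan that block."""
    slot = bisect_right(SPARSE_KEYS, key) - 1
    if slot < 0:
        return None  # key is smaller than every key in the file
    return slot if key in BLOCKS[slot] else None
```

    The dense index holds nine entries and answers from the index alone, while the sparse index holds three entries and has to scan inside the chosen block, which mirrors the space versus access trade-off described above.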

    3. Based on Specific Scenarios

    Based on specific scenarios, several types of indexes can be used to address specific data characteristics or query requirements. Some of them are:

    • Bitmap Index: Bitmap indexing is suitable for low-cardinality attributes, where the number of distinct values is small compared to the overall data size, such as categorical data or boolean flags. It keeps one bitmap vector for each distinct attribute value. Each bit in a bitmap corresponds to a specific record, and the bit is set if that record has the value. Bitmap indexes are efficient for performing logical operations like AND, OR, and NOT, making them well-suited for decision support systems and data warehouses.
    • Reverse Index: A reverse index, also known as an inverted index, is commonly used in full-text search systems and information retrieval applications. It maps terms or words to the documents or records where they appear. Unlike a forward index, which maps each document to the terms it contains, a reverse index allows for quick searching based on terms or keywords, facilitating efficient text-based searches. Reverse indexes are suitable for text-based data, such as documents, web pages, articles, or any data containing textual content.
    • Hash Index: Hash-based indexing in DBMS utilizes a hash function to map key values directly to the location of the corresponding data block or bucket. It is particularly useful for equality-based searches, as it provides constant-time access on average to records based on their key values. However, hash indexes do not support range queries efficiently, because hashing does not preserve key order, and collisions can occur if multiple key values map to the same hash value, requiring additional handling techniques. Hash indexes are suitable for key-value style lookups where queries test the key for exact equality. This type of index is commonly used for primary key or unique key columns in a database.
    • Filtered Index: Filtered indexing is employed when only a subset of data in a table needs to be indexed. It involves creating an index on a specific subset of rows that satisfy a predefined filter condition. By indexing only the relevant data, filtered indexes reduce the index size and improve query performance for the filtered subset of records while minimizing the overhead of maintaining the index. Filtered indexes are suitable for data tables where a specific subset of rows needs to be indexed based on a filter condition. The data types suitable for filtered indexes can vary depending on the specific filter condition and the columns involved. However, commonly suitable data types for filtered indexes include numeric, string, date, and boolean data types.
    • Function-based Index: Function-based indexing allows the creation of indexes on expressions or function outputs rather than directly on columns. It enables indexing derived values or computed results from column data. Function-based indexes are beneficial when queries involve complex expressions, transformations, or calculations, providing optimized access to the computed data. Function-based indexes can be applied to a wide range of data types, as they primarily depend on the specific expressions or functions used in index creation. However, some commonly suitable data types for function-based indexes include numeric types (integers, decimals), string types (varchar, text), date/time types, and boolean types.
    • Spatial Index: Spatial indexing is designed for efficiently storing and querying spatial or geographical data. It organizes spatial objects, such as points, lines, and polygons, based on their spatial relationships and coordinates. Spatial indexes employ specialized data structures, such as R-trees or quad-trees, to enable fast spatial queries, such as range searches, nearest neighbor searches, and spatial joins. They are well-suited for data types that represent spatial information, such as points, lines, polygons, and more.
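
    Three of the index types in the list above (bitmap, reverse, and hash) are simple enough to sketch in plain Python. The rows, column names, and helper functions here are hypothetical, a Python integer stands in for a bitmap vector, and a Python dict stands in for the hash table, so treat this as an illustration of the ideas rather than a storage-engine design.

```python
from collections import defaultdict

ROWS = [
    {"id": 0, "status": "active", "premium": True, "bio": "loves sql indexes"},
    {"id": 1, "status": "inactive", "premium": True, "bio": "writes python"},
    {"id": 2, "status": "active", "premium": False, "bio": "sql and python"},
    {"id": 3, "status": "active", "premium": True, "bio": "reads books"},
]


def bitmap_index(rows, column):
    """One integer bitmap per distinct value; bit i is set if row i has it."""
    bitmaps = defaultdict(int)
    for i, row in enumerate(rows):
        bitmaps[row[column]] |= 1 << i
    return dict(bitmaps)


def rows_from_bitmap(bitmap):
    """Translate the set bits of a bitmap back into row positions."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


def inverted_index(rows, column):
    """Map each word to the sorted ids of the rows whose text contains it."""
    postings = defaultdict(set)
    for row in rows:
        for word in row[column].split():
            postings[word].add(row["id"])
    return {word: sorted(ids) for word, ids in postings.items()}


def hash_index(rows, column):
    """Map each key to its row position for equality lookups."""
    return {row[column]: pos for pos, row in enumerate(rows)}


status = bitmap_index(ROWS, "status")
premium = bitmap_index(ROWS, "premium")
# "active AND premium" is a single bitwise AND over two bitmaps.
active_premium = rows_from_bitmap(status["active"] & premium[True])
```

    The combined filter is answered without touching the rows at all, which is why bitmap indexes suit multi-condition analytical queries, while the inverted and hash indexes answer "which rows contain this word" and "where is this key" respectively.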

    How Do You Create an Index in DBMS?

    It is important to carefully analyze the data properties, query patterns, and performance requirements while building an index in a DBMS. Indexes can considerably improve query performance and boost the overall effectiveness of database operations if the indexed columns and index types are chosen carefully and maintained on a regular basis.

    In SQL, you can use the CREATE INDEX statement to create an index. The general syntax for creating an index is as follows:

    CREATE INDEX index_name

    ON table_name (column_list);

    Here, index_name is the name you assign to the index, table_name specifies the table on which the index is to be created, and column_list names one or more columns of that table to index.

    You can create a single-column index by providing, in parentheses, the column_name of the column on which the index is built. The syntax for that is as follows:

    CREATE INDEX index_name

    ON table_name (column_name);

    If you need to create an index based on multiple columns, you can use the following syntax:

    CREATE INDEX index_name

    ON table_name (column_name1, column_name2);

    This type of index, known as a composite index, is useful when multiple columns are frequently used as filters in the WHERE clause of queries.

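    To remove an index, you can utilize the DROP INDEX statement. The exact form varies by DBMS (MySQL and SQL Server, for example, also expect the table name, as in DROP INDEX index_name ON table_name), but the basic syntax is as follows: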

    DROP INDEX index_name;
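
    To try these statements end to end, the sketch below runs them against an in-memory SQLite database through Python's built-in sqlite3 module. The employees table, its columns, and the index names are made up for illustration, and SQLite is only one DBMS; other systems may need slightly different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, city TEXT)"
)

# A single-column index and a composite (multi-column) index.
conn.execute("CREATE INDEX idx_emp_name ON employees (name)")
conn.execute("CREATE INDEX idx_emp_dept_city ON employees (dept, city)")


def index_names(connection):
    """List the indexes recorded in SQLite's schema table."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
    )
    return [name for (name,) in rows]


before_drop = index_names(conn)

# Remove the single-column index again.
conn.execute("DROP INDEX idx_emp_name")
after_drop = index_names(conn)
```

    Comparing before_drop and after_drop shows both indexes being registered and then one of them disappearing once DROP INDEX has run.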

    Advantages of Indexing

    Indexing offers several advantages in a database management system. Here are some key benefits of indexing:

    1. Improved Data Retrieval: Indexing significantly reduces the number of I/O operations required to retrieve data. By providing a separate structure for efficient lookup, indexes allow for faster search and retrieval of data. Instead of scanning the entire table, the database can directly access the index and locate the desired rows, resulting in quicker response times.
    2. Faster Query Performance: Indexes enhance query performance by enabling the database to quickly locate and retrieve relevant data. Queries that involve filtering, sorting, or joining can leverage indexes to eliminate the need for full-table scans, thereby optimizing query execution time.
    3. Compact Access Structure: An index stores only the key values and pointers to the actual data rather than a copy of the entire table. It is therefore much smaller than the table it describes, so far fewer blocks need to be read to locate a row, even though the index itself occupies some additional space.
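
    One way to observe the retrieval benefit is to ask the database how it plans to run a query before and after an index exists. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The orders table, its data, and the index name are hypothetical, and the exact wording of the plan text differs between SQLite versions, so the code only keeps the plan strings without assuming their full content.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"customer_{i % 100}", float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer = 'customer_42'"


def plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_without_index = plan(conn, QUERY)   # expected to describe a full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
plan_with_index = plan(conn, QUERY)      # expected to mention the new index

matching_rows = conn.execute(QUERY).fetchall()
```

    Printing the two plans should show the first one scanning the whole table and the second one searching through idx_orders_customer, while the query result itself stays the same.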

    Disadvantages of Indexing 

    While indexing provides several advantages, there are also some limitations and drawbacks to consider:

    1. Key Requirements for Some Index Types: A primary index requires a primary key with unique, non-null values on the table. This constraint ensures that the index remains consistent and accurate, but it may impose restrictions on table design. Secondary indexes can be created on other columns and do not have this requirement.
    2. Limited Indexing Options: There are limits on the indexes a table can carry. A table can have only one clustered index, because its rows can be physically ordered in only one way, and a DBMS may limit the number of indexes per table or the number of columns per index. Careful consideration is required to select the most beneficial columns for indexing based on the query patterns and performance requirements.
    3. Impact on Write Operations: While indexing improves data retrieval, it can have a negative impact on write operations such as INSERT, UPDATE, and DELETE queries. When data is modified, the corresponding indexes also need to be updated, which incurs additional overhead. This can result in slower write performance, especially for tables with multiple indexes. 
    4. Additional Disk Overhead: Indexes require additional storage space on disk to store the index data structures. This can increase the overall disk usage of the database, which may become a concern when dealing with large datasets or limited storage resources.

    Wrapping Up

    Indexing plays a crucial role in enhancing the efficiency and performance of DBMS. Indexes provide quicker query processing, increased search performance, and optimized data access by organizing and structuring data in a way that promotes quick and efficient data retrieval.

    Indexes require additional storage space and entail maintenance overhead, especially during data modification operations. Therefore, it is essential to strike a balance between the benefits and costs associated with indexing.

    Profile

    Ashutosh Krishna

    Author

    Ashutosh is an Application Developer at Thoughtworks. Apart from his love for Backend Development and DevOps, he has a keen interest in writing technical blogs and articles. 
