
Normalization in DBMS: 1NF, 2NF, 3NF and BCNF

Published 03rd Jan, 2024

    Defining a large database as a single relation can cause significant data duplication. This repetition can lead to the following problems:

    1. The relation grows very large.
    2. Maintaining and updating data becomes difficult, since changes require searching through many related entries.
    3. Disk space and other resources are wasted.
    4. Errors and inconsistencies become more likely.

    The best way to solve these difficulties is to decompose relations containing redundant data into smaller, simpler, well-structured relations that satisfy the desired properties. This process of decomposing relations into relations with fewer attributes is called normalization.

    Normalization minimizes redundancy in a relation or set of relations. Redundancy can cause insertion, deletion, and update anomalies, so reducing it makes the database easier to maintain. Normal forms are the tool used to remove or reduce redundancy in database tables.

    Normalization is a database design technique that avoids data duplication and eliminates undesirable anomalies in insertion, update, and deletion. Normalization rules break larger tables into smaller ones linked through relationships. SQL normalization serves the dual purpose of removing redundant (repetitive) data and ensuring that data is stored logically.

    Edgar Codd, the creator of the relational model, proposed the concept of data normalization with the introduction of the First Normal Form. He later extended it with the Second and Third Normal Forms, and subsequently collaborated with Raymond F. Boyce to develop the Boyce-Codd Normal Form.

    Normalization is the process of structuring the data and attributes of a database. It ensures that data is stored logically and reduces data redundancy. Redundant data is the same data stored in several locations; it must be eliminated because it leads to anomalies that make the database difficult for an administrator to maintain.

    What is Normalization in DBMS?

    Normalization is the process of organizing data in a database. It entails building tables and linking them together according to rules designed both to protect the data and to make the database more flexible by eliminating redundancy and inconsistent dependency.

    Redundant data wastes disk space and creates maintenance problems. If data that exists in multiple locations must be changed, it has to be changed in exactly the same way everywhere. Changing a customer's address is far simpler if that address is stored only in the Customers table and nowhere else in the database.

    What is an "inconsistent dependency"? While it makes sense for a user to look in the Customers table for a specific customer's address, it does not make sense to look there for the salary of the employee who calls on that customer. The employee's salary is related to, or dependent on, the employee, so it belongs in the Employees table. Inconsistent dependencies can make data difficult to access, because the path to the data may be missing or broken.

    A few rules govern database normalization, and each rule is called a "normal form." If the first rule is observed, the database is said to be in "first normal form." If the first three rules are observed, the database is in "third normal form." Although further levels of normalization are possible, third normal form is considered the highest level necessary for most applications.

    As with many formal rules and specifications, real-world scenarios do not always allow perfect compliance. Normalization usually requires additional tables, which some customers find cumbersome. If you're considering learning more about normalization, the best MongoDB Course can help you achieve that objective.

    Why Do We Need Normalization?

    The obvious question is, "Why?" Why spend so much time and effort normalizing your data? If you aren't observing any significant problems from non-standardized fields, normalization can appear superfluous.

    You may be blind to the unintended consequences of bad data because they are never discussed or raised with management. Over time, however, those consequences can harm businesses that rely on big data. That is why normalization in DBMS is needed.

    Your marketing teams are hesitant to add additional data-based personalization to marketing campaigns when you have low-quality data. Even seemingly insignificant errors, like incorrectly capitalizing names, can have a long-term effect on the reputation of your brand. 

    Your sales teams also suffer. Because of the poor quality and missing data, companies lack the crucial context they require to address customers' and prospects' top problems. Poor quality analysis and reduced sales are the direct results of this. 

    Furthermore, poor-quality data negatively influences lead scoring, making it more difficult for sales representatives to accurately segment and categorize prospects so they can engage with them. 

    Here are the top reasons why all businesses should standardize their customer data:

    1. Identify duplicate data

    Duplicate customer records are much easier to discover and merge when the data has been standardized. Duplicates hamper every step of the customer journey, including all interactions with marketing, sales, and support before and after the sale. When records are duplicated, companies can never be sure that a single record contains all the information about a customer.

    Duplicate data can lead to your prospects receiving identical marketing materials more than once when it comes to marketing. Sales reps may communicate with prospects without the necessary information if a customer's data is split between two records. 

    2. Improve Marketing Segmentation

    If you don't have confidence in the data you are utilizing, how can you successfully segment and categorize customers and prospects so that you can send them targeted messages? 

    Suppose you run a B2B business and want to segment your prospects by job title. That makes sense: you shouldn't pitch your solution to a CEO the same way you would to a CFO, because your communications should reflect their different requirements and concerns.

    Without a normalized database, you may discover that many prospects with similar job titles are not classified into the same bucket. Non-standardized data for the CEO segment alone can produce variants such as:

    • CEO 
    • Principal Executive 
    • Owner 
    • CEO and founder 
    • Co-Founder/CEO 

    A variety of names can refer to the same role. Without normalization and uniformity, these prospects end up in distinct buckets, making analysis challenging.
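    As a sketch of how such title variants might be collapsed into one bucket, consider the snippet below. The alias table is hypothetical and illustrative only; a real mapping would be far larger.

```python
# Hypothetical sketch: collapsing job-title variants into a canonical segment.
CANONICAL_TITLES = {
    "ceo": "CEO",
    "principal executive": "CEO",
    "owner": "CEO",
    "ceo and founder": "CEO",
    "co-founder/ceo": "CEO",
}

def normalize_title(raw_title: str) -> str:
    """Lowercase, trim, and collapse whitespace, then look up a canonical bucket."""
    cleaned = " ".join(raw_title.strip().lower().split())
    return CANONICAL_TITLES.get(cleaned, cleaned.title())

print(normalize_title("  CEO and Founder "))  # CEO
print(normalize_title("Owner"))               # CEO
```

    With all five variants mapped to "CEO", segmentation and analysis operate on one bucket instead of five.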

    3. Improve Lead Scoring & Routing

    Lead scoring involves assigning values to leads or accounts in your CRM so you can identify the best opportunities. Effective lead scoring requires high-quality data to categorize those prospects correctly. Using the earlier example, a B2B company may score leads using job title as one of the variables; a higher score lets your sales teams give those leads greater priority. A CEO, for example, might be a more valuable lead than a Marketing Manager.

    Without normalized job-title data, many of your prospects will receive incorrect scores, and the same applies to every field used in the lead-scoring process. In a database that lacks normalization entirely, every lead could be affected.

    4. Inject More Data into Marketing Automation

    Your marketing team must have confidence in the data they use if they are to improve personalization in your marketing initiatives. If they can't reliably include first names in a campaign without sending clients messages containing "MIKE," "mike," or other frequent typos, they simply won't use the data.

    Sending tailored messages with poor-quality, non-normalized data is more detrimental than helpful. 

    5. Identify, Aggregate, or Remove Redundant Data

    Normalized data also lets you locate redundant data spread across numerous fields in a customer data set. Before normalizing, consider what each field is trying to express: what is the data tracking? Once that is worked out, you can recognize when two fields express the same thing and merge them, reducing cost and confusion. Aggregating data this way is essential for analysis.

    6. Ensure Integrated Apps Work as Intended

    Data normalization enables you to maintain the flawless operation of third-party apps and integrations. Every app has various requirements for how data must be formatted in order to function with the software. Even a minor formatting change can damage a third-party integration. 

    We hope this post has piqued your interest in DBMS. Check out the MongoDB Administration Training from KnowledgeHut to help launch your career.

    Types of Anomalies in DBMS

    1. Insertion Anomaly

    Imagine a table with four columns: student ID, name, address, and grades. When a new student enrolls, the first three attributes can be filled in, but the fourth must be given a NULL value because the student doesn't yet have any grades. Being forced to insert incomplete or placeholder data like this is an insertion anomaly.

    2. Deletion Anomaly

    This anomaly occurs when crucial data is unintentionally removed from the table. Suppose we store student data together with the courses they have taken: (Student ID, Student Name, Course, Address). When a student leaves the school, every entry pertaining to that student is removed. That deletion also erases the course information, even though the course belongs to the school, not to the student.

    The goal of normalization is to make the tables as granular as possible to avoid these problems. To put it simply, it attempts to divide tables into several tables and establish associations between them using keys.

    3. Update Anomaly

    Consider a table with 10 columns, two of which are Employee Name and Employee Address. Whenever an employee changes location, the table must be updated. If the table is not normalized, one employee may have many entries, and updating all of them at once risks one being overlooked, leaving the data inconsistent.
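    A minimal sketch of this update anomaly, using invented data rather than a real schema:

```python
# Sketch of an update anomaly in a denormalized table (illustrative data).
# The same employee appears in multiple rows; updating only one row
# leaves the database inconsistent.
rows = [
    {"emp_id": 1, "name": "Asha", "address": "Delhi", "project": "P01"},
    {"emp_id": 1, "name": "Asha", "address": "Delhi", "project": "P02"},
]

rows[0]["address"] = "Mumbai"  # a careless update touches only one copy

addresses = {r["address"] for r in rows if r["emp_id"] == 1}
print(addresses)  # two conflicting addresses for one employee
```

    In a normalized design, the address would live in a single employee row, so a single update could never leave conflicting copies behind.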

    Types of DBMS Normalization Forms [with example]

    [Figure: Types of DBMS normalization forms]

    1. 1NF (First Normal Form)

    According to the first normal form, each table cell can contain only one value: every intersection of a row and a column must hold an atomic value. For instance, a phone-number column must store only one phone number per entry.

    In other words, a relation is in 1NF (first normal form) if it has no multivalued attributes, i.e., if every attribute contains only atomic (single) values.

    Let's consider a database normalization example where a business keeps track of its employees' names and contact details. It creates the following table:

    Employee Code | Employee Name | Employee Phone Number
    101           | Jolly         | 98765623, 998234123
    101           | Jolly         | 89023467
    102           | Rehan         | 76213908
    103           | Snehi         | 98132452

    Employee Phone Number is a multivalued attribute here, so this relation is not in 1NF.

    As demonstrated below, we create new rows for each employee's phone number in order to transform this table into 1NF: 

    <EmployeeDetail> 

    Employee Code | Employee Name | Employee Phone Number
    101           | Jolly         | 998234123
    101           | Jolly         | 98765623
    101           | Jolly         | 89023467
    102           | Rehan         | 76213908
    103           | Snehi         | 98132452
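    The conversion above can be sketched in a few lines of Python, using the sample data from the table:

```python
# Sketch: converting the sample table to 1NF by emitting one row
# per phone number instead of storing a multivalued attribute.
unnormalized = [
    (101, "Jolly", ["98765623", "998234123"]),
    (101, "Jolly", ["89023467"]),
    (102, "Rehan", ["76213908"]),
    (103, "Snehi", ["98132452"]),
]

first_normal_form = [
    (code, name, phone)
    for code, name, phones in unnormalized
    for phone in phones
]

for row in first_normal_form:
    print(row)
```

    Each row of the result now holds exactly one atomic phone number, which is precisely the 1NF requirement.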

    2. 2NF (Second Normal Form)

    Normalizing 1NF relations to 2NF requires removing partial dependencies. A partial dependency exists when a non-prime attribute (an attribute not included in any candidate key) is functionally dependent on only part of a candidate key.

    A relational table must conform to the following conditions in order to be in second normal form: 

    First, the table needs to be in first normal form.

    Every non-prime attribute must be fully functionally dependent on the primary key, with no partial dependencies allowed.

    If there is a partial dependency, we can split the table in two and move the partially dependent attributes to another table where they fit better.

    To better understand partial dependency and how to normalize the table to the second normal form, let's use the following "EmployeeProjectDetail" table as a database normalization example with the solution: 

    <EmployeeProjectDetail>

    Employee Code | Project ID | Employee Name | Project Name
    101           | P03        | Jolly         | Project103
    101           | P01        | Jolly         | Project101
    102           | P04        | Rehan         | Project104
    103           | P02        | Snehi         | Project102

    In the table above, Employee Code and Project ID are the prime attributes. The table contains partial dependencies, because Employee Name can be identified by Employee Code alone and Project Name can be determined by Project ID alone. As a result, the relational table above violates the 2NF constraint.

    Prime attributes are the attributes that are part of one or more candidate keys.

    The EmployeeProjectDetail table can be divided into the following three tables to eliminate partial dependencies and normalize it into the second normal form:

    <EmployeeDetail>

    Employee Code | Employee Name
    101           | Jolly
    102           | Rehan
    103           | Snehi

    <EmployeeProject>  

    Employee Code | Project ID
    101           | P03
    101           | P01
    102           | P04
    103           | P02

    <ProjectDetail>

    Project ID | Project Name
    P03        | Project103
    P01        | Project101
    P04        | Project104
    P02        | Project102

    So, by splitting the EmployeeProjectDetail relation into the EmployeeDetail, EmployeeProject, and ProjectDetail tables, we converted it into 2NF. As you can see, the tables above satisfy both 2NF conditions: they are in 1NF, and every non-prime attribute fully depends on the primary key.
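    The decomposition can be sketched as three projections of the original relation; using sets removes the duplicate tuples that partial dependencies would otherwise produce:

```python
# Sketch: projecting EmployeeProjectDetail onto three 2NF relations.
employee_project_detail = [
    (101, "P03", "Jolly", "Project103"),
    (101, "P01", "Jolly", "Project101"),
    (102, "P04", "Rehan", "Project104"),
    (103, "P02", "Snehi", "Project102"),
]

employee_detail = {(code, name) for code, _pid, name, _pname in employee_project_detail}
project_detail = {(pid, pname) for _code, pid, _name, pname in employee_project_detail}
employee_project = {(code, pid) for code, pid, _name, _pname in employee_project_detail}

print(sorted(employee_detail))  # [(101, 'Jolly'), (102, 'Rehan'), (103, 'Snehi')]
```

    Note that Jolly's name is now stored once rather than once per project, which is exactly the redundancy 2NF removes.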

    Relations in 2NF are clearly less redundant than those in 1NF. However, the decomposed relations may still exhibit one or more anomalies due to transitive dependencies, which are eliminated in Third Normal Form.

    3. 3NF (Third Normal Form)

    Remove any fields that are independent of the key. 

    Records in a table should not contain values that do not belong to that record's key. In general, whenever the contents of a group of fields may apply to more than one record in the table, consider placing those fields in a separate table.

    For instance, a candidate-recruitment table may list the name and address of each candidate's university. For group mailings, however, you need a complete list of universities, and if university information is kept only in the Candidates table, there is no way to list universities that currently have no candidates. Instead, create a separate Universities table and connect it to the Candidates table with a university-code key.

    EXCEPTION: Adhering to third normal form, while theoretically desirable, is not always practical. If you have a customer database and want to eliminate all possible inter-field dependencies, you must create separate tables for cities, ZIP codes, sales reps, customer classes, and any other factor that may be duplicated across records. Normalization is worth pursuing in theory, but many small tables can degrade performance or exceed open-file and memory capacities.

    It may be more practical to apply third normal form only to data that changes frequently. If some dependent fields remain, design your application to ask the user to verify all related fields when any one of them changes.

    <EmployeeDetail>

    Employee Code | Employee Name | Employee Zipcode | Employee City
    101           | Jolly         | 110033           | Model Town
    101           | Jolly         | 110044           | Badarpur
    102           | Rehan         | 110028           | Naraina
    103           | Snehi         | 110064           | Hari Nagar
    The table above is not in 3NF, because it contains the transitive dependency Employee Code -> Employee City, which arises from:

    • Employee Code -> Employee Zipcode
    • Employee Zipcode -> Employee City

    Additionally, neither Employee Zipcode nor Employee City is a prime attribute.

    We can divide the "EmployeeDetail" table into the following two tables to eliminate transitive dependency from it and normalize it into the third normal form:

    <EmployeeDetail>

    Employee Code | Employee Name | Employee Zipcode
    101           | Jolly         | 110033
    101           | Jolly         | 110044
    102           | Rehan         | 110028
    103           | Snehi         | 110064

    <EmployeeLocation>

    Employee Zipcode | Employee City
    110033           | Model Town
    110044           | Badarpur
    110028           | Naraina
    110064           | Hari Nagar

    By breaking the "EmployeeDetail" relation into the "EmployeeDetail" and "EmployeeLocation" tables, both of which are in 2NF and free of transitive dependencies, we transformed it into 3NF.
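    The 3NF split can be sketched the same way, keeping the city only in a zipcode-keyed lookup:

```python
# Sketch: removing the transitive dependency Employee Code -> Zipcode -> City
# by storing the city only in a zipcode-keyed EmployeeLocation relation.
employee_detail = [
    (101, "Jolly", "110033", "Model Town"),
    (101, "Jolly", "110044", "Badarpur"),
    (102, "Rehan", "110028", "Naraina"),
    (103, "Snehi", "110064", "Hari Nagar"),
]

employee = [(code, name, zipcode) for code, name, zipcode, _city in employee_detail]
employee_location = {zipcode: city for _code, _name, zipcode, city in employee_detail}

print(employee_location["110033"])  # Model Town
```

    If a city is ever renamed, only one entry in `employee_location` changes, instead of every employee row with that zipcode.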

    2NF and 3NF remove redundancy caused by dependencies on candidate keys and impose certain restrictions on them. However, some dependencies that cause redundancy may still remain; BCNF, a stricter normal form, eliminates them.

    4. BCNF (Boyce-Codd Normal Form)

    Boyce-Codd Normal Form is an improved version of 3NF, as it imposes stricter conditions.

    A relational table must conform to the following conditions to be in Boyce-Codd normal form: 

    The table must be in third normal form.

    For every non-trivial functional dependency X -> Y, X must be a superkey of the table. As a consequence, if Y is a prime attribute, X cannot be a non-prime attribute.

    A superkey is a set of one or more attributes that can uniquely identify a row in a table.

    To better understand how to normalize the table to the BCNF, let's use the following "EmployeeProjectLead" table as an example:

    <EmployeeProjectLead>

    Employee Code | Project ID | Project Leader
    101           | P03        | Grey
    101           | P01        | Christian
    102           | P04        | Hudson
    103           | P02        | Petro

    The preceding table satisfies all normal forms up to 3NF. However, it violates BCNF: its candidate key is {Employee Code, Project ID}, yet in the non-trivial functional dependency Project Leader -> Project ID, the left-hand side Project Leader is a non-prime attribute while Project ID is a prime attribute. BCNF does not permit this.

    To convert the table into BCNF, we divide it into the following two tables:

    <EmployeeProject> 

    Employee Code | Project ID
    101           | P03
    101           | P01
    102           | P04
    103           | P02

    <ProjectLead>  

    Project Leader | Project ID
    Grey           | P03
    Christian      | P01
    Hudson         | P04
    Petro          | P02

    Thus, by dividing the "EmployeeProjectLead" relation into the "EmployeeProject" and "ProjectLead" tables, we converted it to BCNF.
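    A quick sketch can both check that Project Leader -> Project ID actually holds in the sample data and perform the decomposition:

```python
# Sketch: verifying the functional dependency Project Leader -> Project ID
# in the sample data, then decomposing the relation for BCNF.
employee_project_lead = [
    (101, "P03", "Grey"),
    (101, "P01", "Christian"),
    (102, "P04", "Hudson"),
    (103, "P02", "Petro"),
]

leader_to_projects = {}
for _code, pid, leader in employee_project_lead:
    leader_to_projects.setdefault(leader, set()).add(pid)

# The dependency holds if every leader maps to exactly one project.
holds = all(len(pids) == 1 for pids in leader_to_projects.values())
print(holds)  # True

# BCNF decomposition into EmployeeProject and ProjectLead.
employee_project = [(code, pid) for code, pid, _leader in employee_project_lead]
project_lead = {leader: next(iter(pids)) for leader, pids in leader_to_projects.items()}
print(project_lead["Grey"])  # P03
```

    In the decomposed `project_lead` relation, Project Leader is the key, so the dependency's left-hand side is now a superkey, which is exactly what BCNF requires.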

    5. 4NF (Fourth Normal Form)

    A table is in fourth normal form if no instance of it contains two or more independent, multivalued facts about the same entity.

    A relation is in 4NF if it is in Boyce-Codd normal form and has no multivalued dependencies.

    A dependency A ->> B is a multivalued dependency if, for a single value of A, multiple independent values of B exist.
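    As a hypothetical illustration (the skills/languages data below is invented, not taken from the tables above): when an employee's skills and languages are independent of each other, a single relation must store their full cross product, and 4NF removes that redundancy by splitting it in two:

```python
# Hypothetical data: E1's skills and languages are independent, so a single
# relation must hold their full cross product (4 rows).
employee_info = [
    ("E1", "SQL", "English"),
    ("E1", "SQL", "Hindi"),
    ("E1", "Python", "English"),
    ("E1", "Python", "Hindi"),
]

# 4NF splits the multivalued dependencies Employee ->> Skill and
# Employee ->> Language into two relations with only 2 rows each.
employee_skill = {(emp, skill) for emp, skill, _lang in employee_info}
employee_language = {(emp, lang) for emp, _skill, lang in employee_info}

print(len(employee_info), len(employee_skill), len(employee_language))  # 4 2 2
```

    Adding a third language to E1 would add two rows to the unnormalized relation but only one row to `employee_language`.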

    You have now seen the types of normalization in DBMS and how to perform normalization with examples, which should give you a good idea of how to go about building a better database model.

    Advantages of Normalization

    There are many benefits to normalizing a database. Some of the main advantages are as follows: 

    1. Reduces data redundancy through a normalized database design.
    2. Removes duplication.
    3. Can reduce the number of null values.
    4. Results in a more compact database (since there is little or no data duplication).
    5. Reduces or avoids problems caused by data modification.
    6. Simplifies queries.
    7. Makes the database's structure clearer and easier to understand.
    8. Allows existing data to be extended without negative side effects.

    Disadvantages of Normalization

    There are various drawbacks to normalizing a database. A few disadvantages are as follows: 

    1. When information is spread over many tables, queries must join them together, which adds work. The database structure also becomes harder to grasp at a glance.
    2. Tables contain codes rather than actual values, since repeated data is stored as references, so lookup tables must constantly be consulted.
    3. Because the data model is designed for applications rather than ad hoc querying, it can be very difficult to query directly; doing so typically relies on SQL accumulated over time or on user-friendly query tools, and it is hard to do well without first understanding the client's needs.
    4. Query performance gradually becomes slower compared with a denormalized structure.
    5. Completing the normalization process successfully requires a thorough knowledge of the various normal forms. Careless application can produce a bad design with serious anomalies and data inconsistencies.

    What is the Purpose of Normalization?

    As data becomes more valuable to every kind of business, the purpose of normalization in DBMS, i.e., the way data is organized when it exists in huge quantities, becomes even more critical. Good normalization of a database delivers results like:

    • Overall business performance increases.
    • Group analysis improves without concerns about redundancy.
    • You avoid the consequences of disorganized data, such as missing crucial growth opportunities because a website wouldn't load or a vice president never got your notes.

    Scenarios like those convey neither achievement nor progress. Deciding to standardize your data is one of the most crucial things you can do for your company right now.

    Data normalization is more than simply restructuring the data in a database, as data has increasing value for all businesses. Here are a few of its main advantages:

    1. Cuts down on superfluous data
    2. Ensures consistency of data throughout the database
    3. Improves database design
    4. Strengthens database security
    5. Improves and speeds up performance
    6. Improves database organization in general

    A business can gather all the information it needs from any source. Without data normalization, however, most of it would simply go to waste and be of no real use to the organization.

    KnowledgeHut offers the lowest MongoDB DBA course costs along with an in-depth curriculum in the KnowledgeHut MongoDB Course.

    Conclusion

    Normal forms eliminate redundancy and improve database storage. Throughout this article, we have covered the different types of anomalies in DBMS and the normal forms. In 1NF, we check the atomicity of a relation's attributes. In 2NF, we look for partial dependencies. In 3NF, we examine a relation for transitive dependencies. In BCNF, we check that the left-hand side of every functional dependency is a superkey.

    Normalization in DBMS: FAQs

    1. What is the main objective of normalizing data?

    Database normalization's primary goals are to reduce data update errors, eliminate redundant data, and make queries easier. Finally, normalization can enhance productivity, boost security, and cut expenses in addition to just standardizing data.

    2. Why is database normalization so important?

    A database must be normalized to reduce duplicate data and to ensure that only related data is stored in each table. Normalization also prevents problems caused by insertions, deletions, and updates. The normal forms are the levels of this organization.

    3. Why is normalization required in machine learning?

    The process of normalization is frequently used to prepare data for machine learning. The objective of normalization is to convert the dataset's numerical columns to a common scale without losing information or distorting the ranges of values. 

    Author: KnowledgeHut

    KnowledgeHut is an outcome-focused global ed-tech company. We help organizations and professionals unlock excellence through skills development. We offer training solutions under the people and process, data science, full-stack development, cybersecurity, future technologies and digital transformation verticals.
