Good database management is crucial for cybersecurity. When data is duplicated in large databases, it can cause issues like complicated relationships, hard maintenance, wasted resources, and more errors. To solve these problems, we use a process called ‘normalization’, introduced by Edgar Codd. This process breaks large tables into smaller, simpler ones, reducing redundancy and data anomalies.
Codd created the First, Second, and Third Normal Forms, and later developed the Boyce-Codd Normal Form with Raymond F. Boyce. DBMS Normalization helps store data logically and makes managing databases easier and more reliable. This is essential for keeping databases secure and protecting them from potential threats.
Let’s explore these ideas further and see how they build a strong foundation for cybersecurity in this article.
What is Normalization in DBMS?
MediumData organization in a database is done by normalization. This entails building tables and linking them to safeguard data and increase adaptability by removing duplication and inconsistent reliance. Redundant data wastes disk space and causes maintenance issues. If data exists in multiple locations, modifications must be made everywhere. Changing a customer's address is simpler if it's only on the Customers table.
Database normalization follows guidelines called "normal forms." The database is in "first normal form" if the first rule is followed and in "third normal form" if the first three rules are followed. Third normal form is usually sufficient for most applications, though other levels are possible.
Real-world circumstances don't always allow flawless conformity. Normalization often adds more tables, which some clients find burdensome. The best MongoDB Course can help you learn more about DBMS normalization.
- 1NF (First Normal Form)
- 2NF (Second Normal Form)
- 3NF (Third Normal Form)
- BCNF (Boyce-Codd Normal Form)
- 4NF (Fourth Normal Form)
Source
1. 1NF (First Normal Form)
According to the first normal form, each table cell can only contain one value. This means that any point where two rows or columns meet must contain atomic values. For instance, each entry for a column with the name and phone number must only save one phone number.
If a relation doesn't have any multivalued attributes, it is said to be in 1NF (first normal form). In other terms, a relation is said to be in 1NF if each attribute only includes atomic (single) values.
A table's attribute (column) cannot contain more than one value, according to the first normal form rule. It ought to only save atomic values.
Let's consider a database normalization example, where a business needs to keep track of its employees' names and contact information. In the database, it makes a table that appears as follows:
Employee Code | Employee Name | Employee Phone Number |
---|
101 | Jolly | 98765623,998234123 |
101 | Jolly | 89023467 |
102 | Rehan | 76213908 |
103 | Snehi | 98132452 |
Employee Phone Number is a multivalued characteristic in this case. This relationship is, therefore, not 1NF.
As demonstrated below, we create new rows for each employee's phone number in order to transform this table into 1NF:
<EmployeeDetail>
Employee Code | Employee Name | Employee Phone Number |
---|
101 | Jolly | 998234123 |
101 | Jolly | 98765623 |
101 | Jolly | 89023467 |
102 | Rehan | 76213908 |
103 | Snehi | 98132452 |
2. 2NF (Second Normal Form)
The reduction of partial dependencies is necessary for the DBMS normalization of 1NF relations to 2NF. Any non-prime attributes, or attributes that are not included in the candidate key, that are only partially functionally dependent on one of the candidate keys are referred to as partial dependencies.
A relational table must conform to the following conditions in order to be in second normal form:
First, the table needs to be in regular form.
It must be completely functionally dependent on the primary key, with no partial dependencies allowed.
If there is a partial dependency, we can split the table in two and relocate the partially dependent characteristics to another table where they will fit in better.
To better understand partial dependency and how to normalize the table to the second normal form, let's use the following "EmployeeProjectDetail" table as a database normalization example with the solution:
<EmployeeProjectDetail>
Employee Code | Project ID | Employee Name | Project Name |
---|
101 | P03 | Jolly | Project103 |
101 | P01 | Jolly | Project101 |
102 | P04 | Rehan | Project104 |
103 | P02 | Snehi | Project102 |
Employee Code and Project ID are the table's primary attributes in the aforementioned table. Because Employee Code may identify Employee Name and Project Name can be determined by Project ID, we have partial dependencies in this table. As a result, the relational table mentioned above breaks the 2NF constraint.
The key characteristics that make up one or more candidate keys are the prime characteristics.
The EmployeeProjectDetail table can be divided into the following three tables to eliminate partial dependencies and normalize it into the second normal form:
Employee Code | Employee Name |
---|
101 | Jolly |
101 | Jolly |
102 | Rehan |
103 | Snehi |
<EmployeeProject>
Employee Code | Project ID |
---|
101 | P03 |
101 | P01 |
102 | P04 |
103 | P02 |
<ProjectDetail>
Project ID | Project Name |
---|
P03 | Project103 |
P01 | Project101 |
P04 | Project104 |
P02 | Project102 |
So, by splitting the EmployeeProjectDetail database into the EmployeeDetail, ProjectDetail, and EmployeeProject tables, we were able to convert it into 2NF. As you can see, the aforementioned tables adhere to the same two 2NF rules as 1NF, and each non-prime property completely depends on the primary key.
There is little doubt that the relations in 2NF are less redundant than those in 1NF. Due to the transitive reliance, the decomposed relations could still have one or more anomalies. The Third Normal Form's transitive dependencies will be eliminated.
3. 3NF (Third Normal Form)
Remove any fields that are independent of the key. Records in the table should not contain values that are not a part of the key for that record. In general, whenever a group of fields' contents may be relevant to more than one record in the table, think about putting those data in a different table.
For instance, the name and address of a candidate's institution may be listed in an employee recruitment table. For group mailings, however, you require a complete list of universities. There is no way to list universities that don't currently have any candidates if university information is kept in the Candidates table. Make a separate Universities table and use a university code key to connect it to the Candidates table.
EXCEPTION: Adhering to the third normal form is not always possible, despite being theoretically desirable. Create separate tables for cities, ZIP codes, sales reps, customer classes, and any other factor that might be repeated in many records if you have a customer database and want to eliminate any potential inter-field dependencies. Normalization in DBMS is worthwhile to pursue in theory. However, numerous small tables might cause performance to suffer or go beyond open file and RAM limits.
Applying the third normal form to regularly changing data may be more practical. If any dependent fields still exist, build your application to ask the user to confirm changes to all linked fields.
Employee Code | Employee Name | Employee Zipcode | Employee City |
---|
101 | Jolly | 110033 | Model Town |
101 | Jolly | 110044 | Badarpur |
102 | Rehan | 110028 | Naraina |
103 | Snehi | 110064 | Hari Nagar |
- Employee Code -> Employee City transitive dependency prevents the table mentioned above from being in 3NF because:
- Customer Code -> Customer Zipcode
- Employee City -> Employee Zipcode
Additionally, neither Employee City nor Employee Zipcode is the primary attribute.
We can divide the "EmployeeDetail" table into the following two tables to eliminate transitive dependency from it and normalize it into the third normal form:
Employee Code | Employee Name | Employee Zipcode |
---|
101 | Jolly | 110033 |
101 | Jolly | 110044 |
102 | Rehan | 110028 |
103 | Snehi | 110064 |
<EmployeeLocation>
Employee Zipcode | Employee City |
---|
110033 | Model Town |
110044 | Badarpur |
110028 | Naraina |
110064 | Hari Nagar |
By breaking the "EmployeeDetail" data down into the "EmployeeDetail" and "EmployeeLocation" tables, which are both in 2NF and do not have any transitive dependencies, we were able to transform the "EmployeeDetail" table into 3NF.
The 2NF and 3NF remove any redundant dependence on candidate keys and set certain additional restrictions on them. There may, however, still be certain dependencies that result in database redundancy. A stricter normal form called BCNF eliminates these redundant elements.
4. BCNF (Boyce-Codd Normal Form)
Boyce-Codd As it has more limitations than 3NF, Normal Form is an improved form of 3NF.
A relational table must conform to the following conditions to be in Boyce-Codd normal form:
The third normal form of the table is required.
X is the table's superkey for every non-trivial functional dependence X -> Y. As a result, if Y is a prime attribute, X cannot be a non-prime attribute.
A group of one or more attributes known as a superkey can be used to identify a row in a database table specifically.
To better understand how to normalize the table to the BCNF, let's use the following "EmployeeProjectLead" table as an example:
Employee Code | Project ID | Project Leader |
---|
101 | P03 | Grey |
101 | P01 | Christian |
102 | P04 | Hudson |
103 | P02 | Petro |
The preceding table satisfies all conventional forms up to 3NF. However, since its candidate key is "Employee Code, Project ID," it defies BCNF's criteria. Project Leader is a non-prime property for the non-trivial functional dependence Project Leader -> Project ID, whereas Project ID is a prime attribute. In BCNF, this is not permitted.
We divide the given table into three tables and then translate them into BCNF.
<EmployeeProject>
Employee Code | Project ID |
---|
101 | P03 |
101 | P01 |
102 | P04 |
103 | P02 |
<ProjectLead>
Project Leader | Project ID |
---|
Grape | P03 |
Christiano | P01 |
Harry | P04 |
Petric | P02 |
As a result, we divided the "EmployeeProjectLead" database into the "EmployeeProject" and the "ProjectLead" tables before converting it to BCNF.
5. 4NF (Fourth Normal Form)
It is in fourth normal form if no database table instance has two or more independent, multivalued pieces of data characterizing the relevant object.
If a relationship has no multivalued dependencies and is in Boyce Codd's normal form, it is said to be in 4NF.
The relationship is said to be a multivalued dependency if there are multiple values of B for a given value of A in a dependency A B.
You saw types of normalization in DBMS and how to perform normalization in DBMS with examples. It should give a good idea of going about it and making a better database model.
Why Do We Need DBMS Normalization?
DBMS Normalization is essential for maintaining high-quality data in DBMS. Even if issues aren't apparent, poor data can harm businesses relying on big data.
Low-quality data hinders marketing teams from personalizing campaigns and can damage brand reputation with simple errors like name capitalization. Sales teams also suffer from missing context, resulting in poor analysis and reduced sales. Lead scoring is negatively impacted, making accurate segmentation difficult for sales reps.
We hope that this post has piqued your interest in DBMS. You can check out the MongoDB Administration Training of KnowledgeHut can help you launch your career.
Types of Anomalies in DBMS
1. Insertion Anomaly
Imagine that we have a table with four columns. Student identification number, name, address, and grades. Now, even if the first three properties can be filled when a new student enrolls in school, the fourth attribute will have a NULL value because he doesn't yet have any grades.
2. Deletion Anomaly
This oddity suggests that crucial data was unnecessarily removed from the table. Suppose we have the following student data and a list of the courses they have taken: (student ID, Student Name, Course, address). Any entry pertaining to a student who leaves the school will be removed. Nevertheless, even if the course depends on the school and not the student, that deletion will also erase the course information.
The goal of normalization is to make the tables as granular as possible to avoid these problems. To put it simply, it attempts to divide tables into several tables and establish associations between them using keys.
3. Updation Anomaly
Consider a table with 10 columns, of which 2 are designated as employee Name and employee Address. Now, we need to update the table whenever an employee changes location. However, if the table is not normalized, one employee may have many entries, and if all of those entries are updated at once, one of them may be overlooked.
What is the Purpose of DBMS Normalization?
As data's usefulness to various types of businesses rises, the purpose of normalization in DBMS, the manner that data is organized when it is present in huge quantities, becomes even more critical. It is evident that good Data normalization of the database is used to get better results like:
- Overall business performance increases.
- Improving group analysis without being concerned with redundancy.
- Imagine the consequences if you failed to arrange your data and missed out on crucial growth opportunities because a website wouldn't load or a vice president didn't get your notes.
A business can gather all the information it needs from any source. However, without data normalization, most of it would just be wasted and not be of any real use of normalization in DBMS to the organization.
KnowledgeHut provides the lowest MongoDB dba course costs with an in-depth understanding of the KnowledgeHut MongoDB Course.
Advantages of DBMS Normalization
There are many benefits to normalizing a database. Some of the main advantages are as follows:
- Utilizing normalized database or data redundancy
- Duplication might be removed.
- Improved database design
- More robust database security
- Improved and expedited performance
- Reduced null values may result from normalization.
- results in a more compact database (since there is less data duplication or zero).
- Reduce/avoid problems caused by data alteration.
- It simplifies the questions.
- The database's structure is clearer and easier to understand.
- The database can have existing data added to it without any negative effects.
Disadvantages of Normalization in DBMS
There are various drawbacks to normalizing a database. A few disadvantages are as follows:
- When information is dispersed over many tables, it becomes necessary to link them together, extending the work. Additionally, the database becomes more intriguing to recognize.
- Tables will include codes rather than actual data since rewritten data will be saved as lines of numbers rather than actual data. As a result, the query table must constantly be consulted.
- Being designed for programs rather than ad hoc querying, the information model proves to be exceedingly difficult to query. It is made up of SQL that has been accumulated through time, and operating framework cordial query devices often carry out this task. As a result, it might be difficult to demonstrate knowledge and understanding without first comprehending the client's needs.
- The show's pace gradually slows down compared to the typical structural type.
- To successfully finish the standardization cycle, it is vital to have a thorough understanding of the many conventional structures. A bad plan with substantial irregularities and data inconsistencies can result from careless use.
Conclusion
Redundancy is eliminated using normal forms, which also improves database storage. Throughout this article, we have understood the different types of anomalies in DBMS and normalization forms. In 1NF, we examine the atomicity of a relation's properties. In 2NF, we look for partial relational dependencies. In 3NF, we examine a relation for transitive dependencies. In BCNF, we look for all functional dependents' superkeys in LHS.