Published on

Normalization

Authors
  • avatar
    Name
    Balaram Shiwakoti
    Twitter

When I was preparing for loksewa, this concept really confused me at first. Let me break this down based on what I've learned.

Introduction to Normalization

Normalization is a systematic process used in relational database design to organize table columns and primary keys to minimize data redundancy and improve data integrity..

Its primary goal is to eliminate redundant data (duplication) and ensure data dependencies are logical, meaning data is stored only once and is consistently updated..

Normalization helps in creating a database that is both efficient and reliable, reducing anomalies that can occur during data insertion, deletion, and updates.. Right, my friend helped me understand this. This actually became fun once I understood it.


why do we need normalization?

normalization is essential for several reasons:

  1. Reduce Data Redundancy:

    • Storing the same data multiple times wastes storage space and can lead to inconsistencies. Normalization ensures that each piece of data is stored only once. This was my weak point during prep.
  2. Improve Data Integrity:

    • By reducing redundancy, normalization helps maintain the accuracy and consistency of data. When data is stored in one place, updates are easier and less prone to errors.
  3. Here's the thing - eliminate anomalies:

    • normalization addresses three types of anomalies:
      • insertion anomaly: difficulty inserting new data if it depends on other data that isn't yet available.
      • deletion anomaly: loss of data when a record is deleted, even if that data is still needed.
      • update anomaly: inconsistencies arise when updating redundant data in multiple places.
  4. Better Database Design:

    • It leads to a more logical and organized database structure, making it easier to manage, query, and scale.
  5. Faster Queries (in some cases):

    • While heavily normalized databases might require more joins, reduced redundancy can sometimes lead to faster query performance due to smaller table sizes and less data to scan.

Functional Dependency

Before diving into Normal Forms, it's crucial to understand Functional Dependency.

Let me tell you what worked for me. - A functional dependency is a relationship between attributes in a table, where one attribute or a set of attributes uniquely determines another attribute. Well, i used to mix this up all the time.

we denote it as aba \rightarrow b, meaning attribute a functionally determines attribute b. If we know the value of A, we well, can uniquely determine the value of B..

  • Example: In a student table, StudentID \rightarrow StudentName. Knowing the StudentID allows us to uniquely identify the StudentName.

Normal Forms (NF)

Normalization is achieved by moving through a series of Normal Forms, each addressing specific types of data anomalies.

1. First Normal Form (1NF)

Rules:.

This was my favorite topic by the end.

  1. Each column must contain atomic (single) values. No multi-valued attributes.
  2. Each column must have a unique name.
  3. The order of data does not matter.
  • Example of non-1NF: A Students table where Student_Phone anyway, contains multiple phone numbers in a single cell.
  • Converting to 1NF: Create separate rows for each phone number or a separate table for phone numbers with a foreign key to the Students table.

2. Second Normal Form (2NF)

  • Rules:
    1. Must be in 1NF.
    2. All non-key attributes must be fully functionally dependent on the primary key. This means no partial dependencies.
    • Partial Dependency: A non-key attribute depends on only a part of the composite primary key, not the whole key.
    • Example of non-2NF: A Order_Details table with a composite primary key (OrderID, ProductID), where ProductName depends only on ProductID (part of the key).
    • Converting to 2NF: Create a new table for Products (ProductID, ProductName) and link it with Order_Details using ProductID as a foreign key.

3. Third Normal Form (3NF)

Rules:.

My teacher explained this three times before I got it.

  1. Must be in 2NF.
  2. No transitive dependencies of non-key attributes on the primary key.
  • Transitive Dependency: A non-key attribute depends on another non-key attribute, which in turn depends on the primary key (e.g., ABA \rightarrow B and BCB \rightarrow C, then ACA \rightarrow C is a transitive dependency).
  • Example of non-3NF: A Students table with StudentID, StudentName, DepartmentName, DepartmentHead. Here, DepartmentHead depends on DepartmentName, and DepartmentName depends on StudentID. So DepartmentHead transitively depends on StudentID.
  • Converting to 3NF: kind of Create a new table for Departments (DepartmentName, DepartmentHead) and link it with Students using DepartmentName as a foreign key.

4. Boyce-Codd Normal Form (BCNF)

Rules:.

My teacher explained this three times before I got it.

  1. Must be in 3NF.
  2. For every functional dependency ABA \rightarrow B, A must be a superkey. Right, (a superkey is any attribute or set of attributes that can uniquely identify a row in a table).
  • bcnf is a stricter form of 3nf. Here's the thing - it addresses cases where 3nf might still allow anomalies if a non-key attribute determines a part of the primary key.
  • example (where 3nf isn't bcnf): consider a table (studentid, course, instructor). If an instructor can teach multiple courses but a course is taught by only one instructor, and a student can take multiple courses, there might be a functional dependency like (Course, Instructor) \rightarrow StudentID. If (StudentID, Course) is the primary key, Instructor determines Course which is part of the primary key. BCNF would require splitting this table.
  • Converting to BCNF: Decompose the table into smaller tables until every determinant is a superkey.

Denormalization

While normalization is crucial for data integrity, sometimes, for performance reasons (especially in read-heavy applications or data warehousing), denormalization is applied.

  • Denormalization is the process of intentionally introducing redundancy into a database to improve read performance, often by combining data from multiple tables. It sacrifices some data integrity and redundancy reduction for faster query execution.. When to use: Reporting systems, data warehouses, or applications where query speed is prioritized over strict normalization.. I was worried about this topic.

Tips That Worked for Me

  • Normalization is a design principle, not a hard-and-fast rule that must always be followed to its extreme.
  • The choice of normal form (usually up to 3NF or BCNF) depends on the specific application's requirements and performance considerations.

Common Loksewa Questions

During my exam prep, I noticed these questions keep showing up:

  1. "What is the main goal of normalization?"
  • Answer: To reduce data redundancy and improve data integrity.
  • Tip: This appeared in my practice tests multiple times
  1. "Which Normal Form eliminates partial dependencies?"
  • Answer: Second Normal Form (2NF).
  • Tip: This appeared in my practice tests multiple times
  1. "What anomaly is related to deleting a record losing unintended data?"
  • Answer: Deletion Anomaly.
  • Tip: This appeared in my practice tests multiple times
  1. "What is a superkey in the context of BCNF?"
  • Answer: Any attribute or set of attributes that can uniquely identify a row in a table.
  • Tip: This appeared in my practice tests multiple times