Normalization
(This is a text version of the slides presented in class. The text
version may be more economical to print than the usual postscript
large size of the slides. The reference for almost all of this material is
the book by Kendall and Kendall, Systems Analysis and Design, Chapter 17.)
-------------------------------------------------------------------------
Normalization
Normalization = transformation of complex data stores into
smaller, simpler, stable data structures.
Purpose of normalization:
(1) to make each attribute have
equal importance (which allows indexing on any attribute), and
(2) to remove redundancy in the data (which facilitates maintenance of
consistency.
slide 1
---------------------------------------------------------------------
Three Steps of Normalization
Step 1: Flatten the relation, i.e., get rid of composite
attributes and multi-valued attributes --> 1NF
Step 2: Remove partial key dependencies --> 2NF
Step 3: Remove nonkey dependencies --> 3NF
slide 2
---------------------------------------------------------------------
Step 1: Flatten the relation --> 1NF
For example, the user view might be a table called
SALES-REPORT, of the form:
(Salesperson-No, Salesperson-Name, Sales-Area, Customer-record*)
where Customer-record* is a multivalued composite attribute:
(Customer-No, Customer-Name, Warehouse-No, Warehouse-location, Sales)*
A couple of rows of the table might look like:
Sp-No Sp-Name Area C-No C-Name W-No Ware-loc Sales
--------------------------------------------------------
3462 Waters West (18765 D Services 4 Fargo 13,540)
(18830 M. Levy 3 Bisma 10,600)
1234 Johnson East (45455 Compuhou 7 James 9,444)
(54512 Building 7 James 7,545)
(12132 Hydraler 9 Kitik 15,444)
--------------------------------------------------------
slide 3
---------------------------------------------------------------------
To convert to first-normal form is easy, simply combine each row in
the non-simple column with the items in the other columns to make
a row in the 1NF table:
Sp-No Sp-Name Area C-No C-Name W-No Ware-loc Sales
-------------------------------------------------------------
3462 Waters West 18765 D Services 4 Fargo 13,540
3462 Waters West 18830 M. Levys 3 Bismarck 10,600
1234 Johnson East 45455 Compuhou 7 Jamestown 9,444
1234 Johnson East 54512 Build Em 7 Jamestown 7,545
1234 Johnson East 12132 Hydraler 9 Kitikack 15,444
-------------------------------------------------------------
Now each attribute has
equal importance and may be indexed for quick access.
slide 4
---------------------------------------------------------------------
Step 2: Remove partial key dependencies --> 2NF
What is the key of the previous table?
--------------- ---------------
Since it is not a single attribute, it is called a compound key.
Do you see any attributes that depend on only part of the key?
Functional dependencies can be written in the form:
attribute1 --> attribute2
"attribute 1 determines attribute 2"
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
slide 5
---------------------------------------------------------------------
Step 2: Remove partial key dependencies --> 2NF
The key of the previous table:
Salesperson-No, Customer-No
Salesperson-No --> Salesperson-Name
Salesperson-No --> Sales-Area
So split the SALES-REPORT table into two tables
SALESPERSON (Salesperson-No, Salesperson-name, Sales-area), and
SALESPERSON-CUSTOMER (Salesperson-No, Customer-No, Customer-name,
Warehouse-No, Warehouse-loc, Sales-amount)
SALESPERSON
Salesperson-No Salesperson-name Sales-area
------------------------------------------
3462 Waters West
1234 Johnson East
etc.
-------------------------------------------
Is this table in 2NF...how do you know?
slide 6
---------------------------------------------------------------------
Step 2: Remove partial key dependencies --> 2NF
SALESPERSON-CUSTOMER table
Sp-No C-No C-Name W-No Warehouse-loc Sales
------------------------------------------------
3462 18765 Delta Services 4 Fargo 13,540
3462 18830 M. Levy Sons 3 Bismarck 10,600
1234 45455 Compuhouse 7 Jamestown 9,444
1234 54512 Building Sup. 7 Jamestown 7,545
1234 12132 Hydraler Co. 9 Kitikack 15,444
-----------------------------------------------
What is the key?
Is this table in 2NF?
Salesperson-No --> --------------
Customer-No --> ------------------
Customer-No --> ------------------
Customer-No --> ------------------
Customer-No --> ------------------
Salesperson-No, Customer-No --> --------------
slide 7
---------------------------------------------------------------------
Step 2 (cont.): Remove partial key dependencies --> 2NF
So split the Salesperson-Customer table into two:
SALES
(Salesperson-No, Customer-No, Sales-amount), and
CUSTOMER-WAREHOUSE
(Customer-No, Customer-name, Warehouse-No, Warehouse)
SALES
Salesperson-No Customer-No Sales-amount
------------------------------------------
3462 18765 13540
3462 18380 10600
3462 19242 9999
1234 45455 9444
1234 54512 7545
1234 12132 15444
etc.
-------------------------------------------
Is this table in 2NF?
slide 8
---------------------------------------------------------------------
Step 3: Remove non-key dependencies --> 3NF
CUSTOMER-WAREHOUSE table
C-No C-Name W-No Warehouse-loc.
----------------------------------
18765 Delta Services 4 Fargo
18830 M. Levy Sons 3 Bismarck
45455 Compuhouse 7 Jamestown
54512 Building Sup. 7 Jamestown
12132 Hydraler Co. 9 Kitikack
98765 Seward System 9 Kitikack
----------------------------------
Is this table in 2NF?
What is the key?
What are some of the dependencies?
Customer-No --> --------------
Customer-No --> --------------
Customer-No --> --------------
Customer-Name --> ------------
Warehouse-No --> ------------------
Warehouse-loc --> ------------------
slide 9
---------------------------------------------------------------------
Step 3: Remove non-key dependencies --> 3NF
So split the CUSTOMER-WAREHOUSE table into two:
CUSTOMER (Customer-No, Customer-Name, Warehouse-No), and
WAREHOUSE (Warehouse-No, Warehouse-location)
CUSTOMER table
C-No C-Name W-No
----------------------------------
18765 Delta Services 4
18830 M. Levy Sons 3
45455 Compuhouse 7
54512 Building Sup. 7
12132 Hydraler Co. 9
98765 Seward System 9
----------------------------------
Is this table in 3NF?
WAREHOUSE table
W-No Warehouse-loc.
----------------------------------
4 Fargo
3 Bismarck
7 Jamestown
9 Kitikack
----------------------------------
Is this table in 3NF?
slide 10
---------------------------------------------------------------------
So, the intial unnormalized relation SALES-REPORT
produced four 3NF relations:
SALES-REPORT
(Salesperson-No, Salesperson-Name, Sales-Area,
(Customer-No, Customer-Name, Warehouse-No, Warehouse-location, Sales)*)
SALESPERSON
(Salesperson-No, Salesperson-name, Sales-area)
SALES
(Salesperson-number, Customer-number, Sales-amount)
CUSTOMER
(Customer-number, Customer-name, Warehouse-number)
and
WAREHOUSE
(Warehouse-No, Warehouse-location)
Foreign key = a nonkey attribute in one relation which is a key
attribute in another relation
What is the foreign key in the CUSTOMER relation?
Underline it with a dotted line.
slide 11
---------------------------------------------------------------------