Normalization


(This is a text version of the slides presented in class.  The text
version may be more economical to print than the usual postscript
large size of the slides.  The reference for almost all of this material is 
the book by Kendall and Kendall, Systems Analysis and Design, Chapter 17.)

-------------------------------------------------------------------------
                         Normalization

Normalization = transformation of complex data stores into
smaller, simpler, stable data structures.

Purpose of normalization: 

(1) to make each attribute have
equal importance (which allows indexing on any attribute), and 

(2) to remove redundancy in the data (which facilitates maintenance of 
consistency.
                                                         slide 1
---------------------------------------------------------------------
                 Three Steps of Normalization


Step 1: Flatten the relation, i.e., get rid of composite
  attributes and multi-valued attributes --> 1NF

Step 2: Remove partial key dependencies --> 2NF

Step 3: Remove nonkey dependencies --> 3NF
                                                         slide 2
---------------------------------------------------------------------
           Step 1: Flatten the relation --> 1NF

For example, the user view might be a table called 
SALES-REPORT, of the form:

(Salesperson-No, Salesperson-Name, Sales-Area, Customer-record*)

where Customer-record* is a multivalued composite attribute:

(Customer-No, Customer-Name, Warehouse-No, Warehouse-location, Sales)*

A couple of rows of the table might look like:


Sp-No Sp-Name  Area   C-No   C-Name W-No Ware-loc  Sales
--------------------------------------------------------
3462 Waters   West (18765 D Services  4  Fargo   13,540)
                   (18830 M. Levy     3  Bisma   10,600)
1234 Johnson  East (45455 Compuhou    7  James    9,444)
                   (54512 Building    7  James    7,545)
                   (12132 Hydraler    9  Kitik   15,444)
--------------------------------------------------------
                                                         slide 3
---------------------------------------------------------------------

To convert to first-normal form is easy, simply combine each row in
the non-simple column with the items in the other columns to make
a row in the 1NF table:

Sp-No Sp-Name Area C-No  C-Name   W-No  Ware-loc      Sales
-------------------------------------------------------------
3462 Waters   West 18765 D Services  4  Fargo          13,540
3462 Waters   West 18830 M. Levys    3  Bismarck       10,600
1234 Johnson  East 45455 Compuhou    7  Jamestown       9,444
1234 Johnson  East 54512 Build Em    7  Jamestown       7,545
1234 Johnson  East 12132 Hydraler    9  Kitikack       15,444
-------------------------------------------------------------

Now each attribute has
equal importance and may be indexed for quick access.
                                                         slide 4
---------------------------------------------------------------------
         Step 2: Remove partial key dependencies --> 2NF


What is the key of the previous table?

 ---------------   ---------------

Since it is not a single attribute, it is called a compound key.

Do you see any attributes that depend on only part of the key?

Functional dependencies can be written in the form:

attribute1 --> attribute2

"attribute 1 determines attribute 2"

-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
-------------------- --> ----------------------
                                                         slide 5
---------------------------------------------------------------------

        Step 2: Remove partial key dependencies --> 2NF

The key of the previous table:
  Salesperson-No, Customer-No

Salesperson-No --> Salesperson-Name

Salesperson-No --> Sales-Area

So split the SALES-REPORT table into two tables

SALESPERSON (Salesperson-No, Salesperson-name, Sales-area), and

SALESPERSON-CUSTOMER (Salesperson-No, Customer-No, Customer-name, 
                        Warehouse-No, Warehouse-loc, Sales-amount)

SALESPERSON

Salesperson-No Salesperson-name Sales-area
------------------------------------------
3462               Waters        West
1234               Johnson       East
etc.
------------------------------------------- 

Is this table in 2NF...how do you know?


                                                         slide 6
---------------------------------------------------------------------


        Step 2: Remove partial key dependencies --> 2NF


SALESPERSON-CUSTOMER table

Sp-No   C-No   C-Name       W-No  Warehouse-loc  Sales
------------------------------------------------
3462  18765 Delta Services  4  Fargo         13,540
3462  18830 M. Levy Sons    3  Bismarck      10,600
1234  45455 Compuhouse      7  Jamestown      9,444
1234  54512 Building Sup.   7  Jamestown      7,545
1234  12132 Hydraler Co.    9  Kitikack      15,444
-----------------------------------------------

What is the key?
Is this table in 2NF?

Salesperson-No --> --------------
Customer-No --> ------------------
Customer-No --> ------------------
Customer-No --> ------------------
Customer-No --> ------------------
Salesperson-No, Customer-No --> --------------

                                                         slide 7
---------------------------------------------------------------------

      Step 2 (cont.): Remove partial key dependencies --> 2NF

So split the Salesperson-Customer table into two:

SALES 
(Salesperson-No, Customer-No, Sales-amount), and

CUSTOMER-WAREHOUSE 
(Customer-No, Customer-name, Warehouse-No, Warehouse)

SALES

Salesperson-No Customer-No Sales-amount
------------------------------------------
3462            18765        13540
3462            18380        10600
3462            19242         9999
1234            45455         9444
1234            54512         7545
1234            12132        15444
etc.
------------------------------------------- 
Is this table in 2NF?


                                                         slide 8
---------------------------------------------------------------------



        Step 3: Remove non-key dependencies --> 3NF

CUSTOMER-WAREHOUSE table

C-No   C-Name         W-No  Warehouse-loc.
----------------------------------
18765 Delta Services  4    Fargo    
18830 M. Levy Sons    3    Bismarck 
45455 Compuhouse      7    Jamestown
54512 Building Sup.   7    Jamestown
12132 Hydraler Co.    9    Kitikack 
98765 Seward System   9    Kitikack 
----------------------------------
Is this table in 2NF?

What is the key?
What are some of the dependencies?

Customer-No --> --------------
Customer-No --> --------------
Customer-No --> --------------
Customer-Name --> ------------
Warehouse-No --> ------------------
Warehouse-loc --> ------------------


                                                         slide 9
---------------------------------------------------------------------

            Step 3: Remove non-key dependencies --> 3NF

So split the CUSTOMER-WAREHOUSE table into two:

CUSTOMER (Customer-No, Customer-Name, Warehouse-No), and

WAREHOUSE (Warehouse-No, Warehouse-location)

CUSTOMER table

C-No   C-Name         W-No 
----------------------------------
18765 Delta Services  4      
18830 M. Levy Sons    3   
45455 Compuhouse      7  
54512 Building Sup.   7  
12132 Hydraler Co.    9  
98765 Seward System   9   
----------------------------------
Is this table in 3NF?



WAREHOUSE table

W-No  Warehouse-loc.
----------------------------------
 4    Fargo    
 3    Bismarck 
 7    Jamestown
 9    Kitikack 
----------------------------------
Is this table in 3NF?

                                                         slide 10
---------------------------------------------------------------------

So, the intial unnormalized relation SALES-REPORT 
produced four 3NF relations:

SALES-REPORT

(Salesperson-No, Salesperson-Name, Sales-Area, 
(Customer-No, Customer-Name, Warehouse-No, Warehouse-location, Sales)*)

SALESPERSON
(Salesperson-No, Salesperson-name, Sales-area)

SALES
(Salesperson-number, Customer-number, Sales-amount)

CUSTOMER
(Customer-number, Customer-name, Warehouse-number)

and 
WAREHOUSE
(Warehouse-No, Warehouse-location)

Foreign key = a nonkey attribute in one relation which is a key
    attribute in another relation

What is the foreign key in the CUSTOMER relation?
Underline it with a dotted line.

                                                         slide 11
---------------------------------------------------------------------