Functional Dependency and Normalization
Functional Dependency
Definition: A functional dependency occurs when one attribute (or set of attributes) in a relation uniquely determines another attribute. It is written A -> B, which is read as "B is functionally dependent upon A." Example: In a table listing employee characteristics, including Social Security Number (SSN) and name, name is functionally dependent upon SSN (SSN -> name) because an employee's name can be uniquely determined from their SSN. The reverse (name -> SSN) does not hold, because more than one employee can have the same name but different SSNs.
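An FD is a constraint on every legal state of a relation, so a single table instance can refute an FD but never prove it. The following minimal Python sketch checks whether one instance is consistent with a proposed FD, using the SSN example. The helper name `fd_holds` and the employee rows are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Dict[str, Hashable]

def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Return True if this instance satisfies the FD lhs -> rhs.

    The FD is violated when two rows agree on every lhs attribute
    but disagree on some rhs attribute.
    """
    seen: Dict[Tuple[Hashable, ...], Tuple[Hashable, ...]] = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs value and returns it;
        # a later row with a different rhs value breaks the FD.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical employee rows: two different employees share a name.
employees: List[Row] = [
    {"ssn": "111-11-1111", "name": "A. Kumar"},
    {"ssn": "222-22-2222", "name": "A. Kumar"},
    {"ssn": "333-33-3333", "name": "B. Rao"},
]

print(fd_holds(employees, ["ssn"], ["name"]))  # True
print(fd_holds(employees, ["name"], ["ssn"]))  # False
```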
Functional dependencies also arise in relationships. Let C be the primary key of an entity and D be the primary key of another entity, and let the two entities have a relationship. If the relationship is one-to-one, we must have C -> D and D -> C. If the relationship is many-to-one, we have C -> D but not D -> C. For a many-to-many relationship, no functional dependency holds between C and D. For example, if C is student number and D is subject number, neither determines the other. If, however, we were also storing marks and grades in the database, we would have
(student_number, subject_number) -> marks
and we might have
marks -> grades
The second functional dependency above assumes that grades depend only on marks. This may not always be true, since the instructor may take other considerations into account when assigning grades, such as the class average mark. As another example, the student database that we discussed earlier has the following functional dependencies:
sno -> sname
sno -> address
cno -> cname
cno -> instructor
instructor -> office
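The listed FDs imply further ones. For example, cno -> instructor and instructor -> office together give cno -> office. The standard attribute-closure algorithm finds everything that a set of attributes determines. The following Python sketch implements it. The helper names `fd` and `closure` are our own and do not come from any library.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names, e.g. fd("sno", "sname")."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Compute the attribute closure of attrs under fds.

    Repeatedly apply any FD whose left side is already contained in the
    result until no FD adds a new attribute.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

student_fds = [
    fd("sno", "sname"),
    fd("sno", "address"),
    fd("cno", "cname"),
    fd("cno", "instructor"),
    fd("instructor", "office"),
]

print(sorted(closure({"cno"}, student_fds)))
# ['cname', 'cno', 'instructor', 'office']
print(sorted(closure({"sno", "cno"}, student_fds)))
# ['address', 'cname', 'cno', 'instructor', 'office', 'sname', 'sno']
```

Because the closure of {sno, cno} contains every attribute, (sno, cno) would serve as a key for a relation holding all seven attributes.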
Functional dependencies allow us to express constraints that we cannot express with superkeys. Consider the schema
Loan-info-schema = (loan-number, branch-name, customer-name, amount), which is a simplification of the Lending-schema that we saw earlier. The set of functional dependencies that we expect to hold on this relation schema is
loan-number -> amount
loan-number -> branch-name
We would not, however, expect the functional dependency loan-number -> customer-name to hold, since a loan may be made jointly to more than one customer.
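A joint loan puts the same loan-number on several rows, so loan-number is not a superkey of Loan-info-schema. Even so, loan-number -> amount and loan-number -> branch-name should still hold. The following sketch checks these FDs against a small hypothetical instance in which every row is invented. The `fd_holds` helper is our own and is not a library function.

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]

def fd_holds(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """True if no two rows agree on lhs but differ on rhs."""
    seen: Dict[tuple, tuple] = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True

# Hypothetical instance: loan L-23 is held jointly by two customers.
loan_info: List[Row] = [
    {"loan-number": "L-23", "branch-name": "Redwood", "customer-name": "Smith", "amount": 2000},
    {"loan-number": "L-23", "branch-name": "Redwood", "customer-name": "Jackson", "amount": 2000},
    {"loan-number": "L-93", "branch-name": "Mianus", "customer-name": "Curry", "amount": 500},
]

print(fd_holds(loan_info, ["loan-number"], ["amount"]))         # True
print(fd_holds(loan_info, ["loan-number"], ["branch-name"]))    # True
print(fd_holds(loan_info, ["loan-number"], ["customer-name"]))  # False
```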
Anomalies in Database:
Database anomalies are problems in relations that occur due to redundancy, that is, the duplication of data. These anomalies affect inserting, deleting and modifying data in the relations, and important data may be lost when a relation that contains anomalies is updated. It is important to remove these anomalies so that the relations can be processed without such problems. There are three types of anomalies.
Insert Anomaly: a fact cannot be recorded without also recording some other, unrelated fact, because the related record it must be stored with does not exist yet.
Delete Anomaly: deleting some information also deletes valuable related information stored in the same rows.
Update Anomaly: a fact stored in many rows must be changed in every one of them. Missing a row leaves the data inconsistent.
Example of a database anomaly: suppose a hospital database is poorly normalized, so that patients and doctors are stored in the same table. Doctors and patients are separate entities, yet deleting a doctor's record also deletes the patient records stored with it, and vice versa.
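The update and delete anomalies can be reproduced on a toy table. The following Python sketch uses a hypothetical, poorly normalized hospital table. The table layout, doctor names and phone numbers are invented for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, str]

# Hypothetical poorly normalized table: each row mixes a doctor fact
# (doctor_phone) with a patient fact (patient).
visits: List[Row] = [
    {"doctor": "Dr. Mehta", "doctor_phone": "x101", "patient": "P1"},
    {"doctor": "Dr. Mehta", "doctor_phone": "x101", "patient": "P2"},
    {"doctor": "Dr. Iyer", "doctor_phone": "x205", "patient": "P3"},
]

def phone_of(rows: List[Row], doctor: str) -> Optional[str]:
    """Return the first phone stored for doctor, or None if no row mentions them."""
    for r in rows:
        if r["doctor"] == doctor:
            return r["doctor_phone"]
    return None

# Update anomaly: Dr. Mehta's phone is stored twice. Changing only one row
# leaves two conflicting values in the table.
visits[0]["doctor_phone"] = "x199"
mehta_phones = {r["doctor_phone"] for r in visits if r["doctor"] == "Dr. Mehta"}
print(len(mehta_phones))  # 2

# Delete anomaly: removing patient P3 also removes the only record of Dr. Iyer.
visits = [r for r in visits if r["patient"] != "P3"]
print(phone_of(visits, "Dr. Iyer"))  # None

# Insert anomaly (not executed): a new doctor with no patients yet has no row
# to be stored in unless the patient field is left empty.
```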
Normalization:
Normalization is the process of efficiently organizing data in a database. The normalization process has two goals: eliminating redundant data (for example, storing the same data in more than one table) and ensuring that data dependencies make sense (only storing related data in a table). Both goals are worthwhile, because they reduce the amount of space a database consumes and ensure that data is stored logically. Reason for normalization: to prevent possible corruption of the database stemming from update anomalies (insertion, deletion, modification).
Normalization is a formal method that identifies relations based on their primary key and the functional dependencies (constraints between attributes) among their attributes. Functional dependency: describes the relationship between attributes in a relation. If A and B are attributes of a relation R, B is functionally dependent on A (denoted A -> B) if each value of A in R is associated with exactly one value of B in R.
Determinant: the attribute or set of attributes on the left-hand side of the arrow.
Types of Normal Forms:
First normal form (1NF): an entity type is in 1NF when it contains no repeating groups of data.
Second normal form (2NF): an entity type is in 2NF when it is in 1NF and all of its non-key attributes are fully dependent on its primary key.
Third normal form (3NF): an entity type is in 3NF when it is in 2NF and all of its non-key attributes are directly (not transitively) dependent on the primary key.
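The 1NF rule can be illustrated by flattening a repeating group. The following Python sketch uses a hypothetical order record whose "items" field holds a list of (item_code, quantity) pairs. The field names and values are invented, and the function `to_1nf` is our own.

```python
from typing import Any, Dict, List

# Hypothetical un-normalized record: one order holding a repeating group
# of (item_code, quantity) pairs in a single "items" field.
order: Dict[str, Any] = {
    "order_no": 1001,
    "order_date": "2024-01-15",
    "items": [("A12", 20), ("B07", 5), ("C33", 8)],
}

def to_1nf(rec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten the repeating group: one row per item, every field atomic."""
    return [
        {"order_no": rec["order_no"], "order_date": rec["order_date"],
         "item_code": code, "qty": qty}
        for code, qty in rec["items"]
    ]

rows = to_1nf(order)
for row in rows:
    print(row)
```

The result is in 1NF, but order_date now repeats on every row of the same order. This redundancy is the kind of partial dependency that the 2NF rule removes.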
1. Dependence among attributes in a relation
2. Identification of an attribute or a set of attributes as the key of a relation
3. Multivalued dependency between attributes
Thus the relation is not in 2NF. It can be transformed to 2NF by splitting it into three relations, as shown in table 3. In table 3, the relation Orders has Order no. as the key. The relation Order details has the composite key Order no. and Item code. In both relations the non-key attributes are functionally dependent on the whole key. Observe that transforming to 2NF relations removes the repetition of Order date (table 1). Further, if an order for an item is cancelled, the price of the item is not lost. For example, if Order no. 1886 for Item code 4629 is cancelled in table 1, then the fourth row is removed and the price of the item is lost. In table 3, only the fourth row of table 3(b) is removed. The item price is not lost, as it is available in table 3(c), and the data of the order is not lost, as it is in table 3(a).
These 2NF relations meet all the "ideal" conditions specified. Observe that the three relations obtained are self-contained, and there is no duplication of data within a relation.
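The 2NF test amounts to looking for a non-key attribute that is determined by only part of a composite key. The following Python sketch runs that test with attribute closure on an assumed layout of the order relation. Table 1 itself is not reproduced here, so the attribute names, including a quantity attribute `qty`, are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key: Set[str], attrs: Set[str],
                         fds: List[FD]) -> List[Tuple[Tuple[str, ...], str]]:
    """Non-key attributes determined by a proper subset of the key (2NF violations)."""
    nonkey = attrs - key
    found = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for a in sorted(nonkey & closure(set(part), fds)):
                found.append((part, a))
    return found

# Assumed layout of the order relation (attribute names hypothetical).
attrs = {"order_no", "order_date", "item_code", "qty", "price"}
key = {"order_no", "item_code"}
fds = [
    fd("order_no", "order_date"),
    fd("item_code", "price"),
    fd("order_no item_code", "qty"),
]

print(partial_dependencies(key, attrs, fds))
# [(('item_code',), 'price'), (('order_no',), 'order_date')]
```

Each reported pair points to one of the split-off relations. The pair for order_no leads to Orders, and the pair for item_code leads to the item-price relation. The dependency on the whole key stays in Order details.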
Thus it is in 2NF. If it is known that in the college all first year students are accommodated in Ganga hostel, all second year students in Kaveri, all third year students in Krishna, and all fourth year students in Godavari, then the non-key attribute Hostel name is dependent on the non-key attribute Year. This dependency is shown in figure 6.
Observe that, given a student's year, the hostel is known, and vice versa. The dependency of hostel on year leads to duplication of data, as is evident from table 4. If it is decided to move all first year students to Kaveri hostel and all second year students to Ganga hostel, this change must be made in many places in table 4. Also, when a student's year of study changes, the change of hostel must also be recorded in table 4. This is undesirable. A relation is in 3NF if it is in 2NF and no non-key attribute is functionally dependent on any other non-key attribute. Table 4 is thus not in 3NF. To transform it to 3NF, we introduce another relation that contains the functionally related non-key attributes. This is shown in table 5.
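The 3NF condition and the hostel decomposition can both be sketched in Python. Table 4 is not reproduced here, so the code assumes that it has the attributes roll_no (key), name, year and hostel. The student rows are hypothetical, and only the hostel names come from the text.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def nonkey_dependencies(key: Set[str], fds: List[FD]) -> List[FD]:
    """FDs in which non-key attributes determine another non-key attribute,
    which the 3NF rule stated in the text forbids."""
    return [
        (lhs, rhs - lhs - key)
        for lhs, rhs in fds
        if lhs.isdisjoint(key) and rhs - lhs - key
    ]

# Assumed attributes of table 4: roll_no (key), name, year, hostel.
key = {"roll_no"}
fds = [fd("roll_no", "name year"), fd("year", "hostel"), fd("hostel", "year")]
for lhs, rhs in nonkey_dependencies(key, fds):
    print(sorted(lhs), "->", sorted(rhs))
# ['year'] -> ['hostel']
# ['hostel'] -> ['year']

# 3NF decomposition: Student(roll_no, name, year) and Hostel(year, hostel).
students: List[Dict] = [  # hypothetical rows
    {"roll_no": 1, "name": "Ravi", "year": 1},
    {"roll_no": 2, "name": "Sita", "year": 1},
    {"roll_no": 3, "name": "Arun", "year": 2},
]
hostel_of_year: Dict[int, str] = {1: "Ganga", 2: "Kaveri", 3: "Krishna", 4: "Godavari"}

# Swapping the first- and second-year hostels now changes two entries in the
# Hostel relation instead of every affected student row.
hostel_of_year[1], hostel_of_year[2] = "Kaveri", "Ganga"

def hostel_of(roll_no: int) -> str:
    """Join Student with Hostel on year to recover a student's hostel."""
    student = next(s for s in students if s["roll_no"] == roll_no)
    return hostel_of_year[student["year"]]

print(hostel_of(1), hostel_of(3))  # Kaveri Ganga
```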
Let us consider another example. The relation Employee is given below, and its dependency diagram is shown in figure 7. Employee (Employee code, Employee name, Dept., Salary, Project no., Termination date of project). As can be seen from the figure, the termination date of a project is dependent on Project no., which is a non-key attribute. Thus this relation is not in 3NF. The 3NF relations are:
Employee (Employee code, Employee name, Dept., Salary, Project no.)
Project (Project no., Termination date)
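The Employee split follows a general pattern: given an offending FD X -> Y, keep R minus Y in one relation and put X together with Y in a new one. The following Python sketch applies that pattern. The helpers `split_on_fd` and `project` are our own, and the employee rows are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def split_on_fd(attrs: Sequence[str], lhs: Sequence[str],
                rhs: Sequence[str]) -> Tuple[List[str], List[str]]:
    """Decompose a schema on the FD lhs -> rhs.

    R1 keeps every attribute except rhs (it still contains lhs, so the two
    parts can be joined back); R2 holds lhs together with rhs.
    """
    r1 = [a for a in attrs if a not in rhs]
    r2 = list(lhs) + [a for a in rhs if a not in lhs]
    return r1, r2

def project(rows: List[Dict[str, object]], cols: List[str]) -> List[Tuple]:
    """Relational projection: keep cols, drop duplicate tuples, preserve order."""
    out: List[Tuple] = []
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in out:
            out.append(t)
    return out

employee = ["Employee code", "Employee name", "Dept.", "Salary",
            "Project no.", "Termination date of project"]
emp_schema, proj_schema = split_on_fd(employee, ["Project no."],
                                      ["Termination date of project"])
print(emp_schema)   # ['Employee code', 'Employee name', 'Dept.', 'Salary', 'Project no.']
print(proj_schema)  # ['Project no.', 'Termination date of project']

# Hypothetical rows: two employees on the same project repeat its end date.
rows = [
    {"Employee code": "E1", "Employee name": "Gupta", "Dept.": "Civil", "Salary": 5000,
     "Project no.": "P7", "Termination date of project": "2025-03-31"},
    {"Employee code": "E2", "Employee name": "Shah", "Dept.": "Civil", "Salary": 4500,
     "Project no.": "P7", "Termination date of project": "2025-03-31"},
]
print(project(rows, proj_schema))  # [('P7', '2025-03-31')]
```

After the split, the project's termination date is stored once, however many employees work on the project.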
The relation given in table 6 is in 3NF. Observe, however, that the names of Dept. and Head of Dept. are duplicated. Further, if Professor P2 resigns, rows 3 and 4 are deleted, and we lose the information that Rao is the Head of Department of Chemistry. The relation is normalized by creating a new relation for Dept. and Head of Dept. and deleting Head of Dept. from the Professor relation. The normalized relations are shown in table 7, and the dependency diagrams for these new relations in figure 8. The dependency diagram gives the important clue to this normalization step, as is clear from figures 8 and 9.
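This relation passes the 3NF test yet still has anomalies, because Dept. determines Head of Dept. even though Dept. is not a key of the relation. The following Python sketch flags every FD whose determinant is not a superkey, which is the test used by Boyce-Codd normal form. It rests on three assumptions. The Professor relation has only the attributes professor_code, dept and head_of_dept. Each department has one head, and that head heads only that department. Table 6 may in fact hold further attributes.

```python
from typing import FrozenSet, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]

def closure(attrs: Set[str], fds: List[FD]) -> Set[str]:
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def non_superkey_determinants(attrs: Set[str], fds: List[FD]) -> List[FD]:
    """FDs whose left-hand side does not determine every attribute of the relation."""
    return [(lhs, rhs) for lhs, rhs in fds if not closure(set(lhs), fds) >= attrs]

# Assumed Professor relation and FDs (see the assumptions above).
attrs = {"professor_code", "dept", "head_of_dept"}
fds: List[FD] = [
    (frozenset({"dept"}), frozenset({"head_of_dept"})),
    (frozenset({"head_of_dept"}), frozenset({"dept"})),
]
for lhs, rhs in non_superkey_determinants(attrs, fds):
    print(sorted(lhs), "->", sorted(rhs))
# ['dept'] -> ['head_of_dept']
# ['head_of_dept'] -> ['dept']

# After the split described in the text, Dept. is a key of the new relation.
dept_attrs = {"dept", "head_of_dept"}
print(non_superkey_determinants(dept_attrs, fds))  # []
```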
Table 8 gives a relation for this problem and figure 10 the dependency diagram(s).