Implementing Data Integrity5
If checks are not applied while defining and creating tables, the data stored in the tables can become
incomplete, inconsistent, or redundant. For example, if you do not store complete address details for
all the employees, the data would not be useful. Similarly, if a database used by the Human Resources
department stores employee contact details in two separate tables, the details of an employee in one
table might not match the details in the other. This would result in inconsistency and confusion.
Therefore, it is important to ensure that the data stored in tables is complete and consistent. The
concept of maintaining the consistency and completeness of data is called data integrity. Data integrity
is enforced to ensure that the data in a database is accurate, consistent, and reliable. It is broadly
classified into the following categories:
Entity integrity: Ensures that each row can be uniquely identified by an attribute called the primary
key. The primary key column contains a unique value in each row. In addition, this column cannot
contain NULL values.
Consider a situation where there might be two candidates for an interview with the same name ‘Jack’.
By enforcing entity integrity, the two candidates can be identified by using the unique code assigned to
them. For example, one candidate can have the code 001 and the other candidate can have the code
002.
Domain integrity: Ensures that only a valid range of values is stored in a column. It can be enforced
by restricting the type of data, the range of values, and the format of data. For example, consider a
table called BranchOffice with a column called City that stores the names of the cities where the branch
offices are located. The offices are located in ‘Beijing’, ‘Nanjing’, ‘Hangzhou’, ‘Dalian’, ‘Suzhou’,
‘Chengdu’, and ‘Guangzhou’. By enforcing domain integrity, you can ensure that only valid values (as per
the list specified) are entered in the City column of the BranchOffice table. Therefore, the user will not
be allowed to store any other city names like ‘New York’ or ‘London’ in the City column of the
BranchOffice table.
Referential integrity: Ensures that each non-NULL value of a foreign key matches a value of the
corresponding primary key. For example, if a bicycle has been ordered and an entry is to be made in the
OrderDetail table, then that bicycle code must exist in the Product table. This ensures that an order is
placed only for a bicycle that is listed in the Product table.
User-defined integrity: Refers to a set of rules specified by a user, which do not belong to the
entity, domain, and referential integrity categories.
When creating tables, SQL Server allows you to maintain integrity by:
Applying constraints.
Applying rules.
Using user-defined data types.
Applying Constraints
Consider an example where a user entered a duplicate value in the EmployeeID column of the Employee
table. This would mean that two employees have the same employee ID, which would further result in
erroneous results when anybody queries the table. As a database developer, you can prevent this by
enforcing data integrity on the table by using constraints. Constraints define rules that must be followed
to maintain the consistency and correctness of data. A constraint can either be created while creating a
table or be added later. By default, when a constraint is added after the table is created, it checks the
existing data, and if there is any violation, the constraint is rejected. A constraint can be created by
using either the CREATE TABLE statement or the ALTER TABLE statement.
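A constraint can be defined on a column while creating a table by using the CREATE TABLE statement.
The syntax of adding a constraint at the time of table creation is:
CREATE TABLE table_name
(
column_name data_type CONSTRAINT constraint_name constraint_type
[, CONSTRAINT constraint_name constraint_type]
)
where, column_name specifies the name of the column, data_type specifies its data type,
constraint_name specifies the name of the constraint, and constraint_type specifies the type of the
constraint. SQL Server supports the following types of constraints:
Primary key constraint
Unique constraint
Foreign key constraint
Check constraint
Default constraint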
Primary Key Constraint
A primary key constraint is defined on a column or a set of columns whose values uniquely identify all
the rows in a table. These columns are referred to as the primary key columns. A primary key column
cannot contain NULL values because it is used to uniquely identify rows in a table. The primary key
constraint ensures entity integrity. You can define a primary key constraint while creating the table, or
you can add it later by altering the table. If you add a primary key constraint to a table that already
contains rows, the existing data in the column is screened, and if any duplicate values are found, SQL
Server rejects the constraint with an error. While defining a primary key constraint, you can specify a
name for the constraint. If a name is not specified, SQL Server automatically assigns a name to the
constraint. The syntax of applying the primary key constraint while creating a table is:
CREATE TABLE table_name
(
col_name data_type [CONSTRAINT constraint_name] PRIMARY KEY [CLUSTERED | NONCLUSTERED]
[, col_name data_type [, ...]]
)
In the Project table, you can add a primary key constraint while creating the table. You can use the
following statement to apply the primary key constraint:
CREATE TABLE HumanResources.Project (ProjectCode int CONSTRAINT
pkProjectCode PRIMARY KEY, ......)
The preceding statement will create the Project table with a primary key column, ProjectCode. You can
create a primary key using more than one column. For example, you can set the EmployeeID and the
LeaveStartDate columns of the EmployeeLeave table as a composite primary key. You can use the
following statement to apply the composite primary key constraint:
CREATE TABLE HumanResources.EmployeeLeave (EmployeeID int,
LeaveStartDate datetime CONSTRAINT cpkLeaveStartDate
PRIMARY KEY(EmployeeID, LeaveStartDate), .........)
The preceding statement creates the EmployeeLeave table with a composite primary key constraint on
EmployeeID and LeaveStartDate. The name of the constraint is cpkLeaveStartDate.
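To see the composite key at work, the following is a minimal T-SQL sketch on a cut-down, hypothetical copy of the table, dbo.DemoEmployeeLeave, that holds only the two key columns, so the AdventureWorks objects are left untouched. It assumes SQL Server 2016 or later (for DROP TABLE IF EXISTS). The same employee can take leave on several dates, but the same (EmployeeID, LeaveStartDate) pair cannot be stored twice.

```sql
-- Hypothetical, cut-down table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoEmployeeLeave;
CREATE TABLE dbo.DemoEmployeeLeave
(
    EmployeeID     int      NOT NULL,
    LeaveStartDate datetime NOT NULL,
    CONSTRAINT cpkDemoLeave PRIMARY KEY (EmployeeID, LeaveStartDate)
);

-- The same employee can appear more than once if the start dates differ.
INSERT INTO dbo.DemoEmployeeLeave VALUES (1, '20240101');
INSERT INTO dbo.DemoEmployeeLeave VALUES (1, '20240215');

BEGIN TRY
    -- Repeating an existing (EmployeeID, LeaveStartDate) pair violates the key.
    INSERT INTO dbo.DemoEmployeeLeave VALUES (1, '20240101');
END TRY
BEGIN CATCH
    PRINT CONCAT('Rejected, error ', ERROR_NUMBER(), ': ', ERROR_MESSAGE());
END CATCH;

SELECT EmployeeID, LeaveStartDate
FROM dbo.DemoEmployeeLeave
ORDER BY LeaveStartDate;   -- two rows
```

Error 2627 is the error SQL Server raises for a duplicate value in a PRIMARY KEY or UNIQUE constraint.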
Unique Constraint
The unique constraint is used to enforce uniqueness on non-primary key columns. A primary key
constraint column automatically includes a restriction for uniqueness. The unique constraint is similar to
the primary key constraint except that it allows one NULL value in the column. Multiple unique
constraints can be created on a table. The syntax of applying the unique constraint when creating a
table is:
CREATE TABLE table_name
(
col_name data_type [CONSTRAINT constraint_name] UNIQUE [CLUSTERED | NONCLUSTERED]
[, col_name data_type [, ...]]
[, [CONSTRAINT constraint_name] UNIQUE [CLUSTERED | NONCLUSTERED] (col_name [, col_name [, ...]])]
)
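The unique constraint has no worked example here, so the following minimal sketch uses a hypothetical dbo.DemoCandidate table that echoes the two candidates named Jack from the entity integrity discussion. The candidate code is the primary key, and the optional EmailAddress column (also hypothetical) carries a unique constraint. It assumes SQL Server 2016 or later. As the text states, one NULL is accepted in the unique column, but a second NULL is treated as a duplicate.

```sql
-- Hypothetical table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoCandidate;
CREATE TABLE dbo.DemoCandidate
(
    CandidateCode char(3)     CONSTRAINT pkDemoCandidate PRIMARY KEY,
    CandidateName varchar(30) NOT NULL,
    EmailAddress  varchar(50) NULL CONSTRAINT uqDemoEmail UNIQUE
);

INSERT INTO dbo.DemoCandidate VALUES ('001', 'Jack', 'jack@example.com');
INSERT INTO dbo.DemoCandidate VALUES ('002', 'Jack', NULL);   -- first NULL: accepted

BEGIN TRY
    -- A second NULL counts as a duplicate in a SQL Server unique constraint.
    INSERT INTO dbo.DemoCandidate VALUES ('003', 'Mary', NULL);
END TRY
BEGIN CATCH
    PRINT CONCAT('Rejected, error ', ERROR_NUMBER());
END CATCH;

BEGIN TRY
    -- A repeated non-NULL value is rejected as well.
    INSERT INTO dbo.DemoCandidate VALUES ('004', 'Anna', 'jack@example.com');
END TRY
BEGIN CATCH
    PRINT CONCAT('Rejected, error ', ERROR_NUMBER());
END CATCH;
```

If many rows must be allowed to leave such a column empty, a filtered unique index (CREATE UNIQUE INDEX ... WHERE EmailAddress IS NOT NULL, available from SQL Server 2008) is a common alternative to a unique constraint.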
Foreign Key Constraint
You can use the foreign key constraint to enforce referential integrity. In the preceding figure, ID is the
primary key column of the Customers table and Cust_ID is the foreign key column in the Orders table. A
foreign key constraint associates one or more columns (the foreign key) of a table with an identical set
of columns (the primary key columns) in another table on which a primary key constraint has been
defined.
For example, in the EmployeeLeave table of the HumanResources schema, you need to add the
foreign key constraint to enforce referential integrity. The EmployeeID column is set as the
primary key in the Employee table of the HumanResources schema. Therefore, you need to set
EmployeeID in the EmployeeLeave table as a foreign key. You can use the following statement
to apply the foreign key constraint in the EmployeeLeave table:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int CONSTRAINT fkEmployeeID FOREIGN KEY REFERENCES
HumanResources.Employee(EmployeeID),
...
... ...)
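You can also apply the foreign key constraint in the EmployeeLeave table by defining it at the
table level, as shown in the following statement:
CREATE TABLE HumanResources.EmployeeLeave
(
EmployeeID int,
...
...
CONSTRAINT fkEmployeeID FOREIGN KEY(EmployeeID) REFERENCES
HumanResources.Employee(EmployeeID) )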
The preceding statement creates the EmployeeLeave table with a foreign key constraint on the
EmployeeID column. The name of the constraint is fkEmployeeID.
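The following minimal sketch shows a foreign key rejecting an orphan row and protecting a referenced parent row. It uses two hypothetical tables, dbo.DemoEmployee and dbo.DemoLeave, in place of the AdventureWorks tables, and assumes SQL Server 2016 or later. The child table is dropped before the parent because a table cannot be dropped while a foreign key still references it.

```sql
-- Hypothetical tables; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoLeave;        -- drop the referencing (child) table first
DROP TABLE IF EXISTS dbo.DemoEmployee;
CREATE TABLE dbo.DemoEmployee
(
    EmployeeID int CONSTRAINT pkDemoEmployee PRIMARY KEY
);
CREATE TABLE dbo.DemoLeave
(
    EmployeeID     int CONSTRAINT fkDemoEmployeeID FOREIGN KEY
                       REFERENCES dbo.DemoEmployee (EmployeeID),
    LeaveStartDate datetime NOT NULL
);

INSERT INTO dbo.DemoEmployee VALUES (1);
INSERT INTO dbo.DemoLeave VALUES (1, '20240101');        -- employee 1 exists

BEGIN TRY
    INSERT INTO dbo.DemoLeave VALUES (99, '20240101');   -- there is no employee 99
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();   -- conflicts with the FOREIGN KEY constraint
END CATCH;

BEGIN TRY
    DELETE FROM dbo.DemoEmployee WHERE EmployeeID = 1;   -- still referenced by a leave row
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```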
A foreign key can also refer to the primary key column in the same table. Consider the following
EmployeeDet table.
The following statements can be used to insert the records in the EmployeeDet table:
The preceding statements execute successfully. Now, consider the following INSERT
statement:
INSERT INTO EmployeeDet VALUES (6, 10, 'Harry')
The preceding statement gives an error because the foreign key constraint prevents the insertion
of a record whose Mgr_ID value (10) does not exist as an employee ID in the table.
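The definition of the EmployeeDet table is not reproduced in this section, so the following is a hypothetical reconstruction for experimentation only: the self-referencing idea and the Mgr_ID column come from the text, while the table name dbo.DemoEmployeeDet and the Emp_ID and Emp_Name columns are assumptions. The column order follows the INSERT statement above (employee ID, manager ID, name), and the sketch assumes SQL Server 2016 or later.

```sql
-- Hypothetical reconstruction: the table name, Emp_ID and Emp_Name are assumptions;
-- only Mgr_ID comes from the text. Assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoEmployeeDet;
CREATE TABLE dbo.DemoEmployeeDet
(
    Emp_ID   int         CONSTRAINT pkDemoEmpDet PRIMARY KEY,
    Mgr_ID   int         NULL
             CONSTRAINT fkDemoEmpDetMgr REFERENCES dbo.DemoEmployeeDet (Emp_ID),
    Emp_Name varchar(30) NOT NULL
);

INSERT INTO dbo.DemoEmployeeDet VALUES (1, NULL, 'Jack');  -- no manager: NULL is allowed
INSERT INTO dbo.DemoEmployeeDet VALUES (2, 1, 'Mary');     -- reports to employee 1

BEGIN TRY
    INSERT INTO dbo.DemoEmployeeDet VALUES (6, 10, 'Harry');  -- employee 10 does not exist
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();   -- rejected by the self-referencing foreign key
END CATCH;
```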
However, you may assign a NULL value to Mgr_ID, as done in the first INSERT statement. In
addition to the foreign key constraint, you can apply the cascading referential integrity constraint
to the tables having foreign key relationships. The cascading referential integrity constraint
defines the action that SQL Server performs when an attempt is made to update or delete a row
in a table with a key referenced by a foreign key in another table. SQL Server supports the ON
DELETE and ON UPDATE clauses to apply the cascading referential integrity constraint. The
ON DELETE and ON UPDATE clauses can be used with the following options:
NO ACTION: Specifies that if an attempt is made to delete or update a primary key value that is
referenced by a foreign key, an error is raised and the delete or update operation is rolled back. This is
the default option.
CASCADE: Specifies that if an attempt is made to update or delete a primary key value, the
corresponding foreign key values are also updated, or the referencing rows are deleted, respectively.
SET NULL: Specifies that if an attempt is made to delete or update a primary key value, the
corresponding foreign key values are set to NULL. To apply this option, the foreign key columns must
allow NULL values.
SET DEFAULT: Specifies that if an attempt is made to delete or update a primary key value, the
corresponding foreign key values are set to their default values. However, to apply this option, all
foreign key columns must have a default definition.
For example, the EmployeeLeave table is associated with the Employee table through a foreign key
relationship. You can use the following statement to apply the cascading referential integrity constraint
on the EmployeeLeave table:
ALTER TABLE HumanResources.EmployeeLeave ADD CONSTRAINT rfkcEmployeeID
FOREIGN KEY(EmployeeID) REFERENCES HumanResources.Employee (EmployeeID)
ON DELETE NO ACTION ON UPDATE NO ACTION
In the preceding statement, the ON DELETE NO ACTION and ON UPDATE NO ACTION clauses ensure that
any attempt to delete or update an EmployeeID value in the Employee table that is still referenced by
rows in the EmployeeLeave table fails with an error.
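To contrast with NO ACTION, the following minimal sketch declares ON DELETE CASCADE and ON UPDATE CASCADE on a pair of hypothetical tables, dbo.DemoCascadeEmployee and dbo.DemoCascadeLeave. It assumes SQL Server 2016 or later. Changing an employee's ID is carried through to that employee's leave rows, and deleting an employee removes that employee's leave rows.

```sql
-- Hypothetical tables; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoCascadeLeave;     -- child first
DROP TABLE IF EXISTS dbo.DemoCascadeEmployee;
CREATE TABLE dbo.DemoCascadeEmployee
(
    EmployeeID int CONSTRAINT pkDemoCascadeEmp PRIMARY KEY
);
CREATE TABLE dbo.DemoCascadeLeave
(
    EmployeeID     int      NOT NULL,
    LeaveStartDate datetime NOT NULL,
    CONSTRAINT fkDemoCascade FOREIGN KEY (EmployeeID)
        REFERENCES dbo.DemoCascadeEmployee (EmployeeID)
        ON DELETE CASCADE ON UPDATE CASCADE
);

INSERT INTO dbo.DemoCascadeEmployee VALUES (1), (2);
INSERT INTO dbo.DemoCascadeLeave VALUES (1, '20240101'), (2, '20240301');

-- ON UPDATE CASCADE: the leave row of employee 1 now refers to employee 10.
UPDATE dbo.DemoCascadeEmployee SET EmployeeID = 10 WHERE EmployeeID = 1;

-- ON DELETE CASCADE: deleting employee 2 also deletes that employee's leave row.
DELETE FROM dbo.DemoCascadeEmployee WHERE EmployeeID = 2;

SELECT EmployeeID, LeaveStartDate FROM dbo.DemoCascadeLeave;  -- one row, EmployeeID = 10
```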
Check Constraint
A check constraint enforces domain integrity by restricting the values to be inserted in a column. It is
possible to define multiple check constraints on a single column. These are evaluated in the order in
which they are defined. The syntax of applying the check constraint while creating a table is:
CREATE TABLE table_name
(
col_name data_type [CONSTRAINT constraint_name] CHECK (expression)
[, col_name data_type [, ...]]
)
where, expression specifies the conditions that define the check to be made on the column. It can
include elements, such as arithmetic operators and relational operators, or keywords, such as IN,
LIKE, and BETWEEN. A single check constraint can be applied to multiple columns when it is defined at
the table level. For example, while entering project details, you want to ensure that the start date of
the project is less than or equal to the end date.
You can use the following statement to apply the check constraint on the Project table:
CREATE TABLE HumanResources.Project (
........
........
StartDate datetime, EndDate datetime, CONSTRAINT chkDate
CHECK (StartDate <= EndDate)
)
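The following minimal sketch exercises the same table-level check on a hypothetical dbo.DemoProject table, assuming SQL Server 2016 or later. A project that ends before it starts is rejected.

```sql
-- Hypothetical table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoProject;
CREATE TABLE dbo.DemoProject
(
    ProjectCode int      CONSTRAINT pkDemoProject PRIMARY KEY,
    StartDate   datetime,
    EndDate     datetime,
    CONSTRAINT chkDemoDate CHECK (StartDate <= EndDate)   -- table-level check
);

INSERT INTO dbo.DemoProject VALUES (1, '20240101', '20240630');   -- valid range
INSERT INTO dbo.DemoProject VALUES (2, '20240101', NULL);         -- accepted (see note)

BEGIN TRY
    INSERT INTO dbo.DemoProject VALUES (3, '20240630', '20240101');  -- ends before it starts
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();   -- conflicts with the CHECK constraint
END CATCH;
```

Project 2 is accepted because a check constraint rejects a row only when its expression evaluates to FALSE, and comparing a date with NULL yields UNKNOWN. If both dates are mandatory, the columns should also be declared NOT NULL.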
IN: To ensure that the values entered are from a list of constant expressions. The following statement
creates a check constraint, chkLeave, on the LeaveType column of the HumanResources.EmployeeLeave
table, thereby restricting the entries to valid leave types:
CREATE TABLE HumanResources.EmployeeLeave (EmployeeID int,
LeaveStartDate datetime CONSTRAINT cpkLeaveStartDate PRIMARY
KEY(EmployeeID, LeaveStartDate),
LeaveEndDate datetime NOT NULL,
LeaveReason varchar(100),
LeaveType char(2) CONSTRAINT chkLeave CHECK(LeaveType
IN('CL','SL','PL'))
)
The preceding statement ensures that the leave type can be any one of the three values: CL, PL, or SL.
Here, CL stands for Casual Leave, SL stands for Sick Leave, and PL stands for Privileged Leave.
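A check constraint can also be added to an existing table. For example, the following statement adds
the chkempID check constraint, which uses the LIKE keyword described next, to the Employee table. The
WITH NOCHECK option prevents the rows already in the table from being validated:
ALTER TABLE HumanResources.Employee WITH NOCHECK ADD CONSTRAINT chkempID CHECK
(EmployeeID LIKE '[A-E][0-9][0-9][0-9][0-9]')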
LIKE: To ensure that the values entered in specific columns follow a certain pattern. This can be
achieved by using wildcards. For example, the following statement creates a check constraint on the
DeptNo column of the Emp table:
CREATE TABLE Emp (......DeptNo char(4) CHECK (DeptNo LIKE '[0-9][0-9][0-9][0-9]'))
In the preceding statement, the check constraint specifies that the DeptNo column can contain only a
four-character value in which each character is a digit from 0 to 9.
BETWEEN: To specify a range of constant expressions by using the BETWEEN keyword. The upper and
lower boundary values are included in the range. For example, the following statement creates a check
constraint on the sal column of the EmpTable table:
CREATE TABLE EmpTable (......sal money CHECK (sal BETWEEN 20000 AND
80000) )
In the preceding statement, the check constraint specifies that the sal column can have a value only
between 20000 and 80000, including both boundary values. The rules to be followed while creating the
check constraint are:
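A check constraint that is added to an existing table with the WITH NOCHECK option does not check the
existing data in the table; it is applied only to rows that are inserted or updated afterward.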
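The IN, LIKE, and BETWEEN checks and the WITH NOCHECK option can be tried together in the following minimal sketch. The table dbo.DemoEmpChecks and its constraint names are hypothetical, and the sketch assumes SQL Server 2016 or later. Each failing INSERT breaks exactly one constraint, and the constraint added WITH NOCHECK is accepted even though an existing row violates it.

```sql
-- Hypothetical table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoEmpChecks;
CREATE TABLE dbo.DemoEmpChecks
(
    EmpName   varchar(30) NOT NULL,
    LeaveType char(2) CONSTRAINT chkDemoLeaveIn  CHECK (LeaveType IN ('CL', 'SL', 'PL')),
    DeptNo    char(4) CONSTRAINT chkDemoDeptLike CHECK (DeptNo LIKE '[0-9][0-9][0-9][0-9]'),
    sal       money   CONSTRAINT chkDemoSalRange CHECK (sal BETWEEN 20000 AND 80000)
);

-- Both boundary values of BETWEEN are accepted.
INSERT INTO dbo.DemoEmpChecks VALUES ('Jack', 'CL', '0042', 20000);
INSERT INTO dbo.DemoEmpChecks VALUES ('Mary', 'PL', '1234', 80000);

-- Each of these rows violates exactly one check constraint.
BEGIN TRY
    INSERT INTO dbo.DemoEmpChecks VALUES ('Anna', 'XL', '0042', 30000);   -- IN
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
BEGIN TRY
    INSERT INTO dbo.DemoEmpChecks VALUES ('Anna', 'CL', '42A1', 30000);   -- LIKE
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
BEGIN TRY
    INSERT INTO dbo.DemoEmpChecks VALUES ('Anna', 'CL', '0042', 90000);   -- BETWEEN
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;

-- WITH NOCHECK: Mary's existing salary (80000) breaks the new rule, but the
-- ALTER succeeds because existing rows are not validated.
ALTER TABLE dbo.DemoEmpChecks WITH NOCHECK
    ADD CONSTRAINT chkDemoSalCap CHECK (sal <= 50000);

-- SQL Server records that this constraint has not been verified against all rows.
SELECT name, is_not_trusted          -- is_not_trusted = 1
FROM sys.check_constraints
WHERE name = 'chkDemoSalCap';
```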
Default Constraint
A default constraint can be used to assign a constant value to a column when no value is specified for
that column in an INSERT statement, so the user need not insert values for such a column. Only one
default constraint can be created for a column, and the column cannot be an IDENTITY column. The
syntax of applying the default constraint while creating a table is:
CREATE TABLE table_name
(
col_name data_type [CONSTRAINT constraint_name] DEFAULT (constant_expression | NULL)
[, col_name data_type [, ...]]
)
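For example, you can use the following statement to apply a default constraint on the LeaveType column
of the EmployeeLeave table:
CREATE TABLE HumanResources.EmployeeLeave
(
.......
LeaveType char(2) CONSTRAINT chkLeave CHECK(LeaveType IN('CL','SL','PL'))
CONSTRAINT chkDefLeave DEFAULT 'PL'
)
The preceding statement creates the EmployeeLeave table with a default constraint on the LeaveType
column, where the default value is specified as PL. The name of the constraint is chkDefLeave.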
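The following minimal sketch, on a hypothetical dbo.DemoLeaveDefault table and assuming SQL Server 2016 or later, shows the two ways a default value is applied: by leaving the column out of the INSERT column list, or by supplying the DEFAULT keyword. An explicit value overrides the default.

```sql
-- Hypothetical table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoLeaveDefault;
CREATE TABLE dbo.DemoLeaveDefault
(
    EmployeeID int     NOT NULL,
    LeaveType  char(2) NOT NULL
        CONSTRAINT chkDemoLeaveType CHECK (LeaveType IN ('CL', 'SL', 'PL'))
        CONSTRAINT dfDemoLeaveType  DEFAULT 'PL'
);

INSERT INTO dbo.DemoLeaveDefault (EmployeeID) VALUES (1);                      -- column omitted
INSERT INTO dbo.DemoLeaveDefault (EmployeeID, LeaveType) VALUES (2, DEFAULT);  -- DEFAULT keyword
INSERT INTO dbo.DemoLeaveDefault (EmployeeID, LeaveType) VALUES (3, 'SL');     -- explicit value

SELECT EmployeeID, LeaveType
FROM dbo.DemoLeaveDefault
ORDER BY EmployeeID;   -- PL, PL, SL
```

Supplying NULL explicitly does not trigger the default; in this table such an INSERT would fail because LeaveType is declared NOT NULL.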
Applying Rules
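A rule enforces domain integrity for columns or user-defined data types. A rule is checked whenever an
INSERT or UPDATE statement attempts to store a value in a column to which the rule is bound, either
directly or through a user-defined data type. In other words, a rule specifies a restriction on the values
of a column or a user-defined data type. Rules are used to implement business-related restrictions or
limitations. Rules are a deprecated feature of SQL Server, and Microsoft recommends using check
constraints instead. A rule can be created by using the CREATE RULE statement. The syntax of the
CREATE RULE statement is:
CREATE RULE [ schema_name. ] rule_name AS conditional_expression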
The variable specified in the conditional expression must be prefixed with the ‘@’ symbol. The
expression refers to the value that is being specified with the INSERT or UPDATE statement. In the
preceding example of the EmployeeLeave table, you applied the check constraint on the LeaveType
column to accept only three values: CL, SL, and PL. You can perform the same task by creating a rule, as
shown in the following statement:
USE AdventureWorks
GO
CREATE RULE ruleType AS @LeaveType IN ('CL', 'SL', 'PL')
GO
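After a rule is created, it must be bound to a column or a user-defined data type by using the
sp_bindrule stored procedure. The syntax of sp_bindrule is:
sp_bindrule rule, object_name [, futureonly_flag]
where, rule specifies the name of the rule that you want to bind. object_name specifies the column (in
the form table.column, optionally prefixed with the schema name) or the user-defined data type to
which you want to bind the rule. futureonly_flag applies only when you want to bind the rule to a
user-defined data type; when it is set to futureonly, existing columns of that data type do not inherit
the new rule. Consider the following example where the ruleType rule created for the LeaveType column
of the EmployeeLeave table is bound by using the sp_bindrule stored procedure. You can use the
following statement to bind the rule:
EXEC sp_bindrule 'ruleType', 'HumanResources.EmployeeLeave.LeaveType'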
Similarly, when you want to remove a rule, the sp_unbindrule stored procedure is used. For example, to
remove the rule from the LeaveType column of the EmployeeLeave table, you can use the following
statement to unbind the rule:
EXEC sp_unbindrule 'HumanResources.EmployeeLeave.LeaveType'
A rule can be deleted by using the DROP RULE statement. Before a rule can be dropped, it must be
unbound from all the columns and user-defined data types to which it is bound. The syntax for the DROP
RULE statement is:
DROP RULE rule_name
where, rule_name is the name of the rule to be dropped. For example, you can use the following
statement to delete the rule, ruleType:
DROP RULE ruleType
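The complete rule life cycle described above (create, bind, enforce, unbind, drop) can be sketched on a hypothetical dbo.DemoRuleLeave table with a hypothetical rule named ruleDemoLeaveType. The sketch assumes SQL Server 2016 or later (for the IF EXISTS clauses) and is meant to be run in SQL Server Management Studio or sqlcmd, which recognize GO as a batch separator. The GO lines are needed because CREATE RULE must be the only statement in its batch.

```sql
-- Hypothetical objects; rules are deprecated and are used here only to mirror the text.
DROP TABLE IF EXISTS dbo.DemoRuleLeave;
DROP RULE IF EXISTS ruleDemoLeaveType;
GO
CREATE TABLE dbo.DemoRuleLeave
(
    EmployeeID int     NOT NULL,
    LeaveType  char(2) NOT NULL
);
GO
-- CREATE RULE must be the only statement in its batch.
CREATE RULE ruleDemoLeaveType AS @LeaveType IN ('CL', 'SL', 'PL');
GO
-- Bind the rule to the column (object name in the form schema.table.column).
EXEC sp_bindrule 'ruleDemoLeaveType', 'dbo.DemoRuleLeave.LeaveType';
GO
INSERT INTO dbo.DemoRuleLeave VALUES (1, 'CL');       -- satisfies the rule

BEGIN TRY
    INSERT INTO dbo.DemoRuleLeave VALUES (2, 'XX');   -- violates the rule
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
GO
-- A bound rule must be unbound before it can be dropped.
EXEC sp_unbindrule 'dbo.DemoRuleLeave.LeaveType';
DROP RULE ruleDemoLeaveType;
GO
```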
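Using User-Defined Data Types
User-defined data types are based on the system data types supplied by SQL Server. In addition to its
base data type, a user-defined data type can have the following properties:
Defined nullability
Predefined default value that may be bound to the user-defined data type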
You can create user-defined data types by using the CREATE TYPE statement. The syntax of the CREATE
TYPE statement is:
CREATE TYPE [ schema_name. ] type_name { FROM base_type [ ( precision
[ , scale ] ) ] [ NULL | NOT NULL ] } [ ; ]
where, schema_name specifies the name of the schema to which the alias data type or the user-defined
data type belongs. type_name specifies the name of the alias data type or the user-defined data type.
base_type specifies the SQL Server-supplied data type on which the alias data type is based. For the
decimal and numeric base types, precision indicates the maximum number of decimal digits that can be
stored both to the left and to the right of the decimal point, and scale specifies a non-negative integer
that indicates the maximum number of decimal digits that can be stored to the right of the decimal
point. scale can be specified only if precision is specified and must be less than or equal to the
precision. NULL | NOT NULL specifies whether the data type can hold a null value. If not specified, NULL
is the default. The following statement creates a user-defined data type for descriptive columns:
USE AdventureWorks
GO
CREATE TYPE DSCRP FROM varchar(100) NOT NULL ;
GO
In the preceding statement, a user-defined data type, DSCRP, is created based on the varchar data type
with a size limit of 100 characters. It also specifies NOT NULL. Therefore, you can use this data type for
the columns that store descriptions, addresses, and reasons. For example, you can use the DSCRP data
type to store the data of the LeaveReason column of the EmployeeLeave table, as shown in the
following statement:
CREATE TABLE HumanResources.EmployeeLeave
(
.......
LeaveReason DSCRP, LeaveType char(2) CONSTRAINT chkLeave
CHECK(LeaveType IN('CL','SL','PL')) CONSTRAINT chkDefLeave DEFAULT
'PL'
)
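The following minimal sketch, assuming SQL Server 2016 or later and run in SSMS or sqlcmd (for GO), creates a hypothetical copy of the DSCRP type named dbo.DemoDSCRP and uses it in a hypothetical dbo.DemoTypeLeave table. Because the type is declared NOT NULL and the column definition does not override its nullability, the column rejects NULL.

```sql
-- Hypothetical objects; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoTypeLeave;    -- the table must go before the type it uses
DROP TYPE IF EXISTS dbo.DemoDSCRP;
GO
CREATE TYPE dbo.DemoDSCRP FROM varchar(100) NOT NULL;
GO
CREATE TABLE dbo.DemoTypeLeave
(
    EmployeeID  int NOT NULL,
    LeaveReason dbo.DemoDSCRP    -- inherits varchar(100) NOT NULL from the type
);

INSERT INTO dbo.DemoTypeLeave VALUES (1, 'Medical appointment');

BEGIN TRY
    INSERT INTO dbo.DemoTypeLeave VALUES (2, NULL);   -- the type does not allow NULL
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```

A column can still override the type's nullability by stating NULL or NOT NULL explicitly in its own definition.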
Modifying a Table
You need to modify tables when there is a requirement to add a new column, alter the data type of a
column, or add or remove constraints on the existing columns. For example, AdventureWorks stores the
leave details of all the employees in the EmployeeLeave table. According to the requirements, you need
to add another column named ApprovedBy in the table to store the name of the supervisor who
approved the leave of the employee. To implement this change, you can use the ALTER TABLE
statement.
The following statement adds a column named ApprovedBy to the EmployeeLeave table:
USE AdventureWorks
GO
ALTER TABLE HumanResources.EmployeeLeave
ADD ApprovedBy VARCHAR(30) NOT NULL
GO
In the preceding statement, the ApprovedBy column is added to store string values of up to 30
characters. Because the column is declared NOT NULL without a default value, this statement succeeds
only if the EmployeeLeave table is empty; if the table already contains rows, you need to either allow
NULL values in the column or add a DEFAULT definition for it. You can also add a computed column to a
table. A computed column contains values that are calculated rather than inserted. When you define a
computed column, you need to include the expression that calculates the value for each row of the
column. The values for computed columns are not specified in INSERT statements. The expression for a
computed column may refer to other non-computed columns in the same table. For example, if the
Orders table contains the UnitPrice column and the OrderQty column, the total cost of each order can be
calculated as UnitPrice * OrderQty, as shown in the following code snippet:
ALTER TABLE Orders
ADD TotalCost AS UnitPrice * OrderQty
The values generated by the expression of the computed column are not stored within the database.
Instead, the values are calculated every time they are required by a query. Therefore, the computed
column is virtual. However, you can specify that a computed column be persisted. The values of a
persisted column are stored in the table and are recalculated whenever any column referenced by the
expression changes. You can use the PERSISTED keyword to define the computed column as persisted,
as shown in the following code snippet:
ALTER TABLE Orders
ADD TotalCost AS UnitPrice * OrderQty PERSISTED
A computed column must be persisted if you are adding a check constraint to it or if the column is
marked as NOT NULL. To make a computed column persisted, its expression must be deterministic. This
means that the database engine must be able to verify that the expression always returns the same
result for the same input values. For example, the GETDATE() function is not deterministic because its
value changes every time the expression is evaluated.
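The following minimal sketch, assuming SQL Server 2016 or later and run in SSMS or sqlcmd (for GO), adds both kinds of computed column to a hypothetical dbo.DemoOrders table and shows that a nondeterministic expression such as GETDATE() cannot be persisted. The column names TotalCostStored and OrderedOn are assumptions. GO separators are used so that each batch is compiled after the columns it refers to exist.

```sql
-- Hypothetical table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoOrders;
CREATE TABLE dbo.DemoOrders
(
    OrderID   int   CONSTRAINT pkDemoOrders PRIMARY KEY,
    UnitPrice money NOT NULL,
    OrderQty  int   NOT NULL
);
GO
-- Virtual computed column: calculated each time it is queried.
ALTER TABLE dbo.DemoOrders ADD TotalCost AS UnitPrice * OrderQty;
-- Persisted computed column: stored, and recalculated when UnitPrice or OrderQty changes.
ALTER TABLE dbo.DemoOrders ADD TotalCostStored AS UnitPrice * OrderQty PERSISTED;
GO
-- GETDATE() is nondeterministic, so it cannot back a persisted column.
BEGIN TRY
    ALTER TABLE dbo.DemoOrders ADD OrderedOn AS GETDATE() PERSISTED;
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
GO
-- Only the base columns are supplied; both computed columns evaluate to 50.
INSERT INTO dbo.DemoOrders (OrderID, UnitPrice, OrderQty) VALUES (1, 12.50, 4);
SELECT OrderID, TotalCost, TotalCostStored FROM dbo.DemoOrders;

SELECT name, is_persisted
FROM sys.computed_columns
WHERE object_id = OBJECT_ID('dbo.DemoOrders');
```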
The following statement modifies the Description column of the HumanResources.Project table:
USE AdventureWorks
GO
ALTER TABLE HumanResources.Project
ALTER COLUMN Description varchar(100)
GO
In the preceding statement, the data type of the Description column is changed to varchar(100). The
following statement drops the column named LeaveStatus from the EmployeeLeave table:
USE AdventureWorks
GO
ALTER TABLE HumanResources.EmployeeLeave
DROP COLUMN LeaveStatus
The following statement adds a constraint called chkRegion to the EmpTable table:
USE AdventureWorks
GO
ALTER TABLE EmpTable ADD CONSTRAINT chkRegion
CHECK(Region IN ('South America', 'North America', 'Middle East Asia'))
GO
In the preceding statement, a CHECK constraint is added on the Region column. While modifying a table,
you can drop a constraint when it is not required. You can perform this task by using the DROP
CONSTRAINT clause of the ALTER TABLE statement.
The following statement drops the default constraint, chkDefLeave of the EmployeeLeave table:
USE AdventureWorks
GO
ALTER TABLE HumanResources.EmployeeLeave
DROP CONSTRAINT chkDefLeave
GO
In the preceding statement, the chkDefLeave constraint is dropped from the EmployeeLeave table.
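As described earlier in this chapter, a constraint added to an existing table checks the data already in it unless the WITH NOCHECK option is used. The following minimal sketch (assuming SQL Server 2016 or later) uses a hypothetical dbo.DemoRegion table: the first attempt to add a Region check fails because of an existing row, the same constraint is accepted after that row is corrected, and the constraint is then dropped with ALTER TABLE.

```sql
-- Hypothetical table; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoRegion;
CREATE TABLE dbo.DemoRegion
(
    OfficeID int         CONSTRAINT pkDemoRegion PRIMARY KEY,
    Region   varchar(30) NOT NULL
);
INSERT INTO dbo.DemoRegion VALUES (1, 'South America'), (2, 'Europe');

-- Existing rows are validated by default, so 'Europe' makes this ALTER fail.
BEGIN TRY
    ALTER TABLE dbo.DemoRegion ADD CONSTRAINT chkDemoRegion
        CHECK (Region IN ('South America', 'North America', 'Middle East Asia'));
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();   -- conflicts with the CHECK constraint
END CATCH;

-- After the offending row is corrected, the same constraint is accepted.
UPDATE dbo.DemoRegion SET Region = 'North America' WHERE OfficeID = 2;
ALTER TABLE dbo.DemoRegion ADD CONSTRAINT chkDemoRegion
    CHECK (Region IN ('South America', 'North America', 'Middle East Asia'));

-- Drop the constraint when it is no longer required.
ALTER TABLE dbo.DemoRegion DROP CONSTRAINT chkDemoRegion;
```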
Renaming a Table
You can rename a table whenever required. The sp_rename stored procedure is used to rename the
table. You can use sp_rename to rename any database object, such as a table, view, stored procedure, or
function. The syntax of the sp_rename stored procedure is:
sp_rename old_name, new_name
where, old_name is the current name of the object, and new_name is the new name of the object. The
new name is specified without a schema name because sp_rename renames the object within its current
schema. For example, the following statement renames the EmployeeLeave table to EmployeeVacation:
USE AdventureWorks
GO
EXEC sp_rename 'HumanResources.EmployeeLeave', 'EmployeeVacation'
GO
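The following minimal sketch (assuming SQL Server 2016 or later and run in SSMS or sqlcmd, for GO) renames a hypothetical dbo.DemoLeaveToRename table and confirms the result with OBJECT_ID. The new name is passed without a schema prefix; passing 'dbo.DemoVacation' instead would produce a table whose name literally contains the dot.

```sql
-- Hypothetical tables; assumes SQL Server 2016 or later.
DROP TABLE IF EXISTS dbo.DemoVacation;
DROP TABLE IF EXISTS dbo.DemoLeaveToRename;
CREATE TABLE dbo.DemoLeaveToRename (EmployeeID int NOT NULL);
GO
-- The old name may be schema-qualified; the new name must be a one-part name.
EXEC sp_rename 'dbo.DemoLeaveToRename', 'DemoVacation';
GO
SELECT OBJECT_ID('dbo.DemoLeaveToRename', 'U') AS OldNameId,   -- NULL
       OBJECT_ID('dbo.DemoVacation', 'U')      AS NewNameId;   -- not NULL
```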
Dropping a Table
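At times, when a table is not required, you need to delete it. When a table is deleted, all its associated
database objects, such as its indexes, triggers, constraints, and permissions, are deleted along with it.
You can delete a table by using the DROP TABLE statement. However, a table that is referenced by a
foreign key constraint in another table cannot be dropped until that constraint or the referencing table
is dropped. For example, the following statement deletes the EmployeeVacation table:
USE AdventureWorks
GO
DROP TABLE HumanResources.EmployeeVacation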