MDB Unit 1 Notes
UNIT – 1
Overview of DBMS and SQL: Introduction to DBMS and SQL, SQL Data
Definition and Data Types, Schema change statements in SQL, Specifying basic
constraints in SQL, Basic Queries in SQL, More Complex Queries in SQL.
1. INTRODUCTION TO DBMS
WHAT IS DATA?
The word 'data' comes from the word 'datum', which means 'a single piece of
information'. 'Data' is the plural of 'datum'.
In computing, Data is information that can be translated into a form for efficient
movement and processing. Data is interchangeable.
WHAT IS A DATABASE?
A database is an organized collection of data. You can organize the data into
tables, rows, and columns, and index it to make it easier to find relevant
information.
A database is set up so that a single set of software programs gives all users
access to the data.
There are many databases available like MySQL, Sybase, Oracle, MongoDB,
Informix, PostgreSQL, SQL Server, etc.
Characteristics of DBMS
Advantages of DBMS
Disadvantages of DBMS
o Cost of hardware and software: Running DBMS software requires a fast processor
and a large amount of memory.
o Size: A DBMS occupies a large amount of disk space and needs a large amount of
memory to run efficiently.
o Complexity: A database system introduces additional complexity and requirements.
o Higher impact of failure: A failure has a large impact because, in most
organizations, all the data is stored in a single database. If that database is
damaged by a power failure or by database corruption, the data may be lost
forever.
o Table
o Record/ Tuple
o Field/Column name /Attribute
o Instance
o Schema
o Keys
An RDBMS is a tabular DBMS that maintains the security, integrity, accuracy, and
consistency of the data.
Applications
Banking Management
Business Organization
Manufacturing
Education
Medical Field
Railways
2. INTRODUCTION TO SQL
SQL
o SQL stands for Structured Query Language. It is used for storing and
managing data in a relational database management system (RDBMS).
o It is a standard language for Relational Database System. It enables a user to
create, read, update and delete relational databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS Access and SQL Server
use SQL as their standard database language.
o SQL allows users to query the database in a number of ways, using English-
like statements.
Rules:
SQL process:
o When an SQL command is executed on an RDBMS, the system works out the
best way to carry out the request, and the SQL engine determines how to
interpret the task.
o Several components take part in this process, such as the optimization
engine, the query engine, the query dispatcher, and the classic query engine.
o The classic query engine handles all non-SQL queries, but the SQL query
engine does not handle logical files.
Characteristics of SQL
o SQL is easy to learn.
o SQL is used to access data from relational database management systems.
o SQL can execute queries against the database.
o SQL is used to describe the data.
o SQL is used to define the data in the database and manipulate it when
needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function in a database.
o SQL allows users to set permissions on tables, procedures, and views.
Advantages of SQL
High speed
No coding needed
Well defined standards
Portability
Interactive language
Multiple data view
Each type may include a special value called the null value. A null value
indicates an absent value that may exist but be unknown or that may not exist at
all. In certain cases, we may wish to prohibit null values from being entered, as
we shall see shortly.
The char data type stores fixed length strings. Consider, for example, an
attribute A of type char(10). If we store a string “Avi” in this attribute, 7 spaces
are appended to the string to make it 10 characters long. In contrast, if attribute B
were of type varchar(10), and we store “Avi” in attribute B, no spaces would be
added. When comparing two values of type char, if they are of different lengths
extra spaces are automatically added to the shorter one to make them the same
size, before comparison.
When comparing a char type with a varchar type, one may expect extra spaces
to be added to the varchar type to make the lengths equal, before comparison;
however, this may or may not be done, depending on the database system. As a
result, even if the same value “Avi” is stored in the attributes A and B above, a
comparison A=B may return false. We recommend you always use the varchar
type instead of the char type to avoid these problems.
SQL also provides the nvarchar type to store multilingual data using the
Unicode representation. However, many databases allow Unicode (in the UTF-8
representation) to be stored even in varchar types.
2. Basic Schema Definition
We define an SQL relation by using the create table command. The following
command creates a relation department in the database.
The relation created above has three attributes, dept_name, which is a character
string of maximum length 20, building, which is a character string of maximum
length 15, and budget, which is a number with 12 digits in total, 2 of which are
after the decimal point. The create table command also specifies that
the dept_name attribute is the primary key of the department relation.
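The command itself is not reproduced in these notes. Going by the description above, it could look like the statement in the following sketch. Running it through Python's built-in sqlite3 module, and checking the result with SQLite's PRAGMA table_info, are assumptions made only so the example can be executed; they are not part of the original text.

```python
import sqlite3

# In-memory SQLite database, used only so the statement can be run as-is.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    create table department
        (dept_name varchar(20),
         building  varchar(15),
         budget    numeric(12,2),
         primary key (dept_name))
    """
)

# PRAGMA table_info is SQLite-specific. Each row describes one column as
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[5]) for row in conn.execute("PRAGMA table_info(department)")]
print(columns)
```

The pk flag in the output is nonzero only for dept_name, which matches the primary-key declaration in the statement.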
The general form of the create table command is:
create table r
(A1 D1,
A2 D2,
...,
An Dn,
{integrity-constraint 1},
...,
{integrity-constraint k });
where r is the name of the relation, each Ai is the name of an attribute in the
schema of relation r, and Di is the domain of attribute Ai; that is, Di specifies the
type of attribute Ai along with optional constraints that restrict the set of allowed
values for Ai.
The semicolon shown at the end of the create table statements, as well as
at the end of other SQL statements later in this chapter, is optional in many SQL
implementations.
SQL supports a number of different integrity constraints. In this section, we
discuss only a few of them:
• primary key (Aj1 , Aj2 ,..., Ajm ): The primary-key specification says that
attributes Aj1 , Aj2 ,..., Ajm form the primary key for the relation. The
primary-key attributes are required to be nonnull and unique; that is, no tuple can
have a null value for a primary-key attribute, and no two tuples in the relation
can be equal on all the primary-key attributes. Although the primary-key
specification is optional, it is generally a good idea to specify a primary key
for each relation.
• foreign key (Ak1 , Ak2 ,..., Akn ) references s: The foreign key specification says
that the values of attributes (Ak1 , Ak2 ,..., Akn ) for any tuple in the relation
must correspond to values of the primary key attributes of some tuple in
relation s.
Figure 3.1 presents a partial SQL DDL definition of the university database we
use in the text. The definition of the course table has a declaration “foreign key
(dept_name) references department”. This foreign-key declaration specifies that
for each course tuple, the department name specified in the tuple must exist
in the primary key attribute (dept_name) of the department relation. Without
this constraint, it is possible for a course to specify a nonexistent department
name. Figure 3.1 also shows foreign key constraints on tables section, instructor
and teaches.
• not null: The not null constraint on an attribute specifies that the null value
is not allowed for that attribute; in other words, the constraint excludes the
null value from the domain of that attribute. For example, in Figure 3.1, the
not null constraint on the name attribute of the instructor relation ensures that
the name of an instructor cannot be null.
SQL prevents any update to the database that violates an integrity constraint.
For example, if a newly inserted or modified tuple in a relation has null values for
any primary-key attribute, or if the tuple has the same value on the primary-key
attributes as does another tuple in the relation, SQL flags an error and prevents the
update. Similarly, an insertion of a course tuple with a dept_name value that does
not appear in the department relation would violate the foreign-key constraint on
course, and SQL prevents such an insertion from taking place.
A newly created relation is empty initially. We can use the insert command
to load data into the relation. For example, if we wish to insert the fact that there
is an instructor named Smith in the Biology department with instructor_id 10211
and a salary of $66,000, we write:
insert into instructor
values (10211, 'Smith', 'Biology', 66000);
The values are specified in the order in which the corresponding attributes are
listed in the relation schema. The insert command has a number of useful features,
and is covered in more detail later.
We can use the delete command to delete tuples from a relation. The command
delete from student;
would delete all tuples from the student relation. Other forms of the delete
command allow specific tuples to be deleted.
To remove a relation from an SQL database, we use the drop table command.
The drop table command deletes all information about the dropped relation from
the database. The command
drop table r;
is a more drastic action than
delete from r;
The latter retains relation r, but deletes all tuples in r. The former deletes not only
all tuples of r, but also the schema for r. After r is dropped, no tuples can be
inserted into r unless it is re-created with the create table command.
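The difference between the two commands can be tried out with a short sketch. It uses Python's built-in sqlite3 module and a throwaway relation r with a single made-up attribute a; both are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (a integer)")
conn.execute("insert into r values (1)")
conn.execute("insert into r values (2)")

# delete from r: all tuples are removed, but the relation itself remains.
conn.execute("delete from r")
rows_after_delete = conn.execute("select count(*) from r").fetchone()[0]
conn.execute("insert into r values (3)")  # still possible, r exists

# drop table r: the tuples and the schema are both removed.
conn.execute("drop table r")
try:
    conn.execute("insert into r values (4)")
    insert_after_drop = "succeeded"
except sqlite3.OperationalError as err:
    insert_after_drop = f"failed: {err}"

print(rows_after_delete, insert_after_drop)
```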
We use the alter table command to add attributes to an existing relation. All
tuples in the relation are assigned null as the value for the new attribute. The form
of the alter table command is
alter table r add A D;
where r is the name of an existing relation, A is the name of the attribute to be
added, and D is the type of the added attribute. We can drop attributes from a
relation by the command
alter table r drop A;
where r is the name of an existing relation, and A is the name of an attribute of the
relation. Many database systems do not support dropping of attributes, although
they will allow an entire table to be dropped.
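A minimal sketch of adding an attribute, again using Python's built-in sqlite3 module as an assumed test bed. The relation r, its attributes, and the new attribute phone are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table r (id integer primary key, name varchar(20))")
conn.execute("insert into r values (1, 'Avi')")

# alter table r add A D;   here A is phone and D is varchar(15)
conn.execute("alter table r add phone varchar(15)")

# The tuple that existed before the change gets null for the new attribute.
existing_row = conn.execute("select id, name, phone from r").fetchone()
print(existing_row)
```

The existing tuple comes back with None (SQL null) in the phone position, as the text describes.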
4. SQL DATA TYPES
Data types are used to represent the nature of the data that can be stored in the
database table. For example, if we want to store string data in a particular column
of a table, we have to declare that column with a string data type.
• Bit-string: 2 types
• fixed length
BIT(n), where n is the maximum number of bits.
• varying length
BIT VARYING(n), where n is the maximum number of bits.
The default for n, the length of a character string or bit string,
is one.
• Literal bit strings are placed between single quotes but
preceded by a B; for example, B'10101'
The DROP SCHEMA statement allows you to delete a schema from a database.
The following shows the syntax of the DROP SCHEMA statement:
DROP SCHEMA [IF EXISTS] schema_name;
In this syntax:
1. First, specify the name of the schema that you want to drop. If the schema
contains any objects, the statement will fail. Therefore, you must delete all
objects in the schema before removing the schema.
2. Second, use the IF EXISTS option to conditionally remove the schema only
if the schema exists. Attempting to drop a non existing schema without
the IF EXISTS option will result in an error.
Now we want to add a column named "DateOfBirth" in the "Persons" table. We use
the following SQL statement:
Notice that the "DateOfBirth" column is now of type year and is going to hold a
year in a two- or four-digit format.
Next, we want to delete the column named "DateOfBirth" in the "Persons" table.
Syntax
To rename a column in an existing table, the SQL ALTER TABLE syntax is:
For Oracle:
Example
For MySQL:
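ALTER TABLE table_name CHANGE COLUMN old_name new_name column_definition;
MySQL 8.0 and later also accept:
ALTER TABLE table_name RENAME COLUMN old_name TO new_name;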
Example:
RENAME TABLE
Syntax
Example
6. CONSTRAINTS IN SQL
Now let us try to understand the different constraints available in SQL in more detail
with the help of examples. We will use MySQL database for writing all the queries.
1. NOT NULL
o NULL means empty, i.e., the value is not available.
o Whenever a table's column is declared as NOT NULL, then the value for that
column cannot be empty for any of the table's records.
o There must exist a value in the column to which the NOT NULL constraint
is applied.
NOTE: NULL does not mean zero. NULL means empty column, not even zero.
Create a student table and apply a NOT NULL constraint on one of the table's
columns while creating the table.
To verify that the not null constraint is applied to the table's column and the student
table is created successfully, we will execute the following query:
Example:
Suppose we have an existing table student, without any constraints applied to it.
Later, we decide to apply a NOT NULL constraint to one of the table's columns.
Then we will execute the following query:
To verify that the not null constraint is applied to the student table's column, we will
execute the following query:
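The queries for these examples are not reproduced in the notes. As a stand-in, here is a minimal sketch of a NOT NULL constraint in action. It runs on SQLite through Python's built-in sqlite3 module rather than on MySQL, and the column names StudentID and Student_FirstName are hypothetical. Instead of describing the table, it verifies the constraint by attempting an insert that should be refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT,"
    " Student_FirstName VARCHAR(20) NOT NULL)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

# A NULL in the NOT NULL column violates the constraint.
try:
    conn.execute("INSERT INTO student VALUES (2, NULL)")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(outcome, row_count)
```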
2. UNIQUE
o Duplicate values are not allowed in the columns to which the UNIQUE
constraint is applied.
o The column with the unique constraint will always contain a unique value.
o This constraint can be applied to one or more than one column of a table,
which means more than one unique constraint can exist on a single table.
o A UNIQUE constraint can also be added to a table that already exists.
Example:
Create a student table and apply a UNIQUE constraint on one of the table's columns
while creating the table.
To verify that the unique constraint is applied to the table's column and the student
table is created successfully, we will execute the following query:
Example:
Create a student table and apply a UNIQUE constraint on more than one of the
table's columns while creating the table.
To verify that the unique constraint is applied to more than one table's column and
the student table is created successfully, we will execute the following query:
Example:
Suppose we have an existing table student, without any constraints applied to it.
Later, we decide to apply a UNIQUE constraint to one of the table's columns. Then
we will execute the following query:
To verify that the unique constraint is applied to the table's column and the student
table is created successfully, we will execute the following query:
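A minimal sketch of the UNIQUE constraint, under the same assumptions as before: SQLite through Python's sqlite3 module instead of MySQL, and a hypothetical Student_Email column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT,"
    " Student_Email VARCHAR(40) UNIQUE)"
)
conn.execute("INSERT INTO student VALUES (1, 'asha@example.com')")

# A second row with the same e-mail address duplicates a UNIQUE value.
try:
    conn.execute("INSERT INTO student VALUES (2, 'asha@example.com')")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"

print(outcome)
```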
3. PRIMARY KEY
o The PRIMARY KEY constraint is a combination of the NOT NULL and UNIQUE
constraints.
o A column to which the PRIMARY KEY constraint is applied always contains
unique values and does not allow null values.
Example:
Create a student table and apply the PRIMARY KEY constraint while creating a
table.
To verify that the primary key constraint is applied to the table's column and the
student table is created successfully, we will execute the following query:
Example:
Consider we have an existing table student, without any constraints applied to it.
Later, we decided to apply the PRIMARY KEY constraint to the table's column.
Then we will execute the following query:
To verify that the primary key constraint is applied to the student table's column, we
will execute the following query:
4. FOREIGN KEY
o A foreign key is used to enforce referential integrity.
o When one table refers to another, the same column is present in both tables.
That column acts as the primary key in one table and as a foreign key in the
other table.
Example:
Create an employee table and apply the FOREIGN KEY constraint while creating a
table.
To create a foreign key on any table, first, we need to create a primary key on a
table.
To verify that the primary key constraint is applied to the employee table's column,
we will execute the following query:
Now, we will write a query to apply a foreign key on the department table referring
to the primary key of the employee table, i.e., Emp_ID.
Example:
Create an employee table and apply the FOREIGN KEY constraint with a constraint
name while creating a table.
To create a foreign key on any table, first, we need to create a primary key on a
table.
To verify that the primary key constraint is applied to the student table's column, we
will execute the following query:
To verify that the foreign key constraint is applied to the department table's column,
we will execute the following query:
Example:
To verify that the foreign key constraint is applied to the department table's column,
we will execute the following query:
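The following sketch shows a named foreign key on the department table that refers to Emp_ID in the employee table, as described above. It is only an illustration: it runs on SQLite through Python's sqlite3 module, where foreign keys are enforced only after PRAGMA foreign_keys = ON, and the constraint name emp_id_fk and the other column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch

# The referenced table needs its primary key first.
conn.execute("CREATE TABLE employee (Emp_ID INTEGER PRIMARY KEY, Emp_Name VARCHAR(40))")
conn.execute(
    "CREATE TABLE department ("
    " Dept_ID INTEGER PRIMARY KEY,"
    " Dept_Name VARCHAR(40),"
    " Emp_ID INTEGER,"
    " CONSTRAINT emp_id_fk FOREIGN KEY (Emp_ID) REFERENCES employee (Emp_ID))"
)

conn.execute("INSERT INTO employee VALUES (1, 'Ravi')")
conn.execute("INSERT INTO department VALUES (10, 'Sales', 1)")  # employee 1 exists

# Employee 99 does not exist, so this row breaks referential integrity.
try:
    conn.execute("INSERT INTO department VALUES (20, 'HR', 99)")
    outcome = "accepted"
except sqlite3.IntegrityError:
    outcome = "rejected"

print(outcome)
```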
5. CHECK
Example:
Create a student table and apply CHECK constraint to check for the age less than or
equal to 15 while creating a table.
Example:
Create a student table and apply CHECK constraint to check for the age less than or
equal to 15 and a percentage greater than 85 while creating a table.
To verify that the check constraint is applied to the age and percentage column, we
will execute the following query:
Example:
Consider we have an existing table student. Later, we decided to apply the CHECK
constraint on the student table's column. Then we will execute the following query:
To verify that the check constraint is applied to the student table's column, we will
execute the following query:
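A minimal sketch of the CHECK constraint from the second example (age at most 15 and percentage above 85). It assumes SQLite through Python's sqlite3 module instead of MySQL; the column names and the helper try_insert are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT,"
    " Age INT,"
    " Percentage INT,"
    " CHECK (Age <= 15 AND Percentage > 85))"
)

def try_insert(student_id, age, percentage):
    """Return True if the row is accepted, False if the CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO student VALUES (?, ?, ?)", (student_id, age, percentage)
        )
        return True
    except sqlite3.IntegrityError:
        return False

# Row 1 satisfies both conditions; row 2 is too old; row 3 scores too low.
results = [try_insert(1, 14, 90), try_insert(2, 16, 90), try_insert(3, 14, 80)]
print(results)
```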
6. DEFAULT
Whenever a default constraint is applied to a table's column and the user does not
specify a value for that column, the default value given in the constraint is
inserted into the column.
Example:
Create a student table and apply the default constraint while creating a table.
To verify that the default constraint is applied to the student table's column, we will
execute the following query:
Example:
To verify that the default constraint is applied to the student table's column, we will
execute the following query:
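A minimal sketch of the default constraint, assuming SQLite through Python's sqlite3 module. The column names and the default value 'unknown@example.com' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " StudentID INT,"
    " Student_Name VARCHAR(20),"
    " Student_Email VARCHAR(40) DEFAULT 'unknown@example.com')"
)

# No e-mail given: the default value is stored.
conn.execute("INSERT INTO student (StudentID, Student_Name) VALUES (1, 'Asha')")
# E-mail given explicitly: the default is not used.
conn.execute("INSERT INTO student VALUES (2, 'Binu', 'binu@example.com')")

emails = [
    row[0]
    for row in conn.execute("SELECT Student_Email FROM student ORDER BY StudentID")
]
print(emails)
```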
7. CREATE INDEX
CREATE INDEX is used to create an index on a table. Strictly speaking, it is a
statement rather than a constraint. Indexes are not visible to the user, but they
speed up searching and retrieval of data from the database.
Example:
Create an index on the student table and apply the default constraint while creating a
table.
To verify that the create index constraint is applied to the student table's column, we
will execute the following query:
Example:
To verify that the create index constraint is applied to the student table's column, we
will execute the following query:
To verify that the create index constraint is applied to the student table's column, we
will execute the following query:
There are five types of SQL commands: DDL, DML, DCL, TCL, and DQL.
o CREATE
o ALTER
o DROP
o TRUNCATE
Example:
b. DROP: It is used to delete both the structure and the records stored in the table.
Syntax
Example
c. ALTER: It is used to alter the structure of the database. The change could be
to modify the characteristics of an existing attribute or to add a new attribute.
Syntax:
EXAMPLE
d. TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is a SQL query. It is used to insert data into the
row of a table.
Syntax:
Or
For example:
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3'
Syntax:
For example:
DCL commands are used to grant and take back authority from any database user.
o Grant
o Revoke
Example
Example
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;
TCL commands can be used only with DML commands such as INSERT, DELETE, and
UPDATE.
In systems such as MySQL and Oracle, DDL operations like creating or dropping
tables are committed automatically, which is why TCL commands cannot be used
with them.
o COMMIT
o ROLLBACK
o SAVEPOINT
a. Commit: Commit command is used to save all the transactions to the database.
Syntax:
COMMIT;
Example:
b. Rollback: Rollback command is used to undo transactions that have not already
been saved to the database.
Syntax:
ROLLBACK;
Example:
Syntax:
SAVEPOINT SAVEPOINT_NAME;
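The three commands can be seen working together in the sketch below. It is an illustration under assumptions: it uses SQLite through Python's sqlite3 module, opens the connection with isolation_level=None so that BEGIN, COMMIT, and ROLLBACK are issued explicitly, and uses a made-up customers table and savepoint name sp1.

```python
import sqlite3

# isolation_level=None: the module does not manage transactions for us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO customers VALUES (1, 'Ramesh')")
conn.execute("COMMIT")                      # row 1 is now saved

conn.execute("BEGIN")
conn.execute("INSERT INTO customers VALUES (2, 'Khilan')")
conn.execute("SAVEPOINT sp1")
conn.execute("INSERT INTO customers VALUES (3, 'Kaushik')")
conn.execute("ROLLBACK TO SAVEPOINT sp1")   # undoes only row 3
conn.execute("COMMIT")                      # saves row 2

conn.execute("BEGIN")
conn.execute("DELETE FROM customers")
conn.execute("ROLLBACK")                    # the uncommitted delete is undone

names = [row[0] for row in conn.execute("SELECT name FROM customers ORDER BY id")]
print(names)
```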
o SELECT
Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
SELECT-FROM-WHERE : Example queries on company relational schema
Query 1: Retrieve the name and address of all employees who work for the
'Research' department.
Q1: SELECT name, ADDRESS
FROM EMPLOYEE, DEPARTMENT
WHERE DNUMBER=DNO AND DNAME='Research'
Query 2: For every project located in 'blore', list the project number, the controlling
department number, and the department manager's name, address, and birthdate.
SELECT PNUMBER, DNUM, LNAME, BDATE, ADDRESS
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN AND
PLOCATION='blore';
In Q2, there are two join conditions and one selection condition:
• The join condition DNUM=DNUMBER relates a project to its
controlling department.
• The join condition MGRSSN=SSN relates the department to the
employee who manages that department.
• PLOCATION='blore' is the selection condition on the PROJECT table.
Example
Consider the following two tables.
Table 1 − CUSTOMERS Table is as follows. Table 2 − ORDERS Table is as follows.
Table alias.
SQL> SELECT C.ID, C.NAME, C.AGE, O.AMOUNT
FROM CUSTOMERS AS C, ORDERS AS O
WHERE C.ID = O.CUSTOMER_ID;
Result.
Column alias.
SQL> SELECT ID AS CUSTOMER_ID, NAME AS CUSTOMER_NAME
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;
Result.
Note that the two tables have a “Name” column in common, in addition to
EmployeeID, which is always a number.
SELECT [Name], [Salary], [Name], [Age]
FROM TABLE1 A INNER JOIN TABLE2 B ON A.EmployeeID = B.EmployeeID
If you run the above query, you will get the error “Ambiguous column name”.
This means two columns have the same name, here the “Name” column. The
database engine cannot tell which of the two tables' “Name” columns you are
referring to.
To resolve this, qualify the columns that share a name with the alias of
TABLE1 or TABLE2. In the query above, the alias of TABLE1 is A and that
of TABLE2 is B.
So, let’s fix the bug.
SELECT A.[Name], [Salary], B.[Name], [Age]
FROM TABLE1 A INNER JOIN TABLE2 B ON A.EmployeeID = B.EmployeeID
Run the query. No error!
Ambiguity also arises if some queries need to refer to the same relation twice
• In this case, aliases are given to the relation name
1. Query 8: For each employee, retrieve the employee's name, and the
name of his or her immediate supervisor.
Q8: SELECT E.FNAME, E.LNAME, S.FNAME,
S.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.SUPERSSN=S.SSN
2. In Q8, the alternate relation names E and S are called aliases or
tuple variables for the EMPLOYEE relation
3. We can think of E and S as two different copies of EMPLOYEE; E
represents employees in role of supervisees and S represents
employees in role of supervisors
4. Use of AS is optional. E.g.: use EMPLOYEE E instead of
EMPLOYEE AS E.
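Q8 can be run on a small sample to see the two roles of EMPLOYEE. This is a sketch under assumptions: SQLite through Python's sqlite3 module, a cut-down EMPLOYEE table with only the attributes Q8 needs, and three made-up rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (FNAME TEXT, LNAME TEXT, SSN TEXT PRIMARY KEY, SUPERSSN TEXT)"
)
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)",
    [
        ("James", "Borg", "888665555", None),            # no supervisor
        ("Franklin", "Wong", "333445555", "888665555"),
        ("John", "Smith", "123456789", "333445555"),
    ],
)

# E plays the supervisee role and S the supervisor role.
q8 = """
SELECT E.FNAME, E.LNAME, S.FNAME, S.LNAME
FROM EMPLOYEE AS E, EMPLOYEE AS S
WHERE E.SUPERSSN = S.SSN
ORDER BY E.LNAME
"""
rows = conn.execute(q8).fetchall()
print(rows)
```

The employee without a supervisor does not appear in the result, because a NULL SUPERSSN never satisfies the join condition.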
Unspecified WHERE-clause and use of Asterisk (*)
Query 9 and Query 10: Retrieve the SSN values for all employees.
Q9:SELECT SSN
FROM EMPLOYEE
• SQL does not treat a relation as a set; duplicate tuples can appear
• To eliminate duplicate tuples in a query result, the keyword DISTINCT
is used.
Query 11 : Retrieve the salary of every employee (Q11) and all distinct salary
values (Q11A)
Q11 : SELECT SALARY FROM EMPLOYEE;
Q11A: SELECT DISTINCT SALARY FROM EMPLOYEE;
The SQL Set operation is used to combine the two or more SQL SELECT
statements.
Types of Set Operation
1. Union
2. UnionAll
3. Intersect
4. Minus
1. Union
• The SQL Union operation is used to combine the result of two or more SQL
SELECT queries.
• In the union operation, the number of columns and their data types must be the
same in both queries to which the UNION operation is applied.
• The union operation eliminates the duplicate rows from its resultset.
Syntax(UNION)
SELECT column_name FROM table1
UNION
SELECT column_name FROM table2;
2. Union All
The Union All operation works like the Union operation, except that it keeps
duplicate rows in the result.
Syntax:
SELECT column_name FROM table1
UNION ALL
SELECT column_name FROM table2;
3. INTERSECT
• It is used to combine two SELECT statements. The Intersect operation
returns the common rows from both the SELECT statements.
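• In the Intersect operation, the number of columns and their data types must be
the same in both queries.
• It returns no duplicates. Some systems return the rows in ascending order, but
the order is not guaranteed unless ORDER BY is used.
• Syntax
SELECT column_name FROM table1
INTERSECT
SELECT column_name FROM table2;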
Result
4. Minus
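It combines the results of two SELECT statements. The Minus operator displays
the rows that are present in the first query but absent in the second query.
MINUS is the keyword used by Oracle; standard SQL calls this operator EXCEPT.
It returns no duplicates. Some systems return the rows in ascending order, but
the order is not guaranteed unless ORDER BY is used.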
Syntax
SELECT column_name FROM table1
MINUS
SELECT column_name FROM table2;
EXCEPT:
1. In SQL, EXCEPT returns those tuples that are returned by the first SELECT
operation, and not returned by the second SELECT operation.
2. This is the same as using a subtract operator in relational algebra.
Example:
Say we have two relations, Students and TA (Teaching Assistant). We want to
return all those students who are not teaching assistants. The query can be
formulated as:
SELECT Name
FROM Students
EXCEPT
SELECT NAME
FROM TA;
Output:
1. Rohan
2. Mansi
3. Megha
EXCEPT ALL
To retain duplicates, we must explicitly write EXCEPT ALL instead of EXCEPT.
INTERSECT
Intersect returns the common rows of two or more tables. Intersect removes
duplicates after combining.
INTERSECT ALL
Intersect all does not remove duplicates.
Note :
Both INTERSECT and INTERSECT ALL return the common rows of two
different queries. They differ in the way they handle duplicates.
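The set operations can be compared side by side on the Students and TA relations. The sketch below is an illustration under assumptions: it uses SQLite through Python's sqlite3 module, which spells MINUS as EXCEPT and does not offer INTERSECT ALL or EXCEPT ALL, and the rows are made up. The helper run is hypothetical and sorts the result in Python, since SQL itself promises no order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Name TEXT)")
conn.execute("CREATE TABLE TA (Name TEXT)")
conn.executemany(
    "INSERT INTO Students VALUES (?)",
    [("Rohan",), ("Mansi",), ("Megha",), ("Sameer",)],
)
conn.executemany("INSERT INTO TA VALUES (?)", [("Sameer",), ("Kashish",)])

def run(operator):
    """Combine the two SELECTs with the given set operator."""
    sql = f"SELECT Name FROM Students {operator} SELECT Name FROM TA"
    return sorted(row[0] for row in conn.execute(sql))

union_rows = run("UNION")          # duplicates removed
union_all_rows = run("UNION ALL")  # 'Sameer' appears twice
intersect_rows = run("INTERSECT")  # only names present in both
except_rows = run("EXCEPT")        # students who are not teaching assistants

print(union_rows, union_all_rows, intersect_rows, except_rows, sep="\n")
```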
Substring Pattern Matching and Arithmetic Operators
• The LIKE comparison operator can be used for string pattern
matching.
• Partial strings are specified using two reserved characters:
1. % replaces an arbitrary number of zero or more characters
2. the underscore (_) replaces a single character.
QUERY 12 : Retrieve all employees whose address is in Houston.
QUERY 12A : Find all employees who were born during the 1950s.
[YYYY-MM-DD]
LIKE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE columnN LIKE pattern;
Here are some examples showing different LIKE operators with '%' and '_'
wildcards:
Q3. Select all students with a studentname that has “li” in any position:
SELECT * FROM students
WHERE studentname LIKE '%li%';
Q4. Select all students with a studentname that has “o” in the second position:
SELECT * FROM students
WHERE studentname LIKE '_o%';
Q5. Select all students with a studentname that starts with “a” and is at least 5
characters in length:
SELECT * FROM students
WHERE studentname LIKE 'a____%';
Q6. Select all students with a studentname that starts with “s” and ends with “y”:
SELECT * FROM students
WHERE studentname LIKE 's%y';
If an underscore or % is needed as a literal character in the string, the character
should be preceded by an escape character, which is specified after the string
using the keyword ESCAPE.
For example:
'AB\_CD\%EF' ESCAPE '\' represents the literal string 'AB_CD%EF',
because \ is specified as the escape character.
• If an apostrophe (') is needed, it is represented as two consecutive
apostrophes ('') so that it will not be interpreted as ending the string.
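The wildcards and the ESCAPE clause can be tried on a handful of made-up values. This sketch assumes SQLite through Python's sqlite3 module; the students and codes tables, their rows, and the helper matching are hypothetical. Note that LIKE in SQLite ignores case for ASCII letters by default, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (studentname TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?)",
    [("alice",), ("oliver",), ("john",), ("sally",), ("amit",), ("lisa",)],
)
conn.execute("CREATE TABLE codes (code TEXT)")
conn.executemany("INSERT INTO codes VALUES (?)", [("AB_CD%EF",), ("ABxCDyyEF",)])

def matching(table, column, like_clause):
    """Return the sorted values of column that satisfy the LIKE clause."""
    sql = f"SELECT {column} FROM {table} WHERE {column} LIKE {like_clause}"
    return sorted(row[0] for row in conn.execute(sql))

contains_li = matching("students", "studentname", "'%li%'")
second_is_o = matching("students", "studentname", "'_o%'")
a_min_five = matching("students", "studentname", "'a____%'")
s_to_y = matching("students", "studentname", "'s%y'")

# Without ESCAPE, _ and % act as wildcards; with it, they are literal.
wildcards = matching("codes", "code", "'AB_CD%EF'")
literal = matching("codes", "code", r"'AB\_CD\%EF' ESCAPE '\'")

print(contains_li, second_is_o, a_min_five, s_to_y, wildcards, literal, sep="\n")
```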
The standard arithmetic operators for addition (+), subtraction (-), multiplication (*),
and division (/) can be applied to numeric values or attributes with numeric
domains.
Example :
QUERY 13 : Show the resulting salaries if every employee working on the
'ProductX' project is given a 10 percent raise.
Q13: SELECT FNAME, LNAME, 1.1 * SALARY AS INCREASED_SAL
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE SSN=ESSN AND PNO=PNUMBER AND
PNAME='ProductX';
ORDER BY Example
The following SQL statement selects all customers from the "Customers" table,
sorted by the "Country" column:
Example
SELECT * FROM Customers
ORDER BY Country;
ORDER BY DESC Example
The following SQL statement selects all customers from the "Customers" table,
sorted DESCENDING by the "Country" column:
SELECT * FROM Customers
ORDER BY Country DESC;
• SQL allows the user to order the tuples in the result of a query by the values
of one or more attributes, using the ORDER BY clause.
QUERY 15 : Retrieve a list of employees and the projects they are working on,
ordered by department and, within each department, ordered alphabetically by last
name, first name.
Q15: SELECT DNAME, LNAME, FNAME, PNAME
FROM DEPARTMENT, EMPLOYEE, WORKS_ON, PROJECT
WHERE DNUMBER=DNO AND SSN=ESSN AND PNO=PNUMBER
ORDER BY DNAME, LNAME, FNAME;
Note : 1. The default order is in ascending order of values.
2. We can specify the keyword DESC if we want to see the result in a
descending order of values.
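Ascending and descending order can be mixed in one ORDER BY clause. The sketch below assumes SQLite through Python's sqlite3 module and a made-up Customers table with only two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("Ana", "Mexico"), ("Bo", "Sweden"), ("Cy", "Germany"), ("Di", "Mexico")],
)

# Default (ascending) order on both sort keys.
ascending = conn.execute(
    "SELECT Country, CustomerName FROM Customers ORDER BY Country, CustomerName"
).fetchall()

# Country descending, and names ascending within each country.
mixed = conn.execute(
    "SELECT Country, CustomerName FROM Customers "
    "ORDER BY Country DESC, CustomerName ASC"
).fetchall()

print(ascending)
print(mixed)
```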
This chapter describes more advanced features of the SQL language standard for
relational databases. We start in Section 5.1 by presenting more complex features of
SQL retrieval queries, such as nested queries, joined tables, outer joins, aggregate
functions, and grouping. In Section 5.2, we describe the CREATE ASSERTION
statement, which allows the specification of more general constraints on the
database. We also introduce the concept of triggers and the CREATE TRIGGER
statement, which will be presented in more detail in Section 26.1 when we present
the principles of active databases. Then, in Section 5.3, we describe the SQL facility
for defining views on the database. Views are also called virtual or derived tables
because they present the user with what appear to be tables; however, the
information in those tables is derived from previously defined tables. Section 5.4
introduces the SQL ALTER TABLE statement, which is used for modifying the
database tables and constraints. Section 5.5 is the chapter summary.
This chapter is a continuation of Chapter 4. The instructor may skip parts of this
chapter if a less detailed introduction to SQL is intended.
(c) NOT
Operand     Result
TRUE        FALSE
FALSE       TRUE
UNKNOWN     UNKNOWN
In Tables 5.1(a) and 5.1(b), the rows and columns represent the values of the results of
comparison conditions, which would typically appear in the WHERE clause of an
SQL query. Each expression result would have a value of TRUE, FALSE, or
UNKNOWN. The result of combining the two values using the AND logical
connective is shown by the entries in Table 5.1(a). Table 5.1(b) shows the result of
using the OR logical connective. For example, the result of (FALSE AND
UNKNOWN) is FALSE, whereas the result of (FALSE OR UNKNOWN) is
UNKNOWN. Table 5.1(c) shows the result of the NOT logical operation. Notice
that in standard Boolean logic, only TRUE or FALSE values are permitted; there is
no UNKNOWN value.
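The three-valued results quoted above can be checked directly. This sketch assumes SQLite through Python's sqlite3 module, where TRUE is shown as 1, FALSE as 0, and UNKNOWN as NULL (None in Python); the helper evaluate is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Evaluate one SQL expression; the result is 1, 0, or None."""
    return conn.execute("SELECT " + expression).fetchone()[0]

expressions = ["0 AND NULL", "0 OR NULL", "1 AND NULL", "1 OR NULL", "NOT (NULL)"]
results = {expression: evaluate(expression) for expression in expressions}

for expression, value in results.items():
    print(f"{expression:12} -> {value}")
```

FALSE AND UNKNOWN comes out as FALSE and FALSE OR UNKNOWN as UNKNOWN, in line with the text.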
In select-project-join queries, the general rule is that only those combinations of
tuples that evaluate the logical expression in the WHERE clause of the query to
TRUE are selected. Tuple combinations that evaluate to FALSE or UNKNOWN are
not selected. However, there are exceptions to that rule for certain operations, such as
outer joins, as we shall see in Section 5.1.6.
SQL allows queries that check whether an attribute value is NULL. Rather than using
= or <> to compare an attribute value to NULL, SQL uses the comparison operators
IS or IS NOT. This is because SQL considers each NULL value as being distinct
from every other NULL value, so equality comparison is not appropriate. It follows
that when a join condition is specified, tuples with NULL values for the join attributes
are not included in the result (unless it is an OUTER JOIN; see Section 5.1.6).
Query 18 illustrates this.
Query 18. Retrieve the names of all employees who do not have supervisors.
Example
The NULL value can cause problems when selecting data. When an unknown value is
compared to any other value, the result is always unknown, so the row is not
included in the results. You must use the IS NULL or IS NOT
NULL operators to check for a NULL value.
IS NULL operator.
SQL> SELECT ID, NAME, AGE, ADDRESS, SALARY FROM
CUSTOMERS WHERE SALARY IS NULL;
Q4A: SELECT DISTINCT Pnumber
FROM PROJECT
WHERE Pnumber IN
( SELECT Pnumber
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND Lname='Smith' )
OR
Pnumber IN
( SELECT Pno
FROM WORKS_ON, EMPLOYEE
WHERE Essn=Ssn AND Lname='Smith' );
If a nested query returns a single attribute and a single tuple, the query result will be a
single (scalar) value. In such cases, it is permissible to use = instead of IN for the
comparison operator. In general, the nested query will return a table (relation),
which is a set or multiset of tuples.
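SQL allows the use of tuples of values in comparisons by placing them within
parentheses. To illustrate this, consider the following query:
SELECT DISTINCT Essn
FROM WORKS_ON
WHERE (Pno, Hours) IN ( SELECT Pno, Hours
FROM WORKS_ON
WHERE Essn='123456789' );
Query 16 retrieves the name of each employee who has a dependent with the same
first name and the same sex as the employee:
Q16: SELECT E.Fname, E.Lname
FROM EMPLOYEE AS E
WHERE E.Ssn IN ( SELECT Essn
FROM DEPENDENT AS D
WHERE E.Fname=D.Dependent_name
AND E.Sex=D.Sex );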
In the nested query of Q16, we must qualify E.Sex because it refers to the Sex attribute
of EMPLOYEE from the outer query, and DEPENDENT also has an attribute called
Sex. If there were any unqualified references to Sex in the nested query, they would
refer to the Sex attribute of DEPENDENT. However, we would not have to qualify
the attributes Fname and Ssn of EMPLOYEE if they appeared in the nested query
because the DEPENDENT relation does not have attributes called Fname and Ssn, so
there is no ambiguity.
It is generally advisable to create tuple variables (aliases) for all the tables referenced in
an SQL query to avoid potential errors and ambiguities, as illustrated in Q16.
1. Subqueries with the SELECT Statement
Subqueries are most frequently used with the SELECT statement.
SELECT column_name [, column_name ]
FROM table1 [, table2 ]
WHERE column_name OPERATOR
(SELECT column_name [, column_name ]
FROM table1 [, table2 ]
[WHERE])
Example
Consider the CUSTOMERS table having the following records
Subqueries also can be used with INSERT statements. The INSERT statement
uses the data returned from the subquery to insert into another table. The
selected data in the subquery can be modified with any of the character, date
or number functions.
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ *|column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ]
Example
Consider a table CUSTOMERS_BKP with similar structure as CUSTOMERS table.
Now to copy the complete CUSTOMERS table into the CUSTOMERS_BKP table, you can
use the following syntax.
SQL> INSERT INTO CUSTOMERS_BKP
SELECT * FROM CUSTOMERS
WHERE ID IN
(SELECT ID FROM CUSTOMERS) ;
Example
Subqueries can also be used with the DELETE statement. Assume we have a
CUSTOMERS_BKP table that is a backup of the CUSTOMERS table. The following
example deletes from the CUSTOMERS table all customers whose AGE is greater
than or equal to 27.
SQL> DELETE FROM CUSTOMERS
WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP
WHERE AGE >= 27 );
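The INSERT and DELETE examples above can be tried end to end. This is a minimal sketch with Python's sqlite3 module; the three customer rows are made up, and the table is reduced to three columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER, NAME TEXT, AGE INTEGER)")
conn.execute("CREATE TABLE CUSTOMERS_BKP (ID INTEGER, NAME TEXT, AGE INTEGER)")
conn.executemany(
    "INSERT INTO CUSTOMERS VALUES (?, ?, ?)",
    [(1, "Ramesh", 32), (2, "Khilan", 25), (3, "Kaushik", 27)],  # made-up rows
)

# INSERT with a subquery: copy every row of CUSTOMERS into the backup table.
conn.execute(
    "INSERT INTO CUSTOMERS_BKP "
    "SELECT * FROM CUSTOMERS WHERE ID IN (SELECT ID FROM CUSTOMERS)"
)
backup_ids = [r[0] for r in conn.execute("SELECT ID FROM CUSTOMERS_BKP ORDER BY ID")]

# DELETE with a subquery: remove customers whose AGE is among the backup ages >= 27.
conn.execute(
    "DELETE FROM CUSTOMERS "
    "WHERE AGE IN (SELECT AGE FROM CUSTOMERS_BKP WHERE AGE >= 27)"
)
remaining_ids = [r[0] for r in conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]
```

With this sample data the backup holds all three customers, and only the 25-year-old customer remains in CUSTOMERS after the delete.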
Example :
Query 4 can be rephrased to use nested queries as shown in Q4A.
(Make a list of all project numbers for projects that involve an employee whose
last name is 'Smith', either as a worker or as a manager of the department that
controls the project.)
Q4A: SELECT DISTINCT PNUMBER
FROM PROJECT
WHERE PNUMBER IN
(SELECT PNUMBER
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE DNUM=DNUMBER AND MGRSSN=SSN AND LNAME='Smith')
OR
PNUMBER IN
(SELECT PNO
FROM WORKS_ON, EMPLOYEE
WHERE ESSN=SSN AND LNAME='Smith');
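To see both branches of Q4A contribute to the answer, here is a small sketch using Python's sqlite3 module. The schema is cut down to the columns Q4A touches and all rows are made up: one Smith works on project 1, another Smith manages department 4, which controls project 10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE   (SSN TEXT, LNAME TEXT);
    CREATE TABLE DEPARTMENT (DNUMBER INTEGER, MGRSSN TEXT);
    CREATE TABLE PROJECT    (PNUMBER INTEGER, DNUM INTEGER);
    CREATE TABLE WORKS_ON   (ESSN TEXT, PNO INTEGER);
    INSERT INTO EMPLOYEE   VALUES ('111', 'Smith'), ('222', 'Wong'), ('333', 'Smith');
    INSERT INTO DEPARTMENT VALUES (5, '222'), (4, '333');
    INSERT INTO PROJECT    VALUES (1, 5), (2, 5), (10, 4);
    INSERT INTO WORKS_ON   VALUES ('111', 1), ('222', 2);
""")

q4a = """
    SELECT DISTINCT PNUMBER FROM PROJECT
    WHERE PNUMBER IN (SELECT PNUMBER FROM PROJECT, DEPARTMENT, EMPLOYEE
                      WHERE DNUM = DNUMBER AND MGRSSN = SSN AND LNAME = 'Smith')
       OR PNUMBER IN (SELECT PNO FROM WORKS_ON, EMPLOYEE
                      WHERE ESSN = SSN AND LNAME = 'Smith')
    ORDER BY PNUMBER
"""
project_numbers = [row[0] for row in conn.execute(q4a)]
```

Project 10 is found through the manager branch and project 1 through the worker branch; project 2 involves no Smith and is left out.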
The IN operator can also compare a tuple of values in parentheses with a set or
multiset of union-compatible tuples.
SELECT DISTINCT ESSN FROM WORKS_ON
WHERE (PNO, HOURS) IN
(SELECT PNO, HOURS FROM WORKS_ON WHERE ESSN='123456789');
• This query will select the social security numbers of all employees who work the
same (project, hours) combination on some project that employee 'John Smith'
(whose SSN ='123456789') works on.
• In addition to the IN operator, the ANY, SOME, and ALL comparison operators can
be used to compare a single value v (typically an attribute name) to a set or multiset
V (typically a nested query).
• ANY, SOME, and ALL can be combined with =, >, >=, <, <=, and <>.
Example: Retrieve the names of employees whose salary is greater than the salary of all the
employees in department 5:
SELECT LNAME, FNAME
FROM EMPLOYEE
WHERE SALARY > ALL (SELECT SALARY FROM EMPLOYEE WHERE DNO=5);
• The = ANY (or = SOME) operator returns TRUE if the value v is equal to some
value in the set V and is hence equivalent to IN.
• In general, we can have several levels of nested queries. We can once again be faced
with possible ambiguity among attribute names if attributes of the same name exist:
one in a relation in the FROM clause of the outer query, and another in a relation in
the FROM clause of the nested query.
• The rule is that a reference to an unqualified attribute refers to the relation declared
in the innermost nested query.
• To refer to an attribute of the PROJECT relation specified in the outer query, we can
specify and refer to an alias (tuple variable) for that relation.
To illustrate the potential ambiguity of attribute names in nested queries, consider Query 16.
QUERY 16 : Retrieve the name of each employee who has a dependent with the same first
name and same sex as the employee.
• In the nested query of Q16, we must qualify E.SEX because it refers to the SEX
attribute of EMPLOYEE from the outer query, and DEPENDENT also has an
attribute called SEX.
• However, we do not have to qualify FNAME and SSN because the DEPENDENT
relation does not have attributes called FNAME and SSN, so there is no ambiguity.
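The effect of the qualification rule can be observed directly. The sketch below, which assumes Python's sqlite3 module and made-up rows, runs Query 16 twice: once with E.Sex qualified, and once with an unqualified Sex, which then resolves to the Sex attribute of DEPENDENT in the innermost query and makes the sex test meaningless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE  (Ssn TEXT, Fname TEXT, Sex TEXT);
    CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT, Sex TEXT);
    INSERT INTO EMPLOYEE  VALUES ('1', 'Alex', 'M'), ('2', 'Joy', 'F'), ('3', 'Sam', 'M');
    -- Joy's dependent shares her first name but not her sex.
    INSERT INTO DEPENDENT VALUES ('1', 'Alex', 'M'), ('2', 'Joy', 'M'), ('3', 'Kim', 'M');
""")

template = """
    SELECT E.Fname FROM EMPLOYEE AS E
    WHERE E.Ssn IN (SELECT D.Essn FROM DEPENDENT AS D
                    WHERE E.Fname = D.Dependent_name AND {sex} = D.Sex)
    ORDER BY E.Fname
"""
# Qualified reference: compares the employee's sex with the dependent's sex.
qualified = [r[0] for r in conn.execute(template.format(sex="E.Sex"))]
# Unqualified reference: Sex binds to DEPENDENT, so the test is D.Sex = D.Sex.
unqualified = [r[0] for r in conn.execute(template.format(sex="Sex"))]
```

Only Alex satisfies the intended query; the unqualified version also returns Joy, which is the kind of silent error that aliases help avoid.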
A correlated subquery is a subquery that uses values from the outer query. A plain
subquery, in contrast, does not depend on the outer query.
The following plain subquery finds employees whose salary is greater than the average salary of all employees:
SELECT employee_id, first_name, last_name, salary
FROM employees
WHERE salary >
(SELECT AVG(salary) FROM employees);
In this query:
First, the subquery that returns the average salary of all employees can be executed
independently.
Second, the database system needs to evaluate the subquery only once.
Third, the outer query makes use of the result returned from the subquery. The outer query
depends on the subquery for its value, but the subquery does not depend on the outer
query. Such a subquery is sometimes called a plain subquery.
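The three points above can be illustrated with a short sketch in Python's sqlite3 module. The employees table is reduced to three columns and the rows are made up; the subquery is first run on its own, then embedded in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (employee_id INTEGER, first_name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ann", 2000.0), (2, "Bob", 3000.0), (3, "Cid", 4000.0)],  # made-up rows
)

# The plain subquery runs on its own and yields a single value.
average_salary = conn.execute("SELECT AVG(salary) FROM employees").fetchone()[0]

# The outer query compares each row against that one value.
above_average = conn.execute("""
    SELECT employee_id, first_name
    FROM employees
    WHERE salary > (SELECT AVG(salary) FROM employees)
    ORDER BY employee_id
""").fetchall()
```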
In a correlated subquery, the inner query references a column of the outer query,
so it is conceptually evaluated once for each row of the outer query.
Syntax:
SELECT column1, column2, ...
FROM table_name T1
WHERE column_name IN | NOT IN
(SELECT column1 FROM table_name T2
WHERE T1.column = T2.column)
Example: the subquery below refers to e.department_id from the outer query.
SELECT employee_id, first_name
FROM employees e
WHERE department_id IN
(SELECT department_id FROM departments d
WHERE d.department_id = e.department_id);
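A runnable version of this correlated subquery is sketched below with Python's sqlite3 module. The rows are made up: one employee belongs to an existing department, one to a department that is missing from departments, and one has no department at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (employee_id INTEGER, first_name TEXT, department_id INTEGER);
    CREATE TABLE departments (department_id INTEGER);
    INSERT INTO employees   VALUES (1, 'Ann', 10), (2, 'Bob', 20), (3, 'Cid', NULL);
    INSERT INTO departments VALUES (10), (30);
""")

# The inner query mentions e.department_id, so its result depends on the
# outer row currently being examined.
matched = conn.execute("""
    SELECT employee_id, first_name
    FROM employees e
    WHERE department_id IN
          (SELECT department_id FROM departments d
           WHERE d.department_id = e.department_id)
    ORDER BY employee_id
""").fetchall()
```

Only Ann is returned: Bob's department 20 has no row in departments, and Cid's NULL department never matches.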
EXISTS and NOT EXISTS are typically used in conjunction with a correlated
nested query. In Q16B, the nested query references the Ssn, Fname, and Sex attributes
of the EMPLOYEE relation from the outer query. We can think of Q16B as follows:
For each EMPLOYEE tuple, evaluate the nested query, which retrieves all
DEPENDENT tuples with the same Essn, Sex, and Dependent_name as the
EMPLOYEE tuple; if at least one tuple EXISTS in the result of the nested query, then
select that EMPLOYEE tuple. In general, EXISTS(Q) returns TRUE if there is at
least one tuple in the result of the nested query Q, and it returns FALSE otherwise.
On the other hand, NOT EXISTS(Q) returns TRUE if there are no tuples in the
result of nested query Q, and it returns FALSE otherwise. Next, we illustrate the use
of NOT EXISTS.
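Query 6. Retrieve the names of employees who have no dependents.
Q6: SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM DEPENDENT
WHERE Ssn=Essn );
Query 3 (retrieve the name of each employee who works on all the projects
controlled by department 5) can be written in two ways. The first option is
shown as Q3A.
Q3A: SELECT Fname, Lname
FROM EMPLOYEE
WHERE NOT EXISTS ( ( SELECT Pnumber
FROM PROJECT
WHERE Dnum=5)
EXCEPT ( SELECT Pno
FROM WORKS_ON
WHERE Ssn=Essn) );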
In Q3A, the first subquery (which is not correlated with the outer query) selects all
projects controlled by department 5, and the second subquery (which is correlated)
selects all projects that the particular employee being considered works on. If the set
difference of the first subquery result MINUS (EXCEPT) the second subquery result
is empty, it means that the employee works on all the projects and is therefore selected.
The second option is shown as Q3B. Notice that we need two-level nesting in Q3B
and that this formulation is quite a bit more complex than Q3A, which uses NOT
EXISTS and EXCEPT.
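Q3B: SELECT Lname, Fname
FROM EMPLOYEE
WHERE NOT EXISTS ( SELECT *
FROM WORKS_ON B
WHERE ( B.Pno IN ( SELECT Pnumber
FROM PROJECT
WHERE Dnum=5 )
AND
NOT EXISTS ( SELECT *
FROM WORKS_ON C
WHERE C.Essn=Ssn
AND C.Pno=B.Pno )));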
In Q3B, the outer nested query selects any WORKS_ON (B) tuples whose Pno is of
a project controlled by department 5, if there is not a WORKS_ON (C) tuple with
the same Pno and the same Ssn as that of the EMPLOYEE tuple under
consideration in the outer query. If no such tuple exists, we select the EMPLOYEE
tuple. The form of Q3B matches the following rephrasing of Query 3: Select each
employee such that there does not exist a project controlled by department 5 that the
employee does not work on. It corresponds to the way we will write this query in tuple
relational calculus (see Section 6.6.7).
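Both formulations of Query 3 can be compared on sample data. The sketch below assumes Python's sqlite3 module and made-up rows in which only employee Smith works on every project of department 5. The EXCEPT version is written without parentheses around the two operand queries, since SQLite's grammar, as far as I know, does not accept them there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT, Fname TEXT);
    CREATE TABLE PROJECT  (Pnumber INTEGER, Dnum INTEGER);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
    INSERT INTO EMPLOYEE VALUES ('1', 'Smith', 'John'), ('2', 'Wong', 'Franklin'),
                                ('3', 'Zelaya', 'Alicia');
    INSERT INTO PROJECT  VALUES (1, 5), (2, 5), (3, 4);
    INSERT INTO WORKS_ON VALUES ('1', 1), ('1', 2), ('2', 1), ('2', 3), ('3', 3);
""")

# NOT EXISTS over a set difference: no department-5 project is left uncovered.
q3a = """
    SELECT Lname, Fname FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT Pnumber FROM PROJECT WHERE Dnum = 5
                      EXCEPT
                      SELECT Pno FROM WORKS_ON WHERE Ssn = Essn)
"""
# Two-level NOT EXISTS: there is no department-5 assignment B whose project
# the employee under consideration does not also work on.
q3b = """
    SELECT Lname, Fname FROM EMPLOYEE
    WHERE NOT EXISTS (SELECT * FROM WORKS_ON B
                      WHERE B.Pno IN (SELECT Pnumber FROM PROJECT WHERE Dnum = 5)
                        AND NOT EXISTS (SELECT * FROM WORKS_ON C
                                        WHERE C.Essn = Ssn AND C.Pno = B.Pno))
"""
result_a = conn.execute(q3a).fetchall()
result_b = conn.execute(q3b).fetchall()
```

Note that Q3B ranges over WORKS_ON rather than PROJECT, so the two forms agree here because every department-5 project in the sample data has at least one worker.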
There is another SQL function, UNIQUE(Q), which returns TRUE if there are no
duplicate tuples in the result of query Q; otherwise, it returns FALSE. This can be
used to test whether the result of a nested query is a set or a multiset.
WHERE Dname=‘Research’;
The default type of join in a joined table is called an inner join, where a tuple is
included in the result only if a matching tuple exists in the other relation. For example,
in query Q8A, only employees who have a supervisor are included in the result; an
EMPLOYEE tuple whose value for Super_ssn is NULL is excluded. If the user
requires that all employees be included, an OUTER JOIN must be used explicitly (see
Section 6.4.4 for the definition of OUTER JOIN). In SQL, this is handled by
explicitly specifying the keyword OUTER JOIN in a joined table, as illustrated in
Q8B:
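The difference between the two join types is easy to observe on a small table. The following is a sketch only, assuming Python's sqlite3 module, a three-column EMPLOYEE table, and made-up rows in which Borg has no supervisor; the selected columns are chosen for illustration and are not taken from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Lname TEXT, Super_ssn TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("1", "Borg", None), ("2", "Wong", "1"), ("3", "Smith", "2")],  # made-up rows
)

# Inner join: an employee appears only if a matching supervisor tuple exists.
inner_rows = conn.execute("""
    SELECT E.Lname, S.Lname
    FROM EMPLOYEE AS E JOIN EMPLOYEE AS S ON E.Super_ssn = S.Ssn
    ORDER BY E.Lname
""").fetchall()

# Left outer join: every employee appears; a missing supervisor shows as NULL.
outer_rows = conn.execute("""
    SELECT E.Lname, S.Lname
    FROM EMPLOYEE AS E LEFT OUTER JOIN EMPLOYEE AS S ON E.Super_ssn = S.Ssn
    ORDER BY E.Lname
""").fetchall()
```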
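Query 19. Find the sum of the salaries of all employees, the maximum salary,
the minimum salary, and the average salary.
Q19: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM EMPLOYEE;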
If we want to get the preceding function values for employees of a specific
department, say the 'Research' department, we can write Query 20, where the
EMPLOYEE tuples are restricted by the WHERE clause to those employees who
work for the ‘Research’ department.
Query 20. Find the sum of the salaries of all employees of the 'Research'
department, as well as the maximum salary, the minimum salary, and the average
salary in this department.
Q20: SELECT SUM (Salary), MAX (Salary), MIN (Salary), AVG (Salary)
FROM (EMPLOYEE JOIN DEPARTMENT ON Dno=Dnumber)
WHERE Dname='Research';
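Queries 21 and 22. Retrieve the total number of employees in the
company (Q21) and the number of employees in the 'Research'
department (Q22).
Q21: SELECT COUNT (*)
FROM EMPLOYEE;
Q22: SELECT COUNT (*)
FROM EMPLOYEE, DEPARTMENT
WHERE Dno=Dnumber AND Dname='Research';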
Here the asterisk (*) refers to the rows (tuples), so COUNT (*) returns the number of
rows in the result of the query. We may also use the COUNT function to count values
in a column rather than tuples, as in the next example.
Query 23. Count the number of distinct salary values in the database.
Q23: SELECT COUNT (DISTINCT Salary)
FROM EMPLOYEE;
If we write COUNT(Salary) instead of COUNT(DISTINCT Salary) in
Q23, then duplicate values will not be eliminated. However, any tuples with NULL
for Salary will not be counted. In general, NULL values are discarded when
aggregate functions are applied to a particular column (attribute).
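The three ways of counting differ as soon as the column holds duplicates and NULLs. A minimal sketch with Python's sqlite3 module, using four made-up employees (two share a salary and one has no salary recorded):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("1", 30000), ("2", 30000), ("3", 40000), ("4", None)],  # made-up rows
)

# COUNT(*) counts rows; COUNT(Salary) skips NULLs; COUNT(DISTINCT Salary)
# skips NULLs and duplicates; SUM also ignores the NULL salary.
row_count, salary_count, distinct_count, salary_sum = conn.execute("""
    SELECT COUNT(*), COUNT(Salary), COUNT(DISTINCT Salary), SUM(Salary)
    FROM EMPLOYEE
""").fetchone()
```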
The preceding examples summarize a whole relation (Q19, Q21, Q23) or a selected
subset of tuples (Q20, Q22), and hence all produce single tuples or single values.
They illustrate how functions are applied to retrieve a summary value or summary
tuple from the database. These functions can also be used in selection conditions
involving nested queries. We can specify a correlated nested query with an aggregate
function, and then use the nested query in the WHERE clause of an outer query. For
example, to retrieve the names of all employees who have two or more dependents
(Query 5), we can write the following:
Q5: SELECT Lname, Fname
FROM EMPLOYEE
WHERE ( SELECT COUNT (*)
FROM DEPENDENT
WHERE Ssn=Essn ) >= 2;
The correlated nested query counts the number of dependents that each employee has; if
this is greater than or equal to two, the employee tuple is selected.
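A sketch of Query 5 on sample data, assuming Python's sqlite3 module. The rows are made up so that one employee has two dependents, one has a single dependent, and one has none.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE  (Ssn TEXT, Lname TEXT, Fname TEXT);
    CREATE TABLE DEPENDENT (Essn TEXT, Dependent_name TEXT);
    INSERT INTO EMPLOYEE  VALUES ('1', 'Smith', 'John'), ('2', 'Wong', 'Franklin'),
                                 ('3', 'Borg', 'James');
    INSERT INTO DEPENDENT VALUES ('1', 'Alice'), ('1', 'Michael'), ('2', 'Joy');
""")

# The correlated subquery counts the dependents of the employee being tested.
two_or_more = conn.execute("""
    SELECT Lname, Fname
    FROM EMPLOYEE
    WHERE (SELECT COUNT(*) FROM DEPENDENT WHERE Ssn = Essn) >= 2
""").fetchall()
```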
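Query 24. For each department, retrieve the department number, the number of
employees in the department, and their average salary.
Q24: SELECT Dno, COUNT (*), AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;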
In Q24, the EMPLOYEE tuples are partitioned into groups—each group having
the same value for the grouping attribute Dno. Hence, each group contains the
employees who work in the same department. The COUNT and AVG functions are
applied to each such group of tuples. Notice that the SELECT clause includes only the
grouping attribute and the aggregate functions to be applied on each group of tuples.
Figure 5.1(a) illustrates how grouping works on Q24; it also shows the result of Q24.
Figure 5.1
Results of GROUP BY and HAVING. (a) Q24. (b) Q26.
If NULLs exist in the grouping attribute, then a separate group is created for all
tuples with a NULL value in the grouping attribute. For example, if the
EMPLOYEE table had some tuples that had NULL for the grouping attribute Dno,
there would be a separate group for those tuples in the result of Q24.
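The separate group for NULL can be seen in the following sketch, which assumes Python's sqlite3 module and five made-up employees, two of whom have no department. The result is turned into a dictionary keyed by Dno so that the check does not depend on row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Salary INTEGER, Dno INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("1", 30000, 5), ("2", 40000, 5), ("3", 25000, 4),
     ("4", 10000, None), ("5", 20000, None)],  # made-up rows
)

# One output row per distinct Dno value; all NULL Dno rows form one group.
groups = {
    dno: (count, average)
    for dno, count, average in conn.execute(
        "SELECT Dno, COUNT(*), AVG(Salary) FROM EMPLOYEE GROUP BY Dno")
}
```

The key None in groups corresponds to the NULL group and holds the two employees without a department.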
Query 25. For each project, retrieve the project number, the project name, and the
number of employees who work on that project.
Q25: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname;
Q25 shows how we can use a join condition in conjunction with GROUP BY. In
this case, the grouping and functions are applied after the joining of the two
relations. Sometimes we want to retrieve the values of these functions only for groups
that satisfy certain conditions. For example, suppose that we want to modify Query
25 so that only projects with more than two employees appear in the result. SQL
provides a HAVING clause, which can appear in conjunction with a GROUP BY
clause, for this purpose. HAVING provides a condition on the summary information
regarding the group of tuples associated with each value of the grouping attributes.
Only the groups that satisfy the condition are retrieved in the result of the query. This is
illustrated by Query 26.
Query 26. For each project on which more than two employees work, retrieve the project number, the
project name, and the number of employees who work on the project.
Q26: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON
WHERE Pnumber=Pno
GROUP BY Pnumber, Pname
HAVING COUNT (*) > 2;
Notice that while selection conditions in the WHERE clause limit the tuples to which functions are applied,
the HAVING clause serves to choose whole groups. Figure 5.1(b) illustrates the use of HAVING and
displays the result of Q26.
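Q25 and Q26 can be run side by side to see HAVING remove whole groups. This is a sketch with Python's sqlite3 module and made-up rows: two employees work on ProductX, three on ProductY, and nobody on ProductZ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PROJECT  (Pnumber INTEGER, Pname TEXT);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);
    INSERT INTO PROJECT  VALUES (1, 'ProductX'), (2, 'ProductY'), (3, 'ProductZ');
    INSERT INTO WORKS_ON VALUES ('a', 1), ('b', 1), ('a', 2), ('b', 2), ('c', 2);
""")

grouped = """
    SELECT Pnumber, Pname, COUNT(*)
    FROM PROJECT, WORKS_ON
    WHERE Pnumber = Pno
    GROUP BY Pnumber, Pname
"""
# Q25: one row per project that has at least one WORKS_ON tuple.
q25_rows = conn.execute(grouped + " ORDER BY Pnumber").fetchall()
# Q26: same groups, but only those with more than two employees survive.
q26_rows = conn.execute(grouped + " HAVING COUNT(*) > 2 ORDER BY Pnumber").fetchall()
```

ProductZ is absent even from Q25 because the join condition leaves it with no tuples to group.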
Query 27. For each project, retrieve the project number, the project name, and the number of employees
from department 5 who work on the project.
Q27: SELECT Pnumber, Pname, COUNT (*)
FROM PROJECT, WORKS_ON, EMPLOYEE
WHERE Pnumber=Pno AND Ssn=Essn AND Dno=5
GROUP BY Pnumber, Pname;
Here we restrict the tuples in the relation (and hence the tuples in each group) to those that satisfy the
condition specified in the WHERE clause—namely, that they work in department number 5. Notice that
we must be extra careful when two different conditions apply (one to the aggregate function in the
SELECT clause and another to the function in the HAVING clause). For example, suppose that we want
to count the total number of employees whose salaries exceed $40,000 in each department, but only for
departments where more than five employees work. Here, the condition (SALARY > 40000) applies only to
the COUNT function in the SELECT clause. Suppose that we write the following incorrect query:
SELECT Dname, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000
GROUP BY Dname
HAVING COUNT (*) > 5;
This is incorrect because it will select only departments that have more than five employees who each
earn more than $40,000. The rule is that the WHERE clause is executed first, to select individual tuples
or joined tuples; the HAVING clause is applied later, to select individual groups of tuples. Hence, the
tuples are already restricted to employees who earn more than $40,000 before the function in the
HAVING clause is applied. One way to write this query correctly is to use a nested query, as shown in
Query 28.
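The order of WHERE and HAVING described above can be checked with a sketch in Python's sqlite3 module. The data is made up: Research has six employees, two of them earning more than 40000, and Administration has three employees, all earning more than 40000. The corrected query uses an IN subquery to pick departments with more than five employees, which is one possible nested-query formulation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dname TEXT, Dnumber INTEGER);
    CREATE TABLE EMPLOYEE   (Ssn INTEGER, Salary INTEGER, Dno INTEGER);
    INSERT INTO DEPARTMENT VALUES ('Research', 1), ('Administration', 2);
""")
research = [(n, 30000, 1) for n in range(1, 5)] + [(5, 45000, 1), (6, 50000, 1)]
administration = [(n, 60000, 2) for n in range(7, 10)]
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", research + administration)

# WHERE filters first, so HAVING only sees the high earners of each department.
incorrect = conn.execute("""
    SELECT Dname, COUNT(*)
    FROM DEPARTMENT, EMPLOYEE
    WHERE Dnumber = Dno AND Salary > 40000
    GROUP BY Dname
    HAVING COUNT(*) > 5
""").fetchall()

# The department-size test is moved into a subquery over all employees.
correct = conn.execute("""
    SELECT Dnumber, COUNT(*)
    FROM DEPARTMENT, EMPLOYEE
    WHERE Dnumber = Dno AND Salary > 40000
      AND Dno IN (SELECT Dno FROM EMPLOYEE GROUP BY Dno HAVING COUNT(*) > 5)
    GROUP BY Dnumber
""").fetchall()
```

The incorrect form returns nothing, because no department has more than five high earners; the nested form reports department 1 with its two high earners.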
Query 28. For each department that has more than five employees, retrieve the department number
and the number of its employees who are making more than $40,000.
Q28: SELECT Dnumber, COUNT (*)
FROM DEPARTMENT, EMPLOYEE
WHERE Dnumber=Dno AND Salary>40000 AND
Dno IN ( SELECT Dno
FROM EMPLOYEE
GROUP BY Dno