As Level Computer Application Databases
In the syllabus, we have the following topics.
Concepts and terminology
Students should understand the following terminology and concepts:
o data and information
o data, fields, records, tables, files and databases
o common data types such as integer, real, character, string, boolean, date, etc.
o indexes and keys
o database management systems (e.g. data definition language, data manipulation language, data
  dictionary, transaction processing and access control, etc.)
o program-data independence
o data redundancy and data integrity
Basic concepts of a relational database
Students should know the basic concepts underpinning relational databases such as entity, relation,
attribute, domain, primary key, foreign key, candidate key, entity integrity, referential integrity,
domain integrity, etc. Students should be able to identify these basic elements in examples taken
from everyday applications. Students should know how to organise data differently but sensibly in a
relational database and be able to establish the required relationships to link up the tables.
Creating a relational database
Students should be able to create a simple relational database based on specified requirements
using SQL.
Database maintenance and manipulation
Students should be able to use SQL to maintain a simple relational database, manipulate its data or
retrieve the required information. They should be able to:
o use appropriate operators and expressions such as the in, between and like operators, arithmetic
  operators and expressions, comparison operators and logical operators, etc. to perform specific
  operations
o use simple built-in functions such as aggregate and string functions, etc.
o perform multiple field indexing and multi-level ordering
o perform queries on multiple tables including the use of equi-join, natural join and outer join
o perform sub-queries (for 1 sub-level only)
o export query results to, for example, text, html or spreadsheet format, etc.
The conceptual data model
Students should understand the importance of good database design in effective database management.
They should be aware of the three levels of data abstraction, namely conceptual level, physical
level and view level.
Introduction to Normalisation
Students should be able to briefly explain the meaning and purpose of normalisation. They should be
aware of the methods or measures used to reduce data redundancy.
Introduction to Databases
Numbers, text, images or any other recordings in a form accessible to human beings are classified
as data. Data by themselves have no meaning; they become meaningful only when they are
interpreted. Interpreted data are referred to as information. For example, the number 33.5 tells us
almost nothing. However, once readers are told that the number stands for a temperature in
centigrade, it makes sense. In this example, 33.5 is a piece of data whereas 33.5 as a temperature
in centigrade is a piece of information. Information is stored in computers in such a way that both
its data value and its interpretation are recorded. In most cases, the interpretation of computer
data is given by the corresponding data name.
In the context of databases (which will be elaborated in the next section) as well as in daily use, the
terms “information” and “data” are often used interchangeably, although this confusion is
undesirable. In most cases, the interpretation of the term “data” should be clear from the context of
discussion. In the context of databases, “data” usually means “information”.
Each information system has a hierarchy of data organization, and each succeeding level in the
hierarchy is the result of combining the elements of the preceding level. The six levels are bits,
characters (bytes), fields (data elements), records, files, and data base (see Figure 1). A bit is a
binary digit which has a value of either 0 or 1. A byte is composed of 8 ordered bits.
Question to ponder
Are byte and character types the same? (Answer: Not necessarily. It depends on the underlying
encoding scheme being adopted. Even an ASCII character may need more than one byte of storage
in certain implementations of Unicode.)
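The answer above can be checked with a short Python sketch. The choice of the letter "A" and of the three encodings is only illustrative; UTF-16 is picked here as one Unicode encoding in which an ASCII character occupies two bytes.

```python
# Compare the number of characters in a string with the number of bytes
# needed to store it under different encodings.
text = "A"  # a single ASCII character

ascii_bytes = text.encode("ascii")      # one byte per character
utf8_bytes = text.encode("utf-8")       # ASCII characters still take one byte
utf16_bytes = text.encode("utf-16-le")  # every character takes at least two bytes

print(len(text), len(ascii_bytes), len(utf8_bytes), len(utf16_bytes))
```

The character count stays at one while the byte count depends on the encoding, which is why field length is better stated in characters than in bytes.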
Data Field/Element
A (data) field or data element is the lowest level “logical unit” in the data hierarchy that can be
interpreted in a meaningful way, e.g., “David” for a name, “23469345” for a phone number. The
maximum number of characters (not bytes) that a field can hold is called the field length. A field
may consist of a single character only, e.g., M(ale) and F(emale) for representing sex. The
granularity of a field is the user’s decision, e.g., we can treat an address as a single field or as an
aggregate of several fields such as flat-and-floor-number, street-number-and-street-name,
district, city and country, etc. The key concern is the application needs. If certain processing has
to handle an address at city level, we need to divide the address field into its components.
Record
A record is a logical group of related data fields describing an event or an item, e.g., a student
enrollment record consists of fields such as student-ID, student-name, programme-code,
module-code, date-of-enrollment, etc. A record is the lowest level logical unit that can be
accessed from a file. In other words, to access a data field within a record, the whole record
has to be retrieved before the required data field can be identified.
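As a rough illustration of fields, records and record-level access, the sketch below models the student enrollment record in Python. The class name `EnrollmentRecord`, the helper `read_field` and all sample values are hypothetical and serve only to make the idea concrete.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class EnrollmentRecord:
    """One record: a logical group of related data fields."""
    student_id: str
    student_name: str
    programme_code: str
    module_code: str
    date_of_enrollment: date


# A logical file is modelled here as a list of record occurrences.
logical_file = [
    EnrollmentRecord("S001", "David", "P10", "M101", date(2024, 9, 1)),
    EnrollmentRecord("S002", "Mary", "P10", "M102", date(2024, 9, 2)),
]


def read_field(file, position, field_name):
    """Retrieve the whole record first, then pick out the required field."""
    record = file[position]
    return getattr(record, field_name)


name = read_field(logical_file, 0, "student_name")
print(name)
```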
File
A logical file is composed of occurrences of records. A physical file refers to a named area on a
secondary storage device that contains a program, textual material, or even an image.
One logical file is not necessarily mapped to one physical file and vice versa. For example, a
logical file may consist of an index area and a record area such that each of the areas is associated
with a separate physical file. End-users are usually concerned with logical files instead of physical
files.
Questions to ponder
Give an example in which a physical file contains more than one logical file.
Give an example in which a logical file is stored in multiple physical files.
Data Base
A data base is a collection of files that are logically related and integrated with one another so
that data redundancy is minimized or reduced. Data redundancy exists when a data field is stored in
more than one logical file. Data redundancy often cannot be eliminated entirely for various
reasons but it should be kept under control. A database management system is designed to control
the data redundancy problem, ideally by storing every data item once and/or by propagating data
changes to all related record occurrences, possibly across a number of files, so that data integrity
(which concerns the validity, accuracy and correctness of data) can be maintained. A database
management system is often referred to as a DBMS, a database or a database system.
Teaching remark
In many books or online learning resources, the term “data base” is often incorrectly referred to
as “database”. It has reached the point where people use the two terms interchangeably. In
fact, the ASCA and ALCS Curriculum and Assessment Guides also use the terms
interchangeably.
Almost all computer applications require some data to be kept for describing inherently stable
properties or the up-to-date status of certain items or events. Let us think about the information
kept by a bank for its savings account holders. For each savings account, the bank must store its
unique account number, name(s) of account holder(s), contact address(es) of account holder(s),
account balance, etc., to name a few. Those data are considered to be persistent data as they are
not changed frequently. However, some data are more persistent than others. For example, an
account number should never be changed whereas there is a slim chance that changes would be
required for the name(s) of the account holder(s). Account balance is the most susceptible to change
among the listed pieces of data as transactions like deposits or withdrawals will affect its value.
Obviously the correctness of all recorded persistent data is important to the functioning of the
associated computer applications.
Whether or not a piece of data is persistent varies from application to application. Age may not be
considered as a piece of persistent data as it changes every year for most people. However the age
field is definitely persistent if it appears on a death certificate.
Persistent data can be stored in file(s). However, there are potential problems with this approach.
1. Since files are designed to fit individual application needs, a data element may appear in several
files if that piece of data is needed in several applications. For example, a bank customer may
open a savings account and a stock account at the same time. For the stock account, the account
balance is composed of the quantity of each stock purchased. Obviously at least two different
files are needed to keep data for the two types of accounts, but data elements such as the name(s)
and contact address(es) of the account holder(s) are common to both. When the customer
moves to a new address, both files must be updated. This is the data redundancy problem.
Data redundancy can cause a number of problems during data
modifications; those problems are referred to as data anomalies (which will be detailed later).
2. A consequence of the data redundancy problem is the integrity problem or data consistency
problem. Data become inconsistent if copies of the data are not updated simultaneously.
3. Traditional file systems suffer from a sharing problem and a security problem. If a new report
which needs to use some but not all data from two files is required, should one be allowed to
access both files? As access control on a file system can only be applied at the file level, allowing
someone to read both files implies unnecessary exposure of data. If a new file is created to
store all data needed to produce the new report, the data redundancy problem emerges.
4. File systems exhibit structural dependence (also known as program-data dependence).
In order to use a file, a program needs to know the file structure, i.e., details of all data stored in
the file. A change in any file’s structure requires the modification of all programs using that
file.
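Problems 1 and 2 can be reproduced with a few lines of Python. In this sketch the two lists stand in for the savings account file and the stock account file; the account numbers, holder name and addresses are invented for illustration.

```python
# Two application-specific "files" that both store the holder's address.
savings_file = [
    {"account_no": "S-1001", "holder": "Chan Tai Man", "address": "1 Old Road"},
]
stock_file = [
    {"account_no": "K-2001", "holder": "Chan Tai Man", "address": "1 Old Road"},
]


def change_address(file, holder, new_address):
    """Update the address of every record that belongs to the holder."""
    for record in file:
        if record["holder"] == holder:
            record["address"] = new_address


# The customer moves, but only the savings application is told about it.
change_address(savings_file, "Chan Tai Man", "2 New Street")

# The redundant copies now disagree: a data consistency problem.
addresses = {savings_file[0]["address"], stock_file[0]["address"]}
inconsistent = len(addresses) != 1
print(inconsistent)
```

Nothing in a plain file system notices the disagreement; it is up to every program to remember every copy.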
Some databases may not be able to achieve all the above aims. Early databases may not support
transaction processing or offer secured data access for concurrent users.
A key advantage of a database is that end-users and application programmers do not have to know
how data files are organized and stored in the database. This is referred to as structural
independence (or program-data independence). Thus changing the structure of a file does not
necessarily require the computer programs that access the file to be modified. Databases achieve
structural independence by organizing data through advanced data structures in which the data
fields and records are related to each other. Computer programs do not access the files directly.
Instead, computer programs that need to access data direct their requests to the DBMS,
which in turn processes the requests against the data base; in other words, all operations on the data
base are coordinated by the DBMS. Figure 2 describes the interactions between different parties
in a database environment.
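Structural independence can be sketched with Python's built-in `sqlite3` module, using SQLite merely as a convenient stand-in for a DBMS. The `account` table, its columns and the function `get_balance` are assumptions made for this illustration.

```python
import sqlite3

# An in-memory SQLite database plays the role of the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (account_no TEXT PRIMARY KEY, holder TEXT, balance REAL)"
)
conn.execute("INSERT INTO account VALUES ('S-1001', 'Chan Tai Man', 500.0)")


def get_balance(connection, account_no):
    """Application code: asks the DBMS for a named column, not a file layout."""
    row = connection.execute(
        "SELECT balance FROM account WHERE account_no = ?", (account_no,)
    ).fetchone()
    return row[0]


before = get_balance(conn, "S-1001")

# The table structure changes, yet the application code above is untouched.
conn.execute("ALTER TABLE account ADD COLUMN branch TEXT")

after = get_balance(conn, "S-1001")
print(before, after)
```

The program names only the data it needs, so adding a column does not force it to be rewritten, unlike a program that parses a fixed file layout.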
Almost all computer applications need to use a database to store persistent data. In a library
system, at least the following data need to be kept.
o Library user ID number
o Library user name
o Library user contact address
o Maximum number of books that a library user can borrow
o Library user ID number, book’s call number and due date of the loan period for each book
  which is on loan
o Author name(s), publisher, year of publication, and status (e.g., on-loan, on-hold, on-request, and
  missing, etc.) of each book
The above information must be kept in order to support basic library operations like book search,
borrowing and return, etc.
Databases do not only support the day-to-day operations of organizations. Applications can be built
to analyze historical data in databases for planning purposes. Banks use various types of customer
information such as account balances, salary information, saving patterns, credit card repayment
patterns and mortgage repayment patterns to create their customers’ profiles. Customer details like
occupation, age and marital status are recorded too. Such information is stored in databases and
is analyzed so as to enable the banks to identify potential customers for specific products,
e.g., fund investment and insurance. This kind of database application is known as data
mining, which analyzes data in databases to look for trends or anomalies without
knowledge of the meaning of the data.
The amount of operational data would be too much for management staff to digest. Besides, the
data would be too raw for them to make management decisions. In practice, operational data are
typically summarized (and sometimes stored in a data warehouse) before they are presented to the
management. All the data mentioned, whether in raw or digested form, are stored in some form of
database.
Types of Databases
There are many ways to classify databases and two of them are listed below.
Number of Users
Many databases designed to run on personal computers are expected to be used by one user at a
time. We usually refer to them as single-user databases. Earlier versions of Microsoft FoxPro
and Access belong to this type.
More sophisticated databases like MySQL, Microsoft SQL Server, IBM DB2 and Oracle are called
multi-user databases as they have built-in facilities for secured and concurrent data access.
Location
Reference
Silberschatz, A., Korth, H.F., & Sudarshan, S. (1997). Database System Concepts (3rd ed.).
McGraw Hill.
Data Models
A data model is a collection of logical constructs used to represent the data structure, data
semantics and data relationships found within the database. Data models can be conceptual or
implementation oriented. Conceptual data models are used to describe data at the logical and
(user) view levels. They offer no description of implementation issues. Conceptual models
are often used as a communication tool between database designers and end-users so as to help the
designers understand the data requirements of the end-users correctly. The entity-relationship
model is an instance of a conceptual data model. Implementation models, by contrast, provide a
high-level description of the implementation. Three popular implementation models are the
hierarchical, network and relational models. Note that the problem of structural dependence in both
the hierarchical and network models is resolved in the relational model.
Relational Database Concepts
Introduction
In this section, basic relational database terminology and concepts are introduced. The
definitions and characteristics of entity, relation, attribute, domain and key, etc., are detailed. In
particular, the difference between keys and indexes, and three concepts about data integrity, namely
entity integrity, referential integrity and domain integrity, are explained. To help explain
the above terminology and concepts, a problem scenario about a school library is introduced
below:
The library of XYZ School has decided to computerize its services so as to make them
more efficient and effective. Since computerization is relatively new to the school, the
library aims to provide only basic library functions to the users initially through the
implementation of a simple computerized library system. The system is expected to
offer a computerized catalogue of all library items, e.g., books and past examination
papers, and basic circulation functions such as item borrowing, returning and reserving.
Obviously the system needs to keep library user information such as the number of
library items that s/he is allowed to borrow, dates and call numbers of those library
items that s/he has borrowed, or requested, etc. Library item details such as its call
number, author(s), ISBN, year of publication and status (e.g., available, on loan,
requested and damaged), etc., are also kept.
As a teacher librarian of the school, you are asked to design a suitable database
schema to support the mentioned library operations.
Whenever applicable, examples will be provided in relation to the above problem scenario so as to
provide a clear context for illustrating the database terminology and concepts.
An entity is a distinguishable object to be described. It can be any object such as a person, a place,
an event or a thing, etc. Entities that share the same properties or attributes are collectively
referred to as an entity set (or entity type). Example entity sets that can be found in a school
environment are students (person), classrooms (place), examinations (event), and subjects (thing),
etc.
Entity sets in the XYZ School library example:
o Suppose Linus and Jeff are students. They are entities (library users) because they share the
properties of a student and are distinguishable objects in a school library system.
o Example entity sets are library users, who may be teachers or students (person), library items
(things), circulation transactions such as a book request (event), and user privilege (things), etc.
In a relational database, an entity set is typically represented in terms of one or more relations (a
mathematical term for tables), each of which is composed of rows and columns. Each
tuple (a mathematical term for a row in a table) in a relation represents an entity of the associated
entity set. Each column, which is uniquely named within the table that it is associated with,
represents a category of information that corresponds to an attribute. A relational database is
typically composed of a number of related tables. Note that the order of the rows and columns
within a table is immaterial to the database.
As shown in the table below, the “user privilege” entity set of the XYZ School library example is
composed of 6 rows with each row defining the privilege of a user type for a given material type.
[Table 2: the “user privilege” table, annotated to show that a column is an attribute and a row is a
tuple]
Attributes
Attribute and Domain
Each entity has certain descriptive properties known as attributes (or fields). Some potential
attributes for the student entity are student-name, student-number, and sex, etc.
Attributes in the XYZ School library example:
o student ID, class name (in the “library user” entity set)
o call number, material type (e.g., CD-ROM, book), item name (e.g., book title)
The set of all possible values for an attribute is called its domain. For the student entity set, the
domain of the attribute sex should be {female, male} whereas the domain of the attribute age
should be any positive integer (although it may make more sense to set an upper bound for the
domain).
Relational database theory does not restrict the data type that an attribute can be associated with.
However, some commonly supported data types in relational databases are:
o Number (integer or real number)
o Text (fixed length or variable length)
o Boolean type
o Date and time
Attributes that cannot be divided into subparts are known as simple attributes (e.g., age);
otherwise they are composite attributes (e.g., address). Whether there is a need to restructure
an attribute into finer attributes depends on the application needs. In the XYZ School library
example, the library user name is a composite attribute that is stored as a whole; it is not further
divided into simpler attributes such as first-name and surname. This representation does not cause
any problem as the library has no need to process library information by
its users’ first-name or surname. To facilitate detailed queries in the future, many database
designers prefer to split a composite attribute into a series of simple attributes.
Null Attributes
It is possible to use a null as the value of an attribute of an entity. For example, the value of the
ISBN field will be set to null for past examination papers but a valid ISBN is needed for most
books.
Derived Attributes
On some occasions, the value of an attribute can be derived from other related attributes or entities.
Such an attribute is referred to as a derived attribute. Suppose a database keeps an
employee table to store employee information like employee-number, employee-name and
number-of-dependents, and a dependent table to record information about each employee’s
dependents, one dependent per row. In this case, the number-of-dependents attribute in the employee
table is a derived attribute as its value is equal to the number of associated rows in the dependent
table.
In a good database design, an integrity constraint (which will be detailed later) should be defined
between derived attributes and their base attributes in order to ensure that an update of the value of
any base attribute will trigger a corresponding update of any associated derived attributes.
Otherwise, data inconsistency will occur.
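One possible way to keep a derived attribute in step with its base data is a database trigger; this is a technique chosen for the sketch, not one prescribed by the text. The example uses SQLite through Python's `sqlite3` module, and the column `dependent_id`, the trigger names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    employee_number      INTEGER PRIMARY KEY,
    employee_name        TEXT NOT NULL,
    number_of_dependents INTEGER NOT NULL DEFAULT 0  -- derived attribute
);
CREATE TABLE dependent (
    dependent_id    INTEGER PRIMARY KEY,
    employee_number INTEGER NOT NULL REFERENCES employee(employee_number),
    dependent_name  TEXT NOT NULL
);
-- Keep the derived attribute consistent whenever the base rows change.
CREATE TRIGGER dependent_added AFTER INSERT ON dependent
BEGIN
    UPDATE employee
    SET number_of_dependents = number_of_dependents + 1
    WHERE employee_number = NEW.employee_number;
END;
CREATE TRIGGER dependent_removed AFTER DELETE ON dependent
BEGIN
    UPDATE employee
    SET number_of_dependents = number_of_dependents - 1
    WHERE employee_number = OLD.employee_number;
END;
""")

conn.execute("INSERT INTO employee (employee_number, employee_name) VALUES (1, 'Alice')")
conn.execute("INSERT INTO dependent (employee_number, dependent_name) VALUES (1, 'Bob')")
conn.execute("INSERT INTO dependent (employee_number, dependent_name) VALUES (1, 'Carol')")
conn.execute("DELETE FROM dependent WHERE dependent_name = 'Bob'")

# The stored (derived) value and the value computed from the base table agree.
stored = conn.execute(
    "SELECT number_of_dependents FROM employee WHERE employee_number = 1"
).fetchone()[0]
computed = conn.execute(
    "SELECT COUNT(*) FROM dependent WHERE employee_number = 1"
).fetchone()[0]
print(stored, computed)
```

Without the triggers, every program that inserts or deletes a dependent would have to remember to adjust the count itself.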
Intuitively, we should eliminate all derived attributes from a database because their values, if
required, can be computed in real time. However, the use of derived attributes can improve the
efficiency of a database. In the XYZ School library example, it is better to have (derived)
attributes to record the number of times that a teacher has failed to return borrowed items to the
library on time and the cumulative number of days overdue, although those pieces of information can
be derived from the teacher’s circulation history. The use of derived attributes in this example can
greatly enhance the database efficiency when compared to rescanning all past circulation records of a
teacher to compute the required information. In this example, the computational effort for
keeping the values of the derived attributes consistent with their base attribute values is
small.
Keys
A key is a value of one or more selected attributes used to identify an entity in an entity set.
The concerned attribute(s) is/are known as the key field(s). A potential key field of the
“library user” entity set of the XYZ School library example is the “library user ID” which is
unique for each library user.
A superkey is a set of one or more attributes that, taken collectively, uniquely identify an entity
in an entity set. However, a superkey may contain extraneous attributes. In the “user
privilege” table of the XYZ School library example, all of the following combinations of
attributes are superkeys:
o “User type” and “Type of material”
o “User type”, “Type of material” and “Loan period”
o “User type”, “Type of material”, “Loan period”, and “Total number of items that can
be borrowed”
o “Description” and “Type of material”
o “Description”, “Type of material” and “Loan period”
Once the values of any of the above attribute combinations are given, we can always uniquely
identify an entity (row) in an entity set (table).
because giving the values of any of the above attribute combinations, more than one entity (row)
may be identified.
Teaching remarks
The identification of superkeys for a table must be based on the semantics of the attributes
of the table instead of the table content. In the “user privilege” table of the XYZ School
library example (see Table 2), it appears that given the values of the “Loan Period” and
“Total number of items that can be borrowed”, a unique entity (row) can be identified and
thus the two attributes, when combined, can be taken as a superkey. However, this is
misleading. Suppose school alumni are allowed to use the library and they are allowed to
borrow up to 3 books for a maximum of 14 days. This means that “Loan Period”
and “Total number of items that can be borrowed” are no longer a superkey, as a junior student
is also allowed to borrow the same number of books for the same loan period.
In reality, teachers as well as textbooks often use table contents to explain the concept of
key (and normalization, which will be covered later). Teachers must make clear to students
their assumption that the table contents give an exhaustive illustration of the table semantics.
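The pitfall described in the remark can be demonstrated with a small uniqueness check written in Python. Table 2 is not reproduced here, so the three initial rows below are invented; only the alumni rule (3 books for 14 days, the same as a junior student) comes from the text. The helper name `is_unique_on` is also hypothetical.

```python
def is_unique_on(rows, attributes):
    """Return True if no two rows share the same values for the attributes."""
    seen = set()
    for row in rows:
        key = tuple(row[name] for name in attributes)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical table content (not the actual Table 2).
rows = [
    {"user_type": "junior student", "material": "book", "loan_period": 14, "max_items": 3},
    {"user_type": "senior student", "material": "book", "loan_period": 21, "max_items": 5},
    {"user_type": "teacher", "material": "book", "loan_period": 30, "max_items": 10},
]
pair = ("loan_period", "max_items")

# Judging by the current content alone, the pair looks like a superkey.
looks_like_superkey = is_unique_on(rows, pair)

# A new, perfectly legal row shows that it never was one.
rows.append({"user_type": "alumni", "material": "book", "loan_period": 14, "max_items": 3})
still_unique = is_unique_on(rows, pair)

# The combination chosen from the semantics keeps identifying rows uniquely.
semantic_key_unique = is_unique_on(rows, ("user_type", "material"))
print(looks_like_superkey, still_unique, semantic_key_unique)
```

A content check like this can only refute a proposed key; it can never confirm one.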
Minimal superkeys are called candidate keys. Removing any attribute from a candidate key
will render the remaining attribute(s) no longer a key. In the “user privilege” table of the XYZ
School library example, both of the following combinations of attributes are candidate keys:
o “User type” and “Type of material”
o “Description” and “Type of material”
The above example clearly shows that a table may have more than one candidate
key. However, multiple candidate keys in a table might imply the existence of a transitive
dependency in the table. Transitive dependency is an indicator of poor database design and
should be avoided. The notion of transitive dependency will be introduced together with
the notion of database normalization.
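The step from superkeys to candidate keys can be written down as a filter: keep only those superkeys that have no smaller superkey inside them. The Python sketch below applies this to the five superkeys listed earlier for the “user privilege” table; the function name `minimal_superkeys` is made up, and the result is only meaningful if the input list already contains every relevant superkey.

```python
# The superkeys of the "user privilege" table listed earlier.
superkeys = [
    {"User type", "Type of material"},
    {"User type", "Type of material", "Loan period"},
    {"User type", "Type of material", "Loan period",
     "Total number of items that can be borrowed"},
    {"Description", "Type of material"},
    {"Description", "Type of material", "Loan period"},
]


def minimal_superkeys(keys):
    """Keep the superkeys that do not properly contain another listed superkey."""
    return [
        key for key in keys
        if not any(other.issubset(key) and other != key for other in keys)
    ]


candidate_keys = minimal_superkeys(superkeys)
for key in candidate_keys:
    print(sorted(key))
```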
Teaching remark
Like superkeys, the identification of candidate keys for a table must NOT be based on the table
content, but on the semantics of the attributes of the table.
A primary key is a candidate key chosen by the database designer as the major means of
identifying an entity (row) within an entity set (table). No part of a primary key can be null.
Unlike the candidate key, a table can only have one primary key.
Teaching remark
Some textbooks on the market give an imprecise definition of candidate key and
primary key. In one textbook, a primary key is defined as a field or combination of fields
that uniquely and minimally identifies a particular record in a table. According to this
definition, a table could have more than one primary key, which is
incorrect. The definition given in that book in fact describes a candidate key
rather than a primary key.
Any attribute which is not a part of any candidate key is known as a non-key attribute. In the
XYZ School library example, the loan-period is a non-key attribute.
A foreign key is an attribute (or set of attributes) that is either null, or is not a superkey in its
own table but is a candidate key in another table. Suppose we have two tables, namely
student-subject and subject, which store the subjects that a student has enrolled in and the
subject descriptions respectively. The
student-subject table records student-ID (a part of the primary key) and subject-ID (another
part of the primary key) whereas the subject table stores subject-ID (primary key) and
subject-descriptor. The subject-ID in the student-subject table is a foreign key to the
subject table.
student-subject
student-ID    subject-ID
200425854     CS1145
subject
subject-ID    subject-descriptor
Teaching remarks
It is wrong to say only that the subject-ID in the student-subject table is a foreign key. The
notion of foreign key is defined on two tables, so the referenced table must also be stated.
Many textbooks do not explicitly state that the value of a foreign key can be null.
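The two tables can be declared in SQL roughly as follows, run here through Python's `sqlite3` module. The underscore names replace the hyphenated attribute names, the subject descriptor text and the subject code `XX0000` are invented, and the `PRAGMA` line is specific to SQLite, which checks foreign keys only when asked to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE subject (
    subject_id         TEXT PRIMARY KEY,
    subject_descriptor TEXT NOT NULL
);
CREATE TABLE student_subject (
    student_id TEXT NOT NULL,
    -- foreign key to the subject table
    subject_id TEXT NOT NULL REFERENCES subject(subject_id),
    PRIMARY KEY (student_id, subject_id)
);
""")
conn.execute("INSERT INTO subject VALUES ('CS1145', 'Introductory Computing')")
conn.execute("INSERT INTO student_subject VALUES ('200425854', 'CS1145')")

# A subject-ID that does not exist in the subject table is refused.
try:
    conn.execute("INSERT INTO student_subject VALUES ('200425854', 'XX0000')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# The foreign key is what lets the two tables be linked in a query (equi-join).
rows = conn.execute("""
    SELECT ss.student_id, s.subject_descriptor
    FROM student_subject AS ss
    JOIN subject AS s ON ss.subject_id = s.subject_id
""").fetchall()
print(rejected, rows)
```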
Indexes
One or more indexes can be defined for a table for efficient data retrieval. Unlike a primary key,
an index does not have to be unique. Whether or not an index is required for a table depends
on the application needs. Inclusion or omission of an index in a table definition may affect the
efficiency, but not the functionality, of any data retrieval.
An index is an implementation structure such that given one or more attribute values, relevant
rows can be efficiently retrieved. It is typically implemented through the use of sophisticated
data structures like ISAM and B+ trees.
Common mistakes
Some people may use the terms “index” and “secondary key” interchangeably but this should be
avoided. Keys are logical concepts whereas indexes are implementation concepts. In fact,
there is no notion of “secondary key” or “index” in relational database theory.
Teaching remarks
Most relational databases create an index for the primary key of each table for efficient data
retrieval.
Although indexing can facilitate efficient data retrieval, it should not be overused. Index
creation and maintenance may involve a lot of computations that take time to finish.
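The point that an index changes efficiency but not functionality can be tried out in SQLite via Python's `sqlite3` module. The `library_item` table, its sample rows and the index name `idx_item_type_year` are assumptions for this sketch, and `EXPLAIN QUERY PLAN` is an SQLite feature whose output format varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE library_item (
    call_number      TEXT PRIMARY KEY,
    item_name        TEXT,
    material_type    TEXT,
    publication_year INTEGER
)""")
conn.executemany(
    "INSERT INTO library_item VALUES (?, ?, ?, ?)",
    [
        ("QA76.1", "Database Basics", "book", 2001),
        ("QA76.2", "SQL Primer", "book", 2003),
        ("CD-001", "Typing Tutor", "CD-ROM", 2003),
    ],
)

query = (
    "SELECT call_number FROM library_item "
    "WHERE material_type = 'book' AND publication_year = 2003"
)
without_index = conn.execute(query).fetchall()

# A non-unique, multiple-field index: many items may share a type and a year.
conn.execute(
    "CREATE INDEX idx_item_type_year ON library_item (material_type, publication_year)"
)
with_index = conn.execute(query).fetchall()

# The answer is identical; only the access path chosen by the DBMS may differ.
print(without_index == with_index)
for step in conn.execute("EXPLAIN QUERY PLAN " + query):
    print(step)
```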
Data Integrity
As mentioned before, data integrity is concerned with the validity, accuracy and correctness of data.
In relational databases, three types of data integrity are of particular concern. They are entity
integrity, domain integrity and referential integrity.
Entity Integrity
Note that condition 1 must be enforced or a primary key will not be able to uniquely identify an
entity (a row) in an entity set (a table). As an example, the “user privilege” table does meet the
criteria of entity integrity.
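A minimal sketch of entity integrity in SQL follows, assuming the usual reading of the term: primary key values must be unique and no part of a primary key may be null. It uses SQLite through Python's `sqlite3` module; the column names and sample values are hypothetical, and the explicit `NOT NULL` is needed because SQLite does not otherwise forbid nulls in a non-integer primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE user_privilege (
    user_type     TEXT NOT NULL,
    material_type TEXT NOT NULL,
    loan_period   INTEGER,
    max_items     INTEGER,
    PRIMARY KEY (user_type, material_type)
)""")
conn.execute("INSERT INTO user_privilege VALUES ('teacher', 'book', 30, 10)")


def try_insert(values):
    """Return True if the DBMS accepts the row, False if it is rejected."""
    try:
        conn.execute("INSERT INTO user_privilege VALUES (?, ?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(("teacher", "book", 14, 3))   # repeated key
null_accepted = try_insert(("teacher", None, 14, 3))          # null key part
new_row_accepted = try_insert(("teacher", "CD-ROM", 7, 2))    # fresh key
print(duplicate_accepted, null_accepted, new_row_accepted)
```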
Domain Integrity
Domain integrity is a property that ensures that whenever a new data item is entered into the
database, it is within the domain of the corresponding attribute. For instance, the
enforcement of a domain constraint can stop one from entering a value other than “female” or
“male” into the sex attribute.
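In SQL, a domain constraint of this kind is commonly written as a `CHECK` clause. The sketch below uses SQLite through Python's `sqlite3` module; the `student` table layout, the sample rows and the upper bound of 149 on age are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE student (
    student_number TEXT PRIMARY KEY,
    student_name   TEXT NOT NULL,
    sex            TEXT NOT NULL CHECK (sex IN ('female', 'male')),
    age            INTEGER CHECK (age BETWEEN 1 AND 149)  -- assumed upper bound
)""")


def try_insert(values):
    """Return True if the row satisfies every domain constraint."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False


valid_accepted = try_insert(("S001", "Linus", "male", 15))
bad_sex_accepted = try_insert(("S002", "Jeff", "unknown", 16))
bad_age_accepted = try_insert(("S003", "Mary", "female", -4))
print(valid_accepted, bad_sex_accepted, bad_age_accepted)
```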
Referential Integrity
Referential integrity is concerned with the data consistency between coupled tables. In particular,
we may want to ensure that an attribute value that appears in one table also appears for a certain set
of attributes in another table. For example, the XYZ School library database may keep one table
to store library user personal information like user-ID, user-name, and contact-address, etc. and
another table to keep information about loaned books like user-ID, book-call-number, and due
date, etc. The user-ID is the primary key of the library-user-details table whereas the
concatenation of user-ID and book-call-number forms the primary key of the loaned-book table.
The user-ID attribute of the loaned-book table is a foreign key to the library-user-details table (as
user-ID is not a superkey in the loaned-book table but a candidate key in the library-user-details
table). Obviously, it is important to ensure that any value appearing in the user-ID attribute of the
loaned-book table also appears in the user-ID attribute of the library-user-details table.
the table that the foreign key relationship is referred to. Thus, deleting a row that contains a value
referred to by a foreign key in another table would break referential integrity. In the XYZ School
library example, this is equivalent to removing a library user from the library-user-details table
without requiring the user to return all the books that s/he has borrowed.
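The library example can be tried out in SQLite through Python's `sqlite3` module. This is a sketch under assumptions: the underscore table names mirror library-user-details and loaned-book, the sample user and book are invented, and the `PRAGMA` line is SQLite's own switch for enforcing foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE library_user_details (
    user_id         TEXT PRIMARY KEY,
    user_name       TEXT NOT NULL,
    contact_address TEXT
);
CREATE TABLE loaned_book (
    user_id          TEXT NOT NULL REFERENCES library_user_details(user_id),
    book_call_number TEXT NOT NULL,
    due_date         TEXT,
    PRIMARY KEY (user_id, book_call_number)
);
INSERT INTO library_user_details VALUES ('U001', 'Linus', NULL);
INSERT INTO loaned_book VALUES ('U001', 'QA76.9', '2024-10-01');
""")

# Removing a user who still holds a book would break referential integrity.
try:
    conn.execute("DELETE FROM library_user_details WHERE user_id = 'U001'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Once the book has been returned, the user can be removed.
conn.execute("DELETE FROM loaned_book WHERE user_id = 'U001'")
conn.execute("DELETE FROM library_user_details WHERE user_id = 'U001'")
remaining_users = conn.execute(
    "SELECT COUNT(*) FROM library_user_details"
).fetchone()[0]
print(delete_blocked, remaining_users)
```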
It is important to note that a referential constraint may not enable us to avoid errors at the database
design level. The following example illustrates such a problem.
The table on the left stores the ID numbers and names of all library users whereas the table on the
right keeps all loaned books. ID and call number are the primary keys of the library user and loan
event tables respectively. user ID in the loan event table is a foreign key to the library user
table. According to the definition of foreign key, it is acceptable to assign a null value to user ID,
as found in the third record of the loan event table. From a user perspective, it obviously does not
make sense to allow a book to be loaned to an unknown person, but the referential constraint
between the two tables does not stop the assignment of null to user ID. To avoid the problem, we
need to make user ID in the loan event table a mandatory attribute.
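The difference a mandatory attribute makes can be sketched in SQLite through Python's `sqlite3` module. The two loan tables below are hypothetical variants of the loan event table, one with a nullable foreign key and one declared `NOT NULL`; the sample user and call number are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific switch
conn.executescript("""
CREATE TABLE library_user (
    id   TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
-- Foreign key only: a null user ID satisfies the referential constraint.
CREATE TABLE loan_event (
    call_number TEXT PRIMARY KEY,
    user_id     TEXT REFERENCES library_user(id)
);
-- Foreign key plus a mandatory attribute.
CREATE TABLE strict_loan_event (
    call_number TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL REFERENCES library_user(id)
);
INSERT INTO library_user VALUES ('U001', 'Jeff');
""")

# Accepted: the book is on loan to nobody in particular.
conn.execute("INSERT INTO loan_event VALUES ('QA76.9', NULL)")
null_loans = conn.execute(
    "SELECT COUNT(*) FROM loan_event WHERE user_id IS NULL"
).fetchone()[0]

# Rejected: the mandatory attribute closes the gap.
try:
    conn.execute("INSERT INTO strict_loan_event VALUES ('QA76.9', NULL)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True
print(null_loans, null_rejected)
```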
Teaching remark
SQL92 and SQL99 provide standard features for defining various data
integrity constraints, but many commercial database management systems such as Microsoft
Access tend to provide non-standard customized features to serve the purpose. Such details
are not within the curriculum and will not be further discussed here.
Introduction to Database Design Methodology
View level or external level is concerned with how individual users see the data. Note that
users may range from application programmers to casual users who interact with the database
through ad-hoc query facilities. For example, a library user may be interested in the library
collection but not the library user statistics. The librarian would not be expected to have any
interest in the information about an individual library user’s reading habits.
Conceptual level is concerned with a community user view of the entire information content of
the database that is of interest to the organization. At this level, no physical considerations are
taken into account. A change in the internal view to improve performance need not involve any
change in the conceptual view of the database.
Physical level or internal level is concerned with how data is actually stored. Efficiency is the
prime concern at this level. The following aspects, among others, are considered at this level:
1. Data structures chosen, e.g. B-trees, hashing, etc.
2. Access paths, e.g. specification of primary and secondary keys, indexes and pointers and
sequencing.
3. Miscellaneous issues, e.g. data encryption and compression techniques.
The key advantage of the three-level database architecture is that it separates (1) the conceptual
view from the physical view, and (2) the external views from the conceptual view. The former
enables a database designer to provide a logical description of the database without the need to
specify physical structures. This is often called physical data independence. The latter enables a
database designer to change the conceptual view without affecting the external views in most cases.
This separation is sometimes called logical data independence. Readers may click here for a
more detailed discussion of the three levels.
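The separation of levels can be illustrated with a small sketch using Python's built-in sqlite3 module. The book table, the available_book view and the index name are all assumptions made for illustration. The view stands for an external level, the table for the conceptual level, and the index for a change at the physical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: the logical description of the data
CREATE TABLE book (
    call_number TEXT PRIMARY KEY,
    title       TEXT NOT NULL,
    on_loan     INTEGER NOT NULL
);
INSERT INTO book VALUES ('QA76.9', 'Database Basics', 0),
                        ('QA76.7', 'Programming', 1);

-- External level: what a library user is interested in
CREATE VIEW available_book AS
    SELECT call_number, title FROM book WHERE on_loan = 0;
""")

query = "SELECT title FROM available_book ORDER BY title"
before = conn.execute(query).fetchall()

# Physical level: add an index to speed up searches by title.
# Neither the table definition nor the view has to change.
conn.execute("CREATE INDEX idx_book_title ON book(title)")
after = conn.execute(query).fetchall()
```

The query returns the same rows before and after the index is created, which is the essence of physical data independence.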
Entity
An entity is a representation of any composite information of a real object (e.g., a bank
customer) or an abstract object (e.g., a money withdrawal transaction of a bank).
o Entities encapsulate data only, i.e., an entity is described only by its associated attributes.
How its attributes will be manipulated is out of the scope of the entity. For example, an
entity about a money withdrawal transaction of a bank is concerned with how much
money is taken out of which account on a particular date. How those recorded data
may be used for various purposes is immaterial from the logical data modeling perspective.
Entities may be related to one another, e.g., a bank customer may perform a number of money
withdrawal transactions over a given period.
The ERD notation for entities is a rectangle. A STUDENT entity is represented below.
Figure 2. Notation for entities (rectangle).
Attribute
Attributes define the properties of an entity so as to
o name an instance of an entity
o describe the instance
o make reference to another instance
The ERD notation for attributes is an oval. An attribute is linked to the associated entity by a
single line or a double line, depending on whether or not the attribute is multi-valued. Suppose
the previously mentioned STUDENT entity has two attributes only – name and address. The
corresponding ERD representation is given below.
The above example assumes that every student has exactly one name and one address. For a
student that has more than one address, the corresponding ERD representation is as follows:
Figure 4. Notation for multi-value attributes (oval connected to a rectangle with double lines).
If an attribute is the (primary) key or a component of the primary key of an entity, the attribute
name may be underlined. Assuming each student has a unique name, the corresponding ERD
representation is changed as below.
Figure 5. Notation for key attributes (attribute name underlined).
Teaching remarks
Apparently the A/AS Level curricula do not require students to be familiar with how
a multi-valued attribute is drawn in an ERD.
In reality, it would be tedious to show attributes of entities in an ERD due to space limitation.
Besides, the attributes associated with a selected entity are usually clear from the context.
Thus attributes of entities are typically omitted in an ERD.
Relationship
Relationships are links between entities that define how the entities are associated. There may
be more than one relationship between two (or more) entities, e.g., customers open accounts,
customers close accounts in which open and close are relationships between the customer and
account entity sets. Note that an entity may have a reflexive relationship with itself, e.g., the work
supervisor of an employee of a company is also an employee of the company.
Although a relationship can be classified by its degree, cardinality, connectivity, direction, type, and
existence, etc., not all modeling methodologies use all these classifications. This package will
focus its discussion on degree, cardinality, and existence only.
The ERD notation for a relationship is a diamond with the name of the relationship as the label of the
shape. The following ERD says that a teacher marks assignments.
Another occasionally used notation for relationship is to get rid of the diamond and simply put the
relationship name as a label of the line that represents the relationship. The previous example is
now depicted as follows:
Figure 7. An alternative notation for relationships (line directly connected to associated entities).
Although in most cases relationships are not associated with any attributes, it is possible that
attributes may be required to describe some relationships. Suppose we have a relationship called
borrow which relates the Student and Book. Obviously we need to keep the due date for return for
each book on loan. The information can only be attached to the borrow relationship as it is not an
attribute of Student or Book. In some literature, such a type of relationship is referred to as an
associative entity.
Teaching remark
In the Curriculum and Assessment Guide (C&A guide), no associative entity is mentioned.
However, associating attributes with relationships is very common in practice and the concept
should be covered. Although the literature usually introduces a separate notation for
associative entity, the C&A guide does not provide any for it. Having said that, we may
simply associate attribute(s) to a relationship to capture the essence of an associative entity.
So far, none of the above examples offers any information to answer the following questions.
Would an assignment be marked by more than one teacher? What are the minimum and
maximum numbers of teachers to mark an assignment?
Would a teacher mark more than one assignment? What are the minimum and maximum
numbers of assignments that a teacher needs to mark?
Would there be any teacher who does not need to mark any assignment? Is it acceptable to
have any unmarked assignment?
In order to answer the above questions, we need to know additional properties of the relationship.
Degree
The degree of a relationship is the number of entity sets associated with the relationship. Most
relationships are binary relationships, where the degree is two, but ternary relationships, which involve
three entity sets, can be found occasionally, e.g., teachers teach subjects to students. An n-ary
relationship is a relationship with degree n.
Many modeling approaches deal with binary relationships only. Ternary or n-ary
relationships are typically decomposed into two or more binary relationships. Thus this e-learning
package focuses its discussion on binary relationships only.
Cardinality and Existence (or Modality)
Cardinality defines the actual number of entities that must be included in a relationship.
Cardinality information can be divided into two types – minimum cardinality and maximum
cardinality. Data modeling concerns whether or not the minimum cardinality is zero and whether
or not the maximum cardinality is greater than one, i.e., one (1) or many (n or m), as such
information will affect how a data model is translated into a data schema, i.e., database design.
Existence or modality denotes whether the existence of an entity instance is dependent upon the
existence of another related entity instance. The existence of an entity in a relationship is defined as
mandatory if every instance of the entity is involved in that relationship. Otherwise, the
existence of the entity in the relationship is defined as optional. It is clear that the minimum
cardinality of an entity that has an optional existence must be zero. Conversely, a mandatory
existence of an entity in a relationship implies that the minimum cardinality of the entity in the
relationship is a positive integer.
For the marry relationship between Man and Woman, the minimum cardinality and maximum cardinality
are 0 and 1 respectively, as a man (or woman) may be married to no woman (or man) and the maximum
number of women (or men) that a man (or woman) can be married to is one.
Obviously, both entities (Man and Woman) optionally participate in the marry relationship. Thus
the existence of both entities in the relationship is optional. In ERD, a small circle is added on the
line that joins an entity and a related relationship if the existence of the entity in the relationship is
optional. An ERD that represents the connectivity and existence information of the marry
relationship is given below.
Example 2 - Mother give-birth-to Child
As a mother must have given birth to at least one child, the corresponding ERD representation is as
follows:
The minimum cardinality and maximum cardinality of the relationship are 1 and n (where n is a
positive integer) respectively as a mother must have one or more children. Regarding the
existence of the entities in the relationship, both Mother and Child must be involved in the relationship
as every mother must have at least one child whereas every child must have a mother.
Assuming a normal school setting in which all teachers and students are involved in the teach
relationship, the relationship is of the many-to-many (m:n) type as many teachers would teach a
student whereas a teacher would teach many students. According to the assumption, the minimum
cardinality of the relationship is one. The maximum cardinality of the relationship is many. The
corresponding ERD representation is as follows:
If there is a teacher who is on study leave and thus does not teach any student, the
above ERD will become:
The last example shows that an ERD can be correctly constructed only if all data requirements are
collected. Any missing requirement may result in an inaccurate data model, which in turn would
mislead a database designer to create an incorrect data schema. Thus, it is important for a database
designer to confirm with the end-users that all data requirements are correctly captured, typically
with the use of the ERD as a communication tool. To enable effective communication between the
two parties, the database designer must teach the end-users how to read an ERD.
Although the steps are presented in a linear manner, the process of database design is usually
iterative, i.e., some steps may need to be repeated before a final design results. This note will only
cover the first three steps. Steps 4-5 are straightforward to follow whereas Steps 6-8 require a
more elaborate discussion, which is out of the scope of the current curricula of the A/AS
level computer subjects. In order to explain how a data model can be developed in accordance
with the suggested steps, a problem scenario about a bookstore is given below and illustrations in
light of the example will be given as far as possible.
ABC Bookstore is planning to automate its inventory, enquiry, sales and purchasing
functions by introducing a database management system. The inventory system will
keep track of the stock level of each book title. The sales system will keep track of the
details of each sales order (which is supposed to be of cash sales type only). A sales
order may involve multiple titles of any given quantities. When the inventory of a title
drops below a re-order level, a pre-determined re-order quantity for that title must be
ordered from the supplier of that book title. Each book title is assumed to be supplied
by one publisher only and a publisher may supply multiple book titles.
At the end of each day, the purchasing system will be run to compile a number of
purchase orders detailing the book titles, quantities needed from each publisher. Note
that all book titles to be re-ordered from the same publisher must be grouped into a
single order. Sales details will be removed from the sales system 6 months after the
sale. Details of purchase orders will be removed from the purchasing system 6 months
after the purchase orders are fulfilled. A purchase order is fulfilled when all the items in
the orders are delivered. For simplicity, we assume no partially fulfilled orders.
Concerning the enquiry function, the database should support enquiries based on author
name and book titles.
Teaching remarks
Developing an ERD from a problem description is not easy at all and it requires a lot of
expertise. Many learners find it difficult to learn the skill because without any expert’s advice
or comment, they do not know whether the ERD that they have developed is correct or not.
Thus teachers must be prepared to give a lot of feedback to students when teaching the topic.
To help students learn the skill, give them very simple problems (that can be described in no
more than three sentences) to start with.
Developing an ERD typically begins with a general description of the organization’s operations and
procedures obtained during the requirements analysis. The purpose is
While it is easy to define the basic constructs of the ER model, it is not easy to distinguish their roles
in building the data model. Should a data object be modeled as an entity or an attribute? In the ABC
Bookstore example, a book title apparently has attributes like author(s), ISBN, publisher, and year
of publication, etc. It is also possible to model author as a separate entity. The correct answer
usually depends upon the requirements of the database. Generally, the following guidelines are
adopted.
Entities contain descriptive information and they represent many things which share properties.
It is unlikely that an entity set/type would associate with no description information or have one
instance only.
Attributes identify entities (i.e., act as identifiers), describe entities, or make reference to other
entity instances.
Relationships are associations between entities.
In order to identify all potential entities and attributes, all nouns (or noun phrases) in the problem
description are singled out. Both entities and attributes tend to be associated with those descriptive
noun phrases. If there is no descriptive information associated with a noun phrase, it is unlikely
to be an entity.
As we will see soon, some of the above nouns/noun phrases are in fact irrelevant whereas
some additional data that do not appear in the problem description need to be added to the data
model.
Several guidelines can help learners identify candidates of entities and attributes.
It is unlikely that an entity set/type would associate with no description information or
have one instance only. For example, there is only one instance of ABC Bookstore. It is
thus unlikely to be an entity set/type. It is not an attribute either. In fact the bookshop offers a
context for the problem scenario and all entities and relationships are under its umbrella.
Another example is “database management system”.
Some general terms like “system” can usually be safely removed while some other general
terms like “details” may need to be elaborated.
A problem description may not be complete. Some data that need to be modeled may be
omitted. It is important for learners to detect such omissions and put the
omitted data objects back into the data model. For example, the dates of the sales and
purchase orders have never been mentioned explicitly in the problem description but it is clear
that they must be kept in the database. Another omission is publisher’s details like contact
information.
Some descriptions may be related to the processing aspect instead of the data aspect of the
application and they can be safely skipped when developing a data model. For example,
the second paragraph of the problem description gives details of the processing requirements,
i.e., how data should be processed to give results that users want. Basically what it says is that
programs need to be run (1) to support the enquiry function; (2) to produce purchase orders; and
(3) to remove old purchase and sales orders details from the database. The three mentioned
functions rely mostly on data already stored in the database and require only a few new data items to
support those functions, e.g., purchase order fulfillment date.
In reality, end-users are often approached by database designers to clarify data requirements when
developing a database.
Teaching remarks
Try not to judge too soon the correctness of a list of entities (and perhaps attributes too) that
students have derived from a problem description, as it will be difficult to know whether or not
their answer is correct without examining the whole ERD.
It is a good idea to identify potential entities and attributes before proceeding to the
identification of relationships. Relationships link entities and thus we can focus on finding
verbs/verb phrases that link the potential entities.
Many printed and online resources suggest identifying potential relationships by picking out
verbs (or verb phrases) from the problem description. However, such a method does not work well in many
cases. For example, some verbs or verb phrases that we have identified from the problem
description are as follows:
It is not easy to see how they can hint at the identification of valid relationships. We propose the
following steps to identify relationships and they are found to be particularly useful in dealing with
small problems.
Earlier on, we have identified four entities: Book, Publisher, Sales order, and Purchase order.
The potential relationships among them are as follows.
No obvious relationship can be identified between Sales order and Purchase order, and Publisher
and Sales order.
The initial ERD aims to provide a pictorial representation of the major entities, and the relationships
between them. Cardinality of each relationship is required to be shown. The initial ERD for the
ABC Bookstore example could be as follows:
Figure 13 gives the initial ERD if the inventory information of book title is modeled as a separate
entity.
Figure 13. An alternative initial ERD for ABC Bookstore.
No attributes are shown in the ERD above for simplicity. In practice, details of entities are shown
in a separate document called data object description. The document typically contains the name
of each entity and purpose, name and data type of each attribute for every entity, as well as the
attribute characteristics such as whether its value is unique and/or mandatory, etc.
Check whether the initial ERD meets all the user requirements specified in the problem description.
If not, identify the inadequacy, propose new entities, attributes and/or relationships, and redraw
the ERD. For example, one may leave out the order fulfillment date in the Purchase order entity in
the initial ERD, but such an omission can be identified when checking whether the initial ERD is
able to meet the user requirements specified in the problem description. The ERD given in
Figure 12 (or the one in Figure 13) appears to be able to meet all user requirements and thus will
not be refined further.
Converting ERD to Database Tables
An ERD is the result of data analysis and it must be used in the data design process to help generate a data
schema. A basic 3-rule conversion process can be applied to translate an ERD into a data schema
that meets the criteria of the third normal form (which will be detailed later). We refer to this
conversion process as the basic conversion process.
1. For a 1:1 cardinality relationship, all the attributes of the related entities are grouped into a
single table.
2. For a 1:n cardinality relationship, model each of the related entities in a separate table and post
the primary key of the “one” side entity as a foreign key attribute to the table that represents
the “many” side entity.
3. For an m:n cardinality relationship, model each of the related entities in a separate table and
create a new table (which is referred to as the intersection table) and post the primary key of
each entity set/type as an attribute in the new table. If the relationship has its own attributes,
those attributes are to be stored in the intersection table too. The primary key of the
intersection table is a composite key which includes the primary key of each concerned entity
type.
In the ABC Bookstore example, if an Inventory entity is introduced for representing the inventory
information of book title (Book), we will have the following relationship.
The relationship indicates that each book title is associated with exactly one piece of inventory
information and vice versa. Since the relationship is of 1:1 type, all attributes of the entities will
be stored in the same table according to the first rule of the basic conversion process. As a result,
the attributes to be stored in the resultant table will be exactly the same as the table corresponding to
the Book entity in the original ERD. They are ISBN (unique for each book), book title, author(s),
unit price, stock level, re-order level, and re-order quantity. This explains why various ERDs may
lead to the same data schema.
For ease of reference, the attributes of a table are shown in the following notation.
Note that all author names of a book title are assumed to be stored in the author field. Besides,
more attribute(s) will be added to the above Book table as we deal with the relationship between the
Book and Publisher entities.
In the ABC Bookstore example, we have the following relationship that links the Publisher and
Purchase order entities.
The relationship indicates that a publisher may be associated with any number of purchase orders
(zero to many) whereas each purchase order is associated with exactly one publisher (as each
purchase order will only be placed to one publisher). According to the second rule of the basic
conversion process, the primary key (or identifier) of the Publisher entity must be posted to the table
that represents the Purchase order entity. The resulting tables for representing the relationship
will be as follows:
Note that the ISBN and quantity of each book title being specified in a purchase order are excluded
from the Purchase_order table as there exists an m:n relationship between the purchase order and
book title entities. Such attributes need to be housed in a separate table as illustrated in the next
example.
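As a rough illustration of the second rule, the sketch below builds the two tables with Python's built-in sqlite3 module. Only publisher_name and the table name Purchase_order appear in this note; the other columns (contact, order_no, order_date) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce foreign keys

conn.executescript("""
-- "one" side of the 1:n relationship
CREATE TABLE Publisher (
    publisher_name TEXT PRIMARY KEY,
    contact        TEXT                   -- assumed attribute
);
-- "many" side: the primary key of Publisher is posted here as a foreign key
CREATE TABLE Purchase_order (
    order_no       INTEGER PRIMARY KEY,   -- assumed identifier
    order_date     TEXT NOT NULL,         -- assumed attribute
    publisher_name TEXT NOT NULL REFERENCES Publisher(publisher_name)
);
INSERT INTO Publisher VALUES ('Alpha Press', NULL), ('Beta Books', NULL);
INSERT INTO Purchase_order VALUES (1, '2024-01-05', 'Alpha Press'),
                                  (2, '2024-01-06', 'Alpha Press');
""")

# A publisher may have zero to many purchase orders;
# every purchase order belongs to exactly one publisher.
orders_per_publisher = conn.execute("""
    SELECT p.publisher_name, COUNT(o.order_no)
    FROM Publisher AS p
    LEFT JOIN Purchase_order AS o ON o.publisher_name = p.publisher_name
    GROUP BY p.publisher_name
    ORDER BY p.publisher_name
""").fetchall()
```

Declaring the posted key as NOT NULL reflects the mandatory participation of Purchase order in the relationship, while the outer join shows that a publisher with no orders can still be represented.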
In the ABC Bookstore example, we have the following relationship that links the Book and
Purchase order entities.
The relationship indicates that a book title may be associated with any number of purchase orders
(zero to many) whereas each purchase order is associated with at least one book title. According
to the third rule of the basic conversion process, the primary keys of both the Book and Purchase
order entities must be posted to a new table, i.e. the intersection table, to link to the tables that
represent the concerned entities. The resulting tables for representing the relationship will be as
follows:
After applying the 3 rules specified in the basic conversion process, we can obtain the data schema
for the ABC Bookstore example as follows:
Note that a new field, publisher_name, has been added to the Book table after considering the
relationship between the Book and Publisher entities. This illustrates that the definition of a table
will not be finalized until all relationships connected to the entity concerned are considered.
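The third rule can be sketched in the same way. In the example below (Python's built-in sqlite3 module), the intersection table name Order_item, the order_no identifier and the sample rows are assumptions made for illustration; the Book and Purchase_order tables are cut down to the columns needed to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Book (
    ISBN  TEXT PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE Purchase_order (
    order_no INTEGER PRIMARY KEY          -- assumed identifier
);
-- Intersection table: the primary keys of both entities are posted here
-- and together form the composite primary key. The relationship's own
-- attribute (quantity) is stored here too.
CREATE TABLE Order_item (
    order_no INTEGER NOT NULL REFERENCES Purchase_order(order_no),
    ISBN     TEXT    NOT NULL REFERENCES Book(ISBN),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_no, ISBN)
);
INSERT INTO Book VALUES ('ISBN-A', 'Book A'), ('ISBN-B', 'Book B');
INSERT INTO Purchase_order VALUES (1), (2);
INSERT INTO Order_item VALUES (1, 'ISBN-A', 10), (1, 'ISBN-B', 5), (2, 'ISBN-A', 20);
""")

# One order can list many titles ...
titles_in_order_1 = [row[0] for row in conn.execute(
    "SELECT b.title FROM Order_item AS i JOIN Book AS b ON b.ISBN = i.ISBN "
    "WHERE i.order_no = 1 ORDER BY b.title")]

# ... and one title can appear in many orders.
orders_for_book_a = conn.execute(
    "SELECT COUNT(*) FROM Order_item WHERE ISBN = 'ISBN-A'").fetchone()[0]

# The composite key stops the same title being listed twice in one order.
try:
    conn.execute("INSERT INTO Order_item VALUES (1, 'ISBN-A', 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```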
Drawback of Basic Conversion Process
The basic conversion process does not guarantee that null attribute values are minimized, and
problems may occur for entities with optional occurrences. Suppose a school has a number of
lockers in different buildings and each student is entitled to have one locker on request. Due to the
uneven demand for lockers in different buildings, some lockers are unused whereas some students
are not assigned any locker. The relationship is given below.
Since the relationship is of 1:1 type, we may put all attributes of the two entities together into one
single table. Assuming the attributes of the Student and Locker entities are:
Both of the above table structures are problematic. In the first table structure, lockers that are not
assigned to any students cannot be represented. In the second table structure, students that are not
assigned to any lockers cannot be represented.
The problem illustrated in the last example can be overcome by introducing another rule to the basic
conversion process. We refer to the augmented process as the optional-max conversion process.
The rules to be applied in the new process are as follows:
1. For every instance where the lower cardinality bound is zero and the upper cardinality bound is
one, temporarily label the upper cardinality bound as n, i.e., many.
2. Apply the basic conversion process as usual.
After applying rule 1 of the optional-max conversion process, the relationship becomes
Figure 18. The Student is-assigned-to Locker relationship after the
first rule of the optional-max conversion process is applied
Now the relationship is treated as being of the m:n type and will be modeled by three tables according
to the third rule of the basic conversion process. The resultant tables are:
With the new table structures, details of both empty lockers and students who are not given any
lockers can be represented.
Teaching remark
One may suggest handling the relationship as a 1:m type. This will result in two tables, either
with the student_ID posted to the Locker table or the locker_ID posted to the Student table. The
proposed table structures can also represent empty lockers and students who are not given any
lockers. However, the proposal will result in null entries in at least one table. In situations
that involve an associative entity (i.e. a relationship with attributes), more null entries would
result. For example, if the date on which a locker is assigned to a student is to be recorded, the
field will be null for an unassigned locker should we treat the relationship as a 1:m type. The
optional-max conversion process provides a more resilient solution to the problem as the date
field will be kept in the intersection table, i.e., the Assign table.
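The three-table structure can be sketched as follows with Python's built-in sqlite3 module. The names student_ID, locker_ID and Assign follow this note; the remaining columns and the sample rows are assumptions. The UNIQUE constraints are an extra assumption added here so that the real-world rule of one locker per student (and one student per locker) is still enforced after the upper bounds have been temporarily relabelled as many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Student (
    student_ID TEXT PRIMARY KEY,
    name       TEXT NOT NULL              -- assumed attribute
);
CREATE TABLE Locker (
    locker_ID TEXT PRIMARY KEY,
    building  TEXT NOT NULL               -- assumed attribute
);
-- Intersection table: only actual assignments are stored, so no nulls arise.
CREATE TABLE Assign (
    student_ID  TEXT NOT NULL UNIQUE REFERENCES Student(student_ID),
    locker_ID   TEXT NOT NULL UNIQUE REFERENCES Locker(locker_ID),
    assign_date TEXT NOT NULL,
    PRIMARY KEY (student_ID, locker_ID)
);
INSERT INTO Student VALUES ('S1', 'Amy'), ('S2', 'Ben');
INSERT INTO Locker  VALUES ('L1', 'Block A'), ('L2', 'Block A'), ('L3', 'Block B');
INSERT INTO Assign  VALUES ('S1', 'L2', '2024-09-01');
""")

empty_lockers = [row[0] for row in conn.execute(
    "SELECT locker_ID FROM Locker "
    "WHERE locker_ID NOT IN (SELECT locker_ID FROM Assign) ORDER BY locker_ID")]

students_without_locker = [row[0] for row in conn.execute(
    "SELECT student_ID FROM Student "
    "WHERE student_ID NOT IN (SELECT student_ID FROM Assign) ORDER BY student_ID")]

# A second locker for the same student is refused by the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Assign VALUES ('S1', 'L3', '2024-09-02')")
    second_locker_rejected = False
except sqlite3.IntegrityError:
    second_locker_rejected = True
```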
Introduction to Normalization
Normalization is a database design technique based on analyzing relations among key and non-key
attributes of database tables. This technique includes a series of rules or steps to normalize the
database into a number of tables depending on the degree of normalization that one wants to
achieve. A database design that complies with those rules corresponds to a specific normal form such as
first normal form (1NF), second normal form (2NF), third normal form (3NF), etc. Despite
the existence of higher normal forms, only the 1NF, 2NF and 3NF will be covered. Higher normal
forms imply a data schema with more tables, and querying such a database would involve more
effort in “joining” tables together. In practice, most database designers generate data schemata
normalized to 3NF in order to strike a balance between maintainability and efficiency. Readers
who are interested in an overview of various normal forms (from 1NF to 6NF) may visit
Wikipedia’s page on database normalization.
Why Normalization
The main purpose of normalization is to minimize data redundancy and anomalies. In the following
section, we will show the problem of data redundancy and update anomalies through a problem
scenario.
Data Anomaly
Data anomaly refers to the unexpected phenomena that occur when updating a database that
exhibits data redundancy. There are several types of data anomaly – insertion, deletion and
modification (or update) anomalies.
Insertion Anomaly
Could we record the insertion of some data object of interest in a table? If not, the table suffers from
insertion anomaly.
Deletion Anomaly
Could we record deletion of some data object of interest in a table without losing any information?
If not, the table suffers from deletion anomaly.
Modification Anomaly
Would an update in one attribute’s value be recorded in a table more than once? If yes, the table
suffers from modification anomaly.
Functional Dependencies
In order to understand why data anomalies exist, we need to understand the concept of
functional dependencies. Functional dependencies are used to describe the dependency between
the attributes within a table.
Given A and B are attributes of the same table, the attribute B is functionally dependent on the
attribute A if each value of A is associated with one and only one value of B. The notation to
represent the above notion is A → B. It may be read as A determines B.
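The definition can be turned into a simple check. The Python function below is a minimal sketch; the function name holds and the sample rows are chosen for illustration only. It tests whether each value of the determinant (A, possibly a combination of attributes) is associated with one and only one value of the dependent attribute (B) in a given set of rows.

```python
def holds(rows, determinant, dependent):
    """Return True if `dependent` is functionally dependent on `determinant`
    in `rows`, i.e. each determinant value maps to exactly one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, row[dependent]) != row[dependent]:
            return False
    return True


columns = ("StdID", "SocietyID", "SocName", "Position")
sample = [dict(zip(columns, values)) for values in [
    ("042123", "001", "Chinese", "Chairman"),
    ("042123", "003", "Maths", "Member"),
    ("042132", "001", "Chinese", "Member"),
]]

society_determines_name = holds(sample, ("SocietyID",), "SocName")
student_determines_position = holds(sample, ("StdID",), "Position")
key_determines_position = holds(sample, ("StdID", "SocietyID"), "Position")
```

Note that sample data can only disprove a functional dependency. Whether a dependency really holds is decided by the meaning of the data, not by the rows that happen to be stored.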
Teaching remarks
Some textbooks and online resources on databases may define full functional dependency as
follows: Attribute B is said to be fully functionally dependent on attribute A if B is functionally
dependent on A and not functionally dependent on any subset of A. Such a definition is
incorrect as the authors fail to distinguish between a proper subset and a subset.
Any set is a subset of itself. A proper subset of a set is any subset of that set excluding the set
itself.
An A/AS level textbook defines partial dependency as follows: one or more non-key attributes
depend on part of the primary key. This is not entirely correct as the notion of functional
dependencies does not restrict the independent attribute (attribute A) to be a primary key as
described in the book.
Suppose there is a Student_in_Society table storing information about student roles in various
societies and clubs in a school. The table also contains information of the teacher supervisor of
each society. The table has the following attributes (field name in parentheses): student_ID
(StdID), student_name (StdName), society_ID (SocietyID), society_name (SocName),
student_role_in_society (Position), society_teacher_ID (SupID), and society_teacher_name
(Supervisor). Given the fact that each society has exactly one society teacher to give the society
advice, the primary key of the table is a composite key composed by student_ID and society_ID.
Figure 19 shows the full functional dependencies among the attributes in the table.
Figure 19. Full functional dependencies among attributes
in the Student_in_Society table.
First Normal Form
If every attribute of the relation is atomic, then the relation is said to be in first normal form (1NF).
An attribute is atomic if it is not multi-valued, i.e. without repeating groups. A table which is not
in 1NF is in unnormalized form (UNF).
The usual way to modify a table in UNF to 1NF is to store the details of the repeating groups in a
separate table. This will result in the following table structures.
Student(StdID,StdName)
Student_in_Society(StdID, SocietyID, SocName, SupID, Supervisor, Position)
Student table
StdID    StdName
042123   May Wong
042132   Katie Lee
042142   June Chan
Student_in_Society table
StdID    SocietyID   SocName   SupID   Supervisor   Position
042123   001         Chinese   1       Mr. Wong     Chairman
042123   003         Maths     2       Ms. Chan     Member
042132   001         Chinese   1       Mr. Wong     Member
042142   002         English   1       Mr. Wong     Member
042142   005         Physics   3       Mr. Lee      Chairman
042142   008         Biology   4       Miss Yu      Member
It is a bad idea to store the multi-valued data in the following table structure.
The table above cannot accurately represent the relationship in the real world because a student
should not be restricted to joining three societies only. Allowing a student to join a fourth society
implies a modification of the table structure, which can be troublesome once data have been entered
in the table. In any case, the table is not in 1NF.
Note that many data anomalies cannot be removed by normalizing tables to 1NF. For example, if
Mr. Kwan replaces Mr. Wong as the society teacher of the Chinese Society, two rows in the
Student_in_Society table in Figure 21 need to be updated (a modification anomaly). The table also
suffers from an insertion anomaly: we cannot store information about a new society until at least one
student has joined it. A deletion anomaly arises when the last member of a society quits: the society
information is then removed from the database along with that row.
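The three anomalies can be reproduced in a few lines. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for the database; the new SupID 5 and the "Drama" society are hypothetical values introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student_in_Society (
        StdID      TEXT NOT NULL,
        SocietyID  TEXT NOT NULL,
        SocName    TEXT,
        SupID      INTEGER,
        Supervisor TEXT,
        Position   TEXT,
        PRIMARY KEY (StdID, SocietyID)
    )
""")
rows = [
    ("042123", "001", "Chinese", 1, "Mr. Wong", "Chairman"),
    ("042123", "003", "Maths",   2, "Ms. Chan", "Member"),
    ("042132", "001", "Chinese", 1, "Mr. Wong", "Member"),
    ("042142", "002", "English", 1, "Mr. Wong", "Member"),
    ("042142", "005", "Physics", 3, "Mr. Lee",  "Chairman"),
    ("042142", "008", "Biology", 4, "Miss Yu",  "Member"),
]
conn.executemany("INSERT INTO Student_in_Society VALUES (?, ?, ?, ?, ?, ?)", rows)

# Modification anomaly: one real-world change touches several rows.
# SupID 5 for Mr. Kwan is an assumed value.
rows_updated = conn.execute(
    "UPDATE Student_in_Society SET SupID = 5, Supervisor = 'Mr. Kwan' "
    "WHERE SocietyID = '001'"
).rowcount

# Deletion anomaly: removing the only Physics member also removes the society.
conn.execute(
    "DELETE FROM Student_in_Society WHERE StdID = '042142' AND SocietyID = '005'"
)
physics_rows = conn.execute(
    "SELECT COUNT(*) FROM Student_in_Society WHERE SocName = 'Physics'"
).fetchone()[0]

# Insertion anomaly: a society without members has no StdID for the key.
try:
    conn.execute(
        "INSERT INTO Student_in_Society (SocietyID, SocName) VALUES ('009', 'Drama')"
    )
    insert_rejected = False
except sqlite3.IntegrityError:
    insert_rejected = True
```

Here rows_updated counts the rows touched by a single change of teacher, physics_rows shows what is left of the Physics Society after its only member leaves, and insert_rejected records whether the memberless society could be stored.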
If a table is in 1NF but not in 2NF, it must have a composite primary key according to the second
property of the 2NF. To “promote” a table from 1NF to 2NF, we need to remove the partial
dependencies in the table.
Let us further work on the Student_in_Society table in Figure 21 to illustrate the notion of 2NF.
We illustrate that the functional dependencies for the student table are as follows:
We can restructure a table in 1NF into 2NF by extracting the fields that exhibit partial dependency
into one or more separate tables. In our example, the Student_in_Society table can be
made to conform to 2NF by extracting SocName, SupID and Supervisor to a separate table, say the
Society table. The attribute on which the three extracted fields are fully functionally dependent, i.e.,
SocietyID, is copied to the Society table to serve as that table’s primary key. The new table
structures are:
Student(StdID,StdName)
Society(SocietyID, SocName, SupID, Supervisor)
Student_in_Society(StdID, SocietyID, Position)
The tables in 2NF with their data are shown in Figure 22.
Student table
StdID StdName
042123 May Wong
042132 Katie Lee
042142 June Chan
Society table
SocietyID SocName SupID Supervisor
001 Chinese 1 Mr. Wong
002 English 1 Mr. Wong
003 Mathematics 2 Ms. Chan
005 Physics 3 Mr. Lee
008 Biology 4 Miss Yu
Student_in_Society table (revised)
StdID SocietyID Position
042123 001 Chairman
042123 003 Member
042132 001 Member
042142 002 Member
042142 005 Chairman
042142 008 Member
Figure 22. The Student, Society and Student_in_Society (revised) tables in 2NF.
Tables in 2NF are not able to solve all data anomalies either. Although the insertion and deletion
anomalies associated with the Student_in_Society table (in 1NF) have gone, the modification
anomaly still exists in the Society table in Figure 22. Suppose Mr. Wong resigns and a new
teacher, Mr. Kwan, will replace Mr. Wong to become the society teacher of all societies that Mr.
Wong used to be responsible for. Note that Mr. Kwan will use the same SupID as Mr. Wong does.
To reflect such a change in the Society table, two rows (instead of one) need to be updated.
Transitive dependency exists if one or more attributes are functionally dependent on some non-key
attribute(s). Suppose a table has three attributes A, B and C such that A → B and B → C.
Then A → C, and the attribute C is transitively dependent on A. In the Society table of our
example, SocietyID → SupID and SupID → Supervisor, and thus SocietyID → Supervisor, which is
a transitive dependency. To convert a table in 2NF to 3NF, attributes that contribute to
transitive dependencies are extracted to separate table(s). The Society table can be made to conform
to 3NF by extracting Supervisor to a new table, say the Society_Teacher table. The attribute on which
the Supervisor field is fully functionally dependent, i.e., SupID, is copied to the Society_Teacher
table to serve as that table’s primary key. This will result in the following table structures.
Student(StdID,StdName)
Society(SocietyID, SocName, SupID)
Student_in_Society(StdID, SocietyID, Position)
Society_Teacher(SupID, Supervisor)
The tables in 3NF with their data are shown in Figure 23.
Student table
StdID StdName
042123 May Wong
042132 Katie Lee
042142 June Chan
Student_in_Society table
StdID SocietyID Position
042123 001 Chairman
042123 003 Member
042132 001 Member
042142 002 Member
042142 005 Chairman
042142 008 Member
Society_Teacher table
SupID Supervisor
1 Mr. Wong
2 Ms. Chan
3 Mr. Lee
4 Miss Yu
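To see that the 3NF design loses no information and removes the modification anomaly, the four tables can be built and joined back together. This is a sketch only, assuming Python's sqlite3 module as the database; the Society rows are taken from the 2NF Society table with the Supervisor column removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (StdID TEXT PRIMARY KEY, StdName TEXT);
    CREATE TABLE Society_Teacher (SupID INTEGER PRIMARY KEY, Supervisor TEXT);
    CREATE TABLE Society (
        SocietyID TEXT PRIMARY KEY,
        SocName   TEXT,
        SupID     INTEGER REFERENCES Society_Teacher(SupID)
    );
    CREATE TABLE Student_in_Society (
        StdID     TEXT REFERENCES Student(StdID),
        SocietyID TEXT REFERENCES Society(SocietyID),
        Position  TEXT,
        PRIMARY KEY (StdID, SocietyID)
    );
    INSERT INTO Student VALUES
        ('042123', 'May Wong'), ('042132', 'Katie Lee'), ('042142', 'June Chan');
    INSERT INTO Society_Teacher VALUES
        (1, 'Mr. Wong'), (2, 'Ms. Chan'), (3, 'Mr. Lee'), (4, 'Miss Yu');
    INSERT INTO Society VALUES
        ('001', 'Chinese', 1), ('002', 'English', 1), ('003', 'Mathematics', 2),
        ('005', 'Physics', 3), ('008', 'Biology', 4);
    INSERT INTO Student_in_Society VALUES
        ('042123', '001', 'Chairman'), ('042123', '003', 'Member'),
        ('042132', '001', 'Member'),   ('042142', '002', 'Member'),
        ('042142', '005', 'Chairman'), ('042142', '008', 'Member');
""")

# Replacing Mr. Wong (same SupID) now touches exactly one row.
rows_updated = conn.execute(
    "UPDATE Society_Teacher SET Supervisor = 'Mr. Kwan' WHERE SupID = 1"
).rowcount

# Joining the four tables rebuilds the original wide rows.
wide_rows = conn.execute("""
    SELECT sis.StdID, st.StdName, so.SocName, t.Supervisor, sis.Position
    FROM Student_in_Society AS sis
    JOIN Student AS st ON st.StdID = sis.StdID
    JOIN Society AS so ON so.SocietyID = sis.SocietyID
    JOIN Society_Teacher AS t ON t.SupID = so.SupID
    ORDER BY sis.StdID, sis.SocietyID
""").fetchall()
```

The join returns one row per membership, and every society formerly supervised by Mr. Wong shows the new teacher after the single-row update.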
Figure 24 shows the full series of changes introduced to transform the original data schema (in UNF)
to the final design (in 3NF).
Figure 24. How the original design evolved from UNF to 3NF.
[E-R diagram: entities STUDENT, CLASS and CLASS POST; relationship labels "Is allocated to", "Is managed by" and "Is assigned to".]
2. A construction company has over 1000 employees. A client can hire this company to do projects. Usually,
several types of employees are grouped together to finish a project, e.g. a project may require an
accountant, 2 engineers, 1 manager and 1 system analyst. At the same time, an employee may take up
more than one project. Also, finishing a project may require a number of pieces of equipment.
[E-R diagram: entities Equipment, Employee, Client and Project; relationship labels "Is assigned to", "Hires" and "Works".]
Transform the following ER diagram into the database structure. Please show the structure of the database in
the form of
Tablename (keyfield, field1, field2, …)
3. In a school, a student may be assigned with one or more functional posts, like prefect, monitor, chairman.
A post must be assigned to exactly one student. Complete the following E-R diagram.
[Partial E-R diagram to complete: attributes Stud_ID, Name, Address and Date_birth; relationship "Is assigned to"; attributes Post_ID and Post_Name.]
4. A chain store has a number of branches, and each branch has a manager and several staff,
e.g. staff1 and staff2 belong to branchA whereas staff3 and staff4 belong to branchB. The salaries of the
staff follow salary points, which are determined by position and years of service. E.g. a
manager with 5 years of service has salary point 15, which is $25,000, and a junior staff member with 2
years has salary point 2, which is $6,000. Complete the ER diagram:
6. Staff in trading company A purchase products from other companies through sales agents.
a) Construct the ER diagram if there is just one sales agent for each company and staff from different
departments may contact the same company.
DEPARTMENT
have
b) Construct another ER diagram if there may be more than one sales agent for a company.
DEPARTMENT
have
COMPANY CONTACT_LIST
through
7. A patient may take more than one medicine, so the ER diagram will be
take
8. Removing multi-valued attributes is the first step of normalization (1NF). For the following scenario, which attribute will
be multi-valued? How can the database structure be modified to achieve first normal form?
Patient_ID Name Date_birth Medicine_Name Quantity
9. Apart from multi-valued attribute problems, we should also resolve M:N relations. First we will look at
the 1:1 relation:
a) Assume there is just one class master for every class, so the ER diagram would be
belong
belong
teach
However, since it is an M:N relation, it will be transformed into
Database Design Exercise 02
1. The relationship Teaches between entities TEACHER and COURSE is one-to-many. Table
______ should include a foreign key ______.
A. TEACHER, course_id
B. TEACHER, teacher_id
C. COURSE, teacher_id
D. COURSE, course_id
Study the paragraph below carefully and answer the following four questions:
In an air freight service company, each customer will request a sales order for a freight. Each sales order is
taken care of by one salesperson. Each salesperson may take care of many sales orders. A sales order is a
freight requested by a customer. Each customer has made a request for at least one freight.
C. SALESPERSON
D. none of the above
7. Given that the relationship studies between STUDENT and SUBJECT is many-to-many.
A. A new table is needed.
B. The tables should be combined into one.
C. The relationship studies should be converted into an attribute
D. A foreign key should be added to a table SUBJECT
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
C D A C B B A
1. The discipline Master of a school wishes to store the late records of students. The following E-R diagram
is drawn.
Commits
STUDENT LATE
Y
Parents_ X
name
Phone Reason
Address
If a student is late more than 3 times in a semester, a clerk will make a phone call to the student's
parents. If a student is late more than 5 times in a semester, a clerk will send the parents
a letter about the problem, using the address in the above ER diagram.
a) By investigating the above diagram, what problem will the design suffer from?
It is mandatory.
c) What are the values of X and Y in the diagram?
X: 1 , Y: M, it is a one-to-many relation
d) Transform the ER diagram into a database schema. Remember to identify the primary
key of each table involved.
STUDENT_PARENT (stud_id, parents_name)
2. The discipline Master of a school wishes to store the late records of students. The following E-R diagram
is drawn.
If a student is late more than 3 times in a semester, a clerk will make a phone call to the student's
parents. If a student is late more than 5 times in a semester, a clerk will send the parents
a letter about the problem, using the address in the above ER diagram.
a) By investigating the above diagram, what problem will the design suffer from?
Database Design Exercise 03
M.C.
1. Which of the following is an example of entity?
A. “Mr. Cheung”
B. Teacher “Mr. Cheung”
C. Teacher
D. The subject taught by “Mr. Cheung”
2. A primary key
(1) can be made up of more than one field
(2) is always a candidate key
(3) can have null value
A. (1) only
B. (1), (2) only
C. (1), (3) only
D. (1), (2) and (3)
3. Which of the following is not an appropriate attribute for an entity “golf coach”?
A. name
B. sex
C. charge per hour
D. booked_date
6. Which of the following is NOT a purpose of creating an index?
A. Carry out sorting with a smaller amount of data
B. Improve data searching performance
C. Improve data ordering performance
D. Improve data updating performance
(1) Set indexes to an attribute
(2) Set foreign keys
(3) Setting validation rules (constraint)
A. (1) only
B. (2) only
C. (3) only
D. (1), (2) only
16. Data redundancy can be minimized by
A. entering data only when necessary
B. using a database approach
C. data validation
D. using a traditional file-processing system
17. The domain constraints for a field require the field to have
A. non-duplicating values
B. non-empty values
C. the same data type and range
D. the same values
20. Which of the following statements about a relation “audience watch TV programs” is correct?
A. The existence of the entity “audience” in the relationship “watch” is optional.
B. The existence of the entity “TV program” in the relationship “watch” is mandatory.
C. The maximum cardinality of the entity “TV program” is 1
D. The minimum cardinality of the entity “TV program” is 1
Answers:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
C B D A C A A A A C D C B B A B C D B A
1. The following database is used to store the students' learning portfolios. A portfolio should contain
data for the whole of secondary school life, including in which year a student participated in which
club.
Student (StuID, name, address, HKID, phone, sex, DateBirth)
Club (ClubID, ClubName, TeacherInChargeID)
JoinClub (RecordNumber, StuID, ClubID, Post)
a) Point out the primary key and the candidate keys of each table.
Primary Key Candidate keys
Student StuID StuID, HKID
Club ClubID ClubID
JoinClub RecordNumber RecordNumber (Not StuID + ClubID)
b) In what ways will this database schema not work properly? Write the SQL statements to
overcome the problem.
The schema is assumed to store the information of one particular year only. If it is used to store
data for several years, we have to add a new field called year to both the Club and
JoinClub tables.
ALTER TABLE club ADD year char(4)
ALTER TABLE Joinclub ADD year char(4)
2. Inspect the following database schema and briefly describe some scenarios in which it will not work
correctly.
(i) Record (BookCode, StuID, DoB, Returned) where DoB means Date of Borrow.
The schema lacks an amount field. It will not function properly if the same student borrows two
books with the same BookCode, which may happen in practice.
3. Now, you are the database administrator of a recreation center, you designed a form as shown below
Tai Tai Recreation Center Facility Order Form
Membership ID: Date to use the facility : / /
Facility:
Table Tennis Badminton BasketBall Volleyball
FacilityCode TT01 BN01 BL01 VL01
/ Charge ($20) ($45) ($150) ($100)
Location Room 113 SportsRoom1A SportsRoom1 SportsRoom1
Room 114 SportsRoom1B SportsRoom2 SportsRoom2
Room 115 SportsRoom1C
Time to use the facility:
Time zone Duration Choose
1 12:00 - 1:00
2 1:00 - 2:00
3 2:00 - 3:00
4 3:00 - 4:00
5 4:00 - 5:00
6 5:00 - 6:00
7 6:00 - 7:00
8 7:00 - 8:00
9 8:00 - 9:00
10 9:00 - 10:00
Signature: Date:
It is supposed that a member cannot book more than one facility in the same time zone on a particular day, i.e.
a member cannot book a table tennis court and a basketball court at the same time, nor can he book 2 table
tennis courts at the same time, but he can book a table tennis court for time zones 3 and 4.
and a database schema as shown below:
Facility (FacilityCode, FacilityName, Location, charge)
Membership (MemID, Name, Sex, DateBirth, address, PhoneNumber)
FacilityRecord (MemID, FacilityCode, DoB, timezone) where DoB means Date of Booking.
Is there any problem in the database design? Briefly describe how to solve it.
There may be several different locations for a particular facility, e.g. three rooms for
TT01, so the attribute Location in the Facility table would be multi-valued. To solve
this problem, you should either create a new table to hold FacilityCode and
Location, or assign each location its own FacilityCode even though the locations
offer the same kind of facility.
Also, the primary key for the table FacilityRecord should be RecordNo + FacilityCode
instead of RecordNo + MemID because it is supposed that each RecordNo should be
ordered by just one Member only and hence MemID is full functionally Dependent to
and hence a new table should be created.
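The first remedy above (a separate table for locations) and the booking rule stated in the question can be sketched as follows. This is an illustration only, assuming Python's sqlite3 module; the table name FacilityLocation, the member ID 'M001', the booking date and the choice of (MemID, DoB, timezone) as the key of FacilityRecord are all assumptions made here, not part of the model answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Facility (
        FacilityCode TEXT PRIMARY KEY,
        FacilityName TEXT,
        Charge       INTEGER
    );
    -- One row per room, so Location is no longer multi-valued.
    CREATE TABLE FacilityLocation (
        FacilityCode TEXT REFERENCES Facility(FacilityCode),
        Location     TEXT,
        PRIMARY KEY (FacilityCode, Location)
    );
    -- Assumed key: one booking per member per date per time zone.
    CREATE TABLE FacilityRecord (
        MemID        TEXT,
        FacilityCode TEXT REFERENCES Facility(FacilityCode),
        DoB          TEXT,
        timezone     INTEGER,
        PRIMARY KEY (MemID, DoB, timezone)
    );
    INSERT INTO Facility VALUES
        ('TT01', 'Table Tennis', 20), ('BL01', 'BasketBall', 150);
    INSERT INTO FacilityLocation VALUES
        ('TT01', 'Room 113'), ('TT01', 'Room 114'), ('TT01', 'Room 115');
""")

# The same member may book time zones 3 and 4 on the same day.
conn.execute("INSERT INTO FacilityRecord VALUES ('M001', 'TT01', '2005-10-01', 3)")
conn.execute("INSERT INTO FacilityRecord VALUES ('M001', 'TT01', '2005-10-01', 4)")

# A second facility in time zone 3 on that day violates the key.
try:
    conn.execute("INSERT INTO FacilityRecord VALUES ('M001', 'BL01', '2005-10-01', 3)")
    double_booking_rejected = False
except sqlite3.IntegrityError:
    double_booking_rejected = True

tt_rooms = [row[0] for row in conn.execute(
    "SELECT Location FROM FacilityLocation "
    "WHERE FacilityCode = 'TT01' ORDER BY Location"
)]
```

With this key the database itself refuses a clashing booking, so the rule does not depend on the clerk checking the form by hand.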
Database Design Exercise 04
Question 1:
Consider a relational database with three tables, STUDENT, COURSE and GRADE, as shown below:
STUDENT
S_NO S_NAME
1025 Mary Wu
3350 Tom Leung
4170 Peter Chow
COURSE
C_CODE C_NAME CREDITS
CHEM203 Organic Chemistry II 2
COMF117 Computer Science I 3
MATH001 Mathematics 4
GEOG108 Geography 2
GRADE
StudentID C_CODE Score
1025 CHEM203 70
1025 COMF117 75
1025 MATH001 80
3350 COMF117 55
3350 GEOG108 40
4170 GEOG108 75
Question 2:
A teacher has designed a database, EXAM, to store the final examination results of students as follows:
Field Name Field Type Description
StdNo Numeric Unique student number
Name Character Name of the student
Class Character Class of the student
Sex Character M = male, F = female
SbjCode Numeric Unique Subject Code
Subject Character Full Name of the Subject
PassMk Numeric Pass Mark of the Subject
Mark Numeric Mark of the student in the subject
(a) Explain briefly how this design leads to data redundancy (2 marks)
If a student takes 2 or more subjects, there will be more than 1 record for the same
student and fields like Name, Class and Sex are stored multiple times.
Similarly, if a subject is taken by more than 1 student, fields like
Subject and PassMk are stored multiple times.
->Note that the attributes Name, Class and Sex are fully functionally dependent on the primary key
“StdNo”, whereas SbjCode would be multi-valued, so the table is in unnormalized form.
To fix the problem of data redundancy, the teacher breaks EXAM into three interlinked tables, which use the
above field names only.
(b) Complete the new design below and underline the corresponding key field(s). Underline the primary key
in the corresponding table. (2 marks)
Table Fields
STUDENT StdNo, Class, Name, Sex
SUBJECT SbjCode, Subject, PassMk
EXAM StdNo, SbjCode, Mark
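The three-table design can be tried out as follows. This is a minimal sketch assuming Python's sqlite3 module; the sample rows and the derived Passed column are hypothetical and are not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (StdNo INTEGER PRIMARY KEY, Class TEXT, Name TEXT, Sex TEXT);
    CREATE TABLE SUBJECT (SbjCode INTEGER PRIMARY KEY, Subject TEXT, PassMk INTEGER);
    CREATE TABLE EXAM (
        StdNo   INTEGER REFERENCES STUDENT(StdNo),
        SbjCode INTEGER REFERENCES SUBJECT(SbjCode),
        Mark    INTEGER,
        PRIMARY KEY (StdNo, SbjCode)
    );
    -- Hypothetical sample rows, not taken from the exercise.
    INSERT INTO STUDENT VALUES (1, '6A', 'Chan Tai Man', 'M');
    INSERT INTO SUBJECT VALUES (10, 'Physics', 50), (20, 'Biology', 40);
    INSERT INTO EXAM VALUES (1, 10, 45), (1, 20, 72);
""")

# Name and PassMk are stored once each and looked up through the keys.
results = conn.execute("""
    SELECT s.Name, b.Subject, e.Mark, e.Mark >= b.PassMk AS Passed
    FROM EXAM AS e
    JOIN STUDENT AS s ON s.StdNo = e.StdNo
    JOIN SUBJECT AS b ON b.SbjCode = e.SbjCode
    ORDER BY b.SbjCode
""").fetchall()
```

The student's name appears once in STUDENT and each pass mark once in SUBJECT, yet the join still produces a full result line for every examination taken.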
[E-R diagram: Client (MemberID, MemberName, PointEarned) buy Product (ProductID, ProductName, Category, Amount, Price), drawn as an M:N relationship.]
Answer:
The attribute Amount should not be put in the entity “Product”; it should instead be put in the
relationship buy. Also, participation on the Client side should be optional instead of mandatory, i.e.
[Revised E-R diagram: Client (MemberID, MemberName, PointEarned) buy Product (ProductID, ProductName, Category, Price), M:N, with Amount attached to the relationship buy and an Inventory attribute added.]
Past Paper Investigation:
2000 – AS – CA #1
1. (a) A teacher uses a database file to store the information about his students.
The file has the following structure:
The teacher inputs marks and grades to the database file after each test or examination. At the end of the
school term, he finds some problems in the file design. Identify fields that are redundant and explain why
the fields are redundant
(4 marks)
Totaltest – it is simply the sum of all marks of test1 to 3 and all data in this field can be
obtained from the data in fields of test1 to 3. There is no loss of any information if this
field is deleted. Therefore this field is redundant.
Grade- since this is obtained based on average of all marks in the fields of the
database, as long as the criteria for conversion of marks to grades are the same, no
loss of information is envisaged if this field is deleted. Therefore this field is
redundant.
<- At this level, we should know that the fields TotalTest and Grade are redundant. In the real
world, however, a database sometimes keeps redundant fields in order
to speed up data retrieval. In that case, only very frequently used fields
should be stored redundantly.
<- Of course, data redundancy would undermine the data integrity, especially referential
integrity.
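Instead of storing the redundant fields, they can be computed whenever they are needed. The sketch below assumes Python's sqlite3 module; the table name StudentMarks, the sample marks and the pass criterion (average of at least 50) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE StudentMarks (name TEXT, test1 INTEGER, test2 INTEGER, test3 INTEGER)"
)
conn.execute("INSERT INTO StudentMarks VALUES ('Chan Tai Man', 60, 70, 80)")

# Total and grade are derived at query time, so they can never disagree
# with the three stored test marks.
row = conn.execute("""
    SELECT name,
           test1 + test2 + test3 AS totaltest,
           CASE WHEN (test1 + test2 + test3) / 3.0 >= 50
                THEN 'Pass' ELSE 'Fail' END AS grade
    FROM StudentMarks
""").fetchone()
```

If a test mark is later corrected, the next query reflects the change automatically, which is exactly what a stored Totaltest field cannot guarantee.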
(b) The teacher would like to add a field that will store the talent of the students (e.g. special skills,
strengths, personal interests, etc.) to the database file. The teacher cannot decide whether the field
should be declared a character type or a memo type. Compare the two data types and recommend
the most suitable data type for the teacher to use.
(4 marks)
Character type of data usually stores information of a certain length that does not
differ greatly. For example, names are stored as character type of length 25. Although
there are names that are short and there are names that are long, they would not be
much longer than 25 characters in length.
However, memo type of data stores information that may vary a lot in their lengths.
Memo fields can even include graphics or sounds. For example, talents of students
may be very different among different students. Some students will have fewer talents
and thus will have just one or two words stored in the field while for some other
students with many talents, they will have as much as some paragraphs stored in the
field.
For the above reasons, the memo type of data is recommended for the teacher in
storing the students' talents.
<- Of course, we can use memo type as the data type for field ‘talent’. It may look like
Name Talent
Chan Tai Man Tennis, Piano, C++
Chan Siu Man Flash, Piano, Violin
Wong Siu Ling Writing
By using the following SQL,
SELECT name FROM student WHERE UPPER(talent) LIKE '%PIANO%'
we are able to find the names of the students who are good at piano.
However, since talent is multi-valued, it is recommended to put talent into a new table such
that the field talent contains just one skill. This is especially important when the skills are
pre-defined, i.e. we can make the field ‘talent’ a foreign key that refers
to another database table. On that foreign key, we can set the appropriate constraint, e.g. the
value of the field talent must exist in the parent table.
To illustrate further, let us consider these two items, students and talents. Originally in the
question, student is regarded as the entity and talent as an attribute. Now, let us think of
them as two separate entities with the relationship ‘OWN’, i.e. STUDENTS OWN TALENTS.
Participation is optional on both sides.
Note: We should always be careful about cases where some
students have no particular skills, i.e. null in the field ‘talent’ for some students, or some
students simply do not appear in the table ‘STUDENT’.
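The STUDENTS OWN TALENTS idea can be sketched with a linking table. The example assumes Python's sqlite3 module; the table names Talent and StudentTalent, the ID values and the fourth student (who has no recorded talent) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
    CREATE TABLE Student (StdID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Talent (TalentID INTEGER PRIMARY KEY, TalentName TEXT NOT NULL UNIQUE);
    CREATE TABLE StudentTalent (
        StdID    INTEGER NOT NULL REFERENCES Student(StdID),
        TalentID INTEGER NOT NULL REFERENCES Talent(TalentID),
        PRIMARY KEY (StdID, TalentID)
    );
    INSERT INTO Student VALUES
        (1, 'Chan Tai Man'), (2, 'Chan Siu Man'), (3, 'Wong Siu Ling'), (4, 'Lee Ka Yan');
    INSERT INTO Talent VALUES
        (1, 'Tennis'), (2, 'Piano'), (3, 'C++'), (4, 'Flash'), (5, 'Violin'), (6, 'Writing');
    INSERT INTO StudentTalent VALUES
        (1, 1), (1, 2), (1, 3), (2, 4), (2, 2), (2, 5), (3, 6);
""")

# An exact match replaces the LIKE search on a free-text field.
pianists = [row[0] for row in conn.execute("""
    SELECT s.Name
    FROM Student AS s
    JOIN StudentTalent AS st ON st.StdID = s.StdID
    JOIN Talent AS t ON t.TalentID = st.TalentID
    WHERE t.TalentName = 'Piano'
    ORDER BY s.StdID
""")]

# A LEFT JOIN keeps students who own no talent at all.
no_talent = [row[0] for row in conn.execute("""
    SELECT s.Name
    FROM Student AS s
    LEFT JOIN StudentTalent AS st ON st.StdID = s.StdID
    WHERE st.StdID IS NULL
""")]

# The foreign key rejects a talent that is not in the parent table.
try:
    conn.execute("INSERT INTO StudentTalent VALUES (3, 99)")
    unknown_talent_rejected = False
except sqlite3.IntegrityError:
    unknown_talent_rejected = True
```

A student without talents simply has no rows in StudentTalent, so no null values are needed, and the outer join shows how such students can still be listed.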
Contents
Result Presentation
The ORDER BY clause
The GROUP BY … HAVING clause
Operators Used with WHERE
The LIKE operator
The IN operator
The BETWEEN Operator
The AND Operator
The OR operator
Add alias to a column
Joining Tables
Equijoin
The NATURAL JOIN operator
The INNER JOIN operator
The LEFT (OUTER) JOIN operator
The RIGHT (OUTER) JOIN operator
The FULL (OUTER) JOIN operator
Combining Query Results
The UNION operator
The INTERSECT operator
The EXCEPT/MINUS operator
Using nested SELECT statement
Arithmetic Operators/Functions
String Functions
Aggregate Functions
The AVG function
The COUNT function
The MAX function
The MIN function
The SUM Function
Create/Drop Table Index
Exporting Data from MS Access
Export Data from an MS Access Database to Another Access Database
Export Data from an MS Access Database in other file formats
Introduction to Structured Query Language
What is SQL?
Structured Query Language (SQL) is a standard language for manipulating and querying database
objects (e.g., table structures and contents) in a relational database management system. For
simplicity, we refer to a relational database management system simply as a database from now on. SQL
allows you to access a database: it can be used to define database table structures and to store,
select and manage data in the database, including data insertion, update and deletion. SQL is
widely used in databases like MySQL, DB2, Oracle, PostgreSQL, Sybase, Microsoft SQL Server,
MS Access, etc.
In the early 1970s, a seminal paper on the relational database model by E.F. Codd
received considerable attention from the database community. The relational database model
provided a sound theoretical framework for the development of a well-formed query language
that the model could support. By 1974, IBM had defined a language called the ‘Structured English
Query Language’ or SEQUEL. The name was later shortened to Structured Query Language (SQL).
In 1986, a standard for Structured Query Language (SQL) was defined by the American National
Standards Institute (ANSI), and this became an international standard recognized by the
International Organization for Standardization (ISO) in 1987. In 1989, a revised standard known commonly
as SQL89 or SQL1, was published. The ANSI committee released the SQL92 standard in 1992
(also called SQL2). This standard addressed several weaknesses in SQL89 and set forth conceptual
SQL features which at that time exceeded the capabilities of any existing RDBMS implementation.
The SQL92 standard was approximately six times the length of its predecessor. Because of this
disparity, the authors defined three levels of SQL92 compliance: Entry-level conformance,
Intermediate-level conformance, and Full conformance. Some information about the difference
among various levels of SQL92 compliance can be found here.
In 1999, the ANSI/ISO released the SQL99 standard (also called SQL3). This standard addresses
some of the more advanced areas of modern SQL systems, such as object-relational database
concepts, call level interfaces, and integrity management. SQL99 replaces the SQL92 levels of
compliance with its own degrees of conformance: Core SQL99 and Enhanced SQL99. A short
article that highlights some important changes in SQL99 can be found here.
Although various databases may implement their SQL slightly differently, they support the same
major functions (such as SELECT, UPDATE, DELETE, INSERT, WHERE, etc.) in a similar way
in order to conform to the ANSI standard. The SQL statements introduced in this note are largely
based on the Entry-level conformance of SQL92.
Teaching remarks
Apparently, the SQL statements that the A/AS level curricula cover are so basic that even the
entry-level of SQL92 supports them.
Most of the SQL statements included in this note have been tested on Microsoft Access 2003.
It supports SQL92 but this requires some reconfiguration. The default database format is
Access 2000 which is not compatible with SQL92. To change the default database format,
start Access 2003. Click Tools, then Options. Click the Advanced tab and change the
Default File Format to “MS Access2002-2003” (see Figure 1). To change the SQL syntax to
SQL92, click Tools and then Options. Click the Tables/Queries tab and check both boxes
(This database and Default for new databases) under SQL Server Compatible Syntax
(ANSI 92) (see Figure 2).
It appears that the SQL92 supported by Access 2003 conforms to the entry level only. For
example, it does not support some join features such as NATURAL JOIN and FULL
OUTER JOIN. Other unsupported features include EXCEPT and INTERSECT.
A subset of the SQL92 standard that is both usable and commonly supported can be found at
http://www.firstsql.com/tutor.htm.
Figure 2. Setting Access 2003’s default SQL syntax to conform to SQL92.
SQL supports functions such as building and manipulating database objects, populating database
tables with data, updating existing data in tables, deleting data, performing database queries,
controlling database access and overall database administration. Such functions can be classified
into a number of categories, the two best known being Data Definition Language
(DDL) and Data Manipulation Language (DML).
DDL allows users to create and restructure database objects, such as creating and deleting database
tables. DDL can also be used to define table indexes as well as foreign keys between tables.
Some of the commonly used DDL commands are:
CREATE TABLE
ALTER TABLE
DROP TABLE
CREATE INDEX
ALTER INDEX
DROP INDEX
CREATE VIEW
DROP VIEW
DML allows users to manipulate data within the objects of a database. Some of the commonly used
DML commands are:
SELECT
INSERT INTO
UPDATE
DELETE
In a nutshell, DDL allows database users to define database objects whereas DML allows database
users to retrieve, insert, delete and update data in a database.
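The division of labour between DDL and DML can be seen in a short session. This is only a sketch, assuming Python's sqlite3 module as the database; the index name idx_book_title and the book titles are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the objects.
conn.execute("CREATE TABLE Book (BookID TEXT PRIMARY KEY, Title TEXT NOT NULL)")
conn.execute("CREATE INDEX idx_book_title ON Book (Title)")

# DML: work with the data held in those objects.
conn.execute("INSERT INTO Book VALUES ('00000001', 'Hypothetical Title A')")
conn.execute("INSERT INTO Book VALUES ('00000002', 'Hypothetical Title B')")
conn.execute("UPDATE Book SET Title = 'Hypothetical Title C' WHERE BookID = '00000002'")
conn.execute("DELETE FROM Book WHERE BookID = '00000001'")
remaining = conn.execute("SELECT BookID, Title FROM Book").fetchall()

# DDL again: remove the objects themselves.
conn.execute("DROP INDEX idx_book_title")
conn.execute("DROP TABLE Book")
tables_left = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
```

The DML statements change only rows; the table and its index exist until a DDL statement drops them.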
In order to help readers understand the SQL statements that we are going to introduce, those
statements will be illustrated with a hypothetical library database as far as possible. The tables used
in the simple library database are the Student, Book and LoanRecord tables and their details are
given below. Readers are reminded that the tables and fields kept in the proposed database are far
fewer than what a real library system requires. We keep the example database simple yet
adequate for illustration purposes.
The Student table is used to store basic student information like student ID, name, the class that the
student belongs to, and phone number. The data fields of the Student table are as follows:
Table 2 describes the characteristics of the data fields in the Student table.
Teaching remark
Some people may opt to define numeric data like StdID and PhoneNo as integers instead of text
strings. We prefer to represent these fields as text strings because the “numbers”
are not used for computation.
Two different data types can be used to define text strings (see next section), and it is important
for teachers to clarify the key difference between the two data types to their students.
The Book table contains the key information about the books in the library. Details of the Book
table are shown in Table 3.
Table 4 describes the characteristics of the data fields in the Book table.
Field Name Description
BookID Unique book ID
Text string – 8 digits
Not null
Title Book Title
Text string – 100 characters
Not null
Type Book category
Text string – 3 digits
Table 4. Characteristics of the data fields in the Book table.
The LoanRecord table contains information on the library items on loan (or once on loan). Details of
the LoanRecord table are as follows:
LoanRecID StdID BookID DateOfBorrow Status
1 0002012 00000001 20051001 1
2 0002011 00000002 20020112 2
3 0002012 00000003 20031211 2
4 0002013 00000002 20031001 2
5 0002011 00000002 20051018 1
Table 6 describes the characteristics of the data fields in the LoanRecord table.
Field Name Description
LoanRecID Unique loan record ID
Text string – 8 digits
Not null
StdID Student number
Text string – 7 digits
Not null
BookID Book ID
Text string – 8 digits
Not null
DateOfBorrow Date of the book being borrowed
Date data type
Not null
Status Loan status (1 – on loan; 2 – returned; 3 – on hold)
Text string – 1 digit
Not null
Table 6. Characteristics of the data fields in the LoanRecord table.
Commonly Used Data Types in SQL
The data type of a data item restricts the values that the data item can take and the operations which
one can perform on that data item. Table 7 gives some of the commonly used data types in SQL.
Note that the Boolean data type, which accepts TRUE or FALSE as its value, is not defined in
SQL92, but in SQL99. However, some databases support the data type even though they do not
conform to SQL99.
Teaching remarks
A character string stored in a CHAR column is left-justified and padded with trailing blanks to
the length of the column. All the strings stored in a CHAR column have the same length. These
trailing blanks are preserved in query results.
A character string stored in a VARCHAR column has exactly the same length as the source
string or the expression that generated the string (including trailing blanks). Character strings
stored in a VARCHAR column can vary in length.
A character string stored in a VARCHAR column incurs a 2-byte overhead. Do not use this data
type for columns less than 6 bytes long or for columns that store strings of the same length. Use
the CHAR data type instead.
SQL Statements
Example
A database named “library_system” is created with the following statement.
CREATE DATABASE library_system
Teaching remark
Some databases like Microsoft Access may require users to create a database by using their own
user interface instead of within a SQL environment.
Example 1
Create a table called “Teacher” with two columns named “Name” and “Age” respectively.
CREATE TABLE Teacher
(
Name varchar(30),
Age int
)
Result
An empty Teacher table with two fields – Name and Age – is created.
Teaching remark
The Teacher table is not required in the library system example. It is created to demonstrate
another SQL statement which removes database tables.
Example 2
Create a table called “Book” that contains fields named “BookID”, “Title” and “Type” such that a
value for “BookID” must be entered for each row and its value is unique within the table.
CREATE TABLE Book
(
BookID char(8) NOT NULL UNIQUE,
Title varchar(100),
Type int
)
Result
An empty Book table with three fields – BookID, Title and Type – is created. The BookID field
is mandatory (indicated by “NOT NULL”) and unique (indicated by “UNIQUE”) within the Book
table.
The PRIMARY KEY keyword is used to specify the fields in a table that compose the table’s
primary key.
Syntax
CREATE TABLE TableName
(
Column1 DataType NOT NULL,
Column2 DataType NOT NULL,
.......
PRIMARY KEY (Column1, Column2, …)
)
Full Syntax
Teaching remark
Technically, all fields in a primary key should be defined to be UNIQUE and NOT NULL.
Although some databases like Microsoft Access 2003 may take all primary key fields as
UNIQUE and NOT NULL even though they are not specified, it is a good practice to specify
them explicitly.
Example
To create a table called “Student” with the primary key “StdID”, we can use the following
statement:
CREATE TABLE Student
(
StdID char(7) NOT NULL UNIQUE,
Name varchar(30),
Class char(10),
Age smallint,
OverduePay decimal(5,2),
PRIMARY KEY (StdID)
)
Teaching remark
The length of the Class field is set to 10 characters long intentionally. We will alter it to 2
characters long using another SQL statement later.
Syntax
CREATE TABLE TableName1
(
Column1 DataType1,
Column2 DataType2,
.......
FOREIGN KEY (ColumnX, ColumnY) REFERENCES TableName2
)
Full Syntax
Example
In this example, we would like to create a table “LoanRecord” with a primary key “LoanRecID”
and two foreign keys, “StdID” and “BookID”, that reference the tables “Student” and “Book”
respectively, by using the following statement.
CREATE TABLE LoanRecord
(
LoanRecID char(8) NOT NULL,
StdID char(7) NOT NULL,
BookID char(8) NOT NULL,
Dateofborrow date,
Status char(1),
PRIMARY KEY (LoanRecID),
FOREIGN KEY (StdID) REFERENCES Student,
FOREIGN KEY (BOOKID) REFERENCES Book
)
Sample Query Q2_2_CreateLoanRecord Want to Try?
The SQL script given above for creating LoanRecord table cannot run successfully because a
primary key has not been defined for the Book table created earlier. It is important to rectify the
problem by altering the structure of the Book table before running the above SQL script again.
Important remark
A special view on one or more tables in the database, in the form of a “virtual” table, can be
created with the CREATE VIEW statement. The data shown in the virtual table is
extracted by a SELECT statement. Both the CREATE VIEW and SELECT statements will
be covered later.
Add column
To add column(s) in a table, use ADD within the ALTER TABLE statement.
Syntax
ALTER TABLE TableName ADD ColumnName DataType
Full Syntax
Example
To add a column named “PhoneNo” in the “Student” table, we can use the following statement.
ALTER TABLE Student ADD PhoneNo char(8)
Result
Drop column
To drop column(s) in a table, use DROP within the ALTER TABLE statement.
Syntax
ALTER TABLE TableName DROP ColumnName
Full Syntax
Example
To drop a column ‘Age’ in the “Student” table, we can use the following statement.
ALTER TABLE Student DROP Age
Result
Syntax
ALTER TABLE TableName ALTER COLUMN Column1 NewDataType
Full Syntax
Teaching remark
If a new data type is set for an existing column, the values that already exist in the column must
be compatible with the new data type. Otherwise, the statement will fail.
Example
To change the data type of the ‘Class’ column to char(2) in the “Student” table, we can use the following
statement:
ALTER TABLE Student ALTER COLUMN Class char(2)
Result
Full Syntax
Example
To make the ‘Name’ column NOT NULL in the “Student” table, we can use the following
statement:
ALTER TABLE Student ALTER COLUMN Name varchar(30) NOT NULL
Result
Teaching remark
In the above MS Access 2003 interface, the item “Required” means a mandatory entry. In
other words, the value for the field cannot be NULL, i.e., NOT NULL.
Full Syntax
Example
ALTER TABLE Book ADD PRIMARY KEY (BookID)
Result
Teaching remark
As the Book table now has a primary key, the SQL script for creating the LoanRecord table that
references the Book table can now run successfully.
Deleting Database Objects
If required, a database table or even the whole database can be deleted.
Delete a table
To delete a table, use the DROP TABLE statement.
Syntax
DROP TABLE TableName
Full Syntax
Example
DROP TABLE teacher
Delete a database
We can delete the entire database with the use of the DROP DATABASE statement.
Syntax
DROP DATABASE DatabaseName
Example
DROP DATABASE my_database
Teaching remarks
The DROP DATABASE statement should be used very rarely.
You will not be able to run the DROP DATABASE statement within the graphical user
environment of MS ACCESS 2003.
Full syntax
Value1 is the value for the first field of the TableName table, following the order in which the fields
were defined when the table was created. Similarly, Value2 is the value for the second field of the table.
Example
The following statements insert data into the Student table.
INSERT INTO Student VALUES ('0002011', 'Chan Edward', '1C', 12.5,
'21238782');
INSERT INTO Student VALUES ('0002012', 'Wong Wai Ming', '2B', 30.5,
'21234456');
INSERT INTO Student VALUES ('0002013', 'Cheung Ka Fai', '1C', 0,
'23212321');
INSERT INTO Student VALUES ('0002014', 'Chang Wai Yee', '4A', 20.5,
'23123123');
INSERT INTO Student VALUES ('0002015', 'Lee Oi Lam', '5C', 3, '25214123');
INSERT INTO Student VALUES ('0002016', 'Sze Yuk Ki', '7B', 1.5, '26434534');
Result
Full syntax
Example
We insert the Book ID and titles of three books into the Book table. The book category
information (stored in the Type field) is empty for the three books.
INSERT INTO Book (BookID, Title) VALUES ('00000001', 'Apple Tree');
INSERT INTO Book (BookID, Title) VALUES ('00000002', 'Bible');
INSERT INTO Book (BookID, Title) VALUES ('00000003', 'Star Wing');
Result
Full Syntax
Example
The following SQL statement retrieves (and displays) all records in the Student table.
SELECT * FROM Student
Result
Retrieve value(s) from particular column(s) of a table
To select data from particular columns of a table, we can also use the SELECT statement.
Syntax
SELECT Column1, Column2…
FROM TableName
Full Syntax
Example
To select the ‘Name’ and ‘Class’ columns from the Student table, we can use the statement as
below.
SELECT Name, Class FROM Student
Teaching remark
If the values of the selected columns from different rows of the table are the same, multiple
occurrences of the same values will result. To avoid the duplication, the SELECT DISTINCT
statement is required.
Full Syntax
Example
In this example, we would like to identify all students who have used the library service at least
once. If we use the SELECT statement without the DISTINCT keyword, multiple occurrences of
the same students may appear if those students use the library services more than once. To avoid
the duplication, we retrieve all distinct value(s) of the ‘StdID’ field from the LoanRecord table (see
Table 5 for its content) with the use of the SELECT DISTINCT statement.
SELECT DISTINCT StdID FROM LoanRecord
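The effect of DISTINCT can also be checked outside Access. The following is a minimal sketch in Python using the standard-library sqlite3 module with an in-memory database. The choice of SQLite and the simplified TEXT column types are assumptions made for illustration; the rows are the LoanRecord rows listed earlier.

```python
import sqlite3

# In-memory SQLite database standing in for the library database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE LoanRecord "
    "(LoanRecID TEXT, StdID TEXT, BookID TEXT, DateOfBorrow TEXT, Status TEXT)"
)
loans = [
    ("1", "0002012", "00000001", "20051001", "1"),
    ("2", "0002011", "00000002", "20020112", "2"),
    ("3", "0002012", "00000003", "20031211", "2"),
    ("4", "0002013", "00000002", "20031001", "2"),
    ("5", "0002011", "00000002", "20051018", "1"),
]
conn.executemany("INSERT INTO LoanRecord VALUES (?, ?, ?, ?, ?)", loans)

# Without DISTINCT, a student appears once per loan record.
all_ids = [row[0] for row in conn.execute("SELECT StdID FROM LoanRecord")]

# With DISTINCT, each student ID appears once only.
distinct_ids = [
    row[0]
    for row in conn.execute("SELECT DISTINCT StdID FROM LoanRecord ORDER BY StdID")
]

print(len(all_ids), distinct_ids)
```

The ORDER BY in the second query is added only to make the printed order predictable.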
Result
Full Syntax
Example
In this example, we would like to retrieve records of those students in the class “1C”.
SELECT * FROM Student WHERE class = '1C'
Teaching remark
Except for numeric values, the operand(s) of the operator must be enclosed by a pair of single
quotation marks ‘’.
Result
Creating and Deleting Data View
Create a data view
With the CREATE VIEW statement, users may create a special view on one or more
tables (or views) in the database in the form of a new “virtual” table. The data view is defined by
an associated SELECT statement. Most SQL statements that apply to a database table
can also be applied to a data view.
Syntax
CREATE VIEW ViewName (Column1, Column2…)
AS Select-Statement;
Full Syntax
Example
In this example, we would like to create a data view to store the Book ID of those library books that
are currently on loan and their corresponding borrowers (Student ID and Name).
CREATE VIEW BookOnLoan_n_Borrower_View (StdID, Name, BookID)
AS SELECT Student.StdID, Name, BookID
FROM Student, LoanRecord
WHERE Student.StdID = LoanRecord.StdID AND status='1';
Result
A data view known as BookOnLoan_n_Borrower_View is created.
Delete a data view
A data view can be deleted with the use of the DROP VIEW statement.
Syntax
DROP VIEW ViewName;
Full syntax
Example
The BookOnLoan_n_Borrower_View data view created earlier can be removed with the following
SQL statement.
DROP VIEW BookOnLoan_n_Borrower_View;
Result
The BookOnLoan_n_Borrower_View data view is removed.
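The view examples above can be tried with the following minimal sketch, written in Python with the standard-library sqlite3 module. SQLite, the reduced column lists and the TEXT types are assumptions for illustration (a reasonably recent SQLite is assumed, since older versions do not accept a column list after the view name). The student names are those inserted earlier in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE LoanRecord (LoanRecID TEXT PRIMARY KEY, StdID TEXT,
                         BookID TEXT, Status TEXT);
INSERT INTO Student VALUES
  ('0002011', 'Chan Edward'), ('0002012', 'Wong Wai Ming'),
  ('0002013', 'Cheung Ka Fai');
INSERT INTO LoanRecord VALUES
  ('1', '0002012', '00000001', '1'), ('2', '0002011', '00000002', '2'),
  ('3', '0002012', '00000003', '2'), ('4', '0002013', '00000002', '2'),
  ('5', '0002011', '00000002', '1');

-- The view stores no rows of its own; it is evaluated when queried.
CREATE VIEW BookOnLoan_n_Borrower_View (StdID, Name, BookID)
AS SELECT Student.StdID, Name, BookID
   FROM Student, LoanRecord
   WHERE Student.StdID = LoanRecord.StdID AND Status = '1';
""")

# Query the view as if it were a table.
on_loan = conn.execute(
    "SELECT StdID, Name, BookID FROM BookOnLoan_n_Borrower_View ORDER BY StdID"
).fetchall()
print(on_loan)

# Dropping the view leaves the underlying tables untouched.
conn.execute("DROP VIEW BookOnLoan_n_Borrower_View")
remaining_views = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'view'"
).fetchall()
loan_rows = conn.execute("SELECT COUNT(*) FROM LoanRecord").fetchone()[0]
```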
Syntax
UPDATE TableName
SET Column = NewValue
WHERE Condition(s)
The WHERE clause is optional.
Full syntax
If the WHERE clause is not used, the value in the specified column of each row will be changed to
the new value.
Example
Suppose we had wrongly put ‘1C’ as the value of the ‘Class’ field for students in Class 2C (and no
records for students from Class 1C have been entered). The problem can be rectified by the
following SQL statement.
UPDATE Student SET class = '2C'
WHERE class='1C';
Result
Teaching remark
Except for numeric values, the operand(s) of the operator must be enclosed by a pair of single
quotation marks ‘’.
Full syntax
If the WHERE clause is not used, the values in the specified column(s) of each row will be changed
to the new values.
Example
Suppose we have wrongly entered the name and phone number of the student with student ID
‘0002011’ in the Student table. We can use the UPDATE statement to fix the problem.
The name and phone number of the student should be ‘Chan Ming Wai’ and ‘21111182’
respectively.
UPDATE Student SET Name = 'Chan Ming Wai', PhoneNo = '21111182'
WHERE StdID='0002011';
Result
Full syntax
Example 1
In this example, we would like to delete all records with the book ID “00000003” from the
“Book” table.
DELETE FROM Book
WHERE BookID='00000003';
As the BookID field serves as a foreign key in the LoanRecord table to the Book table and there
is a corresponding record with the BookID equal to ‘00000003’ in the LoanRecord table, the
above DELETE statement cannot be executed successfully. The corresponding rows in the
LoanRecord table need to be removed first in order for the statement to run successfully.
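The referential integrity behaviour described above can be observed with a minimal sketch in Python using the standard-library sqlite3 module. SQLite and the reduced table definitions are assumptions for illustration; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued, which may differ from other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Book (BookID TEXT PRIMARY KEY, Title TEXT);
CREATE TABLE LoanRecord (
    LoanRecID TEXT PRIMARY KEY,
    BookID TEXT NOT NULL,
    FOREIGN KEY (BookID) REFERENCES Book
);
INSERT INTO Book VALUES ('00000003', 'Star Wing');
INSERT INTO LoanRecord VALUES ('3', '00000003');
""")

# Deleting a book that is still referenced by a loan record is rejected.
try:
    conn.execute("DELETE FROM Book WHERE BookID = '00000003'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# Remove the referencing rows first, then the book can be deleted.
conn.execute("DELETE FROM LoanRecord WHERE BookID = '00000003'")
conn.execute("DELETE FROM Book WHERE BookID = '00000003'")
books_left = conn.execute("SELECT COUNT(*) FROM Book").fetchone()[0]
print(blocked, books_left)
```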
Example 2
To delete all records in the table, we can simply use the DELETE statement without setting any
condition. After running the following SQL statement successfully, the Book table will become
empty.
DELETE FROM Book
Teaching remark
For the same reason as given for Example 1, the above DELETE
statement cannot be executed successfully unless no corresponding rows are found in the
LoanRecord table.
Result Presentation
Users may want to organize the result of a query in ascending or descending
order of some selected fields. On other occasions, they may be interested in the
value of some aggregated attribute of the retrieved data, e.g., the total number of books that a
student has ever borrowed. The former can be achieved with the ORDER BY clause,
whereas the latter can be done with the GROUP BY clause, both in a SELECT statement.
Full syntax
Optional parts are put inside square brackets. A vertical bar stands for disjunction. Thus
[ASC|DESC] means that a user may use none of the keywords, or either one.
Teaching remarks
The sort fields need not be among the fields selected for retrieval.
The default sorting order is ascending (ASC); character values are compared in lexicographical order.
Example 1
The following query sorts all rows in the Student table in ascending order of the student name.
SELECT * FROM Student
ORDER BY Name
Result
Example 2
In the following example, we retrieve all rows in the Student table in ascending order of the Class
field.
SELECT * FROM Student
ORDER BY Class ASC
Result
Example 3
In this example, we would like to sort all records in the Student table in two levels: first in
descending order of the Class field, then in ascending order of the Name field.
SELECT * FROM Student
ORDER BY Class DESC, Name ASC
Result
Syntax
SELECT Column(s) FROM TableName
GROUP BY Column1, Column2, ...
HAVING Condition(s)
The HAVING clause is optional.
Full syntax
Example 1
To count the number of students in each class, we can use the GROUP
BY clause (without the HAVING part) as below:
SELECT Class, count(*) AS Num
FROM Student
GROUP BY Class
ORDER BY Class DESC;
The AS keyword enables a user to assign a new label to a selected object. In the above example,
the output of the aggregate function COUNT(*) which counts the number of output rows in each
group (as specified by the GROUP BY clause) is labeled as ‘Num’.
Result
A clause which can only be used after the GROUP BY clause is HAVING. It comes after
GROUP BY (and before ORDER BY if the clause is needed as well). The purpose of HAVING
is to set selection criteria based on aggregate values. The following SQL query counts the
number of students in each class, keeping only those classes whose students owe the library more
than 20 dollars in overdue fines in aggregate.
Example 2
SELECT Class, Count(*) AS Num
FROM Student
GROUP BY Class
HAVING SUM(OverduePay) > 20
ORDER BY Class DESC;
Teaching remarks
The WHERE clause sets selection criteria for the SELECT statement based on
non-aggregate value(s) only. Any selection based on aggregate value must be
done with the HAVING clause. A common student mistake is to use some
aggregate function(s) in a WHERE clause. Aggregate functions do not work
in a WHERE clause because it is given no information as to how records (i.e.,
table rows) are to be grouped. Such grouping information is provided to the
HAVING clause by the GROUP BY clause.
In a query with a GROUP BY clause, the SELECT list can reference only values generated
by the aggregate functions or columns specified in the GROUP BY clause.
The HAVING clause can likewise reference only values generated by the aggregate functions or
columns specified in the GROUP BY clause.
As shown in the above example, a column used as the parameter of an aggregate function
in the HAVING clause (OverduePay in this case) need not appear in the SELECT list.
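The difference between WHERE and HAVING can be seen in the following minimal sketch, written in Python with the standard-library sqlite3 module. SQLite and the REAL type for OverduePay are assumptions for illustration; the rows are the Student rows as originally inserted in this section (before the later UPDATE examples).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdID TEXT, Name TEXT, Class TEXT, OverduePay REAL)")
students = [
    ("0002011", "Chan Edward", "1C", 12.5),
    ("0002012", "Wong Wai Ming", "2B", 30.5),
    ("0002013", "Cheung Ka Fai", "1C", 0),
    ("0002014", "Chang Wai Yee", "4A", 20.5),
    ("0002015", "Lee Oi Lam", "5C", 3),
    ("0002016", "Sze Yuk Ki", "7B", 1.5),
]
conn.executemany("INSERT INTO Student VALUES (?, ?, ?, ?)", students)

# HAVING filters whole groups using an aggregate value.
grouped = conn.execute("""
    SELECT Class, COUNT(*) AS Num
    FROM Student
    GROUP BY Class
    HAVING SUM(OverduePay) > 20
    ORDER BY Class DESC
""").fetchall()

# The common mistake: an aggregate function inside WHERE is rejected.
try:
    conn.execute("SELECT Class FROM Student WHERE SUM(OverduePay) > 20")
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

print(grouped, where_rejected)
```

Class 1C is excluded because its two students owe 12.5 dollars in total, which does not exceed 20.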
Teaching remarks
LIKE can only be used with CHAR and VARCHAR field types.
Unless the SQL-92 syntax is selected in Microsoft Access, the database uses ‘?’ and ‘*’ for ‘_’
and ‘%’ respectively.
Syntax
SELECT Column(s) FROM TableName
WHERE Column LIKE pattern
Full Syntax
Example 1
In the following example, all student records whose name starts with ‘Ch’ are retrieved.
SELECT *
FROM Student
WHERE Name LIKE 'Ch%';
Result
Example 2
In this example, we would like to select all student records in which the second letter of the student
name is ‘h’ and the last letter is ‘i’.
SELECT *
FROM Student
WHERE Name LIKE '_h%i';
Result
Example 3
The following query selects all student records with at least one ‘u’ character in the student name.
SELECT *
FROM Student
WHERE Name LIKE '%u%'
Result
The IN operator
In a WHERE clause, the IN operator can be used to specify a list of values for a selected
column; a row is retrieved only if its value in that column matches one of the listed values.
Syntax
SELECT Column(s) FROM TableName
WHERE Column IN (value1,value2,...)
Full Syntax
The value list (which is an operand) of the IN operator can be listed explicitly as shown in the above
syntax or generated by another SELECT statement. The latter is known as a nested SELECT
statement, which will be covered later.
Example
We use the IN operator to select records of student(s) whose name is ‘Cheung Ka Fai’ or ‘Wong
Wai Ming’.
SELECT *
FROM Student
WHERE Name IN ('Cheung Ka Fai','Wong Wai Ming');
Result
Syntax
SELECT Column(s) FROM TableName
WHERE Column
BETWEEN value1 AND value2
Full Syntax
Example
We use the BETWEEN operator to select students with a student ID between 0002013 and
0002015.
SELECT *
FROM Student
WHERE StdID Between '0002013' AND '0002015';
Result
Teaching remark
The result of the above query may be different in various databases as some may contain the
boundary records while some may not. However, according to the SQL-92 and SQL-99
standards, boundary records are to be included.
Full Syntax
Example
The following query retrieves the record(s) of students who are in class 2C and have an overdue
fine to settle.
SELECT * FROM Student
WHERE Class = '2C' AND OverduePay > 0
Result
The OR operator
By using the OR operator, we can select data rows such that at least one of its operands (which is a
condition) is fulfilled.
Syntax
SELECT Column FROM TableName
WHERE Condition1 OR Condition2
Full Syntax
Example
The following query retrieves the student records from the Student table such that the student is either a
member of Class 2C or is named “Chang Wai Yee”.
SELECT *
FROM Student
WHERE Class='2C' OR Name='Chang Wai Yee';
Result
Add alias to a column
Sometimes, a column name in the resultant table may not be expressive enough for display
purposes. In this case, we can assign an alias to the column of the resultant table using the AS keyword.
Syntax
SELECT Column1 AS ColumnAlias1, Column2 AS ColumnAlias2,...
FROM TableName
Full Syntax
Example
The following query assigns more meaningful labels to the fields retrieved from the Student table.
SELECT StdID AS Student_ID, Name AS Student_Name, PhoneNo AS Phone_Number
FROM Student;
Result
Joining Tables
Sometimes, we may need to retrieve data from two or more tables. In this case, we can join tables
with the use of the relevant field(s) of the tables. In most cases, tables are joined according to search
conditions that find only the rows with matching values; this type of join is known as an inner
equijoin. Occasionally, non-equijoins, for example, that express a greater-than or less-than
relationship, may be used. In some other occasions, decision-support analysis may require outer
joins, which retrieve both matching and non-matching rows. The three types of outer joins are left
outer join, right outer join, and full outer join.
Equijoin
We can retrieve data from tables by setting up retrieval condition that requires the column values of
the “joined” tables being equal. In brief, equijoin is a join in which rows from two tables are
combined and added to the result set when there are equal values in the joined columns.
Syntax
SELECT TableName1.Column11, TableName1.Column12,...
TableName2.Column21,TableName2.Column22,...
FROM TableName1, TableName2
WHERE equality_condition(s)
Full Syntax
Result
In the above example, the “StdID” field occurs twice in the equijoin output as it can be found in
both the LoanRecord and Student tables.
Obviously there is no point in repeating the same piece of information. One of the two identical
columns can be eliminated by changing the SELECT list. The result is called a natural join. More
exactly, the natural join operation produces a Cartesian product of its two argument tables, performs
a selection that enforces equality on attributes that appear in both tables, and removes duplicate
attributes at the end.
By selecting only the fields of interest, the repeated occurrences of the same piece of information
shown in the previous example disappear, which gives a natural join. In that sense, the natural join is a
subtype of the equijoin.
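The relationship between the Cartesian product, the equijoin and the natural-join style of output can be illustrated with the following minimal sketch in Python using the standard-library sqlite3 module. SQLite, the reduced column lists and the choice of three students are assumptions for illustration; this is not the handout's own equijoin example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE LoanRecord (LoanRecID TEXT PRIMARY KEY, StdID TEXT, BookID TEXT);
INSERT INTO Student VALUES
  ('0002011', 'Chan Ming Wai'), ('0002012', 'Wong Wai Ming'),
  ('0002013', 'Cheung Ka Fai');
INSERT INTO LoanRecord VALUES
  ('1', '0002012', '00000001'), ('2', '0002011', '00000002'),
  ('3', '0002012', '00000003'), ('4', '0002013', '00000002'),
  ('5', '0002011', '00000002');
""")

# No join condition: every Student row is paired with every LoanRecord row.
cartesian = conn.execute("SELECT * FROM Student, LoanRecord").fetchall()

# Equijoin keeping both StdID columns: they hold the same value in every row.
equijoin = conn.execute("""
    SELECT LoanRecord.StdID, Student.StdID, Student.Name, LoanRecord.BookID
    FROM LoanRecord, Student
    WHERE LoanRecord.StdID = Student.StdID
""").fetchall()

# Listing StdID only once gives the natural-join style of output.
trimmed = conn.execute("""
    SELECT Student.StdID, Student.Name, LoanRecord.BookID
    FROM LoanRecord, Student
    WHERE LoanRecord.StdID = Student.StdID
""").fetchall()

print(len(cartesian), len(equijoin), len(trimmed))
```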
Result
Teaching remark
In SQL, all join conditions are to be specified explicitly. The fact that two tables have the same
attribute name (e.g. StdID in the LoanRecord and Student tables) does not mean that a join will
be done between them automatically. Omitting the join conditions when joining tables will
result in an output that corresponds to the Cartesian product of the rows in the selected tables.
Syntax
SELECT TableName1.Column11, TableName1.Column12,...
FROM TableName1
NATURAL INNER JOIN TableName2
Full Syntax
Example
In this example, we list the students who have made use of the library service at least once,
as in the second equijoin example. This time, however, we perform the same
query with a natural join.
SELECT DISTINCT Student.Name
FROM Student
NATURAL INNER JOIN LoanRecord
Result
Name
Chan Ming Wai
Cheung Ka Fai
Wong Wai Ming
Note that the multiple occurrences of output records are eliminated with the use of DISTINCT.
Teaching remark
NATURAL JOIN is not supported by Access 2003. However it is easy to model the
NATURAL JOIN operation with the INNER JOIN operation as shown in the next section.
Syntax
SELECT Column1, Column2,…
FROM TableName1
INNER JOIN TableName2
ON Condition(s)
Full Syntax
Example
The query below models the NATURAL JOIN example given in the last section.
SELECT distinct Student.Name
FROM Student
INNER JOIN LoanRecord on (Student.stdid = LoanRecord.stdid)
Resultant Table:
The LEFT (OUTER) JOIN operator
The result of a LEFT JOIN operation contains every row from the first table and all matching rows
in the second table. Rows found only in the second table are not displayed. If the rows in the first
table have no match in the second table, the fields corresponding to the second table in the output rows
will be filled with null.
Syntax
SELECT Column1, Column2,...
FROM TableName1
LEFT JOIN TableName2
ON Condition(s)
Full Syntax
Example
To view all library services that the students have accessed as well as those students who have not
made use of the library services at all, we can use the following query.
SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID
FROM Student
LEFT JOIN LoanRecord
ON LoanRecord.StdID=Student.StdID;
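The LEFT JOIN query above can be tried with the following minimal sketch in Python using the standard-library sqlite3 module. SQLite and the reduced column lists are assumptions for illustration; the rows follow the Student and LoanRecord data used in this section. Students without any loan record appear with a null LoanRecID, which Python shows as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE LoanRecord (LoanRecID TEXT PRIMARY KEY, StdID TEXT);
INSERT INTO Student VALUES
  ('0002011', 'Chan Ming Wai'), ('0002012', 'Wong Wai Ming'),
  ('0002013', 'Cheung Ka Fai'), ('0002014', 'Chang Wai Yee'),
  ('0002015', 'Lee Oi Lam'), ('0002016', 'Sze Yuk Ki');
INSERT INTO LoanRecord VALUES
  ('1', '0002012'), ('2', '0002011'), ('3', '0002012'),
  ('4', '0002013'), ('5', '0002011');
""")

# Every student is kept; unmatched students get NULL for LoanRecID.
rows = conn.execute("""
    SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID
    FROM Student
    LEFT JOIN LoanRecord
    ON LoanRecord.StdID = Student.StdID
    ORDER BY Student.StdID, LoanRecord.LoanRecID
""").fetchall()

# Students who never borrowed anything are the rows with no loan record ID.
no_loans = [std_id for std_id, _name, loan_id in rows if loan_id is None]
print(len(rows), no_loans)
```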
Result
Syntax
SELECT Column1, Column2,...
FROM TableName1
RIGHT JOIN TableName2
ON Condition(s)
Full Syntax
Example
To view the student ID, student name, and loan record ID for each library service that students have
made use of, we may use the following query.
SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID
FROM Student
RIGHT JOIN LoanRecord
ON LoanRecord.StdID=Student.StdID;
Result
Syntax
SELECT Column1, Column2,...
FROM TableName1
FULL OUTER JOIN TableName2
ON Condition(s)
Full Syntax
Example
We modify the RIGHT JOIN example by replacing the RIGHT JOIN by a FULL JOIN.
SELECT Student.StdID, Student.Name, LoanRecord.LoanRecID
FROM Student
FULL JOIN LoanRecord
ON LoanRecord.StdID = Student.StdID
Result
StdID Name LoanRecID
0002011 Chan Ming Wai 2
0002011 Chan Ming Wai 5
0002014 Chang Wai Yee
0002013 Cheung Ka Fai 4
0002015 Lee Oi Lam
0002016 Sze Yuk Ki
0002012 Wong Wai Ming 1
0002012 Wong Wai Ming 3
Teaching remark
FULL (OUTER) JOIN is not supported by Access 2003. However it is easy to model the
FULL (OUTER) JOIN operation by “integrating” the results of the LEFT JOIN operation and
the RIGHT JOIN operation as shown in the next section.
Syntax
SQL_Statement1
UNION
SQL_Statement2
Example
The following example implements the FULL OUTER JOIN example using LEFT JOIN, RIGHT
JOIN and UNION.
SELECT Student.StdID,Student.Name,LoanRecord.LoanRecID
FROM Student
LEFT JOIN LoanRecord
ON LoanRecord.StdID = Student.StdID
UNION
SELECT Student.StdID,Student.Name,LoanRecord.LoanRecID
FROM Student
RIGHT JOIN LoanRecord
ON LoanRecord.StdID = Student.StdID;
Result
Example
The following query identifies those students whose names have the substrings “Wai” and “Chan”.
SELECT Name
FROM Student
WHERE Name LIKE '%Wai%'
INTERSECT
SELECT Name
FROM Student
WHERE Name LIKE '%Chan%'
Result
Name
Chan Ming Wai
Chang Wai Yee
Teaching remark
INTERSECT is not supported by Access 2003.
Syntax
SQL_Statement1
EXCEPT
SQL_Statement2
Example
The following query identifies those students whose names have the substring “Wai” but not
“Chan”.
SELECT Name
FROM Student
WHERE Name LIKE '%Wai%'
EXCEPT
SELECT Name
FROM Student
WHERE Name LIKE '%Chan%'
Result
Name
Wong Wai Ming
Teaching remark
EXCEPT/MINUS is not supported by Access 2003.
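Since Access 2003 supports neither operator, the INTERSECT and EXCEPT examples above can be tried in another engine. The following is a minimal sketch in Python using the standard-library sqlite3 module; SQLite and the single-column Student table are assumptions for illustration. Results are sorted in Python because the order of rows returned by a set operation should not be relied upon.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Name TEXT)")
names = ["Chan Ming Wai", "Wong Wai Ming", "Cheung Ka Fai",
         "Chang Wai Yee", "Lee Oi Lam", "Sze Yuk Ki"]
conn.executemany("INSERT INTO Student VALUES (?)", [(n,) for n in names])

# Names containing both "Wai" and "Chan".
both = sorted(row[0] for row in conn.execute("""
    SELECT Name FROM Student WHERE Name LIKE '%Wai%'
    INTERSECT
    SELECT Name FROM Student WHERE Name LIKE '%Chan%'
"""))

# Names containing "Wai" but not "Chan".
wai_only = sorted(row[0] for row in conn.execute("""
    SELECT Name FROM Student WHERE Name LIKE '%Wai%'
    EXCEPT
    SELECT Name FROM Student WHERE Name LIKE '%Chan%'
"""))

print(both, wai_only)
```

Note that “Chang Wai Yee” satisfies both conditions because “Chang” contains the substring “Chan”.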
Syntax (=ANY)
SELECT Column1, Column2, ...
FROM TableName1, TableName2
WHERE Column =ANY (SELECT ColumnX FROM TableName3)
Full Syntax
Note that the =ANY operator can be replaced by the IN operator in the above case.
Example1
In this example, we would like to find the students in class 2C who have used some library service(s)
before.
SELECT name
FROM Student
WHERE (StdID =ANY (SELECT StdID FROM LoanRecord)) AND Class = '2C';
To save the effort of referencing the LoanRecord table given earlier, its table content is displayed
again as below.
Table “LoanRecord”
LoanRecID StdID BookID DateOfBorrow Status
1 0002012 00000001 20051001 1
2 0002011 00000002 20020112 2
3 0002012 00000003 20031211 2
4 0002013 00000002 20031001 2
5 0002011 00000002 20051018 1
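Example1 can be tried with the following minimal sketch in Python using the standard-library sqlite3 module. SQLite is an assumption here, and because it is assumed not to accept the =ANY form, the sketch uses the equivalent IN form noted above. The Class values reflect the state after the earlier UPDATE example (1C changed to 2C), and only four students are included for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StdID TEXT PRIMARY KEY, Name TEXT, Class TEXT);
CREATE TABLE LoanRecord (LoanRecID TEXT PRIMARY KEY, StdID TEXT);
INSERT INTO Student VALUES
  ('0002011', 'Chan Ming Wai', '2C'), ('0002012', 'Wong Wai Ming', '2B'),
  ('0002013', 'Cheung Ka Fai', '2C'), ('0002014', 'Chang Wai Yee', '4A');
INSERT INTO LoanRecord VALUES
  ('1', '0002012'), ('2', '0002011'), ('3', '0002012'),
  ('4', '0002013'), ('5', '0002011');
""")

# The sub-query produces the list of borrower IDs; the outer query
# keeps the 2C students whose ID appears in that list.
borrowers_2c = [row[0] for row in conn.execute("""
    SELECT Name
    FROM Student
    WHERE StdID IN (SELECT StdID FROM LoanRecord) AND Class = '2C'
    ORDER BY Name
""")]
print(borrowers_2c)
```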
Result
Example2
In this example, we would like to find the student who owes the largest amount of overdue fine to the
library.
SELECT Name, OverduePay
FROM Student
WHERE OverduePay >=ALL (SELECT OverduePay from Student)
Result
Arithmetic Operators/Functions
The following arithmetic operators/functions can be used in relevant expressions within a SQL
statement.
Operator/Function          Description
+                          The arithmetic add operator or unary plus operator
-                          The arithmetic subtract operator or unary minus operator
*                          The multiply operator (also a shorthand for all columns when used after the SELECT keyword)
/                          The arithmetic divide operator
ABS(numeric-expression)    The ABS() function turns the value of a numeric expression into its absolute value.
Table 8. Some SQL arithmetic operators/functions.
For example, if the loan period for a library item is 28 days, we can use NOW()+28 to compute the
due date for return if the item is loaned to a library user now.
String Functions
Some SQL-92 string functions are included below.
Function Type / Description
CHAR_LENGTH(string-expression) or CHARACTER_LENGTH(string-expression) for SQL92;
LENGTH(string-expression) for Oracle
    Returns the number of characters in a string-expression.
    Example: The following statement will return the value 7.
LCASE(string-expression) for Access
    Example: The following statement will return 'library'.
    SELECT LOWER('Library')
UCASE(string-expression) for Access
    Example: The following statement will return 'LIBRARY'.
    SELECT UPPER('Library')
Example: The following statement will return the value 9.
    SELECT CHAR_LENGTH(TRIM(' chocolate '))
Teaching remark
It appears that many databases introduce their own built-in functions although many of those
functions in fact offer the same functionality as the corresponding SQL-92 built-in functions.
It is important to check carefully before teaching the topic.
Aggregate Functions
Aggregate functions compute a single value from a set of records rather than from just one record.
They can be used to perform data calculations, such as maximum, minimum, or average.
Function Usage
AVG(expression) Computes the average value of a column by the expression
COUNT(expression) Counts the rows defined by the expression
COUNT(*) Counts all rows in the specified table or view
MIN(expression) Finds the minimum value in a column by the expression
MAX(expression) Finds the maximum value in a column by the expression
SUM(expression) Computes the sum of column values by the expression
Table 10. Some SQL aggregate functions.
Syntax
SELECT AVG(Column) FROM TableName
Example
We can calculate the average overdue payment per library user by using the AVG function.
SELECT AVG(OverduePay) AS Average_Overdue_Payment FROM Student
Result
Note that the overdue payment of Cheung Ka Fai (student ID 0002013) is set to zero, so the
value is included in the computation of the average. If the field were set to null instead, AVG
would ignore it and the average overdue payment per user would be 13.6.
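The difference between a zero and a null value can be checked with the following minimal sketch in Python using the standard-library sqlite3 module. SQLite, the two-column Student table and the REAL type are assumptions for illustration; the fine amounts are those inserted earlier in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdID TEXT, OverduePay REAL)")
fines = [("0002011", 12.5), ("0002012", 30.5), ("0002013", 0),
         ("0002014", 20.5), ("0002015", 3), ("0002016", 1.5)]
conn.executemany("INSERT INTO Student VALUES (?, ?)", fines)

# A zero fine is a value like any other and is included in the average.
avg_with_zero = conn.execute("SELECT AVG(OverduePay) FROM Student").fetchone()[0]

# A NULL fine is skipped by AVG (and by COUNT(column), but not by COUNT(*)).
conn.execute("UPDATE Student SET OverduePay = NULL WHERE StdID = '0002013'")
avg_with_null = conn.execute("SELECT AVG(OverduePay) FROM Student").fetchone()[0]
count_all, count_fines = conn.execute(
    "SELECT COUNT(*), COUNT(OverduePay) FROM Student"
).fetchone()

print(round(avg_with_zero, 2), round(avg_with_null, 2), count_all, count_fines)
```

The total of the fines is 68 dollars in both cases; it is divided by six students in the first query and by five in the second.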
Example
The following query counts the number of inactive library users who have never made use of any
library service.
SELECT Count(*)-Count(LoanRecord.Status) AS Number_of_idle_users
FROM Student
LEFT JOIN LoanRecord
ON Student.StdID = LoanRecord.StdID;
Result
Syntax - COUNT(*)
SELECT COUNT(*) FROM TableName
Example
The following query counts the number of students whose names start with “Chan”.
SELECT Count(*) AS Number_of_students
FROM Student
WHERE (Student.Name) LIKE 'Chan%';
Result
Example
The following query finds the student who owes to the library the largest amount of overdue fine.
SELECT Name, OverduePay
FROM Student
WHERE OverduePay = (SELECT MAX(OverduePay) FROM Student)
The same query can also be implemented without the MAX function, for example with the >=ALL operator as in the earlier sub-query example.
Result
Teaching remark
Many students may produce a SQL script similar to the one below for the above
query.
SELECT Name, MAX(OverduePay)
FROM Student
The above query violates the syntactic rules of SQL. The problem lies in the
fact that a number of student names can be retrieved from the Student table (which
correspond to several rows in the output table), but an aggregate function like
MAX() returns exactly one row in the output table.
Syntax
SELECT MIN(Column) FROM TableName
Example
The following query finds the student who owes the library the smallest amount of overdue fine.
SELECT Name, OverduePay AS Overdue_Fine
FROM Student
WHERE OverduePay = (SELECT MIN(OverduePay) FROM Student)
Teaching remark
The following SQL script implements the same query without using the MIN
function.
Result
The SUM Function
The SUM function computes the sum of values from a selected column.
Syntax
SELECT SUM(Column) FROM TableName
Example
The query below gives the total amount of outstanding overdue fine for each class of students.
SELECT Class, SUM(OverduePay) AS Overdue_Fine
FROM Student group by Class
Result
Full Syntax
Syntax (DROP INDEX)
DROP INDEX IndexName
ON TableName
CREATE INDEX ind_class_stdID
ON Student (class, StdID)
Result
Result
Exporting Data from MS Access
Most databases are equipped with a data export facility so that data within a database can be
“exported” for use by other applications. Many of them allow the export not only of data but
also of table structures, queries and other database objects. Those features enable users to
migrate their data from one database to another. In this section, we will briefly
mention the data export facility in Microsoft Access. Specifically, it allows its users to export data
in text, HTML or Microsoft Excel format.
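Outside Access, query results can be exported programmatically. The following is a minimal sketch in Python, assuming the standard-library sqlite3, csv and html modules stand in for the Access export dialog; the two-row Student table is illustrative only. It writes the result of a query as comma-separated text (which spreadsheet programs can open) and as a bare HTML table.

```python
import csv
import html
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StdID TEXT, Name TEXT, Class TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("0002011", "Chan Ming Wai", "2C"), ("0002012", "Wong Wai Ming", "2B")],
)

cursor = conn.execute("SELECT StdID, Name, Class FROM Student ORDER BY StdID")
headers = [col[0] for col in cursor.description]  # column names of the result
rows = cursor.fetchall()

# Comma-separated text: one header line followed by one line per row.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(headers)
writer.writerows(rows)
csv_text = buffer.getvalue()

# HTML: one <tr> per row, with cell values escaped.
html_rows = "".join(
    "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in row) + "</tr>"
    for row in rows
)
html_text = "<table>" + html_rows + "</table>"

print(csv_text)
print(html_text)
```

In practice the two strings would be written to files such as a .csv and an .html file; they are kept in memory here so that the sketch has no side effects.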
3. Enter the file name of another Access database (.mdb) and click the Export button
2. Click File Export from the menu bar
4. Enter the file name and click the Export button
Database Class Practice Activities 01
http://www.yll.edu.hk/~yll-cym/ca/download/database_activity_01.mdb
There are 3 tables in a database, the structures are shown below:
CLUB
Field Data type Width Description
studID character 20 ID of the students
clubname character 20 The name of the club
fee character 1 ‘Y’ if the fee is settled, ‘N’ otherwise
position character 20 The position of the students in that club
Info
Field Data type Width Description
sex character 1 Sex of the students
name character 20 Name of the students
address character 100 District, e.g. “Yuen Long”, “Tin Shui Wai”
class character 3 The class, e.g. 1A, 2A
classno character 2 The class number
studID character 7 ID of the students
Result
Field Data type Width Description
subj character 20 The name of subject
mark integer / The mark of the student for that subject
studID character 7 The ID of the student
Step Description of the requirement / Corresponding SQL
1. Create a table called club with the above structure:
CREATE TABLE club
(
studID varchar(20),
clubname varchar(20),
fee char(1),
position varchar(20)
)
2. Add a new field called “skill” in the table “club” which is a character field with 20 character width.
ALTER TABLE club ADD skill varchar(20)
3. Reduce all the marks by 10 for the subject “chin” in the table “result”.
UPDATE result SET mark = mark - 10 WHERE subj = 'chin'
4. Reset the table “club” such that every position is changed to “senior member”.
UPDATE club SET position = 'senior member'
5. Set the skill to “violin” for the student whose studID is “2006004” in the table club.
UPDATE club SET skill = 'violin' WHERE studID='2006004';
6. Set the fee to ‘N’ for the clubname = ‘maths’ in the table “club”.
UPDATE club SET fee = 'N' WHERE clubname="maths";
12. List the fields “class” and “name” from the table “info”
SELECT class, name FROM info;
13. In each class, there are students from different districts (i.e. the field address). Show the class and
the address with no duplication.
SELECT DISTINCT class, address FROM info
14. Show the name and class of the students whose address
contains the letter “e” and whose name starts with the letter
“M”.
SELECT address, class, name FROM info ORDER BY address DESC , class, name;
SELECT subj, AVG(mark) FROM result GROUP BY subj;
21. Show the average mark of each student (studID) from the table
“result”.
22. Show how many 1A students live in each district (the field
address in the table “info”).
23. For each district, show the number of students who are living in
that district. Name the field address as “district” and Count(*) as “cnt”.
WHERE info.studID=result.studID;
28. Show a list of Form One girls whose English mark is at least 80.
32. Show a list of students who have not joined any club in
the school.
SELECT studID, mark FROM result WHERE subj = 'eng' AND mark >
(SELECT AVG(mark) FROM result WHERE subj ='eng');
36. Show a list of students and the clubs they have joined by
making use of the command LEFT JOIN.
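A minimal sketch of the LEFT JOIN requirement, together with the related "no club" requirement, assuming Python's built-in sqlite3 module and a cut-down version of the tables info and club (the three students and their clubs are hypothetical sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE info (studID TEXT, name TEXT, class TEXT);
CREATE TABLE club (studID TEXT, clubname TEXT);
INSERT INTO info VALUES ('2006001', 'Mary', '1A'),
                        ('2006002', 'Peter', '1A'),
                        ('2006003', 'Susan', '1B');
INSERT INTO club VALUES ('2006001', 'maths'),
                        ('2006001', 'chess'),
                        ('2006003', 'maths');
""")

# LEFT JOIN keeps every student; students without a club get NULL (None).
left_join_rows = conn.execute(
    "SELECT info.name, club.clubname FROM info "
    "LEFT JOIN club ON info.studID = club.studID "
    "ORDER BY info.name, club.clubname"
).fetchall()

# Students who have not joined any club, using a one-level sub-query.
no_club = conn.execute(
    "SELECT name FROM info WHERE studID NOT IN (SELECT studID FROM club)"
).fetchall()

print(left_join_rows)
print(no_club)
```

The student who appears with a NULL club name in the first result is exactly the student returned by the second query.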
Database Class Practice Activities 02
http://www.yll.edu.hk/~yll-cym/ca/download/database_activity_02.mdb
There are 5 tables in the database. Their structures are shown below:
CLUB
Field Data type Width Description
studID character 20 ID of the students
clubname character 20 The name of the club
fee character 1 If the fee is settled, ‘Y’, ‘N’ otherwise
position character 20 The position of the students in that club
Info
Field Data type Width Description
sex character 1 Sex of the students
name character 20 Name of the students
address character 100 District, e.g. “Yuen Long”, “Tin Shui Wai”
class character 3 The class, e.g. 1A, 2A
classno character 2 The class number
studID character 7 ID of the students
Result
Field Data type Width Description
subj character 20 The name of the subject
mark integer / The mark of the student for that subject
studID character 7 The ID of the student
Teacher
Field Data type Width Description
teaID character 3 The teacher ID, it is the primary key of the table.
teaName Var char 20 The name of the teacher
dob Date / Date of birth of the teacher
doa Date / Date of admission to the school
subjTeacher
Field Data type Width Description
class character 3 The name of the class
subj Var char 20 The name of the subject
teaID character 3 The teacher ID
lesson integer / The number of lessons in a cycle for that subj in that class.
*It is assumed that a teacher may teach more than one subject for a particular class, and that a lesson can be
taught by more than one teacher.
5. Write down the SQL statement that is needed to create a table which has the primary key stated in
question 4.
CREATE TABLE subjTeacher (
class char(3) NOT NULL,
subj varchar(20) NOT NULL,
teaID char(3) NOT NULL,
lesson integer,
PRIMARY KEY (class, subj, teaID))
6. Set the foreign key ‘teaID’ in the table ‘subjTeacher’ by referencing the table ‘Teacher’. What is the
advantage of using a foreign key?
ALTER TABLE subjTeacher ADD FOREIGN KEY (teaID)
REFERENCES teacher(teaID)
A foreign key is useful for ensuring data integrity, more specifically referential
integrity: every teaID stored in subjTeacher must already exist in the table teacher.
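Referential integrity can be seen in action with a minimal sketch, assuming Python's built-in sqlite3 module. SQLite declares the foreign key inside CREATE TABLE rather than with ALTER TABLE, and it enforces foreign keys only after PRAGMA foreign_keys = ON; the teacher IDs and names are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE teacher (teaID TEXT PRIMARY KEY, teaName TEXT);
CREATE TABLE subjTeacher (
    class  TEXT NOT NULL,
    subj   TEXT NOT NULL,
    teaID  TEXT NOT NULL REFERENCES teacher(teaID),
    lesson INTEGER,
    PRIMARY KEY (class, subj, teaID)
);
INSERT INTO teacher VALUES ('T01', 'Mr. Chan');
INSERT INTO subjTeacher VALUES ('1A', 'eng', 'T01', 8);
""")

# 'T99' does not exist in teacher, so this insert should be rejected.
try:
    conn.execute("INSERT INTO subjTeacher VALUES ('1A', 'chin', 'T99', 9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM subjTeacher").fetchone()[0]
print(rejected, row_count)
```

Only the row that refers to an existing teacher is stored, which is the advantage asked for in the question.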
10. Show the names of the teachers who are younger than 30.
SELECT teaName, ROUND((date()-dob)/365,1) AS age
FROM teacher
WHERE (date()-dob)/365<30;
11. Show the list of teachers who have not yet been assigned any lessons.
SELECT teaName
FROM teacher
WHERE teaID NOT IN (SELECT DISTINCT teaID FROM subjTeacher)
12. a) Create a view named “view1” which will include the class, classno, name and mark for the
subject ‘Chinese’ for 1A students.
CREATE VIEW view1 AS SELECT DISTINCT name FROM info
12. b) Create a view called “view2” that holds the information from the table info about the
students of class ‘1A’. Note what happens if the data in the view are modified, and explain why
this approach is used.
The data in the source table are modified too. This approach is used because, when a view
is opened to particular users who are permitted to make modifications, those modifications
should apply not only to the view but also to the original source table.
13. Create an index named “index1” on class and classno of the table info.
CREATE INDEX index1 ON info (class, classno)
14. Add a constraint named “cons1” to the table info such that the class and classno would be unique.
ALTER TABLE info ADD CONSTRAINT cons1
UNIQUE (class, classno)
15. Add a constraint named “cons2” to the table result such that the mark has to be less than 100.
ALTER TABLE result ADD CONSTRAINT cons2
CHECK (mark < 100)
16. Add a constraint named “cons3” to the table info such that the first 2 characters of the studid
have to be “20”.
ALTER TABLE info ADD CONSTRAINT cons3
CHECK (LEFT(studid,2) = '20')
17. Create a new table called 1A_result. Then, insert those data from the table result about the 1A
students.
We should use two-step query:
CREATE TABLE 1A_result (subj char(20), mark int, studID char(10))
INSERT INTO 1A_result
SELECT * FROM result WHERE studID IN
(SELECT studID FROM info WHERE class = '1A');
OR in Oracle, we can use this statement:
CREATE TABLE table1 AS
SELECT * FROM result WHERE studID IN (SELECT studID FROM info WHERE class = '1A')
More on Join
A cross join pairs every row of one table with every row of the other. It is written like an inner join without a
join condition (no WHERE clause), so its result is the Cartesian product of the two tables. A cross join query
could look like this:
SELECT * FROM Actor, Movie
ActorID Actor.MovieID Movie.MovieID Name Title Year
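The size of a Cartesian product can be checked with a minimal sketch, assuming Python's built-in sqlite3 module. The column names follow the result heading above; the three actors and two movies are hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actor (ActorID INTEGER, MovieID INTEGER, Name TEXT);
CREATE TABLE Movie (MovieID INTEGER, Title TEXT, Year INTEGER);
INSERT INTO Actor VALUES (1, 10, 'Actor A'), (2, 20, 'Actor B'), (3, 10, 'Actor C');
INSERT INTO Movie VALUES (10, 'Movie X', 2001), (20, 'Movie Y', 2005);
""")

# Cross join: every actor row is paired with every movie row (3 x 2 rows).
cross_rows = conn.execute("SELECT * FROM Actor, Movie").fetchall()

# Adding the join condition turns it into an inner (equi-) join.
inner_rows = conn.execute(
    "SELECT * FROM Actor, Movie WHERE Actor.MovieID = Movie.MovieID"
).fetchall()

print(len(cross_rows), len(inner_rows))
```

With 3 actors and 2 movies the cross join returns 6 rows, while the join condition keeps only the 3 rows whose MovieID values match.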
SQL Exercise 01:
1 The staff information of a company is stored in a table with the following structure:
STAFF
Field Name Type Width Dec Description
ID Character 5 ID of a staff, it is the primary key and will
never be null
name Character 20 name of a staff
salary numeric 9 2 salary of the staff
dob date / Date of birth
State the SQL needed to create this table.
CREATE TABLE staff
(
id char(5) UNIQUE NOT NULL ,
name char(20) ,
salary numeric(9,2) ,
dob date ,
PRIMARY KEY (id)
)
2 The staff information of a company is stored in a table with the following structure:
Branch
Field Name Type Width Description
staffID Character 5 ID of a staff, it will not be null
3 The staff information of a company is stored in a table with the following structure:
Branch
Field Name Type Width Description
staffID Character 5 ID of a staff, it will not be null
ALTER TABLE branch
ADD PRIMARY KEY (branchID, staffID)
4 The staff information of a company is stored in a table with the following structure:
Branch
Field Name Type Width Description
staffID Character 5 ID of a staff, it will not be null
6 In the table “result”, add 5 marks to each record in the field “eng”.
UPDATE result SET eng = eng + 5
7 Delete the records of the table “student” with the field “class” = ‘2B’
DELETE FROM student WHERE class = '2B'
SQL Exercise 02:
Consider the following database file student.dbf to store the information of students:
STUDENT
field type width contents
id numeric 4 student id number
name character 10 name
dob date 8 date of birth
sex character 1 sex: M / F
class character 2 class
hcode character 1 house code: R, Y, B, G
dcode character 3 district code
remission logical 1 fee remission
mtest numeric 2 Math test score
1. a) List all the 2A students
2. a) List the classes, names of students whose names contain the letter "e" as the third letter.
SELECT class, name FROM student WHERE name LIKE "__e%"
b) List the classes, names of students whose names start with "T" and do not contain "y".
SELECT class, name FROM student WHERE name LIKE "T%" AND name NOT LIKE "%y%"
c) List the names of 1A students whose Math test score is not 51, 61, 71, 81, or 91.
SELECT class, name, mtest FROM student WHERE class="1A" AND mtest NOT IN (51, 61,
71, 81, 91)
d) List the students who were born between 22 March 86 and 21 April 86
SELECT class, name, dob FROM STUDENT WHERE dob BETWEEN {03/22/86} AND
{04/21/86}
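The LIKE, NOT IN and BETWEEN operators used in question 2 can be tried with a minimal sketch, assuming Python's built-in sqlite3 module. The four students are hypothetical, string literals use single quotes, and dates are stored as ISO text ('YYYY-MM-DD') instead of the {mm/dd/yy} literals shown above, since SQLite has no such date literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER, name TEXT, class TEXT, dob TEXT, mtest INTEGER);
INSERT INTO student VALUES
  (1, 'Steve', '1A', '1986-03-25', 51),
  (2, 'Tracy', '1A', '1986-04-01', 70),
  (3, 'Tom',   '1B', '1986-05-02', 61),
  (4, 'Peter', '1A', '1986-04-21', 45);
""")

def run(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# a) "_" matches exactly one character, so '__e%' means "e" is the third letter.
third_e = run("SELECT class, name FROM student WHERE name LIKE '__e%'")

# b) starts with "T" and contains no "y".
t_no_y = run("SELECT class, name FROM student "
             "WHERE name LIKE 'T%' AND name NOT LIKE '%y%'")

# c) 1A students whose score is not in the given list.
not_in = run("SELECT name FROM student WHERE class = '1A' "
             "AND mtest NOT IN (51, 61, 71, 81, 91) ORDER BY name")

# d) BETWEEN includes both end points.
born = run("SELECT name FROM student "
           "WHERE dob BETWEEN '1986-03-22' AND '1986-04-21' ORDER BY name")

print(third_e, t_no_y, not_in, born)
```

Note that a space inside the pattern (for example '_ _e%') would be matched literally, so the two underscores must be written without a gap.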
b) List the number of passes in the Math test of each class. (passing mark = 50)
SELECT class, COUNT(*) FROM student WHERE mtest >= 50 GROUP BY class
4. a) List the students with fee remission, in the order of their classes and names.
SELECT class, name FROM student WHERE remission ORDER BY class, name
c) The controlled average (CAVG) of the Math test of a group of students is the average score from
which the highest and the lowest scores are excluded (ie. only n–2 out of n data are used).
List the CAVG of the Form 1 boys of each house.
SELECT hcode, (SUM(mtest)-MAX(mtest)-MIN(mtest)) / (COUNT(*)-2) FROM student
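A complete version of the controlled average can be sketched as follows, assuming Python's built-in sqlite3 module and hypothetical sample scores. The WHERE and GROUP BY clauses, the class LIKE '1%' test for Form 1, and the HAVING guard against groups of two or fewer students (which would divide by zero) are assumptions added for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (name TEXT, sex TEXT, class TEXT, hcode TEXT, mtest INTEGER);
INSERT INTO student VALUES
  ('Alan',  'M', '1A', 'R', 40), ('Bill', 'M', '1B', 'R', 60),
  ('Carl',  'M', '1A', 'R', 80), ('Dave', 'M', '1C', 'R', 100),
  ('Eric',  'M', '1A', 'B', 50), ('Fred', 'M', '1B', 'B', 65),
  ('Gary',  'M', '1C', 'B', 90),
  ('Helen', 'F', '1A', 'R', 99),  -- girl: excluded by sex = 'M'
  ('Ivan',  'M', '2A', 'B', 10);  -- Form 2: excluded by class LIKE '1%'
""")

# Drop the highest and lowest score of each house, then average the rest.
# Multiplying by 1.0 avoids integer division in SQLite.
cavg_rows = conn.execute("""
    SELECT hcode,
           1.0 * (SUM(mtest) - MAX(mtest) - MIN(mtest)) / (COUNT(*) - 2) AS cavg
    FROM student
    WHERE class LIKE '1%' AND sex = 'M'
    GROUP BY hcode
    HAVING COUNT(*) > 2
    ORDER BY hcode
""").fetchall()
print(cavg_rows)
```

For house R the scores 40, 60, 80 and 100 give (280 - 100 - 40) / 2 = 70; for house B the scores 50, 65 and 90 give (205 - 90 - 50) / 1 = 65.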
5. a) Create a view with the name view1 that contains the names and dob of the students, ordered in
ascending order of dob.
CREATE VIEW view1 AS SELECT name, dob FROM student ORDER BY dob
b) List the name, class and Math test score of the students whose score is at least 10 marks greater
than the average score of his / her class.
CREATE VIEW view1 AS SELECT class, AVG(mtest)+10 AS mark FROM student GROUP
BY class
SELECT s.name, s.class, s.mtest FROM student s, view1 v WHERE s.class = v.class AND
s.mtest >= v.mark
The files phy.dbf, chem.dbf, bio.dbf are respectively the data files of the Physics Club, Chemistry Club and
Biology Club.
PHY / CHEM / BIO
field type width contents
id numeric 4 student id number
name character 10 name
sex character 1 sex: M / F
class character 2 class
6. a) List the students who are common members of the Physics Club and the Chemistry Club.
b) List the students who are common members of the Chemistry Club and Biology Club but not of
the Physics Club.
SELECT * FROM chem WHERE id IN (SELECT id FROM bio) AND id NOT IN (SELECT id
FROM phy)
Consider the following swim.dbf which contains the information of Form 1 students participating in the
Swimming Gala. [and also student.dbf]
SWIM
field type width contents
id numeric 4 student id number
event character 20 event
7. a) Print a list of 1A students taking part in the Swimming Gala, ordered by their names. The list
should also contain the events.
SELECT s.name, w.event FROM student s, swim w WHERE s.id=w.id AND s.class="1A"
ORDER BY 1 TO PRINTER
b) List the Blue House members taking part in Free Style events.
SELECT DISTINCT s.class, s.name FROM student s, swim w WHERE s.id=w.id AND
d) Print a complete list of the Swimming Gala. The list should also show the students not taking part
in any event with "******". The list should be ordered by class and student name.
SELECT s.class, s.name, w.event FROM student s, swim w WHERE s.id=w.id UNION
SELECT class, name, "******" FROM student WHERE id NOT IN (SELECT id FROM swim)
e) List the students taking part in two or more events. [Self-join]
SELECT DISTINCT s.class, s.name FROM student s, swim w1, swim w2 WHERE s.id=w1.id
f) List the boys of each House taking part in the Swimming Gala but not taking part in 50m Back
Stroke, ordered by House and student name.
SELECT hcode, name FROM student WHERE sex="M" AND id IN (SELECT id FROM swim)
AND id NOT IN (SELECT id FROM swim WHERE event = "50m Back Stroke") ORDER BY 1, 2
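The self-join idea in part e) can be sketched as follows, assuming Python's built-in sqlite3 module. The table swim is joined with itself under two aliases, and a student qualifies when two of his or her rows name different events; the students and events are hypothetical sample data, and the condition w1.event <> w2.event is one possible way to complete the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER, name TEXT, class TEXT);
CREATE TABLE swim (id INTEGER, event TEXT);
INSERT INTO student VALUES (1, 'Amy', '1A'), (2, 'Bob', '1A'), (3, 'Carl', '1B');
INSERT INTO swim VALUES (1, '50m Free Style'), (1, '50m Back Stroke'),
                        (2, '50m Free Style'),
                        (3, '100m Free Style'), (3, '50m Breast Stroke');
""")

# w1 and w2 are two copies of swim; DISTINCT removes the repeated pairs.
multi_event = conn.execute("""
    SELECT DISTINCT s.class, s.name
    FROM student s, swim w1, swim w2
    WHERE s.id = w1.id AND s.id = w2.id AND w1.event <> w2.event
    ORDER BY s.class, s.name
""").fetchall()
print(multi_event)
```

A GROUP BY id with HAVING COUNT(*) >= 2 would give the same list; the self-join is shown because the question asks for it.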
SQL Exercise 03
1 There are two database tables, “book” and “borrow_record” for the library. Their structures are
shown below:
Book
Field Name Type Width Dec Description
bookID Character 10 ID of the book
title varchar 50 The title of the book
abstract memo / The abstract of the book
borrow_record
Field Name Type Width Dec Description
bookID Character 10 ID of the book
dob Date / Date of borrow
userID character 10 ID of the borrower
Answer the following questions:
a) Why does the field “title” in the table book use the data type varchar? Should we change the field
bookID to the data type varchar as well?
The data type varchar stores field contents of variable length, so it can save storage
space. In this example, the length of “title” varies, with a ceiling of 50 characters.
varchar therefore appears more flexible than the data type char, whose length is fixed.
However, char is more efficient when the field is short (because varchar carries an overhead
of 2 bytes) or when all values share a common length (e.g. sex, phone number, etc.),
so we should not change the data type of bookID.
The data type memo is not only variable in length but also unlimited in length (whereas
varchar has a limit, say, 255 characters in Access). So, memo is a suitable data type
for abstract.
c) Now, you want to set the bookID in the table “borrow_record” as a foreign key with the reference
of the table book.
(i) State the SQL statement needed.
(ii) Under what conditions can we not set the foreign key to another table?
(i) ALTER TABLE borrow_record ADD FOREIGN KEY (bookID) REFERENCES book(bookID)
(ii) The foreign key has to be mapped to the primary key of the referenced table. So, if the
referenced table (in this case, “book”) has no primary key set, the SQL statement cannot be
executed successfully.
d) Now, you want to set studentID as a foreign key to another table. Can we create two foreign keys
(bookID and studentID) to two different database tables?
Yes, we can form two different foreign keys in a table.
)
b) Under what condition can we not set the primary key?
If the table already contains data and some records share the same combination of
courseID and studentID, the combination is not unique as a primary key requires, so we
cannot set it as the primary key with that SQL statement.
SQL Exercise 04
i. Create a new field called “amount” which is a numeric data with width=10 and 2 decimal places.
Then Update the field amount by multiplying the quantity (QTY) and the sale prices (SAL_PRICE).
ALTER TABLE info ADD amount numeric(10,2)
iii. Show a list of ITEM_NO whose SAL_PRICE is neither the highest nor the lowest.
SELECT item_no FROM info WHERE sal_price NOT IN (SELECT max(sal_price),
iv. What would be the output if the following SQL statement is executed:
SELECT desc FROM info WHERE desc NOT LIKE "%e" AND desc LIKE "%e%"
Trumpet
v. Create a list that shows the categories of the musical instruments of which the total number of
quantity of that category is more than 10.
SELECT category, SUM(qty) AS cnt FROM info GROUP BY category HAVING SUM(qty) > 10
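The difference between filtering rows and filtering groups can be seen in a minimal sketch of the HAVING query, assuming Python's built-in sqlite3 module. The table is cut down to three columns and the categories and quantities are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE info (item_no INTEGER, category TEXT, qty INTEGER);
INSERT INTO info VALUES (1, 'string', 6), (2, 'string', 7),
                        (3, 'brass', 4),  (4, 'brass', 5),
                        (5, 'percussion', 11);
""")

# HAVING is applied after grouping, so it can test the aggregate SUM(qty).
big_categories = conn.execute("""
    SELECT category, SUM(qty) AS cnt
    FROM info
    GROUP BY category
    HAVING SUM(qty) > 10
    ORDER BY category
""").fetchall()
print(big_categories)
```

The category brass is dropped because its total is 9, even though no single row was filtered out beforehand.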
SQL Exercise 05
A library stores the information about the books in the following table:
BOOK.DBF
Field Name Type Width Decimal
Book_id Character 4
Title Character 40
Type Character 40
Date_pur Date 8
Author Character 40
ISBN Character 20
1. Create a list that shows the book title, type and author of which the book ID range from 2100 to 2160.
____________________________________________________________________
____________________________________________________________________
2. Display the book titles which contain the word ‘Plants’ or ‘Tree’. The book titles may be in upper or
lower case.
____________________________________________________________________
SELECT title FROM book WHERE UPPER(title) LIKE "%PLANTS%" OR UPPER(title) LIKE "%TREE%"
____________________________________________________________________
____________________________________________________________________
3. Display a list of book type, date of purchase and title by ordering the records by their Type. Within each
type, arrange the records in ascending order of Date_Pur.
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
5. Show the title of the book of which the ISBN equals “0333469267”.
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
7. Find out the title of the book that has the shortest name.
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
8. Modify the table to include a character field with the width=20, NewBook_ID, which stores the new book
ID of the book and its record should be unique. Update this new book ID according to the following table.
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
____________________________________________________________________
9. Delete the records of those books which have been purchased for more than 50 years. (Note: You need to
compare the year of purchase with that of the system date and you may assume a year to have 365.25
days)
____________________________________________________________________
____________________________________________________________________
Past Paper Investigation
2004 – AS – CA #1
1. A table is created with the following SQL command to store the subject scores of Chemistry (CHEM), Biology
(BIO) and Computer Studies (CS) of a class of students. REG_NO and EN_NAME represent the registration
number and name of a student respectively.
CREATE TABLE 4D (
REG_NO CHAR(6),
EN_NAME CHAR(30),
CHEM INTEGER,
BIO INTEGER,
CS INTEGER )
(a) Modify the above SQL command so that two records with the same registration number cannot be input
into 4D.
(1 mark)
Reg_no CHAR(6) UNIQUE OR
Reg_no CHAR(6) PRIMARY KEY
2004 – AS – CA #7
7. Below are two database files, DB1 and DB2, where the first row indicates the field names.
DB1
subject   staff_code
Chinese   01
English   03
Maths     05
DB2
Staff_code   Staff_name
01           May Au
02           Billy Ho
(a) Apply equi-join on DB1 and DB2, and write the result in the space provided.
subject Staff_code (DB1) Staff_code (DB2) Staff_name
(2 marks)
Answer:
subject Staff_code (DB1) Staff_code (DB2) Staff_name
Chinese 01 01 May Au
(b) Apply full outer join on DB1 and DB2, and write the result in the space provided.
Subject Staff_code (DB1) Staff_code (DB2) Staff_name
(2 marks)
Answer:
subject Staff_code (DB1) Staff_code (DB2) Staff_name
Chinese 01 01 May Au
.NULL. .NULL. 02 Billy Ho
English 03 .NULL. .NULL.
Maths 05 .NULL. .NULL.
The records may be listed in any order (the order of records is not important), but the
content of every record must be exactly correct.
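Both answers can be checked with a minimal sketch, assuming Python's built-in sqlite3 module and the DB1 and DB2 rows given in the question. Because older SQLite versions have no FULL OUTER JOIN, the full outer join is emulated here with a LEFT JOIN plus the unmatched DB2 rows; this emulation is a choice made for the sketch, not the method expected in the examination.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DB1 (subject TEXT, staff_code TEXT);
CREATE TABLE DB2 (staff_code TEXT, staff_name TEXT);
INSERT INTO DB1 VALUES ('Chinese', '01'), ('English', '03'), ('Maths', '05');
INSERT INTO DB2 VALUES ('01', 'May Au'), ('02', 'Billy Ho');
""")

# Equi-join: only rows whose staff_code values match in both tables.
equi_rows = conn.execute("""
    SELECT DB1.subject, DB1.staff_code, DB2.staff_code, DB2.staff_name
    FROM DB1, DB2
    WHERE DB1.staff_code = DB2.staff_code
""").fetchall()

# Full outer join: all DB1 rows (LEFT JOIN) plus the DB2 rows with no match.
full_rows = conn.execute("""
    SELECT DB1.subject, DB1.staff_code, DB2.staff_code, DB2.staff_name
    FROM DB1 LEFT JOIN DB2 ON DB1.staff_code = DB2.staff_code
    UNION ALL
    SELECT NULL, NULL, DB2.staff_code, DB2.staff_name
    FROM DB2
    WHERE DB2.staff_code NOT IN (SELECT staff_code FROM DB1)
""").fetchall()

print(equi_rows)
print(full_rows)  # NULL values appear as None in Python
```

The equi-join returns one record and the full outer join returns four, matching the answers above.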
Revision Exercise 01
1. How many tables will be needed to present the following relationships in 3rd normal form?
(i)
A B
1 1
so it also suffers insertion anomaly. At the same time, when you want to delete
the last employee in that department, you not only delete the employee but the
whole department, so, it suffers deletion anomaly also.
3. For the following database schema, what kind of anomaly does it suffer from?
Subject (SubjID, SubjName, TeacherInChargeName)
Teacher(TeacherID, TeacherName, TeacherSex, TeacherDoB, TeacherAdmissionDate)
If a subject chairperson “Mr. Wong” quits the job and “Ms. Cheung” takes the post,
then we have to change this piece of information in more than one place. Therefore,
the schema suffers from modification anomaly.
[E-R diagram: Student (1) Make (M) Application (N) Refer (1) Course]
Revision Exercise 02
David is a database administrator in a well-known bookstore chain. He designed a database table called
“book_info” with the following structure:
Field Type Width Decimal Description
bookcode character 8 / The code for the book
title character 50 / The title of the book
publisher character 50 / The publisher of the book
author character 100 / The author of the book
pub_date Date 8 / Date of publication
price Numeric 6 2 The price of the book
discount Numeric 3 2 The discount of the book
a) Write the SQL statement that creates a database table with the structure shown above.
CREATE TABLE book_info (bookcode char(8), title char(50), publisher char(50),
author char(100), pub_date date, price numeric(6,2), discount numeric(3,2))
b) Modify the database structure such that the discount will be set to 5% as default value.
ALTER TABLE book_info ALTER discount SET DEFAULT 0.05
c) Modify the database structure such that the title of the book can never be null.
ALTER TABLE book_info ALTER title char(50) NOT NULL
d) Modify the database structure such that the date of publication can never be as early as 1980/1/1. An
invalid input should produce an error message.
ALTER TABLE book_info ADD CONSTRAINT cons1 CHECK
e) Write down the SQL statement such that it will give the total number of books available in the bookstore
according to their publishers.
SELECT publisher, COUNT(*) AS cnt FROM book_info GROUP BY publisher
f) David suggested that the database structure should be modified such that it can classify the books into
different categories. He suggested adding a field called category in the database table. Give your
opinions on his suggestions and can you give any suggestion?
If a new field is added to the table book_info, it will cause a problem when a book
can be classified into more than one category, i.e. it does not allow a book to have
two or more categories. To solve the problem, we should create new tables
according to the categories, e.g. tables called science, geography, etc., in which the
bookcode is stored.
Revision Exercise 03
David is a database administrator in a country club. He is responsible for designing a database that
enables the club members to book services. He created a database file called “activities”, which
contains 3 tables called “member”, “facility” and “booking”.
a) What are the differences between a database file and a database table?
The database file is used to contain the database tables. A database table is used to
store the data, whereas the database file does not store data directly but holds the tables and their relations.
b) Why does David need to create the database file? What kinds of features can he not use if he
creates only the database tables without the database file?
Foreign key
CREATE TABLE facility (Fcode CHAR(10) PRIMARY KEY, name CHAR(25), place CHAR(25), price INTEGER)
CREATE TABLE booking (Bdate DATE, Btime INTEGER, Fcode CHAR(10), MID CHAR(9))
c) Roughly estimate the file size (in K bytes) of the table “member” if there are 10000 members in the
country club. Will the file size be less than, exactly equal to or greater than your estimation?
file size = 10000*(9+25+8+10)/1024 = 508 K byte
The file size should be a little greater than the estimated result, because there
will be an extra index file for the primary key “ID”.
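The arithmetic of the estimate can be reproduced with a short sketch in Python. The four field widths are the ones used in the calculation above; everything else (variable names, rounding to the nearest K byte) is an assumption made for illustration.

```python
# Field widths (in bytes) taken from the estimate above: 9 + 25 + 8 + 10.
field_widths = [9, 25, 8, 10]
members = 10000

record_bytes = sum(field_widths)          # bytes needed for one member record
size_kb = members * record_bytes / 1024   # 1 K byte = 1024 bytes

print(record_bytes)    # 52
print(round(size_kb))  # 508
```

The estimate counts only the data fields, which is why the real file (with its index and other overheads) is expected to be slightly larger.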
d) John (member id = 1011103) booked the tennis court (facility code = ten101) on April-20-2006 at the
time from 3:00 to 4:00 p.m. Write a SQL statement to insert this record to the table “booking”.
INSERT INTO booking (bdate, btime, fcode, mid) VALUES ({04/20/2006}, 8, "ten101",
"1011103")
e) Since the field “Btime” in the table “booking” should only range from 1 to 14, how to use SQL statement
to modify the structure of the table “booking” such that it will avoid the invalid inputs.
ALTER TABLE booking ALTER btime SET CHECK (btime >= 1 AND btime <=14)
f) Write a SQL statement such that it will produce a name list of members who have booked the facility
more than 5 times.
SELECT name FROM member WHERE id IN (SELECT mid FROM booking GROUP BY mid
HAVING COUNT(*) > 5)
g) There are 3 different grades for memberships, the first one is “general”, the second one is “prestige” and
the third one is “VIP”. Now, the company wants to insert a field called “discount” in the table “member”
and so, give 50% off for VIP, 20% off for prestige and 10% off for general members.
(i) Write a SQL statement such that it will insert a new field to the table “member”.
ALTER TABLE member ADD discount numeric(3,2)
(ii) Write a SQL statement such that it will set the discount to 50% if he / she is a VIP.
(iii) Write a SQL statement such that it will give the total number of booking being made by the
members according to their grades.
SELECT member.grade, COUNT(*) AS cnt FROM member, booking WHERE member.id
h) Write a SQL statement such that it will produce a name list of members who have spent more than $1000
in the year 2005.
SELECT member.id FROM member, facility, booking WHERE booking.mid =
Revision Exercise 04
C. diagnostic results
D. Causeway Bay
Conventional Questions:
1. A private tennis club has ten tennis courts for members to use. Bookings from members are
accepted within one week before the tennis court is used. In each booking, a member can reserve
at most 3 tennis courts and the duration is a multiple of half an hour. Two entities are identified:
MEMBER
A member of the tennis club. The identifier is MemberID. Other attributes are the name of the member and
the contact phone number.
COURT
A tennis court to be used by members. The identifier is CourtID. Other attributes include
Location and Fee.
a) Sketch an initial E-R diagram to show the relationship and cardinality between MEMBER and COURT.
[E-R diagram: MEMBER with attributes MemberID and Name; COURT with attributes Location and Fee]
b) Redraw the E-R diagram to include an entity BOOKING with attributes including Date and StartTime
of use and Duration of booking.
a) Is the following database table in 1st normal form? If not, how should the structure be modified to
achieve 1st normal form?
STAFF(StaffID, StaffName, Skill)
No, it is not 1st normal form. It should be changed into
b) If we treat Staff and Skill as two entities, then, construct the ER diagram for it.
c) Given that every skill comes from a set of skills pre-defined by the company, how should the
structure of the database schema be changed to reduce redundancy?
Staff (StaffID, StaffName)
Skill_Info(Skill_ID, Skill)
Foreign keys are set in the table StaffSkill, referencing the tables Staff and Skill_Info, to
Revision Exercise 05 (SQL)
1. A machinery company stores the parts information in a table with the following structure:
PARTS
Field Name Type Width Description
Part_no Integer Unique code for a part
Descript Character 20 Description of the part
Qty Integer Quantity of the part
supplier Character 20 Supplier of the part
Write SQL statements to fulfill the following requests. Whenever the columns are not specified,
you may use SELECT * …
a) Produce a list of parts in ascending order of quantity.
c) Produce a list of parts that have a quantity more than 20 and are supplied by ‘China Metals Co.’
SELECT * FROM parts WHERE qty > 20 AND supplier = 'China Metals Co.'
e) Increase the quantity by 10 for those parts with quantity less than 10.
UPDATE parts SET qty = qty + 10 WHERE qty < 10
f) Delete records with part_no equal to 879, 654, 231 and 234
DELETE FROM parts WHERE part_no IN (879, 654, 231, 234)
h) Make a copy of the table with only fields part_no and qty. Name the new table as PARTS2.
CREATE TABLE parts2 AS SELECT part_no, qty FROM parts
2. A supermarket stores the payroll information in a table with the following structure:
PAYROLL
Field Name Type Width Dec Description
Name Character 20 Name of employee
Post Character 20 Post of the employee
Rate Numeric 5 2 Hourly salary rate
Hour Numeric 6 2 Number of hours worked
The salary of each employee is calculated by multiplying the hourly salary rate with the number of
hours worked, i.e. Salary = Rate*Hour.
Write SQL statements to fulfill the following requests:
a) Print a list of employees showing all the information as well as the salary.
SELECT post, SUM(hour) FROM payroll GROUP BY post HAVING AVG(hour) > 8
X: 1, 2, 3, 5, 8
b) SELECT x FROM setp UNION SELECT y FROM setq
X: 1, 2, 3, 4, 6, 9
1 7
2 4
2 3
4. The staff information of a company is stored in a table with the following structure:
STAFF
Field Name Type Width Dec Description
name Character 20 name of a staff
department Character 20 department of the staff
salary numeric 9 2 salary of the staff
Identify and correct the errors in the following queries:
a) SELECT name
WHERE salary = (SELECT MAX(salary) FROM staff)
Missing the table name in the main query, so, the correction is:
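The first correction can be checked with a minimal sketch, assuming Python's built-in sqlite3 module and three hypothetical staff records. The sketch runs the original statement, which fails because the main query names no table, and then the statement with FROM staff added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staff (name TEXT, department TEXT, salary NUMERIC);
INSERT INTO staff VALUES ('Ann', 'sales', 12000),
                         ('Bob', 'account', 15000),
                         ('Cat', 'sales', 9000);
""")

# Original statement: the main query has no FROM clause.
try:
    conn.execute("SELECT name WHERE salary = (SELECT MAX(salary) FROM staff)")
    original_failed = False
except sqlite3.Error:
    original_failed = True

# Corrected statement: the table name is supplied in the main query.
top_paid = conn.execute(
    "SELECT name FROM staff WHERE salary = (SELECT MAX(salary) FROM staff)"
).fetchall()

print(original_failed, top_paid)
```

The sub-query already named its table; only the outer query needed the FROM clause.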
Missing the parentheses “(” and “)” for the query, so, the correction is:
Subquery can be used in the WHERE clause only, so, the correction is:
Subquery cannot be used in the HAVING clause, so, the correction is:
Aggregate function (like MAX, AVG, MIN, SUM, COUNT) cannot be used directly in the
More than one record is returned from the sub-query, so, the correction is:
5. The results of an English Contest in a class are stored in a table with the following structure:
RESULT
Field Name Type Width Description
Name Character 20 Name of a competitor
mark numeric 3 Mark of the competitor
Write SQL statements to find the following:
6. An ISP keeps the information of the clients in a table with the following structure:
CLIENT
Field Name Type Width Description
user_id Character 10 A unique code that identifies a user
password Character 10 Password for the user
name Character 40 Name of the user
profession Character 30 Profession of the user
Identify and correct the errors in each of the following SQL statements:
a) Task: A view PASSWORD is needed to show the user_id and password only for all the clients.
Revision Exercise 06 (SQL)
1. STAFF
Field Name Type Width Dec Description
Department Character 20 Name of a department: e.g. sales,
purchase, account
Name Character 20 Name of the employee
Date_birth Date Date of birth of the employee
Salary Numeric 8 2 Salary of the employee
sex Character 1 Sex of employee: ‘M’ for male, ‘F’ for
female
Write SQL statements to fulfill the following requests:
a) Produce a list to show the names of departments without duplicate lines.
2. An insurance company stores the client information in a table with the following structure:
CLIENT
Field Name Type Width Dec Description
Name Character 20 Name of client
Sex Character 1 Sex of the client (‘M’ or ‘F’)
Date_birth Date 8 Date of birth of the client
Occupation Character 20 Occupation of the client
premium Numeric 8 2 Premium of the client
Write SQL statements to fulfill the following requests:
a) Produce a list of clients who were born in February, March, June or September.
b) Produce a list showing all the occupations of the clients without duplication
c) For each year between 1970 and 1990, find the number of clients who were born in the same
year.
e) Classify clients by year of birth. Find the average premium for those groups with average premium
more than HK$500.
SELECT YEAR(date_birth), AVG(premium) FROM client
3. A school stores the activity records of the students in two related tables which have the following
structures:
ENROLLMENT
Field Name Type Width Description
Name Character 20 Name of a student
Club_id Character 4 Unique code of a club enrolled by the student
CLUB
Field Name Type Width Description
Club_id Character 4 Unique code of a club
name Character 20 Name of the club
A student may enroll in more than one club. Write SQL statements for the following tasks:
a) Create a list containing the student name and the corresponding club name(s) for each student.
SELECT e.name FROM enrollment AS e, club AS c
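An equi-join of the two tables on club_id can be sketched as below. This is a hedged illustration run through Python's sqlite3 module; the club codes, club names and student names are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE club (club_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE enrollment (name TEXT, club_id TEXT);
    INSERT INTO club VALUES ('C001', 'Chess'), ('C002', 'Drama');
    INSERT INTO enrollment VALUES ('Amy', 'C001'), ('Amy', 'C002'), ('Bob', 'C002');
    """
)

# The WHERE clause keeps only the row pairs whose club_id values match,
# so each student name is paired with the name of each club enrolled.
pairs = conn.execute(
    """
    SELECT e.name, c.name
    FROM enrollment AS e, club AS c
    WHERE e.club_id = c.club_id
    ORDER BY e.name, c.name
    """
).fetchall()
print(pairs)
```

Both tables have a field called name, so the table aliases e and c are needed to say which one is meant.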
4. The club and activity information of students in a school is stored in the following tables:
ACTIVITY
Name Club_id
Janet Chan 02
Isabella Wong 03
Quentin Cheung 02
Robin Kong 02
Robin Kong 03
Sidney Ah 03
CLUB
Club_id Club_Name
02 Swimming
03 Violin
a) State the result of the following SQL statement:
SELECT name FROM activity
WHERE club_id =
(SELECT club_id FROM club WHERE club_name = 'Violin')
Name
Isabella Wong
Robin Kong
Sidney Ah
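The result above can be checked by running the same sub-query against the ACTIVITY and CLUB data given in the question. The sketch below uses Python's sqlite3 module; the column types are an assumption, since the question does not state them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE club (club_id TEXT, club_name TEXT);
    CREATE TABLE activity (name TEXT, club_id TEXT);
    INSERT INTO club VALUES ('02', 'Swimming'), ('03', 'Violin');
    INSERT INTO activity VALUES
        ('Janet Chan', '02'), ('Isabella Wong', '03'), ('Quentin Cheung', '02'),
        ('Robin Kong', '02'), ('Robin Kong', '03'), ('Sidney Ah', '03');
    """
)

# The inner SELECT returns the single club_id of the Violin club;
# the outer SELECT then lists everyone enrolled in that club.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT name FROM activity
        WHERE club_id = (SELECT club_id FROM club WHERE club_name = 'Violin')
        """
    )
]
print(names)
```

The = comparison works here because the sub-query returns exactly one value.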
Write SQL statements for the following task:
a) Find the passing percentage of each subject. Display the results accurate to 1 decimal place.
FROM exam
Revision Exercise 06 (SQL)
1. The Education Bureau keeps the information about the schools in the tables as follows:
SCHOOL
Field Name Type Width Description
Sch_id Character 4 A unique code that identifies a school
School Character 40 Name of the school
Principal Character 40 Name of the principal of the school
telephone Character 10 Telephone number of the school
SUBJECT
Field Name Type Width Description
Subj_id Character 3 A unique code that identifies a subject
subject Character 40 The name of the subject
OFFER
Field Name Type Width Description
Sch_id Character 4 Code of the school that offers a subject
Subj_id Character 3 Code of the subject
NumOfStud Numeric 3 Number of students taking the subject
a) Explain what a foreign key field is. State an example using the tables above.
A foreign key is a field that stores the values of the key field of another table. Therefore, a foreign
key value can identify a unique record in that other table. Examples of foreign keys are sch_id and
c) Write the SQL statement to create the table SUBJECT and OFFER.
d) Write SQL statements for the following tasks:
(i) Produce a list showing those schools which offer the subject ‘Computer Studies’.
WHERE s.sch_id = o.sch_id AND j.subj_id = o.subj_id AND s.school = 'ABC school'
(iv) Find the total number of students taking the subject ‘Computer studies’ in HONG KONG.
2. The results of an inter-class English Contest are stored in a table with the following structure:
RESULT
Field Name Type Width Description
Name Character 20 Name of a competitor
Class Character 2 Class of the competitor
Mark Character 3 Mark of the competitor
Write SQL statements to find the following:
a) highest, average and lowest marks in each class.
c) students with mark above the class average mark in each class.
3. A fashion company keeps the stock information in the tables STOCK and DESIGNER. The
structures of the tables are shown below:
STOCK
Field Name Type Width Description
product_id Character 4 Unique code that identifies a product
designer_id Character 4 Code of the designer for the product
type Character 20 Type of the product
size Character 1 Size of the product, may be ‘L’, ‘M’ or ‘S’
qty Numeric 4 Quantity of the product
DESIGNER
Field Name Type Width Description
designer_id Character 4 Unique code that identifies a designer
name Character 20 Name of the designer
telephone Character 10 Telephone number of the designer
An example of product_id is ‘0034’. Different sizes of the same product will use different
product_id.
a) State the primary keys for the above tables.
4. More efficient
c) Explain why the information about the designers is stored separately.
A designer may have more than one product. Storing the designers separately can avoid
data redundancy. Otherwise, updating the stock records may lead to anomalies, i.e. errors
or inconsistencies.
GROUP BY product_id
(iii) Produce a list showing the product_id of the designer ‘Timothy’, without duplicating rows.
(iv) Find which design of size ‘M’ has the largest quantity.
[Hint: A design is identified by product_id]
Revision Exercise 07 (SQL)
1. In the Hong Kong District Council Election, the information about the districts and candidates are
stored in the following tables:
DISTRICT
Field Name Type Width Description
Dist_id Character 4 A unique code that identifies a district
Distric Character 20 Name of the district
VoterNum Numeric 7 Number of voters of the district
CAND
Field Name Type Width Description
Candidate Character 4 Name of the candidate
Dist_id Character 4 The district code of the candidate
NumOfVote Numeric 7 Number of votes obtained by the candidate
a) For each of the following tasks, determine whether the SQL statement can fulfill the task. If not,
rewrite the SQL statement.
(i) Task: Produce a list showing all the districts
SQL: SELECT DISTINCT dist_id FROM cand
Find the average number of voters among all the districts in Hong Kong
Find the average number of votes among all candidates in each district
(iii) SELECT dist_id FROM cand GROUP BY dist_id HAVING COUNT(*) > 2
Find the codes of the districts which have more than 2 candidates
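Statement (iii) can be tried out on a small CAND table, as sketched below with Python's sqlite3 module. The district codes, candidate names and vote counts are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cand (candidate TEXT, dist_id TEXT, numofvote INTEGER)")
# Invented data: district A01 has three candidates, district B02 has two.
conn.executemany(
    "INSERT INTO cand VALUES (?, ?, ?)",
    [
        ("Au", "A01", 1200),
        ("Bo", "A01", 900),
        ("Chu", "A01", 300),
        ("Dai", "B02", 700),
        ("Eng", "B02", 650),
    ],
)

# COUNT(*) is evaluated once per dist_id group; HAVING keeps only
# the groups with more than two rows (candidates).
districts = conn.execute(
    "SELECT dist_id FROM cand GROUP BY dist_id HAVING COUNT(*) > 2"
).fetchall()
print(districts)
```

District B02 has exactly two candidates, so it is filtered out by the > 2 condition.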
d) Assume that each voter can vote for one candidate only. Write SQL statements to find the
following figures:
(i) the number of districts in Hong Kong
Past Paper on Database
2001 – AS – CA #2
2. John wants to design a database DB to store information about his friends. Therefore, he designs a
database file FRIENDS with the following structure:
(a) (i) Although HKID is unique to each person, John cannot use it as the primary key. Explain why
not.
It is because the “if any” in the description indicates that HKID is not a
mandatory data item
(ii) John wants to define a primary key involving FIRST_NAME and LAST_NAME. Describe the
procedure that John should follow.
Create a new field from First_Name + Last_Name and define that field
as the key. Defining a composite key “Last_Name + First_Name” is also
acceptable
(iii) Give an example where the primary key in part (a)(ii) may not be valid.
John’s friends may have the same given name as well as surname
(b) John uses another database file SCHOOL in database DB to store the school codes and school
names. The contents of SCHOOL are shown below.
SCHOOL_ID SCHOOL_NAME
081 Eden College
252 Hong Kong Number One Primary School
375 The Hong Kong Government School
441 Olympian Secondary School
782 Hong Kong Iciban Secondary School
956 Intensive Middle School
Joe Yeung 28585656 441 …
Do the above database files violate the integrity of the database DB? Explain.
(3 marks)
Yes, because the school_ID 780 is missing from SCHOOL.
2002 – AS – CA #5
5. Ms. Wong is conducting a survey on the service of the school tuck shop by doing the following steps:
collecting completed written questionnaires from students;
inputting the data into a computer; and
presenting the result of the survey using a presentation graphics software package.
She finds that there are a lot of mistakes on the completed questionnaires. One of the questionnaires with
mistakes is shown below.
Ms. Wong now decides to have a new arrangement so that the students can fill in the questionnaires
online.
(a) Explain how the online input can help Ms. Wong to improve the following:
(i) the completeness of data collection
Validate the presence of input data for mandatory fields (e.g. the sex field on the
questionnaire) / Check the number of selections, e.g. at most 3 items should be
selected
(ii) the correctness of data collection
Validate the range of data for the correctness of data, e.g. check the number of
purchases
Validate the format of data for the correctness of data, e.g. check the date format
(b) Give a reason to justify Ms. Wong’s new arrangement in addition to the improvements given in Part
(a).
Reduce the time needed for data input (other reasonable answers)
(c) Ms. Wong decides to employ a programmer to develop the system rather than to buy an existing
software package available on the market. Give TWO reasons to support Ms. Wong’s decision.
Satisfy unique requirements
Future modification or enhancement is more possible
(d) Ms. Wong only wants to use touch screens for students to input data. Describe how the students can
fill in the numerical items in the questionnaire.
Use the numeric pad on the touch screen
2003 – AS – CA #2
2 (b) Users sometimes make mistakes when keying in data into a database. Suggest two possible measures
that can be considered when designing the database in order to minimize these mistakes.
A field can be made a primary key or made unique with the command
“CREATE TABLE” or “ALTER TABLE”. Remember that a primary key has to be
handled at the database level rather than the table level, i.e. a table that belongs to a
database can have a primary key, whereas a single table that is not in a database
can only have a field defined as unique, not as primary. A primary key implies the
property of uniqueness. E.g.
ALTER TABLE info ALTER stu_id char(10) PRIMARY KEY
We can set validation rules on some fields with the command CREATE TABLE or ALTER
TABLE. E.g.
CREATE TABLE result (stu_id char(10), test numeric(4,0),
exam numeric(4,0) CHECK exam >= 0 ERROR "Positive integer only!")
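The effect of such a validation rule can be seen in the sketch below, which uses Python's sqlite3 module. Note the assumptions: SQLite writes the rule as CHECK (exam >= 0) and has no ERROR clause for a custom message, and the sample records are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE result (
        stu_id TEXT PRIMARY KEY,
        test   INTEGER,
        exam   INTEGER CHECK (exam >= 0)
    )
    """
)
conn.execute("INSERT INTO result VALUES ('S001', 70, 85)")  # valid record

# A negative exam mark breaks the validation rule, so the database
# refuses the record instead of storing the mistake.
try:
    conn.execute("INSERT INTO result VALUES ('S002', 60, -5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
print(rejected, row_count)
```

The mistyped record never reaches the table, which is the point of putting the rule in the database design rather than relying on the person keying in the data.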
(b) (i) Compared with the character data type, state one advantage of defining a field as the memo data
type.
A memo data type provides storage space for text information of variable
length to avoid unnecessary waste of storage space.
(unlimited / insert graphics / separate file)
(ii) Describe a situation in which it is more appropriate to define a field as the character data type rather
than the memo data type.
Justification: When the text information in the field is very short (e.g. less
than 4 characters) OR the length of the text information is limited OR many
complicated string manipulations (e.g. sorting, calculation) are required, it
is more appropriate to define as character data type.
(2 marks)
2003 – AS – CA #4
4. The following table shows the structure of a database file STUDENT containing the records of all students in a
school.
Field name Type Width Description
For each of the following cases, write suitable statement(s) (SQL / database commands) to generate a login
name for each student and store the login name into the field LOGIN_ID.
a) X represents Character.
Y represents 4.
The first two characters of LOGIN_ID are the class name of the student.
The last two characters of LOGIN_ID are the class number of the student.
Example: For a 5C student with class number 08, his/her LOGIN_ID should be ‘5C08’.
(2 marks)
UPDATE student
SET login_id = cls_name + class_no
b) X represents Numeric value without decimal places.
Y represents 4.
The first two characters of LOGIN_ID are the class name of the student.
The last two characters of LOGIN_ID are the class number of the student.
Example: For a 5C student with class number 8, his/her LOGIN_ID should be ‘5C08’.
UPDATE student
SET login_id = cls_name + STR(class_no)
WHERE class_no >=10;
UPDATE student
SET login_id = cls_name + '0'+STR(class_no)
WHERE class_no <=9
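The two-step update for a numeric class number can be sketched as follows using Python's sqlite3 module. The assumptions are that SQLite's || operator replaces + for joining strings and CAST(... AS TEXT) replaces STR(); the two sample students are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (cls_name TEXT, class_no INTEGER, login_id TEXT)")
conn.executemany(
    "INSERT INTO student (cls_name, class_no) VALUES (?, ?)",
    [("5C", 8), ("5C", 12)],
)

# Two-digit class numbers can be appended directly.
conn.execute(
    "UPDATE student SET login_id = cls_name || CAST(class_no AS TEXT) "
    "WHERE class_no >= 10"
)
# One-digit class numbers need a leading '0' to keep the login name 4 characters long.
conn.execute(
    "UPDATE student SET login_id = cls_name || '0' || CAST(class_no AS TEXT) "
    "WHERE class_no <= 9"
)

logins = [row[0] for row in conn.execute("SELECT login_id FROM student ORDER BY class_no")]
print(logins)
```

Without the second UPDATE, class number 8 would give the 3-character login name 5C8 instead of 5C08.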
c) X represents Character.
Y represents 6.
The first two characters of LOGIN_ID are the year of admission.
The next two characters of LOGIN_ID are the class name of the student.
The last two characters of LOGIN_ID are the class number of the student.
Example: For a 5C student with class number 08 who was admitted on 09/01/97, his/her LOGIN_ID should be
‘975C08’.
UPDATE student
SET login_id = RIGHT(STR(YEAR(doa)),2) +cls_name + class_no
2005 – AS – CA #6
6. Mr. Chin, the Extra-curricular Activities Master of a school, uses the database files, CLUB and MEM, to
store the information of clubs and members of those clubs respectively. At the end of a school year,
students' testimonials will show their extra-curricular activities records which are retrieved from these
files.
CLUB
MEM
: :
1700135 Wong Wai 7A 30 61388792 010
There are 10 clubs in the school and thus 10 records in CLUB. The key field of CLUB is ClubNo. The
two files are related by ClubNo.
One day, a student helps Mr. Chin to input the following new record into MEM:
a) Should the database system accept the above record? Explain briefly.
(1 mark)
No. There is no such club with ClubNo 012.
<-
Here, we should recognise that the record violates the integrity of the database. Moreover, you should be
able to set up the database so that this kind of problem cannot occur. How? Use a foreign key and set
it up properly, so that every value entered in the field has to match a primary key
value in the parent table. Of course, a foreign key has more features; revise them if needed.
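How a foreign key stops such a record can be sketched as below with Python's sqlite3 module. The field names StudentID and ClubName, the club name and the second student ID are assumptions for illustration; note also that SQLite only enforces foreign keys after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.execute("CREATE TABLE club (ClubNo TEXT PRIMARY KEY, ClubName TEXT)")
conn.execute(
    "CREATE TABLE mem (StudentID TEXT, ClubNo TEXT REFERENCES club (ClubNo))"
)
conn.execute("INSERT INTO club VALUES ('010', 'Sample Club')")  # invented club name
conn.execute("INSERT INTO mem VALUES ('1700135', '010')")       # ClubNo exists: accepted

# ClubNo 012 has no matching record in CLUB, so the insert is refused.
try:
    conn.execute("INSERT INTO mem VALUES ('1700136', '012')")
    accepted = True
except sqlite3.IntegrityError:
    accepted = False

print(accepted)
```

The database itself rejects the record, so the integrity of the data no longer depends on the person doing the input.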
b) During the school year, Ms. Chau, the teacher in charge of the Fencing Club, leaves the
school. No other teacher in the school is suitable to lead the Club so it has to be closed.
Suggest a modification to the structure of the database file(s) so that the existing clubs in the
school can be shown.
Add a logical field in the CLUB to indicate whether the club is active or not.
<-
e.g.
ClubNo ClubName TeacherInCharge Active
012 Football Chan Tai Man .T.
c) In the school, each student can join several clubs. Each time a student joins a club, there will
be a record in MEM.
(i) State two drawbacks on this arrangement of MEM.
1. data of a student may appear more than once in the file MEM /
longer data entry time (Data redundancy)
2. when the data of a student changes, all records related to the
student must be updated.
3. any incomplete updating causes data inconsistency
4. wastes storage space / longer access time
(ii) Suggest a primary key to MEM.
StudentID & ClubNo (Class & ClassNo & ClubNo)
<- Composite key.
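A composite key of this kind can be sketched as follows with Python's sqlite3 module. The table is cut down to the two key fields, and the club number 011 is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key is the combination of the two fields, not either field alone.
conn.execute(
    "CREATE TABLE mem (StudentID TEXT, ClubNo TEXT, PRIMARY KEY (StudentID, ClubNo))"
)
conn.execute("INSERT INTO mem VALUES ('1700135', '010')")
conn.execute("INSERT INTO mem VALUES ('1700135', '011')")  # same student, another club: allowed

# Repeating the same (StudentID, ClubNo) pair breaks the uniqueness of the key.
try:
    conn.execute("INSERT INTO mem VALUES ('1700135', '010')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)
```

StudentID alone cannot be the key because a student may join several clubs, and ClubNo alone cannot be the key because a club has many members.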
2006 – AS – CA #6
6. A school uses a database file, STU, to store information about students, as follows:
STU
Field name Type Width Description Example of data
SNAME character 20 Name of the student Wong Lai Mei
SCLASS character 2 Class of the student 3D (Form 3, class D)
SNO character 2 Class number of the student 38
STUID character 6 Student code 501478
a) Give two conditions for a field that can be used as a primary key.
Unique and mandatory (non-empty, not null)
<- I think all of you should be able to answer this question, because it only requires some
fundamental knowledge. (Even if you do not know the word “mandatory”, you should
know that the field must be non-null.) In fact, the question itself should not use the word “condition”:
unique and mandatory are not conditions, they are properties. The condition for the
primary key is “It is the field that uniquely identifies every single entry of the database
table.” Therefore, do not take the wording of a question too literally; try your
best to use the knowledge in the book to answer it.
<- Like the primary key, the condition (not property) for an INDEX is that it speeds up
searching on frequently used fields by enough to compensate for the extra workload when
data is updated. (Always remember that if a table has too many indexes on different fields,
a single modification of a data entry results in a lot of indexing work.
So, apart from the primary key, which is always indexed by default,
there are usually at most one or two more indexed fields.) Here, you should know that
an indexed field is not necessarily unique, i.e. repeated values can
be indexed. If you do not understand this paragraph, come and ask me.
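The point that an indexed field may hold repeated values can be sketched as below with Python's sqlite3 module. The index name idx_sclass and the second student record are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stu (stuid TEXT PRIMARY KEY, sname TEXT, sclass TEXT)")
# An ordinary (non-unique) index on a field that is searched frequently.
conn.execute("CREATE INDEX idx_sclass ON stu (sclass)")

# Two students share the class '3D': repeated values are fine in this index.
conn.execute("INSERT INTO stu VALUES ('501478', 'Wong Lai Mei', '3D')")
conn.execute("INSERT INTO stu VALUES ('501479', 'Chan Tai Man', '3D')")

same_class = conn.execute("SELECT sname FROM stu WHERE sclass = '3D'").fetchall()
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(len(same_class), index_names)
```

Every insert now has to update this index as well as the primary key index, which is the extra workload mentioned above.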
<- If this time the question asks you ‘What is the condition for creating a VIEW in a
database?’, what would you answer?
First of all, I have to admit that I do not know what this question is aiming at (which
happens often with the questions in ASCA) and nothing comes to mind, but anyway,
I would use the knowledge in the book to answer it. The answer I would give
would be:
A view is a virtual table formed from one or more tables existing in the database, so
every condition that applies to creating a table also applies to a view, e.g. a
primary key field should be present. Since the data in the view comes from other
existing tables, those tables must be present. Also, a view lets
different users access part of the data in different tables (access here means read,
write and modify), so changes made in the view should be reflected in the parent tables.
<- See, just use the knowledge in the book to answer the question and
you can get the mark.
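The idea of a view as a virtual table built on existing tables can be sketched as below with Python's sqlite3 module. The view name class_list and the second student are invented. SQLite views are read-only, so this sketch shows only the reading side: the view holds no data of its own and always reflects the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE stu (stuid TEXT PRIMARY KEY, sname TEXT, sclass TEXT, sno TEXT);
    -- The view exposes only part of the data in STU.
    CREATE VIEW class_list AS SELECT sname, sclass FROM stu;
    INSERT INTO stu VALUES ('501478', 'Wong Lai Mei', '3D', '38');
    """
)

before = conn.execute("SELECT sname, sclass FROM class_list").fetchall()

# A change in the parent table shows up in the view immediately.
conn.execute("INSERT INTO stu VALUES ('501479', 'Chan Tai Man', '3D', '39')")
after_count = conn.execute("SELECT COUNT(*) FROM class_list").fetchone()[0]

print(before, after_count)
```

The view cannot be created unless the table STU already exists, which matches the answer that the underlying tables must be present.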
b) Suggest a primary key for STU.
STUID / SCLASS + SNO / SNAME + SCLASS + SNO + STUID
<- Since a primary key (or candidate key) should be the minimal combination
of fields that uniquely identifies each entry in the database table, basically only
STUID should be considered the ‘appropriate’ choice for the primary key.
Another database file, EXAM, is used to store the students’ examination results.
EXAM
Field name Type Width Description Example of data
STUID Character 6 Student code 501478
SUBID Character 2 Subject code CH (Chemistry)
MARK Integer 2 Exam mark of student STUID in subject SUBID 78
c) Suggest a primary key for EXAM.
STUID CHAR(6),
SUBID CHAR(2),
MARK INTEGER(2),
Appendix 1: Databases, web servers and web applications
Web application and database relations
When we create a server-side application, we usually have to deal with a database. E.g.
an online forum or an online multiple choice system are common applications. They usually
work through the following process:
[Figure: 1. Client computer (HTTP request)]
Since the process involves several steps, we will investigate them one by one.
Step 1:
A web server (which is in fact a software program) has to be set up to listen to the network and handle
HTTP requests accordingly. Two common web servers are widely used
and free of charge: IIS and Apache (open source). These web servers provide a framework to
deal with the HTTP requests from client computers, i.e. if a client asks for a specific file (a web page or a
multimedia file), the web server arranges the time to deliver that data.
Step 2:
Apart from static web pages, a web server is supposed to provide dynamic web pages, i.e. it should be
able to run server-side applications. IIS has its server-side programming built in; the files
are .asp or .aspx (asp.net). For Apache, however, one has to install PHP to make the web server
able to execute server-side programs; the common files are .php. Finally, server-side programs
alone are not enough: a database engine has to be installed to provide the interface that
executes the queries from the application against the database. IIS has its engine installed. For
Apache, however, MySQL has to be installed to get the connection to the database.
*The details of the settings are discussed in the section on setting up a web server.
Step 3:
After the delivery of the web page, there is basically no connection between the web server and the
client computer any more. It is the web browser’s responsibility to interpret the HTML code and
display it on the screen. That is why the same web page from the same web server can look
different in two different web browsers.
In Windows XP or Windows 2000, IIS is regarded as a Windows component. So, to install IIS, all you have
to do is:
1. Open “Add / Remove Programs” in Control Panel.
2. Select “Windows components” and check the box for “Internet Information Services (IIS)” in
the dialog box.
3. Insert the installation CD (Windows XP prof. edition SP2) and IIS is then installed.
4. To set up the web server, open the management console (msc) via Control Panel -> Administrative Tools -> Internet
Information Services. From it, we can modify the properties of the web server.
5. First of all we should add “index.htm”, “index.asp” or “index.html” as the default home page for the
web server.
6. Now, we can create a web page called “index.htm” and put it in the root of the web server, which is
in fact in the path “C:\Inetpub\wwwroot\”.
7. If you have created the web page with the name “index.htm”, put it in the correct folder and
set the default home page to “index.htm”, you should find your server working. You can test it
by entering the following URL in your browser: http://127.0.0.1/
A web server should be open to the public. So, if you are using a real IP address, say
212.44.55.66, you can access the web server from a remote computer through the Internet
with the following URL: http://212.44.55.66/. However, if you have a firewall, you must configure
it properly; otherwise, remote computers cannot connect to your web server. As you know,
letting the outside world connect to your server is like inviting hackers, so we usually
open only a few ports to the public. For the firewall in Windows XP, the settings
are found at
Control Panel -> Network -> Local Area Network -> (right click and select Properties) -> Advanced ->
Edit the firewall.
Usually, we should turn on the firewall and allow some exceptions, e.g. we can add the HTTP
port (port number 80) as an open port. By doing so, the web server is open to the
public and uses port number 80 for communication. The remote client
computers should now be able to connect to the web server with the URL: http://212.44.55.66.
If your IP address is fixed, you can register a domain name and map the domain name to
your IP address to host a web server. E.g. http://abc.com -> http://212.44.55.66.
Sometimes, we can set up an FTP server and direct the FTP root to the root of the web server for
updating…
RS.Open sql, Conn <- Execute the SQL
To handle the case where the SQL statement returns no result, it is common to write it
this way:
If not RS.eof then <- To test if RS is End of File (eof)
RS.movefirst <- If not, points to the first selection in the RS
Do <- Start the DO looping
……
Rs.movenext <- Next selection in the RS
Loop until RS.eof <- Quit condition would be end of file
End if
After we finish using the recordset RS, we should close our connection:
RS.close
conn.close
Set RS = nothing
Set conn = nothing
The above shows the core ASP statements for accessing the database. Now, we go on to study how
the project can be implemented.
temp_password = Request.form("passw")
By using the variable temp_user, we can construct the SQL statement to find the password in the
database.
sql = "SELECT password FROM student WHERE studID = '" & temp_user & "'"
The resulting SQL statement should look like:
SELECT password FROM student WHERE studID = '03002'
RS.Open sql, Conn <- Execute the SQL and store the result in the variable RS
if RS.fields("password") <> request.form("passw") then <- Test whether the passw from the previous form equals the password in the recordset RS
response.Redirect("wrong_input.htm") <- If not, redirect to a page called wrong_input.htm
end if
RS.close <- Clear the variables RS and conn
Conn.close
Set RS = Nothing
Set Conn = Nothing
%>
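The same password check can be sketched outside ASP as below, using Python's sqlite3 module. This is a variation and not the code of the project: the function name password_matches and the sample record are invented, and the studID is passed through a ? placeholder instead of being joined into the SQL string, so that quote characters typed by the user are treated as data rather than as part of the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (studID TEXT PRIMARY KEY, password TEXT)")
conn.execute("INSERT INTO student VALUES ('03002', 'secret')")  # invented record


def password_matches(conn, stud_id, supplied):
    """Return True only if stud_id exists and its stored password equals supplied."""
    row = conn.execute(
        "SELECT password FROM student WHERE studID = ?", (stud_id,)
    ).fetchone()
    # row is None when no such studID exists (the "end of file" case in ASP).
    return row is not None and row[0] == supplied


print(password_matches(conn, "03002", "secret"))
print(password_matches(conn, "03002", "wrong"))
print(password_matches(conn, "99999", "secret"))
```

Checking for a missing row first also covers the case where the studID does not exist at all, which the recordset version has to test with RS.eof.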
If the password for that particular studID is correct, the user stays on the page (no redirect to wrong_input.htm)
and a form is shown.
On this page, select MC exercise 001 and press the corresponding GO! button; an online MC
question then appears.
As you can expect, there are at least two forms, one for the MC and the other for the polling. For
the MC, the HTML code looks like:
<FORM action="domc.asp" method="post">
We can use the following code to define the combo box for the MC:
<SELECT name="mc"> <- Define the selection box
<%
Set Conn = Server.CreateObject("ADODB.Connection") <- Set up the connection
conn.Provider="Microsoft.Jet.OLEDB.4.0"
conn.Open(Server.Mappath("db/project.mdb"))
set RS = Server.CreateObject("ADODB.recordset")
sql2="SELECT DISTINCT exID FROM mc" <- Define the SQL to find exID
RS.Open sql2, Conn <- Execute the SQL
If not RS.eof then
RS.movefirst
Do
Response.write "<option value='" & RS("exID") & "'>" & RS("exID") & "</option>" & chr(13) <- Put the exID into the selection box; each option should be given a corresponding value
Rs.movenext
Loop until RS.eof
End if
%>
</SELECT>
With the code above, the web application domc.asp can get the MC exercise number by using the
code temp_exID = request.form("mc").
However, how does it know which user (studID) is logged in? And do we need to check the login password again?
In fact, we can handle this with a method called session, which can be set up in logon.asp:
if RS.fields("password") <> request.form("passw") then
response.Redirect("wrong_input.htm")
else <- If the password is correct, define the sessions "authorized" and "user" and assign values to them
Session("authorized") = "true"
Session("user") = request.form("username")
end if
Then, in domc.asp, we do not have to check the password all over again; a simple
statement like this is enough:
If Session("authorized") <> "true" then <- Note that the comparison is case-sensitive here.
Response.Redirect "index.htm"
End If
Finally, there should be a number of answer boxes, so the following program code is required to
generate the names of the answer boxes.
no_record = 0
If not RS.eof then
RS.movefirst
Do
no_record = no_record + 1
Response.write "<tr><td>" & RS("queID") & "</td>"
Response.write "<td>" & RS("question") & "</td>"
Response.write "<td>" & RS("choice1") & "</td>"
Response.write "<td>" & RS("choice2") & "</td>"
Response.write "<td>" & RS("choice3") & "</td>"
Response.write "<td>" & RS("choice4") & "</td>"
Response.write "<td><input value='A' type='text' name='" & RS("queID") & "'></td></tr>" <- Set the name of the answer box to the queID
Rs.movenext
Loop until RS.eof
End if
response.write("<input type='hidden' name='exID' value=" & temp_mc & ">") <- Pass the value "exID" to the next page.
session("no_records") = no_record
To write data into the database, the user accounts must be given
write permission. We can select the database file or the
database folder and choose Properties. Then, we can
select Security and add a new account, say, Everyone.
Then, you can give Everyone full control of the folder db
(i.e. including the right to write).
In the ASP program code, we also have to set the recordset’s properties so that it can be written to; here are the
statements required.
RS.CursorType = 2
RS.LockType = 3
And data can be assigned as follows:
RS.Addnew
RS.fields("exID") = Request.form("exID")
RS.fields("studID") = Session("user")
RS.fields("answer") = Request.form(Cstr(counter))
Cstr is a function that converts a value into a string; it is required because the field “answer” in the database file
MC_result is set to be text.
In fact, to make updating the web application easier, one can set up an FTP server and open the root of the web
server to it. GoldenFTP is a freeware program for this; you can download it here:
http://www.yll.edu.hk/~yll-cym/goldenftp/goldenftp.zip