
DBA Module 2


Module Title: Identifying Physical Database Requirements

Nominal duration: 60 hrs
LO1 - Identify database scope
What is data, database (DB), DBMS and DBS?

In computer science, data is anything in a form suitable for use with a computer. Data is often
distinguished from programs. A program is a set of instructions that detail a task for the computer to
perform. In this sense, data is thus everything that is not program code.

A database is a collection of information that is organized so that it can easily be accessed, managed,
and updated. In one view, databases can be classified according to types of content: bibliographic, full-
text, numeric, and images.
In computing, databases are sometimes classified according to their organizational approaches. The most
prevalent approach is the relational database, a tabular database in which data is defined so that it can be
reorganized and accessed in a number of different ways. A distributed database is one that can be
dispersed or replicated among different points in a network. An object-oriented programming database is
one that is congruent with the data defined in object classes and subclasses.

As one of the oldest components associated with computers, the database management system
(DBMS) is a computer software program that is designed as the means of managing all databases that
are currently installed on a system hard drive or network. Different types of database management
systems exist, with some of them designed for the oversight and proper control of databases that are
configured for specific purposes.

In database management system (DBMS), data files are the files that store the database information,
whereas other files, such as index files and data dictionaries, store administrative information, known as
metadata.

A database system (DBS) is a computer system built on database technology. It combines the hardware,
software and data needed to store a large volume of related, changing data in an organized way and to
give multiple users access to that data.
Definition of Database Administrator (DBA)

A database administrator (short form DBA) is a person responsible for the installation, configuration,
upgrade, administration, monitoring and maintenance of databases in an organization.

The role includes the development and design of database strategies, system monitoring and improving
database performance and capacity, and planning for future expansion requirements. They may also
plan, co-ordinate and implement security measures to safeguard the database.

    Data administration (DA), also known as database administration manager, data architect, or
information center manager, is a high-level function responsible for the overall management of data
resources in an organization. In order to perform its duties, the DA must know a good deal about systems
analysis and programming.

    Database administration is more of an operational or technical level function responsible for
physical database design, security enforcement, and database performance.  Tasks include maintaining
the data dictionary, monitoring performance, and enforcing organizational standards and security. 

Functions of Database Administrator 

Before trying to understand the functions of the database administrator, it is necessary to first learn the
three different functional levels needed to maintain a database. 
These levels are
• The data administration (DA),
• The database administration (DBA), and
• The database steward.
These are the functions of a data administrator (not to be confused with database administrator
functions):
1. Data policies, procedures, standards
2. Planning- development of organization's IT strategy, enterprise model, cost/benefit model, design of
database environment, and administration plan.
3. Data conflict (ownership) resolution
4. Data analysis- Define and model data requirements, business rules, operational requirements, and
maintain corporate data dictionary
5. Internal marketing of DA concepts
6. Managing the data repository
What is a database steward?
A database steward is an administrative function responsible for managing data quality and assuring
that organizational applications meet the enterprise goals. It is a connection between IT and business
units. Data quality issues include
• Security and disaster recovery,
• Personnel controls,
• Physical access controls,
• Maintenance controls,
• And data protection and privacy.
For example, in order to increase security the database steward can control who can gain
access to the database by assigning specific privileges to users.
What are the functions of a database administrator?

1. Selection of hardware and software
• Keep up with current technological trends
• Predict future changes
• Emphasis on established off-the-shelf products
2. Managing data security and privacy
• Protection of data against accidental or intentional loss, destruction, or misuse
• Firewalls
• Establishment of user privileges
• Complicated by use of distributed systems such as internet access and client/server technology.
How many major threats to database security can you think of?
1. Accidental loss due to human error or software/ hardware error.
2. Theft and fraud that could come from hackers or disgruntled employees.
3. Improper data access to personal or confidential data.
4. Loss of data integrity.
5. Loss of data availability through sabotage, a virus, or a worm.
3. Managing data integrity
• Integrity controls protect data from unauthorized use
• Data consistency
• Maintaining data relationships
• Domains- set allowable values
• Assertions- enforce database conditions
4. Data backup
• We must assume that a database will eventually fail
• Establishment of procedures
o How often should the data be backed up?
o What data should be backed up more frequently?
o Who is responsible for the backups?
• Backup facilities
o automatic dump- facility that produces a backup copy of the entire database
o periodic backup- done on a periodic basis such as nightly or weekly
o cold backup- database is shut down during backup
o hot backup- a selected portion of the database is shut down and backed up at a given time
o backups stored in a secure, off-site location
5. Database recovery
• Application of proven strategies for reinstallation of database after crash
• Recovery facilities include backup, journalizing, checkpoint, and recovery manager
6. Tuning database performance
• Set installation parameters/upgrade DBMS
• Monitor memory and CPU usage
• Input/output contention
o use striping
o distribution of heavily accessed files
• Application tuning by modifying SQL code in applications
7. Improving query processing performance
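
The integrity controls under item 3 can be illustrated with a short sketch. It uses Python's built-in sqlite3 module and a hypothetical employees table; the table, column names and allowed values are assumptions made for illustration only. A CHECK constraint plays the role of a domain by restricting a column to allowable values. SQLite has no separate assertion objects, so a second CHECK constraint stands in for a condition the database must enforce.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Domain: only these grade values are allowable (hypothetical values).
        grade       TEXT NOT NULL CHECK (grade IN ('A', 'B', 'C')),
        -- Condition the database must enforce: no negative salaries.
        salary      REAL NOT NULL CHECK (salary >= 0)
    )
    """
)


def try_insert(name, grade, salary):
    """Return True if the row is accepted, False if an integrity control rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO employees (name, grade, salary) VALUES (?, ?, ?)",
                (name, grade, salary),
            )
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_insert("Abebe", "A", 5000.0)        # valid row
rejected_grade = try_insert("Sara", "Z", 4000.0)   # grade outside the domain
rejected_salary = try_insert("Kebede", "B", -1.0)  # violates the salary condition
row_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

Only the first row is stored; the other two are refused by the DBMS itself, regardless of which application tried to insert them.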
DBA Responsibilities

• Installation, configuration and upgrading of Database server software and related products.
• Evaluate Database features and Database related products.
• Establish and maintain sound backup and recovery policies and procedures.
• Take care of the Database design and implementation.
• Implement and maintain database security (create and maintain users and roles, assign
privileges).
• Database tuning and performance monitoring.
• Application tuning and performance monitoring.
• Set up and maintain documentation and standards.
• Plan growth and changes (capacity planning).
• Work as part of a team and provide 24x7 support when required.
• Do general technical troubleshooting and give consultation.

Types of Database Administration


There are three types of DBAs:
1. Systems DBAs (also referred to as Physical DBAs, Operations DBAs or Production Support
DBAs): focus on the physical aspects of database administration such as DBMS installation,
configuration, patching, upgrades, backups, restores, refreshes, performance optimization, and
maintenance and disaster recovery.
2. Development DBAs: focus on the logical and development aspects of database administration
such as data model design and maintenance, DDL (data definition language) generation, SQL
writing and tuning, coding stored procedures, collaborating with developers to help choose the
most appropriate DBMS feature/functionality and other pre-production activities.
3. Application DBAs: usually found in organizations that have purchased 3rd party application
software such as ERP (enterprise resource planning) and CRM (customer relationship
management) systems. Examples of such application software include Oracle Applications,
Siebel and PeopleSoft (both now part of Oracle Corp.) and SAP. Application DBAs straddle the
fence between the DBMS and the application software and are responsible for ensuring that the
application is fully optimized for the database and vice versa. They usually manage all the
application components that interact with the database and carry out activities such as application
installation and patching, application upgrades, database cloning, building and running data
cleanup routines, data load process management, etc.

Logical and Physical Database Requirements


The requirements for a logical and physical database vary by size and design parameters. A logical
database must be able to access and identify all files within the storage system to operate correctly,
whereas a physical database manages a much smaller field of information. Sometimes, a physical
database stores only a single file with one value or word in it.

Logical Database Definition

A logical database is the collected information stored on multiple physical disk files and hard drives
within a computer. This database provides a structure to house all the accumulated information within
the device and determines the relationships between different types of files and programs. A logical
database determines these relationships through a series of highly structured tables designed to
categorize information into groups for easier accessibility. Without this categorization, accessing
different files within a computer would take additional time as the system searched each file for the
appropriate match.

Logical Database Requirements


A logical database can stretch over multiple physical hard disks and information files. The data storage
unit is still a single database for information retrieval purposes. To have a logical database, all given
hard disks and information files must be accessible from a single source. An example would be a
personal computer able to access its information files stored on multiple hard drives from a single user
interface. According to Microsoft, when a logical database is successful, the user sees a coherent list of
information from a central location that draws from the many file sources tied into the storage system.

Physical Database Definition


A physical database is both the actual device housing the information files and the search paths used to
access information between each source. According to Microsoft, the term "database" refers only to the
logical database controlling information files for the entire system. A physical database is technically a
smaller unit of storage referred to as either a company, field, record or table, depending on how much
information the physical storage device contains. A field is the smallest unit of storage housing only a
single file. A company is the largest -- next to a database -- housing separate, large groups of data.

Physical Storage Requirements


The requirements for a physical database vary by the parameters of the storage device in question. For
example, a flash drive designed to hold up to 2 gigabytes of information needs a personal computer or
another USB-connected device to allow access to the information stored on the equipment. A physical
database also needs a power source to access information. A computer hard drive cannot function
without electricity. A flash drive cannot operate without a device with an adequate power source.
Database security
Database security concerns the use of a broad range of information security controls to protect
databases (potentially including the data, the database applications or stored functions, the database
systems, the database servers and the associated network links) against compromises of their
confidentiality, integrity and availability. It involves various types or categories of controls, such as
technical, procedural/administrative and physical. Database security is a specialist topic within the
broader realms of computer security, information security and risk management.

Security risks to database systems include, for example:

• Unauthorized or unintended activity or misuse by authorized database users, database
administrators, or network/systems managers, or by unauthorized users or hackers (e.g.
inappropriate access to sensitive data, metadata or functions within databases, or inappropriate
changes to the database programs, structures or security configurations);
• Malware infections causing incidents such as unauthorized access, leakage or disclosure of
personal or proprietary data, deletion of or damage to the data or programs, interruption or denial
of authorized access to the database, attacks on other systems and the unanticipated failure of
database services;
• Overloads, performance constraints and capacity issues resulting in the inability of authorized
users to use databases as intended;
• Physical damage to database servers caused by computer room fires or floods, overheating,
lightning, accidental liquid spills, static discharge, electronic breakdowns/equipment failures and
obsolescence;
• Design flaws and programming bugs in databases and the associated programs and systems,
creating various security vulnerabilities (e.g. unauthorized privilege escalation), data
loss/corruption, performance degradation etc.;
• Data corruption and/or loss caused by the entry of invalid data or commands, mistakes in
database or system administration processes, sabotage/criminal damage etc.

An Introduction to Database Management Systems


A database is a collection of related files that are usually integrated, linked or cross-referenced to one
another. The advantage of a database is that data and records contained in different files can be easily
organized and retrieved using specialized database management software called a database management
system (DBMS) or database manager.

After reading this lesson, you should be able to:

• Define the term database management system (DBMS).
• Describe the basic purpose and functions of a DBMS.
• Discuss the advantages and disadvantages of DBMSs.

DBMS Fundamentals
A database management system is a set of software programs that allows users to create, edit and update
data in database files, and store and retrieve data from those database files. Data in a database can be
added, deleted, changed, sorted or searched all using a DBMS. If you were an employee in a large
organization, the information about you would likely be stored in different files that are linked together.
One file about you would pertain to your skills and abilities, another file to your income tax status,
another to your home and office address and telephone number, and another to your annual performance
ratings. By cross-referencing these files, someone could change a person's address in one file and it
would automatically be reflected in all the other files. DBMSs are commonly used to manage:

• Membership and subscription mailing lists
• Accounting and bookkeeping information
• The data obtained from scientific research
• Customer information
• Inventory information
• Personal records
• Library information

DBMSs and File Management Systems


Computerized file management systems (sometimes called file managers) are not considered true
database management systems because files cannot be easily linked to each other. However, they can
serve useful data management functions by providing a system for storing information in files. For
example, a file management system might be used to store a mailing list or a personal address book.
When files need to be linked, a relational database should be created using database application software
such as Oracle, Microsoft Access, IBM DB2, or FileMaker Pro.

The Advantages of a DBMS


Improved availability: One of the principal advantages of a DBMS is that the same information can be
made available to different users.

Minimized redundancy: The data in a DBMS is more concise because, as a general rule, the
information in it appears just once. This reduces data redundancy, or in other words, the need to repeat
the same data over and over again. Minimizing redundancy can therefore significantly reduce the cost of
storing information on hard drives and other storage devices. In contrast, data fields are commonly
repeated in multiple files when a file management system is used.

Accuracy: Accurate, consistent, and up-to-date data is a sign of data integrity. DBMSs foster data
integrity because updates and changes to the data only have to be made in one place. The chances of
making a mistake are higher if you are required to change the same data in several different places than
if you only have to make the change in one place.
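
The "change it in one place" idea can be sketched with Python's built-in sqlite3 module. The employees and ratings tables and all their values are hypothetical, chosen only to illustrate the point: the address is stored once, so a single UPDATE is reflected in every result that links back to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical tables: the address lives only in employees.
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE ratings (
        employee_id INTEGER NOT NULL REFERENCES employees(employee_id),
        year        INTEGER NOT NULL,
        rating      TEXT NOT NULL
    );
    INSERT INTO employees VALUES (1, 'Abebe', 'Old Street 1');
    INSERT INTO ratings VALUES (1, 2022, 'Good'), (1, 2023, 'Excellent');
    """
)

# One update, made in one place.
conn.execute(
    "UPDATE employees SET address = ? WHERE employee_id = ?",
    ("New Avenue 9", 1),
)

# Every rating row now shows the new address, because it is looked up
# through the shared employee_id rather than copied into each row.
rows = conn.execute(
    """
    SELECT r.year, r.rating, e.address
    FROM ratings AS r
    JOIN employees AS e ON e.employee_id = r.employee_id
    ORDER BY r.year
    """
).fetchall()
```

Had the address been copied into every ratings row, each copy would need its own update, and any copy that was missed would leave the data inconsistent.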

Program and file consistency: Using a database management system, file formats and system programs
are standardized. This makes the data files easier to maintain because the same rules and guidelines
apply across all types of data. The level of consistency across files and programs also makes it easier to
manage data when multiple programmers are involved.

User-friendly: Data is easier to access and manipulate with a DBMS than without it. In most cases,
DBMSs also reduce the reliance of individual users on computer specialists to meet their data needs.

Improved security: As stated earlier, DBMSs allow multiple users to access the same data resources.
This capability is generally viewed as a benefit, but there are potential risks for the organization. Some
sources of information should be protected or secured and only viewed by select individuals. Through
the use of passwords, database management systems can be used to restrict data access to only those
who should see it.
The Disadvantages of a DBMS

There are basically two major downsides to using DBMSs. One of these is cost, and the other the threat
to data security.

Cost: Implementing a DBMS can be expensive and time-consuming, especially in large
organizations. Training requirements alone can be quite costly.

Security: Even with safeguards in place, it may be possible for some unauthorized users to access the
database. In general, database access is an all or nothing proposition. Once an unauthorized user gets
into the database, they have access to all the files, not just a few. Depending on the nature of the data
involved, these breaches in security can also pose a threat to individual privacy. Steps should also be
taken to regularly make backup copies of the database files and store them because of the possibility of
fires and earthquakes that might destroy the system.

Knowledge Check
What is an advantage of major database management systems?
1. The same information can be made available to different users.
2. Fires and earthquakes that might destroy the system.
3. Once an unauthorized user gets into the database, they have access to all the files.
4. Time and cost to implement.
Answer 1 is correct. An advantage of major database management systems is that the same information
can be made available to different users.

Types of Database Management Systems


DBMSs come in many shapes and sizes. For a few hundred dollars, you can purchase a DBMS for your
desktop computer. For larger computer systems, much more expensive DBMSs are required. Many
mainframe-based DBMSs are leased by organizations. DBMSs of this scale are highly sophisticated and
would be extremely expensive to develop from scratch. Therefore, it is cheaper for an organization to
lease such a DBMS program than to develop it. Since there are a variety of DBMSs available, you
should know some of the basic features, as well as strengths and weaknesses, of the major types.

After reading this lesson, you should be able to:

• Compare and contrast the structure of different database management systems.
• Define hierarchical databases.
• Define network databases.
• Define relational databases.
• Define object-oriented databases.

Types of DBMS: Hierarchical Databases

There are four structural types of database management systems: hierarchical, network, relational, and
object-oriented.
Hierarchical databases, commonly used on mainframe computers, have been around for a
long time. The hierarchical model is one of the oldest methods of organizing and storing data, and it is
still used by some organizations for making travel reservations. A hierarchical database is organized in pyramid fashion,
like the branches of a tree extending downwards. Related fields or records are grouped together so that
there are higher-level records and lower-level records, just like the parents in a family tree sit above the
subordinated children.

Based on this analogy, the parent record at the top of the pyramid is called the root record. A child
record always has only one parent record to which it is linked, just like in a normal family tree. In
contrast, a parent record may have more than one child record linked to it. Hierarchical databases work
by moving from the top down. A record search is conducted by starting at the top of the pyramid and
working down through the tree from parent to child until the appropriate child record is found.
Furthermore, each child can also be a parent with children underneath it.
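
The top-down search described above can be sketched in a few lines of Python. This is only an in-memory illustration of the idea, not how any particular hierarchical DBMS is implemented; the Record class, the find_path function and the sample record names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """A record with any number of children; its single parent is implied by the tree."""
    name: str
    children: List["Record"] = field(default_factory=list)

    def add_child(self, child: "Record") -> "Record":
        self.children.append(child)
        return child


def find_path(record: Record, target: str) -> Optional[List[str]]:
    """Search from `record` downwards; return the parent-to-child path, or None."""
    if record.name == target:
        return [record.name]
    for child in record.children:
        path = find_path(child, target)
        if path is not None:
            return [record.name] + path
    return None


# The root record sits at the top of the pyramid.
root = Record("Airline")
flight = root.add_child(Record("Flight 101"))
root.add_child(Record("Flight 202"))
# A child can itself be a parent.
flight.add_child(Record("Passenger: A. Bekele"))

path = find_path(root, "Passenger: A. Bekele")
```

Here path is the chain of records visited from the root down to the matching child: the airline, then Flight 101, then the passenger.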

The advantage of hierarchical databases is that they can be accessed and updated rapidly because the
tree-like structure and the relationships between records are defined in advance. However, this feature is
a two-edged sword. The disadvantage of this type of database structure is that each child in the tree may
have only one parent, and relationships or linkages between children are not permitted, even if they
make sense from a logical standpoint. Hierarchical databases are so rigid in their design that adding a
new field or record requires that the entire database be redefined.
Types of DBMS: Network Databases

Network databases are similar to hierarchical databases in that they also have a hierarchical structure. There
are a few key differences, however. Instead of looking like an upside-down tree, a network database
looks more like a cobweb or interconnected network of records. In network databases, children are
called members and parents are called owners. The most important difference is that each child or
member can have more than one parent (or owner).
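
The key difference, that a member can have more than one owner, can be shown with a minimal Python sketch. The two dictionaries and the supplier, project and part names are hypothetical and only illustrate the shape of the structure.

```python
from collections import defaultdict

# Two lookup tables keep the owner/member links navigable in both directions.
owners_of = defaultdict(set)   # member -> its owners
members_of = defaultdict(set)  # owner  -> its members


def link(owner: str, member: str) -> None:
    """Record that `owner` owns `member`."""
    owners_of[member].add(owner)
    members_of[owner].add(member)


link("Supplier: Acme", "Part: Bolt")
link("Project: Bridge", "Part: Bolt")  # the same member gets a second owner
link("Supplier: Acme", "Part: Nut")

bolt_owners = sorted(owners_of["Part: Bolt"])
acme_members = sorted(members_of["Supplier: Acme"])
```

In a strict hierarchy the second link for "Part: Bolt" would not be allowed, because a child record may have only one parent.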

Like hierarchical databases, network databases are principally used on mainframe computers. Since
more connections can be made between different types of data, network databases are considered more
flexible. However, two limitations must be considered when using this kind of database. Similar to
hierarchical databases, network databases must be defined in advance. There is also a limit to the
number of connections that can be made between records.

Types of DBMS: Relational Databases

In relational databases, the relationship between data files is relational, not hierarchical. Hierarchical
and network databases require the user to pass down through a hierarchy in order to access needed data.
Relational databases connect data in different files by using common data elements or a key field. Data
in relational databases is stored in different tables, each having a key field that uniquely identifies each
row. Relational databases are more flexible than either the hierarchical or network database structures.
In relational databases, tables or files filled with data are called relations, a tuple designates a row or
record, and columns are referred to as attributes or fields.

Relational databases work on the principle that each table has a key field that uniquely identifies each
row, and that these key fields can be used to connect one table of data to another. Thus, one table might
have a row consisting of a customer account number as the key field along with address and telephone
number. The customer account number in this table could be linked to another table of data that also
includes customer account number (a key field), but in this case, contains information about product
returns, including an item number (another key field). This key field can be linked to another table that
contains item numbers and other product information such as production location, color, quality control
person, and other data. Therefore, using this database, customer information can be linked to specific
product information.
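
The linking described above can be sketched with Python's built-in sqlite3 module. The table names, column names and sample values are assumptions made for illustration; the point is that the shared key fields (customer account number and item number) are what connect the three tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (
        account_no INTEGER PRIMARY KEY,   -- key field
        address    TEXT,
        phone      TEXT
    );
    CREATE TABLE items (
        item_no             INTEGER PRIMARY KEY,   -- key field
        production_location TEXT,
        color               TEXT
    );
    -- Product returns refer to both key fields.
    CREATE TABLE product_returns (
        account_no INTEGER REFERENCES customers(account_no),
        item_no    INTEGER REFERENCES items(item_no),
        PRIMARY KEY (account_no, item_no)
    );
    INSERT INTO customers VALUES (1001, '12 Main St', '555-0100');
    INSERT INTO items VALUES (77, 'Plant A', 'red');
    INSERT INTO product_returns VALUES (1001, 77);
    """
)

# Follow the key fields from customer to return to item.
linked = conn.execute(
    """
    SELECT c.account_no, c.phone, i.item_no, i.production_location, i.color
    FROM customers AS c
    JOIN product_returns AS r ON r.account_no = c.account_no
    JOIN items AS i ON i.item_no = r.item_no
    """
).fetchall()
```

The query returns one row that combines the customer's details with the details of the returned product, even though no single table stores both.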

The relational database has become quite popular for two major reasons. First, relational databases can
be used with little or no training. Second, database entries can be modified without redefining the entire
structure. The downside of using a relational database is that searching for data can take more time than
if other methods are used.

Types of DBMS: Object-oriented Databases (OODBMS)


Able to handle many new data types, including graphics, photographs, audio, and video, object-
oriented databases represent a significant advance over their other database cousins. Hierarchical and
network databases are all designed to handle structured data; that is, data that fits nicely into fields,
rows, and columns. They are useful for handling small snippets of information such as names, addresses,
zip codes, product numbers, and any kind of statistic or number you can think of. On the other hand, an
object-oriented database can be used to store data from a variety of media sources, such as photographs
and text, and produce work, as output, in a multimedia format.

Object-oriented databases use small, reusable chunks of software called objects. The objects themselves
are stored in the object-oriented database. Each object consists of two elements: 1) a piece of data (e.g.,
sound, video, text, or graphics), and 2) the instructions, or software programs called methods, for what
to do with the data. Part two of this definition requires a little more explanation. The instructions
contained within the object are used to do something with the data in the object. For example, test scores
would be within the object as would the instructions for calculating average test score.
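
The test score example can be sketched as a small Python class. This shows only the idea of an object that bundles data with its methods, not how an OODBMS stores objects; the class name ScoreSheet and the sample scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ScoreSheet:
    """An object holding both the data (scores) and a method that works on it."""
    scores: List[float]

    def average(self) -> float:
        """Instruction stored with the data: compute the average test score."""
        return sum(self.scores) / len(self.scores)


exam = ScoreSheet(scores=[70.0, 80.0, 90.0])
average_score = exam.average()
```

Any program that retrieves the object gets the calculation along with the scores, so the rule for computing the average does not have to be rewritten in each application.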

Object-oriented databases have two disadvantages. First, they are more costly to develop. Second, most
organizations are reluctant to abandon or convert from those databases that they have already invested
money in developing and implementing. However, the benefits to object-oriented databases are
compelling. The ability to mix and match reusable objects provides incredible multimedia capability.
Healthcare organizations, for example, can store, track, and recall CAT scans, X-rays,
electrocardiograms and many other forms of crucial data.
Knowledge Check
Which of the following is a database management system (DBMS) that works on the principle that
each table has a key field that uniquely identifies each row, and that these key fields can be used to
connect one table of data to another?
1. Hierarchical databases
2. Network databases
3. Relational databases
4. Object-oriented databases
Answer 3 is correct.
LO2
Quick-Start Tutorial on Relational Database Design
Introduction

The relational database model was proposed by Edgar Codd (of IBM Research) around 1969. It has since become
the dominant database model for commercial applications (in comparison with other database models
such as hierarchical, network and object models). Today, there are many commercial Relational
Database Management Systems (RDBMSs), such as Oracle, IBM DB2 and Microsoft SQL Server. There
are also many free and open-source RDBMSs, such as MySQL, mSQL (mini-SQL) and the embedded
JavaDB (Apache Derby).

A relational database organizes data in tables (or relations). A table is made up of rows and columns. A
row is also called a record (or tuple). A column is also called a field (or attribute). A database table is
similar to a spreadsheet. However, the relationships that can be created among the tables enable a
relational database to efficiently store huge amounts of data, and effectively retrieve selected data.

A language called SQL (Structured Query Language) was developed to work with relational databases.

Database Design Objective

A well-designed database shall:

• Eliminate Data Redundancy: the same piece of data shall not be stored in more than one place. This is
because duplicate data not only wastes storage space but also easily leads to inconsistencies.
• Ensure Data Integrity and Accuracy:
• [TODO] more

Relational Database Design Process

Database design is more art than science, as you have to make many decisions. Databases are usually
customized to suit a particular application. No two customized applications are alike, and hence, no two
databases are alike. Guidelines (usually in terms of what not to do instead of what to do) are provided for
making these design decisions, but the choices ultimately rest on you, the designer.

Step 1: Define the Purpose of the Database (Requirement Analysis)

Gather the requirements and define the objective of your database, e.g. ...

Drafting sample input forms, queries and reports often helps.

Step 2: Gather Data, Organize in tables and Specify the Primary Keys

Once you have decided on the purpose of the database, gather the data that need to be stored in the
database. Divide the data into subject-based tables.

Choose one column (or a few columns) as the so-called primary key, which uniquely identifies each of
the rows.
Primary Key

In the relational model, a table cannot contain duplicate rows, because that would create ambiguities in
retrieval. To ensure uniqueness, each table should have a column (or a set of columns), called the primary
key, that uniquely identifies every record of the table. For example, a unique number customerID can
be used as the primary key for the Customers table; productCode for the Products table; isbn for the Books
table. A primary key is called a simple key if it is a single column; it is called a composite key if it is
made up of several columns.

Most RDBMSs build an index on the primary key to facilitate fast search and retrieval.

The primary key is also used to reference other tables (to be elaborated later).

You have to decide which column(s) to use as the primary key. The decision may not be
straightforward, but the primary key shall have these properties:

• The values of the primary key shall be unique (i.e., no duplicate value). For example, customerName may
not be appropriate to be used as the primary key for the Customers table, as there could be two
customers with the same name.
• The primary key shall always have a value. In other words, it shall not contain NULL.
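
These two properties can be checked with a short sketch using Python's built-in sqlite3 module. The books table follows the isbn example in the text, while the title column and the sample ISBN string are made-up placeholders. Note one SQLite-specific assumption: for a non-integer primary key, SQLite needs an explicit NOT NULL to reject NULL values, so the sketch declares it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE books (
        -- NOT NULL is spelled out because SQLite does not imply it
        -- for non-integer primary keys.
        isbn  TEXT NOT NULL PRIMARY KEY,
        title TEXT NOT NULL
    )
    """
)


def try_insert(isbn, title):
    """Return 'ok' if the row is stored, or a message saying why it was rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO books (isbn, title) VALUES (?, ?)", (isbn, title)
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


first = try_insert("978-0-00-000000-2", "Database Basics")   # placeholder ISBN
duplicate = try_insert("978-0-00-000000-2", "Another Title")  # same key again
missing = try_insert(None, "No Key")                          # NULL key
stored = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
```

Only the first insert succeeds; the duplicate key and the NULL key are both refused by the database.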

Consider the following when choosing the primary key:

• The primary key shall be simple and familiar, e.g., employeeID for the employees table and isbn for the
books table.
• The value of the primary key should not change. The primary key is used to reference other tables. If you
change its value, you have to change all its references; otherwise, the references will be lost. For
example, phoneNumber may not be appropriate to be used as the primary key for the Customers table,
because it might change.
• The primary key often uses an integer (or number) type. But it could also be of other types, such as text.
However, it is best to use a numeric column as the primary key for efficiency.
• The primary key could take an arbitrary number. Most RDBMSs support so-called auto-increment (or
AutoNumber type) for an integer primary key, where (current maximum value + 1) is assigned to the new
record. This arbitrary number is fact-less, as it contains no factual information. Unlike factual
information such as a phone number, a fact-less number is ideal for a primary key, as it does not change.
• The primary key is usually a single column (e.g., customerID or productCode). But it could also be made up
of several columns. You should use as few columns as possible.

Let's illustrate with an example: a table Customers contains the columns lastName, firstName,
phoneNumber, address, city, state, zipCode. The candidates for the primary key are name=(lastName,
firstName), phoneNumber, Address1=(address, city, state), Address2=(address, zipCode).
The name may not be unique. The phone number and address may change. Hence, it is better to create a
fact-less auto-increment number, say customerID, as the primary key.
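A minimal sketch of such a fact-less key, assuming Python's sqlite3 module. In SQLite, a column declared INTEGER PRIMARY KEY is filled in automatically when no value is supplied. The sample names and phone numbers are made up, and the helper add_customer is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " customerID INTEGER PRIMARY KEY,"  # auto-assigned when omitted
    " lastName TEXT, firstName TEXT, phoneNumber TEXT)"
)


def add_customer(last_name, first_name, phone):
    """Insert a customer without giving a key; return the key SQLite assigned."""
    cur = conn.execute(
        "INSERT INTO Customers (lastName, firstName, phoneNumber)"
        " VALUES (?, ?, ?)",
        (last_name, first_name, phone),
    )
    return cur.lastrowid


first_id = add_customer("Smith", "John", "555-0101")
second_id = add_customer("Smith", "John", "555-0102")  # same name, another person

# A factual column can change without touching the key or anything that refers to it.
conn.execute(
    "UPDATE Customers SET phoneNumber = '555-0199' WHERE customerID = ?",
    (first_id,),
)
ids = [row[0] for row in conn.execute(
    "SELECT customerID FROM Customers ORDER BY customerID")]
print(first_id, second_id, ids)
```

The two customers share a name but receive different customerID values, and updating a phone number leaves both keys unchanged.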

Step 3: Create Relationships among Tables

A database consisting of independent and unrelated tables serves little purpose (you may consider using
a spreadsheet instead). The power of a relational database lies in the relationships that can be defined
between tables. The most crucial aspect of designing a relational database is identifying the relationships
among tables. The types of relationship include:

1. one-to-many
2. many-to-many
3. one-to-one

One-to-Many

In a "class roster" database, a teacher may teach zero or more classes, while a class is taught by one (and
only one) teacher. In a "company" database, a manager manages zero or more employees, while an
employee is managed by one (and only one) manager. In a "product sales" database, a customer may
place many orders, while an order is placed by one particular customer. This kind of relationship is
known as one-to-many.

A one-to-many relationship cannot be represented in a single table. For example, in a "class roster"
database, we may begin with a table called Teachers, which stores information about teachers (such as
name, office, phone and email). To store the classes taught by each teacher, we could create columns
class1, class2, class3, but we immediately face the problem of how many columns to create. On the
other hand, if we begin with a table called Classes, which stores information about a class
(courseCode, dayOfWeek, timeStart and timeEnd), we could create additional columns to store
information about the (one) teacher (such as name, office, phone and email). However, since a teacher
may teach many classes, the teacher's data would be duplicated in many rows of the table Classes.

To support a one-to-many relationship, we need to design two tables: a table Classes to store
information about the classes with classID as the primary key; and a table Teachers to store
information about teachers with teacherID as the primary key. We can then create the one-to-many
relationship by storing the primary key of the table Teachers (i.e., teacherID) (the "one"-end or the
parent table) in the table Classes (the "many"-end or the child table), as illustrated below.

The column teacherID in the child table Classes is known as the foreign key. A foreign key in a child
table holds values of the parent table's primary key, and is used to reference the parent table.

Take note that for every row in the parent table, there could be zero, one, or more rows in the child
table. For every row in the child table, there is one and only one row in the parent table.
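The Teachers and Classes design can be sketched as follows, assuming Python's sqlite3 module. The teacher names and course codes are made up, and the tables are cut down to a few columns. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Teachers (
    teacherID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE Classes (
    classID    INTEGER PRIMARY KEY,
    courseCode TEXT NOT NULL,
    -- foreign key: every class points at exactly one teacher
    teacherID  INTEGER NOT NULL REFERENCES Teachers(teacherID)
);
INSERT INTO Teachers VALUES (1, 'Teacher A'), (2, 'Teacher B');
INSERT INTO Classes  VALUES (101, 'CS101', 1), (102, 'CS102', 1);
""")

# Parent side: a teacher may have zero, one, or more classes.
classes_per_teacher = conn.execute(
    "SELECT t.name, COUNT(c.classID) FROM Teachers t"
    " LEFT JOIN Classes c ON c.teacherID = t.teacherID"
    " GROUP BY t.teacherID ORDER BY t.teacherID"
).fetchall()

# Child side: a class cannot reference a teacher that does not exist.
try:
    conn.execute("INSERT INTO Classes VALUES (103, 'CS103', 99)")
    orphan_accepted = True
except sqlite3.IntegrityError:
    orphan_accepted = False

print(classes_per_teacher, orphan_accepted)
```

Teacher data is stored once in Teachers, however many classes refer to it, which avoids the duplication described earlier.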

Many-to-Many

In a "product sales" database, a customer's order may contain one or more products; and a product can
appear in many orders. In a "bookstore" database, a book is written by one or more authors, while an
author may write zero or more books. This kind of relationship is known as many-to-many.

Let's illustrate with a "product sales" database. We begin with two tables: Products and Orders. The
table Products contains information about the products (such as name, description and
quantityInStock) with productID as its primary key. The table Orders contains customers' orders
(customerID, dateOrdered, dateRequired and status) with orderID as its primary key. Again, we cannot
store the items ordered inside the Orders table, as we do not know how many columns to reserve for the
items. We also cannot store the order information in the Products table.

To support a many-to-many relationship, we need to create a third table (known as a junction table), say
OrderDetails (or OrderLines), where each row represents an item of a particular order. For the
OrderDetails table, the primary key consists of two columns, orderID and productID, that together
uniquely identify each row. The columns orderID and productID in the OrderDetails table are used to
reference the Orders and Products tables; hence, they are also foreign keys in the OrderDetails table.

The many-to-many relationship is, in fact, implemented as two one-to-many relationships, with the
introduction of the junction table.

1. An order has many items in OrderDetails. An OrderDetails item belongs to one particular order.
2. A product may appear in many OrderDetails items. Each OrderDetails item specifies one product.
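The junction table can be sketched as follows, assuming Python's sqlite3 module. The quantity column and all sample rows are assumptions added for illustration; the tables are reduced to a few columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Products (productID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Orders   (orderID INTEGER PRIMARY KEY, customerID INTEGER, status TEXT);
CREATE TABLE OrderDetails (
    orderID   INTEGER NOT NULL REFERENCES Orders(orderID),
    productID INTEGER NOT NULL REFERENCES Products(productID),
    quantity  INTEGER NOT NULL,          -- assumed extra column
    PRIMARY KEY (orderID, productID)     -- composite primary key
);
INSERT INTO Products     VALUES (1, 'Pen'), (2, 'Pencil');
INSERT INTO Orders       VALUES (10, 500, 'open'), (11, 501, 'open');
INSERT INTO OrderDetails VALUES (10, 1, 3), (10, 2, 1), (11, 1, 5);
""")

# One order, many items.
items_in_order_10 = conn.execute(
    "SELECT p.name, d.quantity FROM OrderDetails d"
    " JOIN Products p ON p.productID = d.productID"
    " WHERE d.orderID = 10 ORDER BY p.productID"
).fetchall()

# One product, many orders.
orders_with_pen = [row[0] for row in conn.execute(
    "SELECT orderID FROM OrderDetails WHERE productID = 1 ORDER BY orderID")]

# The composite key rejects a second row for the same (order, product) pair.
try:
    conn.execute("INSERT INTO OrderDetails VALUES (10, 1, 7)")
    repeat_accepted = True
except sqlite3.IntegrityError:
    repeat_accepted = False

print(items_in_order_10, orders_with_pen, repeat_accepted)
```

Each query follows one of the two one-to-many relationships listed above, starting from the junction table.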

One-to-One

In a "product sales" database, a product may have optional supplementary information such as image,
moreDescription and comment. Keeping these columns inside the Products table results in many empty
fields (in those records without the optional data). Furthermore, such large data items may degrade the
performance of the database.

Instead, we can create another table (say ProductDetails, ProductLines or ProductExtras) to store
the optional data. A record will only be created for those products with optional data. The two tables,
Products and ProductDetails, exhibit a one-to-one relationship. That is, for every row in the parent
table, there is at most one row (possibly zero) in the child table. The same column productID should be
used as the primary key for both tables.

Some databases limit the number of columns that can be created inside a table. You could use a
one-to-one relationship to split the data into two tables. A one-to-one relationship is also useful for
storing certain sensitive data in a secure table, while keeping the non-sensitive data in the main table.
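A minimal sketch of the Products and ProductDetails split, assuming Python's sqlite3 module. The sample rows are made up and only the comment column is kept from the optional data. Because productID is both the primary key of ProductDetails and a foreign key to Products, each product can have at most one detail row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Products (
    productID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE ProductDetails (
    -- same key as the parent: primary key and foreign key at once
    productID INTEGER PRIMARY KEY REFERENCES Products(productID),
    comment   TEXT
);
INSERT INTO Products       VALUES (1, 'Pen'), (2, 'Pencil');
INSERT INTO ProductDetails VALUES (1, 'Refillable');
""")

# LEFT JOIN keeps products that have no optional data; their comment is None.
products_with_details = conn.execute(
    "SELECT p.name, d.comment FROM Products p"
    " LEFT JOIN ProductDetails d ON d.productID = p.productID"
    " ORDER BY p.productID"
).fetchall()

# A second detail row for the same product violates the primary key.
try:
    conn.execute("INSERT INTO ProductDetails VALUES (1, 'Second comment')")
    second_detail_accepted = True
except sqlite3.IntegrityError:
    second_detail_accepted = False

print(products_with_details, second_detail_accepted)
```

Only products that actually have optional data take up a row in ProductDetails, so the main table carries no empty fields for the rest.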

Column Data Types

You need to choose an appropriate data type for each column. Commonly used data types include: integers,
floating-point numbers, strings (or text), date/time, binary, and collections (such as enumeration and set).
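As a rough illustration, the sketch below declares one column for several of these types, assuming Python's sqlite3 module. The column names price and dateAdded are hypothetical. SQLite has no dedicated date/time type, so the date is stored here as ISO-8601 text; other RDBMSs typically offer DATE or DATETIME types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    " productID INTEGER PRIMARY KEY,"  # integer
    " name      TEXT,"                 # string (text)
    " price     REAL,"                 # floating-point number
    " dateAdded TEXT,"                 # date kept as ISO-8601 text in SQLite
    " image     BLOB)"                 # binary data
)
conn.execute(
    "INSERT INTO Products VALUES (?, ?, ?, ?, ?)",
    (1, "Pen", 1.25, "2024-01-31", b"\x89PNG"),
)

# typeof() reports how SQLite actually stored each value.
stored_types = conn.execute(
    "SELECT typeof(productID), typeof(name), typeof(price),"
    " typeof(dateAdded), typeof(image) FROM Products"
).fetchone()
print(stored_types)
```

The available type names and their exact behaviour differ between RDBMSs, so check the documentation of the system you are using.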

Step 4: Refine & Normalize the Design

For example, you might refine the design by:

• adding more columns,
• creating a new table for optional data using a one-to-one relationship,
• splitting a large table into two smaller tables,
• others.
