Contents

Articles
Database
Entity-relationship model
Database design
Relational database
Relational model
Binary relation
Database normalization
First normal form
Second normal form
Third normal form
Boyce-Codd normal form
Fourth normal form
Fifth normal form
Domain/key normal form
Sixth normal form
Relation (database)
Functional dependency
Multivalued dependency
Join dependency
Concurrency control

References
Article Sources and Contributors
Image Sources, Licenses and Contributors

Article Licenses
License

Database
A database is an organized collection of data. The data are typically organized to model relevant aspects of reality
(for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for
example, finding a hotel with vacancies).
The term database is correctly applied to the data and their supporting data structures, and not to the database
management system (DBMS). Together, the database data collection and its DBMS are called a database system.
The term database system implies that the data are managed to some level of quality (measured in terms of accuracy,
availability, usability, and resilience) and this in turn often implies the use of a general-purpose database
management system (DBMS).[1] A general-purpose DBMS is typically a complex software system that meets many
usage requirements to properly maintain its databases which are often large and complex. The utilization of
databases is now so widespread that virtually every technology and product relies on databases and DBMSs for its
development and commercialization, or even may have DBMS software embedded in it. Also, organizations and
companies, from small to large, depend heavily on databases for their operations.
Well known DBMSs include FoxPro, IBM DB2, Linter, Microsoft Access, Microsoft SQL Server, MySQL, Oracle,
PostgreSQL and SQLite. A database is not generally portable across different DBMSs, but different DBMSs can
inter-operate to some degree by using standards like SQL and ODBC together to support a single application built
over more than one database. A DBMS also needs to provide effective run-time execution (e.g., in terms of
performance, availability, and security) for as many database end-users as needed.
A way to classify databases involves the type of their contents, for example: bibliographic, document-text, statistical,
or multimedia objects. Another way is by their application area, for example: accounting, music compositions,
movies, banking, manufacturing, or insurance.
The term database may be narrowed to specify particular aspects of an organized collection of data; it may refer to
the logical database, to the physical database as data content in computer data storage, or to many other database
sub-definitions.

History
Database concept
The database concept has evolved since the 1960s to ease increasing difficulties in designing, building, and
maintaining complex information systems (typically with many concurrent end-users, and with a large amount of
diverse data). It has evolved together with database management systems which enable the effective handling of
databases. Though the terms database and DBMS define different entities, they are inseparable: a database's
properties are determined by its supporting DBMS. The Oxford English Dictionary cites a 1962 technical report as
the first to use the term "data-base." With the progress in technology in the areas of processors, computer memory,
computer storage and computer networks, the sizes, capabilities, and performance of databases and their respective
DBMSs have grown by orders of magnitude. For decades it has been unlikely that a complex information system
could be built effectively without a proper database supported by a DBMS.
No widely accepted exact definition exists for DBMS. However, a system needs to provide considerable
functionality to qualify as a DBMS. Accordingly, its supported data collection needs to meet respective usability
requirements (broadly defined by the requirements below) to qualify as a database. Thus, a database and its
supporting DBMS are defined here by a set of general requirements listed below. Virtually all existing mature
DBMS products meet these requirements to a great extent, while less mature products either meet them or are
converging toward meeting them.

Evolution of database and DBMS technology


See also Database management system#History
The introduction of the term database coincided with the availability of direct-access storage (disks and drums) from
the mid-1960s onwards. The term represented a contrast with the tape-based systems of the past, allowing shared
interactive use rather than daily batch processing.
In the earliest database systems, efficiency was perhaps the primary concern, but it was already recognized that there
were other important objectives. One of the key aims was to make the data independent of the logic of application
programs, so that the same data could be made available to different applications.
The first generation of database systems was navigational:[2] applications typically accessed data by following
pointers from one record to another. The two main data models at this time were the hierarchical model, epitomized
by IBM's IMS system, and the Codasyl model (Network model), implemented in a number of products such as
IDMS.
The relational model, first proposed in 1970 by Edgar F. Codd, departed from this tradition by insisting that
applications should search for data by content, rather than by following links. This was considered necessary to
allow the content of the database to evolve without constant rewriting of applications. Relational systems placed
heavy demands on processing resources, and it was not until the mid-1980s that computing hardware became
powerful enough to allow them to be widely deployed. By the early 1990s, however, relational systems were
dominant for all large-scale data processing applications, and they remain dominant today (2012) except in niche
areas. The dominant database language is the standard SQL for the Relational model, which has influenced database
languages also for other data models.
Because the relational model emphasizes search rather than navigation, it does not make relationships between
different entities explicit in the form of pointers, but represents them rather using primary keys and foreign keys.
While this is a good basis for a query language, it is less well suited as a modeling language. For this reason a
different model, the entity-relationship model, which emerged shortly afterwards (1976), gained popularity for database
design.
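
As a minimal illustration of key-based (rather than pointer-based) relationships, consider the following SQL sketch; the tables, columns, and values are invented for the example. The foreign key records the relationship as a data value, and related rows are found by matching content:

    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY,
        name    VARCHAR(50)
    );

    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    VARCHAR(50),
        dept_id INTEGER REFERENCES department (dept_id)  -- foreign key, not a pointer
    );

    -- Related rows are retrieved by matching key values, not by navigation:
    SELECT e.name
    FROM   employee e
    JOIN   department d ON e.dept_id = d.dept_id
    WHERE  d.name = 'Accounting';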
In the period since the 1970s database technology has kept pace with the increasing resources becoming available
from the computing platform: notably the rapid increase in the capacity and speed (and reduction in price) of disk
storage, and the increasing capacity of main memory. This has enabled ever larger databases and higher throughputs
to be achieved.
The rigidity of the relational model, in which all data are held in tables with a fixed structure of rows and columns,
has increasingly been seen as a limitation when handling information that is richer or more varied in structure than
the traditional 'ledger-book' data of corporate information systems: for example, document databases, engineering
databases, multimedia databases, or databases used in the molecular sciences. Various attempts have been made to
address this problem, many of them gathering under banners such as post-relational or NoSQL. Two developments
of note are the object database and the XML database. The vendors of relational databases have fought off
competition from these newer models by extending the capabilities of their own products to support a wider variety
of data types.

General-purpose DBMS
A DBMS has evolved into a complex software system, and its development typically requires thousands of
person-years of effort. Some general-purpose DBMSs, like Oracle, Microsoft SQL Server, FoxPro, and
IBM DB2, have been undergoing upgrades for thirty years or more. General-purpose DBMSs aim to satisfy as many
applications as possible, which typically makes them even more complex than special-purpose systems. However,
the fact that they can be used "off the shelf", as well as their cost amortized over many applications and instances,
makes them an attractive alternative (versus one-time development) whenever they meet an application's requirements.
Though attractive in many cases, a general-purpose DBMS is not always the optimal solution: when certain
applications are pervasive, with many operating instances each serving many users, a general-purpose DBMS may
introduce unnecessary overhead and too large a "footprint" (a large amount of unnecessary, unused software
code). Such applications usually justify dedicated development. A typical example is an email system: although it
needs certain DBMS properties, it is built to optimize the handling of email messages and does not need
significant portions of general-purpose DBMS functionality.
Types of people involved
Three types of people are involved with a general-purpose DBMS:
1. DBMS developers - These are the people who design and build the DBMS product, and the only ones who touch
its code. They are typically the employees of a DBMS vendor (e.g., Oracle, IBM, Microsoft, Sybase), or, in the
case of Open source DBMSs (e.g., MySQL), volunteers or people supported by interested companies and
organizations. They are typically skilled systems programmers. DBMS development is a complicated task, and
some of the popular DBMSs have been under development and enhancement (also to follow progress in
technology) for decades.
2. Application developers and database administrators - These are the people who design and build a
database-based application that uses the DBMS: database administrators design the needed database and
maintain it, while application developers write the application programs that make up the application.
Both are well familiar with the DBMS product and use its user interfaces (as well as usually other tools) for their
work. Sometimes the application itself is packaged and sold as a separate product, which may include the DBMS
inside (see embedded database; subject to proper DBMS licensing), or sold separately as an add-on to the DBMS.
3. Application end-users (e.g., accountants, insurance agents, medical doctors) - These people know the
application and its end-user interfaces, but need not know or understand the underlying DBMS. Thus, though
they are the intended and main beneficiaries of a DBMS, they are only indirectly involved with it.
Database machines and appliances
In the 1970s and 1980s attempts were made to build database systems with integrated hardware and software. The
underlying philosophy was that such integration would provide higher performance at lower cost. Examples were
IBM System/38, the early offering of Teradata, and the Britton Lee, Inc. database machine. Another approach to
hardware support for database management was ICL's CAFS accelerator, a hardware disk controller with
programmable search capabilities. In the long term these efforts were generally unsuccessful because specialized
database machines could not keep pace with the rapid development and progress of general-purpose computers. Thus
most database systems nowadays are software systems running on general-purpose hardware, using general-purpose
computer data storage. However, this idea is still pursued for certain applications by some companies like Netezza
and Oracle (Exadata).


Database research
Database research has been an active and diverse area, with many specializations, carried out since the early days of
the database concept in the 1960s. It has strong ties with database technology and DBMS products.
Database research has taken place at the research and development groups of companies (notably at IBM Research,
which contributed technologies and ideas to virtually every DBMS existing today), at research institutes, and in academia.
Research has been done both through theory and through prototypes. The interaction between research and database-related
product development has been very productive for the database area, and many related key concepts and technologies
have emerged from it. Notable are the Relational and Entity-relationship models, the atomic transaction concept and
related concurrency control techniques, query languages and query optimization methods, RAID, and more.
Research has provided deep insight into virtually all aspects of databases, though it has not always been pragmatic or
effective (nor can or should it always be: research is exploratory in nature and does not always lead to accepted or
useful ideas). Ultimately market forces and real needs determine which problem solutions and related
technologies are selected, including among those proposed by research. However, occasionally the winning solution is
not the best or most elegant one (SQL is often cited as an example). Throughout their history, DBMSs and their databases
have to a great extent been the outcome of such research, while real product requirements and challenges have driven
database research directions and sub-areas.
The database research area has several notable dedicated academic journals (e.g., ACM Transactions on Database
Systems-TODS, Data and Knowledge Engineering-DKE, and more) and annual conferences (e.g., ACM SIGMOD,
ACM PODS, VLDB, IEEE ICDE, and more), as well as an active and quite heterogeneous (subject-wise) research
community all over the world.

Database type examples


The following are examples of various database types. Some of them are not mainstream types, but most of them
have received special attention (e.g., in research) due to end-user requirements. Some exist as specialized DBMS
products, and some have their functionality incorporated in existing general-purpose DBMSs. Though they may
differ in nature and functionality, these various types typically have to meet the usability requirements below
to qualify as databases.
Active database
An active database is a database that includes an event-driven architecture which can respond to
conditions both inside and outside the database. Possible uses include security monitoring, alerting,
statistics gathering and authorization.
Most modern relational databases include active database features in the form of database triggers.
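
A hedged sketch of such a trigger, in SQLite-flavored syntax (trigger syntax varies across DBMSs; the accounts and audit_log tables are hypothetical):

    -- Record every balance decrease in an audit table:
    CREATE TRIGGER log_withdrawal
    AFTER UPDATE ON accounts
    FOR EACH ROW
    WHEN new.balance < old.balance
    BEGIN
        INSERT INTO audit_log (account_id, changed_at)
        VALUES (new.account_id, CURRENT_TIMESTAMP);
    END;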
Cloud database
A cloud database is a database that relies on cloud technology. Both the database and most of its
DBMS reside remotely, "in the cloud," while its applications are developed by programmers and
later maintained and utilized by the application's end-users through a web browser and open APIs. More
and more such database products are emerging, both from new vendors and from virtually all established
database vendors.
Data warehouse
Data warehouses archive data from operational databases and often from external sources such as
market research firms. Often operational data undergo transformation on their way into the warehouse,
getting summarized, anonymized, reclassified, etc. The warehouse becomes the central source of data
for use by managers and other end-users who may not have access to operational data. For example,
sales data might be aggregated to weekly totals and converted from internal product codes to use UPCs
so that they can be compared with ACNielsen data. Some basic and essential components of data
warehousing include retrieving, analyzing, and mining data, and transforming, loading, and managing data so
as to make them available for further use.
Operations in a data warehouse are typically concerned with bulk data manipulation, and as such, it is
unusual and inefficient to target individual rows for update, insert or delete. Bulk native loaders for
input data and bulk SQL passes for aggregation are the norm.
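
For instance, the weekly aggregation described above might be expressed roughly as follows in PostgreSQL-flavored SQL; the sales table and its columns are invented for the illustration:

    -- Summarize detailed sales rows into weekly totals per product (UPC):
    SELECT product_upc,
           DATE_TRUNC('week', sale_date) AS sale_week,
           SUM(quantity)                 AS units_sold,
           SUM(amount)                   AS revenue
    FROM   sales
    GROUP  BY product_upc, DATE_TRUNC('week', sale_date);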

Distributed database
The definition of a distributed database is broad, and it may be used with different meanings. In general it
typically refers to a modular DBMS architecture that allows distinct DBMS instances to cooperate as a
single DBMS over processes, computers, and sites, while managing a single database that is itself distributed
over multiple computers and sites.
Examples are databases of local work-groups and departments at regional offices, branch offices,
manufacturing plants and other work sites. These databases can include both segments shared by
multiple sites, and segments specific to one site and used only locally in that site.
Document-oriented database
A document-oriented database is a computer program designed for storing, retrieving, and managing
document-oriented, or semi-structured, information. Document-oriented databases are one of the
main categories of so-called NoSQL databases, and the popularity of the term "document-oriented
database" (or "document store") has grown with the use of the term NoSQL itself.
Such databases are used to conveniently store, manage, edit and retrieve documents.
Embedded database
An embedded database system is a DBMS that is tightly integrated with application software requiring
access to stored data, in such a way that the DBMS is hidden from the application's end-users and
requires little or no ongoing maintenance. It is actually a broad technology category that includes
DBMSs with differing properties and target markets. The term "embedded database" can be confusing
because only a small subset of embedded database products is used in real-time embedded systems such
as telecommunications switches and consumer electronics devices.[3]
End-user database
These databases consist of data developed by individual end-users. Examples are collections of
documents, spreadsheets, presentations, multimedia, and other files. Several products exist to support
such databases. Some of them are much simpler than full-fledged DBMSs, with more elementary DBMS
functionality (e.g., not supporting multiple concurrent end-users on the same database), with basic
programming interfaces, and with a relatively small "footprint" (not much code to run, compared with
"regular" general-purpose DBMSs). However, available general-purpose DBMSs can often also be used for
this purpose, if they provide basic user interfaces for straightforward database applications (limited query
and data display; no real programming needed), while still providing the database qualities and
protections that these DBMSs can offer.
Federated database and multi-database
A federated database is an integrated database that comprises several distinct databases, each with its
own DBMS. It is handled as a single database by a federated database management system (FDBMS),
which transparently integrates multiple autonomous DBMSs, possibly of different types (which makes it
a heterogeneous database), and provides them with an integrated conceptual view. The constituent
databases are interconnected via computer network, and may be geographically decentralized.
Sometimes the term multi-database is used as a synonym for federated database, though it may refer to a
less integrated group of databases (e.g., without an FDBMS and a managed integrated schema) that
cooperate in a single application. In this case distribution middleware is typically used, which
commonly includes an atomic commit protocol (ACP), e.g., the two-phase commit protocol, to allow
distributed (global) transactions (vs. local transactions confined to a single DBMS) across the
participating databases.
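
As one concrete illustration, PostgreSQL exposes two-phase commit to an external coordinator; a sketch of the statements a coordinator might issue on each participating database (the table, values, and transaction name are invented):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
    PREPARE TRANSACTION 'global_tx_42';  -- phase one: this participant promises it can commit

    -- later, once every participant has prepared successfully:
    COMMIT PREPARED 'global_tx_42';      -- phase two: make the change durable
    -- (or: ROLLBACK PREPARED 'global_tx_42'; if any participant failed to prepare)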

Graph database
A graph database is a kind of NoSQL database that uses graph structures with nodes, edges, and
properties to represent and store information. General graph databases that can store any graph are
distinct from specialized graph databases such as triplestores and network databases.
Hypermedia databases
The World Wide Web can be thought of as a database, albeit one spread across millions of independent
computing systems. Web browsers "process" these data one page at a time, while web crawlers and
other software provide the equivalent of database indexes to support search and other activities.
Hypertext database
In a Hypertext database, any word or a piece of text representing an object, e.g., another piece of text, an
article, a picture, or a film, can be linked to that object. Hypertext databases are particularly useful for
organizing large amounts of disparate information. For example, they are useful for organizing online
encyclopedias, where users can conveniently jump between texts, in a controlled way, by using hyperlinks.
In-memory database
An in-memory database (IMDB; also main memory database or MMDB) is a database that primarily
resides in main memory, but is typically backed up by non-volatile computer data storage. Main memory
databases are faster than disk-based databases. Accessing data in memory reduces the I/O reading activity
when, for example, querying the data. In applications where response time is critical, such as
telecommunications network equipment, main memory databases are often used.[4]
Knowledge base
A knowledge base (abbreviated KB or kb)[5][6] is a special kind of database for knowledge
management, providing the means for the computerized collection, organization, and retrieval of
knowledge. It may also be a collection of data representing problems with their solutions and related experiences.
Operational database
These databases store detailed data about the operations of an organization. They are typically organized
by subject matter and process relatively high volumes of updates using transactions. Essentially every major
organization on earth uses such databases. Examples include customer databases that record contact,
credit, and demographic information about a business's customers; personnel databases that hold
information such as salary, benefits, and skills data about employees; enterprise resource planning systems
that record details about product components and parts inventory; and financial databases that keep track of the
organization's money, accounting, and financial dealings.
Parallel database
A parallel database, run by a parallel DBMS, seeks to improve performance through parallelization for
tasks such as loading data, building indexes and evaluating queries. Parallel databases improve
processing and input/output speeds by using multiple central processing units (CPUs) (including
multi-core processors) and storage in parallel. In parallel processing, many operations are performed
simultaneously, as opposed to serial, sequential processing, where operations are performed with no
time overlap.
The major parallel DBMS architectures (which are induced by the underlying hardware architecture) are:
Shared memory architecture, where multiple processors share the main memory space, as well as other
data storage.
Shared disk architecture, where each processing unit (typically consisting of multiple processors) has its
own main memory, but all units share the other storage.
Shared nothing architecture, where each processing unit has its own main memory and other storage.

Real-time database
A real-time database is one in which the DBMS guarantees to respond to users' requests within a given time period.
Spatial database
A spatial database can store data with multidimensional features. Queries on such data include location-based
queries, like "Where is the closest hotel in my area?".
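
Such a nearest-neighbor query might look roughly as follows in PostGIS-flavored SQL; the hotels table, its geom column, and the search coordinates are invented for the example:

    -- Find the hotel nearest to a given point (the <-> operator orders by distance):
    SELECT name
    FROM   hotels
    ORDER  BY geom <-> ST_SetSRID(ST_MakePoint(-73.98, 40.75), 4326)
    LIMIT  1;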
Temporal database
A temporal database is a database with built-in time aspects, for example a temporal data model and a temporal
version of Structured Query Language (SQL). More specifically the temporal aspects usually include valid-time and
transaction-time.
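
A hand-rolled sketch of valid-time using plain columns (the table and column names are invented; SQL:2011 and some DBMSs offer built-in temporal support instead):

    CREATE TABLE employee_salary (
        emp_id     INTEGER,
        salary     DECIMAL(10,2),
        valid_from DATE,  -- when this salary became effective
        valid_to   DATE   -- when it stopped being effective
    );

    -- What was employee 7's salary on 2010-06-15?
    SELECT salary
    FROM   employee_salary
    WHERE  emp_id = 7
      AND  valid_from <= DATE '2010-06-15'
      AND  valid_to   >  DATE '2010-06-15';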
Unstructured-data database
An unstructured-data database is intended to store, in a manageable and protected way, diverse objects
that do not fit naturally and conveniently in common databases. It may include email messages,
documents, journals, multimedia objects etc. The name may be misleading since some objects can be
highly structured. However, the entire possible object collection does not fit into a predefined structured
framework. Most established DBMSs now support unstructured data in various ways, and new dedicated
DBMSs are emerging.

Major database usage requirements


The major purpose of a database is to provide the information system (in its broadest sense) that utilizes it with the
information that system needs according to its own requirements. A certain broad set of requirements refines this
general goal. These database requirements translate into requirements for the respective DBMS, to allow conveniently
building a proper database for the given application. If this goal is met by a DBMS, then the designers and builders
of the specific database can concentrate on the application's aspects rather than deal with building and maintaining the
underlying DBMS. Also, since a DBMS is complex and expensive to build and maintain, it is not economical to
build such a new tool (DBMS) for every application. Rather, it is desirable to provide a flexible tool that can handle
databases for as many applications as possible, i.e., a general-purpose DBMS.

Functional requirements
Certain general functional requirements need to be met in conjunction with a database. They describe what needs
to be defined in a database for any specific application.
The database's structure must be defined. The database needs to be based on a data model that is sufficiently rich to
describe in the database all the needed aspects of the respective application. Data definition languages exist to describe
databases within a data model. Such languages are typically data-model-specific.
A database's data model needs to be supported by a sufficiently rich data manipulation language to allow database
manipulations, and for information to be generated from the data. Such a language is typically data-model-specific.
A database needs built-in security means to protect its content (and users) from unauthorized users (either
humans or programs). Protection is also needed from types of unintentional breach. Security types and levels
should be defined by the database owners.
Manipulating database data often involves processes of several interdependent steps at different times, e.g., when
interactions of different people are involved, as in generating an insurance policy. Data manipulation languages are
typically intended to describe what is needed in a single such step; dealing with multiple steps typically requires
writing quite complex programs. Most applications are programmed using common programming languages and
software development tools. However, the area of process description has evolved within the frameworks of workflow and
business processes, with supporting languages and software packages that considerably simplify the tasks.
Traditionally these frameworks have been outside the scope of common DBMSs, but their utilization has become
commonplace, and they are often provided as add-ons to DBMSs.

Operational requirements
Operational requirements must be met by a database in order to effectively support an application when it is
operational. Though it may be expected that operational requirements are automatically met by a DBMS, in
fact this is not so in most cases: substantial design and tuning work by database administrators is typically needed
to meet them. This is typically done through specific instructions/operations via special database user interfaces and
tools, and thus may be viewed as a set of secondary functional requirements (which are no less important than the primary ones).
Availability
A DB should maintain needed levels of availability, i.e., the DB needs to be available in such a way that a user's action
does not have to wait beyond a certain time range before it starts executing on the DB. Availability also relates to
failure and recovery (see Recovery from failure and disaster below): upon failure and during recovery,
normal availability changes, and special measures are needed to satisfy availability requirements.
Performance
Users' actions upon the DB should be executed within needed time ranges.
Isolation between users
When multiple users access the database concurrently, the actions of one user should be uninterrupted and unaffected by
the actions of other users. These concurrent actions should maintain the DB's consistency (i.e., keep the DB from
corruption).
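
DBMSs commonly provide this isolation through transactions; a minimal sketch in standard-flavored SQL (the accounts table and values are hypothetical, and exact syntax and concurrency behavior vary by DBMS):

    START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
    COMMIT;  -- other users see either both updates or neither, never a half-done transfer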
Recovery from failure and disaster
All computer systems, including DBMSs, are prone to failures for many reasons (both software- and hardware-related).
Failures typically corrupt the DB, often to the extent that it is impossible to repair it without special
measures. The DBMS should provide automatic recovery-from-failure procedures that repair the DB and return it to
a well-defined state.
Backup and restore
Sometimes it is desired to bring a database back to a previous state (for many reasons, e.g., cases when the database
is found corrupted due to a software error, or if it has been updated with erroneous data). To achieve this a backup
operation is done occasionally or continuously, where each desired database state (i.e., the values of its data and their
embedding in database's data structures) is kept within dedicated backup files (many techniques exist to do this
effectively). When a previous state is needed, i.e., when a database administrator decides to bring the database back
to that state (e.g., by specifying a desired point in time when the database was in that state), these files
are utilized to restore it.
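
Backup and restore syntax is vendor-specific; as one illustration, SQL Server-style (T-SQL) commands for a hypothetical database and path might look like:

    BACKUP DATABASE SalesDB TO DISK = 'D:\backups\SalesDB.bak';
    -- later, to return the database to the backed-up state:
    RESTORE DATABASE SalesDB FROM DISK = 'D:\backups\SalesDB.bak';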

Data independence
Data independence pertains to a database's life cycle (see Database building, maintaining, and tuning below). It
strongly impacts the convenience and cost of maintaining an application and its database, and has been the major
motivation for the emergence and success of the Relational model, as well as the convergence to a common database
architecture. In general, the term "data independence" means that changes in the database's structure do not require
changes in its application's computer programs, and that changes in the database at a certain architectural level (see
below) do not affect the database's levels above. Data independence is achieved to a great extent in contemporary
DBMSs, but it is not completely attainable and is achieved to different degrees for different types of database
structural changes.

Major database functional areas


The functional areas are domains and subjects that have evolved in order to provide proper answers and solutions to
the functional requirements above.

Data models
A data model is an abstract structure that provides the means to effectively describe the specific data structures needed
to model an application. As such, a data model needs sufficient expressive power to capture the needed aspects of
applications. These applications are often typical of commercial companies and other organizations (e.g.,
manufacturing, human resources, stock, banking). For effective utilization and handling, it is desirable that a data
model be relatively simple and intuitive. This may conflict with the high expressive power needed to deal with
certain complex applications. Thus any popular general-purpose data model usually strikes a balance between being
intuitive and relatively simple, and being complex with high expressive power. The application's semantics is usually
not explicitly expressed in the model, but rather is implicit (and detailed by documentation external to the model) and
hinted at by the names of data item types (e.g., "part-number") and their connections (as expressed by the generic data
structure types provided by each specific model).
Early data models
These models were popular in the 1960s and 1970s, but nowadays can be found primarily in old legacy systems. They
are characterized primarily by being navigational, with strong connections between their logical and physical
representations, and by deficiencies in data independence.
Hierarchical model
In the Hierarchical model different record types (representing real-world entities) are embedded in a predefined
hierarchical (tree-like) structure. This hierarchy is used as the physical order of records in storage. Record access is
done by navigating through the data structure using pointers combined with sequential accessing.
This model has been supported primarily by the IBM IMS DBMS, one of the earliest DBMSs. Various limitations of
the model have been compensated for in later IMS versions by additional logical hierarchies imposed on the base
physical hierarchy.
Network model
In this model a hierarchical relationship between two record types (representing real-world entities) is established by
the set construct. A set consists of circular linked lists where one record type, the set owner or parent, appears once
in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this
way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time
another set may be defined where B is the owner of A. Thus all the sets comprise a general directed graph
(ownership defines a direction), or network construct. Access to records is either sequential (usually in each record
type) or by navigation in the circular linked lists.
This model is more general and powerful than the hierarchical model, and was the most popular before being replaced
by the relational model. It was standardized by CODASYL. Popular DBMS products that utilized it were
Cincom Systems' Total and Cullinet's IDMS. IDMS gained a considerable customer base and is still in use and
supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.
Inverted file model
An inverted file or inverted index of a first file, by a field in this file (the inversion field), is a second file in which
this field is the key. A record in the second file includes a key and pointers to records in the first file where the
inversion field has the value of the key. This is also the logical structure of contemporary database indexes. The
related Inverted file data model utilizes inverted files of primary database files to efficiently directly access needed
records in these files.
Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970. ADABAS gained a
considerable customer base and is still in use and supported today. In the 1980s it adopted the relational
model and SQL in addition to its original tools and languages.
Relational model
The relational model is a simple model that provides flexibility. It organizes data based on two-dimensional arrays
known as relations, or tables as related to databases. These relations consist of a heading and a set of zero or more
tuples in arbitrary order. The heading is an unordered set of zero or more attributes, or columns of the table. The
tuples are a set of unique attributes mapped to values, or the rows of data in the table. Data can be associated across
multiple tables with a key. A key is a single attribute, or a set of attributes, that is common to both tables. The most
common language associated with the relational model is the Structured Query Language (SQL), though SQL
deviates from the relational model in some respects.
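
A minimal sketch of a relation defined and populated in SQL (all names and values are invented for the example):

    -- The heading: a set of attributes (columns).
    CREATE TABLE book (
        isbn  CHAR(13) PRIMARY KEY,  -- key attribute, unique per tuple
        title VARCHAR(200),
        year  INTEGER
    );

    -- One tuple (row): attributes mapped to values.
    INSERT INTO book (isbn, title, year)
    VALUES ('0000000000001', 'Example Title', 2001);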
Object model
In recent years, the object-oriented paradigm has been applied in areas such as engineering and spatial databases,
telecommunications and in various scientific domains. The conglomeration of object oriented programming and
database technology led to this new kind of database. These databases attempt to bring the database world and the
application-programming world closer together, in particular by ensuring that the database uses the same type system
as the application program. This aims to avoid the overhead (sometimes referred to as the impedance mismatch) of
converting information between its representation in the database (for example as rows in tables) and its
representation in the application program (typically as objects). At the same time, object databases attempt to
introduce key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.
A variety of ways have been tried for storing objects in a database. Some products have approached the
problem from the application-programming side, by making the objects manipulated by the program persistent. This
also typically requires the addition of some kind of query language, since conventional programming languages do
not provide language-level functionality for finding objects based on their information content. Others have attacked
the problem from the database end, by defining an object-oriented data model for the database, and defining a
database programming language that allows full programming capabilities as well as traditional query facilities.


Other database models


Products offering a more general data model than the relational model are sometimes classified as post-relational.[7]
Alternate terms include "hybrid database", "Object-enhanced RDBMS" and others. The data model in such products
incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that
all information in the database must be cast explicitly in terms of values in relations and in no other way.[8]
Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational
model. For example, they allow representation of a directed graph with trees on the nodes. The German company
sones implements this concept in its GraphDB.
Some post-relational products extend relational systems with non-relational features. Others arrived in much the
same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are
historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational.
The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.[9]

Database languages
Database languages are dedicated programming languages, tailored and utilized to
define a database (i.e., its specific data types and the relationships among them),
manipulate its content (e.g., insert new data occurrences, and update or delete existing ones), and
query it (i.e., request information: compute and retrieve any information based on its data).
Database languages are data-model-specific, i.e., each language assumes and is based on a certain structure of the
data (which typically differs among different data models). They typically have commands to instruct execution of
the desired operations in the database. Each such command is equivalent to a complex expression (program) in a
regular programming language, and thus programming in dedicated (database) languages simplifies the task of
handling databases considerably. An expression in a database language is automatically transformed (by a compiler
or interpreter, as with regular programming languages) into a proper computer program that runs while accessing the
database and provides the needed results. The following are notable examples:
SQL for the Relational model
A major Relational model language supported by all the relational DBMSs and a standard.
SQL was one of the first commercial languages for the relational model. Despite not adhering to the relational model
as described by Codd, it has become the most widely used database language.[10][11] Though often described as a
declarative language (and to a great extent it is one), SQL also includes procedural elements. SQL became a standard of the
American National Standards Institute (ANSI) in 1986, and of the International Organization for Standardization (ISO) in
1987. Since then the standard has been enhanced several times with added features. However, issues of SQL code
portability between major RDBMS products still exist due to lack of full compliance with, or different interpretations
of, the standard. Among the reasons mentioned are the large size and incomplete specification of the standard, as
well as vendor lock-in.
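
As a sketch of the three kinds of operations listed above, in standard-flavored SQL (names and values invented for the example):

    -- Definition: declare a structure within the relational data model.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(100)
    );

    -- Manipulation: insert, update, or delete data occurrences.
    INSERT INTO customer (customer_id, name) VALUES (1, 'Ada');
    UPDATE customer SET name = 'Ada L.' WHERE customer_id = 1;

    -- Query: compute and retrieve information based on the data.
    SELECT name FROM customer WHERE customer_id = 1;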

OQL for the Object model
An object model language standard (by the Object Data Management Group) that has influenced the design of some
of the newer query languages like JDOQL and EJB QL, though they cannot be considered different flavors of
OQL.
XQuery for the XML model
XQuery is an XML based database language (also named XQL). SQL/XML combines XQuery and XML with
SQL.[12]

Database architecture
Database architecture (to be distinguished from DBMS architecture; see below) may be viewed, to some extent, as
an extension of data modeling. It is used to conveniently answer the requirements of different end-users on the same
database, as well as for other benefits. For example, a financial department of a company needs the payment details
of all employees as part of the company's expenses, but not the many other details about employees that are of interest
to the human resources department. Thus different departments need different views of the company's database; each
view includes the employees' payments, possibly at a different level of detail (and presented in different visual forms).
To meet such requirements effectively, database architecture consists of three levels: external, conceptual and
internal. Clearly separating the three levels was a major feature of the relational database model implementations
that dominate 21st century databases.[13]
The external level defines how each end-user type understands the organization of its respective relevant data in
the database, i.e., the different needed end-user views (see the SQL sketch after this list). A single database can have
any number of views at the external level.
The conceptual level unifies the various external views into a coherent whole, global view.[13] It provides the
common denominator of all the external views. It comprises all the generic data needed by end-users, i.e., all the data
from which any view may be derived/computed. It is provided in the simplest possible form of such generic data
and comprises the backbone of the database. It is outside the scope of the various database end-users; it serves
database application developers and is defined by the database administrators who build the database.
The internal level (or physical level) is in fact part of the database's implementation inside a DBMS
(see the Implementation section below). It is concerned with cost, performance, scalability and other operational
matters. It deals with the storage layout of the conceptual level, provides supporting storage structures like indexes to
enhance performance, and occasionally stores data of individual views (materialized views), computed from
generic data, if performance justification exists for such redundancy. It balances all the external views'
performance requirements, which may conflict, in an attempt to optimize the overall database usage by all its
end-users according to the database goals and priorities.
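
A minimal SQL sketch of the external/conceptual distinction, with hypothetical names; the base table belongs to the conceptual level, while each view serves a particular end-user group at the external level:

    -- Conceptual level: generic employee data.
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    VARCHAR(100),
        salary  DECIMAL(10,2),
        address VARCHAR(200)
    );

    -- External level: the financial department sees only payment details.
    CREATE VIEW employee_payments AS
        SELECT emp_id, salary FROM employee;

    -- External level: human resources sees personal details instead.
    CREATE VIEW employee_contacts AS
        SELECT emp_id, name, address FROM employee;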
All the three levels are maintained and updated according to changing needs by database administrators who often
also participate in the database design.
The above three-level database architecture also relates to, and is motivated by, the concept of data independence,
which has long been described as a desired database property and was one of the major initial driving forces
of the relational model. In the context of this architecture it means that changes made at a certain level do not
affect definitions and software developed against higher-level interfaces, and are incorporated at the higher levels
automatically. For example, changes in the internal level do not affect application programs written using
conceptual-level interfaces, which saves substantial change work that would otherwise be needed.
In summary, the conceptual level is a level of indirection between the internal and external levels. On one hand it
provides a common view of the database, independent of the different external view structures, and on the other hand
it is uncomplicated by details of how the data are stored or managed (the internal level). In principle every level, and
even every external view, can be presented in a different data model. In practice a given DBMS usually uses the same
data model for both the external and the conceptual levels (e.g., the relational model). The internal level, which is hidden
inside the DBMS and depends on its implementation (see the Implementation section below), requires a different level
of detail and uses its own data structure types, typically different in nature from the structures of the external and
conceptual levels which are exposed to DBMS users (e.g., the data models above): while the external and
conceptual levels are focused on and serve DBMS users, the concern of the internal level is effective implementation
detail.

Database security
Database security deals with the various aspects of protecting the database content, its owners, and its users. It ranges
from protection against intentional unauthorized database uses to unintentional database access by unauthorized
entities (e.g., a person or a computer program).
The following are major areas of database security (among many others).
Access control
Database access control deals with controlling who (a person or a certain computer program) is allowed to access
what information in the database. The information may comprise specific database objects (e.g., record types,
specific records, data structures), certain computations over certain objects (e.g., query types, or specific queries), or
utilizing specific access paths to the former (e.g., using specific indexes or other data structures to access
information).
Database access controls are set by special personnel, authorized by the database owner, who use dedicated
protected security DBMS interfaces.
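
In SQL, access control is commonly expressed with GRANT and REVOKE statements; a sketch using hypothetical object and role names:

    -- Allow a payroll role to read and update payment data:
    GRANT SELECT, UPDATE ON employee_payments TO payroll_clerk;

    -- Allow a reporting user read-only access:
    GRANT SELECT ON employee_contacts TO reporting_user;

    -- Withdraw a previously granted privilege:
    REVOKE UPDATE ON employee_payments FROM payroll_clerk;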
Data security
The definition of data security varies and may overlap with other database security aspects. Broadly it deals with
protecting specific chunks of data, both physically (i.e., from corruption, destruction, or removal; see, e.g.,
physical security) and against the interpretation of them, or parts of them, into meaningful information (e.g.,
inferring specific valid credit-card numbers from the strings of bits they comprise; see, e.g., data encryption).
Database audit
Database audit primarily involves monitoring that no security breach, in any aspect, has happened. If a security breach
is discovered, all possible corrective actions are taken.

Database design
Database design is done before building a database, to meet the needs of end-users within a given
application/information system that the database is intended to support. The database design defines the needed data
and data structures that such a database comprises. A design is typically carried out according to the common three
architectural levels of a database (see Database architecture above). First, the conceptual level is designed; it defines
the overall picture/view of the database and reflects all the real-world elements (entities) the database intends to
model, as well as the relationships among them. On top of it, the external level, i.e., the various views of the database,
is designed according to the (possibly completely different) needs of specific end-user types. More external views can
be added later. External view requirements may modify the design of the conceptual level (i.e., add/remove entities and
relationships), but usually a well-designed conceptual level for an application supports most of the needed
external views well. The conceptual view also determines the internal level (which primarily deals with data layout in
storage) to a great extent. External view requirements may add supporting storage structures, like materialized views
and indexes, for enhanced performance. Typically the internal layer is optimized for top performance, in an averaged
way that takes into account the performance requirements (possibly conflicting) of different external views according to
their relative importance. While the conceptual- and external-level designs can usually be done independently of any
DBMS (DBMS-independent design software packages exist, possibly with interfaces to some specific popular
DBMSs), the internal-level design relies highly on the capabilities and internal data structures of the specific DBMS
utilized (see the Implementation section below).
A common way to carry out conceptual-level design is to use the entity-relationship model (ERM) (both the basic
model and its later enhancements), since it provides a straightforward, intuitive perception of
an application's elements and semantics. An alternative approach, which preceded the ERM, is to use the relational
model and the dependencies (mathematical relationships) among data to normalize the database, i.e., to define the
("optimal") relations (data record or tuple types) in the database. Though a large body of research exists for this
method, it is more complex, less intuitive, and not more effective than the ERM method. Thus normalization is less
utilized in practice than the ERM method.
The ERM may be less subtle than normalization in several aspects, but it captures the main needed dependencies
which are induced by keys/identifiers of entities and relationships. Also the ERM inherently includes the important
inclusion dependencies (i.e., an entity instance that does not exist (has not been explicitly inserted) cannot appear in
a relationship with other entities), which usually have been ignored in normalization.[14] In addition the ERM allows
entity-type generalization (the "is-a" relationship) and implied property (attribute) inheritance (similar to that
found in the object model).
Another aspect of database design is its security. It involves both defining access control to database objects (e.g.,
entities, views) and defining security levels and methods for the data themselves (see Database security
above).
Entities and relationships
The most common database design methods are based on the entity relationship model (ERM, or ER model). This
model views the world in a simplistic but very powerful way: It consists of "Entities" and the "Relationships" among
them. Accordingly a database consists of entity and relationship types, each with defined attributes (field types) that
model concrete entities and relationships. Modeling a database in this way typically yields an effective one with
desired properties (as in some normal forms; see normalization below). Such models can be translated to any other
data model required by any specific DBMS for building an effective database.
Database normalization
In the design of a relational database, the process of organizing database relations to minimize redundancy is called
normalization. The goal is to produce well-structured relations so that additions, deletions, and modifications of a
field can be made in just one relation (table) without worrying about appearance and update of the same field in
other relations. The process is algorithmic and based on dependencies (mathematical relationships) that exist among
the relations' field types. The result of the process is to bring the database relations into a certain "normal form". Several
normal forms exist, with different properties.
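
A tiny worked sketch with invented tables: a flat orders relation that repeats customer details in every row violates this goal, and splitting it removes the redundancy:

    -- Unnormalized: the customer's city is repeated in every order row,
    -- so changing it requires updating (and risks missing) many rows.
    CREATE TABLE orders_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER,
        customer_city VARCHAR(100),
        amount        DECIMAL(10,2)
    );

    -- Normalized: the city is stored exactly once per customer.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        city        VARCHAR(100)
    );

    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        amount      DECIMAL(10,2)
    );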

Database building, maintaining, and tuning


After a database is designed for an application comes the stage of building it. Typically an appropriate
general-purpose DBMS can be selected to be utilized for this purpose. A DBMS provides the needed user interfaces
to be utilized by database administrators to define the needed application's data structures within the DBMS's
respective data model. Other user interfaces are used to select needed DBMS parameters (like security-related and
storage allocation parameters, etc.).
When the database is ready (all its data structures and other needed components are defined), it is typically populated
with the application's initial data (database initialization, which is typically a distinct project; in many cases using
specialized DBMS interfaces that support bulk insertion) before being made operational. In some cases the database
becomes operational while still empty of application data, and data are accumulated during its operation.
After the database is built and made operational comes the database maintenance stage: various
database parameters may need changes and tuning for better performance, the application's data structures may be
changed or added to, new related application programs may be written to add to the application's functionality, etc.

Miscellaneous areas
Database migration between DBMSs
See also Database migration in Data migration
A database built with one DBMS is not portable to another DBMS (i.e., the other DBMS cannot run it). However, in
some situations it is desirable to move, or migrate, a database from one DBMS to another. The reasons are primarily
economical (different DBMSs may have different total costs of ownership (TCO)), functional, and operational
(different DBMSs may have different capabilities). The migration involves the database's transformation from one
DBMS type to another. The transformation should maintain (if possible) the database-related application (i.e., all
related application programs) intact. Thus, the database's conceptual and external architectural levels should be
maintained in the transformation. It may be desirable that some aspects of the internal architecture level are
maintained as well. A complex or large database migration may be a complicated and costly (one-time) project by itself,
which should be factored into the decision to migrate. This is so despite the fact that tools may exist to help migration
between specific DBMSs. Typically a DBMS vendor provides tools to help import databases from other popular
DBMSs.

Implementation: Database management systems


or: How database usage requirements are met
A database management system (DBMS) is a system that allows one to build and maintain databases, as well as to
utilize their data and retrieve information from them. A DBMS implements solutions (see Major database functional
areas above) to the database usage requirements above. It defines the database type that it supports, as well as its
functionality and operational capabilities. A DBMS provides the internal processes for external applications built on
it. The end-users of some such specific application are usually exposed only to that application and do not directly
interact with the DBMS. Thus end-users enjoy the effects of the underlying DBMS, but its internals are completely
invisible to end-users. Database designers and database administrators interact with the DBMS through dedicated
interfaces to build and maintain the applications' databases, and thus need more knowledge and understanding of how
DBMSs operate and of the DBMSs' external interfaces and tuning parameters.
A DBMS consists of software that operates databases, providing storage, access, security, backup and other facilities
to meet needed requirements. DBMSs can be categorized according to the database model(s) they support, such
as relational or XML; the type(s) of computer they support, such as a server cluster or a mobile phone; the query
language(s) that access the database, such as SQL or XQuery; performance trade-offs, such as maximum scale or
maximum speed; or other criteria. Some DBMSs cover more than one entry in these categories, e.g., supporting multiple
query languages. Database software typically supports the Open Database Connectivity (ODBC) standard, which
allows the database to integrate (to some extent) with other databases.
The development of a mature general-purpose DBMS typically takes several years and many person-years of effort.
DBMS developers typically update their product to follow and take advantage of progress in computer and storage
technologies. Several DBMS products, like Oracle and IBM DB2, have been in ongoing development since the
1970s-1980s. Since DBMSs constitute a significant market, computer and storage vendors often take
DBMS requirements into account in their own development plans.


DBMS architecture: major DBMS components


DBMS architecture specifies its components (including descriptions of their functions) and their interfaces. DBMS
architecture is distinct from database architecture. The following are major DBMS components:
DBMS external interfaces - These are the means of communication with the DBMS (in both directions, to and from
the DBMS), used to perform all the needed operations. They can be operations on a database, or operations
to operate and manage the DBMS itself. For example:
- Direct database operations: defining data types, assigning security levels, updating data, querying the
database, etc.
- Operations related to DBMS operation and management: backup and restore, database recovery,
security monitoring, database storage allocation and database layout configuration monitoring,
performance monitoring and tuning, etc.
An external interface can be either a user interface (e.g., typically for a database administrator), or an
application programming interface (API) used for communication between an application program and the
DBMS (a minimal API sketch in code follows this list of components).
Database language engines (or processors) - Most operations upon databases are performed through expressions
in database languages (see above). Languages exist for data definition, data manipulation and queries (e.g., SQL),
as well as for specifying various aspects of security, and more. Language expressions are fed into a DBMS
through proper interfaces. A language engine processes the language expressions (by a compiler or language
interpreter) to extract the intended database operations from the expression in a form that can be executed by
the DBMS.
Query optimizer - Performs query optimization on every query to choose the most efficient query plan (a
partial order (tree) of operations) to be executed to compute the query result.
Database engine - Performs the received database operations on the database objects, typically at their
higher-level representation.
Storage engine - Translates the operations into low-level operations on the storage bits. In some references the
storage engine is viewed as part of the database engine.
Transaction engine - For correctness and reliability purposes, most DBMS internal operations are performed
encapsulated in transactions (see below). Transactions can also be specified externally to the DBMS to
encapsulate a group of operations. The transaction engine tracks all the transactions and manages their execution
according to the transaction rules (e.g., proper concurrency control, and proper commit or abort for each).
DBMS management and operation component - Comprises many components that deal with all the DBMS
management and operational aspects like performance monitoring and tuning, backup and restore, recovery from
failure, security management and monitoring, database storage allocation and database storage layout monitoring,
etc.
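As an illustration of the API form of external interface, the following minimal sketch (in Python, against the embedded SQLite DBMS; the table and data are hypothetical) shows an application program performing data definition, data manipulation, and a query entirely through API calls:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # open a database through the API
    conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")  # data definition
    conn.execute("INSERT INTO staff VALUES (1, 'Ada')")                     # data manipulation
    for row in conn.execute("SELECT id, name FROM staff"):                  # query
        print(row)
    conn.close()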

Database storage
Database storage is the container of the physical materialization of a database. It comprises the Internal (physical)
level in the database architecture. It also contains all the information needed (e.g., metadata, "data about the data",
and internal data structures) to reconstruct the Conceptual level and External level from the Internal level when
needed. It is not part of the DBMS but rather manipulated by the DBMS (by its Storage engine; see above) to
manage the database that resides in it. Though typically accessed by a DBMS through the underlying Operating
system (and often utilizing the operating systems' File systems as intermediates for storage layout), storage
properties and configuration setting are extremely important for the efficient operation of the DBMS, and thus are
closely maintained by database administrators. A DBMS, while in operation, always has its database residing in
several types of storage (e.g., memory and external storage). The database data and the additional needed
information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that

Database
look completely different from the way the data look at the conceptual and external levels, but in ways that attempt
to optimize the reconstruction of these levels as well as possible when they are needed by users and programs, as well
as the computation of additional types of needed information from the data (e.g., when querying the database).
In principle the database storage can be viewed as a linear address space, where every bit of data has its unique
address in this address space. In practice, only a very small percentage of addresses are kept as initial reference points
(which also requires storage); most of the database data are accessed by indirection, using displacement
calculations (distance in bits from the reference points) and data structures that define access paths (using pointers)
to all needed data in an effective manner, optimized for the needed data access operations.
Data
Coding the data and Error-correcting codes
Data are encoded by assigning a bit pattern to each language alphabet character, digit, other numerical pattern,
and multimedia object. Many standards exist for encoding (e.g., ASCII, JPEG, MPEG-4).
By adding bits to each encoded unit, redundancy allows both the detection of errors in coded data and their correction
based on mathematical algorithms. Errors occur regularly, with low probabilities, due to random bit value flipping; due to
"physical bit fatigue," the loss of a physical bit's ability in storage to maintain a distinguishable value (0 or 1); or
due to errors in inter- or intra-computer communication. A random bit flip (e.g., due to random radiation) is
typically corrected upon detection. A malfunctioning physical bit, or group of bits (the specific
defective bit is not always known; the group's definition depends on the specific storage device), is typically automatically fenced out,
taken out of use by the device, and replaced with another functioning equivalent group in the device, where the
corrected bit values are restored (if possible). The cyclic redundancy check (CRC) method is typically used in
storage for error detection.
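As a small illustration of CRC-based detection, the following sketch uses the CRC-32 function from Python's standard zlib module; the stored block is hypothetical, and note that detection alone does not locate or correct the flipped bit:

    import zlib

    block = bytearray(b"database page contents")
    stored_crc = zlib.crc32(block)   # checksum kept alongside the stored block

    block[3] ^= 0x01                 # simulate a random bit flip in storage
    if zlib.crc32(block) != stored_crc:
        print("corruption detected") # correction would need extra redundancy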
Data compression
Data compression methods allow, in many cases, the representation of a string of bits by a shorter bit string
("compression") and the reconstruction of the original string ("decompression") when needed. This makes it possible to
utilize substantially less storage (by tens of percent) for many types of data, at the cost of more computation
(compressing and decompressing when needed). The trade-off between the storage cost saved and the costs of the
related computations and possible delays in data availability is analyzed before deciding whether to keep certain data
in a database compressed or not.
Data compression is typically controlled through the DBMS's data definition interface, but in some cases may be a
default and automatic.
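The trade-off can be illustrated with Python's standard zlib module; the repetitive sample data are hypothetical, and real DBMSs use their own compression schemes below the data definition interface:

    import zlib

    data = b"2012-01-01,widget,ordered;" * 1000  # hypothetical repetitive rows
    compressed = zlib.compress(data)             # costs CPU time
    print(len(data), "->", len(compressed))      # substantially fewer bytes

    assert zlib.decompress(compressed) == data   # lossless reconstruction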
Data encryption
For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent
the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via
unforeseen vulnerabilities in a DBMS, or more likely, by bypassing it).
Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a
default and automatic.
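A minimal sketch of encrypting a sensitive value before it reaches storage, assuming the third-party Python package cryptography is installed; key management, the genuinely hard part, is not shown, and a real DBMS performs this below its data definition interface:

    from cryptography.fernet import Fernet  # third-party: pip install cryptography

    key = Fernet.generate_key()     # in practice, kept outside the database
    f = Fernet(key)

    ciphertext = f.encrypt(b"4111-1111-1111-1111")  # hypothetical card number
    print(ciphertext)               # all a raw storage snapshot would reveal
    print(f.decrypt(ciphertext))    # recoverable only with the key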

Data storage types
This collection of bits describes both the contained database data and their related metadata (i.e., data that describe
the contained data and allow computer programs to manipulate the database data correctly). The size of a database
can nowadays reach tens of terabytes, where a byte is eight bits. The physical materialization of a bit can employ
various existing technologies, while new and improved technologies are constantly under development. Common
examples are:
Magnetic medium (e.g., in Magnetic disk) - Orientation of magnetic field in magnetic regions on a surface of
material (two orientation directions, for 0 and 1).
Dynamic random-access memory (DRAM) - State of a miniature electronic circuit consisting of a few transistors
(among millions nowadays) in an integrated circuit (two states, for 0 and 1).
These two examples are respectively for two major storage types:
Nonvolatile storage can maintain its bit states (0s and 1s) without electrical power supply, or when power supply
is interrupted;
Volatile storage loses its bit values when power supply is interrupted (i.e., its content is erased).
Sophisticated storage units, which can in fact be effective dedicated parallel computers supporting a large amount
of nonvolatile storage, typically must also include components with volatile storage. Some such units employ
batteries that can provide power for several hours in case of external power interruption (e.g., see the EMC
Symmetrix), and thus maintain the content of the volatile storage parts intact. Just before such a device's batteries lose
their power, the device typically automatically backs up its volatile content portion (into nonvolatile storage) and shuts off to
protect its data.
Databases are usually too important and too expensive to build (in terms of invested resources, e.g., time and money)
to be lost to a power interruption. Thus at any point in time most of their content resides in
nonvolatile storage. Even if for operational reasons very large portions of them reside in volatile storage (e.g., tens of
gigabytes in volatile memory, for in-memory databases), most of this is backed up in nonvolatile storage. The
relatively small portion that temporarily may have no nonvolatile backup can be reconstructed by proper
automatic database recovery procedures after volatile storage content loss.
More examples of storage types:
Volatile storage can be found in processors, computer memory (e.g., DRAM), etc.
Non-volatile storage types include ROM, EPROM, Hard disk drives, Flash memory and drives, Storage arrays,
etc.
Storage metrics
Databases always use several types of storage when operational (and, by implication, several when idle). Different types may
significantly differ in their properties, and the optimal mix of storage types is determined by the types and quantities
of operations that each storage type needs to perform, as well as considerations like physical space and energy
consumption and dissipation (which may become critical for a large database). Storage types can be categorized by
the following attributes:

Volatile/Nonvolatile.
Cost of the medium (e.g., per Megabyte), Cost to operate (cost of energy consumed per unit time).
Access speed (e.g., bytes per second).
Granularity from fine to coarse (e.g., size in bytes of access operation).
Reliability (the probability of spontaneous bit value change under various conditions).

Maximal possible number of writes (of any specific bit or specific group of bits; could be constrained by the
technology used (e.g., "write once" or "write twice"), or due to "physical bit fatigue," loss of ability to distinguish
between the 0, 1 states due to many state changes (e.g., in Flash memory)).


Power needed to operate (Energy per time; energy per byte accessed), Energy efficiency, Heat to dissipate.
Packaging density (e.g., realistic number of bytes per volume unit).
Protecting storage device content: Device mirroring (replication) and RAID
See also Disk storage replication
While a group of bits malfunction may be resolved by error detection and correction mechanisms (see above),
storage device malfunction requires different solutions. The following solutions are commonly used and valid for
most storage devices:
Device mirroring (replication) - A common solution to the problem is constantly maintaining an identical copy
of device content on another device (typically of the same type). The downside is that this doubles the storage, and
both devices (copies) need to be updated simultaneously, with some overhead and possibly some delays. The
upside is the possibility of concurrent reads of the same data group by two independent processes, which increases
performance. When one of the replicated devices is detected to be defective, the other copy is still operational
and is utilized to generate a new copy on another device (usually kept available and operational in a pool of stand-by
devices for this purpose).
Redundant array of independent disks (RAID) - This method generalizes the device mirroring above by
allowing one device in a group of N devices to fail and be replaced, with its content restored (device mirroring is
RAID with N=2). RAID groups of N=5 or N=6 are common. N>2 saves storage compared with N=2, at
the cost of more processing during both regular operation (often with reduced performance) and defective device
replacement.
Device mirroring and typical RAID are designed to handle a single device failure in the RAID group of devices.
However, if a second failure occurs before the RAID group is completely repaired from the first failure, then data
can be lost. The probability of a single failure is typically small. Thus the probability of two failures in the same RAID
group in close time proximity is much smaller (approximately the probability squared, i.e., multiplied by itself). If a
database cannot tolerate even this smaller probability of data loss, then the RAID group itself is replicated
(mirrored). In many cases such mirroring is done geographically remotely, in a different storage array, to also handle
recovery from disasters (see disaster recovery above).
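To make the "probability squared" remark concrete, a tiny worked example in Python (the per-repair-window failure probability is an assumed, purely illustrative figure):

    # Assumed: a given device fails during one repair window with probability 1e-4.
    p = 1e-4
    print(p)      # losing one copy of a mirrored pair: 1e-4
    print(p * p)  # losing both copies before repair completes: 1e-8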
Database storage layout
Database bits are laid out in storage in data structures and groupings that can take advantage both of known effective
algorithms to retrieve and manipulate them and of the storage's own properties. Typically the storage itself is designed to
meet the requirements of the various areas that extensively utilize storage, including databases. A DBMS in operation
always simultaneously utilizes several storage types (e.g., memory and external storage), with respective layout
methods.
Database storage hierarchy
A database, while in operation, resides simultaneously in several types of storage. By the nature of contemporary
computers, most of the database part inside a computer that hosts the DBMS resides (partially replicated) in volatile
storage. Data (pieces of the database) that are being processed/manipulated reside inside a processor, possibly in
the processor's caches. These data are read from/written to memory, typically through a computer bus (so far
typically volatile storage components). Computer memory communicates data (transferred to/from) with external
storage, typically through standard storage interfaces or networks (e.g., fibre channel, iSCSI). A storage array, a
common external storage unit, typically has a storage hierarchy of its own, from a fast cache, typically consisting of
(volatile and fast) DRAM, which is connected (again via standard interfaces) to drives, possibly with different
speeds, like flash drives and magnetic disk drives (non-volatile). The drives may be connected to magnetic tapes, on
which the least active parts of a large database, or database backup generations, typically reside.

A correlation typically exists between storage speed and price, and the faster storage is typically volatile.
Data structures
A data structure is an abstract construct that embeds data in a well-defined manner. An efficient data structure allows
the data to be manipulated in efficient ways. The data manipulation may include data insertion, deletion, updating and
retrieval in various modes. A certain data structure type may be very effective for certain operations, and very
ineffective for others. A data structure type is selected upon DBMS development to best meet the operations needed
for the types of data it contains. The type of data structure selected for a certain task typically also takes into
consideration the type of storage it resides in (e.g., speed of access, minimal size of storage chunk accessed, etc.). In
some DBMSs database administrators have the flexibility to select among options of data structures to contain user
data for performance reasons. Sometimes the data structures have selectable parameters to tune the database
performance.
Databases may store data in many data structure types.[15] Common examples are the following:

ordered/unordered flat files
hash tables
B+ trees
ISAM
heaps
Application data and DBMS data
A typical DBMS cannot store the data of the application it serves in isolation: to handle the application data, the
DBMS needs to store them in data structures, which constitute specific data in their own right. In addition, the DBMS
needs its own data structures and many types of bookkeeping data, like indexes and logs. The DBMS data are an
integral part of the database and may comprise a substantial portion of it.
Database indexing
Indexing is a technique for improving database performance. The many types of indexes share the common property
that they reduce the need to examine every entry when running a query. In large databases, this can reduce query
time/cost by orders of magnitude. The simplest form of index is a sorted list of values that can be searched using a
binary search with an adjacent reference to the location of the entry, analogous to the index in the back of a book.
The same data can have multiple indexes (an employee database could be indexed by last name and hire date).
Indexes affect performance, but not results. Database designers can add or remove indexes without changing
application logic, reducing maintenance costs as the database grows and database usage evolves.
Given a particular query, the DBMS' query optimizer is responsible for devising the most efficient strategy for
finding matching data.
Indexes can speed up data access, but they consume space in the database and must be updated each time the data
are altered; they therefore speed data access at the cost of slower data maintenance. These two effects determine
whether a given index is worth the cost.
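The following sketch (SQLite through Python; the table and column names are hypothetical) shows that adding an index changes only the access path the optimizer picks, not the query text or its results:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, last_name TEXT, hire_date TEXT)")
    conn.executemany("INSERT INTO employee (last_name, hire_date) VALUES (?, ?)",
                     [("Smith", "2001-04-01"), ("Jones", "2003-09-15")])

    query = "SELECT * FROM employee WHERE last_name = ?"
    print(conn.execute("EXPLAIN QUERY PLAN " + query, ("Smith",)).fetchall())  # scans the table

    conn.execute("CREATE INDEX idx_employee_last_name ON employee(last_name)")
    print(conn.execute("EXPLAIN QUERY PLAN " + query, ("Smith",)).fetchall())  # searches the index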
Database data clustering
In many cases a substantial performance improvement is gained if different types of database objects that are usually
utilized together are laid out in proximity in storage, i.e., "clustered." This usually allows the needed related
objects to be retrieved from storage in a minimal number of input operations (each of which is sometimes substantially
time-consuming). Even for in-memory databases, clustering provides a performance advantage due to the common
utilization of large caches for input-output operations in memory, with similar resulting behavior.


For example, it may be beneficial to cluster a record of an item in stock with all its respective order records. The
decision of whether to cluster certain objects or not depends on the objects' utilization statistics, object sizes, cache
sizes, storage types, etc. In a relational database, clustering the two respective relations "Items" and "Orders" saves
the expensive execution of a join operation between the two relations whenever such a join is needed in a
query (the join result is already available in storage thanks to the clustering).
Database materialized views
Often storage redundancy is employed to increase performance. A common example is storing materialized views,
which consist of frequently needed external views or query results. Storing such views saves the expensive
computing of them each time they are needed. The downsides of materialized views are the overhead incurred when
updating them to keep them synchronized with their original updated database data, and the cost of storage
redundancy.
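SQLite, used here through Python, has no materialized views, so the following sketch emulates one with an ordinary table: the precomputed result is cheap to read, but every change to the base data forces the synchronization overhead described above (the schema is hypothetical):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [("bolt", 5), ("bolt", 7), ("nut", 3)])

    # "Materialize" a frequently needed aggregate as a stored table.
    conn.execute("CREATE TABLE order_totals AS SELECT item, SUM(qty) AS total FROM orders GROUP BY item")
    print(conn.execute("SELECT * FROM order_totals").fetchall())  # cheap precomputed read

    conn.execute("INSERT INTO orders VALUES ('bolt', 2)")  # base data changed...
    conn.execute("DELETE FROM order_totals")               # ...so the view must be refreshed
    conn.execute("INSERT INTO order_totals SELECT item, SUM(qty) FROM orders GROUP BY item")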
Database and database object replication
See also Replication below
Occasionally a database employs storage redundancy by replicating database objects (with one or more copies) to
increase data availability (both to improve performance of simultaneous multiple end-user accesses to the same
database object, and to provide resiliency in case of partial failure of a distributed database). Updates of a
replicated object need to be synchronized across the object copies. In many cases the entire database is replicated.

Database transactions
As with every software system, a DBMS that operates in a faulty computing environment is prone to failures of
many kinds. A failure can corrupt the respective database unless special measures are taken to prevent this. A DBMS
achieves certain levels of fault tolerance by encapsulating operations within transactions. The concept of a database
transaction (or atomic transaction) has evolved in order to enable both a well understood database system behavior
in a faulty environment where crashes can happen any time, and recovery from a crash to a well understood database
state. A database transaction is a unit of work, typically encapsulating a number of operations over a database (e.g.,
reading a database object, writing, acquiring a lock, etc.), an abstraction supported in databases and also in other systems.
Each transaction has well defined boundaries in terms of which program/code executions are included in that
transaction (determined by the transaction's programmer via special transaction commands).
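The following sketch (SQLite through Python; the account data are hypothetical) shows such boundaries in practice: the commit and rollback commands delimit the unit of work, and a simulated crash between two updates leaves no partial effects:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100.0), (2, 0.0)])
    conn.commit()

    try:
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        raise RuntimeError("crash")  # simulated failure mid-transaction
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
        conn.commit()                # the transaction's normal end boundary
    except RuntimeError:
        conn.rollback()              # abort: "nothing" semantics

    print(conn.execute("SELECT * FROM accounts").fetchall())  # both balances unchanged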
ACID rules
Every database transaction obeys the following rules:
Atomicity - Either the effects of all of its operations remain, or none of them do ("all or nothing" semantics), when a
transaction is completed (committed or aborted, respectively). In other words, to the outside world a committed
transaction appears (by its effects on the database) to be indivisible (atomic), and an aborted transaction does not
leave any effects on the database at all, as if it never existed.
Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the
predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction
must transform a database from one consistent state to another consistent state (however, it is the responsibility of
the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it
intends to perform (from the application's point of view) while the predefined integrity rules are enforced by the
DBMS). Thus since a database can be normally changed only by transactions, all the database's states are
consistent. An aborted transaction does not change the database state it has started from, as if it never existed
(atomicity above).
Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually
(depending on the concurrency control method) the effects of an incomplete transaction are not even visible to
another transaction. Providing isolation is the main goal of concurrency control.
Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording
the transaction's effects and its commit event in a non-volatile memory).
Isolation, concurrency control, and locking
Isolation provides the ability for multiple users to operate on the database at the same time without corrupting the
data.
Concurrency control comprises the underlying mechanisms in a DBMS which handle isolation and guarantee
related correctness. It is heavily utilized by the database and storage engines (see above) both to guarantee the
correct execution of concurrent transactions and (with different mechanisms) the correctness of other DBMS
processes. The transaction-related mechanisms typically constrain the timing of database data access operations
(transaction schedules) to certain orders, characterized by the serializability and recoverability schedule
properties. Constraining database access operation execution typically means reduced performance (in rates of
execution), and thus concurrency control mechanisms are typically designed to provide the best performance
possible under the constraints. Often, when possible without harming correctness, the serializability property is
compromised for better performance. However, recoverability cannot be compromised, since compromising it typically
results in a quick database integrity violation.
Locking is the most common transaction concurrency control method in DBMSs, used to provide both
serializability and recoverability for correctness. In order to access a database object a transaction first needs to
acquire a lock for this object. Depending on the access operation type (e.g., reading or writing an object) and on
the lock type, acquiring the lock may be blocked and postponed, if another transaction is holding a lock for that
object.
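A minimal sketch of lock blocking, with two connections to one SQLite database file through Python (the file name is hypothetical): the first transaction's write lock blocks the second writer, which gives up when its timeout expires:

    import sqlite3

    t1 = sqlite3.connect("demo.db", timeout=0.1)
    t2 = sqlite3.connect("demo.db", timeout=0.1)
    t1.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    t1.commit()

    t1.execute("BEGIN IMMEDIATE")         # transaction 1 acquires the write lock
    t1.execute("INSERT INTO t VALUES (1)")
    try:
        t2.execute("BEGIN IMMEDIATE")     # transaction 2 must wait for that lock...
    except sqlite3.OperationalError as e:
        print("blocked:", e)              # ...and times out: "database is locked"
    t1.commit()                           # releasing the lock lets others proceed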

Query optimization
A query is a request for information from a database. It can be as simple as "finding the address of a person with SS#
123-45-6789," or more complex like "finding the average salary of all the employed married men in California
between the ages 30 to 39 that earn less than their wives." Query results are generated by accessing relevant
database data and manipulating them in a way that yields the requested information. Since database structures are
complex, in most cases, and especially for not-very-simple queries, the needed data for a query can be collected from
a database by accessing it in different ways, through different data-structures, and in different orders. Each different
way typically requires a different processing time. Processing times of the same query may have large variance, from a
fraction of a second to hours, depending on the way selected. The purpose of query optimization, which is an
automated process, is to find the way to process a given query in minimum time. The large possible variance in time
justifies performing query optimization, though finding the exact optimal way to execute a query, among all
possibilities, is typically very complex, time consuming by itself, may be too costly, and often practically impossible.
Thus query optimization typically tries to approximate the optimum by comparing several common-sense
alternatives to provide in a reasonable time a "good enough" plan which typically does not deviate much from the
best possible result.
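The chosen plan can be inspected in many DBMSs; the following sketch uses SQLite's EXPLAIN QUERY PLAN statement through Python (the schema is hypothetical) to show the optimizer selecting a table scan on one side of a join and an index search on the other:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (ssn TEXT PRIMARY KEY, name TEXT, state TEXT)")
    conn.execute("CREATE TABLE salary (ssn TEXT, amount REAL)")
    conn.execute("CREATE INDEX idx_salary_ssn ON salary(ssn)")

    plan = conn.execute("""
        EXPLAIN QUERY PLAN
        SELECT p.name, s.amount
        FROM person p JOIN salary s ON p.ssn = s.ssn
        WHERE p.state = 'CA'
    """).fetchall()
    for step in plan:
        print(step)  # e.g. SCAN person, then SEARCH salary USING INDEX idx_salary_ssn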

DBMS support for the development and maintenance of a database and its application
A DBMS typically intends to provide a convenient environment in which to develop and later maintain an application built
around its respective database type. A DBMS either provides such tools or allows integration with such external
tools. Examples of such tools relate to database design, application programming, application program maintenance,
database performance analysis and monitoring, database configuration monitoring, DBMS hardware configuration (a
DBMS and related database may span computers, networks, and storage units) and related database mapping
(especially for a distributed DBMS), storage allocation and database layout monitoring, storage migration, etc.


References
[1] Jeffrey Ullman and Jennifer Widom 1997: A First Course in Database Systems, Prentice-Hall Inc., Simon & Schuster, Page 1, ISBN
0-13-861337-0.
[2] C. W. Bachman, The Programmer as Navigator
[3] Graves, Steve. "COTS Databases For Embedded Systems" (http:/ / www. embedded-computing. com/ articles/ id/ ?2020), Embedded
Computing Design magazine, January, 2007. Retrieved on August 13, 2008.
[4] "TeleCommunication Systems Signs up as a Reseller of TimesTen; Mobile Operators and Carriers Gain Real-Time Platform for
Location-Based Services" (http:/ / findarticles. com/ p/ articles/ mi_m0EIN/ is_2002_June_24/ ai_87694370). Business Wire. 2002-06-24. .
[5] Argumentation in Artificial Intelligence by Iyad Rahwan, Guillermo R. Simari
[6] "OWL DL Semantics" (http:/ / www. obitko. com/ tutorials/ ontologies-semantic-web/ owl-dl-semantics. html). . Retrieved 10 December
2010.
[7] Introducing databases by Stephen Chu, in Conrick, M. (2006) Health informatics: transforming healthcare with technology, Thomson, ISBN
0-17-012731-1, p. 69.
[8] Date, C. J. (June 1, 1999). "When's an extension not an extension?" (http:/ / intelligent-enterprise. informationweek. com/ db_area/ archives/
1999/ 990106/ online1. jhtml;jsessionid=Y2UNK1QFKXMBTQE1GHRSKH4ATMY32JVN). Intelligent Enterprise 2 (8). .
[9] Zhuge, H. (2008). The Web Resource Space Model. Web Information Systems Engineering and Internet Technologies Book Series. 4.
Springer. ISBN 978-0-387-72771-4.
[10] Chapple, Mike. "SQL Fundamentals" (http:/ / databases. about. com/ od/ sql/ a/ sqlfundamentals. htm). Databases. About.com. . Retrieved
2009-01-28.
[11] "Structured Query Language (SQL)" (http:/ / publib. boulder. ibm. com/ infocenter/ db2luw/ v9/ index. jsp?topic=com. ibm. db2. udb.
admin. doc/ doc/ c0004100. htm). International Business Machines. October 27, 2006. . Retrieved 2007-06-10.
[12] Wagner, Michael (2010), "1. Auflage", SQL/XML:2006 - Evaluierung der Standardkonformität ausgewählter Datenbanksysteme, Diplomica
Verlag, ISBN 3-8366-9609-6
[13] Date 1990, pp. 31-32
[14] Johann A. Makowsky, Victor M. Markowitz and Nimrod Rotics, 1986: "Entity-relationship consistency for relational schemas" (http:/ /
www. springerlink. com/ content/ p67756164r127m18/ ) Proceedings of the 1986 Conference on Database Theory (ICDT '86), Lecture Notes
in Computer Science, 1986, Volume 243/1986, pp. 306-322, Springer, doi:10.1007/3-540-17187-8_43
[15] Lightstone, Teorey & Nadeau 2007

Further reading
Ling Liu and Tamer M. Özsu (Eds.) (2009). Encyclopedia of Database Systems (http://www.springer.com/
computer/database+management+&+information+retrieval/book/978-0-387-49616-0), 4100 p., 60 illus.
ISBN 978-0-387-49616-0.
Beynon-Davies, P. (2004). Database Systems. 3rd Edition. Palgrave, Houndmills, Basingstoke.
Connolly, Thomas and Carolyn Begg. Database Systems. New York: Harlow, 2002.
Date, C. J. (2003). An Introduction to Database Systems, Fifth Edition. Addison Wesley. ISBN 0-201-51381-1.
Gray, J. and Reuter, A. Transaction Processing: Concepts and Techniques, 1st edition, Morgan Kaufmann
Publishers, 1992.
Kroenke, David M. and David J. Auer. Database Concepts. 3rd ed. New York: Prentice, 2007.
Lightstone, S.; Teorey, T.; Nadeau, T. (2007). Physical Database Design: the database professional's guide to
exploiting indexes, views, storage, and more. Morgan Kaufmann Press. ISBN 0-12-369389-6.
Teorey, T.; Lightstone, S. and Nadeau, T. Database Modeling & Design: Logical Design, 4th edition, Morgan
Kaufmann Press, 2005. ISBN 0-12-685352-5

External links
Database (http://www.dmoz.org/Computers/Data_Formats/Database/) at the Open Directory Project


Entity-relationship model
In software engineering, an entity-relationship model (ER model for
short) is an abstract way to describe a
database. It usually starts with a
relational database, which stores data in
tables. Some of the data in these tables
point to data in other tables - for
instance, your entry in the database
could point to several entries for each of
the phone numbers that are yours. The
ER model would say that you are an
entity, and each phone number is an
entity, and the relationship between you
and the phone numbers is 'has a phone
number'. Diagrams created to design
these entities and relationships are
called entity-relationship diagrams or
ER diagrams.
[Figure: A sample entity-relationship diagram using Chen's notation]

This article refers to the techniques proposed in Peter Chen's 1976 paper.[1] However, variants of the idea existed
previously,[2] and have been devised subsequently, such as supertype and subtype data entities[3] and commonality
relationships (an example with additional concepts is the enhanced entity-relationship model).

Overview
Using the three schema approach to software engineering, there are three levels of ER models that may be
developed. The conceptual data model is the highest level ER model in that it contains the least granular detail but
establishes the overall scope of what is to be included within the model set. The conceptual ER model normally
defines master reference data entities that are commonly used by the organization. Developing an enterprise-wide
conceptual ER model is useful to support documenting the data architecture for an organization.
A conceptual ER model may be used as the foundation for one or more logical data models. The purpose of the
conceptual ER model is then to establish structural metadata commonality for the master data entities between the set
of logical ER models. The conceptual data model may be used to form commonality relationships between ER
models as a basis for data model integration.
A logical ER model does not require a conceptual ER model, especially if the scope of the logical ER model is to
develop a single disparate information system. The logical ER model contains more detail than the conceptual ER
model. In addition to master data entities, operational and transactional data entities are now defined. The details of
each data entity are developed and the entity relationships between these data entities are established. The logical ER
model is however developed independent of technology into which it will be implemented.
One or more physical ER models may be developed from each logical ER model. The physical ER model is
normally developed to be instantiated as a database. Therefore, each physical ER model must contain enough detail to
produce a database and each physical ER model is technology dependent since each database management system is
somewhat different.


The physical model is normally forward engineered to instantiate the structural metadata into a database
management system as relational database objects such as database tables, database indexes such as unique key
indexes, and database constraints such as a foreign key constraint or a commonality constraint. The ER model is also
normally used to design modifications to the relational database objects and to maintain the structural metadata of
the database.
The first stage of information system design uses these models during the requirements analysis to describe
information needs or the type of information that is to be stored in a database. The data modeling technique can be
used to describe any ontology (i.e. an overview and classifications of used terms and their relationships) for a certain
area of interest. In the case of the design of an information system that is based on a database, the conceptual data
model is, at a later stage (usually called logical design), mapped to a logical data model, such as the relational model;
this in turn is mapped to a physical model during physical design. Note that sometimes, both of these phases are
referred to as "physical design".

[Figure: A UML metamodel of extended entity-relationship models]

The building blocks: entities, relationships, and attributes


An entity may be defined as a thing which is recognized as being
capable of an independent existence and which can be uniquely
identified. An entity is an abstraction from the complexities of a
domain. When we speak of an entity, we normally speak of some
aspect of the real world which can be distinguished from other aspects
of the real world.[4]
An entity may be a physical object such as a house or a car, an event
such as a house sale or a car service, or a concept such as a customer
transaction or order. Although the term entity is the one most
commonly used, following Chen we should really distinguish between
an entity and an entity-type. An entity-type is a category. An entity,
strictly speaking, is an instance of a given entity-type. There are
usually many instances of an entity-type. Because the term entity-type
is somewhat cumbersome, most people tend to use the term entity as a
synonym for this term.

[Figures: Two related entities; An entity with an attribute; A relationship with an attribute; Primary key]

Entities can be thought of as nouns. Examples: a computer, an employee, a song, a mathematical theorem.
A relationship captures how entities are related to one another. Relationships can be thought of as verbs, linking two or more nouns.
Examples: an owns relationship between a company and a computer, a supervises relationship between an employee
and a department, a performs relationship between an artist and a song, a proved relationship between a
mathematician and a theorem.
The model's linguistic aspect described above is utilized in the declarative database query language ERROL, which
mimics natural language constructs. ERROL's semantics and implementation are based on Reshaped relational
algebra (RRA), a relational algebra which is adapted to the entity-relationship model and captures its linguistic
aspect.
Entities and relationships can both have attributes. Examples: an employee entity might have a Social Security
Number (SSN) attribute; the proved relationship may have a date attribute.
Every entity (unless it is a weak entity) must have a minimal set of uniquely identifying attributes, which is called
the entity's primary key.
Entity-relationship diagrams don't show single entities or single instances of relations. Rather, they show entity sets
and relationship sets. Example: a particular song is an entity. The collection of all songs in a database is an entity set.
The eaten relationship between a child and her lunch is a single relationship. The set of all such child-lunch
relationships in a database is a relationship set. In other words, a relationship set corresponds to a relation in
mathematics, while a relationship corresponds to a member of the relation.
Certain cardinality constraints on relationship sets may be indicated as well.
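As a sketch of how such sets are typically realized relationally (this mapping is an illustration, not part of Chen's paper), each entity set becomes a table keyed by its primary key, and the many-to-many performs relationship set becomes a table of key pairs; here in Python with SQLite, with hypothetical names:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE artist (artist_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE song   (song_id   INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE performs (               -- the relationship set
            artist_id INTEGER REFERENCES artist,
            song_id   INTEGER REFERENCES song,
            PRIMARY KEY (artist_id, song_id)  -- one row per relationship
        );
    """)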

Relationships, roles and cardinalities


In Chen's original paper he gives an example of a relationship and its roles. He describes a relationship "marriage"
and its two roles "husband" and "wife".
A person plays the role of husband in a marriage (relationship) and another person plays the role of wife in the
(same) marriage. These words are nouns. That is no surprise; naming things requires a noun.
However as is quite usual with new ideas, many eagerly appropriated the new terminology but then applied it to their
own old ideas. Thus the lines, arrows and crows-feet of their diagrams owed more to the earlier Bachman diagrams
than to Chen's relationship diamonds. And they similarly misunderstood other important concepts.
In particular, it became fashionable (now almost to the point of exclusivity) to "name" relationships and roles as
verbs or phrases.

Relationship names
A relationship expressed with a single verb implying direction makes it impossible to discuss the model using
proper English. For example:
the song and the artist are related by a 'performs'
the husband and wife are related by an 'is-married-to'.
Expressing the relationships with a noun resolves this:
the song and the artist are related by a 'performance'
the husband and wife are related by a 'marriage'.
Traditionally, the relationships are expressed twice, (using present continuous verb phrases), once in each direction.
This gives two English statements per relationship. For example:
the song is performed by the artist
the artist performs the song


Role naming
It has also become prevalent to name roles with phrases, e.g., is-the-owner-of and is-owned-by. Correct nouns in
this case are "owner" and "possession". Thus "person plays the role of owner" and "car plays the role of possession"
rather than "person plays the role of is-the-owner-of" etc.
The use of nouns has a direct benefit when generating physical implementations from semantic models. When a
person has two relationships with car, it is possible to very simply generate names such as "owner_person" and
"driver_person", which are immediately meaningful.

Cardinalities
Modifications to the original specification can be beneficial. Chen described look-across cardinalities. As an aside,
the Barker-Ellis notation, used in Oracle Designer, uses same-side for minimum cardinality (analogous to
optionality) and role, but look-across for maximum cardinality (the crow's foot).
In Merise,[5] Elmasri & Navathe[6] and others[7] there is a preference for same-side for roles and both minimum and
maximum cardinalities. Recent researchers (Feinerer,[8] Dullea et al.[9]) have shown that this is more coherent when
applied to n-ary relationships of order > 2.
In Dullea et al. one reads "A 'look across' notation such as used in the UML does not effectively represent the
semantics of participation constraints imposed on relationships where the degree is higher than binary."
In Feinerer it says "Problems arise if we operate under the look-across semantics as used for UML associations.
Hartmann[10] investigates this situation and shows how and why different transformations fail." (Although the
"reduction" mentioned is spurious as the two diagrams 3.4 and 3.5 are in fact the same) and also "As we will see on
the next few pages, the look-across interpretation introduces several difficulties which prevent the extension of
simple mechanisms from binary to n-ary associations."

Semantic modelling
The father of ER modelling said in his seminal paper: "The entity-relationship model adopts the more natural view
that the real world consists of entities and relationships. It incorporates some of the important semantic information
about the real world." [1] He is here in accord with philosophic and theoretical traditions from the time of the Ancient
Greek philosophers: Socrates, Plato and Aristotle (428 BC) through to modern epistemology, semiotics and logic of
Peirce, Frege and Russell. Plato himself associates knowledge with the apprehension of unchanging Forms (The
forms, according to Socrates, are roughly speaking archetypes or abstract representations of the many types of
things, and properties) and their relationships to one another. In his original 1976 article Chen explicitly contrasts
entity-relationship diagrams with record modelling techniques: "The data structure diagram is a representation of the
organisation of records and is not an exact representation of entities and relationships." Several other authors also
support his program:
Kent in "Data and Reality" [11] : "One thing we ought to have clear in our minds at the outset of a modelling
endeavour is whether we are intent on describing a portion of "reality" (some human enterprise) or a data processing
activity."
Abrial in "Data Semantics" : "... the so called "logical" definition and manipulation of data are still influenced
(sometimes unconsciously) by the "physical" storage and retrieval mechanisms currently available on computer
systems."
Stamper: "They pretend to describe entity types, but the vocabulary is from data processing: fields, data items,
values. Naming rules don't reflect the conventions we use for naming people and things; they reflect instead
techniques for locating records in files."
In Jackson's words: "The developer begins by creating a model of the reality with which the system is concerned, the
reality which furnishes its [the system's] subject matter ..."


Elmasri, Navathe: "The ER model concepts are designed to be closer to the users perception of data and are not
meant to describe the way in which data will be stored in the computer."
A semantic model is a model of concepts; it is sometimes called a "platform independent model". It is an intensional
model. At the latest since Carnap, it is well known that:[12] "...the full meaning of a concept is constituted by two
aspects, its intension and its extension. The first part comprises the embedding of a concept in the world of concepts
as a whole, i.e. the totality of all relations to other concepts. The second part establishes the referential meaning of
the concept, i.e. its counterpart in the real or in a possible world". An extensional model is that which maps to the
elements of a particular methodology or technology, and is thus a "platform specific model". The UML specification
explicitly states that associations in class models are extensional, and this is in fact self-evident by considering the
extensive array of additional "adornments" provided by the specification over and above those provided by any of
the prior candidate "semantic modelling languages" ("UML as a Data Modeling Notation, Part 2" [13]).

Diagramming conventions
Chen's notation for entity-relationship modeling uses
rectangles to represent entities, and diamonds to represent
relationships appropriate for first-class objects: they can
have attributes and relationships of their own. Entity sets
are drawn as rectangles, relationship sets as diamonds. If an
entity set participates in a relationship set, they are
connected with a line.
Attributes are drawn as ovals and are connected with a line
to exactly one entity or relationship set.
Cardinality constraints are expressed as follows:
a double line indicates a participation constraint, totality
or surjectivity: all entities in the entity set must
participate in at least one relationship in the relationship
set;
an arrow from entity set to relationship set indicates a
key constraint, i.e. injectivity: each entity of the entity
set can participate in at most one relationship in the
relationship set;
a thick line indicates both, i.e. bijectivity: each entity in
the entity set is involved in exactly one relationship.

an underlined name of an attribute indicates that it is a key: two different entities or relationships with this
attribute always have different values for this attribute.

[Figure: Various methods of representing the same one-to-many relationship. In each case, the diagram shows the
relationship between a person and a place of birth: each person must have been born at one, and only one, location,
but each location may have had zero or more people born at it.]
Attributes are often omitted as they can
clutter up a diagram; other diagram
techniques often list entity attributes
within the rectangles drawn for entity
sets.

[Figure: Two related entities shown using Crow's Foot notation. In this example, an optional relationship is shown
between Artist and Song; the symbols closest to the Song entity represent "zero, one, or many", whereas a Song has
"one and only one" Artist. The former is therefore read as: an Artist (can) perform(s) "zero, one, or many" song(s).]

Related diagramming convention techniques:
Bachman notation
Barker's Notation
EXPRESS
IDEF1X[14]
Martin notation
(min, max)-notation of Jean-Raymond Abrial in 1974
UML class diagrams
Merise
Object-Role Modeling

Crow's Foot Notation


Crow's Foot notation is used in Barker's Notation, SSADM and Information Engineering. Crow's Foot diagrams
represent entities as boxes, and relationships as lines between the boxes. Different shapes at the ends of these lines
represent the cardinality of the relationship.
Crow's Foot notation was used in the 1980s by the consultancy practice CACI. Many of the consultants at CACI
(including Richard Barker) subsequently moved to Oracle UK, where they developed the early versions of Oracle's
CASE tools, introducing the notation to a wider audience. The following tools use Crow's Foot notation: ARIS,
System Architect, Visio, PowerDesigner, Toad Data Modeler, DeZign for Databases, Devgems Data Modeler,
OmniGraffle, MySQL Workbench and SQL Developer Data Modeler. CA's ICASE tool, CA Gen (aka Information
Engineering Facility), also uses this notation.

ER diagramming tools
There are many ER diagramming tools. Free software ER diagramming tools that can interpret and generate ER
models and SQL and do database analysis are MySQL Workbench (formerly DBDesigner), and Open ModelSphere
(open-source). A freeware ER tool that can generate database and application layer code (webservices) is the RISE
Editor.
Proprietary ER diagramming tools are Avolution, dbForge Studio for MySQL, ER/Studio, ERwin, MagicDraw,
MEGA International, ModelRight, Navicat Data Modeler, OmniGraffle, Oracle Designer, PowerDesigner, Rational
Rose, Sparx Enterprise Architect, SQLyog, System Architect, Toad Data Modeler, and Visual Paradigm.
Other free diagram tools just draw the shapes, without any knowledge of what they mean, and do not
generate SQL. These include Creately, yEd, LucidChart, Kivio, and Dia.


Limitations
ER models assume information content that can readily be represented in a relational database. They describe only a
relational structure for this information.
Hence, they are inadequate for systems in which the information cannot readily be represented in relational form,
such as with semi-structured data.
Furthermore, for many systems, the possible changes to the information contained are nontrivial and important
enough to warrant explicit specification. Some authors have extended ER modeling with constructs to represent
change, an approach supported by the original author;[15] an example is Anchor Modeling.
An alternative is to model change separately, using a process modeling technique.
Additional techniques can be used for other aspects of systems. For instance, ER models roughly correspond to just 1
of the 14 different modeling techniques offered by UML.
Another limitation: ER modeling is aimed at specifying information from scratch. This suits the design of new,
standalone information systems, but is of less help in integrating pre-existing information sources that already define
their own data representations in detail.
Even where it is suitable in principle, ER modeling is rarely used as a separate activity. One reason for this is today's
abundance of tools to support diagramming and other design support directly on relational database management
systems. These tools can readily extract database diagrams that are very close to ER diagrams from existing
databases, and they provide alternative views on the information contained in such diagrams.
In a survey, Brodie and Liu[16] could not find a single instance of entity-relationship modeling inside a sample of ten
Fortune 100 companies. Badia and Lemire[17] blame this lack of use on the lack of guidance but also on the lack of
benefits, such as lack of support for data integration.
Also, the enhanced entity-relationship model (EER modeling) introduces several concepts which are not present in
ER modeling.

References
[1] "The Entity Relationship Model: Toward a Unified View of Data" (http:/ / citeseerx. ist. psu. edu/ viewdoc/ summary?doi=10. 1. 1. 123.
1085) for entity-relationship modeling.
[2] A.P.G. Brown, "Modelling a Real-World System and Designing a Schema to Represent It", in Douque and Nijssen (eds.), Data Base
Description, North-Holland, 1975, ISBN 0-7204-2833-5.
[3] Designing a Logical Database: Supertypes and Subtypes (http:/ / technet. microsoft. com/ en-us/ library/ cc505839. aspx)
[4] Paul Beynon-Davies (2004). Database Systems. Houndmills, Basingstoke, UK: Palgrave
[5] Hubert Tardieu, Arnold Rochfeld and René Colletti, La méthode MERISE: Principes et outils (Paperback - 1983)
[6] Elmasri, Ramez; Navathe, Shamkant B., Fundamentals of Database Systems, third ed., Addison-Wesley, Menlo Park, CA, USA, 2000.
[7] ER 2004 : 23rd International Conference on Conceptual Modeling, Shanghai, China, November 8-12, 2004 (http:/ / books. google. com/
books?id=odZK99osY1EC& pg=PA52& img=1& pgis=1& dq=genova& sig=ACfU3U3tDC_q8WOMqUJW4EZCa5YQywoYLw& edge=0)
[8] A Formal Treatment of UML Class Diagrams as an Efficient Method for Configuration Management 2007 (http:/ / publik. tuwien. ac. at/
files/ pub-inf_4582. pdf)
[9] James Dullea, Il-Yeol Song, Ioanna Lamprou - An analysis of structural validity in entity-relationship modeling 2002 (http:/ / www. ischool.
drexel. edu/ faculty/ song/ publications/ p_DKE_03_Validity. pdf)
[10] "Reasoning about participation constraints and Chen's constraints" S Hartmann - 2003 (http:/ / www. acs. org. au/ documents/ public/ crpit/
CRPITV17Hartmann. pdf)
[11] http:/ / www. bkent. net/ Doc/ darxrp. htm
[12] http:/ / wenku. baidu. com/ view/ 8048e7bb1a37f111f1855b22. html
[13] http:/ / www. tdan. com/ view-articles/ 8589
[14] IDEF1X (https:/ / idbms. navo. navy. mil/ DataModel/ IDEF1X. html)
[15] P. Chen. Suggested research directions for a new frontier: Active conceptual modeling (http:/ / www. springerlink. com/ content/
5160x2634402663r/ ). ER 2006, volume 4215 of Lecture Notes in Computer Science, pages 1-4. Springer Berlin / Heidelberg, 2006.
[16] M. L. Brodie and J. T. Liu. The power and limits of relational technology in the age of information ecosystems (http:/ / www.
michaelbrodie. com/ documents/ The Power and Limits of Relational Technology In the Age of Information Ecosystems V2. pdf). On The
Move Federated Conferences, 2010.

[17] A. Badia and D. Lemire. A call to arms: revisiting database design (http:/ / dl. acm. org/ citation. cfm?id=2070750). SIGMOD Record 40, 3
(November 2011), 61-69.

Further reading
Richard Barker (1990). CASE Method: Tasks and Deliverables. Wokingham, England: Addison-Wesley.
Paul Beynon-Davies (2004). Database Systems. Houndmills, Basingstoke, UK: Palgrave
Peter Chen (March 1976). "The Entity-Relationship Model - Toward a Unified View of Data" (http://csc.lsu.edu/
news/erd.pdf). ACM Transactions on Database Systems 1 (1): 9-36. doi:10.1145/320434.320440. ISSN 0362-5915.

External links
Entity Relationship Modeling (http://www.devarticles.com/c/a/Development-Cycles/
Entity-Relationship-Modeling/) - Article from Development Cycles
Entity Relationship Modelling (http://www.databasedesign.co.uk/bookdatabasesafirstcourse/chap3/chap3.
htm)
An Entity Relationship Diagram Example (http://rapidapplicationdevelopment.blogspot.com/2007/06/
entity-relationship-diagram-example.html). Demonstrates the crow's feet notation by way of an example.
"Entity-Relationship Modeling: Historical Events, Future Trends, and Lessons Learned" (http://bit.csc.lsu.edu/
~chen/pdf/Chen_Pioneers.pdf) by Peter Chen.
"English, Chinese and ER diagrams" (http://bit.csc.lsu.edu/~chen/pdf/ER_C.pdf) by Peter Chen.
Case study: E-R diagram for Acme Fashion Supplies (http://www.cilco.co.uk/briefing-studies/
acme-fashion-supplies-feasibility-study/slides/logical-data-structure.html) by Mark H. Ridley.
Logical Data Structures (LDSs) - Getting started (http://www.cems.uwe.ac.uk/~tdrewry/lds.htm) by Tony
Drewry.
Introduction to Data Modeling (http://www.utexas.edu/its/archive/windows/database/datamodeling/index.
html)
Lecture by Prof. Dr. Muhittin Gökmen (http://www3.itu.edu.tr/~gokmen/SE-lecture-5.pdf), Department of
Computer Engineering, Istanbul Technical University.
ER-Diagram Convention (http://www.scribd.com/doc/3053988/ER-Diagram-convention)
Crow's Foot Notation (http://www2.cs.uregina.ca/~bernatja/crowsfoot.html)
"Articulated Entity Relationship (AER) Diagram for Complete Automation of Relational Database
Normalization" (http://airccse.org/journal/ijdms/papers/0510ijdms06.pdf) P. S. Dhabe, Dr. M. S.
Patwardhan, Asavari A. Deshpande.


Database design
Database design is the process of producing a detailed data model of a database. This logical data model contains all
the needed logical and physical design choices and physical storage parameters needed to generate a design in a Data
Definition Language, which can then be used to create a database. A fully attributed data model contains detailed
attributes for each entity.
The term database design can be used to describe many different parts of the design of an overall database system.
Principally, and most correctly, it can be thought of as the logical design of the base data structures used to store the
data. In the relational model these are the tables and views. In an object database the entities and relationships map
directly to object classes and named relationships. However, the term database design could also be used to apply to
the overall process of designing, not just the base data structures, but also the forms and queries used as part of the
overall database application within the database management system (DBMS).[1]
The process of doing database design generally consists of a number of steps which will be carried out by the
database designer. Usually, the designer must:
Determine the relationships between the different data elements.
Superimpose a logical structure upon the data on the basis of these relationships.[2]

ER Diagram (Entity-relationship model)


Database designs also include ER
(Entity-relationship model) diagrams.
An ER diagram helps
to design databases in an efficient way.
Attributes in ER diagrams are usually
modeled as an oval with the name of the
attribute, linked to the entity or
relationship that contains the attribute.
Within the relational model the final
step can generally be broken down into
two further steps, that of determining
the grouping of information within the
system, generally determining what are
the basic objects about which
information is being stored, and then
determining the relationships between
these groups of information, or objects.
This step is not necessary with an
Object database.[2]

[Figure: A sample Entity-relationship diagram]


The Design Process[3]


1. Determine the purpose of the database - This helps prepare for the remaining steps.
2. Find and organize the information required - Gather all of the types of information to record in the database,
such as product name and order number.
3. Divide the information into tables - Divide information items into major entities or subjects, such as Products
or Orders. Each subject then becomes a table.
4. Turn information items into columns - Decide what information needs to be stored in each table. Each item
becomes a field, and is displayed as a column in the table. For example, an Employees table might include fields
such as Last Name and Hire Date.
5. Specify primary keys - Choose each table's primary key. The primary key is a column that is used to uniquely
identify each row. An example might be Product ID or Order ID.
6. Set up the table relationships - Look at each table and decide how the data in one table is related to the data in
other tables. Add fields to tables or create new tables to clarify the relationships, as necessary.
7. Refine the design - Analyze the design for errors. Create tables and add a few records of sample data. Check if
results come from the tables as expected. Make adjustments to the design, as needed.
8. Apply the normalization rules - Apply the data normalization rules to see if tables are structured correctly. Make adjustments to the tables as needed.

Determining data to be stored


In a majority of cases, the person designing a database is a person with expertise in the area of database design, rather than expertise in the domain from which the data to be stored is drawn (e.g., financial or biological information). Therefore, the data to be stored in the database must be determined in cooperation with a person who does have expertise in that domain, and who is aware of what data must be stored within the system.
This process is one which is generally considered part of requirements analysis, and requires skill on the part of the
database designer to elicit the needed information from those with the domain knowledge. This is because those with
the necessary domain knowledge frequently cannot express clearly what their system requirements for the database
are as they are unaccustomed to thinking in terms of the discrete data elements which must be stored. Data to be
stored can be determined by Requirement Specification.[4]

Normalization
In the field of relational database design, normalization is a systematic way of ensuring that a database structure is suitable for general-purpose querying and free of certain undesirable characteristics (insertion, update, and deletion anomalies) that could lead to a loss of data integrity.
A standard piece of database design guidance is that the designer should create a fully normalized design; selective denormalization can subsequently be performed, but only for performance reasons. However, some modeling disciplines, such as the dimensional modeling approach to data warehouse design, explicitly recommend non-normalized designs, i.e. designs that in large part do not adhere to 3NF. The normal forms include 1NF, 2NF, 3NF, Boyce-Codd NF (3.5NF), 4NF, and 5NF.


Types of Database design


Conceptual schema
Once a database designer is aware of the data which is to be stored within the database, they must then determine where dependencies exist within the data. Sometimes when data is changed you can be changing other data that is not visible. For example, in a list of names and addresses, assuming a situation where multiple people can have the same address, but one person cannot have more than one address, the address is dependent upon the name: given a name there is exactly one associated address, but a given address may be shared by several names, so knowing the address does not determine the name.
(NOTE: A common misconception is that the relational model is so called because of the stating of relationships
between data elements therein. This is not true. The relational model is so named because it is based upon the
mathematical structures known as relations.)

Logically structuring data


Once the relationships and dependencies amongst the various pieces of information have been determined, it is
possible to arrange the data into a logical structure which can then be mapped into the storage objects supported by
the database management system. In the case of relational databases the storage objects are tables which store data in
rows and columns.
Each table may represent an implementation of either a logical object or a relationship joining one or more instances
of one or more logical objects. Relationships between tables may then be stored as links connecting child tables with
parents. Since complex logical relationships are themselves tables they will probably have links to more than one
parent.
In an Object database the storage objects correspond directly to the objects used by the Object-oriented programming
language used to write the applications that will manage and access the data. The relationships may be defined as
attributes of the object classes involved or as methods that operate on the object classes.

Physical design
The physical design of the database specifies the physical configuration of the database on the storage media. This
includes detailed specification of data elements, data types, indexing options and other parameters residing in the
DBMS data dictionary. It is the detailed design of a system that includes modules and the database's hardware and software specifications.

References
[1] Gehani, N. (2006). The Database Book: Principles and practice using MySQL. 1st ed., Summit, NJ.: Silicon Press
[2] Teorey, T.J., Lightstone, S.S., et al., (2009). Database Design: Know it all.1st ed. Burlington, MA.: Morgan Kaufmann Publishers
[3] Database design basics. (n.d.). Retrieved May 1, 2010, from http://office.microsoft.com/en-us/access/HA012242471033.aspx
[4] Teorey, T.; Lightstone, S. and Nadeau, T.(2005) Database Modeling & Design: Logical Design, 4th edition, Morgan Kaufmann Press. ISBN
0-12-685352-5


Further reading
S. Lightstone, T. Teorey, T. Nadeau, Physical Database Design: the database professional's guide to exploiting
indexes, views, storage, and more, Morgan Kaufmann Press, 2007. ISBN 0-12-369389-6

External links
(http://www.sqlteam.com/article/database-design-and-modeling-fundamentals)
(http://office.microsoft.com/en-us/access/HA012242471033.aspx)
Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
Database Normalization Intro (http://www.databasejournal.com/sqletc/article.php/1428511), Part 2 (http://
www.databasejournal.com/sqletc/article.php/26861_1474411_1)
"An Introduction to Database Normalization" (http://web.archive.org/web/20110606025027/http://dev.
mysql.com/tech-resources/articles/intro-to-normalization.html). Archived from the original (http://dev.
mysql.com/tech-resources/articles/intro-to-normalization.html) on 2011-06-06. Retrieved 2012-02-25.
"Normalization" (http://web.archive.org/web/20100106115112/http://www.utexas.edu/its/archive/
windows/database/datamodeling/rm/rm7.html). Archived from the original (http://www.utexas.edu/its/
windows/database/datamodeling/rm/rm7.html) on 2010-01-06. Retrieved 2012-02-25.
Efficient Database Design (http://www.sum-it.nl/cursus/enindex.php3#dbdesign)
Data Modelers Community (http://www.datamodelers.com/)
Relational database design tutorial (http://en.tekstenuitleg.net/articles/software/database-design-tutorial/intro.
html)
Database design (http://www.dmoz.org/Computers/Data_Formats/Database/) at the Open Directory Project

Relational database
A relational database is a collection of data items organised as a set of formally described tables from which data can be accessed easily. A relational database is created using the relational model. The software used in a relational database is called a relational database management system (RDBMS). The relational database is the predominant choice for storing data, over other models like the hierarchical database model or the network model.
The relational database was first defined in 1970 by Edgar Codd, of IBM's San Jose Research Laboratory.[1]

Terminology
Relational database theory uses a set of mathematical terms, which are roughly equivalent to SQL database terminology. The table below summarizes some of the most important relational database terms and their SQL database equivalents.


Relational term          SQL equivalent
relation, base relvar    table
derived relvar           view, query result, result set
tuple                    row
attribute                column

Relations or Tables
A relation is defined as a set of tuples that have the same attributes. A tuple usually represents an object and
information about that object. Objects are typically physical objects or concepts. A relation is usually described as a
table, which is organized into rows and columns. All the data referenced by an attribute are in the same domain and
conform to the same constraints. The relational model specifies that the tuples of a relation have no specific order
and that the tuples, in turn, impose no order on the attributes. Applications access data by specifying queries, which
use operations such as select to identify tuples, project to identify attributes, and join to combine relations. Relations
can be modified using the insert, delete, and update operators. New tuples can supply explicit values or be derived
from a query. Similarly, queries identify tuples for updating or deleting. It is necessary for each tuple of a relation to
be uniquely identifiable by some combination (one or more) of its attribute values. This combination is referred to as
the primary key.

Base and derived relations


In a relational database, all data are stored and accessed via relations. Relations that store data are called "base
relations", and in implementations are called "tables". Other relations do not store data, but are computed by
applying relational operations to other relations. These relations are sometimes called "derived relations". In
implementations these are called "views" or "queries". Derived relations are convenient in that they act as a single
relation, even though they may grab information from several relations. Also, derived relations can be used as an
abstraction layer.

Domain
A domain describes the set of possible values for a given attribute, and can be considered a constraint on the value of
the attribute. Mathematically, attaching a domain to an attribute means that any value for the attribute must be an
element of the specified set. The character data value 'ABC', for instance, is not in the integer domain. The integer
value 123 satisfies the domain constraint.

Constraints
Constraints make it possible to further restrict the domain of an attribute. For instance, a constraint can restrict a
given integer attribute to values between 1 and 10. Constraints provide one method of implementing business rules
in the database. SQL implements constraint functionality in the form of check constraints. Constraints restrict the
data that can be stored in relations. These are usually defined using expressions that result in a boolean value,
indicating whether or not the data satisfies the constraint. Constraints can apply to single attributes, to a tuple
(restricting combinations of attributes) or to an entire relation. Since every attribute has an associated domain, there
are constraints (domain constraints). The two principal rules for the relational model are known as entity integrity
and referential integrity.
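As an illustration of a tuple-level constraint (a minimal Python sketch of the concept, not of any particular DBMS's mechanism; the attribute name is hypothetical), a constraint is simply a boolean-valued expression evaluated against the data:

def check_constraint(t):
    # Tuple-level constraint: an integer attribute restricted to values 1..10.
    return 1 <= t["rating"] <= 10

print(check_constraint({"rating": 7}))    # True: the tuple satisfies the constraint
print(check_constraint({"rating": 42}))   # False: the DBMS would reject this tuple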


Primary keys
A primary key uniquely identifies a tuple within a table. In order for an attribute to be a good primary key it must not repeat. While natural attributes are sometimes good primary keys, surrogate keys are often used instead. A
surrogate key is an artificial attribute assigned to an object which uniquely identifies it (for instance, in a table of
information about students at a school they might all be assigned a student ID in order to differentiate them). The
surrogate key has no intrinsic (inherent) meaning, but rather is useful through its ability to uniquely identify a tuple.
Another common occurrence, especially in regards to N:M cardinality is the composite key. A composite key is a
key made up of two or more attributes within a table that (together) uniquely identify a record. (For example, in a
database relating students, teachers, and classes. Classes could be uniquely identified by a composite key of their
room number and time slot, since no other class could have exactly the same combination of attributes. In fact, use of
a composite key such as this can be a form of data verification, albeit a weak one.)

Foreign key
A foreign key is a field in a relational table that matches the primary key column of another table. The foreign key
can be used to cross-reference tables. Foreign keys need not have unique values in the referencing relation. Foreign
keys effectively use the values of attributes in the referenced relation to restrict the domain of one or more attributes
in the referencing relation. A foreign key could be described formally as: "For all tuples in the referencing relation
projected over the referencing attributes, there must exist a tuple in the referenced relation projected over those same
attributes such that the values in each of the referencing attributes match the corresponding values in the referenced
attributes."

Stored procedures
A stored procedure is executable code that is associated with, and generally stored in, the database. Stored
procedures usually collect and customize common operations, like inserting a tuple into a relation, gathering
statistical information about usage patterns, or encapsulating complex business logic and calculations. Frequently
they are used as an application programming interface (API) for security or simplicity. Implementations of stored
procedures on SQL RDBMSs often allow developers to take advantage of procedural extensions (often
vendor-specific) to the standard declarative SQL syntax. Stored procedures are not part of the relational database
model, but all commercial implementations include them.

Index
An index is one way of providing quicker access to data. Indices can be created on any combination of attributes on a relation. Queries that filter using those attributes can find matching tuples directly via the index, without having to check each tuple in turn. This is analogous to using the index of a book to go directly to the page on which the information you are looking for is found, so that you do not have to read the entire book to find what you are looking for. Relational databases typically supply multiple indexing techniques, each of which is optimal for some combination of data distribution, relation size, and typical access pattern. Indices are usually implemented via B+ trees, R-trees, and bitmaps. Indices are usually not considered part of the database, as they are considered an implementation detail, though indices are usually maintained by the same group that maintains the other parts of the database. Use of efficient indexes on both primary and foreign keys can dramatically improve query performance, because B-tree indexes result in query times proportional to log(N), where N is the number of rows in a table, and hash indexes result in constant-time queries (no size dependency, as long as the relevant part of the index fits into memory).
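To illustrate the idea (a sketch of the concept, not of a real DBMS's storage structures), a hash index can be modeled in Python as a dictionary mapping an attribute value to the set of matching row positions, so a lookup avoids scanning every row:

rows = [
    {"id": 1, "city": "Oslo"},
    {"id": 2, "city": "Paris"},
    {"id": 3, "city": "Oslo"},
]

# Build a hash index on the "city" attribute: value -> row positions.
index = {}
for pos, row in enumerate(rows):
    index.setdefault(row["city"], set()).add(pos)

# Constant-time lookup instead of checking each row in turn.
print([rows[pos] for pos in index.get("Oslo", set())])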


Relational operations
Queries made against the relational database, and the derived relvars in the database are expressed in a relational
calculus or a relational algebra. In his original relational algebra, Codd introduced eight relational operators in two
groups of four operators each. The first four operators were based on the traditional mathematical set operations:
The union operator combines the tuples of two relations and removes all duplicate tuples from the result. The
relational union operator is equivalent to the SQL UNION operator.
The intersection operator produces the set of tuples that two relations share in common. Intersection is
implemented in SQL in the form of the INTERSECT operator.
The difference operator acts on two relations and produces the set of tuples from the first relation that do not exist
in the second relation. Difference is implemented in SQL in the form of the EXCEPT or MINUS operator.
The cartesian product of two relations is a join that is not restricted by any criteria, resulting in every tuple of the
first relation being matched with every tuple of the second relation. The cartesian product is implemented in SQL
as the CROSS JOIN operator.
The remaining operators proposed by Codd involve special operations specific to relational databases:
The selection, or restriction, operation retrieves tuples from a relation, limiting the results to only those that meet a specific criterion, i.e. a subset in terms of set theory. The SQL equivalent of selection is the SELECT query
statement with a WHERE clause.
The projection operation extracts only the specified attributes from a tuple or set of tuples.
The join operation defined for relational databases is often referred to as a natural join. In this type of join, two
relations are connected by their common attributes. SQL's approximation of a natural join is the INNER JOIN
operator.
The relational division operation is a slightly more complex operation, which involves essentially using the tuples
of one relation (the dividend) to partition a second relation (the divisor). The relational division operator is
effectively the opposite of the cartesian product operator (hence the name).
Other operators have been introduced or proposed since Codd's introduction of the original eight including relational
comparison operators and extensions that offer support for nesting and hierarchical data, among others.
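For concreteness, the core operators can be sketched in Python over relations modeled as sets of tuples of (attribute, value) pairs. This is an illustration of the definitions above under that modeling assumption, not of how an RDBMS actually evaluates queries:

def select(relation, predicate):
    # Restriction: keep only the tuples satisfying the predicate.
    return {t for t in relation if predicate(dict(t))}

def project(relation, attrs):
    # Projection: keep only the named attributes of each tuple.
    return {tuple((a, dict(t)[a]) for a in attrs) for t in relation}

def natural_join(r, s):
    # Join tuples that agree on their common attributes.
    out = set()
    for t1 in r:
        for t2 in s:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in common):
                out.add(tuple(sorted({**d1, **d2}.items())))
    return out

# Union, intersection, and difference are the native set operators |, &, -.
employees = {(("name", "Ann"), ("dept", 1)), (("name", "Bob"), ("dept", 2))}
depts = {(("dept", 1), ("dname", "Sales")), (("dept", 2), ("dname", "IT"))}
print(project(natural_join(employees, depts), ["name", "dname"]))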

Normalization
Normalization was first proposed by Codd as an integral part of the relational model. It encompasses a set of
procedures designed to eliminate nonsimple domains (non-atomic values) and the redundancy (duplication) of data,
which in turn prevents data manipulation anomalies and loss of data integrity. The most common forms of
normalization applied to databases are called the normal forms.

Relational database management systems


Relational databases, as implemented in relational database management systems, have become a predominant
choice for the storage of information in new databases used for financial records, manufacturing and logistical
information, personnel data and much more. Relational databases have often replaced legacy hierarchical databases
and network databases because they are easier to understand and use, even though they are much less efficient. As
computer power has increased, the inefficiencies of relational databases, which made them impractical in earlier
times, have been outweighed by their ease of use. However, relational databases have been challenged by object
databases, which were introduced in an attempt to address the object-relational impedance mismatch in relational
database, and XML databases. The three leading commercial relational database vendors are Oracle, Microsoft, and
IBM.[2] The three leading open source implementations are MySQL, PostgreSQL, and SQLite. Amazon Relational
Database Service is a database as a service offering MySQL and Oracle database engines.


References
[1] Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387.
doi:10.1145/362384.362685.
[2] Gartner Says Worldwide Relational Database Market Increased 14 Percent in 2006 (http://www.gartner.com/it/page.jsp?id=507466), includes revenue estimates for leading database companies

Relational model
The relational model for database management is a database model based on first-order predicate logic, first
formulated and proposed in 1969 by Edgar F. Codd.[1][2] In the relational model of a database, all data is represented
in terms of tuples, grouped into relations. A database organized in terms of the relational model is a relational
database.
The purpose of the relational model is to provide
a declarative method for specifying data and
queries: users directly state what information the
database contains and what information they
want from it, and let the database management
system software take care of describing data
structures for storing the data and retrieval
procedures for answering queries.
Most implementations of the relational model
use the SQL data definition and query language.
A table in an SQL database schema corresponds
to a predicate variable; the contents of a table to
a relation; key constraints, other constraints, and
SQL queries correspond to predicates. However,
SQL databases, including DB2, deviate from the
relational model in many details; Codd fiercely
argued against deviations that compromise the
original principles.[4]

[Figure: Diagram of an example database according to the relational model.[3]]

Overview
The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate
variables, describing constraints on the possible values and combinations of values. The content of the database at
any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that
all predicates are satisfied. A request for information from the database (a database query) is also a predicate.


[Figure: In the relational model, related records are linked together with a "key".]

Alternatives to the relational model

Other models are the hierarchical
model and network model. Some
systems using these older architectures
are still in use today in data centers
with high data volume needs, or where
existing systems are so complex and
abstract it would be cost-prohibitive to
migrate to systems employing the
relational model; also of note are
newer object-oriented databases.

[Figure: Relational model concepts.]

A recent development is the Object-Relation type-Object model, which is based on the assumption that any fact can
be expressed in the form of one or more binary relationships. The model is used in Object Role Modeling (ORM) and RDF/Notation 3 (N3).
The relational model was the first database model to be described in formal mathematical terms. Hierarchical and
network databases existed before relational databases, but their specifications were relatively informal. After the
relational model was defined, there were many attempts to compare and contrast the different models, and this led to
the emergence of more rigorous descriptions of the earlier models; though the procedural nature of the data
manipulation interfaces for hierarchical and network databases limited the scope for formalization.


Implementation
There have been several attempts to produce a true implementation of the relational database model as originally
defined by Codd and explained by Date, Darwen and others, but none have been popular successes so far. Rel is one
of the more recent attempts to do this.

History
The relational model was invented by E.F. (Ted) Codd as a general model of data, and subsequently maintained and
developed by Chris Date and Hugh Darwen among others. In The Third Manifesto (first published in 1995) Date and
Darwen show how the relational model can accommodate certain desired object-oriented features.

Controversies
Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing or
NULL) version of it to deal with missing information, and in his The Relational Model for Database Management
Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but
Inapplicable) version. But these have never been implemented, presumably because of attendant complexity. SQL's
NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in
the standard and in its implementations.

Relational model topics


The model
The fundamental assumption of the relational model is that all data is represented as mathematical n-ary relations,
an n-ary relation being a subset of the Cartesian product of n domains. In the mathematical model, reasoning about
such data is done in two-valued predicate logic, meaning there are two possible evaluations for each proposition:
either true or false (and in particular no third value such as unknown, or not applicable, either of which is often
associated with the concept of NULL). Data are operated upon by means of a relational calculus or relational
algebra, these being equivalent in expressive power.
The relational model of data permits the database designer to create a consistent, logical representation of
information. Consistency is achieved by including declared constraints in the database design, which is usually
referred to as the logical schema. The theory includes a process of database normalization whereby a design with
certain desirable properties can be selected from a set of logically equivalent alternatives. The access plans and other
implementation and operation details are handled by the DBMS engine, and are not reflected in the logical model.
This contrasts with common practice for SQL DBMSs in which performance tuning often requires changes to the
logical model.
The basic relational building block is the domain or data type, usually abbreviated nowadays to type. A tuple is an
ordered set of attribute values. An attribute is an ordered pair of attribute name and type name. An attribute value is
a specific valid value for the type of the attribute. This can be either a scalar value or a more complex type.
A relation consists of a heading and a body. A heading is a set of attributes. A body (of an n-ary relation) is a set of
n-tuples. The heading of the relation is also the heading of each of its tuples.
A relation is defined as a set of n-tuples. In both mathematics and the relational database model, a set is an
unordered collection of unique, non-duplicated items, although some DBMSs impose an order to their data. In
mathematics, a tuple has an order, and allows for duplication. E.F. Codd originally defined tuples using this
mathematical definition.[5] Later, it was one of E.F. Codd's great insights that using attribute names instead of an ordering would be so much more convenient (in general) in a computer language based on relations. This insight is
still being used today. Though the concept has changed, the name "tuple" has not. An immediate and important

consequence of this distinguishing feature is that in the relational model the Cartesian product becomes
commutative.
A table is an accepted visual representation of a relation; a tuple is similar to the concept of row, but note that in the
database language SQL the columns and the rows of a table are ordered.
A relvar is a named variable of some specific relation type, to which at all times some relation of that type is
assigned, though the relation may contain zero tuples.
The basic principle of the relational model is the Information Principle: all information is represented by data values
in relations. In accordance with this Principle, a relational database is a set of relvars and the result of every query is
presented as a relation.
The consistency of a relational database is enforced, not by rules built into the applications that use it, but rather by
constraints, declared as part of the logical schema and enforced by the DBMS for all applications. In general,
constraints are expressed using relational comparison operators, of which just one, "is subset of" (⊆), is theoretically
sufficient. In practice, several useful shorthands are expected to be available, of which the most important are
candidate key (really, superkey) and foreign key constraints.

Interpretation
To fully appreciate the relational model of data it is essential to understand the intended interpretation of a relation.
The body of a relation is sometimes called its extension. This is because it is to be interpreted as a representation of
the extension of some predicate, this being the set of true propositions that can be formed by replacing each free
variable in that predicate by a name (a term that designates something).
There is a one-to-one correspondence between the free variables of the predicate and the attribute names of the
relation heading. Each tuple of the relation body provides attribute values to instantiate the predicate by substituting
each of its free variables. The result is a proposition that is deemed, on account of the appearance of the tuple in the
relation body, to be true. Contrariwise, every tuple whose heading conforms to that of the relation but which does not
appear in the body is deemed to be false. This assumption is known as the closed world assumption: it is often
violated in practical databases, where the absence of a tuple might mean that the truth of the corresponding
proposition is unknown. For example, the absence of the tuple ('John', 'Spanish') from a table of language skills
cannot necessarily be taken as evidence that John does not speak Spanish.
For a formal exposition of these ideas, see the section Set-theoretic Formulation, below.

Application to databases
A data type as used in a typical relational database might be the set of integers, the set of character strings, the set of
dates, or the two boolean values true and false, and so on. The corresponding type names for these types might be
the strings "int", "char", "date", "boolean", etc. It is important to understand, though, that relational theory does not
dictate what types are to be supported; indeed, nowadays provisions are expected to be available for user-defined
types in addition to the built-in ones provided by the system.
Attribute is the term used in the theory for what is commonly referred to as a column. Similarly, table is commonly
used in place of the theoretical term relation (though in SQL the term is by no means synonymous with relation). A
table data structure is specified as a list of column definitions, each of which specifies a unique column name and the
type of the values that are permitted for that column. An attribute value is the entry in a specific column and row,
such as "John Doe" or "35".
A tuple is basically the same thing as a row, except in an SQL DBMS, where the column values in a row are
ordered. (Tuples are not ordered; instead, each attribute value is identified solely by the attribute name and never
by its ordinal position within the tuple.) An attribute name might be "name" or "age".

A relation is a table structure definition (a set of column definitions) along with the data appearing in that structure.
The structure definition is the heading and the data appearing in it is the body, a set of rows. A database relvar
(relation variable) is commonly known as a base table. The heading of its assigned value at any time is as specified
in the table declaration and its body is that most recently assigned to it by invoking some update operator
(typically, INSERT, UPDATE, or DELETE). The heading and body of the table resulting from evaluation of some
query are determined by the definitions of the operators used in the expression of that query. (Note that in SQL the
heading is not always a set of column definitions as described above, because it is possible for a column to have no
name and also for two or more columns to have the same name. Also, the body is not always a set of rows because in
SQL it is possible for the same row to appear more than once in the same body.)

SQL and the relational model


SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several
places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts.
However, it is possible to create a database conforming to the relational model using SQL if one does not use certain
SQL features.
The following deviations from the relational model have been noted in SQL. Note that few database servers
implement the entire SQL standard and in particular do not allow some of these deviations. Whereas NULL is
ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.
Duplicate rows
The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a
relation.
Anonymous columns
A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational
model requires every attribute to be named and referenceable.
Duplicate column names
Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on
account of the obvious ambiguity. The relational model requires every attribute to be referenceable.
Column order significance
The order of columns in an SQL table is defined and significant, one consequence being that SQL's
implementations of Cartesian product and union are both noncommutative. The relational model requires there
to be no significance to any ordering of the attributes of a relation.
Views without CHECK OPTION
Updates to a view defined without CHECK OPTION can be accepted but the resulting update to the database
does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be
accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in
rows disappearing from the view. The relational model requires updates to a view to have the same effect as if
the view were a base relvar.
Columnless tables unrecognized
SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality
one and zero) and they are needed to represent extensions of predicates that contain no free variables.
NULL
This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a
column value in some row. The deviation from the relational model arises from the fact that the
implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the

comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly
the comparison of NULL with something other than itself does not yield false but instead yields unknown. It is
because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational
model depends on the law of excluded middle under which anything that is not true is false and anything that
is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that
relation. This particular deviation is disputed by some if only because E.F. Codd himself eventually advocated
the use of special marks and a 4-valued logic, but this was based on his observation that there are two distinct
reasons why one might want to use a special mark in place of a value, which led opponents of the use of such
logics to discover more distinct reasons and at least as many as 19 have been noted, which would require a
21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For
example, the sum of the empty set is NULL, meaning zero, the average of the empty set is NULL, meaning
undefined, and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no
matching row in the right-hand operand".

Relational operations
Users (or programs) request data from a relational database by sending it a query that is written in a special language,
usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL
queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia,
perform SQL queries when generating pages.
In response to a query, the database returns a result set, which is just a list of rows containing the answers. The
simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return
just the answer wanted.
Often, data from multiple tables are combined into one, by doing a join. Conceptually, this is done by taking all
possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In
practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of
techniques.
There are a number of relational operations in addition to join. These include project (the process of eliminating
some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables
with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which
lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each
row of the other). Depending on which other sources you consult, there are a number of other operators many of
which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and
outer union, and various forms of division. Then there are operators to rename columns, and summarizing or
aggregating operators, and if you permit relation values as attributes (RVA relation-valued attribute), then
operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the
group and ungroup operators.
The flexibility of relational databases allows programmers to write queries that were not anticipated by the database
designers. As a result, relational databases can be used by multiple applications in ways the original designers did
not foresee, which is especially important for databases that might be used for a long time (perhaps several decades).
This has made the idea and implementation of relational databases very popular with businesses.


Database normalization
Relations are classified based upon the types of anomalies to which they're vulnerable. A database that's in the first
normal form is vulnerable to all types of anomalies, while a database that's in the domain/key normal form has no
modification anomalies. Normal forms are hierarchical in nature. That is, the lowest level is the first normal form,
and the database cannot meet the requirements for higher level normal forms without first having met all the
requirements of the lesser normal forms.[6]

Examples
Database
An idealized, very simple example of a description of some relvars (relation variables) and their attributes:

Customer (Customer ID, Tax ID, Name, Address, City, State, Zip, Phone, Email)
Order (Order No, Customer ID, Invoice No, Date Placed, Date Promised, Terms, Status)
Order Line (Order No, Order Line No, Product Code, Qty)
Invoice (Invoice No, Customer ID, Order No, Date, Status)
Invoice Line (Invoice No, Invoice Line No, Product Code, Qty Shipped)

Product (Product Code, Product Description)


In this design we have six relvars: Customer, Order, Order Line, Invoice, Invoice Line and Product. The bold,
underlined attributes are candidate keys. The non-bold, underlined attributes are foreign keys.
Usually one candidate key is arbitrarily chosen to be called the primary key and used in preference over the other
candidate keys, which are then called alternate keys.
A candidate key is a unique identifier enforcing that no tuple will be duplicated; this would make the relation into
something else, namely a bag, by violating the basic definition of a set. Both foreign keys and superkeys (which
includes candidate keys) can be composite, that is, can be composed of several attributes. Below is a tabular
depiction of a relation of our example Customer relvar; a relation can be thought of as a value that can be attributed
to a relvar.

Customer relation
Customer ID

Tax ID

Name

Address

[More fields....]

==================================================================================================
1234567890

555-5512222

Munmun

323 Broadway

...

2223344556

555-5523232

Wile E.

1200 Main Street

...

3334445563

555-5533323

Ekta

871 1st Street

...

4232342432

555-5325523

E. F. Codd

123 It Way

...

If we attempted to insert a new customer with the ID 1234567890, this would violate the design of the relvar since
Customer ID is a primary key and we already have a customer 1234567890. The DBMS must reject a transaction
such as this that would render the database inconsistent by a violation of an integrity constraint.
Foreign keys are integrity constraints enforcing that the value of the attribute set is drawn from a candidate key in
another relation. For example in the Order relation the attribute Customer ID is a foreign key. A join is the
operation that draws on information from several relations at once. By joining relvars from the example above we
could query the database for all of the Customers, Orders, and Invoices. If we only wanted the tuples for a specific
customer, we would specify this using a restriction condition.
If we wanted to retrieve all of the Orders for Customer 1234567890, we could query the database to return every row in the Order table with Customer ID 1234567890 and join the Order table to the Order Line table based on Order No.
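A minimal sketch of that retrieval in Python, with hypothetical data values (an SQL system would express the same thing with a WHERE clause and a join):

orders = [{"order_no": 1, "customer_id": "1234567890"},
          {"order_no": 2, "customer_id": "2223344556"}]
order_lines = [{"order_no": 1, "line_no": 1, "product_code": "P7", "qty": 3},
               {"order_no": 1, "line_no": 2, "product_code": "P9", "qty": 1}]

# Restrict Order to the given customer, then join to Order Line on Order No.
result = [{**o, **l}
          for o in orders if o["customer_id"] == "1234567890"
          for l in order_lines if l["order_no"] == o["order_no"]]
print(result)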
There is a flaw in our database design above. The Invoice relvar contains an Order No attribute. So, each tuple in the
Invoice relvar will have one Order No, which implies that there is precisely one Order for each Invoice. But in
reality an invoice can be created against many orders, or indeed for no particular order. Additionally the Order relvar
contains an Invoice No attribute, implying that each Order has a corresponding Invoice. But again this is not always
true in the real world. An order is sometimes paid through several invoices, and sometimes paid without an invoice.
In other words there can be many Invoices per Order and many Orders per Invoice. This is a many-to-many
relationship between Order and Invoice (also called a non-specific relationship). To represent this relationship in the
database a new relvar should be introduced whose role is to specify the correspondence between Orders and
Invoices:
OrderInvoice(Order No,Invoice No)
Now, the Order relvar has a one-to-many relationship to the OrderInvoice table, as does the Invoice relvar. If we
want to retrieve every Invoice for a particular Order, we can query for all orders where Order No in the Order
relation equals the Order No in OrderInvoice, and where Invoice No in OrderInvoice equals the Invoice No in
Invoice.

Set-theoretic formulation
Basic notions in the relational model are relation names and attribute names. We will represent these as strings such as "Person" and "name", and we will usually use the variables r, s, t and a, b, c to range over them. Another basic notion is the set of atomic values that contains values such as numbers and strings.
Our first definition concerns the notion of tuple, which formalizes the notion of row or record in a table:
Tuple
A tuple is a partial function from attribute names to atomic values.
Header
A header is a finite set of attribute names.
Projection
The projection of a tuple t on a finite set of attributes A is t[A] = { (a, v) : (a, v) ∈ t, a ∈ A }.
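Since a tuple is a partial function from attribute names to atomic values, it is naturally modeled as a Python dict; a minimal sketch of projection under that assumption:

def projection(t, attrs):
    # t[A] = { (a, v) in t : a in A }
    return {a: v for a, v in t.items() if a in attrs}

t = {"name": "Ann", "age": 35, "city": "Oslo"}
print(projection(t, {"name", "age"}))   # {'name': 'Ann', 'age': 35}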

The next definition defines relation, which formalizes the contents of a table as it is defined in the relational model.
Relation
A relation is a tuple (H, B) with H, the header, and B, the body, a set of tuples that all have the domain H.
Such a relation closely corresponds to what is usually called the extension of a predicate in first-order logic except
that here we identify the places in the predicate with attribute names. Usually in the relational model a database
schema is said to consist of a set of relation names, the headers that are associated with these names and the
constraints that should hold for every instance of the database schema.
Relation universe
A relation universe U over a header H is a non-empty set of relations with header H.
Relation schema
A relation schema (H, C) consists of a header H and a predicate C(R) that is defined for all relations R with header H. A relation satisfies a relation schema (H, C) if it has header H and satisfies C.


Key constraints and functional dependencies


One of the simplest and most important types of relation constraints is the key constraint. It tells us that in every
instance of a certain relational schema the tuples can be identified by their values for certain attributes.
Superkey
A superkey is written as a finite set of attribute names.
A superkey K holds in a relation (H, B) if:
K ⊆ H and
there exist no two distinct tuples t1, t2 ∈ B such that t1[K] = t2[K].
A superkey holds in a relation universe U if it holds in all relations in U.
Theorem: A superkey K holds in a relation universe U over H if and only if K ⊆ H and K → H holds in U.
Candidate key
A superkey K holds as a candidate key for a relation universe U if it holds as a superkey for U and there is no proper subset of K that also holds as a superkey for U.
Functional dependency
A functional dependency (FD for short) is written as X → Y for X, Y finite sets of attribute names.
A functional dependency X → Y holds in a relation (H, B) if:
X ⊆ H and Y ⊆ H, and
for all tuples t1, t2 ∈ B: if t1[X] = t2[X] then t1[Y] = t2[Y].
A functional dependency X → Y holds in a relation universe U if it holds in all relations in U.
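These definitions can be checked mechanically over a finite relation body. A minimal Python sketch (the sample data is hypothetical), modeling the body as a list of dict tuples:

def holds_superkey(body, K):
    # No two distinct tuples may agree on all attributes in K.
    seen = set()
    for t in body:
        k = tuple(t[a] for a in sorted(K))
        if k in seen:
            return False
        seen.add(k)
    return True

def holds_fd(body, X, Y):
    # Whenever two tuples agree on X they must also agree on Y.
    image = {}
    for t in body:
        x = tuple(t[a] for a in sorted(X))
        y = tuple(t[a] for a in sorted(Y))
        if image.setdefault(x, y) != y:
            return False
    return True

body = [{"id": 1, "zip": "10", "city": "Oslo"},
        {"id": 2, "zip": "10", "city": "Oslo"}]
print(holds_superkey(body, {"id"}))        # True
print(holds_fd(body, {"zip"}, {"city"}))   # True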

Trivial functional dependency
A functional dependency is trivial under a header H if it holds in all relation universes over H.
Theorem: An FD X → Y is trivial under a header H if and only if Y ⊆ X ⊆ H.

Closure
Armstrong's axioms: The closure of a set of FDs S under a header H, written as S⁺, is the smallest superset of S such that:
if Y ⊆ X ⊆ H then X → Y ∈ S⁺ (reflexivity)
if X → Y ∈ S⁺ and Y → Z ∈ S⁺ then X → Z ∈ S⁺ (transitivity) and
if X → Y ∈ S⁺ and Z ⊆ H then X ∪ Z → Y ∪ Z ∈ S⁺ (augmentation)
Theorem: Armstrong's axioms are sound and complete; given a header H and a set S of FDs that only contain subsets of H, X → Y ∈ S⁺ if and only if X → Y holds in all relation universes over H in which all FDs in S hold.

Completion
The completion of a finite set of attributes X under a finite set of FDs S, written as X⁺, is the smallest superset of X such that:
if Y → Z ∈ S and Y ⊆ X⁺ then Z ⊆ X⁺.
The completion of an attribute set can be used to compute if a certain dependency is in the closure of a set of FDs.
Theorem: Given a set S of FDs, X → Y ∈ S⁺ if and only if Y ⊆ X⁺.
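The completion is computable by a simple fixpoint iteration. A minimal Python sketch, representing each FD as a pair of attribute sets:

def completion(X, fds):
    # fds: iterable of (Y, Z) pairs meaning the FD Y -> Z.
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Y, Z in fds:
            if set(Y) <= closure and not set(Z) <= closure:
                closure |= set(Z)   # Y -> Z fires: add Z to the completion
                changed = True
    return closure

fds = [({"a"}, {"b"}), ({"b"}, {"c"})]
print(completion({"a"}, fds))             # {'a', 'b', 'c'}
# By the theorem above, X -> Y is in the closure iff Y <= completion(X).
print({"c"} <= completion({"a"}, fds))    # True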

Irreducible cover
An irreducible cover of a set S of FDs is a set T of FDs such that:
S⁺ = T⁺,
there exists no U ⊊ T such that S⁺ = U⁺,
Y is a singleton set for every X → Y ∈ T, and
for every X → Y ∈ T there is no proper subset Z ⊊ X such that Z → Y ∈ S⁺.

Algorithm to derive candidate keys from functional dependencies

INPUT: a set S of FDs that contain only subsets of a header H
OUTPUT: the set C of superkeys that hold as candidate keys in
        all relation universes over H in which all FDs in S hold
begin
  C := ∅;                 // found candidate keys
  Q := { H };             // superkeys that contain candidate keys
  while Q <> ∅ do
    let K be some element from Q;
    Q := Q − { K };
    minimal := true;
    for each X → Y in S do
      K' := (K − Y) ∪ X;  // derive new superkey
      if K' ⊂ K then
        minimal := false;
        Q := Q ∪ { K' };
      end if
    end for
    if minimal and there is not a subset of K in C then
      remove all supersets of K from C;
      C := C ∪ { K };
    end if
  end while
end
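A direct transcription of the pseudocode into runnable Python (a sketch that keeps the same variable names; the sample header and FDs are made up):

def candidate_keys(H, S):
    # H: header (set of attribute names); S: iterable of (X, Y) pairs for FDs X -> Y.
    C = []                      # found candidate keys
    Q = [frozenset(H)]          # superkeys that contain candidate keys
    while Q:
        K = Q.pop()
        minimal = True
        for X, Y in S:
            K2 = (K - frozenset(Y)) | frozenset(X)   # derive new superkey
            if K2 < K:                               # proper subset of K
                minimal = False
                Q.append(K2)
        if minimal and not any(c <= K for c in C):
            C = [c for c in C if not K <= c]         # remove supersets of K
            C.append(K)
    return C

H = {"order_no", "line_no", "product_code", "qty"}
S = [({"order_no", "line_no"}, {"product_code", "qty"})]
print(candidate_keys(H, S))     # [frozenset({'order_no', 'line_no'})]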

References
[1] "Derivability, Redundancy, and Consistency of Relations Stored in Large Data Banks", E.F. Codd, IBM Research Report, 1969
[2] "A Relational Model of Data for Large Shared Data Banks", in Communications of the ACM, 1970.
[3] Data Integration Glossary (http://knowledge.fhwa.dot.gov/tam/aashto.nsf/All+Documents/4825476B2B5C687285256B1F00544258/$FILE/DIGloss.pdf), U.S. Department of Transportation, August 2001.
[4] E. F. Codd, The Relational Model for Database Management, Addison-Wesley Publishing Company, 1990, ISBN 0-201-14192-2
[5] Codd, E.F. (1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html). Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
[6] David M. Kroenke, Database Processing: Fundamentals, Design, and Implementation (1997), Prentice-Hall, Inc., pages 130–144


Further reading
Date, C. J., Darwen, H. (2000). Foundation for Future Database Systems: The Third Manifesto, 2nd edition,
Addison-Wesley Professional. ISBN 0-201-70928-7.
Date, C. J. (2003). Introduction to Database Systems. 8th edition, Addison-Wesley. ISBN 0-321-19784-4.

External links
Feasibility of a set-theoretic data structure : a general structure based on a reconstituted definition of relation
(Childs' 1968 research cited by Codd's 1970 paper) (http://hdl.handle.net/2027.42/4164)
The Third Manifesto (TTM) (http://www.thethirdmanifesto.com/)
Relational Databases (http://www.dmoz.org/Computers/Software/Databases/Relational/) at the Open
Directory Project
Relational Model (http://c2.com/cgi/wiki?RelationalModel)
Binary relations and tuples compared with respect to the semantic web (http://blogs.sun.com/bblfish/entry/
why_binary_relations_beat_tuples)

Binary relation
In mathematics, a binary relation on a set A is a collection of ordered pairs of elements of A. In other words, it is a subset of the Cartesian product A² = A × A. More generally, a binary relation between two sets A and B is a subset of A × B. The terms dyadic relation and 2-place relation are synonyms for binary relations.
An example is the "divides" relation between the set of prime numbers P and the set of integers Z, in which every
prime p is associated with every integer z that is a multiple of p (and not with any integer that is not a multiple of p).
In this relation, for instance, the prime 2 is associated with numbers that include 4, 0, 6, 10, but not 1 or 9; and the
prime 3 is associated with numbers that include 0, 6, and 9, but not 4 or 13.
Binary relations are used in many branches of mathematics to model concepts like "is greater than", "is equal to",
and "divides" in arithmetic, "is congruent to" in geometry, "is adjacent to" in graph theory, "is orthogonal to" in
linear algebra and many more. The concept of function is defined as a special kind of binary relation. Binary
relations are also heavily used in computer science.
A binary relation is the special case n = 2 of an n-ary relation R ⊆ A1 × ... × An, that is, a set of n-tuples where the jth component of each n-tuple is taken from the jth domain Aj of the relation.
In some systems of axiomatic set theory, relations are extended to classes, which are generalizations of sets. This
extension is needed for, among other things, modeling the concepts of "is an element of" or "is a subset of" in set
theory, without running into logical inconsistencies such as Russell's paradox.

Formal definition
A binary relation R is usually defined as an ordered triple (X, Y, G) where X and Y are arbitrary sets (or classes), and G is a subset of the Cartesian product X × Y. The sets X and Y are called the domain (or the set of departure) and codomain (or the set of destination), respectively, of the relation, and G is called its graph.
The statement (x, y) ∈ R is read "x is R-related to y", and is denoted by xRy or R(x, y). The latter notation corresponds to viewing R as the characteristic function on X × Y for the set of pairs of G.
The order of the elements in each pair of G is important: if a ≠ b, then aRb and bRa can be true or false, independently of each other.


A relation as defined by the triple (X, Y, G) is sometimes referred to as a correspondence instead.[1] In this case the relation from X to Y is the subset G of X × Y, and "from X to Y" must always be either specified or implied by the context when referring to the relation. In practice correspondence and relation tend to be used interchangeably.

Is a relation more than its graph?


According to the definition above, two relations with the same graph may be different if they differ in the sets X and Y. For example, if G = {(1, 2), (1, 3), (2, 7)}, then (Z, Z, G), (R, N, G), and (N, R, G) are three distinct relations.
Some mathematicians, especially in set theory, do not consider the sets X and Y to be part of the relation, and therefore define a binary relation as being a subset of X × Y, that is, just the graph G.
A special case of this difference in points of view applies to the notion of function. Many authors insist on distinguishing between a function's codomain and its range. Thus, a single "rule," like mapping every real number x to x², can lead to distinct functions f: R → R and f: R → R⁺, depending on whether the images under that rule are understood to be reals or, more restrictively, non-negative reals. But others view functions as simply sets of ordered pairs with unique first components. This difference in perspectives does raise some nontrivial issues. As an example, the former camp considers surjectivity (being onto) as a property of functions, while the latter sees it as a relationship that functions may bear to sets.
Either approach is adequate for most uses, provided that one attends to the necessary changes in language, notation, and the definitions of concepts like restrictions, composition, inverse relation, and so on. The choice between the two definitions usually matters only in very formal contexts, like category theory.

Example
Example: Suppose there are four objects {ball, car, doll, gun} and four persons {John, Mary, Ian, Venus}. Suppose
that John owns the ball, Mary owns the doll, and Venus owns the car. Nobody owns the gun and Ian owns nothing.
Then the binary relation "is owned by" is given as
R=({ball, car, doll, gun}, {John, Mary, Ian, Venus}, {(ball, John), (doll, Mary), (car, Venus)}).
Thus the first element of R is the set of objects, the second is the set of people, and the last element is a set of
ordered pairs of the form (object, owner).
The pair (ball, John), denoted by ballRJohn means that the ball is owned by John.
Two different relations could have the same graph. For example: the relation
({ball, car, doll, gun}, {John, Mary, Venus}, {(ball,John), (doll, Mary), (car, Venus)})
is different from the previous one as everyone is an owner. But the graphs of the two relations are the same.
Nevertheless, R is usually identified or even defined as G(R) and "an ordered pair (x, y) ∈ G(R)" is usually denoted as "(x, y) ∈ R".

Special types of binary relations


Some important types of binary relations R between X and Y are listed below.
Uniqueness properties:
injective (also called left-unique[2]): for all x and z in X and y in Y it holds that if xRy and zRy then x = z.
functional (also called right-unique[2] or right-definite): for all x in X, and y and z in Y it holds that if xRy and
xRz then y = z; such a binary relation is called a partial function.
one-to-one (also written 1-to-1): injective and functional.
Totality properties:

left-total[2]: for all x in X there exists a y in Y such that xRy (this property, although sometimes also referred to as
total, is different from the definition of total in the next section).
surjective (also called right-total[2]): for all y in Y there exists an x in X such that xRy.
Uniqueness and totality properties:
A function: a relation that is functional and left-total.
A bijection: a one-to-one correspondence; such a relation is a function and is said to be bijective.

Relations over a set


If X = Y then we simply say that the binary relation is over X, or that it is an endorelation over X. Some types of
endorelations are widely studied in graph theory, where they're known as directed graphs.
The set of all binary relations B(X) on a set X is a semigroup with involution with the involution being the mapping
of a relation to its inverse relation.
Some important types of binary relations over a set X are:
reflexive: for all x in X it holds that xRx. For example, "greater than or equal to" is a reflexive relation but "greater
than" is not.
irreflexive (or strict): for all x in X it holds that not xRx. "Greater than" is an example of an irreflexive relation.
coreflexive: for all x and y in X it holds that if xRy then x = y. "Equal to and odd" is an example of a coreflexive
relation.
symmetric: for all x and y in X it holds that if xRy then yRx. "Is a blood relative of" is a symmetric relation,
because x is a blood relative of y if and only if y is a blood relative of x.
antisymmetric: for all distinct x and y in X, if xRy then not yRx.
asymmetric: for all x and y in X, if xRy then not yRx. (So asymmetricity is stronger than anti-symmetry. In fact,
asymmetry is equivalent to anti-symmetry plus irreflexivity.)
transitive: for all x, y and z in X it holds that if xRy and yRz then xRz. (Note that, under the assumption of
transitivity, irreflexivity and asymmetry are equivalent.)
total: for all x and y in X it holds that xRy or yRx (or both). "Is greater than or equal to" is an example of a total
relation (this definition for total is different from left total in the previous section).
trichotomous: for all x and y in X exactly one of xRy, yRx or x = y holds. "Is greater than" is an example of a
trichotomous relation.
Euclidean: for all x, y and z in X it holds that if xRy and xRz, then yRz (and zRy). Equality is a Euclidean relation
because if x=y and x=z, then y=z.
serial: for all x in X, there exists y in X such that xRy. "Is greater than" is a serial relation on the integers. But it is
not a serial relation on the positive integers, because there is no y in the positive integers such that 1>y.[3]
However, "is less than" is a serial relation on the positive integers (the natural numbers), the rational numbers
and the real numbers. Every reflexive relation is serial.
set-like: for every x in X, the class of all y such that yRx is a set. (This makes sense only if we allow relations on
proper classes.) The usual ordering < on the class of ordinal numbers is set-like, while its inverse > is not.
A relation that is reflexive, symmetric, and transitive is called an equivalence relation. A relation that is reflexive,
antisymmetric, and transitive is called a partial order. A partial order that is total is called a total order, simple order,
linear order, or a chain.[4] A linear order where every nonempty set has a least element is called a well-order. A
relation that is symmetric, transitive, and serial is also reflexive.


Operations on binary relations


If R is a binary relation over X and Y, then the following is a binary relation over Y and X:
Inverse or converse: R⁻¹, defined as R⁻¹ = { (y, x) | (x, y) ∈ R }. A binary relation over a set is equal to its
inverse if and only if it is symmetric. See also duality (order theory).
If R is a binary relation over X, then each of the following is a binary relation over X:
Reflexive closure: R^=, defined as R^= = { (x, x) | x ∈ X } ∪ R, or the smallest reflexive relation over X containing
R. This can be seen to be equal to the intersection of all reflexive relations containing R.
Reflexive reduction: R^≠, defined as R^≠ = R \ { (x, x) | x ∈ X }, or the largest irreflexive relation over X contained
in R.
Transitive closure: R^+, defined as the smallest transitive relation over X containing R. This can be seen to be
equal to the intersection of all transitive relations containing R (a query sketch follows this list).
Transitive reduction: R^−, defined as a minimal relation having the same transitive closure as R.
Reflexive transitive closure: R^*, defined as R^* = (R^+)^=, the smallest preorder containing R.
Reflexive transitive symmetric closure: R^≡, defined as the smallest equivalence relation over X containing R.
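In database terms, the transitive closure can be computed with a recursive query. The following is a minimal sketch, assuming the relation's pairs are stored in a hypothetical two-column table named R; WITH RECURSIVE is standard SQL, supported by PostgreSQL and SQLite, among others:

    -- Hypothetical table holding the pairs of a binary relation over a set X.
    CREATE TABLE R (x INTEGER, y INTEGER);

    -- Transitive closure R^+: start from R itself and repeatedly add pairs
    -- (x, y) reachable through an intermediate element. UNION removes
    -- duplicates, so the recursion terminates on finite tables.
    WITH RECURSIVE closure (x, y) AS (
        SELECT x, y FROM R
        UNION
        SELECT c.x, r.y
        FROM closure AS c
        JOIN R AS r ON r.x = c.y
    )
    SELECT x, y FROM closure;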
If R, S are binary relations over X and Y, then each of the following is a binary relation:
Union: R ∪ S ⊆ X × Y, defined as R ∪ S = { (x, y) | (x, y) ∈ R or (x, y) ∈ S }.
Intersection: R ∩ S ⊆ X × Y, defined as R ∩ S = { (x, y) | (x, y) ∈ R and (x, y) ∈ S }.
If R is a binary relation over X and Y, and S is a binary relation over Y and Z, then the following is a binary relation
over X and Z: (see main article composition of relations)
Composition: S ∘ R, also denoted R ; S (or, more ambiguously, R ∘ S), defined as S ∘ R = { (x, z) | there exists y ∈
Y such that (x, y) ∈ R and (y, z) ∈ S }. The order of R and S in the notation S ∘ R used here agrees with the
standard notational order for composition of functions.
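Composition likewise has a direct database reading: it is a join on the shared middle column followed by a projection. A minimal sketch, again with hypothetical tables R (over X and Y) and S (over Y and Z):

    -- Hypothetical tables: R holds pairs over X and Y, S holds pairs over Y and Z.
    CREATE TABLE R (x INTEGER, y INTEGER);
    CREATE TABLE S (y INTEGER, z INTEGER);

    -- Composition S o R over X and Z: keep (x, z) whenever some y links them.
    SELECT DISTINCT r.x, s.z
    FROM R AS r
    JOIN S AS s ON s.y = r.y;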

Complement
If R is a binary relation over X and Y, then the following is also a binary relation over X and Y:
The complement S, defined as x S y if and only if not x R y.
The complement of the inverse is the inverse of the complement.
If X = Y the complement has the following properties:
If a relation is symmetric, the complement is too.
The complement of a reflexive relation is irreflexive and vice versa.
The complement of a strict weak order is a total preorder and vice versa.
The complement of the inverse has these same properties.

Restriction
The restriction of a binary relation on a set X to a subset S is the set of all pairs (x, y) in the relation for which x and y
are in S.
If a relation is reflexive, irreflexive, symmetric, antisymmetric, asymmetric, transitive, total, trichotomous, a partial
order, total order, strict weak order, total preorder (weak order), or an equivalence relation, its restrictions are too.
However, the transitive closure of a restriction is a subset of the restriction of the transitive closure, i.e., in general
not equal.
Also, the various concepts of completeness (not to be confused with being "total") do not carry over to restrictions.
For example, on the set of real numbers a property of the relation "≤" is that every non-empty subset S of R with an
upper bound in R has a least upper bound (also called supremum) in R. However, for a set of rational numbers this
supremum is not necessarily rational, so the same property does not hold on the restriction of the relation "≤" to the
set of rational numbers.
The left-restriction (right-restriction, respectively) of a binary relation between X and Y to a subset S of its domain
(codomain) is the set of all pairs (x, y) in the relation for which x (y) is an element of S.

Sets versus classes


Certain mathematical "relations", such as "equal to", "member of", and "subset of", cannot be understood to be
binary relations as defined above, because their domains and codomains cannot be taken to be sets in the usual
systems of axiomatic set theory. For example, if we try to model the general concept of "equality" as a binary
relation =, we must take the domain and codomain to be the "set of all sets", which is not a set in the usual set theory.
In most mathematical contexts, references to the relations of equality, membership and subset are harmless because
they can be understood implicitly to be restricted to some set in the context. The usual work-around to this problem
is to select a "large enough" set A, that contains all the objects of interest, and work with the restriction =_A instead of
=. Similarly, the "subset of" relation needs to be restricted to have domain and codomain P(A) (the power set of a
specific set A): the resulting set relation can be denoted ⊆_A. Also, the "member of" relation needs to be restricted to
have domain A and codomain P(A) to obtain a binary relation ∈_A that is a set.
Another solution to this problem is to use a set theory with proper classes, such as NBG or MorseKelley set theory,
and allow the domain and codomain (and so the graph) to be proper classes: in such a theory, equality, membership,
and subset are binary relations without special comment. (A minor modification needs to be made to the concept of
the ordered triple (X, Y, G), as normally a proper class cannot be a member of an ordered tuple; or of course one can
identify the function with its graph in this context.)[5] With this definition one can, for instance, define a function
relation between every set and its power set.

The number of binary relations


The number of distinct binary relations on an n-element set is 2^(n²) (sequence A002416 in OEIS), since a relation over X is simply a subset of X × X, which has n² ordered pairs when X has n elements:
Number of n-element binary relations of different types

    n     all      transitive  reflexive  preorder  partial order  total preorder  total order  equivalence relation
    2     16       13          4          4         3              3               2            2
    3     512      171         64         29        19             13              6            5
    4     65536    3994        4096       355       219            75              24           15
    OEIS  A002416  A006905     A053763    A000798   A001035        A000670         A000142      A000110

Notes:

The number of irreflexive relations is the same as that of reflexive relations.


The number of strict partial orders (irreflexive transitive relations) is the same as that of partial orders.
The number of strict weak orders is the same as that of total preorders.
The total orders are the partial orders that are also total preorders. The number of preorders that are neither a
partial order nor a total preorder is, therefore, the number of preorders, minus the number of partial orders, minus
the number of total preorders, plus the number of total orders: 0, 0, 0, 3, and 85, respectively.
The number of equivalence relations is the number of partitions, which is the Bell number.


The binary relations can be grouped into pairs (relation, complement), except that for n = 0 the relation is its own
complement. The non-symmetric ones can be grouped into quadruples (relation, complement, inverse, inverse
complement).

Examples of common binary relations


order relations, including strict orders:

greater than
greater than or equal to
less than
less than or equal to
divides (evenly)
is a subset of

equivalence relations:
equality
is parallel to (for affine spaces)
is in bijection with
isomorphy
dependency relation, a finite, symmetric, reflexive relation.
independency relation, a symmetric, irreflexive relation which is the complement of some dependency relation.

Binary relations by property


                          reflexive  symmetric  transitive  symbol      example
    directed graph        -          -          -           →           -
    undirected graph      No         Yes        -           -           -
    tournament            No         No         -           -           pecking order
    dependency            Yes        Yes        -           -           -
    weak order            -          -          Yes         ≤           -
    preorder              Yes        -          Yes         ≤           preference
    partial order         Yes        No         Yes         ≤           subset
    partial equivalence   -          Yes        Yes         -           -
    equivalence relation  Yes        Yes        Yes         ∼, ≅, ≈, ≡  equality
    strict partial order  No         No         Yes         <           proper subset


Notes
[1] Encyclopedic Dictionary of Mathematics (http://books.google.co.uk/books?id=azS2ktxrz3EC&pg=PA1331&hl=en&sa=X&ei=glo6T_PmC9Ow8QPvwYmFCw&ved=0CGIQ6AEwBg#v=onepage&f=false). MIT. 2000. pp. 1330–1331. ISBN 0-262-59020-4.
[2] Kilp, Knauer and Mikhalev: p. 3.
[3] Yao, Y.Y.; Wong, S.K.M. (1995). "Generalization of rough sets using relationships between attribute values" (http://www2.cs.uregina.ca/~yyao/PAPERS/relation.pdf). Proceedings of the 2nd Annual Joint Conference on Information Sciences: 30–33.
[4] Joseph G. Rosenstein, Linear Orderings, Academic Press, 1982, ISBN 012597680, p. 4.
[5] Tarski, Alfred; Givant, Steven (1987). A Formalization of Set Theory without Variables. American Mathematical Society. p. 3. ISBN 0-8218-1041-3.

References
M. Kilp, U. Knauer, A.V. Mikhalev, Monoids, Acts and Categories: with Applications to Wreath Products and
Graphs, De Gruyter Expositions in Mathematics vol. 29, Walter de Gruyter, 2000, ISBN 3-11-015248-7.
Gunther Schmidt, 2010. Relational Mathematics. Cambridge University Press, ISBN 978-0-521-76268-7.

External links
Hazewinkel, Michiel, ed. (2001), "Binary relation" (http://www.encyclopediaofmath.org/index.php?title=p/b016380), Encyclopedia of Mathematics, Springer, ISBN 978-1-55608-010-4

Database normalization
Database normalization is the process of organizing the fields and tables of a relational database to minimize
redundancy and dependency. Normalization usually involves dividing large tables into smaller (and less redundant)
tables and defining relationships between them. The objective is to isolate data so that additions, deletions, and
modifications of a field can be made in just one table and then propagated through the rest of the database via the
defined relationships.
Edgar F. Codd, the inventor of the relational model, introduced the concept of normalization and what we now know
as the First Normal Form (1NF) in 1970.[1] Codd went on to define the Second Normal Form (2NF) and Third
Normal Form (3NF) in 1971,[2] and Codd and Raymond F. Boyce defined the Boyce-Codd Normal Form (BCNF) in
1974.[3] Informally, a relational database table is often described as "normalized" if it is in the Third Normal Form.[4]
Most 3NF tables are free of insertion, update, and deletion anomalies.
A standard piece of database design guidance is that the designer should create a fully normalized design; selective
denormalization can subsequently be performed for performance reasons.[5]

Objectives of normalization
A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated
using a "universal data sub-language" grounded in first-order logic.[6] (SQL is an example of such a data
sub-language, albeit one that Codd regarded as seriously flawed.)[7]
The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced,
and thus increase the life span of application programs;
3. To make the relational model more informative to users;


4. To make the collection of relations neutral to the query statistics, where these statistics are liable to
change as time goes by.
E.F. Codd, "Further Normalization of the Data Base Relational Model"[8]
The sections below give details of each of these objectives.

Free the database of modification anomalies


When an attempt is made to modify (update,
insert into, or delete from) a table, undesired
side-effects may follow. Not all tables can suffer
from these side-effects; rather, the side-effects
can only arise in tables that have not been
sufficiently normalized. An insufficiently
normalized table might have one or more of the
following characteristics:
The same information can be expressed on
multiple rows; therefore updates to the table
may result in logical inconsistencies. For
example, each record in an "Employees'
Skills" table might contain an Employee ID,
Employee Address, and Skill; thus a change
of address for a particular employee will
potentially need to be applied to multiple
records (one for each of his skills). If the
update is not carried through successfully (that is,
if the employee's address is updated on
some records but not others), then the table is
left in an inconsistent state. Specifically, the
table provides conflicting answers to the
question of what this particular employee's
address is. This phenomenon is known as an
update anomaly.

(Figure: An update anomaly. Employee 519 is shown as having different addresses on different records.)

There are circumstances in which certain facts cannot be recorded at all. For example, each record in a "Faculty and
Their Courses" table might contain a Faculty ID, Faculty Name, Faculty Hire Date, and Course Code; thus we can
record the details of any faculty member who teaches at least one course, but we cannot record the details of a
newly hired faculty member who has not yet been assigned to teach any courses except by setting the Course
Code to null. This phenomenon is known as an insertion anomaly.

(Figure: An insertion anomaly. Until the new faculty member, Dr. Newsome, is assigned to teach at least one course, his details cannot be recorded.)

Under certain circumstances, deletion of data representing certain facts necessitates deletion of data representing
completely different facts. The "Faculty and Their Courses" table described in the previous example suffers from
this type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses, we must delete the
last of the records on which that faculty member appears, effectively also deleting the faculty member. This
phenomenon is known as a deletion anomaly.

(Figure: A deletion anomaly. All information about Dr. Giddens is lost when he temporarily ceases to be assigned to any courses.)

Minimize redesign when extending the database structure


When a fully normalized database structure is extended to allow it to accommodate new types of data, the
pre-existing aspects of the database structure can remain largely or entirely unchanged. As a result, applications
interacting with the database are minimally affected.

Make the data model more informative to users


Normalized tables, and the relationship between one normalized table and another, mirror real-world concepts and
their interrelationships.

Avoid bias towards any particular pattern of querying


Normalized tables are suitable for general-purpose querying. This means any queries against these tables, including
future queries whose details cannot be anticipated, are supported. In contrast, tables that are not normalized lend
themselves to some types of queries, but not others.
For example, consider an online bookseller whose customers maintain wishlists of books they'd like to have. For the
obvious, anticipated query (what books does this customer want?) it's enough to store the customer's wishlist in the
table as, say, a homogeneous string of authors and titles.
With this design, though, the database can answer only that one single query. It cannot by itself answer interesting
but unanticipated queries: What is the most-wished-for book? Which customers are interested in WWII espionage?
How does Lord Byron stack up against his contemporary poets? Answers to these questions must come from special
adaptive tools completely separate from the database. One tool might be software written especially to handle such
queries. This special adaptive software has just one purpose: in effect, to normalize the non-normalized field.
Unforeseen queries can be answered trivially, and entirely within the database framework, with a normalized table.

Example
Querying and manipulating the data within an unnormalized data structure, such as the following non-1NF
representation of customers' credit card transactions, involves more complexity than is really necessary:
    Customer   Transactions
    Jones      Tr. ID   Date         Amount
               12890    14-Oct-2003  87
               12904    15-Oct-2003  50
    Wilkins    Tr. ID   Date         Amount
               12898    14-Oct-2003  21
    Stevens    Tr. ID   Date         Amount
               12907    15-Oct-2003  18
               14920    20-Nov-2003  70
               15003    27-Nov-2003  60

To each customer corresponds a repeating group of transactions. The automated evaluation of any query relating to
customers' transactions therefore would broadly involve two stages:
1. Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be
examined, and
2. Deriving a query result based on the results of the first stage
For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all
customers, the system would have to know that it must first unpack the Transactions group of each customer, then
sum the Amounts of all transactions thus obtained where the Date of the transaction falls in October 2003.
One of Codd's important insights was that this structural complexity could always be removed completely, leading to
much greater power and flexibility in the way queries could be formulated (by users and applications) and evaluated
(by the DBMS). The normalized equivalent of the structure above would look like this:
    Customer   Tr. ID   Date         Amount
    Jones      12890    14-Oct-2003  87
    Jones      12904    15-Oct-2003  50
    Wilkins    12898    14-Oct-2003  21
    Stevens    12907    15-Oct-2003  18
    Stevens    14920    20-Nov-2003  70
    Stevens    15003    27-Nov-2003  60

Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest,
simply by finding all rows with a Date falling in October 2003, and summing their Amounts. The data structure places all
of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in
queries; whereas in the previous situation some values were embedded in lower-level structures that had to be
handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the
unnormalized design does not.
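To make the contrast concrete, here is a sketch of the October 2003 query against the normalized design, assuming it is stored in a hypothetical table named Transactions (the column names and types are illustrative, not part of the example above):

    CREATE TABLE Transactions (
        Customer VARCHAR(30),
        TrID     INTEGER PRIMARY KEY,
        TrDate   DATE,
        Amount   DECIMAL(9, 2)
    );

    -- Monetary sum of all transactions that occurred in October 2003,
    -- for all customers, with no unpacking step required.
    SELECT SUM(Amount)
    FROM Transactions
    WHERE TrDate >= DATE '2003-10-01'
      AND TrDate <  DATE '2003-11-01';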

Background to normalization: definitions


Functional dependency
In a given table, an attribute Y is said to have a functional dependency on a set of attributes X (written X → Y)
if and only if each X value is associated with precisely one Y value. For example, in an "Employee" table that
includes the attributes "Employee ID" and "Employee Date of Birth", the functional dependency {Employee
ID} → {Employee Date of Birth} would hold. It follows from the previous two sentences that each
{Employee ID} is associated with precisely one {Employee Date of Birth}.
Trivial functional dependency
A trivial functional dependency is a functional dependency of an attribute on a superset of itself. {Employee
ID, Employee Address} → {Employee Address} is trivial, as is {Employee Address} → {Employee
Address}.
Full functional dependency

An attribute is fully functionally dependent on a set of attributes X if it is:
functionally dependent on X, and
not functionally dependent on any proper subset of X. {Employee Address} has a functional dependency on
{Employee ID, Skill}, but not a full functional dependency, because it is also dependent on {Employee ID}.
Transitive dependency
A transitive dependency is an indirect functional dependency, one in which X → Z only by virtue of X → Y and
Y → Z.
Multivalued dependency
A multivalued dependency is a constraint according to which the presence of certain rows in a table implies
the presence of certain other rows.
Join dependency
A table T is subject to a join dependency if T can always be recreated by joining multiple tables each having a
subset of the attributes of T.
Superkey
A superkey is a combination of attributes that can be used to uniquely identify a database record. A table
might have many superkeys.
Candidate key
A candidate key is a special subset of superkeys that do not have any extraneous information in them: it is a
minimal superkey.
Examples: Imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many
possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those listed, only
<SSN> is a candidate key, as the others contain information not necessary to uniquely identify records ('SSN' here
refers to Social Security Number, which is unique to each person).
Non-prime attribute
A non-prime attribute is an attribute that does not occur in any candidate key. Employee Address would be a
non-prime attribute in the "Employees' Skills" table.
Prime attribute
A prime attribute, conversely, is an attribute that does occur in some candidate key.
Primary key
Most DBMSs require a table to be defined as having a single unique key, rather than a number of possible
unique keys. A primary key is a key which the database designer has designated for this purpose.
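These key notions map directly onto SQL declarations. A sketch of the <Name, Age, SSN, Phone Extension> example, with assumed column types: the candidate key <SSN> is designated as the primary key, and any further candidate key could be declared with a UNIQUE constraint:

    CREATE TABLE Person (
        Name           VARCHAR(60),
        Age            INTEGER,
        SSN            CHAR(11) PRIMARY KEY,  -- the candidate key chosen as primary key
        PhoneExtension CHAR(5)
    );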

Normal forms
The normal forms (abbrev. NF) of relational database theory provide criteria for determining a table's degree of
vulnerability to logical inconsistencies and anomalies. The higher the normal form applicable to a table, the less
vulnerable it is. Each table has a "highest normal form" (HNF): by definition, a table always meets the
requirements of its HNF and of all normal forms lower than its HNF; also by definition, a table fails to meet the
requirements of any normal form higher than its HNF.
The normal forms are applicable to individual tables; to say that an entire database is in normal form n is to say that
all of its tables are in normal form n.
Newcomers to database design sometimes suppose that normalization proceeds in an iterative fashion, i.e. a 1NF
design is first normalized to 2NF, then to 3NF, and so on. This is not an accurate description of how normalization
typically works. A sensibly designed table is likely to be in 3NF on the first attempt; furthermore, if it is 3NF, it is


overwhelmingly likely to have an HNF of 5NF. Achieving the "higher" normal forms (above 3NF) does not usually
require an extra expenditure of effort on the part of the designer, because 3NF tables usually need no modification to
meet the requirements of these higher normal forms.
The main normal forms are summarized below.
1NF (First normal form). Two versions: E.F. Codd (1970)[1] and C.J. Date (2003).[9] Brief definition: table
faithfully represents a relation and has no repeating groups.
2NF (Second normal form). E.F. Codd, 1971.[2] Brief definition: no non-prime attribute in the table is functionally
dependent on a proper subset of any candidate key.
3NF (Third normal form). Two versions: E.F. Codd (1971)[2] and C. Zaniolo (1982).[10] Brief definition: every
non-prime attribute is non-transitively dependent on every candidate key in the table; in other words, no transitive
dependency is allowed. The attributes that do not contribute to the description of the primary key are removed from
the table.
EKNF (Elementary Key Normal Form). C. Zaniolo, 1982.[10] Brief definition: every non-trivial functional
dependency in the table is either the dependency of an elementary key attribute or a dependency on a superkey.
BCNF (Boyce–Codd normal form). Raymond F. Boyce and E.F. Codd, 1974.[11] Brief definition: every non-trivial
functional dependency in the table is a dependency on a superkey.
4NF (Fourth normal form). Ronald Fagin, 1977.[12] Brief definition: every non-trivial multivalued dependency in
the table is a dependency on a superkey.
5NF (Fifth normal form). Ronald Fagin, 1979.[13] Brief definition: every non-trivial join dependency in the table is
implied by the superkeys of the table.
DKNF (Domain/key normal form). Ronald Fagin, 1981.[14] Brief definition: every constraint on the table is a
logical consequence of the table's domain constraints and key constraints.
6NF (Sixth normal form). C.J. Date, Hugh Darwen, and Nikos Lorentzos, 2002.[15] Brief definition: table features
no non-trivial join dependencies at all (with reference to a generalized join operator).

Denormalization
Databases intended for online transaction processing (OLTP) are typically more normalized than databases intended
for online analytical processing (OLAP). OLTP applications are characterized by a high volume of small
transactions such as updating a sales record at a supermarket checkout counter. The expectation is that each
transaction will leave the database in a consistent state. By contrast, databases intended for OLAP operations are
primarily "read mostly" databases. OLAP applications tend to extract historical data that has accumulated over a
long period of time. For such databases, redundant or "denormalized" data may facilitate business intelligence
applications. Specifically, dimensional tables in a star schema often contain denormalized data. The denormalized or
redundant data must be carefully controlled during extract, transform, load (ETL) processing, and users should not
be permitted to see the data until it is in a consistent state. The normalized alternative to the star schema is the
snowflake schema. In many cases, the need for denormalization has waned as computers and RDBMS software have
become more powerful, but since data volumes have generally increased along with hardware and software
performance, OLAP databases often still use denormalized schemas.
Denormalization is also used to improve performance on smaller computers as in computerized cash-registers and
mobile devices, since these may use the data for look-up only (e.g. price lookups). Denormalization may also be
used when no RDBMS exists for a platform (such as Palm), or no changes are to be made to the data and a swift
response is crucial.


Non-first normal form (NF² or N1NF)


In recognition that denormalization can be deliberate and useful, the non-first normal form is a definition of database
designs which do not conform to first normal form, by allowing "sets and sets of sets to be attribute domains" (Schek
1982). The languages used to query and manipulate data in the model must be extended accordingly to support such
values.
One way of looking at this is to consider such structured values as being specialized types of values (domains), with
their own domain-specific languages. However, what is usually meant by non-1NF models is the approach in which
the relational model and the languages used to query it are extended with a general mechanism for such structure; for
instance, the nested relational model supports the use of relations as domain values, by adding two additional
operators (nest and unnest) to the relational algebra that can create and flatten nested relations, respectively.
Consider the following table:

First Normal Form

    Person  Favourite Colour
    Bob     blue
    Bob     red
    Jane    green
    Jane    yellow
    Jane    red

Assume a person has several favourite colours. Obviously, favourite colours consist of a set of colours modeled by
the given table. To transform a 1NF table into an NF² table, a "nest" operator is required which extends the relational
algebra of the higher normal forms. Applying the "nest" operator to the 1NF table yields the following NF² table:

Non-First Normal Form

    Person  Favourite Colours
    Bob     Favourite Colour
            blue
            red
    Jane    Favourite Colour
            green
            yellow
            red

To transform this NF² table back into 1NF an "unnest" operator is required which extends the relational
algebra of the higher normal forms. The unnest, in this case, would make "colours" into its own table.
Although "unnest" is the mathematical inverse of "nest", the operator "nest" is not always the mathematical inverse
of "unnest". A further constraint required is that the operators be bijective, which is covered by the Partitioned
Normal Form (PNF).
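Standard SQL has no nest and unnest operators, but the idea can be approximated with array values. A PostgreSQL-flavoured sketch (array_agg and unnest are PostgreSQL features, and the table and column names are assumptions, not part of the model described above):

    -- 1NF form: one row per (person, colour) pair.
    CREATE TABLE FavouriteColoursFlat (
        Person          VARCHAR(30),
        FavouriteColour VARCHAR(20)
    );

    -- "Nest": collapse to one row per person, gathering colours into an array.
    SELECT Person, array_agg(FavouriteColour) AS FavouriteColours
    FROM FavouriteColoursFlat
    GROUP BY Person;

    -- "Unnest": flatten the array-valued column back to one row per colour.
    SELECT Person, unnest(FavouriteColours) AS FavouriteColour
    FROM (SELECT Person, array_agg(FavouriteColour) AS FavouriteColours
          FROM FavouriteColoursFlat
          GROUP BY Person) AS nested;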


Notes and references


[1] Codd, E.F. (June 1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html). Communications of the ACM 13 (6): 377–387. doi:10.1145/362384.362685.
[2] Codd, E.F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
[3] Codd, E. F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[4] C.J. Date. An Introduction to Database Systems. Addison-Wesley (1999), p. 290.
[5] Chris Date, for example, writes: "I believe firmly that anything less than a fully normalized design is strongly contraindicated ... [Y]ou should "denormalize" only as a last resort. That is, you should back off from a fully normalized design only if all other strategies for improving performance have somehow failed to meet requirements." Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly (2005), p. 152.
[6] "The adoption of a relational model of data ... permits the development of a universal data sub-language based on an applied predicate calculus. A first-order predicate calculus suffices if the collection of relations is in first normal form. Such a language would provide a yardstick of linguistic power for all other proposed data languages, and would itself be a strong candidate for embedding (with appropriate syntactic modification) in a variety of host languages (programming, command- or problem-oriented)." Codd, "A Relational Model of Data for Large Shared Data Banks" (http://www.acm.org/classics/nov95/toc.html), p. 381.
[7] Codd, E.F. Chapter 23, "Serious Flaws in SQL", in The Relational Model for Database Management: Version 2. Addison-Wesley (1990), pp. 371–389.
[8] Codd, E.F. "Further Normalization of the Data Base Relational Model", p. 34.
[9] Date, C. J. "What First Normal Form Really Means" in Date on Database: Writings 2000–2006 (Springer-Verlag, 2006), pp. 127–128.
[10] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata". ACM Transactions on Database Systems 7(3), September 1982.
[11] Codd, E. F. "Recent Investigations into Relational Data Base Systems". IBM Research Report RJ1385 (April 23, 1974). Republished in Proc. 1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[12] Fagin, Ronald (September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases" (http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf). ACM Transactions on Database Systems 2 (3): 262–278. doi:10.1145/320557.320571.
[13] Ronald Fagin. "Normal Forms and Relational Database Operators". ACM SIGMOD International Conference on Management of Data, May 31–June 1, 1979, Boston, Mass. Also IBM Research Report RJ2471, Feb. 1979.
[14] Ronald Fagin (1981). "A Normal Form for Relational Databases That Is Based on Domains and Keys" (http://www.almaden.ibm.com/cs/people/fagin/tods81.pdf). ACM Transactions on Database Systems 6 (3): 387–415.
[15] C.J. Date, Hugh Darwen, Nikos Lorentzos. Temporal Data and the Relational Model. Morgan Kaufmann (2002), p. 176.

G. Jaeschke and H.-J. Schek (IBM Heidelberg Scientific Center), "Non First Normal Form Relations": a paper
studying the normalization and denormalization operators nest and unnest described at the end of this article.

Further reading
Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/
0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/
Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
H.-J. Schek, P. Pistor: Data Structures for an Integrated Data Base Management and Information Retrieval System


External links
Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
Database Normalization Intro (http://www.databasejournal.com/sqletc/article.php/1428511), Part 2 (http://
www.databasejournal.com/sqletc/article.php/26861_1474411_1)
An Introduction to Database Normalization (http://mikehillyer.com/articles/
an-introduction-to-database-normalization/) by Mike Hillyer.
A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
DB Normalization Examples (http://www.dbnormalization.com/)
Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft
Database Normalization and Design Techniques (http://www.barrywise.com/2008/01/
database-normalization-and-design-techniques/) by Barry Wise, recommended reading for the Harvard MIS.
A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.
htm)

First normal form


First normal form (1NF) is a property of a relation in a relational database. A relation is in first normal form if the
domain of each attribute contains only atomic values, and the value of each attribute contains only a single value
from that domain.[1]
Edgar Codd, in a 1971 conference paper, defined a relation in first normal form to be one such that none of the
domains of that relation should have elements which are themselves sets.[2]
First normal form is an essential property of a relation in a relational database. Database normalization is the process
of representing a database in terms of relations in standard normal forms, where first normal form is a minimal
requirement.

Examples
The following scenario illustrates how a database design might violate first normal form.

Domains and values


Suppose a designer wishes to record the names and telephone numbers of customers. He defines a customer table
which looks like this:

Customer

    Customer ID  First Name  Surname    Telephone Number
    123          Robert      Ingram     555-861-2025
    456          Jane        Wright     555-403-1659
    789          Maria       Fernandez  555-808-9633

The designer then becomes aware of a requirement to record multiple telephone numbers for some customers. He
reasons that the simplest way of doing this is to allow the "Telephone Number" field in any given record to contain
more than one value:


Customer

    Customer ID  First Name  Surname    Telephone Number
    123          Robert      Ingram     555-861-2025
    456          Jane        Wright     555-403-1659
                                        555-776-4100
    789          Maria       Fernandez  555-808-9633

Assuming, however, that the Telephone Number column is defined on some telephone number-like domain, such as
the domain of 12-character strings, the representation above is not in first normal form. It is in violation of first
normal form as a single field has been allowed to contain multiple values. A typical relational database management
system will not allow fields in a table to contain multiple values in this way.

A design that complies with 1NF


A design that is unambiguously in first normal form makes use of two tables: a Customer Name table and a
Customer Telephone Number table.

Customer Name

    Customer ID  First Name  Surname
    123          Robert      Ingram
    456          Jane        Wright
    789          Maria       Fernandez

Customer Telephone Number

    Customer ID  Telephone Number
    123          555-861-2025
    456          555-403-1659
    456          555-776-4100
    789          555-808-9633

Repeating groups of telephone numbers do not occur in this design. Instead, each Customer-to-Telephone Number
link appears on its own record. With Customer ID as the key field, a "parent-child" or one-to-many relationship exists
between the two tables. A record in the "parent" table, Customer Name, can have many telephone number records in
the "child" table, Customer Telephone Number, but each telephone number belongs to one, and only one customer. It
is worth noting that this design meets the additional requirements for second and third normal form.
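A sketch of this two-table design in SQL, with assumed column types (the foreign key makes the parent-child relationship explicit):

    CREATE TABLE CustomerName (
        CustomerID INTEGER PRIMARY KEY,
        FirstName  VARCHAR(30),
        Surname    VARCHAR(30)
    );

    CREATE TABLE CustomerTelephoneNumber (
        CustomerID      INTEGER REFERENCES CustomerName (CustomerID),
        TelephoneNumber VARCHAR(12),
        -- Each customer-to-telephone-number link appears exactly once.
        PRIMARY KEY (CustomerID, TelephoneNumber)
    );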

Atomicity
Edgar F. Codd's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the
domains on which each relation is defined are required to be atomic with respect to the DBMS."[3] Codd defines an
atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special
functions)."[4] That is, a field should not be divided into parts with more than one kind of data in it, such that what
one part means to the DBMS depends on another part of the same field.
Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this
ambiguity has led to widespread confusion about how 1NF should be understood.[5][6] In particular, the notion of a
"value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into
substrings.
A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it
into integer and fractional components.
Date suggests that "the notion of atomicity has no absolute meaning":[7] a value may be considered atomic for some
purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is
accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string
types and numeric types to array types and table types) are then acceptable in a 1NF table, although perhaps not
always desirable (for example, it would be more desirable to separate a Customer Name field into two separate
fields, First Name and Surname).
First normal form, as defined by Chris Date, permits relation-valued attributes (tables within tables). Date argues that
relation-valued attributes, by means of which a field within a table can contain a table, are useful in rare cases.[8]

1NF tables as representations of relations


According to Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which
means, specifically, that it satisfies the following five conditions:[9]
1. There's no top-to-bottom ordering to the rows.
2. There's no left-to-right ordering to the columns.
3. There are no duplicate rows.
4. Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
5. All columns are regular [i.e. rows have no hidden components such as row IDs, object IDs, or hidden
timestamps].

Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in
first normal form.
Examples of tables (or views) that would not meet this definition of first normal form are:
A table that lacks a unique key. Such a table would be able to accommodate duplicate rows, in violation of
condition 3.
A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an
intrinsic and meaningful aspect of the view.[10] This violates condition 1. The tuples in true relations are not
ordered with respect to each other.
A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which
requires every field to contain exactly one value from its column's domain. It should be noted, however, that this
aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the relational
model,[11] which made explicit provision for nulls.[12]


References
[1] Elmasri, Ramez and Navathe, Shamkant B. (July 2003). Fundamentals of Database Systems, Fourth Edition. Pearson. p. 315. ISBN 0321204484. "It states that the domain of an attribute must include only atomic (simple, indivisible) values and that the value of any attribute in a tuple must be a single value from the domain of that attribute."
[2] E. F. Codd (Oct 1972). "Further normalization of the database relational model". Data Base Systems. Courant Institute: Prentice-Hall. ISBN 013196741X. "A relation is in first normal form if it has the property that none of its domains has elements which are themselves sets."
[3] Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990).
[4] Codd, E. F. The Relational Model for Database Management Version 2 (Addison-Wesley, 1990), p. 6.
[5] Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, Relational Database Writings 1989–1991 (Addison-Wesley, 1992).
[6] "[F]or many years," writes Date, "I was as confused as anyone else. What's worse, I did my best (worst?) to spread that confusion through my writings, seminars, and other presentations." Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm) in Date on Database: Writings 2000–2006 (Springer-Verlag, 2006), p. 108.
[7] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm), p. 112.
[8] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm), pp. 121–126.
[9] Date, C. J. "What First Normal Form Really Means" (http://www.dbdebunk.com/page/page/629796.htm), pp. 127–128.
[10] Such views cannot be created using SQL that conforms to the SQL:2003 standard.
[11] "Codd first defined the relational model in 1969 and didn't introduce nulls until 1979" Date, C. J. SQL and Relational Theory (O'Reilly, 2009), Appendix A.2.
[12] The third of Codd's 12 rules states that "Null values ... [must be] supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of data type." Codd, E. F. "Is Your DBMS Really Relational?" Computerworld, October 14, 1985.

Further reading
Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (http://www.
elsevier.com/wps/product/cws_home/680662) (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/
0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/
Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
Date, C. J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)


Second normal form


Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F.
Codd in 1971.[1]
A table that is in first normal form (1NF) must meet additional criteria if it is to qualify for second normal form.
Specifically: a table is in 2NF if and only if it is in 1NF and no non-prime attribute is dependent on any proper subset
of any candidate key of the table. A non-prime attribute of a table is an attribute that is not a part of any candidate
key of the table.
Put simply, a table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is either dependent
on the whole of a candidate key, or on another non-prime attribute.
Note that when a 1NF table has no composite candidate keys (candidate keys consisting of more than one attribute),
the table is automatically in 2NF.

Example
Consider a table describing employees' skills:

Employees' Skills
    Employee  Skill           Current Work Location
    Jones     Typing          114 Main Street
    Jones     Shorthand       114 Main Street
    Jones     Whittling       114 Main Street
    Bravo     Light Cleaning  73 Industrial Way
    Ellis     Alchemy         73 Industrial Way
    Ellis     Flying          73 Industrial Way
    Harrison  Light Cleaning  73 Industrial Way

Neither {Employee} nor {Skill} is a candidate key for the table. This is because a given Employee might need to
appear more than once (he might have multiple Skills), and a given Skill might need to appear more than once (it
might be possessed by multiple Employees). Only the composite key {Employee, Skill} qualifies as a candidate key
for the table.
The remaining attribute, Current Work Location, is dependent on only part of the candidate key, namely Employee.
Therefore the table is not in 2NF. Note the redundancy in the way Current Work Locations are represented: we are
told three times that Jones works at 114 Main Street, and twice that Ellis works at 73 Industrial Way. This
redundancy makes the table vulnerable to update anomalies: it is, for example, possible to update Jones' work
location on his "Typing" and "Shorthand" records and not update his "Whittling" record. The resulting data would
imply contradictory answers to the question "What is Jones' current work location?"
A 2NF alternative to this design would represent the same information in two tables: an "Employees" table with
candidate key {Employee}, and an "Employees' Skills" table with candidate key {Employee, Skill}:


Employees

    Employee  Current Work Location
    Jones     114 Main Street
    Bravo     73 Industrial Way
    Ellis     73 Industrial Way
    Harrison  73 Industrial Way

Employees' Skills

    Employee  Skill
    Jones     Typing
    Jones     Shorthand
    Jones     Whittling
    Bravo     Light Cleaning
    Ellis     Alchemy
    Ellis     Flying
    Harrison  Light Cleaning

Neither of these tables can suffer from update anomalies.
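A sketch of the 2NF decomposition in SQL, with assumed column types; the composite primary key on the skills table mirrors the candidate key {Employee, Skill}:

    CREATE TABLE Employees (
        Employee            VARCHAR(30) PRIMARY KEY,
        CurrentWorkLocation VARCHAR(40)
    );

    CREATE TABLE EmployeesSkills (
        Employee VARCHAR(30) REFERENCES Employees (Employee),
        Skill    VARCHAR(30),
        PRIMARY KEY (Employee, Skill)
    );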


Not all 2NF tables are free from update anomalies, however. An example of a 2NF table which suffers from update
anomalies is:

Tournament Winners
    Tournament            Year  Winner          Winner Date of Birth
    Des Moines Masters    1998  Chip Masterson  14 March 1977
    Indiana Invitational  1998  Al Fredrickson  21 July 1975
    Cleveland Open        1999  Bob Albertson   28 September 1968
    Des Moines Masters    1999  Al Fredrickson  21 July 1975
    Indiana Invitational  1999  Chip Masterson  14 March 1977

Even though Winner and Winner Date of Birth are determined by the whole key {Tournament, Year} and not part of
it, particular Winner / Winner Date of Birth combinations are shown redundantly on multiple records. This leads to
an update anomaly: if updates are not carried out consistently, a particular winner could be shown as having two
different dates of birth.
The underlying problem is the transitive dependency to which the Winner Date of Birth attribute is subject. Winner
Date of Birth actually depends on Winner, which in turn depends on the key Tournament / Year.
This problem is addressed by third normal form (3NF).


2NF and candidate keys


A table for which there are no partial functional dependencies on the primary key is typically, but not always, in
2NF. In addition to the primary key, the table may contain other candidate keys; it is necessary to establish that no
non-prime attributes have part-key dependencies on any of these candidate keys.
Multiple candidate keys occur in the following table:

Electric Toothbrush Models


    Manufacturer  Model        Model Full Name                Manufacturer Country
    Forte         X-Prime      Forte X-Prime                  Italy
    Forte         Ultraclean   Forte Ultraclean               Italy
    Dent-o-Fresh  EZbrush      Dent-o-Fresh BananaBrush-2000  USA
    Kobayashi     ST-60        Kobayashi ST-60                Japan
    Hoch          Toothmaster  Hoch Toothmaster               Germany
    Hoch          X-Prime      Hoch X-Prime                   Germany

Even if the designer has specified the primary key as {Model Full Name}, the table is not in 2NF. {Manufacturer,
Model} is also a candidate key, and Manufacturer Country is dependent on a proper subset of it: Manufacturer. To
make the design conform to 2NF, it is necessary to have two tables:

Electric Toothbrush Manufacturers


    Manufacturer  Manufacturer Country
    Forte         Italy
    Dent-o-Fresh  USA
    Kobayashi     Japan
    Hoch          Germany

Electric Toothbrush Models

    Manufacturer  Model        Model Full Name
    Forte         X-Prime      Forte X-Prime
    Forte         Ultraclean   Forte Ultraclean
    Dent-o-Fresh  EZbrush      Dent-o-Fresh BananaBrush-2000
    Kobayashi     ST-60        Kobayashi ST-60
    Hoch          Toothmaster  Hoch Toothmaster
    Hoch          X-Prime      Hoch X-Prime
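In SQL terms, both candidate keys of the models table can be declared: one as the primary key and the other with a UNIQUE constraint. A sketch with assumed column types:

    CREATE TABLE ElectricToothbrushManufacturers (
        Manufacturer        VARCHAR(30) PRIMARY KEY,
        ManufacturerCountry VARCHAR(30)
    );

    CREATE TABLE ElectricToothbrushModels (
        Manufacturer  VARCHAR(30) REFERENCES ElectricToothbrushManufacturers (Manufacturer),
        Model         VARCHAR(30),
        ModelFullName VARCHAR(60) PRIMARY KEY,
        UNIQUE (Manufacturer, Model)  -- the second candidate key
    );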


References
[1] Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data
Base Systems," New York City, May 24th-25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin
(ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.

Further reading
Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
Date, C. J., & Lorentzos, N., & Darwen, H. (2002). Temporal Data & the Relational Model (http://www.
elsevier.com/wps/product/cws_home/680662) (1st ed.). Morgan Kaufmann. ISBN 1-55860-855-9.
C.J. Date (2004). Introduction to Database Systems (8th ed.). Boston: Addison-Wesley. ISBN 978-0-321-19784-9.
Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/
Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)

External links
Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
An Introduction to Database Normalization (http://mikehillyer.com/articles/
an-introduction-to-database-normalization/) by Mike Hillyer.
A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft

Third normal form


In computer science, the third normal form (3NF) is a normal form used in database normalization. 3NF was
originally defined by E.F. Codd in 1971.[1] Codd's definition states that a table is in 3NF if and only if both of the
following conditions hold:
The relation R (table) is in second normal form (2NF)
Every non-prime attribute of R is non-transitively dependent (i.e. directly dependent) on every superkey of R.
A non-prime attribute of R is an attribute that does not belong to any candidate key of R.[2] A transitive
dependency is a functional dependency in which X → Z (X determines Z) indirectly, by virtue of X → Y and Y → Z
(where it is not the case that Y → X).[3]
A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This
definition states that a table is in 3NF if and only if, for each of its functional dependencies X → A, at least one of
the following conditions holds:
X contains A (that is, X → A is a trivial functional dependency), or
X is a superkey, or
Every element of A−X, the set difference between A and X, is a prime attribute (i.e., each column in A−X is
contained in some candidate key)[4][5]
Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent Boyce–Codd normal
form (BCNF). BCNF simply eliminates the third alternative ("A is a prime attribute").


"Nothing but the key"


A memorable statement of Codd's definition of 3NF, paralleling the traditional pledge to give true evidence in a
court of law, was given by Bill Kent: "[Every] non-key [attribute] must provide a fact about the key, the whole key,
and nothing but the key."[6] A common variation supplements this definition with the oath: "so help me Codd".[7]
Requiring existence of "the key" ensures that the table is in 1NF; requiring that non-key attributes be dependent on
"the whole key" ensures 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures
3NF.
Chris Date refers to Kent's summary as "an intuitively attractive characterization" of 3NF, and notes that with slight
adaptation it may serve as a definition of the slightly stronger Boyce–Codd normal form: "Each attribute must
represent a fact about the key, the whole key, and nothing but the key."[8] The 3NF version of the definition is
weaker than Date's BCNF variation, as the former is concerned only with ensuring that non-key attributes are
dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they
each represent a fact about the key in the sense of providing part or all of the key itself. (It should be noted here that
this rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit
composite candidate keys, since each part of any such key would violate the "whole key" clause.)
An example of a 2NF table that fails to meet the requirements of 3NF is:

Tournament Winners
    Tournament            Year  Winner          Winner Date of Birth
    Indiana Invitational  1998  Al Fredrickson  21 July 1975
    Cleveland Open        1999  Bob Albertson   28 September 1968
    Des Moines Masters    1999  Al Fredrickson  21 July 1975
    Indiana Invitational  1999  Chip Masterson  14 March 1977

Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the composite
key {Tournament, Year} is a minimal set of attributes guaranteed to uniquely identify a row. That is, {Tournament,
Year} is a candidate key for the table.
The breach of 3NF occurs because the non-prime attribute Winner Date of Birth is transitively dependent on the
candidate key {Tournament, Year} via the non-prime attribute Winner. The fact that Winner Date of Birth is
functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop
the same person from being shown with different dates of birth on different records.
In order to express the same facts without violating 3NF, it is necessary to split the table into two:

Tournament Winners
    Tournament            Year  Winner
    Indiana Invitational  1998  Al Fredrickson
    Cleveland Open        1999  Bob Albertson
    Des Moines Masters    1999  Al Fredrickson
    Indiana Invitational  1999  Chip Masterson

Player Dates of Birth

    Player          Date of Birth
    Chip Masterson  14 March 1977
    Al Fredrickson  21 July 1975
    Bob Albertson   28 September 1968

Update anomalies cannot occur in these tables, which are both in 3NF.
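A sketch of the 3NF decomposition in SQL, with assumed column types; each winner's date of birth is now recorded in exactly one place:

    CREATE TABLE PlayerDatesOfBirth (
        Player      VARCHAR(30) PRIMARY KEY,
        DateOfBirth DATE
    );

    CREATE TABLE TournamentWinners (
        Tournament     VARCHAR(40),
        TournamentYear INTEGER,  -- named to avoid the reserved word YEAR
        Winner         VARCHAR(30) REFERENCES PlayerDatesOfBirth (Player),
        PRIMARY KEY (Tournament, TournamentYear)
    );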

Derivation of Zaniolo's conditions


The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, is proved in the following way: Let X → A
be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R.
Then Y → X. Therefore A is not transitively dependent on Y if and only if X → Y, that is, if and only if X is a
superkey.[9]

Normalization beyond 3NF


Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in
practice, are affected by such anomalies; these are tables which either fall short of Boyce–Codd normal form
(BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NF or 5NF.

References
[1] Codd, E.F. "Further Normalization of the Data Base Relational Model." (Presented at Courant Computer Science Symposia Series 6, "Data
Base Systems," New York City, May 24th–25th, 1971.) IBM Research Report RJ909 (August 31st, 1971). Republished in Randall J. Rustin
(ed.), Data Base Systems: Courant Computer Science Symposia Series 6. Prentice-Hall, 1972.
[2] Codd, p. 43.
[3] Codd, pp. 45–46.
[4] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3),
September 1982.
[5] Abraham Silberschatz, Henry F. Korth, S. Sudarshan, Database System Concepts (http://www.db-book.com/) (5th edition), pp. 276–277.
[6] Kent, William. "A Simple Guide to Five Normal Forms in Relational Database Theory" (http://www.bkent.net/Doc/simple5.htm),
Communications of the ACM 26 (2), Feb. 1983, pp. 120–125.
[7] The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr,
George. Database Management (Scott, Foresman, 1989), p. 331.
[8] Date, C.J. An Introduction to Database Systems (7th ed.) (Addison Wesley, 2000), p. 379.
[9] Zaniolo, p. 494.

Further reading
Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125

External links
Litt's Tips: Normalization (http://www.troubleshooters.com/littstip/ltnorm.html)
Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)


An Introduction to Database Normalization (http://mikehillyer.com/articles/an-introduction-to-database-normalization/) by Mike Hillyer.
A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft

Boyce–Codd normal form

Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly
stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F.
Codd to address certain types of anomaly not dealt with by 3NF as originally defined.[1]
Chris Date has pointed out that a definition of what we now know as BCNF appeared in a paper by Ian Heath in
1971.[2] Date writes:
"Since that definition predated Boyce and Codd's own definition by some three years, it seems to me
that BCNF ought by rights to be called Heath normal form. But it isn't."[3]
If a relational schema is in BCNF then all redundancy based on functional dependency has been removed, although
other types of redundancy may still exist. A relational schema R is in Boyce–Codd normal form if and only if for
every one of its dependencies X → Y, at least one of the following conditions holds:[4]

X → Y is a trivial functional dependency (Y ⊆ X)
X is a superkey for schema R

3NF tables not meeting BCNF


Only in rare cases does a 3NF table not meet the requirements of BCNF. A 3NF table which does not have multiple
overlapping candidate keys is guaranteed to be in BCNF.[5] Depending on what its functional dependencies are, a
3NF table with two or more overlapping candidate keys may or may not be in BCNF.
An example of a 3NF table that does not meet BCNF is:

Today's Court Bookings

Court  Start Time  End Time  Rate Type
1      09:30       10:30     SAVER
1      11:00       12:00     SAVER
1      14:00       15:30     STANDARD
2      10:00       11:30     PREMIUM-B
2      11:30       13:30     PREMIUM-B
2      15:00       16:30     PREMIUM-A

Each row in the table represents a court booking at a tennis club that has one hard court (Court 1) and one grass
court (Court 2). A booking is defined by its Court and the period for which the Court is reserved.
Additionally, each booking has a Rate Type associated with it. There are four distinct rate types:

SAVER, for Court 1 bookings made by members
STANDARD, for Court 1 bookings made by non-members
PREMIUM-A, for Court 2 bookings made by members
PREMIUM-B, for Court 2 bookings made by non-members


Note that Court 1 (normal quality) is less expensive than Court 2 (high quality).
The table's superkeys are:

S1 = {Court, Start Time}
S2 = {Court, End Time}
S3 = {Rate Type, Start Time}
S4 = {Rate Type, End Time}
S5 = {Court, Start Time, End Time}
S6 = {Rate Type, Start Time, End Time}
S7 = {Court, Rate Type, Start Time}
S8 = {Court, Rate Type, End Time}
ST = {Court, Rate Type, Start Time, End Time}, the trivial superkey

Note that even though Start Time and End Time happen to contain no duplicate values in the table above, on other
days two different bookings on Court 1 and Court 2 could start at the same time or end at the same time. This is why
{Start Time} and {End Time} cannot be superkeys of the table.
However, only S1, S2, S3 and S4 are candidate keys (that is, minimal superkeys for the relation); for example,
S1 ⊂ S5, so S5 cannot be a candidate key.
Recall that 2NF prohibits partial functional dependencies of non-prime attributes (i.e. an attribute that does not occur
in ANY candidate key) on candidate keys, and that 3NF prohibits transitive functional dependencies of non-prime
attributes on candidate keys.
In Today's Court Bookings table, there are no non-prime attributes: that is, all attributes belong to some candidate
key. Therefore the table adheres to both 2NF and 3NF.
The table does not adhere to BCNF. This is because of the dependency Rate Type → Court, in which the
determining attribute (Rate Type) is neither a candidate key nor a superset of a candidate key.
The dependency Rate Type → Court is respected, as a Rate Type should only ever apply to a single Court.
The design can be amended so that it meets BCNF:

Rate Types

Rate Type  Court  Member Flag
SAVER      1      Yes
STANDARD   1      No
PREMIUM-A  2      Yes
PREMIUM-B  2      No


Today's Bookings

Rate Type  Start Time  End Time
SAVER      09:30       10:30
SAVER      11:00       12:00
STANDARD   14:00       15:30
PREMIUM-B  10:00       11:30
PREMIUM-B  11:30       13:30
PREMIUM-A  15:00       16:30

The candidate keys for the Rate Types table are {Rate Type} and {Court, Member Flag}; the candidate keys for the
Today's Bookings table are {Rate Type, Start Time} and {Rate Type, End Time}. Both tables are in BCNF. Having
one Rate Type associated with two different Courts is now impossible, so the anomaly affecting the original table
has been eliminated.
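A sketch of this decomposition in SQL DDL, under hypothetical table and column names (the text above does not
prescribe any): each candidate key beyond the chosen primary key is declared with a UNIQUE constraint, so the
DBMS itself enforces both candidate keys of each table.

CREATE TABLE Rate_types (
    Rate_type   VARCHAR(20) PRIMARY KEY,
    Court       INTEGER NOT NULL,
    Member_flag BOOLEAN NOT NULL,
    UNIQUE (Court, Member_flag)       -- the second candidate key
);

CREATE TABLE Todays_bookings (
    Rate_type  VARCHAR(20) NOT NULL REFERENCES Rate_types (Rate_type),
    Start_time TIME NOT NULL,
    End_time   TIME NOT NULL,
    PRIMARY KEY (Rate_type, Start_time),
    UNIQUE (Rate_type, End_time)      -- the second candidate key
);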

Achievability of BCNF
In some cases, a non-BCNF table cannot be decomposed into tables that satisfy BCNF and preserve the
dependencies that held in the original table. Beeri and Bernstein showed in 1979 that, for example, a set of functional
dependencies {AB → C, C → B} cannot be represented by a BCNF schema.[6] Thus, unlike the first three normal
forms, BCNF is not always achievable.
Consider the following non-BCNF table whose functional dependencies follow the {AB → C, C → B} pattern:

Nearest Shops

Person    Shop Type    Nearest Shop
Davidson  Optician     Eagle Eye
Davidson  Hairdresser  Snippets
Wright    Bookshop     Merlin Books
Fuller    Bakery       Doughy's
Fuller    Hairdresser  Sweeney Todd's
Fuller    Optician     Eagle Eye

For each Person / Shop Type combination, the table tells us which shop of this type is geographically nearest to the
person's home. We assume for simplicity that a single shop cannot be of more than one type.
The candidate keys of the table are:
{Person, Shop Type}
{Person, Nearest Shop}
Because all three attributes are prime attributes (i.e. belong to candidate keys), the table is in 3NF. The table is not in
BCNF, however, as the Shop Type attribute is functionally dependent on a non-superkey: Nearest Shop.
The violation of BCNF means that the table is subject to anomalies. For example, Eagle Eye might have its Shop
Type changed to "Optometrist" on its "Fuller" record while retaining the Shop Type "Optician" on its "Davidson"
record. This would imply contradictory answers to the question: "What is Eagle Eye's Shop Type?" Holding each
shop's Shop Type only once would seem preferable, as doing so would prevent such anomalies from occurring:


Shop Near Person

Person    Shop
Davidson  Eagle Eye
Davidson  Snippets
Wright    Merlin Books
Fuller    Doughy's
Fuller    Sweeney Todd's
Fuller    Eagle Eye

Shop

Shop            Shop Type
Eagle Eye       Optician
Snippets        Hairdresser
Merlin Books    Bookshop
Doughy's        Bakery
Sweeney Todd's  Hairdresser

In this revised design, the "Shop Near Person" table has a candidate key of {Person, Shop}, and the "Shop" table has
a candidate key of {Shop}. Unfortunately, although this design adheres to BCNF, it is unacceptable on different
grounds: it allows us to record multiple shops of the same type against the same person. In other words, its candidate
keys do not guarantee that the functional dependency {Person, Shop Type} → {Shop} will be respected.
A design that eliminates all of these anomalies (but does not conform to BCNF) is possible.[7] This design consists of
the original "Nearest Shops" table supplemented by the "Shop" table described above.

Nearest Shops

Person    Shop Type    Nearest Shop
Davidson  Optician     Eagle Eye
Davidson  Hairdresser  Snippets
Wright    Bookshop     Merlin Books
Fuller    Bakery       Doughy's
Fuller    Hairdresser  Sweeney Todd's
Fuller    Optician     Eagle Eye

Shop

Shop            Shop Type
Eagle Eye       Optician
Snippets        Hairdresser
Merlin Books    Bookshop
Doughy's        Bakery
Sweeney Todd's  Hairdresser

If a referential integrity constraint is defined to the effect that {Shop Type, Nearest Shop} from the first table must
refer to a {Shop Type, Shop} from the second table, then the data anomalies described previously are prevented.
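Such a constraint can be declared with a composite foreign key. The sketch below uses assumed table and column
names; the UNIQUE constraint on the Shop table provides the target that the composite foreign key requires:

CREATE TABLE Shop (
    Shop      VARCHAR(40) PRIMARY KEY,
    Shop_type VARCHAR(40) NOT NULL,
    UNIQUE (Shop_type, Shop)          -- target for the composite foreign key
);

CREATE TABLE Nearest_shops (
    Person       VARCHAR(40),
    Shop_type    VARCHAR(40),
    Nearest_shop VARCHAR(40) NOT NULL,
    PRIMARY KEY (Person, Shop_type),
    -- A row may only name a shop together with that shop's actual type.
    FOREIGN KEY (Shop_type, Nearest_shop) REFERENCES Shop (Shop_type, Shop)
);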

References
[1] Codd, E. F. "Recent Investigations into Relational Data Base Systems." IBM Research Report RJ1385 (April 23, 1974). Republished in Proc.
1974 Congress (Stockholm, Sweden, 1974). New York, N.Y.: North-Holland (1974).
[2] Heath, I. "Unacceptable File Operations in a Relational Database." Proc. 1971 ACM SIGFIDET Workshop on Data Description, Access, and
Control, San Diego, Calif. (November 11th–12th, 1971).
[3] Date, C.J. Database in Depth: Relational Theory for Practitioners. O'Reilly (2005), p. 142.
[4] Silberschatz, Abraham (2006). Database System Concepts (6th ed.). McGraw-Hill. p. 333. ISBN 978-0-07-352332-3.
[5] Vincent, M.W. and B. Srinivasan. "A Note on Relation Schemes Which Are in 3NF But Not in BCNF." Information Processing Letters
48(6), 1993, pp. 281–283.
[6] Beeri, Catriel and Bernstein, Philip A. "Computational problems related to the design of normal form relational schemas." ACM Transactions
on Database Systems 4(1), March 1979, p. 50.
[7] Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata." ACM Transactions on Database Systems 7(3),
September 1982, p. 493.

Bibliography
Date, C. J. (1999). An Introduction to Database Systems (8th ed.). Addison-Wesley Longman. ISBN
0-321-19784-4.

External links
Rules Of Data Normalization (http://web.archive.org/web/20080805014412/http://www.datamodel.org/NormalizationRules.html)
Advanced Normalization (http://web.archive.org/web/20080423014733/http://www.utexas.edu/its/archive/windows/database/datamodeling/rm/rm8.html) by ITS, University of Texas.


Fourth normal form


Fourth normal form (4NF) is a normal form used in database normalization. Introduced by Ronald Fagin in 1977,
4NF is the next level of normalization after Boyce–Codd normal form (BCNF). Whereas the second, third, and
Boyce–Codd normal forms are concerned with functional dependencies, 4NF is concerned with a more general type
of dependency known as a multivalued dependency. A table is in 4NF if and only if, for every one of its non-trivial
multivalued dependencies X ↠ Y, X is a superkey; that is, X is either a candidate key or a superset thereof.[1]

Multivalued dependencies
If the column headings in a relational database table are divided into three disjoint groupings X, Y, and Z, then, in the
context of a particular row, we can refer to the data beneath each group of headings as x, y, and z respectively. A
multivalued dependency X ↠ Y signifies that if we choose any x actually occurring in the table (call this choice xc),
and compile a list of all the xcyz combinations that occur in the table, we will find that xc is associated with the same
y entries regardless of z.
A trivial multivalued dependency X ↠ Y is one where either Y is a subset of X, or X and Y together form the
whole set of attributes of the relation.
A functional dependency is a special case of multivalued dependency. In a functional dependency X → Y, every x
determines exactly one y, never more than one.

Example
Consider the following example:

Pizza Delivery Permutations

Restaurant        Pizza Variety  Delivery Area
A1 Pizza          Thick Crust    Springfield
A1 Pizza          Thick Crust    Shelbyville
A1 Pizza          Thick Crust    Capital City
A1 Pizza          Stuffed Crust  Springfield
A1 Pizza          Stuffed Crust  Shelbyville
A1 Pizza          Stuffed Crust  Capital City
Elite Pizza       Thin Crust     Capital City
Elite Pizza       Stuffed Crust  Capital City
Vincenzo's Pizza  Thick Crust    Springfield
Vincenzo's Pizza  Thick Crust    Shelbyville
Vincenzo's Pizza  Thin Crust     Springfield
Vincenzo's Pizza  Thin Crust     Shelbyville

Each row indicates that a given restaurant can deliver a given variety of pizza to a given area.
The table has no non-key attributes because its only key is {Restaurant, Pizza Variety, Delivery Area}. Therefore it
meets all normal forms up to BCNF. If we assume, however, that pizza varieties offered by a restaurant are not
affected by delivery area, then it does not meet 4NF. The problem is that the table features two non-trivial
multivalued dependencies on the {Restaurant} attribute (which is not a superkey). The dependencies are:

{Restaurant} ↠ {Pizza Variety}
{Restaurant} ↠ {Delivery Area}
These non-trivial multivalued dependencies on a non-superkey reflect the fact that the varieties of pizza a restaurant
offers are independent from the areas to which the restaurant delivers. This state of affairs leads to redundancy in the
table: for example, we are told three times that A1 Pizza offers Stuffed Crust, and if A1 Pizza starts producing
Cheese Crust pizzas then we will need to add multiple rows, one for each of A1 Pizza's delivery areas. There is,
moreover, nothing to prevent us from doing this incorrectly: we might add Cheese Crust rows for all but one of A1
Pizza's delivery areas, thereby failing to respect the multivalued dependency {Restaurant} ↠ {Pizza Variety}.
To eliminate the possibility of these anomalies, we must place the facts about varieties offered into a different table
from the facts about delivery areas, yielding two tables that are both in 4NF:

Varieties By Restaurant

Restaurant        Pizza Variety
A1 Pizza          Thick Crust
A1 Pizza          Stuffed Crust
Elite Pizza       Thin Crust
Elite Pizza       Stuffed Crust
Vincenzo's Pizza  Thick Crust
Vincenzo's Pizza  Thin Crust

Delivery Areas By Restaurant

Restaurant        Delivery Area
A1 Pizza          Springfield
A1 Pizza          Shelbyville
A1 Pizza          Capital City
Elite Pizza       Capital City
Vincenzo's Pizza  Springfield
Vincenzo's Pizza  Shelbyville

In contrast, if the pizza varieties offered by a restaurant sometimes did legitimately vary from one delivery area to
another, the original three-column table would satisfy 4NF.
Ronald Fagin demonstrated that it is always possible to achieve 4NF.[2] Rissanen's theorem is also applicable to
multivalued dependencies.
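The losslessness of such a decomposition can be checked mechanically. Assuming the three tables above exist under
the hypothetical SQL names used below, the following query joins the two 4NF tables and subtracts the original
table; an empty result means the join reproduces the original table exactly:

SELECT v.Restaurant, v.Pizza_variety, d.Delivery_area
FROM Varieties_by_restaurant v
JOIN Delivery_areas_by_restaurant d ON v.Restaurant = d.Restaurant
EXCEPT
SELECT Restaurant, Pizza_variety, Delivery_area
FROM Pizza_delivery_permutations;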


4NF in practice
A 1992 paper by Margaret S. Wu notes that the teaching of database normalization typically stops short of 4NF,
perhaps because of a belief that tables violating 4NF (but meeting all lower normal forms) are rarely encountered in
business applications. This belief may not be accurate, however. Wu reports that in a study of forty organizational
databases, over 20% contained one or more tables that violated 4NF while meeting all lower normal forms.[3]

References
[1] "A relation schema R* is in fourth normal form (4NF) if, whenever a nontrivial multivalued dependency X
Y holds for R*, then so does
the functional dependency X A for every column name A of R*. Intuitively all dependencies are the result of keys." Fagin, Ronald
(September 1977). "Multivalued Dependencies and a New Normal Form for Relational Databases" (http:/ / www. almaden. ibm. com/ cs/
people/ fagin/ tods77. pdf). ACM Transactions on Database Systems 2 (1): 267. doi:10.1145/320557.320571. .
[2] Fagin, p. 268
[3] Wu, Margaret S. (March 1992). "The Practical Need for Fourth Normal Form". ACM SIGCSE Bulletin 24 (1): 19–23.
doi:10.1145/135250.134515.

Further reading
Date, C. J. (1999), An Introduction to Database Systems (http://www.aw-bc.com/catalog/academic/product/0,1144,0321197844,00.html) (8th ed.). Addison-Wesley Longman. ISBN 0-321-19784-4.
Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)
Advanced Normalization (http://www.utexas.edu/its/windows/database/datamodeling/rm/rm8.html) by
ITS, University of Texas.

Fifth normal form


Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization
designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related
multiple relationships. A table is said to be in 5NF if and only if every join dependency in it is implied by the
candidate keys.
A join dependency *{A, B, …, Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a
superkey for R.[1]

Example
Consider the following example:


Travelling Salesman Product Availability By Brand

Travelling Salesman  Brand    Product Type
Jack Schneider       Acme     Vacuum Cleaner
Jack Schneider       Acme     Breadbox
Willy Loman          Robusto  Pruning Shears
Willy Loman          Robusto  Vacuum Cleaner
Willy Loman          Robusto  Breadbox
Willy Loman          Robusto  Umbrella Stand
Louis Ferguson       Robusto  Vacuum Cleaner
Louis Ferguson       Robusto  Telescope
Louis Ferguson       Acme     Vacuum Cleaner
Louis Ferguson       Acme     Lava Lamp
Louis Ferguson       Nimbus   Tie Rack

The table's predicate is: Products of the type designated by Product Type, made by the brand designated by Brand,
are available from the travelling salesman designated by Travelling Salesman.
In the absence of any rules restricting the valid possible combinations of Travelling Salesman, Brand, and Product
Type, the three-attribute table above is necessary in order to model the situation correctly.
Suppose, however, that the following rule applies: A Travelling Salesman has certain Brands and certain Product
Types in his repertoire. If Brand B is in his repertoire, and Product Type P is in his repertoire, then (assuming
Brand B makes Product Type P), the Travelling Salesman must offer only the products of Product Type P made by
Brand B.
In that case, it is possible to split the table into three:

Product Types By Travelling Salesman

Travelling Salesman  Product Type
Jack Schneider       Vacuum Cleaner
Jack Schneider       Breadbox
Willy Loman          Pruning Shears
Willy Loman          Vacuum Cleaner
Willy Loman          Breadbox
Willy Loman          Umbrella Stand
Louis Ferguson       Telescope
Louis Ferguson       Vacuum Cleaner
Louis Ferguson       Lava Lamp
Louis Ferguson       Tie Rack


Brands By Travelling Salesman

Travelling Salesman  Brand
Jack Schneider       Acme
Willy Loman          Robusto
Louis Ferguson       Robusto
Louis Ferguson       Acme
Louis Ferguson       Nimbus

Product Types By Brand

Brand    Product Type
Acme     Vacuum Cleaner
Acme     Breadbox
Acme     Lava Lamp
Robusto  Pruning Shears
Robusto  Vacuum Cleaner
Robusto  Breadbox
Robusto  Umbrella Stand
Robusto  Telescope
Nimbus   Tie Rack

Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's products. In
the previous setup we would have to add two new entries since Jack Schneider is able to sell two Product Types
covered by Robusto: Breadboxes and Vacuum Cleaners. With the new setup we need only add a single entry (in
Brands By Travelling Salesman).
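Under the stated rule, the original table is exactly the natural join of the three smaller tables. A sketch of that
reconstruction in SQL, with assumed table and column names:

-- Joining the three projections reproduces the original three-attribute table.
SELECT pt.Travelling_salesman, bb.Brand, pb.Product_type
FROM Product_types_by_travelling_salesman pt
JOIN Brands_by_travelling_salesman bb
  ON pt.Travelling_salesman = bb.Travelling_salesman
JOIN Product_types_by_brand pb
  ON pb.Brand = bb.Brand AND pb.Product_type = pt.Product_type;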

Usage
Only in rare situations does a 4NF table not conform to 5NF. These are situations in which a complex real-world
constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that
table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within
the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is
a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the
possibility of such inconsistencies.


References
[1] Analysis of normal forms for anchor-tables (http://www.anchormodeling.com/wp-content/uploads/2010/08/6nf.pdf)

Further reading
Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory (http://www.bkent.net/Doc/simple5.htm), Communications of the ACM, vol. 26, pp. 120–125
Date, C.J., & Darwen, H., & Pascal, F. Database Debunkings (http://www.dbdebunk.com)

Domain/key normal form


Domain/key normal form (DKNF) is a normal form used in database normalization which requires that the
database contains no constraints other than domain constraints and key constraints.
A domain constraint specifies the permissible values for a given attribute, while a key constraint specifies the
attributes that uniquely identify a row in a given table.
The domain/key normal form is achieved when every constraint on the relation is a logical consequence of the
definition of keys and domains, and enforcing key and domain constraints and conditions causes all constraints to be
met. Thus, it avoids all non-temporal anomalies.
The reason to use domain/key normal form is to avoid having general constraints in the database that are not clear
domain or key constraints. Most databases can easily test domain and key constraints on attributes. General
constraints however would normally require special database programming in the form of stored procedures that are
expensive to maintain and expensive for the database to execute. Therefore general constraints are split into domain
and key constraints.
It's much easier to build a database in domain/key normal form than it is to convert lesser databases which may
contain numerous anomalies. However, successfully building a domain/key normal form database remains a difficult
task, even for experienced database programmers. Thus, while the domain/key normal form eliminates the problems
found in most databases, it tends to be the most costly normal form to achieve. However, failing to achieve the
domain/key normal form may carry long-term, hidden costs due to anomalies which appear in databases adhering
only to lower normal forms over time.
The third normal form, Boyce–Codd normal form, fourth normal form and fifth normal form are special cases of the
domain/key normal form. All have either functional, multi-valued or join dependencies that can be converted into
(super)keys, and the domains in those normal forms were unconstrained, so all domain constraints are satisfied.
However, transforming a higher normal form into domain/key normal form is not always a dependency-preserving
transformation and therefore not always possible.

Example
A violation of DKNF occurs in the following table:


Wealthy Person

Wealthy Person  Wealthy Person Type    Net Worth in Dollars
Steve           Eccentric Millionaire  124,543,621
Roderick        Evil Billionaire       6,553,228,893
Katrina         Eccentric Billionaire  8,829,462,998
Gary            Evil Millionaire       495,565,211

(Assume that the domain for Wealthy Person consists of the names of all wealthy people in a pre-defined sample of
wealthy people; the domain for Wealthy Person Type consists of the values 'Eccentric Millionaire', 'Eccentric
Billionaire', 'Evil Millionaire', and 'Evil Billionaire'; and the domain for Net Worth in Dollars consists of all integers
greater than or equal to 1,000,000.)
There is a constraint linking Wealthy Person Type to Net Worth in Dollars, even though we cannot deduce one from
the other. The constraint dictates that an Eccentric Millionaire or Evil Millionaire will have a net worth of 1,000,000
to 999,999,999 inclusive, while an Eccentric Billionaire or Evil Billionaire will have a net worth of 1,000,000,000 or
higher. This constraint is neither a domain constraint nor a key constraint; therefore we cannot rely on domain
constraints and key constraints to guarantee that an inconsistent Wealthy Person Type / Net Worth in Dollars
combination does not make its way into the database.
The DKNF violation could be eliminated by altering the Wealthy Person Type domain to make it consist of just two
values, 'Evil' and 'Eccentric' (the wealthy person's status as a millionaire or billionaire is implicit in their Net Worth
in Dollars, so no useful information is lost).

Wealthy Person

Wealthy Person  Wealthy Person Type  Net Worth in Dollars
Steve           Eccentric            124,543,621
Roderick        Evil                 6,553,228,893
Katrina         Eccentric            8,829,462,998
Gary            Evil                 495,565,211

Wealthiness Status

Status       Minimum        Maximum
Millionaire  1,000,000      999,999,999
Billionaire  1,000,000,000  999,999,999,999

DKNF is frequently difficult to achieve in practice.
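In SQL terms, the revised design needs nothing beyond key and domain constraints, which CHECK clauses can
express. A minimal sketch (table names and types are assumptions for illustration):

CREATE TABLE Wealthy_person (
    Wealthy_person       VARCHAR(40) PRIMARY KEY,       -- key constraint
    Wealthy_person_type  VARCHAR(10) NOT NULL
        CHECK (Wealthy_person_type IN ('Eccentric', 'Evil')),  -- domain constraint
    Net_worth_in_dollars BIGINT NOT NULL
        CHECK (Net_worth_in_dollars >= 1000000)                -- domain constraint
);

No general (inter-attribute) constraint remains to be enforced by application code or stored procedures.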


References
Fagin, Ronald (1981). "A Normal Form for Relational Databases That Is Based on Domains and Keys" [1]. ACM
Transactions on Database Systems 6: 387–415. doi:10.1145/319587.319592.
[1] http://www.almaden.ibm.com/cs/people/fagin/tods81.pdf

External links
Database Normalization Basics (http://databases.about.com/od/specificproducts/a/normalization.htm) by
Mike Chapple (About.com)
An Introduction to Database Normalization (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) by Mike Hillyer.
Normalization (http://www.utexas.edu/its-archive/windows/database/datamodeling/rm/rm7.html) by ITS,
University of Texas.
A tutorial on the first 3 normal forms (http://phlonx.com/resources/nf3/) by Fred Coulson
Description of the database normalization basics (http://support.microsoft.com/kb/283878) by Microsoft

Sixth normal form


Sixth normal form (6NF) is a term in relational database theory, used in two different ways.

6NF (C. Date's definition)


A book by Christopher J. Date and others on temporal databases[1] defined sixth normal form as a normal form for
databases based on an extension of the relational algebra.
In this work, the relational operators, such as join, are generalized to support a natural treatment of interval data,
such as sequences of dates or moments in time.[2] Sixth normal form is then based on this generalized join, as
follows:
A relvar R [table] is in sixth normal form (abbreviated 6NF) if and only if it satisfies no nontrivial join
dependencies at all, where, as before, a join dependency is trivial if and only if at least one of the
projections (possibly U_projections) involved is taken over the set of all attributes of the relvar [table]
concerned. [Date et al.][3]
Any relation in 6NF is also in 5NF.
Sixth normal form is intended to decompose relation variables to irreducible components. Though this may be
relatively unimportant for non-temporal relation variables, it can be important when dealing with temporal variables
or other interval data. For instance, if a relation comprises a supplier's name, status, and city, we may also want to
add temporal data, such as the time during which these values are, or were, valid (e.g., for historical data) but the
three values may vary independently of each other and at different rates. We may, for instance, wish to trace the
history of changes to Status.
For further discussion on Temporal Aggregation in SQL, see also Zimanyi.[4] For a different approach, see
TSQL2.[5]
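A rough illustration of the idea in SQL, using hypothetical table names: each independently varying attribute of the
supplier is moved into its own table, paired with the interval during which the value holds. (Date et al. develop this
with dedicated interval types; plain date columns are used here only as a sketch.)

CREATE TABLE Supplier_status (
    Supplier_no INTEGER,
    Status      VARCHAR(20) NOT NULL,
    Valid_from  DATE NOT NULL,
    Valid_to    DATE NOT NULL,
    PRIMARY KEY (Supplier_no, Valid_from)
);

CREATE TABLE Supplier_city (
    Supplier_no INTEGER,
    City        VARCHAR(40) NOT NULL,
    Valid_from  DATE NOT NULL,
    Valid_to    DATE NOT NULL,
    PRIMARY KEY (Supplier_no, Valid_from)
);

-- Status and City histories can now change at different rates without
-- repeating or padding each other's values.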


DKNF
Some authors use the term sixth normal form differently, namely, as a synonym for Domain/key normal form
(DKNF). This usage predates Date et al.'s work.[6]

Usage
The sixth normal form is currently being used in some data warehouses where the benefits outweigh the
drawbacks,[7] for example using Anchor Modeling. Although using 6NF leads to an explosion of tables, modern
databases can prune the tables from select queries (using a process called 'table elimination') where they are not
required and thus speed up queries that only access several attributes.

References
[1] Date et al., 2003
[2] op. cit., chapter 9: Generalizing the relational operators
[3] op. cit., section 10.4, p. 176
[4] Zimanyi 2005
[5] Snodgrass, Richard T. TSQL2 Temporal Query Language (http://www.cs.arizona.edu/~rts/tsql2.html). Describes history, gives
references to standard and original book.
[6] See www.dbdebunk.com for a discussion on this topic (http://www.dbdebunk.com/page/page/621935.htm)
[7] See the Anchor Modeling website (http://www.anchormodeling.com) for a website that describes a data warehouse modelling method
based on the sixth normal form

Further reading
Date, C.J. (2006). The relational database dictionary: a comprehensive glossary of relational terms and concepts,
with illustrative examples. O'Reilly Series Pocket references. O'Reilly Media, Inc. p. 90.
ISBN 978-0-596-52798-3.
Date, Chris J.; Hugh Darwen, Nikos A. Lorentzos (January 2003). Temporal Data and the Relational Model: A
Detailed Investigation into the Application of Interval and Relation Theory to the Problem of Temporal Database
Management. Oxford: Elsevier LTD. ISBN 1-55860-855-9.
Zimanyi, E. (June 2006). "Temporal Aggregates and Temporal Universal Quantification in Standard SQL" (http://www.sigmod.org/publications/sigmod-record/0606/sigmod-record.june2006.pdf) (PDF). ACM SIGMOD
Record, volume 35, number 2, page 16. ACM.


Relation (database)
In a relational database, a relation is a set of tuples (d1, d2, ..., dn), where each element dj is a member of Dj, a data
domain.[1] Each distinct domain used in the definition of a relation is called an attribute, and each attribute may be
named.
In SQL, a query language for relational databases, relations are represented by tables, where each row of a table
represents a single tuple, and where the values of each attribute form a column.

[Figure: Relation, tuple, and attribute represented as table, row, and column.]

E. F. Codd originally used the term in its mathematical sense of a finitary relation, a set of tuples on some set of n
sets S1, S2, ..., Sn.[2] In this sense, the term was used by Augustus De Morgan in 1858.[3]
Where all values of every attribute of a relation are atomic, that relation is said to be in first normal form.

Examples
Below is an example of a relation having three named attributes: 'ID' from the domain of integers, and 'Name' and
'Address' from the domain of strings:
ID (Integer)  Name (String)       Address (String)
102           Yonezawa Akinori    Naha, Okinawa
202           Murata Makoto       Sendai, Miyagi
104           Sakamura Ken        Kumamoto, Kumamoto
152           Matsumoto Yukihiro  Okinawa, Okinawa

The tuples are unordered: one cannot say "The tuple of 'Murata Makoto' is above the tuple of 'Matsumoto
Yukihiro'", nor can one say "The tuple of 'Yonezawa Akinori' is the first tuple."

Base relation variable and derived relation variable (view)


Relation variables (relvars) are classified into two classes: base relation variables and derived relation variables. By
applying a relational algebra expression or relational calculus expression to one or more relation variables, one new
relation value is derived.
A base relation variable is a source relation variable which is not derived from any other relation variables. In SQL,
a database language for relational databases, the term base table roughly corresponds to a base relation variable.
Base relation variables are defined using a Data Definition Language (DDL); in SQL, the CREATE TABLE syntax
is used. The following is an example.
CREATE TABLE List_of_people (
ID INTEGER,
Name CHAR(40),
Address CHAR(200),
PRIMARY KEY (ID)
)
A derived relation variable is a relation variable which is derived from one or more relation variables by applying a
relational algebra expression or relational calculus expression. A View is considered a derived relation variable. A

Data Definition Language (DDL) is used to define derived relation variables. In SQL, the CREATE VIEW syntax is
used. The following is an example.
CREATE VIEW List_of_Okinawa_people AS (
SELECT ID, Name, Address
FROM List_of_people
WHERE Address LIKE '%, Okinawa'
)

References
[1] E. F. Codd (Oct 1972). "Further normalization of the database relational model". Data Base Systems. Courant Institute: Prentice-Hall.
ISBN 013196741X. "R is a relation on these n sets if it is a set of elements of the form (d1, d2, ..., dj) where dj ∈ Dj for each j=1,2,...,n."
[2] Codd, Edgar F (June 1970). "A Relational Model of Data for Large Shared Data Banks" (http://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf). Communications of the ACM (Association for Computing Machinery) 13 (6): 377–387. doi:10.1145/362384.362685. "The term
relation is used here in its accepted mathematical sense"
[3] Augustus De Morgan (1858). "On the Syllogism, No. III". Transactions of the Cambridge Philosophical Society 10: 208. "When two objects,
qualities, classes, or attributes, viewed together by the mind, are seen under some connexion, that connexion is called a relation."

Date, C. J. (1994). An Introduction to Database Systems (6 ed.). Addison–Wesley. ISBN 978-0-201-82458-2.
Date, C. J. (2005). Database in Depth: Relational Theory for Practitioners. Beijing: O'Reilly Media.
ISBN 978-0-596-10012-4.

Functional dependency
A functional dependency is a constraint between two sets of attributes in a relation from a database.
Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written
X → Y) if, and only if, each X value is associated with precisely one Y value. Customarily we call X the determinant
set and Y the dependent attribute. Thus, given a tuple and the values of the attributes in X, one can determine the
corresponding value of the Y attribute. Put simply, if the X value is known, the Y value is certainly known. For
simplicity, given that X and Y are sets of attributes in R, X → Y denotes that X functionally determines each of the
members of Y; in this case Y is known as the dependent set. Thus, a candidate key is a minimal set of attributes that
functionally determine all of the attributes in a relation. The concept of functional dependency arises when one
attribute is dependent on another attribute and it also uniquely determines the other attribute.
(Note: the "function" being discussed in "functional dependency" is the function of identification.)
A functional dependency FD: X → Y is called trivial if Y is a subset of X.
The determination of functional dependencies is an important part of designing databases in the relational model, and
in database normalization and denormalization. The functional dependencies, along with the attribute domains, are
selected so as to generate constraints that would exclude as much data inappropriate to the user domain from the
system as possible.
For example, suppose one is designing a system to track vehicles and the capacity of their engines. Each vehicle has
a unique vehicle identification number (VIN). One would write VIN → EngineCapacity because it would be
inappropriate for a vehicle's engine to have more than one capacity. (Assuming, in this case, that vehicles only have
one engine.) However, EngineCapacity → VIN is incorrect because there could be many vehicles with the same
engine capacity.
This functional dependency may suggest that the attribute EngineCapacity be placed in a relation with candidate key
VIN. However, that may not always be appropriate. For example, if that functional dependency occurs as a result of
the transitive functional dependencies VIN → VehicleModel and VehicleModel → EngineCapacity then that would


not result in a normalized relation.

Irreducible functional dependency set

A functional dependency set S is irreducible if the set has the following three properties:
1. Each right set of a functional dependency of S contains only one attribute.
2. Each left set of a functional dependency of S is irreducible. That means that removing any one attribute from the
left set will change the content of S (S will lose some information).
3. Removing any functional dependency will change the content of S.
Sets of functional dependencies (FD) with these properties are also called canonical or minimal.

Properties of functional dependencies


Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional
dependencies. Among the most important are Armstrong's axioms, which are used in database normalization:
Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X → Y
Augmentation (Axiom of Augmentation): If X → Y, then XZ → YZ
Transitivity (Axiom of Transitivity): If X → Y and Y → Z, then X → Z
From these rules, we can derive these secondary rules:
Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Pseudotransitivity: If X → Y and WY → Z, then WX → Z
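For example, the union rule can be derived from the axioms alone (a standard derivation, given here for
illustration): from X → Y, augmentation with X gives X → XY, since XX = X; from X → Z, augmentation with Y
gives XY → YZ; transitivity applied to X → XY and XY → YZ then yields X → YZ.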
Equivalent sets of functional dependencies are called covers of each other. Every set of functional dependencies has
a canonical cover.

Example
This example illustrates the concept of functional dependency. The situation modeled is that of college students
attending one or more lectures in each of which they are assigned a teaching assistant (TA). Let us further assume
that every student is in some semester and is identified by a unique integer ID.
StudentID  Semester  Lecture            TA
1234       6         Numerical Methods  John
2380       4         Numerical Methods  Peter
1234       6         Visual Computing   Amina
1201       4         Numerical Methods  Peter
1201       4         Physics II         Simone

We notice that whenever two rows in this table feature the same StudentID, they also necessarily have the same
Semester values. This basic fact can be expressed by a functional dependency:
StudentID → Semester.
Other nontrivial functional dependencies can be identified, for example:
{StudentID, Lecture} → TA
{StudentID, Lecture} → {TA, Semester}
The latter expresses the fact that the set {StudentID, Lecture} is a superkey of the relation.
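A functional dependency of this kind is easy to test for in SQL. Against a hypothetical table named Enrollment
holding the rows above, the following query returns every StudentID that violates StudentID → Semester, so an
empty result confirms the dependency holds in the data:

-- Any StudentID associated with more than one Semester breaks the dependency.
SELECT StudentID
FROM Enrollment
GROUP BY StudentID
HAVING COUNT(DISTINCT Semester) > 1;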


External links
Gary Burt (summer, 1999). "CS 461 (Database Management Systems) lecture notes" [1]. University of Maryland
Baltimore County Department of Computer Science and Electrical Engineering.
Jeffrey D. Ullman. "CS345 Lecture Notes" [2] (PostScript). Stanford University.
Osmar Zaiane (June 9, 1998). "CMPT 354 (Database Systems I) lecture notes" [3]. Simon Fraser University
Department of Computing Science.

References
[1] http://www.cs.umbc.edu/courses/461/current/burt/lectures/lec14/
[2] http://www-db.stanford.edu/~ullman/cs345notes/slides01-1.ps
[3] http://www.cs.sfu.ca/CC/354/zaiane/material/notes/Chapter6/node10.html

Multivalued dependency
In database theory, multivalued dependency is a full constraint between two sets of attributes in a relation.
In contrast to the functional dependency, the multivalued dependency requires that certain tuples be present in a
relation. Therefore, a multivalued dependency is a special case of tuple-generating dependency. The multivalued
dependency plays a role in the 4NF database normalization.

Formal definition
The formal definition is given as follows:[1]
Let R be a relation schema and let X ⊆ R and Y ⊆ R be subsets of its attributes. The multivalued dependency X ↠ Y
(which can be read as X multidetermines Y) holds on R if, in any legal relation r(R), for all pairs of tuples t and u in r
such that t[X] = u[X], there exist tuples v and w in r such that:

t[X] = u[X] = v[X] = w[X]
v[Y] = t[Y]
v[R − Y] = u[R − Y]
w[Y] = u[Y]
w[R − Y] = t[R − Y]

In simpler words the above condition can be expressed as follows: if we denote by (x, y, z) the tuple having values
for X, Y and R − X − Y collectively equal to x, y and z correspondingly, then whenever the tuples (a, b, c) and
(a, d, e) exist in r, the tuples (a, b, e) and (a, d, c) should also exist in r.

Example
Consider this example of a database of teaching courses, the books recommended for the course, and the lecturers
who will be teaching the course:

Course  Book          Lecturer
AHA     Silberschatz  John D
AHA     Silberschatz  William M
AHA     Silberschatz  Christian G
AHA     Nederpelt     John D
AHA     Nederpelt     William M
AHA     Nederpelt     Christian G
OSO     Silberschatz  John D
OSO     Silberschatz  William M

Because the lecturers attached to the course and the books attached to the course are independent of each other, this
database design has a multivalued dependency; if we were to add a new book to the AHA course, we would have to
add one record for each of the lecturers on that course, and vice versa.
Put formally, there are two multivalued dependencies in this relation: {course} ↠ {book} and equivalently
{course} ↠ {lecturer}.
Databases with multivalued dependencies thus exhibit redundancy. In database normalization, fourth normal form
requires that either every multivalued dependency X ↠ Y is trivial or, for every nontrivial multivalued dependency
X ↠ Y, X is a superkey.

Interesting properties

If X ↠ Y, then X ↠ R − X − Y
If X ↠ Y and Z ⊆ W, then XW ↠ YZ
If X ↠ Y and Y ↠ Z, then X ↠ Z − Y

The following also involve functional dependencies:

If X → Y, then X ↠ Y
If X ↠ Y and Y → Z, then X → Z − Y

The above rules are sound and complete.

A decomposition of R into (X, Y) and (X, R − Y) is a lossless-join decomposition if and only if X ↠ Y holds
in R.

Definitions

full constraint
A constraint which expresses something about all attributes in a database. (In contrast to an embedded
constraint.) That a multivalued dependency is a full constraint follows from its definition, as it says
something about the attributes R − Y.
tuple-generating dependency
A dependency which explicitly requires certain tuples to be present in the relation.
trivial multivalued dependency 1
A multivalued dependency which involves all the attributes of a relation, i.e. R = X ∪ Y. A trivial
multivalued dependency implies, for tuples t and u, tuples v and w which are equal to t and u.
trivial multivalued dependency 2
A multivalued dependency for which Y ⊆ X.


References
[1] Silberschatz, Abraham; Korth, Sudarshan (2006). Database System Concepts (5th ed.). McGraw-Hill. p. 295. ISBN 0-07-124476-X.

External links
Multivalued dependencies and a new Normal form for Relational Databases (http://www.almaden.ibm.com/cs/people/fagin/tods77.pdf) (PDF) - Ronald Fagin, IBM Research Lab

Join dependency
A join dependency is a constraint on the set of legal relations over a database scheme. A table T is subject to a join
dependency if T can always be recreated by joining multiple tables each having a subset of the attributes of T. If one
of the tables in the join has all the attributes of the table T, the join dependency is called trivial.
The join dependency plays an important role in the Fifth normal form, also known as project-join normal form,
because it can be proven that if you decompose a scheme R into tables R1 to Rn, the decomposition will be a
lossless-join decomposition if you restrict the legal relations on R to a join dependency on R called *(R1, R2, ..., Rn).
Another way to describe a join dependency is to say that the set of relationships in the join dependency is
independent of each other.

Formal definition

Let R be a relation schema and let R1, R2, ..., Rn be a decomposition of R.
The relation r satisfies the join dependency *(R1, R2, ..., Rn) if

r = πR1(r) ⋈ πR2(r) ⋈ ... ⋈ πRn(r).

A join dependency is trivial if one of the Ri is R itself.
Silberschatz, Korth. Database System Concepts, 1st Edition[1]

Example
Consider a pizza chain that models purchases in the table Customer = {order-number, customer-name, pizza-name,
courier}. It is obvious that you can derive the following relationships:
customer-name depends on order-number
pizza-name depends on order-number
courier depends on order-number
Since the relationships are independent, you can say there is a join dependency as follows: *((order-number,
customer-name), (order-number, pizza-name), (order-number, courier)).
If each customer has his own courier, however, you could have a join dependency like this: *((order-number,
customer-name), (order-number, courier), (customer-name, courier), (order-number, pizza-name)), but
*((order-number, customer-name, courier), (order-number, pizza-name)) would be valid as well. This makes it
obvious that just having a join dependency is not enough to normalize a database scheme.
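Whether a given join dependency actually holds in the data can be tested directly. The sketch below (column names
adapted to SQL identifiers, which is an assumption) joins the three projections of the first join dependency and
subtracts the original table; since the join of projections always contains the original rows, an empty result means
the dependency holds:

SELECT a.order_number, a.customer_name, b.pizza_name, c.courier
FROM (SELECT DISTINCT order_number, customer_name FROM Customer) a
JOIN (SELECT DISTINCT order_number, pizza_name FROM Customer) b
  ON a.order_number = b.order_number
JOIN (SELECT DISTINCT order_number, courier FROM Customer) c
  ON a.order_number = c.order_number
EXCEPT
SELECT order_number, customer_name, pizza_name, courier FROM Customer;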


References
[1] Silberschatz, Korth. Database System Concepts, 1st Edition

Concurrency control
In information technology and computer science, especially in the fields of computer programming, operating
systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations
are generated, while getting those results as quickly as possible.
Computer systems, both software and hardware, consist of modules, or components. Each component is designed to
operate correctly, i.e., to obey or meet certain consistency rules. When components that operate concurrently
interact by messaging or by sharing accessed data (in memory or storage), a certain component's consistency may be
violated by another component. The general area of concurrency control provides rules, methods, design
methodologies, and theories to maintain the consistency of components operating concurrently while interacting, and
thus the consistency and correctness of the whole system. Introducing concurrency control into a system means
applying operation constraints which typically result in some performance reduction. Operation consistency and
correctness should be achieved with as good efficiency as possible, without reducing performance below reasonable
levels. For example, a failure in concurrency control can result in data corruption from torn read or write operations.

Concurrency control in databases


Comments:
1. This section is applicable to all transactional systems, i.e., to all systems that use database transactions (atomic
transactions; e.g., transactional objects in Systems management and in networks of smartphones which typically
implement private, dedicated database systems), not only general-purpose database management systems
(DBMSs).
2. DBMSs need to deal also with concurrency control issues not typical just to database transactions but rather to
operating systems in general. These issues (e.g., see Concurrency control in operating systems below) are out of
the scope of this section.
Concurrency control in Database management systems (DBMS; e.g., Bernstein et al. 1987, Weikum and Vossen
2001), other transactional objects, and related distributed applications (e.g., Grid computing and Cloud computing)
ensures that database transactions are performed concurrently without violating the data integrity of the respective
databases. Thus concurrency control is an essential element for correctness in any system where two database
transactions or more, executed with time overlap, can access the same data, e.g., virtually in any general-purpose
database system. Consequently a vast body of related research has been accumulated since database systems have
emerged in the early 1970s. A well established concurrency control theory for database systems is outlined in the
references mentioned above: serializability theory, which allows one to effectively design and analyze concurrency
control methods and mechanisms. An alternative theory for concurrency control of atomic transactions over abstract
data types is presented in (Lynch et al. 1993), and not utilized below. This theory is more refined, complex, with a
wider scope, and has been less utilized in the Database literature than the classical theory above. Each theory has its
pros and cons, emphasis and insight. To some extent they are complementary, and their merging may be useful.
To ensure correctness, a DBMS usually guarantees that only serializable transaction schedules are generated, unless
serializability is intentionally relaxed to increase performance, but only in cases where application correctness is not
harmed. For maintaining correctness in cases of failed (aborted) transactions (which can always happen for many
reasons) schedules also need to have the recoverability (from abort) property. A DBMS also guarantees that no
effect of committed transactions is lost, and no effect of aborted (rolled back) transactions remains in the related
database. Overall transaction characterization is usually summarized by the ACID rules below. As databases have

become distributed, or needed to cooperate in distributed environments (e.g., Federated databases in the early 1990,
and Cloud computing currently), the effective distribution of concurrency control mechanisms has received special
attention.

Database transaction and the ACID rules


The concept of a database transaction (or atomic transaction) has evolved in order to enable both a well understood
database system behavior in a faulty environment where crashes can happen any time, and recovery from a crash to a
well understood database state. A database transaction is a unit of work, typically encapsulating a number of
operations over a database (e.g., reading a database object, writing, acquiring lock, etc.), an abstraction supported in
database and also other systems. Each transaction has well defined boundaries in terms of which program/code
executions are included in that transaction (determined by the transaction's programmer via special transaction
commands). Every database transaction obeys the following rules (by support in the database system; i.e., a database
system is designed to guarantee them for the transactions it runs):
Atomicity - Either the effects of all or none of its operations remain ("all or nothing" semantics) when a
transaction is completed (committed or aborted respectively). In other words, to the outside world a committed
transaction appears (by its effects on the database) to be indivisible, atomic, and an aborted transaction does not
leave effects on the database at all, as if it never existed.
Consistency - Every transaction must leave the database in a consistent (correct) state, i.e., maintain the
predetermined integrity rules of the database (constraints upon and among the database's objects). A transaction
must transform a database from one consistent state to another consistent state (however, it is the responsibility of
the transaction's programmer to make sure that the transaction itself is correct, i.e., performs correctly what it
intends to perform (from the application's point of view) while the predefined integrity rules are enforced by the
DBMS). Thus since a database can be normally changed only by transactions, all the database's states are
consistent. An aborted transaction does not change the database state it has started from, as if it never existed
(atomicity above).
Isolation - Transactions cannot interfere with each other (as an end result of their executions). Moreover, usually
(depending on concurrency control method) the effects of an incomplete transaction are not even visible to
another transaction. Providing isolation is the main goal of concurrency control.
Durability - Effects of successful (committed) transactions must persist through crashes (typically by recording
the transaction's effects and its commit event in a non-volatile memory).
The concept of atomic transaction has been extended during the years to what has become Business transactions
which actually implement types of Workflow and are not atomic. However also such enhanced transactions typically
utilize atomic transactions as components.
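In SQL, transaction boundaries are set with explicit commands. The classic funds-transfer sketch below (table and
column names are hypothetical) illustrates atomicity: either both UPDATE statements take effect at COMMIT, or,
if the transaction is aborted, neither does:

START TRANSACTION;
UPDATE Accounts SET Balance = Balance - 100 WHERE Account_id = 1;
UPDATE Accounts SET Balance = Balance + 100 WHERE Account_id = 2;
COMMIT;  -- or ROLLBACK to undo both updates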

Why is concurrency control needed?


If transactions are executed serially, i.e., sequentially with no overlap in time, no transaction concurrency exists.
However, if concurrent transactions with interleaving operations are allowed in an uncontrolled manner, some
unexpected, undesirable result may occur. Here are some typical examples:
1. The lost update problem: A second transaction writes a second value of a data-item (datum) on top of a first value
written by a first concurrent transaction, and the first value is lost to other transactions running concurrently
which need, by their precedence, to read the first value. The transactions that have read the wrong value end with
incorrect results.
2. The dirty read problem: Transactions read a value written by a transaction that has been later aborted. This value
disappears from the database upon abort, and should not have been read by any transaction ("dirty read"). The
reading transactions end with incorrect results.

3. The incorrect summary problem: While one transaction takes a summary over the values of all the instances of a
repeated data-item, a second transaction updates some instances of that data-item. The resulting summary does
not reflect a correct result for any (usually needed for correctness) precedence order between the two transactions
(if one is executed before the other), but rather some random result, depending on the timing of the updates, and
whether certain update results have been included in the summary or not.
Most high-performance transactional systems need to run transactions concurrently to meet their performance
requirements. Thus, without concurrency control such systems can neither provide correct results nor maintain their
databases consistent.

Concurrency control mechanisms


Categories
The main categories of concurrency control mechanisms are:
Optimistic - Delay the checking of whether a transaction meets the isolation and other integrity rules (e.g.,
serializability and recoverability) until its end, without blocking any of its (read, write) operations ("...and be
optimistic about the rules being met..."), and then abort a transaction to prevent the violation, if the desired rules
are to be violated upon its commit. An aborted transaction is immediately restarted and re-executed, which incurs
an obvious overhead (versus executing it to the end only once). If not too many transactions are aborted, then
being optimistic is usually a good strategy.
Pessimistic - Block an operation of a transaction, if it may cause violation of the rules, until the possibility of
violation disappears. Blocking operations is typically involved with performance reduction.
Semi-optimistic - Block operations in some situations, if they may cause violation of some rules, and do not
block in other situations while delaying rules checking (if needed) to transaction's end, as done with optimistic.
Different categories provide different performance, i.e., different average transaction completion rates (throughput),
depending on transaction types mix, computing level of parallelism, and other factors. If selection and knowledge
about trade-offs are available, then category and method should be chosen to provide the highest performance.
Mutual blocking between two or more transactions (where each one blocks the other) results in a deadlock,
where the transactions involved are stalled and cannot reach completion. Most non-optimistic mechanisms (with
blocking) are prone to deadlocks, which are resolved by an intentional abort of a stalled transaction (which releases
the other transactions in that deadlock), and its immediate restart and re-execution. The likelihood of a deadlock is
typically low.
Blocking, deadlocks, and aborts all result in performance reduction, and hence the trade-offs between the
categories.
Methods
Many methods for concurrency control exist. Most of them can be implemented within either main category above.
The major methods,[1] which have each many variants, and in some cases may overlap or be combined, are:
1. Locking (e.g., Two-phase locking - 2PL) - Controlling access to data by locks assigned to the data. Access of a
transaction to a data item (database object) locked by another transaction may be blocked (depending on lock type
and access operation type) until lock release; a lock-manager sketch follows this list.
2. Serialization graph checking (also called Serializability, Conflict, or Precedence graph checking) - Checking
for cycles in the schedule's graph and breaking them by aborts.
3. Timestamp ordering (TO) - Assigning timestamps to transactions, and controlling or checking access to data by
timestamp order.
4. Commitment ordering (or Commit ordering; CO) - Controlling or checking transactions' chronological order of
commit events to be compatible with their respective precedence order.
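A minimal, illustrative Python sketch of a lock manager follows, in the SS2PL style discussed further below (all of a transaction's locks are released only when it ends). The class and method names are assumptions, and blocking is stood in for by an exception.

    # A minimal sketch of lock-based concurrency control (SS2PL discipline).
    class LockManager:
        def __init__(self):
            self.locks = {}   # item -> (mode, set of holder ids); mode is "S" or "X"

        def acquire(self, txn, item, mode):
            entry = self.locks.get(item)
            if entry is None:
                self.locks[item] = (mode, {txn})
                return
            held_mode, holders = entry
            if holders == {txn}:
                # The sole holder may keep or upgrade its lock.
                self.locks[item] = ("X" if "X" in (held_mode, mode) else "S", holders)
            elif held_mode == "S" and mode == "S":
                holders.add(txn)     # shared (read) locks are compatible
            else:
                raise RuntimeError(f"{txn} must wait for {item}")  # stands in for blocking

        def release_all(self, txn):
            # SS2PL: release everything only after the transaction has ended.
            for item, (mode, holders) in list(self.locks.items()):
                holders.discard(txn)
                if not holders:
                    del self.locks[item]

    lm = LockManager()
    lm.acquire("T1", "x", "S")
    lm.acquire("T2", "x", "S")       # OK: two concurrent readers
    # lm.acquire("T2", "x", "X")     # would block (raises here) until T1 ends
    lm.release_all("T1")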

Other major concurrency control types that are utilized in conjunction with the methods above include:
Multiversion concurrency control (MVCC) - Increasing concurrency and performance by generating a new
version of a database object each time the object is written, and allowing transactions' read operations to use one
of several recent relevant versions of each object, depending on the scheduling method (a minimal sketch follows
this list).
Index concurrency control - Synchronizing access operations to indexes, rather than to user data. Specialized
methods provide substantial performance gains.
Private workspace model (Deferred update) - Each transaction maintains a private workspace for its accessed
data, and its changed data become visible outside the transaction only upon its commit (e.g., Weikum and Vossen
2001). This model provides a different concurrency control behavior, with benefits in many cases.
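A minimal, illustrative Python sketch of the multiversion idea follows (the names are assumptions): each write appends a timestamped version, and a reader sees the newest version that is not newer than its snapshot, so readers never block writers.

    # A minimal sketch of multiversion concurrency control (MVCC).
    import itertools

    class MVStore:
        def __init__(self):
            self.versions = {}                 # key -> list of (timestamp, value)
            self.clock = itertools.count(1)    # logical time

        def begin(self):
            return next(self.clock)            # snapshot timestamp for a reader

        def write(self, key, value):
            ts = next(self.clock)
            self.versions.setdefault(key, []).append((ts, value))

        def read(self, key, snapshot_ts):
            # Return the newest version written at or before the snapshot.
            candidates = [(ts, v) for ts, v in self.versions.get(key, [])
                          if ts <= snapshot_ts]
            return max(candidates)[1] if candidates else None

    db = MVStore()
    db.write("x", "v1")
    reader_ts = db.begin()           # reader takes its snapshot here
    db.write("x", "v2")              # a later write creates a new version
    print(db.read("x", reader_ts))   # -> 'v1': the reader's consistent snapshot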
The most common mechanism type in database systems since their early days in the 1970s has been Strong strict
Two-phase locking (SS2PL; also called Rigorous scheduling or Rigorous 2PL), which is a special case (variant) of
both Two-phase locking (2PL) and Commitment ordering (CO). It is pessimistic. In spite of its long name (for
historical reasons) the idea of the SS2PL mechanism is simple: "Release all locks applied by a transaction only after
the transaction has ended." SS2PL (or Rigorousness) is also the name of the set of all schedules that can be generated
by this mechanism, i.e., SS2PL (or Rigorous) schedules, which have the SS2PL (Rigorousness) property.

Major goals of concurrency control mechanisms


Concurrency control mechanisms firstly need to operate correctly, i.e., to maintain each transaction's integrity rules
(as related to concurrency; application-specific integrity rules are out of scope here) while transactions are running
concurrently, and thus to maintain the integrity of the entire transactional system. Correctness needs to be achieved
with performance as good as possible. In addition, a need increasingly exists to operate effectively while transactions
are distributed over processes, computers, and computer networks. Other subjects that may affect concurrency
control are recovery and replication.
Correctness
Serializability
For correctness, a common major goal of most concurrency control mechanisms is generating schedules with the
Serializability property. Without serializability undesirable phenomena may occur, e.g., money may disappear from
accounts, or be generated from nowhere. Serializability of a schedule means equivalence (in the resulting database
values) to some serial schedule with the same transactions (i.e., a schedule in which the transactions are sequential
with no overlap in time, and thus completely isolated from each other: no concurrent access by any two transactions
to the same data is possible). Serializability is considered the highest level of isolation among database transactions,
and the major correctness criterion for concurrent transactions. In some cases, compromised (relaxed) forms of
serializability are allowed for better performance (e.g., the popular Snapshot isolation mechanism) or to meet
availability requirements in highly distributed systems (see Eventual consistency), but only if the application's
correctness is not violated by the relaxation (e.g., no relaxation is allowed for money transactions, since by
relaxation money can disappear, or appear from nowhere).
Almost all implemented concurrency control mechanisms achieve serializability by providing Conflict serializability,
a broad special case of serializability (i.e., it covers and enables most serializable schedules, and does not impose
significant additional delay-causing constraints) which can be implemented efficiently.
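The following is a minimal Python sketch of a conflict-serializability test over a given schedule, using the precedence-graph construction named in the methods list above (the schedule encoding is an assumption): two operations conflict when they belong to different transactions, access the same item, and at least one is a write; the schedule is conflict serializable iff the resulting graph is acyclic.

    # A minimal sketch of conflict-serializability (precedence-graph) checking.
    def conflict_serializable(schedule):
        """schedule: list of (txn, op, item), op is 'r' or 'w', in execution order."""
        edges = set()
        for i, (t1, op1, x1) in enumerate(schedule):
            for t2, op2, x2 in schedule[i + 1:]:
                if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                    edges.add((t1, t2))          # t1's operation precedes t2's
        # Detect a cycle with Kahn's algorithm (topological sort).
        nodes = {t for t, _, _ in schedule}
        indeg = {n: 0 for n in nodes}
        for _, b in edges:
            indeg[b] += 1
        queue = [n for n in nodes if indeg[n] == 0]
        seen = 0
        while queue:
            n = queue.pop()
            seen += 1
            for a, b in edges:
                if a == n:
                    indeg[b] -= 1
                    if indeg[b] == 0:
                        queue.append(b)
        return seen == len(nodes)    # all nodes sorted -> acyclic -> serializable

    # T1 and T2 both read then write x, interleaved: not conflict serializable.
    print(conflict_serializable(
        [("T1", "r", "x"), ("T2", "r", "x"),
         ("T1", "w", "x"), ("T2", "w", "x")]))   # False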

Recoverability
See Recoverability in Serializability
Comment: While in the general area of systems the term "recoverability" may refer to the ability of a system to
recover from failure or from an incorrect/forbidden state, within concurrency control of database systems this term
has received a specific meaning.
Concurrency control typically also ensures the Recoverability property of schedules, for maintaining correctness in
cases of aborted transactions (which can always happen for many reasons). Recoverability (from abort) means that
no committed transaction in a schedule has read data written by an aborted transaction. Such data disappear from the
database (upon the abort) and are part of an incorrect database state. Reading such data violates the consistency rule
of ACID. Unlike Serializability, Recoverability cannot be compromised or relaxed in any case, since any relaxation
results in quick database integrity violation upon aborts. The major methods listed above provide serializability
mechanisms. None of them in its general form automatically provides recoverability, and special considerations and
mechanism enhancements are needed to support recoverability. A commonly utilized special case of recoverability is
Strictness, which allows efficient database recovery from failure (but excludes optimistic implementations; e.g.,
Strict CO (SCO) cannot have an optimistic implementation, but has semi-optimistic ones).
Comment: Note that the Recoverability property is needed even if no database failure occurs and no database
recovery from failure is needed. Rather, it is needed to correctly and automatically handle transaction aborts, which
may be unrelated to database failure and recovery from it. A minimal recoverability check is sketched below.
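The following Python sketch checks recoverability of a complete schedule (the encoding is an assumption): whenever T2 reads an item last written by T1, T1 must commit before T2 does; otherwise T2 may have committed after reading data that later disappears upon T1's abort.

    # A minimal sketch of a recoverability check over a complete schedule.
    def recoverable(schedule):
        """schedule: ordered list of (txn, op, item-or-None); op in 'r'/'w'/'c'."""
        last_writer = {}     # item -> transaction that wrote it last
        reads_from = {}      # reader txn -> set of writer txns it depends on
        committed = set()
        for txn, op, item in schedule:
            if op == "w":
                last_writer[item] = txn
            elif op == "r":
                writer = last_writer.get(item)
                if writer is not None and writer != txn and writer not in committed:
                    reads_from.setdefault(txn, set()).add(writer)
            elif op == "c":
                if any(w not in committed for w in reads_from.get(txn, ())):
                    return False   # committed before a transaction it read from
                committed.add(txn)
        return True

    # T2 reads x from T1 and commits first: not recoverable.
    print(recoverable([("T1", "w", "x"), ("T2", "r", "x"),
                       ("T2", "c", None), ("T1", "c", None)]))   # False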
Distribution
With the fast technological development of computing, the difference between local and distributed computing over
low-latency networks or buses is blurring. Thus the quite effective utilization of local techniques in such distributed
environments is common, e.g., in computer clusters and multi-core processors. However, the local techniques have
their limitations: to scale, they use multiple processes (or threads) supported by multiple processors (or cores). This
often turns transactions into distributed ones, when they themselves need to span multiple processes. In these cases
most local concurrency control techniques do not scale well.
Distributed serializability and Commitment ordering
See Distributed serializability in Serializability
As database systems have become distributed, or started to cooperate in distributed environments (e.g., Federated
databases in the early 1990s, and nowadays Grid computing, Cloud computing, and networks with smartphones),
some transactions have become distributed. A distributed transaction means that the transaction spans processes, and
may span computers and geographical sites. This generates a need for effective distributed concurrency control
mechanisms. Achieving the Serializability property of a distributed system's schedule (see Distributed serializability
and Global serializability (Modular serializability)) effectively poses special challenges typically not met by most of
the regular serializability mechanisms, originally designed to operate locally. This is especially due to the need for
costly distribution of concurrency control information amid communication and computer latency. The only known
general effective technique for distribution is Commitment ordering, which was disclosed publicly in 1991 (after
being patented). Commitment ordering (Commit ordering, CO; Raz 1992) means that transactions' chronological
order of commit events is kept compatible with their respective precedence order. CO does not require the
distribution of concurrency control information and provides a general effective solution (reliable, high-performance,
and scalable) for both distributed and global serializability, also in a heterogeneous environment with database
systems (or other transactional objects) with different (any) concurrency control mechanisms.[1] CO is indifferent to
which mechanism is utilized, since it does not interfere with any transaction operation scheduling (which most
mechanisms control), and only determines the order of commit events. Thus, CO enables the efficient distribution of
all other mechanisms, and also the distribution of a mix of different (any) local mechanisms, for achieving
distributed and global serializability. The existence of such a solution had been considered "unlikely" until 1991, and
by many experts also later, due to misunderstanding of the CO solution (see Quotations in Global serializability). An
important side-benefit of CO is automatic distributed deadlock resolution. Contrary to CO, virtually all other
techniques (when not combined with CO) are prone to distributed deadlocks (also called global deadlocks), which
need special handling. CO is also the name of the resulting schedule property: a schedule has the CO property if the
chronological order of its transactions' commit events is compatible with the respective transactions' precedence
(partial) order. A minimal check of this property is sketched below.
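The following Python sketch checks the CO property of a given complete schedule (the encoding is an assumption, and the precedence order is induced here, as above, by conflicting operations): wherever T1 precedes T2 and both commit, T1's commit event must come first.

    # A minimal sketch of checking the CO property of a schedule.
    def has_commitment_ordering(schedule):
        """schedule: ordered list of (txn, op, item-or-None); op in 'r'/'w'/'c'."""
        precedes = set()                 # (Ti, Tj): Ti conflicts before Tj
        ops = [(t, op, x) for t, op, x in schedule if op in ("r", "w")]
        for i, (t1, op1, x1) in enumerate(ops):
            for t2, op2, x2 in ops[i + 1:]:
                if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                    precedes.add((t1, t2))
        commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == "c"}
        # CO: whenever Ti precedes Tj and both commit, Ti commits first.
        return all(commit_pos[a] < commit_pos[b]
                   for a, b in precedes
                   if a in commit_pos and b in commit_pos)

    # T1 precedes T2 on x, but T2 commits first: the CO property is violated.
    print(has_commitment_ordering([("T1", "w", "x"), ("T2", "w", "x"),
                                   ("T2", "c", None), ("T1", "c", None)]))  # False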
SS2PL mentioned above is a variant (special case) of CO and thus also effective for achieving distributed and global
serializability. It also provides automatic distributed deadlock resolution (a fact overlooked in the research literature
even after CO's publication), as well as Strictness and thus Recoverability. Possessing these desired properties
together with known efficient locking-based implementations explains SS2PL's popularity. SS2PL has been utilized
to efficiently achieve Distributed and Global serializability since the 1980s, and has become the de facto standard for
it. However, SS2PL is blocking and constraining (pessimistic), and with the proliferation of distribution and
utilization of systems different from traditional database systems (e.g., as in Cloud computing), less constraining
types of CO (e.g., Optimistic CO) may be needed for better performance.
Comments:
1. The Distributed conflict serializability property in its general form is difficult to achieve efficiently, but it is
achieved efficiently via its special case Distributed CO: each local component (e.g., a local DBMS) needs both to
provide some form of CO and to enforce a special vote ordering strategy for the Two-phase commit protocol (2PC,
utilized to commit distributed transactions). Unlike general Distributed CO, Distributed SS2PL
exists automatically when all local components are SS2PL based (in each component CO exists, implied, and the
vote ordering strategy is then met automatically). This fact has been known and utilized since the 1980s (i.e., that
SS2PL exists globally, without knowing about CO) for efficient Distributed SS2PL, which implies Distributed
serializability and strictness (e.g., see Raz 1992, page 293; it is also implied in Bernstein et al. 1987, page 78).
Less constrained Distributed serializability and strictness can be efficiently achieved by Distributed Strict CO
(SCO), or by a mix of SS2PL-based and SCO-based local components.
2. About the references and Commitment ordering: (Bernstein et al. 1987) was published before the discovery of
CO in 1990. The CO schedule property is called Dynamic atomicity in (Lynch et al. 1993, page 201). CO is
described in (Weikum and Vossen 2001, pages 102, 700), but the description is partial and misses CO's essence.
(Raz 1992) was the first refereed article about CO algorithms that was accepted for publication (however,
publications about an equivalent Dynamic atomicity property can be traced to 1988). Other CO articles followed.
(Bernstein and Newcomer 2009)[1] notes CO as one of the four major concurrency control methods, and CO's
ability to provide interoperability among other methods.
Distributed recoverability
Unlike Serializability, Distributed recoverability and Distributed strictness can be achieved efficiently in a
straightforward way, similarly to the way Distributed CO is achieved: in each database system they have to be
applied locally, employing a vote ordering strategy for the Two-phase commit protocol (2PC; Raz 1992, page 307).
As has been mentioned above, Distributed SS2PL, including Distributed strictness (recoverability) and Distributed
commitment ordering (serializability), automatically employs the needed vote ordering strategy, and is achieved
(globally) when employed locally in each (local) database system (as has been known and utilized for many years;
as a matter of fact, locality is defined by the boundary of a 2PC participant (Raz 1992)).

Other major subjects of attention
The design of concurrency control mechanisms is often influenced by the following subjects:
Recovery
All systems are prone to failures, and handling recovery from failure is a must. The properties of the generated
schedules, which are dictated by the concurrency control mechanism, may have an impact on the effectiveness and
efficiency of recovery. For example, the Strictness property (mentioned in the section Recoverability above) is often
desirable for efficient recovery.
Replication
For high availability, database objects are often replicated. Updates of replicas of the same database object need to
be kept synchronized. This may affect the way concurrency control is done (e.g., Gray et al. 1996[2]).

References
Philip A. Bernstein, Vassos Hadzilacos, Nathan Goodman (1987): Concurrency Control and Recovery in
Database Systems [3] (free PDF download), Addison-Wesley Publishing Company, 1987, ISBN 0-201-10715-5
Gerhard Weikum, Gottfried Vossen (2001): Transactional Information Systems [4], Elsevier, ISBN 1-55860-508-8
Nancy Lynch, Michael Merritt, William Weihl, Alan Fekete (1993): Atomic Transactions in Concurrent and
Distributed Systems [5], Morgan Kaufmann (Elsevier), August 1993, ISBN 978-1-55860-104-8, ISBN 1-55860-104-X
Yoav Raz (1992): "The Principle of Commitment Ordering, or Guaranteeing Serializability in a Heterogeneous
Environment of Multiple Autonomous Resource Managers Using Atomic Commitment." [6] (PDF [7]),
Proceedings of the Eighteenth International Conference on Very Large Data Bases (VLDB), pp. 292-312,
Vancouver, Canada, August 1992. (Also DEC-TR 841, Digital Equipment Corporation, November 1990.)

Footnotes
[1] Philip A. Bernstein, Eric Newcomer (2009): Principles of Transaction Processing, 2nd Edition
(http://www.elsevierdirect.com/product.jsp?isbn=9781558606234), Morgan Kaufmann (Elsevier), June 2009,
ISBN 978-1-55860-623-4 (page 145)
[2] Gray, J.; Helland, P.; O'Neil, P.; Shasha, D. (1996). "The dangers of replication and a solution"
(ftp://ftp.research.microsoft.com/pub/tr/tr-96-17.pdf). Proceedings of the 1996 ACM SIGMOD International
Conference on Management of Data. pp. 173-182. doi:10.1145/233269.233330.
[3] http://research.microsoft.com/en-us/people/philbe/ccontrol.aspx
[4] http://www.elsevier.com/wps/find/bookdescription.cws_home/677937/description#description
[5] http://www.elsevier.com/wps/find/bookdescription.cws_home/680521/description#description
[6] http://www.informatik.uni-trier.de/~ley/db/conf/vldb/Raz92.html
[7] http://www.vldb.org/conf/1992/P292.PDF

Concurrency control in operating systems


Multitasking operating systems, especially real-time operating systems, need to maintain the illusion that all tasks
running on top of them are running at the same time, even though only one or a few tasks really are running at
any given moment, due to the limitations of the hardware the operating system is running on. Such multitasking is
fairly simple when all tasks are independent from each other. However, when several tasks try to use the same
resource, or when tasks try to share information, it can lead to confusion and inconsistency. The task of concurrent
computing is to solve that problem. Some solutions involve "locks" similar to the locks used in databases, but they
risk causing problems of their own, such as deadlock. Other solutions are non-blocking algorithms. A minimal
lock-based sketch follows.
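The lock-based approach can be illustrated with ordinary threads. Below is a minimal Python sketch (illustrative only): several tasks increment a shared counter, and the mutex makes each read-modify-write atomic; without it, concurrent updates can be lost.

    import threading

    counter = 0
    mutex = threading.Lock()

    def worker():
        global counter
        for _ in range(100_000):
            with mutex:          # remove the lock to observe lost updates
                counter += 1     # read-modify-write, atomic under the lock

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter)   # 400000 with the lock; typically less without it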


