Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
142 views

Chapter 1-Database System Introduction

This document provides an introduction to database systems. It defines key terms like data, information, knowledge, database, database management system, and data hierarchy. It discusses the purpose of databases and importance of database design. It also outlines typical database system environments and what database management systems can do, like defining data structures, loading and manipulating data, processing queries concurrently, and maintaining the database.

Uploaded by

radika thapa
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
142 views

Chapter 1-Database System Introduction

This document provides an introduction to database systems. It defines key terms like data, information, knowledge, database, database management system, and data hierarchy. It discusses the purpose of databases and importance of database design. It also outlines typical database system environments and what database management systems can do, like defining data structures, loading and manipulating data, processing queries concurrently, and maintaining the database.

Uploaded by

radika thapa
Copyright
© © All Rights Reserved
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 58

Introduction to

Database Systems
"The numbers have no way of speaking for themselves. We speak for them.
We imbue them with meaning." —Statistician Nate Silver in the book 
The Signal and the Noise

Compiled and presented by:


Subarna Shrestha.
Lectures

Lecture slides in ppt format will be posted shortly before or after


the lecture
are to complement the lectures
Many issues discussed in the lectures will be covered in the
exams
hence try to attend lectures regularly
Will not cover ALL materials on the slides
attending lectures will tell you which is covered and which is
not
Terminologies
Data are plain facts. Think of data as a "raw material“- it needs to be
processed before it can be turned into something useful.
Data can be any character, text, word, number, and, if not put into
context, means little or nothing to a human. 

Information is data that has been processed in such a way as to be


meaningful to the person who receives it.
Characteristics of information are Timely, Accuracy, Completeness .

Knowledge
Human mind purposefully organized the information and evaluate it to
produce knowledge. In other words the ability of the person recalls or
uses his information and experience is known as knowledge.
For example,
"386" is data,
"your marks are 386" is information,
And
"It is result of your hard work" is knowledge.
Terminologies cont…
Data, Information and Knowledge
Terminologies cont…
Database(A large collection of data.)
•is collection of related data and its metadata organized in a
structured format for optimized information management
•Examples: databases of customers, products,...
•A database usually models (some part of) a real- world enterprise.
Entities (e.g., students, courses)
Relationships (e.g., John Doe is taking DBS)

Database Management System (DBMS)


•is a software that enables easy creation, access, and modification of
databases for efficient and effective database management
•Many vendors: IBM, Sybase, Oracle, Microsoft, etc

Database System
is an integrated system of hardware, software, people, procedures,
and data that define and regulate the collection, storage,
management, and use of data within a database environment
Data Hierarchy- the systematic organization of data
Data Hierarchy-cont…
Database System versus File System
DBMS File Processing System
Minimal data redundancy problem in
Data Redundancy problem exits
DBMS
Data Inconsistency does not exist Data Inconsistency exist here
Accessing database is easier Accessing is comparatively difficult
Data is scattered in various files and files
The problem of data isolation is not
may be of different format, so data
found in database
isolation problem exists
Transactions like insert, delete, view, In file system, transactions are not
updating, etc are possible in database possible
Concurrent access and recovery is Concurrent access and recovery is not
possible in database possible
Security of data Security of data is not good
A database manager (administrator) A file manager is used to store all
stores the relationship in form of relationships in directories in file
Why learn about databases?

 It used to be about boring things: employee records, bank records,


etc.
 Today, the field covers all aspects of working with data:
 Web search

 Data mining

 Scientific and medical databases

 Integrating information

 Databases are behind almost everything you do on the Web


 Google searches

 Queries at Amazon, eBay, etc.

 Trip planning (expedia etc)


What is Data Management?
• Find data (search and query)
 Query: Find the average enrollment in database courses at

BICTE?
 How could we find this using a conventional
search within file system?
 Do we get what we want?

 Why is this hard?

 How could we find this using a Database


Management System (DBMS)?
• Update or modify data
• Ensure data consistency
• Protect data
 from unauthorized access (access control)

 from failures (recovery)

 from other programs or users (concurrency control)


Database Management System

- manages interaction between end users and database

Database Systems: Design, Implementation, & Management: Rob & Coronel


Database System Environment

 Hardware
 Software
- OS
- DBMS
- Applications

 People
 Procedures
 Data

Database Systems: Design, Implementation, & Management: Rob & Coronel


Simplified database system environment
Database: Why
Purpose of Database
•Optimizes data management
•Transforms data into information
Importance of Database Design
•Defines the database’s expected use
-different approach needed for different types of databases
•Avoid data redundancy & ensure data integrity
-data is accurate and verifiable
•Poorly designed database generates errors
-leads to bad decisions
-can lead to failure of organization
Functions of DBMS/Database System
•Stores data and related data entry forms, report definitions, etc.
•Hides the complexities of relational database model from the user
-facilitates the construction/definition of data elements and their
relationships
-enables data transformation and presentation
•Enforces data integrity
•Implements data security management
-access, privacy, backup & restoration
Database: How
•Planning & Analysis
-Assess
•Goal of the organization
•Database environment
•existing hardware, software, raw data, data processing
procedures
-Identify
Database needs
•what database can do to further the goal of the organization
User needs and characteristics
•who the users are, what they want to do, how they envision
doing it
Database system requirements
•what the database system should do to satisfy the database
and user needs
•Design
From conceptual design to a detailed system specification
•Implementation
Create the database
•Maintenance
Troubleshoot, update, streamline the database
Database: User-centered
Perspective
The user is always right. If there is a problem with the use of the
system, the system is the problem, not the user.

Compliance
The user has the right to a system that performs exactly as promised.

Instruction
The user has the right to easy-to-use instructions (user guides, online
or contextual help, error messages) for understanding and utilizing a
system to achieve desired goals and recover efficiently and gracefully
from problem situations.

Usability
The user should be the master of software and hardware technology,
not vice-versa. Products should be natural and intuitive to use.
Database: Data Models

Importance
Abstraction of complex real-word data structures in relative simple
(graphical) representations
Facilitate interaction among the designer, the applications
programmer, and the end user

Basic Building Blocks


Entity
thing about which data are to be collected and stored
Attribute
a characteristic of an entity
Relationship
describes an association among entities
Constraint
restrictions placed on the data
Typical DBMS Functionality
 Define a particular database in terms of its data types, structures,
and constraints
 Construct or Load the initial database contents on a secondary
storage device
 Manipulating the database:
 Retrieval: Querying, generating reports

 Modification: Insertions, deletions and updates to its content

 Accessing the database through Web applications

 Processing and Sharing by a set of concurrent users and application


programs – yet, keeping all data valid and consistent
 Other features:
 Protection or Security measures to prevent unauthorized access
 Presentation and Visualization of data
 Maintaining the database and associated programs over the
lifetime of the database application
What Can a DBMS Do for Us?
•Data Definition Language – DDL
A syntax similar to a computer programming language for defining data
structures, especially database schemas.
•Data Manipulation Language – DML
Query Language
•Storage management
•Transaction Management
concurrency control
Recovery

Automate a lot of boring/mundane operations on data


so that we don’t have to program over and over
so that we can write complex data manipulations in just a few lines, so that we can
concentrate on app logics
Make execution very fast
so that it scales up to very large data sets
Make concurrent access/modification possible
so that many users can use the data at the same time

Think buying a plane ticket! Can you do it without a DBMS?


Why Use a DBMS?
 Self-describing nature of a database system:

 A DBMS catalog stores the description of a particular database

(e.g. data structures, types, and constraints)


 The description is called meta-data.

 Data independence


You don’t need to know the implementation of the database to
access the data

Applications insulated from how data is structured and stored
 change the order of tuples

 add or modify other columns

 add or modify indexes

Note that query does not change when physical structure changes
One of the most important benefits of using a DBMS
Why Use a DBMS? Cont…

 Efficient access
 queries are optimized.
 Reduced application development time
 Queries can be expressed declaratively, we do not need to
indicate how to execute them
 Data integrity and security
 Some constraints on the data are enforced
automatically.
 Data Consistency

 Data Constraints:
 All students must have a student ID (sID)

 No two students can have the same sID (uniqueness)

 A student may only have one grade per course

 Etc.
Why Use a DBMS? Cont…
 Concurrent access, recovery from crashes
 Many users can access/update the database at the same

time without any interference.


 Concurrent execution of user programs is essential for good

DBMS performance.
 Because disk accesses are frequent, and relatively slow

 Interleaving actions of different user programs can lead to

inconsistency:
 A cheque is cleared while account balance is being

computed.
 DBMS ensures that such problems do not arise: users can

behave as if they were using a single- user system.


 Speed – even when the data is huge, i.e.

 IRS: 150 TB (1 TB ≈ 1012 B)

 Yahoo: 2 PB (1 PB ≈ 1015 B)

 National Energy Research Scientific Computing Center (USA):


3.5 PB
Why Use a DBMS? Cont…
Data Abstraction
•Many views,
•single logical schema and physical schema.
•Views (external schemas) describe how users see the data.
•Logical schema defines logical structure.
•Physical schema describes the files and indexes used.

Example: University Database View 1 View 2 View 3


 Conceptual schema:

Logical Schema
Student(sid: string, name: string, login: string, age: integer, gpa:real)
 Course(cid: string, cname:string, credits:integer)
 Enrolled(sid:string, cid:string, grade:string) Physical Schema
• describes data in terms of the data model of the DBMS
 Physical schema:
 Relations stored as unordered files.
 Index on first column of Students.
 External Schema (View):
 Course_info(cid:string,enrollment:integer)
Describing Data: What is a Data Model?
 Mathematical representation of data
 relational model = tables;

 semistructured model = trees/graphs.

AND
 Operations on data
 Constraints

 A schema is a description of a particular collection of data, using a


given data model.
 The relational model of data is the most widely used model today.
 Main concept: relation, basically a table with rows and columns.
 Every relation has a schema, which describes the columns, or
fields.
Example Instance of Student Relation

sID Name Login Age GPA

53666 Jones Jones@cs 20 3.2

45453 Smith Smith@ai 19 3.1

 Columns are attributes


 Rows are tuples
The SQL Query Language
 Find all students who are 20 years old

SELECT *
FROM Students WHERE age = 20

sID Name Login Age GPA


53666 Jones Jones@cs 20 3.2
Database Users
 Three groups:
 End users
 Database administrators
 Database developers
Historical Development of Database
Technology
 Early Database Applications:
 The Hierarchical and Network Models were introduced
in the mid 1960s and dominated during the 1970s.
 Relational Model based Systems:
 Relational model was originally introduced in 1969
(40th anniversary is celebrated this year!) by E.F. Codd
at IBM.
 Relational DBMS Products emerged in the early
1980s.
Historical Development of Database
Technology (cont'd)
 Object-oriented and emerging applications:
 Object-Oriented Database Management Systems
(OODBMSs) were introduced in the late 1980s and
early 1990s to cater to the need of complex data
processing - but failed to take off.

 Many relational DBMSs have incorporated object


database concepts, leading to a new category called
object-relational DBMSs (ORDBMSs)
 Extended relational systems add further capabilities
(e.g. for multimedia data, XML, and other data types)
Historical Development of Database
Technology (cont'd)
 Data on the Web and E-commerce Applications:
 Web contains data in HTML with links among
pages.
 This has given rise to a new set of applications
and E-commerce is using new standards like XML
(eXtended Markup Language).
 Script programming languages such as PHP and
JavaScript allow generation of dynamic Web
pages that are partially generated from a
database
 Also allow database updates through Web pages
Evolution of Data Models
Timeline

1960s 1970s 1980s 1990s 2000+

File-based

Hierarchical
Object-oriented
Network

Relational Web-based
Entity-Relationship
Database: Historical Roots

Manual File System


to keep track of data
used tagged file folders in a filing cabinet
organized according to expected use
e.g. file per customer
easy to create, but hard to
locate data
aggregate/summarize data

Computerized File System


to accommodate the data growth and information need
manual file system structures were duplicated in the computer
Data Processing (DP) specialists wrote customized programs to
write, delete, update data (i.e. management)
extract and present data in various formats (i.e. report)
File System: Example

Database Systems: Design, Implementation, & Management: Rob & Coronel


File System: Weakness

Weakness
“Islands of data” in scattered file systems.
Problems
Duplication
same data may be stored in multiple files
Inconsistency
same data may be stored by different names in different format
Rigidity
requires customized programming to implement any changes
cannot do ad-hoc queries

Implications
Waste of space
Data inaccuracies
High overhead of data manipulation and maintenance
File System: Problem Case

CUSTOMER file AGENT file SALES file

A_Name (15 char) A_Name (20 char) AGENT (20 char)

Carol Johnson Carol T. Johnson Carol J. Smith

- inconsistent field name, field size


- inconsistent data values
- data duplication
Database System vs. File System

Database Systems: Design, Implementation, & Management: Rob & Coronel


Hierarchical Database

Background
Developed to manage large amount of data for complex manufacturing
projects
e.g., Information Management System (IMS)
IBM-Rockwell joint venture
clustered related data together
hierarchically associated data clusters using pointers

Hierarchical Database Model


Assumes data relationships are hierarchical
One-to-Many (1:M) relationships
Each parent can have many children
Each child has only one parent
Logically represented by an upside down tree
Hierarchical Database: Example

Database Systems: Design, Implementation, & Management: Rob & Coronel


Hierarchical Database: Pros & Cons
Advantages
Conceptual simplicity
groups of data could be related to each other
related data could be viewed together
Centralization of data
reduced redundancy and promoted consistency

Disadvantages
Limited representation of data relationships
did not allow Many-to-Many (M:N) relations
Complex implementation
required in-depth knowledge of physical data storage
Structural Dependence
data access requires physical storage path
Lack of Standards
limited portability
Network Database

Objectives
Represent more complex data relationships
Improve database performance
Impose a database standard

Network Database Model


Similar to Hierarchical Model
Records linked by pointers
Composed of sets
Each set consists of owner (parent) and member (child)
Many-to-Many (M:N) relationships representation
Each owner can have multiple members (1:M)
A member may have several owners
Network Database: Example

Database Systems: Design, Implementation, & Management: Rob & Coronel


Network Database: Pros & Cons

Advantages
More data relationship types
More efficient and flexible data access
“network” vs. “tree” path traversal
Conformance to standards
enhanced database administration and portability

Disadvantages
System complexity
require familiarity with the internal structure for data access
Lack of structural independence
small structural changes require significant program changes
Relational Database

Problems with legacy database systems


Required excessive effort to maintain
Data manipulation (programs) too dependent on physical file structure
Hard to manipulate by end-users
No capacity for ad-hoc query (must rely on DB programmers).

Evolution in Data Organization


E. F. Codd’s Relational Model proposal
Separated the notion of physical representation (machine-view)
from logical representation (human-view)
Considered ingenious but computationally impractical in 1970

Relational Database Model


Dominant database model of today
Eliminated pointers and used tables to represent data
Tables
flexible logical structure for data representation
a series of row/column intersections
related by sharing common entity characteristic(s)
Relational Database: Example
 Provides a logical “human-level” view of the data and associations among groups of data
(i.e., tables)

Customer_ID Customer_Account Agent_ID


1224 4556 23
1225 4558 25

Agent_ID Last_Name First_Name Phone


23 Sturm David 334-5678
25 Long Kyle 556-3421

Customer_ID Last_Name First_Name Phone Account_Balance


1224 Vira Dyne 678-9987 1223.95
1225 Davies Tricia 556-3342 234.25
Relational Database: Pros & Cons

Advantages
Structural independence
Separation of database design and physical data storage/access
Easier database design, implementation, management, and use
Ad hoc query capability with Structured Query Language (SQL)
SQL translates user queries to codes

Disadvantages
Substantial hardware and system software overhead
more complex system
Poor design and implementation is made easy
ease-of-use allows careless use of RDBMS
Entity Relationship Model

Peter Chen’s Landmark Paper in 1976


“The Relationship Model: Toward a Unified View of Data”
Graphical representation of entities and their relationships

Entity Relationship (ER) Model

Based on Entity, Attributes & Relationships


Entity is a thing about which data are to be collected and stored
e.g. EMPLOYEE
Attributes are characteristics of the entity
e.g. SSN, last name, first name
Relationships describe an associations between entities
i.e. 1:M, M:N, 1:1

Complements the relational data model concepts


Helps to visualize structure and content of data groups
entity is mapped to a relational table
Tool for conceptual data modeling (higher level representation)

Represented in an Entity Relationship Diagram (ERD)


Formalizes a way to describe relationships between groups of data
E-R Diagram: Chen Model

Entity
represented by a rectangle with its name in capital
letters.

Relationships
represented by an active or passive verb inside the
diamond that connects the related entities.

Connectivities
i.e., types of relationship
written next to each entity box.

Database Systems: Design, Implementation, & Management: Rob & Coronel


E-R Diagram: Crow’s Foot Model

Entity
represented by a rectangle with its name in
capital letters.

Relationships
represented by an active or passive verb
that connects the related entities.

Connectivities
indicated by symbols next to entities.
2 vertical lines for 1
“crow’s foot” for M

Database Systems: Design, Implementation, & Management: Rob & Coronel


E-R Model: Pros & Cons

Advantages

Exceptional conceptual simplicity


easily viewed and understood representation of database
facilitates database design and management

Integration with the relational database model


enables better database design via conceptual modeling

Disadvantages
Incomplete model on its own
Limited representational power
cannot model data constraints not tied to entity relationships
e.g. attribute constraints
cannot represent relationships between attributes within entities
No data manipulation language (e.g. SQL)

Loss of information content


Hard to include attributes in ERD
Object-Oriented Database

Semantic Data Model (SDM)


Modeled both data and their relationships in a single structure (object)
Developed by Hammer & McLeod in 1981

Object-oriented concepts became popular in 1990s


Modularity facilitated program reuse and construction of complex structures
Ability to handle complex data types (e.g. multimedia data)
Object-Oriented Database Model (OODBM)
Maintains the advantages of the ER model but adds more features
Object = entity + relationships (between & within entity)
consists of attributes & methods
attributes describe properties of an object
methods are all relevant operations that can be performed on an object
self-contained abstraction of real-world entity
Class = collection of similar objects with shared attributes and methods
e.g. EMPLOYEE class = (employ1 object, employ2 object, …)
organized in a class hierarchy
e.g. PERSON > EMPLOYEE, CUSTOMER
Incorporates the notion of inheritance
attributes and methods of a class are inherited by its descendent classes
OO Database Model vs. E-R Model
OODBM:
- can accommodate relationships within a object
- objects to be used as building blocks for autonomous structures

Database Systems: Design, Implementation, & Management: Rob & Coronel


Object-Oriented Database: Pros &
Cons
Advantages
Semantic representation of data
fuller and more meaningful description of data via object
Modularity, reusability, inheritance
Ability to handle
complex data
sophisticated information requirements

Disadvantages
Lack of standards
no standard data access method
Complex navigational data access
class hierarchy traversal
Steep learning curve
difficult to design and implement properly
More system-oriented than user-centered
High system overhead
slow transactions
Web Database

Internet is emerging as a prime business tool


Shift away from models (e.g. relational vs. O-O)
Emphasis on interfacing with the Internet

Characteristics of “Internet age” databases


Flexible, efficient, and secure Internet access
Support for complex data types & relationships
Seamless interfaces with multiple data sources and structures
Ease of use for end-user, database architect, and database administrator
Simplicity of conceptual database model
Many database design, implementation, and application development tools
Powerful DBMS GUI
In 2-tier architecture
application program directly interacts with the database. 
Advantages of 2-tier Architecture
•Easy to understand as it directly communicates with the database.
•Requested data can be retrieved very quickly, when there is less number of users.
•Easy to modify – any changes required, directly requests can be sent to database
•Easy to maintain – When there are multiple requests, it will be handled in a queue
and there will not be any chaos.
Disadvantages of 2-tier architecture:
•It would be time consuming, when there is huge number of users. All the requests will
be queued and handed one after another. Hence it will not respond to multiple users
at the same time.
•This architecture would little cost effective.
3-tier Architecture
3-tier architecture is the most widely used database architecture. It can be viewed as below.
Presentation layer / User layer is the layer where user uses the database. He does not have
any knowledge about underlying database. He simply interacts with the database as though
he has all data in front of him. You can imagine this layer as a registration form where you will
be inputting your details.  Did you ever guessed, after pressing ‘submit’ button where the data
goes? No right? You just know that your details are saved. This is the presentation layer where
all the details from the user are taken, sent to the next layer for processing.
Application layer is the underlying program which is responsible for saving the details that you
have entered, and retrieving your details to show up in the page. This layer has all the business
logics like validation, calculations and manipulations of data, and then sends the requests to
database to get the actual data. If this layer sees that the request is invalid, it sends back the
message to presentation layer.  It will not hit the database layer at all.
Data layer or Database layer is the layer where actual database resides. In this layer, all the
tables, their mappings and the actual data present. When you save you details from the front
end, it will be inserted into the respective tables in the database layer, by using the programs
in the application layer. When you want to view your details in the web browser, a request is
sent to database layer by application layer. The database layer fires queries and gets the data.
These data are then transferred to the browser (presentation layer) by the programs in the
application layer.
Advantages of 3-tier architecture:
•Easy to maintain and modify. Any changes requested will not affect any other data in the
database. Application layer will do all the validations.
•Improved security. Since there is no direct access to the database, data security is increased.
There is no fear of mishandling the data. Application layer filters out all the malicious actions.
•Good performance. Since this architecture cache the data once retrieved, there is no need to
hit the database for each request. This reduces the time consumed for multiple requests and
hence enables the system to respond at the same time.
Disadvantages 3-tier Architecture
•it is little more complex and little more effort is required in terms of hitting the database.
Extending Database Capabilities
 New functionality is being added to DBMSs in
the following areas:
 Scientific Applications
 XML (eXtensible Markup Language)
 Image Storage and Management
 Audio and Video Data Management
 Data Warehousing and Data Mining
 Spatial Data Management
 Time Series and Historical Data Management
 Interoperability, integrating data from different sources

You might also like