Dbms Intro


Chapter 1: Introduction to Databases

Slides adapted from Database System Concepts – 6th Edition


© Silberschatz, Korth and Sudarshan
What is a DBMS?

▪ DBMS = Database Management System

▪ Database: A large integrated collection of data.

▪ DBMS contains information about a particular enterprise


– Collection of interrelated data
– Set of programs to access the data
– An environment that is both convenient and efficient to use
Who Uses a DBMS?

▪ In short: everyone
– Banking: transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions

▪ How many databases have you used so far today?


University Database Example

▪ Application program examples


– Add new students, instructors, and courses
– Register students for courses, and generate class rosters
– Assign grades to students, compute grade point averages (GPA) and generate
transcripts

▪ In the early days, database applications were built directly on top of file systems
Drawbacks of Using File Systems to Store Data

▪ Data redundancy and inconsistency


– Multiple file formats, duplication of information in different files

▪ Difficulty in accessing data


– Need to write a new program to carry out each new task

▪ Data isolation
– Multiple files and formats

▪ Integrity problems
– Integrity constraints (e.g., account balance > 0) become “buried” in program code
rather than being stated explicitly
– Hard to add new constraints or change existing ones
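A minimal sketch of the alternative a DBMS offers: the constraint is stated once in the schema and enforced by the system, not buried in each program. The account table, its column names, and the sample rows are assumptions made for illustration, using Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical account table: the rule "balance > 0" lives in the schema.
conn.execute("""
    create table account (
        account_number text primary key,
        balance numeric not null check (balance > 0)
    )
""")

def try_insert(number, balance):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("insert into account values (?, ?)", (number, balance))
        return True
    except sqlite3.IntegrityError:
        return False

positive_ok = try_insert("A-101", 500)   # satisfies the constraint
negative_ok = try_insert("A-102", -40)   # violates balance > 0
row_count = conn.execute("select count(*) from account").fetchone()[0]
```

Changing the rule then means altering one table definition instead of hunting through every program that writes account data.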
Drawbacks of Using File Systems to Store Data (Cont.)

▪ Atomicity of updates
– Failures may leave database in an inconsistent state with partial updates carried out
– Example: Transfer of funds from one account to another should either complete or not
happen at all

▪ Concurrent access by multiple users


– Concurrent access needed for performance
– Uncontrolled concurrent accesses can lead to inconsistencies

▪ Security problems
– Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
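As one illustration of the atomicity point, here is a sketch of a funds transfer that either completes or leaves no trace. The table, account names, and amounts are hypothetical; the example relies on the sqlite3 connection's context manager, which commits on success and rolls back when an exception is raised.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (id text primary key, "
    "balance integer check (balance >= 0))"
)
conn.executemany("insert into account values (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commit on success, roll back on any exception
            conn.execute(
                "update account set balance = balance + ? where id = ?",
                (amount, dst),
            )
            conn.execute(
                "update account set balance = balance - ? where id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("select id, balance from account order by id"))

small_ok = transfer("A", "B", 30)    # succeeds
large_ok = transfer("A", "B", 500)   # debit would overdraw A, so the credit is undone too
final = balances()
```

In the failing transfer the credit to B has already been applied when the debit is rejected; the rollback removes it, so no partial update survives.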
Why Use a DBMS?

▪ Data independence
▪ Efficient access
▪ Reduced application development time
▪ Uniform data administration
▪ Data integrity and security
▪ Concurrent access
▪ Recovery from crashes
Why Study Databases?

▪ Data is useless without the tools to extract information from the data (queries)
– “Optimal” pricing of an airline ticket

▪ Datasets are increasing in diversity and volume.


– Websites, digital libraries, interactive video, Human Genome project, mobile
applications
– Need for DBMS is exploding

▪ Databases touch most of CS


– OS, languages, theory, AI, multimedia, logic, …
Levels of Abstraction

▪ Physical Level: Describes how a record (e.g., student) is stored.

▪ Logical Level: Describes the data stored in the database and the relationships among those data.

type instructor = record
    ID: string;
    name: string;
    dept_name: string;
    salary: integer;
end;

▪ View Level: Application programs hide details of data types. Views can also hide
information (such as an employee’s salary) for security purposes.
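A small sketch of the view level hiding information: a view exposes every instructor column except salary. The view name instructor_public and the sample row are assumptions for illustration; the table layout follows the instructor record above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor (ID char(5), name varchar(20), "
    "dept_name varchar(20), salary numeric(8,2))"
)
# Illustrative sample row.
conn.execute("insert into instructor values ('22222', 'Einstein', 'Physics', 95000)")

# Hypothetical view for users who must not see salaries.
conn.execute(
    "create view instructor_public as "
    "select ID, name, dept_name from instructor"
)

cur = conn.execute("select * from instructor_public")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
```

A program that queries only instructor_public never receives the salary column, whatever the logical schema underneath contains.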
View of Data

▪ Physical schema describes the files and indexes used.
▪ Logical schema defines the logical structure.
▪ External schemas (views) describe how users see the data.
▪ Many external schemas, 1 conceptual (logical) schema, and 1 physical schema.

An architecture for a database system


Instances and Schemas

▪ Similar to types and variables in programming languages

▪ Schema: The logical structure of the database


– Example: The database consists of information about a set of students and
instructors and the relationship between them
– Analogous to type information of a variable in a program
– Physical schema: Database design at the physical level
– Logical schema: Database design at the logical level

▪ Instance: The actual content of the database at a particular point in time


– Analogous to the value of a variable
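The schema/instance distinction can be sketched as follows: inserting a row changes the instance while the schema stays the same. The student table and the sample row are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table student (ID text, name text)")

def schema():
    # pragma table_info yields one row per column; index 1 is the column name
    # and index 2 is its declared type.
    return [(row[1], row[2]) for row in conn.execute("pragma table_info(student)")]

def instance():
    # The rows present right now.
    return conn.execute("select ID, name from student order by ID").fetchall()

schema_before, instance_before = schema(), instance()
conn.execute("insert into student values ('00128', 'Zhang')")
schema_after, instance_after = schema(), instance()
```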
Data Models

▪ A collection of tools for describing


– Data
– Data relationships
– Data semantics
– Data constraints
▪ Entity-Relationship data model (mainly for database design)
▪ Different data models
– Relational model
– Object-based data models (Object-oriented and Object-relational)
– Semi-structured data model (XML)
– Network model
– Hierarchical model
Relational Model

▪ Relational model (Chapter 2)

▪ Example of tabular data in the relational model

[Figure: a table with its columns and rows labeled]
A Sample Relational Database
Data Manipulation Language (DML)

▪ Language for accessing and manipulating the data organized by the appropriate data
model
– DML also known as query language

▪ Two classes of languages


– Procedural – user specifies what data is required and how to get those data
– Declarative (non-procedural) – user specifies what data is required without
specifying how to get those data

▪ SQL is the most widely used query language
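To make the two classes concrete, the sketch below answers the same question twice: once procedurally, spelling out how to walk the records, and once declaratively, stating only what is wanted and leaving the how to the DBMS. The sample rows and function names are assumptions for illustration.

```python
import sqlite3

# Illustrative sample data: (ID, name, dept_name).
rows = [
    ("10101", "Srinivasan", "Comp. Sci."),
    ("22222", "Einstein", "Physics"),
    ("33456", "Gold", "Physics"),
]

def physics_names_procedural(records):
    """Procedural: the caller spells out the loop, the test, and the sort."""
    result = []
    for _id, name, dept in records:
        if dept == "Physics":
            result.append(name)
    return sorted(result)

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text)")
conn.executemany("insert into instructor values (?, ?, ?)", rows)

def physics_names_declarative():
    """Declarative: the query says what is wanted; the DBMS picks the plan."""
    cur = conn.execute(
        "select name from instructor where dept_name = 'Physics' order by name"
    )
    return [r[0] for r in cur]

procedural_result = physics_names_procedural(rows)
declarative_result = physics_names_declarative()
```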


Data Definition Language (DDL)

▪ Specification notation for defining the database schema


Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))

▪ DDL compiler generates a set of table templates stored in a data dictionary


▪ Data dictionary contains metadata (i.e., data about data)
– Database schema
– Integrity constraints: Primary key, referential integrity
– Authorization
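As a rough illustration of a data dictionary, SQLite keeps its schema metadata in a catalog table named sqlite_master that can be queried like any other table. This sketch runs the create table statement above and reads back what the system recorded; other database systems expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table instructor (
        ID char(5),
        name varchar(20),
        dept_name varchar(20),
        salary numeric(8,2))
""")

# sqlite_master is SQLite's catalog: one row per table, index, view, or trigger.
catalog = conn.execute(
    "select type, name, sql from sqlite_master where name = 'instructor'"
).fetchall()

object_type, object_name, definition = catalog[0]
```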
SQL

▪ SQL: A widely used non-procedural language

Example: Find the ID and building of instructors in the Physics dept.


select instructor.ID, department.building
from instructor, department
where instructor.dept_name = department.dept_name and
department.dept_name = 'Physics'

▪ Application programs generally access databases through one of


– Language extensions to allow embedded SQL
– Application program interfaces (e.g., ODBC/JDBC), which allow SQL queries to be sent
to a database
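A sketch of the API route, using Python's sqlite3 module in place of ODBC/JDBC: the program sends the Physics query above as text and receives rows back. The department table layout, the sample rows, and the function name id_and_building are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table department (dept_name varchar(20), building varchar(15));
    create table instructor (ID char(5), name varchar(20),
                             dept_name varchar(20), salary numeric(8,2));
    insert into department values ('Physics', 'Watson'), ('Comp. Sci.', 'Taylor');
    insert into instructor values ('22222', 'Einstein', 'Physics', 95000),
                                  ('10101', 'Srinivasan', 'Comp. Sci.', 65000);
""")

def id_and_building(dept):
    """Send the query to the database, passing the department as a parameter."""
    query = """
        select instructor.ID, department.building
        from instructor, department
        where instructor.dept_name = department.dept_name and
              department.dept_name = ?
    """
    return conn.execute(query, (dept,)).fetchall()

physics = id_and_building("Physics")
```

Passing the department name as a parameter rather than pasting it into the query string keeps the SQL text fixed and avoids quoting mistakes.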
Database Design

The process of designing the general structure of the database:

▪ Logical Design – Deciding on the database schema. Database design requires that we
find a “good” collection of relation schemas.
– Business decision – What attributes should we record in the database?
– Computer Science decision – What relation schemas should we have and how
should the attributes be distributed among the various relation schemas?

▪ Physical Design – Deciding on the physical layout of the database


Database Design?

▪ Is there any problem with this design?


Design Approaches

▪ Normalization Theory
– Formalize what designs are bad, and test for them

▪ Entity Relationship Model


– Models an enterprise as a collection of entities and relationships
• Entity: A “thing” or “object” in the enterprise that is distinguishable from other
objects
– Described by a set of attributes
• Relationship: An association among several entities
– Represented diagrammatically by an entity-relationship diagram
The Entity-Relationship Model

▪ Entity Relationship Model


– Models an enterprise as a collection of entities and relationships
• Entity: A “thing” or “object” in the enterprise that is distinguishable from other objects
– Described by a set of attributes
• Relationship: An association among several entities
– Represented diagrammatically by an entity-relationship diagram
Storage Management

▪ Storage manager is a program module that provides the interface between the low-
level data stored in the database and the application programs and queries submitted to
the system.

▪ The storage manager is responsible for the following tasks:


– Interaction with the file manager
– Efficient storing, retrieving and updating of data

▪ Issues:
– Storage access
– File organization
– Indexing and hashing
Query Processing

1. Parsing and translation


2. Optimization
3. Evaluation
Query Processing (Cont.)

▪ Alternative ways of evaluating a given query


– Equivalent expressions
– Different algorithms for each operation

▪ Cost difference between a good and a bad way of evaluating a query can be enormous

▪ Need to estimate the cost of operations


– Depends critically on statistical information about relations which the database must
maintain
– Need to estimate statistics for intermediate results to compute cost of complex
expressions
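One way to observe alternative evaluation strategies is to ask the system for its plan. In this sketch the same query is planned before and after an index exists, using SQLite's explain query plan statement. The index name idx_instructor_dept is an assumption, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text)")

def plan(sql):
    # The last column of each "explain query plan" row is a readable description.
    rows = conn.execute("explain query plan " + sql)
    return " ".join(row[-1] for row in rows)

query = "select name from instructor where dept_name = 'Physics'"

plan_without_index = plan(query)   # no index yet, so the table must be scanned
conn.execute("create index idx_instructor_dept on instructor (dept_name)")
plan_with_index = plan(query)      # the optimizer can now search via the index
```

The query text is identical in both cases; only the available access paths changed, and the optimizer chose between them.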
Transaction Management

▪ What if the system fails?


▪ What if more than one user is concurrently updating the same data?

▪ A transaction is a collection of operations that performs a single logical function in a
database application

▪ Transaction-management component ensures that the database remains in a
consistent (correct) state despite system failures (e.g., power failures and operating
system crashes) and transaction failures.

▪ Concurrency-control manager controls the interaction among the concurrent
transactions, to ensure the consistency of the database.
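Why uncontrolled interleaving matters can be sketched with a toy simulation; it is not how a real concurrency-control manager works. Two hypothetical transactions, T1 (deposit 50) and T2 (withdraw 30), each read a shared balance and write back an updated value. Run one after the other, both changes survive; interleaved, one update is lost.

```python
def run(schedule, balance=100):
    """Execute (transaction, action) steps in the given order.

    Each transaction reads the shared balance into a private copy,
    then writes back that copy plus its own amount.
    """
    amounts = {"T1": 50, "T2": -30}  # hypothetical deposit and withdrawal
    local = {}
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "write":
            balance = local[txn] + amounts[txn]
    return balance

serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

serial_result = run(serial)            # both updates applied
interleaved_result = run(interleaved)  # T2 overwrites T1's deposit
```

In the interleaved schedule T2 works from a balance it read before T1 wrote, so T1's deposit disappears. Preventing this kind of anomaly is the job of concurrency control.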
Lots of People use DBMS ...

▪ DBMS vendors

▪ DB application programmers
– E.g. smart webmasters

▪ Database administrator (DBA)


– Designs logical /physical schemas
– Handles security and authorization
– Data availability, crash recovery
– Database tuning as needs evolve
Must understand how a DBMS works!
Database Users and Administrators

Database Overall System Architecture
Database Architecture

▪ The architecture of a database system is greatly influenced by the underlying computer
system on which the database is running:
– Centralized
– Client-server
– Parallel (multi-processor)
– Distributed
History of Database Systems
▪ 1950s and early 1960s:
– Data processing using magnetic tapes for storage
• Tapes provided only sequential access
– Punched cards for input

▪ Late 1960s and 1970s:


– Hard disks allowed direct access to data
– Network and hierarchical data models in widespread use
– Ted Codd defines the relational data model
• Would win the ACM Turing Award for this work
• IBM Research begins System R prototype
• UC Berkeley begins Ingres prototype
– High-performance (for the era) transaction processing
History of Database Systems (cont.)
▪ 1980s:
– Research relational prototypes evolve into commercial systems
• SQL becomes an industry standard
– Parallel and distributed database systems
– Object-oriented database systems

▪ 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce

▪ Early 2000s:
– XML and XQuery standards
– Automated database administration

▪ Later 2000s:
– Giant data storage systems
• Google BigTable, Yahoo PNUTS, Amazon, ...
CYU

▪ Which of these are more suitable for storing in a DBMS rather than files in an OS? Select
all that apply.
a) Historical stock market prices
b) Grades for students at the university
c) Source code for a program
d) Contents of a textbook
CYU

▪ When is the relational model appropriate for representing data?


a) When the data can be expressed in the form of tables
b) For text files
c) For representing object-oriented models with inheritance, etc.
Summary

▪ A DBMS is used to maintain and query large datasets

▪ Benefits include recovery from system crashes, concurrent access, quick application
development, data integrity and security

▪ Levels of abstraction give data independence

▪ DBAs hold responsible, interesting, well-paid jobs

▪ DBMS R&D is one of the most exciting areas in CS
