DBMS Intro
Databases
▪ Who uses databases? In short: everyone
– Banking: transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Online retailers: order tracking, customized recommendations
– Manufacturing: production, inventory, orders, supply chain
– Human resources: employee records, salaries, tax deductions
▪ In the early days, database applications were built directly on top of file systems
Drawbacks of Using File Systems to Store Data
▪ Data isolation
– Multiple files and formats
▪ Integrity problems
– Integrity constraints (e.g., account balance > 0) become “buried” in program code
rather than being stated explicitly
– Hard to add new constraints or change existing ones
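As a rough illustration of stating a constraint explicitly rather than burying it in program code, the sketch below declares the balance rule once in the schema and lets the database enforce it. It uses Python's built-in sqlite3 module; the table name, column names, and the helper try_insert are assumptions made for this example only.

```python
import sqlite3

# Hypothetical schema for illustration; names are not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"  # the constraint, stated once
)

def try_insert(account_id, balance):
    """Return True if the row was accepted, False if the CHECK constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO account (id, balance) VALUES (?, ?)",
            (account_id, balance),
        )
        return True
    except sqlite3.IntegrityError:
        return False

accepted = try_insert(1, 50)    # satisfies balance > 0
rejected = try_insert(2, -10)   # violates balance > 0
row_count = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(accepted, rejected, row_count)
```

Because the rule lives in the schema, every program that writes to the table is subject to it, and changing the rule means changing one declaration.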
Drawbacks of Using File Systems to Store Data (Cont.)
▪ Atomicity of updates
– Failures may leave database in an inconsistent state with partial updates carried out
– Example: Transfer of funds from one account to another should either complete or not
happen at all
▪ Security problems
– Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
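The funds-transfer example can be sketched with a transaction, again using Python's sqlite3 module. The schema and the transfer and balances helpers are hypothetical. The credit is applied before the debit on purpose, so that a failed debit would leave a partial update behind if the two statements were not grouped into one transaction.

```python
import sqlite3

# Hypothetical schema for illustration; names are not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither does."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall())

ok = transfer(1, 2, 30)
after_ok = balances()        # {1: 70, 2: 80}
failed = transfer(1, 2, 500)  # debit would violate balance > 0
after_failed = balances()    # unchanged: the credit to account 2 was rolled back
print(ok, after_ok, failed, after_failed)
```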
Why Use a DBMS?
▪ Data independence
▪ Efficient access
▪ Reduced application development time
▪ Uniform data administration
▪ Data integrity and security
▪ Concurrent access
▪ Recovery from crashes
Why Study Databases?
▪ Data is useless without the tools to extract information from the data (queries)
– “Optimal” pricing of an airline ticket
▪ Logical Level: Describes data stored in database, and the data relationships.
▪ View Level: Application programs hide details of data types. Views can also hide
information (such as an employee’s salary) for security purposes.
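A view that hides the salary column might look like the following sketch, written with Python's sqlite3 module. The employee table, its contents, and the view name employee_public are assumptions for illustration. SQLite has no per-user permissions, so this only shows the hiding itself; in a multi-user DBMS, access would typically be granted on the view rather than on the underlying table.

```python
import sqlite3

# Hypothetical data for illustration; names and values are not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Ada", "Engineering", 120000), (2, "Grace", "Operations", 110000)],
)

# The view exposes everything except the salary column.
conn.execute("CREATE VIEW employee_public AS SELECT id, name, dept FROM employee")

cur = conn.execute("SELECT * FROM employee_public ORDER BY id")
view_columns = [col[0] for col in cur.description]
view_rows = cur.fetchall()
print(view_columns)  # ['id', 'name', 'dept']
print(view_rows)
```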
View of Data
A Sample Relational Database
Data Manipulation Language (DML)
▪ Language for accessing and manipulating the data organized by the appropriate data
model
– DML also known as query language
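To make the idea of a DML concrete, here is a small sketch of the four basic operations (insert, retrieve, update, delete) expressed in SQL and run through Python's sqlite3 module. The instructor table and its rows are invented for this example.

```python
import sqlite3

# Hypothetical table and rows for illustration; not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary INTEGER)"
)

# Insert new rows.
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(1, "Kim", "Physics", 90000), (2, "Lee", "Music", 40000), (3, "Wu", "Physics", 95000)],
)

# Retrieve: names of all instructors in one department.
physics = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM instructor WHERE dept = ? ORDER BY name", ("Physics",)
    )
]

# Update existing rows.
conn.execute("UPDATE instructor SET salary = salary + 5000 WHERE dept = ?", ("Music",))

# Delete a row.
conn.execute("DELETE FROM instructor WHERE id = ?", (3,))

remaining = conn.execute("SELECT id, salary FROM instructor ORDER BY id").fetchall()
print(physics)    # ['Kim', 'Wu']
print(remaining)  # [(1, 90000), (2, 45000)]
```

Each statement says what data is wanted, not how to find it on disk; choosing the access method is left to the system.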
▪ Logical Design – Deciding on the database schema. Database design requires that we
find a “good” collection of relation schemas.
– Business decision – What attributes should we record in the database?
– Computer Science decision – What relation schemas should we have and how
should the attributes be distributed among the various relation schemas?
▪ Normalization Theory
– Formalize what designs are bad, and test for them
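One common way to formalize a "bad" design is through functional dependencies: if one attribute always determines another, storing both in every row repeats information. The sketch below is an assumption-laden illustration rather than a full normalization test. The function fd_holds, the sample rows, and the attribute names are all hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if equal values on lhs always imply equal values on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value; a different one breaks the dependency.
        if seen.setdefault(key, val) != val:
            return False
    return True

# A single wide relation: the building is repeated for every instructor in a department.
instructor_dept = [
    {"name": "Kim", "dept": "Physics", "building": "Watson"},
    {"name": "Wu", "dept": "Physics", "building": "Watson"},
    {"name": "Lee", "dept": "Music", "building": "Packard"},
]

dept_determines_building = fd_holds(instructor_dept, ["dept"], ["building"])
dept_determines_name = fd_holds(instructor_dept, ["dept"], ["name"])

# Splitting the relation stores each department's building exactly once.
department = {row["dept"]: row["building"] for row in instructor_dept}
instructor = [(row["name"], row["dept"]) for row in instructor_dept]
print(dept_determines_building, dept_determines_name, department)
```

Here dept determines building but not name, which suggests moving building into a separate department relation.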
▪ Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
▪ Issues:
– Storage access
– File organization
– Indexing and hashing
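The effect of indexing can be observed from the outside by asking the system how it plans to run a query. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. The student table, the index name idx_student_dept, and the helper plan are assumptions for this example, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

# Hypothetical table for illustration; names are not from the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO student (name, dept) VALUES (?, ?)",
    [(f"s{i}", f"d{i % 50}") for i in range(1000)],
)

def plan(sql, params=()):
    """Return the human-readable plan SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the description

query = "SELECT name FROM student WHERE dept = ?"

plan_before = plan(query, ("d7",))  # no index on dept yet: a full table scan
conn.execute("CREATE INDEX idx_student_dept ON student (dept)")
plan_after = plan(query, ("d7",))   # the plan now mentions the index

print(plan_before)
print(plan_after)
```

The query text is identical in both cases; only the storage-level access path changes.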
Query Processing
▪ Cost difference between a good and a bad way of evaluating a query can be enormous
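A toy comparison gives a feel for the size of that gap. The sketch below joins two lists on their first field in two ways, a nested loop and a hash-based lookup, and counts the work each one does. It is an in-memory illustration with invented data and function names, not a description of how any particular DBMS evaluates joins.

```python
def nested_loop_join(left, right):
    """Compare every left row with every right row."""
    comparisons = 0
    out = []
    for l in left:
        for r in right:
            comparisons += 1
            if l[0] == r[0]:
                out.append((l, r))
    return out, comparisons

def hash_join(left, right):
    """Build a lookup table on the right input, then probe it once per left row."""
    table = {}
    for r in right:
        table.setdefault(r[0], []).append(r)
    probes = 0
    out = []
    for l in left:
        probes += 1
        for r in table.get(l[0], []):
            out.append((l, r))
    return out, probes

# Hypothetical inputs: 1000 students and one grade per student.
students = [(i, f"student{i}") for i in range(1000)]
grades = [(i, "A") for i in range(1000)]

slow_result, slow_cost = nested_loop_join(students, grades)
fast_result, fast_cost = hash_join(students, grades)
print(slow_cost, fast_cost)  # 1000000 1000
```

Both strategies return the same rows, but the nested loop performs a thousand times more comparisons on this input, and the gap widens as the inputs grow.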
▪ DBMS vendors
▪ DB application programmers
– E.g. smart webmasters
Database Overall System Architecture
Database Architecture
▪ 1990s:
– Large decision support and data-mining applications
– Large multi-terabyte data warehouses
– Emergence of Web commerce
▪ Early 2000s:
– XML and XQuery standards
– Automated database administration
▪ Later 2000s:
– Giant data storage systems
• Google BigTable, Yahoo PNUTS, Amazon, ...
CYU
▪ Which of these are more suitable for storing in a DBMS rather than files in an OS? Select
all that apply.
a) Historical stock market prices
b) Grades for students at the university
c) Source code for a program
d) Contents of a textbook
CYU
▪ Benefits include recovery from system crashes, concurrent access, quick application
development, data integrity and security