
DB_notes


The Worlds of

Database Systems
CHAPTER 1
1.1 The Evolution of Database
Systems
 What is a database system?
 What are its components?
DATABASE SYSTEM
A database system is an application or platform used to create, manage,
and organize data. It allows for efficient storage, retrieval, and
manipulation of data in a structured manner.
 COMPONENTS:
1. Database
2. DBMS
3. Users
4. Query Language
 Database:
A database is an organized collection of data that is stored
and managed to allow for easy access, retrieval, and
manipulation.
 DBMS:
The software that manages the database, allowing users
to interact with and manipulate the data.
 Users:
The individuals or applications that interact with the
database and DBMS.
 Query Language:
A language like SQL (Structured Query Language) used to
interact with the database.
Features of DBMS (Database
management system):
 The key features of a DBMS include:
1. Allow users to create databases and define their logical
structure using a data-definition language.
2. Enable users to query and modify the data through a query or
data-manipulation language.
3. Support storage of large amounts of data with efficient
access for queries and modifications.
4. Durability: Ensures recovery of the database in case of
failures, errors, or misuse.
5. Isolation: Control access to data from many users at once,
without allowing unexpected interactions among users.
6. Atomicity: A set of actions on a database happens all at once
or not at all.
Examples of DBMS:
• MySQL
• Microsoft SQL Server
• Microsoft Access
• Oracle Database
1.1.1 Early Database
Management Systems
 In the late 1960s, the first commercial database management system
(DBMS) was developed. Before the development of DBMS, file
systems were used to store data.
 Why do we need a DBMS? We need a Database Management System
(DBMS) because traditional file systems have several limitations in
handling, organizing, and retrieving data.
Problems with File Systems:

• Data Searching
• Lack of Data Security
• Limited Multi-User Support
• No Efficient Data Access
• Data Redundancy
• Limited Storage
How Did DBMS Improve Things?
DBMS solved many of these problems. They allowed data to be:
• Searched and accessed efficiently
• Stored safely
• Managed for multiple users to avoid conflicts
• Maintained with data integrity
• Stored in larger volumes
 Early data models:
The first database systems required programmers to think about how
data was stored physically. These systems used specific data models to
describe the organization of data:
• Hierarchical Model
• Network Model
 Challenges with Early Models:
• No High-Level Query Language
• Effort-Intensive Programming
Applications of DBMS:
• Banking system
• Airline reservation systems
• E-commerce Websites
• YouTube
• Etc
1.1.2 Relational Database
Systems
A relational database is a type of database that organizes and stores data in a
structured format using tables, which consist of rows and columns. It is one of
the most commonly used types of database management systems.
 KEY POINTS:
• Data is stored in tables, also called relations.
• Each table has rows (records) and columns (attributes).
• A key is a special field (or combination of fields) in a table that is used to
uniquely identify records (rows).
• SQL (Structured Query Language) is used to interact with relational
databases.
• Handles large datasets effectively.
• By 1990, relational databases became the standard for managing data.
However, as technology advanced, new challenges and methods for
managing data appeared.
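The key points above can be tried out with a small sketch. It uses Python's built-in sqlite3 module as one convenient relational DBMS; the `student` table, its columns, and the sample rows are made up for illustration and are not from the slides.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " roll_no TEXT PRIMARY KEY,"   # key: uniquely identifies each row
    " name TEXT NOT NULL,"
    " cgpa REAL)"
)
conn.executemany(
    "INSERT INTO student (roll_no, name, cgpa) VALUES (?, ?, ?)",
    [("CS-101", "Ali", 3.4), ("CS-102", "Sara", 3.8)],
)

# SQL query: fetch one record (row) through its key.
row = conn.execute(
    "SELECT name, cgpa FROM student WHERE roll_no = ?", ("CS-102",)
).fetchone()
print(row)

# The key rejects a second row with the same roll_no.
try:
    conn.execute("INSERT INTO student VALUES ('CS-102', 'Omar', 3.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```

Each row is one record, each column one attribute, and the primary key is what lets the DBMS tell two records apart.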
SYSTEMS:
Smaller Systems
 Hundreds of gigabytes can now fit on a small, inexpensive disk.
 It's possible to run a DBMS on a personal computer, making it
affordable and accessible to almost everyone.
 Relational databases, once used only on large computers, are now
available for small devices like laptops and even smartphones.
• They have become as common as tools like spreadsheets and word
processors.
• Documents are often tagged using XML (Extensible Markup
Language), which organizes data in a flexible way.
Bigger Systems
 Companies manage large amounts of data daily, measured in terabytes
(10^12 bytes) or even petabytes (10^15 bytes).
 This data is stored in custom systems tailored for quick searching instead of
using regular traditional databases.
 Pictures need much more storage than text. While a short text file can take up
only a few kilobytes, an image requires significantly more space.
 Peer-to-peer models offer another way of managing data. These systems consist
of many computers, each holding a small amount of data. Together, they
create a massive network capable of storing and sharing large files efficiently.
Information Integration
 Information: the ordered data
 Integration: to join or combine
1.1.5 Information Integration
Information integration means combining the data contained in
related datasets.
 Example:
Consider a large company with several divisions, like sales, HR, and
manufacturing. Each division has its own database for storing
information about employees, products, or processes. These databases
might have been created independently, so they may use different
software, formats, and terms. For example, one department may call
employees “staff” and another may call them “workers”; one database
identifies employees by ID and another by name.
PROBLEM:

The company might want to see an overall picture, like how many
employees it has or what products it makes, but the differences in
databases make this difficult. Older applications rely on these
databases, so the company can’t just throw them away and start fresh.
SOLUTIONS:

 Data Warehouses
 Middleware (Mediators)
Data Warehouses:
All the data from different databases is copied into one central database.
While copying, the data is cleaned and standardized so everything uses the
same terms and formats.
Middleware (Mediators):
Middleware works as a translator between the databases. It lets users search
and analyze data as if it’s in one system, even though it’s still in separate
databases. Middleware supports an integrated model of the data in the various
databases, while translating between this model and the actual models used by
each database.
Hence, the middleware approach is often the easier and more efficient one,
because no data has to be copied into a central store.
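A minimal sketch of the mediator idea, assuming two made-up department databases held as Python dictionaries. The names `sales_db`, `hr_db`, the wrapper functions, and the sample employees are all hypothetical; a real mediator would talk to real database systems.

```python
# Two hypothetical department databases with different terms and formats.
sales_db = {"staff": [{"staff_id": 1, "full_name": "Ali Khan"}]}
hr_db = {"workers": [{"name": "Sara Malik"}, {"name": "Ali Khan"}]}

def sales_wrapper():
    """Translate the sales format into the integrated model."""
    return [{"name": r["full_name"], "source": "sales"} for r in sales_db["staff"]]

def hr_wrapper():
    """Translate the HR format into the integrated model."""
    return [{"name": r["name"], "source": "hr"} for r in hr_db["workers"]]

def mediator_all_employees():
    """Answer a query over the integrated model. Nothing is copied into a
    central store: each source is read at query time."""
    return sales_wrapper() + hr_wrapper()

def count_distinct_employees():
    """The 'overall picture' query: how many employees does the company have?"""
    return len({e["name"] for e in mediator_all_employees()})

print(count_distinct_employees())
```

Matching employees by name alone is a simplification; deciding when two records describe the same person is one of the harder parts of integration in practice.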
1.2 Overview of a Database
Management System
Before going further, we need to remember a few things about the diagram:
 Single boxes represent different components of the system.
 Double boxes represent in-memory data structures.
 Solid lines show how both control and data move through the system.
 Dashed lines show data movement only.
 Users and Application Programs: These request data from the
database or make changes to the data stored in it.
 Database Administrator (DBA): This is the person in charge of
managing the structure of the database, also known as the schema.
1.2.1 Data-Definition Language
Commands
 Data-Definition Language (DDL) commands are used to define and manage the
structure of a database. These commands help create, modify, and delete the
database's schema, which is the blueprint of how data is organized.
 Some common tasks done using DDL commands include:
Creating Tables: You can define a new table by specifying its name, columns,
and their data types.
Altering Tables: If you need to add, remove, or modify columns in an existing
table, you use DDL commands to make those changes.
Dropping Tables: When a table is no longer needed, you can delete it
completely using a DDL command.
 Main purpose: to set up the framework for storing and managing data
efficiently. These commands are typically used by database administrators (DBAs) or
developers to organize the database before adding actual data. Schema-altering
DDL commands are parsed (analyzed and broken down into their parts) by a DDL
processor and passed to the execution engine, which then goes through the
index/file/record manager to alter the metadata (the schema information for the
database).
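The three DDL tasks listed above can be sketched with Python's built-in sqlite3 module. The `employee` table and the helper `columns` are illustrative names chosen for this example; the helper reads the column names back from SQLite's own metadata so the effect of each command is visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    """Read column names back from the metadata SQLite keeps for a table."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# Creating a table: name, columns, and data types.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
after_create = columns("employee")

# Altering a table: add a column to the existing schema.
conn.execute("ALTER TABLE employee ADD COLUMN salary REAL")
after_alter = columns("employee")

# Dropping a table: remove it from the schema entirely.
conn.execute("DROP TABLE employee")
after_drop = columns("employee")

print(after_create, after_alter, after_drop)
```

None of these commands touches rows of data; each one only changes the metadata that describes the table.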
1.2.2 Overview of Query
Processing
First of all, what is a QUERY?
A QUERY is a request made to the database, either to retrieve data or to
change it. Queries are of two main types:
 Modification queries (used to update, insert, or delete data in the
database)
 Retrieval queries (used to fetch data from the database)
Query processing involves three basic steps:
 Parsing & Translation
 Optimization
 Evaluation
Data Manipulation Language
(DML).
 This is a set of commands that allows users to work with the actual
data stored in the database.
 It doesn’t change the structure of the data; it only inserts, updates,
deletes, or retrieves data in the database.
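The four DML operations can be sketched as follows, again using Python's sqlite3 module as a stand-in DBMS. The `product` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# INSERT: add rows.
conn.executemany(
    "INSERT INTO product (id, name, price) VALUES (?, ?, ?)",
    [(1, "pen", 1.5), (2, "notebook", 4.0), (3, "eraser", 0.5)],
)
# UPDATE: change a value in an existing row.
conn.execute("UPDATE product SET price = ? WHERE id = ?", (2.0, 1))
# DELETE: remove a row.
conn.execute("DELETE FROM product WHERE id = ?", (3,))
# SELECT: retrieve the data; the table structure is unchanged throughout.
rows = conn.execute("SELECT id, name, price FROM product ORDER BY id").fetchall()
print(rows)
```

Only the one `CREATE TABLE` line is DDL; everything after it is DML working on the rows inside the table.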
Answering the Query:
 Parsing and Optimizing the Query:
When a user runs a query, it is parsed (broken down into understandable parts)
and optimized (improved for efficiency) by a query compiler. The compiler creates a
query plan, which is a list of steps for the system to follow to process the query.
 Execution of the Query Plan:
The execution engine takes the query plan and requests small pieces of data
(like rows or records) from the resource manager. The resource manager understands how
and where the data is stored, including its structure and any helpful index files for quick
access.
 Data Handling by the Buffer Manager:
The buffer manager fetches the requested data from the disk (secondary
storage) into memory (buffers) for processing. Data is transferred in chunks called pages or
disk blocks.
 Interaction with the Storage Manager:
The buffer manager communicates with the storage manager to retrieve the
needed data from the disk. The storage manager either uses operating system commands
or directly interacts with the disk controller to get the required data efficiently.
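One way to look at a query plan is SQLite's `EXPLAIN QUERY PLAN` statement, which returns a description of the plan instead of running the query. The sketch below is specific to SQLite; the `orders` table and the index name `idx_orders_customer` are made up, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# EXPLAIN QUERY PLAN asks SQLite to describe its plan instead of running the query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT total FROM orders WHERE customer = ?", ("Ali",)
).fetchall()

# The last field of each row is a human-readable description of one plan step.
plan_steps = [row[-1] for row in plan_rows]
for step in plan_steps:
    print(step)

# True when the chosen plan mentions the index, i.e. the compiler decided
# to use the "helpful index file" rather than scan the whole table.
uses_index = any("idx_orders_customer" in step for step in plan_steps)
```

Other systems expose their plans through different commands, so treat this as an illustration of the idea rather than a general interface.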
Transaction processing:
 What are transactions?
Transactions are groups of database operations that must be
treated as a single unit.
 A transaction either performs all of its actions or none of them;
this is the atomicity property.
 Transactions must also be isolated from one another and durable.
 The property of durability ensures that once a transaction is
complete, its effects must be saved permanently, even if the system
crashes afterward.
Transaction processing is handled by two components:
 Concurrency-Control Manager (or Scheduler):
Makes sure transactions are atomic and isolated, so they don't affect each
other or cause errors when running together.
 Logging and Recovery Manager:
Ensures durability by keeping track of recent changes and recovering data if
the system fails.
Logging:
In order to assure durability, every change in the database is logged
separately on disk. The log manager follows one of several policies
designed to assure that no matter when a system failure or "crash"
occurs, a recovery manager will be able to examine the log of changes
and restore the database to some consistent state. The log manager
initially writes the log in buffers and negotiates with the buffer manager
to make sure that buffers are written to disk (where data can survive a
crash) at appropriate times.
Why is Logging Necessary for Durability?
 The database system uses the log to identify what operations were
performed.
 The recovery process uses the log to:
 Redo completed transactions: Ensure their effects are fully applied to the
database.
 Undo incomplete transactions: Rollback changes from transactions that
didn’t finish.
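The redo and undo steps above can be sketched in a few lines. This is a simplified illustration, not the recovery algorithm of any particular DBMS: the log record format (`"start"`, `"write"`, `"commit"` tuples), the function name `recover`, and the sample values are assumptions made for the example.

```python
def recover(disk, log):
    """Rebuild a consistent state from a log after a crash.

    disk: dict of item -> value as found on disk after the crash.
    log:  list of records in the order they were written:
          ("start", txn), ("write", txn, item, old, new), ("commit", txn).
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    state = dict(disk)

    # Undo pass (newest to oldest): restore old values written by
    # transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, item, old, _ = rec
            state[item] = old

    # Redo pass (oldest to newest): reapply new values written by
    # committed transactions.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new = rec
            state[item] = new
    return state

log = [
    ("start", "T1"), ("write", "T1", "A", 500, 400), ("write", "T1", "B", 300, 400),
    ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "A", 400, 350),   # T2 never commits
]
# At the crash, T1's write to B had not reached disk but T2's write to A had.
disk_after_crash = {"A": 350, "B": 300}
recovered = recover(disk_after_crash, log)
print(recovered)
```

The committed transaction T1 ends up fully applied and the unfinished T2 leaves no trace, which is the consistent state the recovery manager is after.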
Concurrency control:
Transactions must appear to execute in isolation. But in most systems,
there will in truth be many transactions executing simultaneously. So the
scheduler maintains locks on parts of the database so that transactions do not
conflict. The scheduler affects the execution of
queries and other database operations by forbidding the execution
engine from accessing locked parts of the database.
Deadlock Resolution:
A deadlock is a situation in a computer system where two or more
processes are unable to proceed because each is waiting for a resource
that the other process holds. This results in a standstill where no process
can make progress.
When Does Deadlock Occur?
 Process A holds Resource 1 and requests Resource 2 (held by Process
B).
 Process B holds Resource 2 and requests Resource 1 (held by Process
A).
 Neither process can proceed, resulting in a deadlock.
This type of situation commonly arises in multi-threaded applications,
database systems, or operating systems where processes compete for
limited resources like memory, files, or CPU time.
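One common way to detect the situation above is to record who is waiting for whom and look for a cycle. The sketch below assumes each process waits on at most one other process; the function name `find_deadlock` and the dictionary format are choices made for this example, not something the slides prescribe.

```python
def find_deadlock(waits_for):
    """Return a list of processes forming a cycle in the wait-for graph,
    or None if there is no cycle.

    waits_for: dict mapping a process to the process it is waiting on.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a process.
        while current in waits_for:
            if current in seen:
                return seen[seen.index(current):]
            seen.append(current)
            current = waits_for[current]
    return None

# Process A waits for Resource 2, held by B; B waits for Resource 1, held by A.
deadlocked = find_deadlock({"A": "B", "B": "A"})
# Here B is not waiting on anyone, so A will eventually proceed.
no_deadlock = find_deadlock({"A": "B"})
print(deadlocked, no_deadlock)
```

Once a cycle is found, a system typically breaks it by aborting one of the processes in the cycle so the others can continue.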
Transaction and its Properties:
 A transaction involves operations like adding, deleting, or updating
data. Properly implemented transactions are commonly said to meet
the "ACID test".
The ACID properties include:
Atomicity
Consistency
Isolation
Durability
Terminologies to Know
 Blocks:
A block refers to the smallest unit of data that can be read from or
written to the disk. A block corresponds to a fixed-size chunk of data on the
storage device (magnetic disk), typically ranging from 512 bytes to a
few kilobytes.
 Buffers:
A buffer is a designated area in main memory (RAM) where
data from disk blocks is temporarily stored before processing.
 Buffer Manager:
The buffer manager is a software component of the DBMS
responsible for allocating and managing buffers in main memory.
 Storage Manager:
It is the job of the storage manager to control the
placement of data on disk and its movement between disk and main
memory.
1.2.3 Storage and Buffer
Management:
In a database system, most data is stored on secondary storage, like
hard drives or SSDs.
 Why Secondary storage devices?
1. Because it can hold large amounts of data, unlike main memory
(RAM), which is smaller and temporary.
2. However, to do any useful operation (e.g., searching or updating
data), the data must first be loaded into main memory (RAM),
because the CPU cannot directly work with data on disk.
The buffer manager is responsible for dividing the available main
memory into buffers, which are page-sized regions into which disk
blocks can be transferred. Thus, all DBMS components that need
information from the disk will interact with the buffers and the buffer
manager, either directly or through the execution engine. The kinds of
information that various components may need include:
1. Data: the contents of the database itself.
2. Metadata: the database schema that describes the structure of, and
constraints on, the database.
3. Log Records: information about recent changes to the database;
these support durability of the database.
4. Statistics: information gathered and stored by the DBMS about data
properties such as the sizes of, and values in, various relations or other
components of the database.
5. Indexes: data structures that support efficient access to the data.
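A toy buffer manager helps make the idea of page-sized buffers concrete. The sketch below assumes a least-recently-used (LRU) replacement rule, which is one common choice and is not stated in the slides; the class name `BufferPool`, the dictionary standing in for the disk, and the access sequence are all made up.

```python
from collections import OrderedDict

class BufferPool:
    """Fixed number of page-sized buffers in front of a simulated disk."""

    def __init__(self, disk, capacity):
        self.disk = disk                # block id -> block contents
        self.capacity = capacity        # number of buffers available
        self.buffers = OrderedDict()    # block id -> contents, oldest first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.buffers:
            # Hit: the block is already in memory; mark it recently used.
            self.buffers.move_to_end(block_id)
            return self.buffers[block_id]
        # Miss: evict the least recently used buffer if the pool is full,
        # then transfer the block from disk into a buffer.
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)
        self.disk_reads += 1
        self.buffers[block_id] = self.disk[block_id]
        return self.buffers[block_id]

disk = {1: "block-1", 2: "block-2", 3: "block-3"}
pool = BufferPool(disk, capacity=2)
for block_id in [1, 2, 1, 3, 2]:
    pool.get(block_id)
print(pool.disk_reads, list(pool.buffers))
```

With two buffers, five requests cost four disk reads here; only the second request for block 1 is served from memory. A real buffer manager also has to write changed buffers back to disk, which this sketch leaves out.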
Storage Hierarchy From Diagram:
• Main memory with buffer pools
• Secondary storage (disk) with data
blocks
• Data movement between layers
Transaction properties
To revise: we learnt about the ACID test.
 The ACID properties include:
 ATOMICITY
 CONSISTENCY
 ISOLATION
 DURABILITY
Atomicity
A transaction is atomic, meaning it is an all-or-nothing operation. Either
all the changes made during the transaction are committed, or none are.
If any part of the transaction fails, all changes are rolled back.
Example:
 Suppose the transaction involves two steps:
1. Deducting $100 from Account A.
2. Adding $100 to Account B.
If the first operation succeeds (deducting from Account A), but the
second operation fails (adding to Account B), the database will roll back
the entire transaction, and no money will be deducted from Account A or
added to Account B. This ensures the atomicity of the transaction.
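The transfer example can be sketched with Python's sqlite3 module. The starting balances ($500 and $300) and the `fail_midway` flag, which simulates the second step failing, are assumptions added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 300)")
conn.commit()

def transfer(conn, amount, fail_midway=False):
    """Move `amount` from A to B as one transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure before crediting B")
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()
    except RuntimeError:
        conn.rollback()   # undo the deduction from A as well

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

transfer(conn, 100, fail_midway=True)
after_failure = balances(conn)
transfer(conn, 100)
after_success = balances(conn)
print(after_failure, after_success)
```

After the failed attempt both balances are unchanged; after the successful one both steps have taken effect together.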
Consistency
Consistency refers to the property that ensures the database transitions from one valid state to
another after a transaction is executed. In other words, after a transaction, the database must
be in a state that adheres to all defined rules, constraints, and business logic.

Example:
Before Transaction:
 Account A has $500.
 Account B has $300.
Transaction:
 Debit $100 from Account A.
 Credit $100 to Account B.
After Transaction:
 Account A should have $400 (after debit).
 Account B should have $400 (after credit).
 Now, let’s say that the transaction must satisfy these rules:
Account A cannot have a negative balance.
Account B cannot have more money than allowed (let’s say the maximum limit is $500).
 If the transaction is consistent, after transferring $100, Account A will have $400 and Account
B will have $400. Both accounts are valid and consistent with the rules. If there was a rule
violation, for example, if the transaction tried to take out more money from Account A than it
had (e.g., $600 instead of $100), the transaction would violate consistency, and the DBMS would
reject the transaction.
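Rules like these are often expressed as constraints that the DBMS checks itself. The sketch below uses a SQLite `CHECK` constraint and, as a simplification, applies both rules (no negative balance, maximum of $500) to every account rather than one rule per account.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    # Simplification: both rules are enforced on every account.
    " balance INTEGER CHECK (balance >= 0 AND balance <= 500))"
)
conn.execute("INSERT INTO account VALUES ('A', 500), ('B', 300)")
conn.commit()

def transfer(conn, amount):
    """Return True if the transfer committed, False if it was rejected."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'A'", (amount,))
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'B'", (amount,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        conn.rollback()
        return False

ok_100 = transfer(conn, 100)   # 500 -> 400 and 300 -> 400: both rules hold
ok_600 = transfer(conn, 600)   # would leave A negative: rejected
final = dict(conn.execute("SELECT name, balance FROM account"))
print(ok_100, ok_600, final)
```

The rejected transfer leaves the database in the last valid state, so it moves only from one valid state to another.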
Isolation:

 Transactions are isolated from each other. One transaction should not
affect the execution of another transaction, even if they occur
concurrently. In a DBMS, isolation is commonly implemented through locking.
 How It Works:
Isolation ensures that intermediate states of a transaction are not
visible to other transactions. Each transaction appears to execute in
isolation, even if multiple transactions are running concurrently.
Isolation:
Without Isolation (Initial With Isolation :
state):  Transaction 1 acquires lock and
 Transaction 1 reads balance reads balance ($1000)
($1000)  Transaction 2 must wait (shown
 Transaction 2 reads balance by delayed position)
($1000) before T1 finishes  Transaction 1 writes new
 Transaction 1 writes new balance ($800) and releases
balance ($800) lock
 Transaction 2 writes new  Transaction 2 can now proceed,
balance ($700) reads $800
 The $200 deduction from T1  Transaction 2 writes final
is lost! balance ($500)
 Final balance is wrong ($700  Correct final state achieved
instead of $500)
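The two schedules can be replayed step by step in a small simulation. No real threads or locks are used; the order of steps is written out by hand so the result is the same on every run. The function name `run` and the schedule format are choices made for this example.

```python
def run(schedule):
    """Replay a schedule of (transaction, action) steps on one balance.

    Each transaction reads the balance into its own local copy, subtracts
    its amount, and writes the result back.
    """
    amounts = {"T1": 200, "T2": 300}
    balance = 1000
    local = {}
    for txn, action in schedule:
        if action == "read":
            local[txn] = balance
        elif action == "write":
            balance = local[txn] - amounts[txn]
    return balance

# Without isolation: T2 reads before T1 has written.
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]
# With isolation: a lock makes T2 wait until T1 has finished.
serial = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]

print(run(interleaved), run(serial))
```

The interleaved schedule ends at $700 because T2 overwrites T1's update with a value computed from a stale read; the serial schedule ends at the correct $500.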
Durability:
Durability ensures that once a transaction is committed, its changes are
permanent, even in the face of failures like power outages or crashes. It
is achieved by recording changes in a transaction log and writing that log
to non-volatile storage (e.g., disk or SSD) before the transaction is
reported as complete.
Example:
 A transaction deposits $500 into an account and is committed.
 The database ensures the $500 is stored in durable storage.
 Even if the system crashes immediately after, the $500 will be
available upon recovery.
1.2.5 Query Processing:
It is the job of the query compiler (a software component of the DBMS) to
process the query. The query compiler translates the query into an
internal form called a query plan. A query plan is a sequence of operations
to be performed on the data. The query compiler has three major
components:
 Query Parser
 Query Preprocessor
 Query Optimizer
 Query Parser:
Converts the SQL text into a structured tree.
 Query Preprocessor:
Checks query validity and converts the tree to algebraic form. It
performs semantic checks on the query by checking:
Do tables exist?
Do columns exist?
Are data types compatible?
 Query Optimizer:
Finds the most efficient execution plan. It transforms the initial query plan
into the best available sequence of operations on the actual data. The
optimizer uses information about the database, such as:
 Metadata: Describes the structure of the database, including tables,
columns, indexes, and constraints.
 Statistics: Information about the data, like the number of rows in a
table, the distribution of values in a column, or the presence of
unique values.
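The preprocessor's three questions can be sketched as a lookup against the metadata. The catalog layout, the function name `check_query`, and the way a parsed query is passed in (table name, column list, and comparisons) are all simplifications invented for this example; a real preprocessor works on the parse tree.

```python
# Hypothetical catalog (metadata): table name -> {column name: type}.
CATALOG = {"student": {"roll_no": "TEXT", "name": "TEXT", "cgpa": "REAL"}}

def check_query(table, columns, comparisons=()):
    """Return a list of semantic errors for an already-parsed query.

    comparisons: (column, literal) pairs taken from the WHERE clause.
    """
    # Do tables exist?
    if table not in CATALOG:
        return [f"unknown table: {table}"]
    schema = CATALOG[table]
    # Do columns exist?
    errors = [f"unknown column: {c}" for c in columns if c not in schema]
    for column, literal in comparisons:
        if column not in schema:
            errors.append(f"unknown column: {column}")
        # Are data types compatible? Only numeric columns are checked here.
        elif schema[column] == "REAL" and not isinstance(literal, (int, float)):
            errors.append(f"type mismatch on {column}")
    return errors

valid = check_query("student", ["name"], [("cgpa", 3.5)])
bad_column = check_query("student", ["age"])
bad_type = check_query("student", ["name"], [("cgpa", "high")])
print(valid, bad_column, bad_type)
```

Only queries that come back with an empty error list would be handed on to the optimizer.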

Query Processing Analysis


Execution Engine:
The execution engine is responsible for carrying out the steps
outlined in the selected query plan. It works closely with other
DBMS components, either directly or by using buffers. Its job
includes loading the required data from the database into buffers
so it can process and manipulate it. Additionally, it coordinates
with the scheduler to ensure it doesn't access locked data and
with the log manager to ensure all database changes are
properly recorded.
THANK YOU
AIMA IMRAN 2024-CS-155
ASRA SHAHEEN 2024-CS-139
LAIBA ASGHAR 2024-CS-128
