Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Lectures - 11 Transactions Logging

Download as pdf or txt
Download as pdf or txt
You are on page 1of 52

Transactions

Stats (Out of 90+2)


○ Median: 76.5, Mean: 73.9, High: 92, Standard Deviation: 10.5
○ Regrade requests open for 1 week (next Tuesday)

Exam
Grades
Feedback Themes
- TAs were ultra-responsive during the “24 hours”
- Test was too long. We’ll recalibrate for Finals
- Most liked 24 hour flex time
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’

SQL INSERT INTO SmallProduct(name, price)


Writes SELECT pname, price
FROM Product
WHERE price <= 0.99

DELETE Product
WHERE price <=0.99
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’

SQL INSERT INTO SmallProduct(name, price)


Writes SELECT pname, price
FROM Product
Readfrom another table
WHERE price <= 0.99

DELETE Product
WHERE price <=0.99
How?
Mobile Game
Report & Share

Example Real-Time
DBMS
Business/Product
Analysis

Game App User Events


DB

DB v0
Q1: 1000 users/sec? Q7: How to model/evolve game data? Q4: Which user cohorts?
Q2: Offline? Q8: How to scale to millions of users? Q5: Next features to build?
(Recap lectures) Q3: Support v1, v1’ versions? Q9: When machines die, restore game Experiments to run?
state gracefully? Q6: Predict ads demand?

App designer Systems designer Product/Biz designer


How?
Mobile Game
Report & Share

Example Real-Time
DBMS
Business/Product
Analysis

Game App User Events


DB

DB v0
Q1: 1000 users/sec? Q7: How to model/evolve game data? Q4: Which user cohorts?
Q2: Offline? Q8: How to scale to millions of users? Q5: Next features to build?
(Recap lectures) Q3: Support v1, v1’ versions? Q9: When machines crash, restore Experiments to run?
game state gracefully? Q6: Predict ads demand?

App designer Systems designer Product/Biz designer


1. Why Transactions?

Today’s
Lecture 2. Transactions

3. Properties of Transactions: ACID

4. Logging
Example

Unpack
ATM DB:

Transaction
Read Balance Read Balance
Give money vs Update Balance
Update Balance Give money
Visa does > 60,000 TXNs/sec with users & merchants

Want your 4$ Starbucks transaction to wait for a stranger’s 10k$ bet in Las Vegas ?
⇒ Transactions can (1) be quick or take a long time, (2) unrelated to you
Transactions are at the core of
-- payment, stock market, banks, ticketing
-- Gmail, Google Docs (e.g., multiple people editing)
Money Money (@4:29 am day+1)

Example

Monthly
bank
interest
transaction
‘T-Monthly-423’
Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run

UPDATE Money
SET Balance = Balance * 1.1
Money Money (@4:29 am day+1)
Example

Monthly
bank
interest
transaction
Cost to update all data
100M bank accounts → 100M seeks? (worst case)

Performance (@10 msec/seek, that’s 1 million secs)

Problem1: SLOW :(
Money Money (@10:45 am)

Example
??

Monthly ?? Did T-Monthly-423 complete?


?? Which tuples are bad?
bank
interest ??

transaction
Case1: T-Monthly-423 crashed
‘T-Monthly-423’ Case2: T-Monthly-423 completed
With crash Monthly Interest 10% 4002 deposited 20$ at 10:45 am
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run
Network outage at 10:29 am,
System access at 10:45 am
Problem 2: Wrong :(
15
Primary data structures/algorithms

LOGS LOCKS

Big Scale

Roadmap
?????
1. Why Transactions?

Today’s
Lecture 2. Properties of Transactions: ACID

3. Logging
Transactions: Basic Definition

A transaction (“TXN”) is a sequence of In the real world, a TXN


either happened
one or more operations (reads or completely or not at all
writes) which reflects a single real-world (e.g., you withdrew 100$
transition. from bank. Or not.)

START TRANSACTION
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’
COMMIT
Transactions in SQL

• In “ad-hoc” SQL, each statement = one transaction

• In a program, multiple statements can be grouped together as a transaction

START TRANSACTION
UPDATE Bank SET amount = amount – 100
WHERE name = ‘Bob’
UPDATE Bank SET amount = amount + 100
WHERE name = ‘Joe’
COMMIT
Motivation for Transactions
Group user actions (reads & writes) into Transactions helps with two goals:

1. Recovery & Durability: Keep the data consistent and durable.


Despite system crashes, user canceling TXN part way, etc.
This lecture!

Idea: Use LOGS. Support to “commit” or “rollback” TXNs

2. Concurrency: Get better performance by parallelizing TXNs


without creating ‘bad data.’ Despite slow disk writes and reads.

Next lecture

Idea: Use LOCKS. Run several user TXNs concurrently.


Example 1: Protection against crashes / aborts

Scenario: Make a CheapProducts table, from a Products table

Client 1:
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product Crash / abort!
WHERE price <= 0.99

DELETE Product
WHERE price <=0.99

What goes wrong?


Client 1:
START TRANSACTION
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product
WHERE price <= 0.99

DELETE Product
WHERE price <=0.99
COMMIT

Now we’d be fine! We’ll see how / why this lecture


Example 2: Multiple users: single statements

Client 1: [at 10:01 am] Client 2: [at 10:01 am]


UPDATE Product UPDATE Product
SET Price = Price – 1.99 SET Price = Price*0.5
WHERE pname = ‘Gizmo’ WHERE pname=‘Gizmo’

Two managers attempt to discount products at same time -

What could go wrong?


Client 1: START TRANSACTION Client 2: START TRANSACTION
UPDATE Product UPDATE Product
SET Price = Price – 1.99 SET Price = Price*0.5
WHERE pname = ‘Gizmo’ WHERE pname=‘Gizmo’
COMMIT COMMIT

Now works like a charm- we’ll see how / why next lecture…
3. Properties of Transactions
1. Atomicity

2. Consistency
What you will
learn about in 3. Isolation
this section
4. Durability
ACID: Atomicity

• TXN is all or nothing


• Commits: all the changes are made

• Aborts: no changes are made


ACID: Consistency

• The tables must always satisfy user-specified integrity constraints


• E.g., Account number is unique, Sum of debits and of credits is 0

• How consistency is achieved:


• Programmer writes a TXN to go from one consistent state to a
consistent state
• System makes sure that the TXN is atomic (e.g., if EXCEPTION, rolls
back)
ACID: Isolation

• A TXN executes concurrently with other TXNs

• Effect of TXNs is the same as TXNs running one after another

Conceptually,
• similar to OS “sandboxes”
• E.g. TXNs can’t observe each other’s “partial updates”
ACID: Durability

• The effect of a TXN must persist after the TXN


• And after the whole program has terminated
• And even if there are power failures, crashes, etc.

• ⇒ Write data to durable IO (e.g., disk)


ACID Summary

• Atomic
• State shows either all the effects of TXN, or none of them
• Consistent
• TXN moves from a state where integrity holds, to another where integrity
holds
• Isolated
• Effect of TXNs is the same as TXNs running one after another
• Durable
• Once a TXN has committed, its effects remain in the database
A Note: ACID is one popular option!

• Many debates over ACID, both historically and currently

• Some “NoSQL” DBMSs relax ACID

• In turn, now “NewSQL” reintroduces ACID compliance to


NoSQL-style DBMSs…

⇒ Usually, depends on what consistency and performance your


application needs

ACID is an extremely important & successful paradigm,


but still debated!
4. Atomicity & Durability via Logging
Conceptual
Idea:
Trip to Europe

1. Make TODO list. Buy 2. Actual Visit


tickets
(Much longer than buying tickets)
Recall (on disks)

▹ Sequential reads FASTER than random reads


▹ Sequential writes (aka “appends”) FASTER than random writes

Big Idea Big Idea: LOGs (or log files or ledger)


▹ Any value that changes? Append to LOG!
LOGS! ■ LOG is a compact “todo” list of data updates
▹ Intuition:
(aka TODO/ ■ Data pages: (a) Update in RAM (fast) (b) Update on disk
ledger) later (slow)
■ LOGs: ( c) Append “todo” in LOGs and (d) control when
you Flush LOGs to disk

Many kinds of LOGs. We’ll study a few key ones!


1. How to make/use LOGs?

What you will


learn about in
this section
2. How to make it fast? (Mess with memory
and disk)
Basic Idea: (Physical) Logging

Idea:
• Log consists of an ordered list of Update Records
• Log record contains UNDO information for every update!
<TransactionID, &reference, old value, new value>
(e.g., key)

What DB does?
• Owns the log “service” for all applications/transactions.
• Appends to log. Flush when necessary — force writes to disk

This is sufficient to UNDO any transaction!


Money Money (@4:29 am day+1) WA Log (@4:29 am day+1)

Example
Update
Records

Monthly
bank Commit
interest Record

transaction
‘T-Monthly-423’

Full run
Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run

START TRANSACTION
UPDATE Money
SET Amt = Amt * 1.10
COMMIT
Money Money (@10:45 am) WA Log (@10:29 am)

Example
??

Monthly ??
??
bank
interest ??

transaction
TXN ‘T-Monthly-423’
Did T-Monthly-423 complete?
With crash Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Which tuples are bad?

Takes 24 hours to run Case1: T-Monthly-423 was crashed


Network outage at 10:29 am, Case2: T-Monthly-423 completed. 4002
System access at 10:45 am deposited 20$ at 10:45 am

Can you infer from RED log records?


Money (@10:45 am) Money (after recovery) WA Log (@10:29 am)

Example

Monthly
bank
interest
transaction
System recovery (after 10:45 am)

1. Rollback uncommitted transactions


- Restore old values from WAL Log (if any)
Recovery 2.
- Notify developers about aborted TXN
Redo Recent transactions (w/ new values)
3. Back in business; Redo (any pending) transactions
1. How to make/use LOGs?

What you will


learn about in
this section
2. ⇒ How to make it fast? (Mess with
memory and disk)
A picture of logging

Log is a file (like any


A=7 Log data table)
1. Pages updated in

B=5
Main Memory RAM
2. Flushed as DB
blocks on disk
(sequential I/O)

A=7 “Flushing to disk”


= writing to disk
Data on Disk Log on Disk from main
memory
A picture of logging

T: R(A=7), W(A=13) [Update Record]


[T reads A=7, writes A=13] <Tid, &A, 7,13>
Operation
T A=13 Log recorded in update
log in main

B=5
Main Memory memory!

A=7
Data on Disk Log on Disk
Why do we need logging for
atomicity?
• Could we just write TXN updates to disk only once whole TXN
complete?
• Then, if abort / crash and TXN not complete, it has no effect- atomicity!
• With unlimited memory and time, this could work…

• ⇒ We need to log partial results of TXNs because of:


• Memory constraints (e.g. , billions of updates)
• Time constraints (what if one TXN takes very long?)

We need to write partial results to disk!


…And so we need a LOG to (maybe) undo these partial results!
What is the correct way to LOG to disk?

• We’ll look at the Write-Ahead Logging (WAL) protocol

• We’ll see why it works by looking at other protocols which are incorrect!

Remember: Key idea is to ensure durability


while maintaining our ability to “undo”!
Write-Ahead Logging (WAL)
TXN Commit Protocol
Write-ahead Logging (WAL)
Commit Protocol
[Update Record]

Commit after we’ve written


COMMIT
T: R(A), W(A) <Tid, &A, 7,13> record
log to disk but before
we’ve written data to
disk…
T A=13 Log-RAM

B=5
Main Memory
OK, Commit!

A=7
Data on Disk Log-Disk
Write-ahead Logging (WAL)
Commit Protocol
Commit after we’ve written
T: R(A), W(A) log to disk but before
we’ve written data to
disk… this is WAL!
T A=13

Main Memory
OK, Commit!

If we crash now, is T
<Tid, &A, 7,13> durable?
Yes
A=7
A=13
USE THE LOG!
Data on Disk Log-Disk
Write-Ahead Logging (WAL)

Algorithm: WAL

For each tuple update, write Update Record into LOG-RAM

Follow two Flush rules for LOG


● Rule1: Flush Update Record into LOG-Disk before → Durability
corresponding data page goes to storage
● Rule2: Before TXN commits,
- Flush all Update Records to LOG-Disk
- Flush COMMIT Record to LOG-Disk
→ Atomicity

Transaction is committed once COMMIT


record is on stable storage

Time > 0 Time > 0

Flush Update TXN


Flush COMMIT Record
Data Flush
Record to LOG-Disk to LOG-Disk COMMIT
Rule1: For each tuple update Rule2: Before TXN commits
after
shouldhappen
all records ateflushe
to logdisk
Incorrect Commit Protocol #1
Let’s try committing
before we’ve written
either data or LOG to
T: R(A), W(A) A: 7→13 disk…

T A=13 Log-RAM
OK, Commit!
B=5
Main Memory
If we crash now, is T
durable?
No
Lost T’s update!
A=7
Data on Disk Log-Disk
Incorrect Commit Protocol #2
Let’s try committing after
we’ve written data but
before we’ve written LOG
T: R(A), W(A) A: 7→13 to disk…

T A=13 Log-RAM
OK, Commit!
B=5
Main Memory
If we crash now, is T
durable? Yes! Except…

A=13 How do we know


whether T was
Data on Disk Log-Disk
committed??

loseinformation onleg
Money Money (@4:29 am day+1) WAL (@4:29 am day+1)
Example

Monthly
bank
interest
transaction
Cost to update all data Cost to Append to log
100M bank accounts → 100M seeks? (worst + 1 seek to get ‘end of log’
case) + write 100M log entries sequentially

Performance (@10 msec/seek, that’s 1 Million secs) (fast!!! < 10 sec)

[Lazily update data on disk later,


when convenient.]

Speedup for TXN Commit


1 Million secs vs 10 sec!!!
Logging Summary

• If DB says TX commits, TX effect remains after database


crash

• DB can undo actions and help us with atomicity

• This is only half the story…

You might also like