Lectures - 11 Transactions Logging

Transactions
Stats (Out of 90+2)

○ Median: 76.5, Mean: 73.9, High: 92, Standard Deviation: 10.5
○ Regrade requests open for 1 week (next Tuesday)
Exam
Grades
Feedback Themes
- TAs were ultra-responsive during the “24 hours”
- Test was too long. We’ll recalibrate for Finals
- Most liked 24 hour flex time
UPDATE Product
SET Price = Price – 1.99
WHERE pname = ‘Gizmo’
SQL INSERT INTO SmallProduct(name, price)

Writes SELECT pname, price
FROM Product
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
UPDATE Product
SQL INSERT INTO SmallProduct(name, price)

Writes SELECT pname, price
FROM Product
Readfrom another table
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
How?
Mobile Game
Report & Share
Example Real-Time
DBMS
Business/Product
Analysis
Game App User Events

DB
DB v0
Q1: 1000 users/sec? Q7: How to model/evolve game data? Q4: Which user cohorts?
Q2: Offline? Q8: How to scale to millions of users? Q5: Next features to build?
(Recap lectures) Q3: Support v1, v1’ versions? Q9: When machines die, restore game Experiments to run?
state gracefully? Q6: Predict ads demand?
App designer Systems designer Product/Biz designer

How?
Mobile Game
Report & Share
Example Real-Time
DBMS
Business/Product
Analysis
Game App User Events

DB
DB v0
Q1: 1000 users/sec? Q7: How to model/evolve game data? Q4: Which user cohorts?
Q2: Offline? Q8: How to scale to millions of users? Q5: Next features to build?
(Recap lectures) Q3: Support v1, v1’ versions? Q9: When machines crash, restore Experiments to run?
game state gracefully? Q6: Predict ads demand?
App designer Systems designer Product/Biz designer

1. Why Transactions?
Today’s
Lecture 2. Transactions
3. Properties of Transactions: ACID
4. Logging
Example
Unpack
ATM DB:
Transaction
Read Balance Read Balance
Give money vs Update Balance
Update Balance Give money
Visa does > 60,000 TXNs/sec with users & merchants
Want your 4$ Starbucks transaction to wait for a stranger’s 10k$ bet in Las Vegas ?
⇒ Transactions can (1) be quick or take a long time, (2) unrelated to you
Transactions are at the core of
-- payment, stock market, banks, ticketing
-- Gmail, Google Docs (e.g., multiple people editing)
Money Money (@4:29 am day+1)
Example
Monthly
bank
interest
transaction
‘T-Monthly-423’
Monthly Interest 10%
4:28 am Starts run on 100M bank accounts
Takes 24 hours to run
UPDATE Money
SET Balance = Balance * 1.1
Money Money (@4:29 am day+1)
Example
Monthly
bank
interest
transaction
Cost to update all data
100M bank accounts → 100M seeks? (worst case)
Performance (@10 msec/seek, that’s 1 million secs)
Problem1: SLOW :(
Money Money (@10:45 am)
Example
??
Monthly ?? Did T-Monthly-423 complete?

?? Which tuples are bad?
bank
interest ??
transaction
Case1: T-Monthly-423 crashed
‘T-Monthly-423’ Case2: T-Monthly-423 completed
With crash Monthly Interest 10% 4002 deposited 20$ at 10:45 am
Network outage at 10:29 am,
System access at 10:45 am
Problem 2: Wrong :(
15
Primary data structures/algorithms
LOGS LOCKS
Big Scale
Roadmap
?????
1. Why Transactions?
Today’s
Lecture 2. Properties of Transactions: ACID
3. Logging
Transactions: Basic Definition
A transaction (“TXN”) is a sequence of In the real world, a TXN

either happened
one or more operations (reads or completely or not at all
writes) which reflects a single real-world (e.g., you withdrew 100$
transition. from bank. Or not.)
START TRANSACTION
UPDATE Product
COMMIT
Transactions in SQL
• In “ad-hoc” SQL, each statement = one transaction
• In a program, multiple statements can be grouped together as a transaction
START TRANSACTION
UPDATE Bank SET amount = amount – 100
WHERE name = ‘Bob’
UPDATE Bank SET amount = amount + 100
WHERE name = ‘Joe’
COMMIT
Motivation for Transactions
Group user actions (reads & writes) into Transactions helps with two goals:
1. Recovery & Durability: Keep the data consistent and durable.

Despite system crashes, user canceling TXN part way, etc.
This lecture!
Idea: Use LOGS. Support to “commit” or “rollback” TXNs
2. Concurrency: Get better performance by parallelizing TXNs

without creating ‘bad data.’ Despite slow disk writes and reads.
Next lecture
Idea: Use LOCKS. Run several user TXNs concurrently.

Example 1: Protection against crashes / aborts
Scenario: Make a CheapProducts table, from a Products table
Client 1:
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product Crash / abort!
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
What goes wrong?

Client 1:
START TRANSACTION
INSERT INTO CheapProduct(name, price)
SELECT pname, price
FROM Product
WHERE price <= 0.99
DELETE Product
WHERE price <=0.99
COMMIT
Now we’d be fine! We’ll see how / why this lecture

Example 2: Multiple users: single statements
Client 1: [at 10:01 am] Client 2: [at 10:01 am]

UPDATE Product UPDATE Product
SET Price = Price – 1.99 SET Price = Price*0.5
WHERE pname = ‘Gizmo’ WHERE pname=‘Gizmo’
Two managers attempt to discount products at same time -
What could go wrong?

Client 1: START TRANSACTION Client 2: START TRANSACTION
UPDATE Product UPDATE Product
SET Price = Price – 1.99 SET Price = Price*0.5
WHERE pname = ‘Gizmo’ WHERE pname=‘Gizmo’
COMMIT COMMIT
Now works like a charm- we’ll see how / why next lecture…
3. Properties of Transactions
1. Atomicity
2. Consistency
What you will
learn about in 3. Isolation
this section
4. Durability
ACID: Atomicity
• TXN is all or nothing

• Commits: all the changes are made
• Aborts: no changes are made

ACID: Consistency
• The tables must always satisfy user-specified integrity constraints

• E.g., Account number is unique, Sum of debits and of credits is 0
• How consistency is achieved:

• Programmer writes a TXN to go from one consistent state to a
consistent state
• System makes sure that the TXN is atomic (e.g., if EXCEPTION, rolls
back)
ACID: Isolation
• A TXN executes concurrently with other TXNs
• Effect of TXNs is the same as TXNs running one after another
Conceptually,
• similar to OS “sandboxes”
• E.g. TXNs can’t observe each other’s “partial updates”
ACID: Durability
• The effect of a TXN must persist after the TXN

• And after the whole program has terminated
• And even if there are power failures, crashes, etc.
• ⇒ Write data to durable IO (e.g., disk)

ACID Summary
• Atomic
• State shows either all the effects of TXN, or none of them
• Consistent
• TXN moves from a state where integrity holds, to another where integrity
holds
• Isolated
• Effect of TXNs is the same as TXNs running one after another
• Durable
• Once a TXN has committed, its effects remain in the database
A Note: ACID is one popular option!
• Many debates over ACID, both historically and currently
• Some “NoSQL” DBMSs relax ACID
• In turn, now “NewSQL” reintroduces ACID compliance to

NoSQL-style DBMSs…
⇒ Usually, depends on what consistency and performance your

application needs
ACID is an extremely important & successful paradigm,

but still debated!
4. Atomicity & Durability via Logging
Conceptual
Idea:
Trip to Europe
1. Make TODO list. Buy 2. Actual Visit

tickets
(Much longer than buying tickets)
Recall (on disks)
▹ Sequential reads FASTER than random reads

▹ Sequential writes (aka “appends”) FASTER than random writes
Big Idea Big Idea: LOGs (or log files or ledger)

▹ Any value that changes? Append to LOG!
LOGS! ■ LOG is a compact “todo” list of data updates
▹ Intuition:
(aka TODO/ ■ Data pages: (a) Update in RAM (fast) (b) Update on disk
ledger) later (slow)
■ LOGs: ( c) Append “todo” in LOGs and (d) control when
you Flush LOGs to disk
Many kinds of LOGs. We’ll study a few key ones!

1. How to make/use LOGs?
What you will

learn about in
this section
2. How to make it fast? (Mess with memory
and disk)
Basic Idea: (Physical) Logging
Idea:
• Log consists of an ordered list of Update Records
• Log record contains UNDO information for every update!
<TransactionID, &reference, old value, new value>
(e.g., key)
What DB does?
• Owns the log “service” for all applications/transactions.
• Appends to log. Flush when necessary — force writes to disk
This is sufficient to UNDO any transaction!

Money Money (@4:29 am day+1) WA Log (@4:29 am day+1)
Example
Update
Records
Monthly
bank Commit
interest Record
transaction
‘T-Monthly-423’
Full run
Monthly Interest 10%
START TRANSACTION
UPDATE Money
SET Amt = Amt * 1.10
COMMIT
Money Money (@10:45 am) WA Log (@10:29 am)
Example
??
Monthly ??
??
bank
interest ??
transaction
TXN ‘T-Monthly-423’
Did T-Monthly-423 complete?
With crash Monthly Interest 10%
Which tuples are bad?
Takes 24 hours to run Case1: T-Monthly-423 was crashed

Network outage at 10:29 am, Case2: T-Monthly-423 completed. 4002
System access at 10:45 am deposited 20$ at 10:45 am
Can you infer from RED log records?

Money (@10:45 am) Money (after recovery) WA Log (@10:29 am)
Example
Monthly
bank
interest
transaction
System recovery (after 10:45 am)
1. Rollback uncommitted transactions

- Restore old values from WAL Log (if any)
Recovery 2.
- Notify developers about aborted TXN
Redo Recent transactions (w/ new values)
3. Back in business; Redo (any pending) transactions
1. How to make/use LOGs?
What you will

learn about in
this section
2. ⇒ How to make it fast? (Mess with
memory and disk)
A picture of logging
Log is a file (like any

A=7 Log data table)
1. Pages updated in
B=5
Main Memory RAM
2. Flushed as DB
blocks on disk
(sequential I/O)
A=7 “Flushing to disk”

= writing to disk
Data on Disk Log on Disk from main
memory
A picture of logging
T: R(A=7), W(A=13) [Update Record]

[T reads A=7, writes A=13] <Tid, &A, 7,13>
Operation
T A=13 Log recorded in update
log in main
B=5
Main Memory memory!
A=7
Data on Disk Log on Disk
Why do we need logging for
atomicity?
• Could we just write TXN updates to disk only once whole TXN
complete?
• Then, if abort / crash and TXN not complete, it has no effect- atomicity!
• With unlimited memory and time, this could work…
• ⇒ We need to log partial results of TXNs because of:

• Memory constraints (e.g. , billions of updates)
• Time constraints (what if one TXN takes very long?)
We need to write partial results to disk!

…And so we need a LOG to (maybe) undo these partial results!
What is the correct way to LOG to disk?
• We’ll look at the Write-Ahead Logging (WAL) protocol
• We’ll see why it works by looking at other protocols which are incorrect!
Remember: Key idea is to ensure durability

while maintaining our ability to “undo”!
Write-Ahead Logging (WAL)
TXN Commit Protocol
Write-ahead Logging (WAL)
Commit Protocol
[Update Record]
Commit after we’ve written

COMMIT
T: R(A), W(A) <Tid, &A, 7,13> record
log to disk but before
we’ve written data to
disk…
T A=13 Log-RAM
B=5
Main Memory
OK, Commit!
A=7
Data on Disk Log-Disk
Write-ahead Logging (WAL)
Commit Protocol
Commit after we’ve written
T: R(A), W(A) log to disk but before
we’ve written data to
disk… this is WAL!
T A=13
Main Memory
OK, Commit!
If we crash now, is T
<Tid, &A, 7,13> durable?
Yes
A=7
A=13
USE THE LOG!
Write-Ahead Logging (WAL)
Algorithm: WAL
For each tuple update, write Update Record into LOG-RAM
Follow two Flush rules for LOG

● Rule1: Flush Update Record into LOG-Disk before → Durability
corresponding data page goes to storage
● Rule2: Before TXN commits,
- Flush all Update Records to LOG-Disk
- Flush COMMIT Record to LOG-Disk
→ Atomicity
Transaction is committed once COMMIT

record is on stable storage
Time > 0 Time > 0
Flush Update TXN

Flush COMMIT Record
Data Flush
Record to LOG-Disk to LOG-Disk COMMIT
Rule1: For each tuple update Rule2: Before TXN commits
after
shouldhappen
all records ateflushe
to logdisk
Incorrect Commit Protocol #1
Let’s try committing
before we’ve written
either data or LOG to
T: R(A), W(A) A: 7→13 disk…
T A=13 Log-RAM
OK, Commit!
B=5
Main Memory
durable?
No
Lost T’s update!
A=7
Incorrect Commit Protocol #2
Let’s try committing after
we’ve written data but
before we’ve written LOG
T: R(A), W(A) A: 7→13 to disk…
T A=13 Log-RAM
OK, Commit!
B=5
Main Memory
durable? Yes! Except…
A=13 How do we know

whether T was
committed??
loseinformation onleg
Money Money (@4:29 am day+1) WAL (@4:29 am day+1)
Example
Monthly
bank
interest
transaction
Cost to update all data Cost to Append to log
100M bank accounts → 100M seeks? (worst + 1 seek to get ‘end of log’
case) + write 100M log entries sequentially
Performance (@10 msec/seek, that’s 1 Million secs) (fast!!! < 10 sec)
[Lazily update data on disk later,

when convenient.]
Speedup for TXN Commit

1 Million secs vs 10 sec!!!
Logging Summary
• If DB says TX commits, TX effect remains after database

crash
• DB can undo actions and help us with atomicity
• This is only half the story…

Lectures - 11 Transactions Logging

Uploaded by

Copyright:

Available Formats

Lectures - 11 Transactions Logging

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Lectures - 11 Transactions Logging

Uploaded by

Copyright:

Available Formats

Transactions

Stats (Out of 90+2)

SQL INSERT INTO SmallProduct(name, price)

SQL INSERT INTO SmallProduct(name, price)

Game App User Events

App designer Systems designer Product/Biz designer

Game App User Events

App designer Systems designer Product/Biz designer

3. Properties of Transactions: ACID

Performance (@10 msec/seek, that’s 1 million secs)

Monthly ?? Did T-Monthly-423 complete?

A transaction (“TXN”) is a sequence of In the real world, a TXN

• In “ad-hoc” SQL, each statement = one transaction

• In a program, multiple statements can be grouped together as a transaction

1. Recovery & Durability: Keep the data consistent and durable.

Idea: Use LOGS. Support to “commit” or “rollback” TXNs

2. Concurrency: Get better performance by parallelizing TXNs

Idea: Use LOCKS. Run several user TXNs concurrently.

Scenario: Make a CheapProducts table, from a Products table

What goes wrong?

Now we’d be fine! We’ll see how / why this lecture

Client 1: [at 10:01 am] Client 2: [at 10:01 am]

Two managers attempt to discount products at same time -

What could go wrong?

• TXN is all or nothing

• Aborts: no changes are made

• The tables must always satisfy user-speciﬁed integrity constraints

• How consistency is achieved:

• A TXN executes concurrently with other TXNs

• Effect of TXNs is the same as TXNs running one after another

• The effect of a TXN must persist after the TXN

• ⇒ Write data to durable IO (e.g., disk)

• Many debates over ACID, both historically and currently

• Some “NoSQL” DBMSs relax ACID

• In turn, now “NewSQL” reintroduces ACID compliance to

⇒ Usually, depends on what consistency and performance your

ACID is an extremely important & successful paradigm,

1. Make TODO list. Buy 2. Actual Visit

▹ Sequential reads FASTER than random reads

Big Idea Big Idea: LOGs (or log ﬁles or ledger)

Many kinds of LOGs. We’ll study a few key ones!

What you will

This is sufficient to UNDO any transaction!

Takes 24 hours to run Case1: T-Monthly-423 was crashed

Can you infer from RED log records?

1. Rollback uncommitted transactions

What you will

Log is a file (like any

A=7 “Flushing to disk”

T: R(A=7), W(A=13) [Update Record]

• ⇒ We need to log partial results of TXNs because of:

We need to write partial results to disk!

• We’ll look at the Write-Ahead Logging (WAL) protocol

Remember: Key idea is to ensure durability

Commit after we’ve written

For each tuple update, write Update Record into LOG-RAM

Follow two Flush rules for LOG

Transaction is committed once COMMIT

Time > 0 Time > 0