13. Transactions, Concurrency, and Recovery Flashcards
(6 cards)
Transaction:
A sequence of operations on a database that is treated as a single logical unit of work and recovery. It must either complete entirely or have no effect at all. Initiated with BEGIN, ended with COMMIT (success) or ROLLBACK (failure/undo).
ACID Properties:
Desirable properties ensuring data integrity and reliability.
◦ Atomicity: “All or Nothing”. Ensures that if a failure occurs before commit, all changes are undone (rolled back).
◦ Consistency: A transaction transforms the database from one consistent state (all constraints met) to another. State might be inconsistent during the transaction.
◦ Isolation: Concurrent transactions do not interfere with each other. Each transaction appears to execute as if it were running alone. Prevents viewing uncommitted changes (“dirty reads”) and overwriting other transactions’ changes (“lost updates”).
◦ Durability: Once a transaction is committed, its changes are permanent and survive subsequent failures (system crashes, power loss, etc.). Guaranteed by logging and recovery mechanisms.
Concurrency Problems (due to insiffucient Isolation):
◦ Lost Update: A transaction’s update is overwritten by another concurrent transaction before the first one commits.
◦ Temporary Update (Dirty Read): A transaction reads data modified by another transaction that has not yet committed. If the first transaction then rolls back, the data read by the second transaction is invalid.
◦ Incorrect Summary (Non-repeatable Read variation): A transaction calculates an aggregate (like SUM) over data that is being modified by concurrent transactions, leading to an inconsistent result.
Transaction Schedules:
An ordering of operations from multiple concurrent transactions.
◦ Serial Schedule: All operations of one transaction execute consecutively before the next transaction starts. Guaranteed to produce a correct result.
◦ Serializable Schedule: A non-serial schedule that produces the same result as some serial schedule of the same transactions.
◦ Conflict: Two operations conflict if they are from different transactions, access the same item, and at least one is a write. Conflicts can lead to non-serializable schedules.
Concurrency Control:
Mechanisms to ensure serializability despite concurrent access.
◦ Locking: Acquiring locks on data items (tables, rows) before accessing them.
▪ Read (Shared - S) Lock: Multiple transactions can hold an R lock on the same item.
▪
Write (Exclusive - X) Lock: Only one transaction can hold a W lock; conflicts with both R and W locks.
◦ Two-Phase Locking (2PL): A protocol where a transaction acquires locks in an expanding phase and releases them in a shrinking phase, never acquiring new locks after releasing one. Guarantees serializability. Strict 2PL (most common) holds write locks until commit/rollback. Locking prevents lost updates, dirty reads, and incorrect summaries.
◦ Deadlock: Two or more transactions are in a waiting state, each waiting for a lock held by one of the others, creating a cycle. Detected by a wait-for graph (cycles indicate deadlock). Prevention methods include timestamp-based schemes (Wait-Die, Wound-Wait) or timeouts.
◦ Isolation Levels: SQL standard specifies levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable) that trade off performance for strictness. Higher levels prevent more concurrency problems. Read Committed prevents dirty reads, Repeatable Read prevents incorrect summaries. Serializable is the strictest.
Database Recovery:
Restoring the database to a consistent state after a failure, ensuring durability and atomicity. Necessary because main memory contents are lost on failure. Redo committed transactions, undo incomplete transactions.
◦ Transaction Log (System Log): A sequential file on disk recording all database changes. Contains entries like start/commit/rollback of transactions, write operations (with old/new values). Crucial for recovery.
◦ Commit Point: The point where all transaction operations ran and their effects are logged, ensuring durability. A commit record is logged.
◦ Write-Ahead Logging (WAL): Protocol requiring log entries (specifically UNDO information - BFIM) to be flushed to disk before the corresponding data changes (AFIM) are written to disk. Ensures recovery information is available.
◦ Buffer Flushing Strategies: When data buffers are written to disk.
▪ Steal (Immediate Update): DBMS can write modified uncommitted data to disk. Requires UNDO during recovery.
▪ No-Steal (Deferred Update): DBMS only writes modified data to disk on commit. NO-UNDO needed during recovery.
▪ Force: On commit, all transaction’s modified buffers must be flushed. NO-REDO needed.
▪ No-Force (Lazy Update): Flushing can occur after commit. REDO needed.
▪ DBMSs often use Steal + No-Force.
◦ Checkpoints: Periodic events that flush modified buffers to disk and record active transactions in the log. Reduces the amount of log scanning needed for recovery.
◦ Recovery Algorithms: Procedures run after failure. NO-UNDO/REDO for deferred update (redo committed transactions forward from checkpoint). UNDO/REDO for immediate update (undo active transactions backward, redo committed transactions forward from checkpoint).
◦ Backups: Full and incremental backups are needed for catastrophic failures (e.g., log disk failure). Recovery involves restoring backups and applying log data.
◦ Distributed Transactions: Involve operations on multiple sites. Requires protocols like Two-Phase Commit (2PC) to ensure atomicity and durability across sites.