Data Intensive Ch7 - Transactions Flashcards
Why we need transactions - what can go wrong
- Database software/hardware can fail in the middle of write
- App can crash in the middle of a series of operations (e.g. due to running out of memory)
- network interruption can cut off app from db
- several clients can write to db at the same time and overwrite each other's writes
- Client may read some partial, unfinished update
- Race conditions between concurrent client writes can cause bugs
Transaction
Mechanism to simplify handling “what can go wrong” scenarios
Way for an application to group several reads & writes into one logical unit - all operations are executed as one operation which either
- succeeds (commit)
- fails (abort, rollback)
If failed - app can safely retry
The app need not worry about partial failure
"A mechanism that lets you not worry - e.g. when you must perform a series of operations, such as subtracting an amount from account A and adding it to account B - that a catastrophe will strike in the middle"
“Simplify the programming model for apps accessing db”
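The account-transfer scenario from the quote above, as a minimal Python/SQLite sketch (the `accounts` table and amounts are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 0)])
db.commit()

try:
    with db:  # the with-block is one transaction: commit on success, rollback on error
        db.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'A'")
        raise RuntimeError("catastrophe in the middle")  # simulated crash
        db.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'B'")
except RuntimeError:
    pass

# The partial write was rolled back -- all or nothing.
balances = dict(db.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'A': 100, 'B': 0}
```

Because the first UPDATE was aborted together with the transaction, no money vanished and the app can safely retry.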
ACID
Set of safety guarantees provided by transactions
Now a marketing term, because one implementation might not equal another (isolation is often ambiguous)
Atomicity
Atomic - cannot be broken down into smaller parts
NOT about concurrency (other client sees half finished transaction)!
It’s about “All or nothing” - ability to abort transaction on error and having all writes discarded
More about abortability than atomicity
Consistency
App-specific notion of DB being in “good-state”
Idea - the data has certain statements (INVARIANTS) that must always be true (credits and debits always balance)
It's the app's responsibility to define constraints; the DB only enforces them and stores the data
It's actually a property of the application
Joe Hellerstein remark - C in ACID was tossed in to make the acronym work
Isolation
Concurrency issues arise if several clients access the same record at the same time
Concurrently executing transactions cannot “step on each other’s toes”
Serializability - each transaction can pretend it's the only one running on the entire DB; the DB ensures that on commit the final result is the same as if they actually ran one after another, even if that's not true
In practice - it’s expensive so weaker isolation levels are used
Durability
p. 226
Promise the committed transaction will not lose any data even if failure occurs
Usually involves using write-ahead log (in case data structures on disk are corrupted)
In replicated db usually means that value was replicated to N nodes
Perfect durability is not possible - what if all disks and backups are destroyed at the same time?
Dirty read
p. 229
Violation of isolation
One trx reads another's uncommitted writes
- Trx updates multiple objects - a dirty read means another trx may see only some of them updated (user sees the unread email but not the updated counter)
- Trx aborts - all its writes are rolled back - with dirty reads, another trx may have seen data that was never actually committed
Example: listing unread emails and storing counter of unread emails separately
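A sketch of the no-dirty-reads guarantee using SQLite in WAL mode, where readers see only the last committed state (the `emails` schema is invented for the example):

```python
import os
import sqlite3
import tempfile

# Two connections need a shared on-disk database.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit; BEGIN manually
writer.execute("PRAGMA journal_mode=WAL")             # readers see committed snapshots
writer.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, read INTEGER)")
writer.execute("INSERT INTO emails (read) VALUES (1), (1)")

reader = sqlite3.connect(path, isolation_level=None)

# Writer starts a transaction and adds an unread email without committing.
writer.execute("BEGIN")
writer.execute("INSERT INTO emails (read) VALUES (0)")

# The reader only sees committed data: no dirty read of the third row.
before = reader.execute("SELECT COUNT(*) FROM emails").fetchone()[0]

writer.execute("COMMIT")
after = reader.execute("SELECT COUNT(*) FROM emails").fetchone()[0]
print(before, after)  # 2 3
```

The uncommitted row only becomes visible to the reader after the writer commits.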
Why retrying aborted transaction is not perfect error handling mechanism
- Trx succeeds but the network fails to deliver the ACK - the client thinks the trx failed, and the retry causes it to be performed twice (requires app-level de-duplication, e.g. unique ids for trxs)
- if error was due to overload retrying will only add fuel to the fire (use exponential backoff and limit retries)
- some retries are pointless (constraint violation)
- if trx has side effects outside of db they can happen even if failed trx was aborted (sending emails)
- client process fails during retrying - data it was trying to write is lost anyway…
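A sketch of retrying with exponential backoff, jitter, and a retry limit, per the "fuel to the fire" point above (function names are made up; the other caveats - duplicates, side effects, pointless retries - still apply):

```python
import random
import time

def run_with_retry(txn_fn, max_retries=3, base_delay=0.01):
    """Retry an aborted transaction with exponential backoff and jitter.

    Blind retries are unsafe for non-idempotent side effects and pointless
    for deterministic failures like constraint violations."""
    for attempt in range(max_retries + 1):
        try:
            return txn_fn()
        except Exception:
            if attempt == max_retries:
                raise  # retry budget exhausted - don't add fuel to the fire
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))

calls = []
def flaky_transfer():  # fails twice with a transient error, then commits
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("transaction aborted: deadlock detected")
    return "committed"

result = run_with_retry(flaky_transfer)
print(result)  # committed
```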
Why concurrency bugs are hard to find by testing (so you'd better use appropriate isolation levels)
They are rare - you need to be unlucky with the timing. Hard to repro.
It’s hard to reason about concurrency
Weak isolation levels
Non-serializable isolation levels used in practice when serializable is too costly in performance
Read committed
The most basic isolation level
No dirty reads (see only committed data)
No dirty writes (can only overwrite committed data)
Default setting in Postgres and many other databases
Implementation of read committed:
Against dirty writes - most commonly - row-lvl locks held until trx is committed/aborted; other trx must wait
Most DBs use MVCC as they support snapshot-isolation anyway
Dirty reads - could use the same lock, acquiring and releasing it around each read (the read can't happen while the object is dirty)
Doesn’t work well - short, read-only trx can get stuck waiting for long running trx to complete;
Most DBs return old committed value until trx holding the lock commits.
No dirty writes
For concurrent updates to the same object we can assume the later write overwrites the earlier. What if the earlier write is part of a not-yet-committed trx, though - overwriting it is a dirty write. Prevention - delay the later write until the trx holding the first write commits
- trx updating multiple objects - Alice and Bob both buy the same car, which requires 2 db writes (listing + invoice). With dirty writes the updates can interleave: Bob wins at the listing, Alice at the invoice (see figure 7-5, p. 236)
- Read committed does not prevent race condition for counter increment
Fun fact: in PG, read committed reads the latest committed value
https://www.postgresql.org/files/developer/concurrency.pdf
What is read skew
Example of a nonrepeatable read.
Can happen at the read committed isolation level.
Cannot be tolerated for:
- backups
We make a copy of the entire DB. During the process writes are still accepted. It's not acceptable to have some parts of the db in the old state and some in the new - after a restore, the snapshot would be inconsistent.
- Analytical queries, integrity checks
Must see consistent database snapshot or will report invalid results
For the above: snapshot isolation is recommended
Example:
Alice has 2 accounts, 500 each, 1000 total.
Some trx transfers 100 from one to another.
Alice is unlucky enough to read the balance of one account before the transfer trx starts and the other's after it's done. So she sees 900 dollars total (or 1100, depending on whether the "from" or the "to" account is read first).
How to implement snapshot isolation lvl?
dirty writes:
Write locks to prevent
Dirty reads:
DB keeps several different committed versions of an object/row
Transactions see consistent db state at point of time snapshot was taken
Multi version concurrency control MVCC
MVCC
Multi version concurrency control
How implemented in PG:
1. Transaction is given unique, always-increasing ID
2. When trx writes to DB data is tagged with ID of writer
3. created_by internal field for each row in table
4. deleted_by internal field for each row in table, used for deletes - initially empty; when a delete happens, the ID of the requesting trx is stored there; garbage collection removes rows marked for deletion once they're no longer accessible to any trx
5. Update is translated into delete & create
see figure 7-7, p. 240
Trivia: Postgres transaction IDs
- Unique, always-increasing 32-bit integers
- Overflow after ~4 billion transactions
- PG's vacuum process cleans up to prevent wraparound from affecting the data
Snapshot isolation visibility rules for a consistent snapshot
When trx starts - get list of all concurrent trx which are in progress.
Their later-committed changes will be ignored
Writes made by aborted trx are ignored.
Writes made by trx with greater trx id are ignored.
Other writes visible.
If a row was deleted by a not-yet-committed trx, or by one that started later - the row is still visible to the running trx
If a row was created by a not-yet-committed trx, or by one that started later - the row is not visible
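A toy Python model of the visibility rules above (simplified: it assumes txids are handed out in start order and ignores details like subtransactions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Version:
    value: str
    created_by: int                   # txid that created this row version
    deleted_by: Optional[int] = None  # txid that deleted it (update = delete + create)

def visible(v, my_txid, committed, in_progress_at_start):
    """Decide whether a row version is visible to transaction my_txid."""
    def effects_visible(txid):
        if txid == my_txid:
            return True                           # our own writes are visible
        return (txid in committed                 # aborted trx -> ignored
                and txid not in in_progress_at_start  # concurrent trx -> ignored
                and txid < my_txid)               # later trx -> ignored
    if not effects_visible(v.created_by):
        return False                              # creator invisible -> row invisible
    return v.deleted_by is None or not effects_visible(v.deleted_by)

# txid 7 updated the row (delete old + create new) and committed, but it was
# still in progress when txid 9 took its snapshot -> 9 still sees the old version.
versions = [Version("old", created_by=5, deleted_by=7), Version("new", created_by=7)]
seen = [v.value for v in versions
        if visible(v, my_txid=9, committed={5, 7}, in_progress_at_start={7})]
print(seen)  # ['old']
```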
Dealing with index updates with MVCC
- Index points to all versions of an object
Index query filters out versions not visible to current trx
Garbage collection removes object versions that are no longer visible to any trx, along with their index entries
Optimizations exist - if different versions of an object fit on the same page, write them there -> no need to update the index
- Append-only/copy-on-write B-Trees
Creates a copy of each modified page
Parent pages up to the root are copied and updated
Pages not affected are left untouched
Every trx creates new B-tree roots
Background process for compaction/GC is required
Lost updates
Concurrent write-write conflict
Read value -> modify it -> write back
Incrementing a counter
- Easy - atomic write operators
Dataset Keywords example
-> Adding an element to a list in a JSON document (which a jsonb column essentially is)
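A minimal SQLite sketch of the lost-update anomaly on a counter, next to the atomic-increment fix (the `counters` table is invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counters (id INTEGER PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES (1, 0)")

# Unsafe read-modify-write: two clients read 0, both write back 1,
# so one increment is lost.
v1 = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()[0]
v2 = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()[0]
db.execute("UPDATE counters SET value = ? WHERE id = 1", (v1 + 1,))
db.execute("UPDATE counters SET value = ? WHERE id = 1", (v2 + 1,))
lost = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()[0]

# Atomic write operator: the increment happens inside the database.
db.execute("UPDATE counters SET value = 0 WHERE id = 1")
db.execute("UPDATE counters SET value = value + 1 WHERE id = 1")
db.execute("UPDATE counters SET value = value + 1 WHERE id = 1")
ok = db.execute("SELECT value FROM counters WHERE id = 1").fetchone()[0]
print(lost, ok)  # 1 2
```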
Lost updates prevention
Atomic write operators
update set x = x + 1 where …; usually works
Explicit locking
Select for update
Auto detection of lost updates
Allow trxs to execute in parallel; abort a trx whose commit would cause a lost update. The client must retry the read-modify-write cycle
PG's snapshot isolation (repeatable read) does this
Compare-and-set
Update only happens if value to compare (which update was based on) is still current value
Conflict resolution, replication
Siblings (multiple conflicting versions returned to client)
Last write wins
Auto-merging (increment counters, Riak)
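The compare-and-set option above, sketched with SQLite (the `pages` table is invented; note that CAS can still be unsafe if the database evaluates the WHERE clause against an old snapshot):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE pages (id INTEGER PRIMARY KEY, content TEXT)")
db.execute("INSERT INTO pages VALUES (1, 'v1')")

def cas_update(conn, page_id, expected, new):
    # The write succeeds only if the value the edit was based on is still current.
    cur = conn.execute(
        "UPDATE pages SET content = ? WHERE id = ? AND content = ?",
        (new, page_id, expected))
    return cur.rowcount == 1  # 0 rows matched -> someone else changed it first

first = cas_update(db, 1, "v1", "v2")   # True: content was still 'v1'
second = cas_update(db, 1, "v1", "v3")  # False: stale expectation, write skipped
print(first, second)  # True False
```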
Write skew
Example:
2 doctors on call.
Both take a leave at the same time.
A consistency check ("at least 1 doctor on call") won't work, as each trx checks it against its own snapshot
Other examples:
- booking a room (select bookings for the room in given period of time and update booking if one does not exist)
- Claiming a username (the same unique username taken by 2 users concurrently)
Not a lost update or dirty write - two separate rows are modified
Each trx reads some objects and modifies some of them (different trxs modify different objects)
Usually there are no multi-object constraints to catch this (they can be simulated with a trigger or a materialized view)
Serializable isolation level is one remedy
An explicit lock on the rows involved is the second-best option
Generalization pattern:
- select made to check some condition on multiple matching rows
- app code business logic based on the result
- insert/update/delete if app goes ahead
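One way to close the write-skew window from the doctors example is to serialize the whole check-then-act sequence; a sketch using SQLite's BEGIN IMMEDIATE, which takes the write lock before the check (schema invented, standing in for row-level SELECT FOR UPDATE locking):

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; BEGIN manually
db.execute("CREATE TABLE doctors (name TEXT PRIMARY KEY, on_call INTEGER)")
db.executemany("INSERT INTO doctors VALUES (?, 1)", [("alice",), ("bob",)])

def request_leave(conn, name):
    # BEGIN IMMEDIATE takes the write lock *before* the select, so the whole
    # check-then-act sequence is serialized against other writers.
    conn.execute("BEGIN IMMEDIATE")
    on_call = conn.execute(
        "SELECT COUNT(*) FROM doctors WHERE on_call = 1").fetchone()[0]
    if on_call >= 2:  # invariant: at least one doctor must stay on call
        conn.execute("UPDATE doctors SET on_call = 0 WHERE name = ?", (name,))
        conn.execute("COMMIT")
        return True
    conn.execute("ROLLBACK")
    return False

alice_ok = request_leave(db, "alice")  # True: two doctors on call
bob_ok = request_leave(db, "bob")      # False: Bob would leave nobody on call
print(alice_ok, bob_ok)  # True False
```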
Phantoms causing write skew
A write skew where the select returns no matching rows, and in the meantime a new row that would have matched is added.
What’s phantom:
One trx changes the result of another's search query.
Snapshot isolation prevents phantoms in read-only queries; with read-write queries there's still write-skew potential
Issues with isolation levels
Hard to understand and INCONSISTENTLY IMPLEMENTED in different DBs (repeatable read is ambiguous)
In large apps it’s difficult to tell if running given trx at given isolation level is safe (other business processes happen concurrently and might screw it over)
Static analysis may help but it’s meh
Testing for concurrency - hard and nondeterministic
Serializable isolation is the only foolproof option