Long Exam 2 - Distributed Systems Flashcards
What is a distributed system?
A collection of autonomous computing elements (nodes) that appears to its users as a single coherent system.
What is meant by a collection of autonomous nodes?
Each node acts independently and has its own notion of time; there is no global clock. This leads to fundamental synchronization and coordination problems.
What is an overlay network?
Each node in the collection communicates only with other nodes in the system (its neighbors).
What are the two types of overlay networks?
Structured (each node has a well-defined set of neighbors, e.g. in a tree or ring) and unstructured (each node's neighbors are randomly selected other nodes).
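As a rough illustration of the two overlay types, the sketch below builds neighbor sets for a ring (structured) and for a random overlay (unstructured). The function names and the parameter k are hypothetical, chosen only for this example.

```python
import random

def ring_overlay(nodes):
    """Structured: each node's neighbors are its predecessor and successor on a ring."""
    n = len(nodes)
    return {nodes[i]: {nodes[(i - 1) % n], nodes[(i + 1) % n]} for i in range(n)}

def random_overlay(nodes, k, seed=None):
    """Unstructured: each node picks k other nodes at random."""
    rng = random.Random(seed)
    return {
        node: set(rng.sample([m for m in nodes if m != node], k))
        for node in nodes
    }

nodes = ["a", "b", "c", "d", "e"]
ring = ring_overlay(nodes)                  # ring["a"] is {"e", "b"}
rand = random_overlay(nodes, k=2, seed=0)   # two random neighbors per node
```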
What are the four goals of a distributed system?
- sharing of resources
- distribution transparency
- openness
- scalability
What are the three types of scalability?
- size scalability
- geographical scalability
- administrative scalability
How to design fault-tolerant systems?
- Identify all possible faults
- Detect and contain the fault
- Handle the fault
What does the acronym RAID stand for?
Redundant Array of Inexpensive Disks
What is RAID 1?
- mirroring
- can recover from single-disk failure
- requires 2N disks
What is RAID 4?
- dedicated parity disk
- can recover from single-disk failure
- requires N+1 disks
- performance benefits if you stripe a single file across multiple data disks
- all writes hit the parity disk
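A minimal sketch of how a parity disk allows recovery from a single-disk failure, assuming parity is the bytewise XOR of the data blocks in a stripe. The block contents and the helper name xor_blocks are made up for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_disks = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_blocks(data_disks)

# Simulate losing disk 1: XOR of the survivors and the parity gives it back.
survivors = [data_disks[0], data_disks[2]]
rebuilt = xor_blocks(survivors + [parity])

# A small write to disk 1 must also update the parity block:
# new parity = old parity XOR old data XOR new data.
new_block = b"\xaa\xbb"
new_parity = xor_blocks([parity, data_disks[1], new_block])
```

The last step is why every write touches the parity disk: no data block can change without the parity block changing too.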
What is RAID 5?
- spread out parity
- can recover from single-disk failure
- requires N+1 disks
- performance benefits if you stripe a single file across multiple data disks
- writes are spread across disks
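To see what "spread out parity" can look like, here is a sketch of one simple rotating layout. Real arrays use several different rotation patterns, so the formula below is an assumption for illustration only.

```python
def parity_disk(stripe, num_disks):
    """Index of the disk holding parity for a stripe, under a simple rotation."""
    return (num_disks - 1 - stripe) % num_disks

# With 4 disks, the parity block moves to a different disk on each stripe.
layout = [parity_disk(s, 4) for s in range(8)]
```

Because the parity block rotates, parity updates for different stripes land on different disks instead of all hitting one dedicated disk as in RAID 4.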
What is isolation?
An action is isolated if it appears to occur either completely before or completely after every other concurrent action (thread).
What is the golden rule to achieve atomicity?
Never modify the only copy.
How to make renaming shadow copies atomic?
By using single-sector writes.
What is a shadow copy?
A shadow copy is a separate copy of the data on which updates are performed. Shadow copies work because all changes are made on the copy, which is then atomically installed as the new version using a single atomic operation.
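The shadow-copy idea can be sketched with the Python standard library: write the new contents to a temporary file, then install it with a rename. The function name atomic_write and the demo file names are hypothetical; os.replace is atomic on POSIX file systems.

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to a shadow copy, then atomically install it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, shadow = tempfile.mkstemp(dir=directory)  # shadow copy in the same directory
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # get the shadow copy onto disk first
        os.replace(shadow, path)      # the single atomic install step
    except BaseException:
        if os.path.exists(shadow):
            os.remove(shadow)         # the original file was never touched
        raise

demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "notes.txt")
atomic_write(demo_path, b"version 1")
atomic_write(demo_path, b"version 2")
```

A crash before os.replace leaves the old file intact, and a crash after it leaves the new file intact, which is the "never modify the only copy" rule in practice.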
What are the shortcomings of shadow copies?
- Hard to generalize to multiple files/directories
- Require copying the entire file for even small changes
- Do not address concurrency
What are transactions?
Transactions provide both atomicity and isolation. Each transaction will appear to have run to completion or not at all. When multiple transactions are run concurrently, it will appear as if they were run sequentially.
What are the three types of records used in a log?
UPDATE records include the old and new values of a variable. COMMIT records specify that a transaction committed. ABORT records specify that a transaction aborted.
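A minimal sketch of how the three record types might be used during recovery, assuming the log is a list of tuples and that only updates from committed transactions should survive. The tuple layout and the function name recover are assumptions for this example.

```python
def recover(log):
    """Rebuild variable values, applying only updates of committed transactions."""
    committed = {tid for kind, tid, *_ in log if kind == "COMMIT"}
    state = {}
    for kind, tid, *rest in log:
        if kind == "UPDATE" and tid in committed:
            var, old, new = rest
            state[var] = new
    return state

log = [
    ("UPDATE", 1, "A", 100, 80),
    ("UPDATE", 1, "B", 50, 70),
    ("COMMIT", 1),
    ("UPDATE", 2, "A", 80, 0),    # transaction 2 never committed
    ("UPDATE", 3, "B", 70, 75),
    ("ABORT", 3),                 # transaction 3 aborted
]
state = recover(log)
```

The old values are unused in this replay-only sketch; they matter when updates have already reached storage and have to be undone.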
What is the drawback of using cell storage for logging?
Writes are acceptable, but each update is written to disk twice (once to the log and once to cell storage) instead of once. Recovery is also slow because the entire log has to be scanned.
What is the drawback of using a cache for logging?
Recovery takes longer as the log grows. Truncating the log may help by flushing all cached updates to cell storage and writing a checkpoint record.
When do two operations conflict?
Two operations conflict if they operate on the same object and at least one of them is a write.
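The rule above translates almost directly into code. This is a small sketch; the Op type and its field names are hypothetical.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str    # transaction that issues the operation
    kind: str   # "read" or "write"
    obj: str    # object the operation touches

def conflicts(a, b):
    """Same object, and at least one of the two operations is a write."""
    return a.obj == b.obj and "write" in (a.kind, b.kind)

r1x = Op("T1", "read", "x")
r2x = Op("T2", "read", "x")
w2x = Op("T2", "write", "x")
w2y = Op("T2", "write", "y")
```

Here conflicts(r1x, w2x) is true, while conflicts(r1x, r2x) and conflicts(r1x, w2y) are both false.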
What is conflict serializability?
A schedule is conflict serializable if the order of all of its conflicts is the same as the order of those conflicts in some sequential schedule.
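One common way to check this is to build a conflict graph (an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj) and test it for cycles; an acyclic graph means the schedule is conflict serializable. The sketch below assumes a schedule is a list of (transaction, kind, object) tuples; all names are hypothetical.

```python
def conflict_edges(schedule):
    """Edges (Ti, Tj): an operation of Ti conflicts with a later operation of Tj."""
    edges = set()
    for i, (t1, k1, o1) in enumerate(schedule):
        for t2, k2, o2 in schedule[i + 1:]:
            if t1 != t2 and o1 == o2 and "write" in (k1, k2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """True if the conflict graph has no cycle."""
    edges = conflict_edges(schedule)
    nodes = {t for t, _, _ in schedule}
    while nodes:
        # Transactions with no incoming edge from a remaining transaction.
        free = {n for n in nodes
                if not any(dst == n and src in nodes for src, dst in edges)}
        if not free:
            return False   # every remaining transaction is part of a cycle
        nodes -= free
    return True

good = [("T1", "read", "x"), ("T1", "write", "x"),
        ("T2", "read", "x"), ("T2", "write", "x")]
bad = [("T1", "read", "x"), ("T2", "write", "x"), ("T1", "write", "x")]
```

In the second schedule T1's read precedes T2's write while T2's write precedes T1's write, so the conflicts cannot all be ordered the same way in any sequential schedule.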
What is two-phase locking?
- Each shared variable has a lock
- Before any operation on a variable, the transaction must acquire the corresponding lock
- After a transaction releases a lock, it may not acquire any other locks
What are the two phases in two-phase locking?
- Acquire phase: the transaction acquires locks. New locks can be acquired but none can be released.
- Release phase: the transaction releases locks. Existing locks can be released but no new locks can be acquired.
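The two phases can be enforced with a small wrapper around ordinary locks. This is a sketch under simplifying assumptions (one exclusive lock per variable, no deadlock handling); the class name TwoPhaseTxn is hypothetical.

```python
import threading

class TwoPhaseTxn:
    """Tracks one transaction's locks and rejects acquires after the first release."""

    def __init__(self, locks):
        self.locks = locks        # dict: variable name -> threading.Lock
        self.held = []
        self.releasing = False    # becomes True once the release phase starts

    def acquire(self, var):
        if self.releasing:
            raise RuntimeError("2PL violation: acquire after release")
        if var not in self.held:
            self.locks[var].acquire()
            self.held.append(var)

    def release(self, var):
        self.releasing = True
        self.held.remove(var)
        self.locks[var].release()

    def release_all(self):
        for var in list(self.held):
            self.release(var)

locks = {"x": threading.Lock(), "y": threading.Lock()}
txn = TwoPhaseTxn(locks)
txn.acquire("x")
txn.acquire("y")      # acquire phase
txn.release("x")      # release phase begins
try:
    txn.acquire("x")  # not allowed any more
    violated = False
except RuntimeError:
    violated = True
txn.release_all()
```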