Distributed Systems Flashcards
(96 cards)
What are the advantages of organizing concurrent computation as threads instead of processes?
Low overhead on creation
Low overhead for context switching
Allows for easier sharing of resources
What are the advantages of organizing concurrent computation as processes instead of threads?
Allows for scalability and fault tolerance (processes are more independent: they do not share an address space, so a crash in one does not corrupt the others, and they can be spread across machines)
What is involved in converting a single, multi-threaded application into a collection of single-threaded processes? What could change about the application’s implementation and why?
- Resources that the threads shared (memory, open files) are no longer shared between processes, so access to them has to be managed explicitly.
- Communication changes from shared memory to pipes/sockets. Code that relied on threads reading and writing the same data must be restructured around message passing.
- Startup overhead of multiple processes, and how many can run at once, will need to be considered.
- The method of coordinating concurrent computation may need to change (barriers, signals, etc. instead of waiting on threads to finish).
Describe some low-level message framing strategies.
Fixed length
- you know the exact size of each message
- can easily determine if you have failed to receive the entire message
- could increase network traffic: payloads larger than the fixed size must be split across several messages, and smaller ones must be padded
Variable length
- allows for more message flexibility
- need a way to determine the end of the message
- may still need to send multiple messages if a message is larger than the network or the receiving component allows
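One common way to mark the end of a variable-length message is a length prefix. The sketch below is a minimal illustration, assuming a 4-byte big-endian length header; the names `frame` and `deframe` are hypothetical helpers, not part of any library.

```python
import struct

# Assumed header format: 4-byte unsigned length in network (big-endian) order.
HEADER = struct.Struct("!I")


def frame(payload: bytes) -> bytes:
    """Prefix the payload with its length so the receiver knows where it ends."""
    return HEADER.pack(len(payload)) + payload


def deframe(buffer: bytes):
    """Split a byte buffer into complete messages plus any unfinished remainder."""
    messages = []
    offset = 0
    while len(buffer) - offset >= HEADER.size:
        (length,) = HEADER.unpack_from(buffer, offset)
        start = offset + HEADER.size
        if len(buffer) - start < length:
            break  # the rest of this message has not arrived yet
        messages.append(buffer[start:start + length])
        offset = start + length
    return messages, buffer[offset:]
```

The remainder returned by `deframe` would be kept and prepended to the next bytes read from the socket, which is how a partially received message is detected and completed.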
What are quorum-based decisions?
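Decisions that require a minimum number of votes (a quorum) before a distributed operation is carried out. Typically the quorum is more than half of the nodes in the system, so that two conflicting decisions cannot both gather enough votes.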
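As a small worked example of the "more than half" rule, assuming a simple majority quorum (the function names are illustrative only):

```python
def majority_quorum(node_count: int) -> int:
    """Smallest number of votes that is strictly more than half the nodes."""
    return node_count // 2 + 1


def has_quorum(votes_for: int, node_count: int) -> bool:
    """True when enough nodes voted for the operation to proceed."""
    return votes_for >= majority_quorum(node_count)
```

With 5 nodes the quorum is 3; with 4 nodes it is also 3, so an even split of 2 and 2 cannot proceed on either side.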
What is a race condition?
The system’s behavior depends on the relative timing or interleaving of concurrent operations. This can lead to inconsistent results: the same inputs produce different outputs depending on the order in which the operations happen to execute.
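A minimal sketch of the classic lost-update race. Real races are nondeterministic, so `lost_update` replays one bad interleaving by hand rather than relying on thread timing; `locked_updates` shows the usual fix with a lock. Both functions and the numbers in them are illustrative assumptions.

```python
import threading


def lost_update() -> int:
    """Replay one bad interleaving of two read-modify-write steps by hand."""
    balance = 100
    read_a = balance        # worker A reads 100
    read_b = balance        # worker B reads 100 before A has written
    balance = read_a + 10   # A writes 110
    balance = read_b + 10   # B also writes 110, so A's update is lost
    return balance


def locked_updates(workers: int = 4, rounds: int = 1000) -> int:
    """Run increments under a lock so every update is kept."""
    total = 0
    lock = threading.Lock()

    def work():
        nonlocal total
        for _ in range(rounds):
            with lock:  # only one thread may read-modify-write at a time
                total += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

Two deposits of 10 should give 120, but the replayed interleaving ends at 110. With the lock, the final count always equals `workers * rounds`.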
What is the difference between two-phase and strict two-phase locking?
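Both acquire a lock on a data item before accessing it, and both split a transaction into a growing phase (locks are only acquired) and a shrinking phase (locks are only released); once a transaction has released any lock it may not acquire another. In plain two-phase locking, locks may be released as soon as the transaction is done with each item. Strict two-phase locking holds the (exclusive) locks until the transaction commits or aborts, which prevents other transactions from reading uncommitted data and so avoids cascading aborts.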
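The two locking rules can be sketched as a small rule checker. This is an illustration only: it tracks which locks one transaction holds and rejects calls that break the protocol, it does not model contention between transactions, and it treats every lock as exclusive. The `Transaction` class is hypothetical.

```python
class Transaction:
    """Checks the two-phase locking rules for a single transaction."""

    def __init__(self, strict: bool = False):
        self.strict = strict
        self.held = set()
        self.shrinking = False   # becomes True after the first release
        self.finished = False    # becomes True at commit

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL: cannot acquire a lock after releasing one")
        self.held.add(item)

    def release(self, item):
        if self.strict and not self.finished:
            raise RuntimeError("strict 2PL: locks are held until commit or abort")
        self.shrinking = True
        self.held.discard(item)

    def commit(self):
        # At commit every remaining lock is released in one step.
        self.finished = True
        for item in list(self.held):
            self.release(item)
```

A plain transaction may call `release` early but can then no longer call `acquire`; a strict transaction can only give up its locks through `commit`.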
How would switching from proxy to broker simplify an application?
It would not simplify the application, and it could make it more complicated. A broker is more generic than a proxy, and while it could allow more flexibility for messages within the system, that flexibility is typically more difficult to implement. You could potentially gain functionality if moving from a single proxy and nothing else to a single broker and nothing else.
How would switching from broker to proxy simplify an application?
Less flexibility in messaging would make the component’s interfaces easier to implement. The proxy is all about access control so that is all it is required to do. You could potentially lose functionality if moving from a single broker with nothing else to a single proxy and nothing else.
What is the CAP theorem?
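You cannot have all three of the following in a distributed system at the same time; when a network partition occurs, the system must give up either consistency or availability
Consistency: Every read gets the most recent write or an error
Availability: Every request receives a non-error response, without a guarantee that it contains the most recent write
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes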
What are some issues with network-based communication?
Performance - latency and data transfer rate
Scalability - what hardware/software is needed to scale
Security - messages cross links that are not trusted, so they can be intercepted, modified, or forged
What is RPC?
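Remote Procedure Call: a procedure is invoked on another machine as if it were a local call, with the parameter passing and messaging hidden from the caller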
What are some issues with RPC?
- Call by value or call by reference?
By value - a copy is passed
By reference - a reference (address) to the caller's value is passed, which is not meaningful in another machine's address space
- Byte ordering for parameters (big or little endian)
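Because arguments have to be marshalled into a message, an RPC parameter effectively behaves like call by value. The sketch below simulates that with JSON as the wire format; there is no real network here, and `remote_append` and `call_by_value` are hypothetical stand-ins for a server procedure and a client stub.

```python
import json


def remote_append(marshalled: str) -> str:
    """Stand-in for the server side: it only ever sees a decoded copy."""
    items = json.loads(marshalled)
    items.append("added by server")
    return json.dumps(items)


def call_by_value(items: list) -> list:
    """Stand-in for the client stub: marshal, 'send', then unmarshal the reply."""
    reply = remote_append(json.dumps(items))  # the argument is copied into the request
    return json.loads(reply)
```

The caller's list is unchanged after the call; the server's modification is only visible in the returned copy.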
What is the difference between big- and little-endian?
Big endian - the most significant byte is stored first (at the lowest address)
Little endian - the least significant byte is stored first (at the lowest address)
Which byte orderings are important in distributed systems?
Network order - always big endian
Host byte order - depends on the machine
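The byte orders can be seen directly with Python's standard `struct` module. The value `0x0A0B0C0D` is just an example chosen so that each byte is easy to spot.

```python
import struct
import sys

value = 0x0A0B0C0D

big = struct.pack(">I", value)      # big endian: most significant byte first
little = struct.pack("<I", value)   # little endian: least significant byte first
network = struct.pack("!I", value)  # network order: always big endian
host = struct.pack("=I", value)     # host order: depends on sys.byteorder

# A receiver decodes with the same network-order format, whatever its host order.
(decoded,) = struct.unpack("!I", network)
```

Here `big` is the bytes `0A 0B 0C 0D`, `little` is `0D 0C 0B 0A`, and `host` matches one or the other depending on the machine.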
Access point definition
a means of access to an entity
Address definition
a location for an access point
Name definition
a string that references an entity
What is flat naming?
Names are unstructured bit strings that have no obvious connection to their referents
Example
MAC address
What is structured naming?
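Names composed of multiple simpler names, arranged in a hierarchy (a name space)
Example
IP addresses (network part plus host part); DNS names and file system paths are other common examples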
What is attribute-based naming?
Names derived from the attributes of their referents (directory service). Given a service category, find a node that provides that service.
Example
LDAP
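Attribute-based lookup can be sketched as a search over (attribute, value) pairs. This is a toy in-memory directory, not LDAP; the entries, addresses, and the `find_nodes` helper are all made up for illustration.

```python
# Hypothetical directory entries: each node is described by attributes.
DIRECTORY = [
    {"service": "printer", "location": "floor-1", "address": "10.0.0.5"},
    {"service": "printer", "location": "floor-2", "address": "10.0.0.9"},
    {"service": "storage", "location": "floor-1", "address": "10.0.0.7"},
]


def find_nodes(directory, **wanted):
    """Return the addresses of all entries whose attributes match every wanted value."""
    return [
        entry["address"]
        for entry in directory
        if all(entry.get(key) == value for key, value in wanted.items())
    ]
```

For example, `find_nodes(DIRECTORY, service="printer")` returns both printer addresses, and adding `location="floor-2"` narrows the result to one.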
How to provide fault tolerance in a distributed system?
Exploit redundancy
Hot swap: a backup that is kept up to date with the current component and is ready to take over on failure
Distributed information: one or more components can fail but the system can be reconstructed from the remaining components
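Reconstructing lost information from the remaining components can be illustrated with XOR parity. This is one possible redundancy scheme chosen for the example, not the only one; it assumes equal-length blocks and tolerates the loss of a single block. `xor_blocks` is a hypothetical helper.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, position by position."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Two data blocks stored on two nodes, with the parity stored on a third.
block_a = b"abcd"
block_b = b"wxyz"
parity = xor_blocks(block_a, block_b)

# If the node holding block_a fails, it can be rebuilt from the other two.
recovered_a = xor_blocks(parity, block_b)
```

XOR-ing the parity with the surviving block cancels that block out and leaves the lost one.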
What is page-based DSM?
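Distributed shared memory based on paging: the shared address space is divided into pages, and a page fault on a page that is not held locally causes it to be fetched from the node that holds it.
Why use DSM over RPC?
- Entire distributed system is logically one multi-threaded application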
- No need to worry about passing parameters from machine to machine