622 Flashcards
Eight pleasant thoughts
-The network is reliable
-The network is secure
-The network is homogeneous
-The topology does not change
-Latency is zero
-Bandwidth is infinite
-Transport cost is zero
-There is one administrator
What is a distributed system? [Tanenbaum]
A collection of independent computers that appears to its users as a single coherent system
Three key characteristics [Tanenbaum]
Multiple machines are autonomous
Software lets users see a single system
System easy to expand without user noticing
What is a distributed system? [Webopedia]
A type of computing in which different components and objects comprising an application can be located on different computers connected to a network.
Key requirement: a set of standards that specify how objects communicate with one another (e.g., CORBA, DCOM, REST, …).
What is distributed computing? [Wikipedia]
Decentralized and parallel computing, using two or more computers communicating over a network to accomplish a common objective or task.
Note: The types of hardware, programming languages, operating systems, and other resources may vary drastically. It is similar to computer clustering, with the main difference being the wide geographic dispersion of the resources.
Challenges of DS
- latency of communication
- coordination
- shared resources and mutual exclusion
- ordering, deadlock, and livelock
- timing
- adaptation to change
- failures, soft faults, and optimization
- service discovery and configuration
- heterogeneity and third-party software
- scalability and evolution
- security and privacy
- trust in machines, software, communications & other users
Advantages of DS
- processing capacity
- fault tolerant, evolving, scalable
- explicit control, preferences
Replicas
Often useful to have the same task performed by multiple components, so that all of them must fail for the task to fail
What if data must be shared between components?
Often one component is the “master” (aka the “original” or “authoritative version”)
The other components are copies of the master
Mastership may be apportioned, e.g., a component may be the master for just some portion like “names A-K” (see the sketch after this card)
Confusion in counting the number of “replicas”:
Some might not include the “master” in the count of replicas
Some might include the “master” (“replicas of each other”)
Make sure you know if the “master” is included
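A minimal sketch of apportioned mastership in Python; the key ranges and node names here are hypothetical:

# Each node is the master (authoritative copy) for one alphabetical
# slice of the names.
PARTITIONS = [("A", "K", "node-1"), ("L", "Z", "node-2")]

def master_for(name):
    first = name[0].upper()
    for low, high, node in PARTITIONS:
        if low <= first <= high:
            return node
    raise KeyError(name)

print(master_for("Alice"))    # -> node-1
print(master_for("Mallory"))  # -> node-2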
How to solve for the number of replicas
If we assume independence and:
F = probability that one replica fails in the time period, 0 < F < 1
n = (natural) number of components (e.g., replicas including the master); thus F^n is the probability that all n fail simultaneously
G = goal, the permitted probability of total system failure where all n replicas fail (including the original)
then we need F^n ≤ G; since log F < 0, dividing by log F flips the inequality:
n ≥ (log G)/(log F)
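A small Python sketch of this calculation (the function name is mine, not from the source):

import math

def replicas_needed(failure_prob, goal):
    # Smallest natural n with failure_prob**n <= goal, assuming
    # independent failures; log F < 0, so the inequality flips.
    assert 0 < failure_prob < 1 and 0 < goal < 1
    return math.ceil(math.log(goal) / math.log(failure_prob))

# Example: each replica fails with probability 0.5 per period, and the
# whole system may fail with probability at most 0.01:
print(replicas_needed(0.5, 0.01))  # -> 7, since 0.5**7 ≈ 0.0078 ≤ 0.01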
Independence assumption
These calculations assume independent failures
Reasonable model for many hardware failures
Software failures often not independent
Knight & Leveson [1986] found via experiment that software faults are not independent
Thus “N-version programming” doesn’t lead to the reliability increase you might predict
It can be helpful, but less than you’d think
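A toy calculation of why correlated faults break the prediction; the probabilities are invented for illustration, not Knight & Leveson's measurements:

# 3 independent versions, each failing with probability 0.01, would all
# fail together with probability 0.01**3, about 1e-6.
predicted = 0.01 ** 3
# If instead some inputs trigger the *same* fault in every version (say
# a common-mode fault hit with probability 1e-3, a made-up figure), the
# ensemble fails at least that often: 1000x worse than predicted.
actual_lower_bound = 1e-3
print(predicted, actual_lower_bound)  # ≈ 1e-06 vs 0.001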
Caching: Special case of replication
Make a copy (or copies) of a resource (data)
Often happens on demand
Other replication approaches are often planned & executed in advance
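A minimal sketch of on-demand copying in Python; fetch_from_master is a hypothetical stand-in for a remote read:

master = {"names A-K": "example data A-K"}

def fetch_from_master(key):
    return master[key]          # stand-in for a (slow) remote read

cache = {}

def read(key):
    if key not in cache:
        cache[key] = fetch_from_master(key)   # copy made on demand
    return cache[key]

print(read("names A-K"))   # first read populates the cache
print(read("names A-K"))   # later reads are served locally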
Challenges of DS example: replication has downsides
buy more hardware
administration costs
software upgrades
load balancing
performance overhead
more complex software
consistency problems (sometimes tolerable)
hiding access
hide differences in data representation and how a resource is accessed
(conversion of complex formats; latency vs. fidelity of access)
hiding location
hide where a resource is located (trusted hosts, different performance, different capabilities and network access)
migration
hide that a resource may move to another location (trusted hosts, different performance, different capabilities and network access)
relocation
hide that a resource may be moved to another location while in use (trusted hosts, different performance, different capabilities and network access)
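A minimal sketch of how a naming indirection provides location, migration, and relocation transparency (Python; the names and addresses are hypothetical):

# Clients use a stable logical name; a directory maps it to the current
# physical address, so the resource can move without clients changing.
directory = {"user-db": "host-a.example.com:5432"}

def connect(logical_name):
    address = directory[logical_name]   # the lookup hides the location
    return f"connected to {address}"

print(connect("user-db"))
directory["user-db"] = "host-b.example.com:5432"   # resource migrated
print(connect("user-db"))                          # same name, new host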
replication
hide the fact that several copies of a resource exist (select server based on QoS)
concurrency
hide that a resource may be shared by several competitive users (cannot hide sharing of resources: they're consumed, data is modified by others)
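A minimal sketch of concurrency transparency via mutual exclusion (Python; the counter stands in for any shared resource):

import threading

counter = 0
lock = threading.Lock()

def deposit(amount):
    global counter
    with lock:             # serializes access: each user sees the
        counter += amount  # resource as if it were unshared

threads = [threading.Thread(target=deposit, args=(1,)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # -> 100, no lost updates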
failure
hide the failure and recovery of a resource (unexplained behavior)
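A minimal sketch of masking transient failure with retries (Python; flaky_fetch is a hypothetical unreliable call — the caller sees only extra latency):

import random, time

random.seed(0)   # for reproducibility of the sketch

def flaky_fetch():
    if random.random() < 0.5:
        raise ConnectionError("transient failure")
    return "data"

def fetch(attempts=5):
    for i in range(attempts):
        try:
            return flaky_fetch()
        except ConnectionError:
            time.sleep(0.1 * 2 ** i)   # back off, then retry
    raise ConnectionError("still failing after retries")

print(fetch())  # -> "data", failures hidden unless all attempts fail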
persistence
hide whether a (software) resource is in memory or on disk (someone needs to decide whether an object is persistent and commit it to disk)
awareness and adaptation
separate decisions from (controllable) mechanisms
Some measures (per Neumann) for system scalability
- Size: users & resources
- Geographical: users & resources may lie far apart
- Administrative: may span many independent administrative organizations
Decentralized algorithms
No machine has complete information about the system state.
Machines make decisions based only on local information.
Failure of one machine does not ruin the algorithm.
There is no implicit assumption that a global clock exists.
Asynchronous communication
Hiding communication latency is important for geographical scalability
Max speed is the speed of light in a vacuum (~3.00×10^8 m/s)
Information transfer through a material medium is normally slower
Physical components add other performance latencies
Software takes time to execute once it receives data
Sending information and waiting for the reply is synchronous communication
Alternative: asynchronous – send the information, don't wait for the reply
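A minimal sketch contrasting the two styles (Python; slow_service is a hypothetical stand-in for a remote call):

import time
from concurrent.futures import ThreadPoolExecutor

def slow_service(x):
    time.sleep(1)     # stand-in for network + processing latency
    return x * 2

# Synchronous: send and wait; the caller is blocked for the full latency.
result = slow_service(21)

# Asynchronous: send, keep working, collect the reply later.
with ThreadPoolExecutor() as pool:
    future = pool.submit(slow_service, 21)
    # … do other useful work while the request is in flight …
    result = future.result()   # block only when the reply is needed
print(result)  # -> 42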