Chapter 1: Introduction, Motivation & Overview Flashcards

1
Q

Working definition for this course Distributed Systems

A

A distributed system is a system that is comprised of several physically disjoint compute resources interconnected by a network.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Leslie Lamport’s anecdotal remark

A

• “A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Why build a distributed system?

A
  • Centralized system is simpler in all respects
  • Scalability limitations
  • Single point of failure
  • Availability and redundancy
  • Many resources are inherently distributed
  • Many resources used in a shared fashion
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Client‐server model: Examples

A
  • Clients and servers are often separated by network, but may also be running on the same machine
  • Clients initiate request, server awaits client requests
  • Example servers: Web server, database server, ftp server, name server, print server, mail server, file server, compute server (software servers vs. physical server nodes)

• Example clients: Web browser, email clients, chat clients

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Client‐server model: Implementation challenges

A
  • Software server architecture
  • Authentication, access control, encryption, …
  • Concurrent processing of client requests
  • Concurrent access to shared resources
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

DS Challenges

A
  • How to keep server replicas consistent?
  • How to detect failures?
  • How many failures can a given design tolerate?
  • What kind of failures can it tolerate?
  • How to recover from failures?
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Tiered architecture

A
  • Client - Web Server - DB
  • Web server, application server, database server
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Multi‐tiered architecture

A
  • Data persistence tier <–>
  • Business logic tier <–>
  • Presentation tier <–>
  • Client <–>
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

N‐tiered architecture

A
  • Client‐server architecture style
  • Requests pass through tiers in a linear fashion
  • Logical and physical separation of functions
  • Typical function of each tier
    • Presentation (user interface)
    • Application processing (business logic)
    • Data management (data persistence)
  • Predominantly used is the 3‐tiered architecture
  • Fosters flexibility, re‐usability, modularity, separation of concerns
  • Tiers can be more independently modified, upgraded and replaced
  • Layers vs. tiers
    • Logical structuring of software vs.
    • Physical structure of infrastructure
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

DNS: The Domain name system

A
  • Cornerstone of the Internet (like a phone book)
  • Maps domain names to IP addresses
    • www.example.com to IP address of host serving this domain
  • A distributed database of name servers
  • E.g. ,used by clients (Web/email) to resolve names
  • Developed to replace a centralized resolution scheme
    • Early example of a distributed system
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

DNS Name resolutoin

A
  1. What is the IP address of some‐webserver.com?
    Please reply to my IP address
  2. Q: Where can I find the IP address of some‐webserver.com?
  3. A: I don‘t know but .com Namespace should have the answer.
  4. Q: What is the IP address of some‐webserver.com
  5. A: Primary DNS Server of some‐ webserver.com knows it.
  6. Q: What is the IP address of some‐webserver.com?
  7. A: Here is the IP address of some‐webserver.com
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Web data platform

A
  • A.k.a. key‐value store (NO SQL)
  • Emerged around 2004 with Google’s BigTable,
  • Facebook’s Cassandra, Yahoo!’s PNUTS etc.
  • Data model based on keys associated with
  • K/V stores are not new, but scale of deployment and use was unprecedented
  • Backs Web properties of major Internet companies
  • Meant to manage Peta bytes of data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Summary: Distribute systems examples

A
  • Client‐server model
  • Multi‐tiered enterprise architectures
  • Cyber‐physical systems
    • Power grid and smart power grid – Cellular networks
    • ATM and banking networks
  • Large‐scale distributed systems
  • Distributed application
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Characteristics of distributed systems

A
  • Reliable
  • Fault‐tolerant
  • Highly available
  • Recoverable
  • Consistent
  • Scalable
  • Predictable performance
  • Secure
  • Heterogeneous
  • Open
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Reliability

A
  • Probability of a system to perform its required functions under stated conditions for a specified period of time
  • To run continuously without failure
  • Expressed as MTBF, failure rate
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Availability & high‐availability

A
  • Proportion of time a system is in a functioning state, i.e., can be used, also as 1 – unavailable
  • Ratio of time usable over entire time
    • Informally, uptime / (uptime + downtime)
    • System that can be used 100 hrs out of 168 hrs has availability of 100/168
    • System could be up, but not usable (outage!)
  • Specified as decimal or percentage
    • Five nines is 0.99999 or 99.999% available
    • 1x9 = 36,5 days, 4x9 = 52,56 min 6x9 = 31,5s
17
Q

Availability is not equal to reliability

A
  • System going down 1 ms every 1 hr has an availability of more than 99.9999%
    • Highly available, but also highly unreliable
  • A system that never crashes, but is taken

down for two weeks

* **Highly reliable**, but only about **96% available**
18
Q

Middleware NN

A

Middleware comprises services and abstractions that facilitate the design, development, and deployment of distributed applications in heterogeneous, networked environments.

Example abstractions: Remote invocation, messaging, publish/subscribe, TP monitor, locking service, etc.

Examples: DCE, CORBA, RMI, JMS, Web services, etc.

  • Constitutes building blocks
  • Captures common functionalities
    • Message passing, remote invocation
    • Message queuing, publish/subscribe
    • Transaction processing
    • Naming, directory, security provisions
    • Fault‐tolerance, consistent views
    • Replication, availability
  • Deals with interoperability
  • Deals with system integration
  • Not directly covered in this course
19
Q

Middleware stack NN

A

Application layer

Middleware layer

Networking layers (Transport, etc.)

20
Q

Fallacies of distributed systems design

A
  • Assumptions (novice )designers of distributed systems often make that turn out to be false
  • Originated in 1994 by Peter Deutsch, SunFellow, Sun Microsystems
  • Also see“A Note on Distributed Computing”
  • The 8 fallacies
    • The network is reliable.
    • Topology doesn’t change.
    • Latency is zero.
    • There is one administrator.
    • Bandwidth is infinite.
    • Transport cost is zero.
    • The network is secure.
    • The network is homogeneous
21
Q

The network is reliable

A
  • Switches & routers rarely fail
    • Mean time between failures is very high (years!)
  • Why then is this a fallacy?
    • Power supply, hard disk, node failures
    • Incorrect configurations
    • Bugs, dependency on external services
  • Effect is that application hangs or crashes
22
Q

Selected fallacies dissected NN

A
  • Adapted from “Fallacies of Distributed Computing Explained” by Arnon Rotem‐Gal‐Oz
  • Let us look at some fallacies
    • Assumptions
    • Effects
    • Countermeasures
23
Q

The network is reliable: Implications for design

A
  • Redundancy
    • Infrastructure & hardware
    • Software systems, middleware & application
  • Catch exceptions, check error codes, react accordingly
  • Prepare to retry connecting upon timeouts
  • Acknowledge, reply or use negative acknowledgements
  • Identify & ignore duplicates
  • Use idempotent operations
  • Verify message integrity
24
Q

Latency is zero

A
  • “Making a call over the wire is like making a local call.”
  • Informally speaking
    • Latency is the time it takes for data to move from one place to another
    • Bandwidth is how much data can be transferred during that time (bits per second)
  • Latency is capped by the maximum speed of information transmission, i.e., the speed of light
    • at ~300,000 kilometres per second, round trip time between US‐Europe (~ 8K km) is ~60ms
  • Bandwidth (& its use) keeps on growing
25
Q

Latency is zero vs. cost of a method cal NN

A
  • Local call is essentially a Push and a Jump to Subroutine
  • System call is taken care of by OS (100s of assembly instructions)
  • Call across a LAN involves system calls on caller and callee and network latency
  • Call across a WAN … transmission delays etc.
  • Strive to make as few calls as possible, moving as much data as possible
  • Trading off computing for data transmitted (cf. “bandwidth is not infinite” & I/O is expensive).
26
Q

Summary: The 8 fallacies

A
  • The network is reliable.
  • Latency is zero.
  • Bandwidth is infinite.
  • The network is secure.
  • Topology doesn’t change.
  • There is one administrator.
  • Transport cost is zero.
  • The network is homogeneous.