Chapter 4: Distributed Systems in Context Flashcards

1
Q

Why does DNS work?

A
  • Hierarchical, federated system
    • Delegation of requests
  • Distributed world-wide
  • Caching at multiple levels
    • Trusting cached results through signatures
  • Based on UDP
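
A toy sketch of the hierarchy-plus-caching idea (zone contents, record types, and the address are invented; this is not the real wire protocol):

```python
# Toy sketch of hierarchical resolution: each zone either answers
# authoritatively or delegates to the next zone down; results are cached.
ZONES = {
    ".":            {"com.":             ("NS", "com.")},          # root delegates the TLD
    "com.":         {"example.com.":     ("NS", "example.com.")},  # TLD delegates the zone
    "example.com.": {"www.example.com.": ("A",  "192.0.2.10")},    # authoritative record
}

CACHE = {}  # resolver-side cache (real resolvers also honor a TTL)

def resolve(name):
    if name in CACHE:                      # caching at multiple levels
        return CACHE[name]
    zone = "."                             # start at the root
    while True:
        # pick the most specific entry in this zone that covers the name
        _, (rtype, value) = max(
            ((s, r) for s, r in ZONES[zone].items() if name.endswith(s)),
            key=lambda item: len(item[0]))
        if rtype == "A":                   # authoritative answer
            CACHE[name] = value
            return value
        zone = value                       # NS record: follow the delegation

print(resolve("www.example.com."))   # walks . -> com. -> example.com.
print(resolve("www.example.com."))   # answered from the cache
```
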
2
Q

DNS Attacks? How to mitigate?

A
  • DNS amplification: a type of denial-of-service attack
    • Attackers use EDNS0, an extension of the DNS protocol
    • http://www.rfc-editor.org/rfc/rfc2671.txt
  • In certain cases a 36-byte request can trigger a ~3,000-byte reply => amplification of ~100x
  • Notable attack: Spamhaus, March 19, 2013
    • 75 Gbps DDoS
    • Mitigated by load-balancing requests
    • http://blog.cloudflare.com/the-ddos-that-knocked-spamhaus-offline-and-ho

Mitigation:

  • DNSSEC – DNS Security Extensions
  • Resource records are digitally signed
  • Recursive resolver validates data integrity and origin authentication
  • Deployed in 2010, but not yet available everywhere
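
Back-of-the-envelope arithmetic behind the amplification figure (the 36-byte and 3,000-byte sizes come from the card above; the attacker-bandwidth line is an illustrative extrapolation):

```python
# Amplification arithmetic (illustrative only).
request_bytes = 36       # small EDNS0 query (figure from the card)
reply_bytes = 3000       # large reply (figure from the card)

amplification = reply_bytes / request_bytes
print(f"amplification factor ~= {amplification:.0f}x")    # ~83x, i.e. on the order of 100x

# Attacker-side bandwidth needed to produce a 75 Gbps flood at this ratio:
target_gbps = 75
attacker_gbps = target_gbps / amplification
print(f"attacker needs only ~{attacker_gbps:.1f} Gbps of spoofed queries")
```
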
3
Q

What are key-value stores?

A
  • Container for key-value pairs
  • Distributed multi-component systems
  • Category of NoSQL databases ("not only SQL")
  • Databases that break with the philosophy of traditional (relational) DBMSs to overcome their weaknesses
4
Q

Key value store compared to DBMS - SQL

A

DBMS

  • Relational data schema
  • Normalized data model
  • Datatypes
  • Relations between tables (e.g., foreign keys)
  • Row-orientation

KVS

  • No data schema, or a modifiable data schema
  • De-normalized model
  • Raw byte access
  • No relations
  • Column-orientation
  • Distributed data
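
To make the contrast concrete, here is the same invented user/order data in a normalized, relational shape and as a single de-normalized wide row of raw bytes, roughly how a column-oriented key-value store would hold it:

```python
# Relational / normalized: two "tables", rows reference each other
# through a foreign key (user_id), and every row follows a fixed schema.
users  = [{"user_id": 1, "name": "Alice"}]
orders = [{"order_id": 10, "user_id": 1, "item": "book"},
          {"order_id": 11, "user_id": 1, "item": "pen"}]

# Key-value / de-normalized: one wide row per key, no foreign keys,
# column names may differ from row to row, values are just bytes.
kv_row = {
    b"user:1": {
        b"info:name":     b"Alice",
        b"order:10:item": b"book",
        b"order:11:item": b"pen",
    }
}
```
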
5
Q

Why are key-value stores needed?

A
  • Today’s internet applications
    • Huge amounts of stored data (PB = 10^15 bytes, and more)
    • Huge number of users (e.g., 1.11 billion)
    • Frequent updates
    • Fast retrieval of information
    • Rapidly changing data definitions
  • Ever more users, ever more data
  • Incremental scalability
    • User growth, traffic patterns
    • Adapt to number of requests & data size
  • Flexibility
    • Adapt to changing data definitions
  • Reliability
    • Thousands of components at play
    • Provide recovery in presence of failure
  • Availability
    • Users are worldwide
    • Guarantee fast access
6
Q

Key-Value store client interface

A
  • Main operations
    • Write/update - put(key, value)
    • Read - get(key)
    • Delete - delete(key)
  • Usually no aggregation, no table joins
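
A minimal in-memory stand-in for this interface (not any particular store):

```python
# Sketch of the client-facing interface: put / get / delete on raw keys and values.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key: bytes, value: bytes):
        """Write or update the value stored under key."""
        self._data[key] = value

    def get(self, key: bytes):
        """Read the value stored under key (None if absent)."""
        return self._data.get(key)

    def delete(self, key: bytes):
        """Remove key and its value, if present."""
        self._data.pop(key, None)

store = KeyValueStore()
store.put(b"user:1", b"Alice")
print(store.get(b"user:1"))   # b'Alice'
store.delete(b"user:1")       # note: no joins, no aggregation, no SQL
```
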
7
Q

Common features of key-value stores

A
  • Flexible data model
  • Horizontal partitioning
  • Store big chunks of data as raw bytes
  • Fast column-oriented access
  • Memory store & write ahead log (WAL)
    • Keep data in memory for fast read access
    • Keep a commit log as the ground truth
  • Versioning
    • Store different versions of data
  • Replication
    • Store multiple copies of data
  • Failure detection & failure recovery
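
A sketch of the memstore + WAL pattern, assuming a simple JSON-per-line log format (file name and format are invented):

```python
# Every write is first appended to a durable commit log (the ground truth),
# then applied to an in-memory map for fast reads.
import json

class MemStoreWithWAL:
    def __init__(self, wal_path="wal.log"):
        self.wal_path = wal_path
        self.memstore = {}            # hot data, served from memory

    def put(self, key, value):
        with open(self.wal_path, "a") as wal:   # 1) append to the commit log
            wal.write(json.dumps({"k": key, "v": value}) + "\n")
            wal.flush()
        self.memstore[key] = value              # 2) apply to the memstore

    def recover(self):
        """Rebuild the memstore from the log after a crash."""
        self.memstore.clear()
        with open(self.wal_path) as wal:
            for line in wal:
                entry = json.loads(line)
                self.memstore[entry["k"]] = entry["v"]
```
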
8
Q

Common non-features

A
  • Integrity constraints
  • Transactional guarantees, i.e., ACID
  • Powerful query languages
  • Materialized views
9
Q

Bigtable components

A
  • Client library
  • Master
    • Metadata operations
    • Load balancing
  • Tablet server
    • Data operations (read / write)
10
Q

Master

A
  • Assigns tablets to tablet servers
  • Detects addition and expiration of tablet servers
  • Balance tablet server load
  • Etc.
11
Q

Tablet server

A
  • Manages a set of tablets (up to a thousand)
  • Handles read and write requests for the tablets it manages
  • Splits tablets that have grown too large
12
Q

Bigtable building blocks

A
  • Chubby
    • Metadata storage
  • GFS (Google file system)
    • Data log, storage
    • Replication
  • Scheduler
    • Monitoring
    • Failover
13
Q

Chubby & ZooKeeper lock service

A
  • Highly-available, persistent, distributed lock service
  • Use in Bigtable
    • Ensure at most one active master at any time
    • Store bootstrap location of data (root tablet)
    • Discover tablet servers (manage their lifetime)
    • Store schema information
14
Q

Representation & use of configuration information with lock service

A

  • Managed via directories and small files
    • Directories and files can serve as locks
    • Reads and writes are atomic
  • Clients maintain sessions
    • If a session lease expires and cannot be renewed, locks are released
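
A toy model of the session-lease behavior described above (the lease duration and class names are invented):

```python
# A lock is held only while the owning session's lease is still valid;
# if the lease cannot be renewed in time, the lock is released implicitly.
import time

LEASE_SECONDS = 10

class Session:
    def __init__(self, client_id):
        self.client_id = client_id
        self.expires_at = time.time() + LEASE_SECONDS

    def renew(self):
        self.expires_at = time.time() + LEASE_SECONDS

    def alive(self):
        return time.time() < self.expires_at

class LockFile:
    """A small 'file' that doubles as a lock, as in Chubby."""
    def __init__(self):
        self.owner = None   # Session holding the lock, if any

    def acquire(self, session):
        if self.owner is None or not self.owner.alive():  # expired lease frees the lock
            self.owner = session
            return True
        return False
```
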

15
Q

Lock service in context

A
  • Comprised of five active replicas
    • Consistently replicate writes
  • One replica is designated as master
    • We need to elect the master (leader)
  • Service is live when:
    • majority of replicas are running and
    • can communicate with one another
    • A quorum needs to be established
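
The quorum rule in two lines (five replicas, strict majority):

```python
# The service is live only while a majority of replicas (3 of 5) are up
# and can communicate with one another.
REPLICAS = 5

def has_quorum(reachable_replicas: int, total: int = REPLICAS) -> bool:
    return reachable_replicas >= total // 2 + 1   # strict majority

print(has_quorum(3))  # True  -> service stays live
print(has_quorum(2))  # False -> no quorum, no progress
```
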
16
Q

Core mechanisms Bigtable

A
  • Keep replicas consistent in face of failures
    • Paxos algorithm based on replicated state machines
    • Atomic broadcast
  • Ensure one active Bigtable master at any time
    • Mutual exclusion, but in a distributed setting
    • Relevant lectures
      – Cf. lect. on Paxos & RSM
      – Cf. lect. on coordination
      – Cf. lect. on replication
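
Not Paxos itself, but a sketch of the replicated-state-machine idea that Paxos-style atomic broadcast enables: replicas that apply the same deterministic commands in the same agreed order end up in the same state (command names are invented):

```python
# The consensus step (agreeing on the order) is *not* modeled here.
class Replica:
    def __init__(self):
        self.state = {}

    def apply(self, command):
        op, key, value = command
        if op == "put":
            self.state[key] = value
        elif op == "delete":
            self.state.pop(key, None)

agreed_log = [("put", "master", "server-A"),
              ("put", "schema", "v1"),
              ("delete", "schema", None)]

replicas = [Replica() for _ in range(5)]
for command in agreed_log:          # same order on every replica
    for replica in replicas:
        replica.apply(command)

assert all(r.state == replicas[0].state for r in replicas)
```
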

17
Q

Chubby example: leader election

A
  • Electing a primary (leader) process: emulated by acquiring an exclusive lock on a file
    • Clients open a lock file and attempt to acquire the lock in write mode
    • One client succeeds (i.e., becomes the leader) and writes its name to the file
    • Other clients fail (i.e., become replicas) and discover the name of the leader by reading the file
    • The primary obtains a sequencer and passes it to servers (so servers know that the primary is still valid)
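
A sketch of the same lock-file election pattern using ZooKeeper through the kazoo client (an assumption for illustration; the card describes Chubby, the path, process name, and ensemble address are invented, and the sequencer step is not modeled):

```python
# Whoever creates the ephemeral "lock file" first becomes the leader and
# writes its name into it; everyone else reads the file to learn the leader.
from kazoo.client import KazooClient
from kazoo.exceptions import NodeExistsError

MY_NAME = b"server-A"                       # hypothetical process name

zk = KazooClient(hosts="127.0.0.1:2181")    # hypothetical ensemble address
zk.start()

try:
    # Ephemeral node: disappears automatically if the leader's session dies,
    # which plays the role of the lock being released on lease expiry.
    zk.create("/service/leader", MY_NAME, ephemeral=True, makepath=True)
    print("I am the leader")
except NodeExistsError:
    leader, _ = zk.get("/service/leader")   # lost the race: act as a replica
    print("Following leader:", leader.decode())
```
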
18
Q

HBase architecture overview

A
  • Client
    • Issues put, get, delete operations
  • Zookeeper (Chubby)
    • Distributed lock service for HBase components
  • HRegion (tablet)
    • Tables are split into multiple key-regions
  • HRegionServer (tablet server)
    • Stores actual data
    • Can host multiple key-regions (tablets)
    • Answers client requests
  • HMaster (master)
    • Coordinates components
      • Startup, shutdown, failure of region servers
      • Opens, closes, assigns, moves regions
    • Not on read or write path
  • Write Ahead Log (WAL)
  • MemStore
    • Keeps hot data in main memory
  • HDFS
    • Underlying distributed file system
    • Table data is stored as HFile
    • Replicates data over multiple data nodes
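
A client-side sketch assuming access through the HBase Thrift gateway with the happybase library (an assumption; the slides do not prescribe a client, and table, column family, and host names are invented):

```python
# put / get / delete map directly onto the key-value interface from before;
# region servers handle these requests, the HMaster is not on the read/write path.
import happybase

connection = happybase.Connection("hbase-thrift.example.org")  # hypothetical host
table = connection.table("users")                              # assumes the table exists

table.put(b"user:1", {b"info:name": b"Alice", b"info:email": b"alice@example.org"})
row = table.row(b"user:1")        # {b'info:name': b'Alice', ...}
table.delete(b"user:1")
```
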
19
Q

HBase global read-path

A
20
Q

HBase write path

A
21
Q

HBase adding of components

A
  • Components can be added on-the-fly
  • Add multiple backup master servers
    • Avoid single point of failure
    • In case of crash, backup master takes over
  • Add multiple region servers
    • Additional capacity can be added
    • Master takes care of load balancing
22
Q

Core mechanisms HBase

A
  • Heavy involvement of ZooKeeper for coordination tasks
    • Cf. lect. on Paxos & RSM (i.e., a look inside)
  • HBase reliability relies on HDFS (replication)
    • Cf. lect. on replication
23
Q

HBase storage unit failure

A
24
Q

Cassandra architecture overview

A
25
Q

Cassandra global read-path

A
26
Q

Cassandra global write-path

A
27
Q

Incremental scaling in Cassandra

A
28
Q

Storage unit failure Cassandra

A
29
Q

Core mechanisms Cassandra

A
  • Incremental scalability
    • Cf. Lect. on consistent hashing
  • Reliability
    • Cf. Lect. on replication (e.g., quorum protocols)
  • Membership management
    • Cf. Lect. on replication (e.g., gossip protocols)
  • Uses leader election as part of replication
    • Cf. Lect. on coordination
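
A minimal consistent-hashing sketch to go with the incremental-scalability point (node names are invented; Cassandra's real ring also uses virtual nodes and replication):

```python
# Nodes and keys are hashed onto the same ring; a key is stored on the first
# node clockwise from its position. Adding a node only moves the keys that
# fall between it and its predecessor.
import bisect
import hashlib

def ring_position(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        positions = [p for p, _ in self.ring]
        i = bisect.bisect_right(positions, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

ring = Ring(["node-A", "node-B", "node-C"])
print(ring.node_for("user:1"))                          # placement before scaling
ring = Ring(["node-A", "node-B", "node-C", "node-D"])   # add a node incrementally
print(ring.node_for("user:1"))                          # only keys in node-D's arc move
```
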