P4L2: Distributed File Systems Flashcards

1
Q

Describe a stateless server design for a distributed service.

A

A stateless server does not store state information, so it doesn’t know:

  • which files are being accessed
  • what operations are being performed
  • how many clients are accessing how many files
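As a minimal sketch of what "stateless" means for a read operation, the handler below uses only what arrives in the request itself. The names `FILES` and `handle_read` are hypothetical and the in-memory dictionary stands in for the server's disk.

```python
# Hypothetical in-memory "disk": file name -> contents.
FILES = {"notes.txt": b"distributed file systems"}


def handle_read(path: str, offset: int, length: int) -> bytes:
    """Serve one read using only the information carried in the request.

    The server remembers nothing between calls: no open-file table,
    no per-client position, no record of who asked.
    """
    data = FILES[path]
    return data[offset:offset + length]
```

Because the path, offset, and length travel with every call, the same request gives the same answer before and after a server restart.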
2
Q

Describe a stateful server design for a distributed service.

A

A stateful server stores information about clients, which files are being accessed, which types of accesses, which clients have cached which file, and which clients have read/written a file.

3
Q

What are the tradeoffs of a stateful server?

A

Pros:

  • Allows the server to guarantee consistency, so clients can safely cache data.
  • Allows for locking.
  • Allows incremental operations, such as fetching the next KB of data relative to the current position rather than from an absolute offset.

Cons:

  • The state must be recovered after a failure, so checkpointing is necessary.
  • Maintaining state and enforcing consistency incurs overhead on the server side.
  • On the client side, enabling caching also adds overhead.
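The relative-fetch benefit and the recovery cost can both be seen in a small sketch. `StatefulServer`, `open`, and `read_next` are hypothetical names, and the per-handle table is an assumed way of keeping the state, not a description of any particular system.

```python
class StatefulServer:
    """Toy server that remembers a read position for every open handle."""

    def __init__(self, files):
        self.files = files        # path -> bytes
        self.open_files = {}      # handle -> [path, current position]
        self.next_handle = 0

    def open(self, path):
        handle = self.next_handle
        self.next_handle += 1
        self.open_files[handle] = [path, 0]
        return handle

    def read_next(self, handle, length):
        # The request carries no offset; the server supplies it from its state.
        path, position = self.open_files[handle]
        chunk = self.files[path][position:position + length]
        self.open_files[handle][1] = position + len(chunk)
        return chunk
```

If the server restarts and `open_files` is lost, every outstanding handle becomes useless, which is why the state has to be checkpointed or rebuilt.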
4
Q

What are the tradeoffs of a stateless server?

A

PROS

  • No CPU or memory is spent managing state.
  • Resilience: the server can be restarted after a failure with no effect on clients.

CONS

  • Every request must be self-contained, which requires more bits per request.
  • The server can't manage file consistency, so it can't be used in a system that relies on caching.
5
Q

What are the tradeoffs between replication and partitioning in DFS?

A

REPLICATION
Pros:
- fault tolerant because, if a machine fails, others still make the files available.
- highly available because any machine can service any request.

Cons:
- NOT scalable: every machine stores every file, so supporting more files means increasing the capacity of each machine.

PARTITIONED
Pros:
- highly scalable: if you need to support more files, just add more machines!

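Cons:
- NOT fault tolerant: if a machine fails, the files stored on it are lost or unreachable.
- NOT highly available: each file can be served only by the machine that holds it.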
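The tradeoff can be put in rough numbers. The sketch below assumes identical machines and files spread evenly across them; `usable_capacity` and `unavailable_fraction` are hypothetical helpers written for this card.

```python
def usable_capacity(machines: int, per_machine_gb: int, model: str) -> int:
    """Total distinct data the system can hold, in GB."""
    if model == "replicated":
        # Every machine stores every file, so one machine's disk is the limit.
        return per_machine_gb
    if model == "partitioned":
        # Each machine stores a different subset, so capacities add up.
        return machines * per_machine_gb
    raise ValueError(f"unknown model: {model}")


def unavailable_fraction(machines: int, model: str) -> float:
    """Fraction of files unreachable after a single machine fails."""
    if model == "replicated":
        return 0.0               # any surviving machine has a full copy
    if model == "partitioned":
        return 1.0 / machines    # only the failed machine's share is gone
    raise ValueError(f"unknown model: {model}")
```

With 4 machines of 100 GB each, replication stores 100 GB and loses nothing on one failure, while partitioning stores 400 GB and loses access to a quarter of the files.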
5
Q

What is an alternative to replication and partitioning in DFS?

A

An alternative is a system of peers in which every machine both maintains files and services requests, blurring the distinction between servers and clients. Each peer handles some portion of the load, typically for the files local to it.

6
Q

What are the tradeoffs involved in caching on DFS?

A

A compromise between upload/download and true remote file access is to allow caching (with prefetching).

PROS:

  • This reduces latencies on file operations (helps the client side)
  • Reduces server load (helping servers)

CONS:

  • The client must interact with the server more frequently, notifying it of any modifications and querying it for modifications made by other clients.
  • The process is more complex.
  • The client must understand file-sharing semantics that differ from those of a normal filesystem.
7
Q

What are two extreme DFS models?

A

At one extreme, we have the upload/download model. When the client wants to modify a file, it downloads the whole thing, modifies, then uploads it back to the server.

At the other extreme, we have true remote file access. The file remains on the server and every single operation has to be sent over the network to the server. The client does not use local caching or buffering.

8
Q

Describe how the empirical data in the Sprite caching paper motivates the Sprite design.

A

33% of all file accesses are writes. This implies that caching could help performance: the two-thirds of accesses that are reads would benefit. But what about the writes? Write-through caching wouldn’t be sufficient, because it doesn’t benefit write operations in any way. So would session semantics help?

75% of files are open for less than 0.5 seconds and 90% for less than 10 seconds. This implies session semantics are no good: most files would have to be written back to the server after only half a second, and almost all of them within 10 seconds. Too much overhead!

20-30% of new data is deleted within 30 seconds and 50% within 5 minutes. This means write-back on close is unnecessary: much of the data would be written to the server only to be deleted shortly afterwards.

All of the decisions so far are unfriendly to concurrent access (no write-through and no session semantics). But it turns out that file sharing is rare on their system, so there is no need to optimize for it (though it must still be supported).

9
Q

What are the final design decisions for the Sprite system?

A

Cache with write-back every 30 seconds. Data younger than 30 seconds is likely to be modified again soon, i.e., the client is still working on it. Note that after 30 seconds, we’re past the point at which 20-30% of new data has already been deleted anyway.
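A minimal sketch of the delayed write-back idea follows. It assumes that each write resets the block's timer and that a periodic scan collects blocks left untouched for 30 seconds; `ClientCache` and its methods are hypothetical names, and time is passed in explicitly to keep the example deterministic.

```python
WRITE_BACK_DELAY = 30.0  # seconds, the delay stated in the card


class ClientCache:
    def __init__(self):
        self.blocks = {}        # block number -> data
        self.dirty_since = {}   # block number -> time of the last write

    def write(self, block, data, now):
        self.blocks[block] = data
        self.dirty_since[block] = now   # assumption: each write restarts the timer

    def delete(self, block):
        # Data deleted before its timer expires never reaches the server.
        self.blocks.pop(block, None)
        self.dirty_since.pop(block, None)

    def flush_due(self, now):
        """Return the blocks to send to the server and mark them clean."""
        due = [b for b, t in self.dirty_since.items()
               if now - t >= WRITE_BACK_DELAY]
        for b in due:
            del self.dirty_since[b]
        return {b: self.blocks[b] for b in due}
```

Short-lived data that is deleted inside the 30-second window costs no network traffic at all, which is the saving the trace data points to.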

When a client opens a file, the server checks whether another client has been working on it and, if so, retrieves the dirty blocks from that client.

This requires that all open calls go to the server, so directories cannot be cached.

When concurrent writes occur (however rare), Sprite disables caching and all writes are serialized on the server.
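The open-time checks described above could look roughly like this. It is a sketch only: `Server`, `FileState`, and the action tuples are hypothetical, and triggering on two simultaneous writers is a simplification of whatever exact condition Sprite uses.

```python
class FileState:
    def __init__(self):
        self.readers = set()         # clients with the file open for reading
        self.writers = set()         # clients with the file open for writing
        self.last_writer = None      # client that may still hold dirty blocks
        self.caching_enabled = True


class Server:
    def __init__(self):
        self.files = {}              # path -> FileState

    def open(self, path, client, mode):
        """Record the open and return the actions the server must take."""
        state = self.files.setdefault(path, FileState())
        actions = []
        # Sequential write sharing: pull dirty blocks from the previous writer.
        if state.last_writer not in (None, client):
            actions.append(("fetch_dirty_blocks", state.last_writer))
            state.last_writer = None
        if mode == "w":
            state.writers.add(client)
            state.last_writer = client
        else:
            state.readers.add(client)
        # Concurrent write sharing: stop caching, serialize at the server.
        if len(state.writers) > 1 and state.caching_enabled:
            state.caching_enabled = False
            actions.append(("disable_caching", path))
        return actions

    def close(self, path, client):
        state = self.files[path]
        state.readers.discard(client)
        state.writers.discard(client)
```

Every `open` has to reach the server for this bookkeeping to work, which is the reason the card gives for not caching directories.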

10
Q

What are Sprite’s sharing semantics?

A

Sequential write sharing: caching is used, with sequential semantics.

Concurrent write sharing: no caching at all. The cost of this is low since sharing is rare.

11
Q

What kind of data structures are needed on the server and client sides to support the Sprite system?

A

Per file, the client tracks:

  • whether the file is cached
  • which blocks are cached
  • timer for each dirty block
  • version number

Per file, the server tracks:

  • readers
  • writers
  • version
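The two lists above map naturally onto a pair of records. The field names below are hypothetical, and the version comparison in `cache_is_current` is an assumption about how the version numbers might be used, not something the card states.

```python
from dataclasses import dataclass, field


@dataclass
class ClientFileEntry:
    cached: bool = False                               # is the file cached?
    cached_blocks: set = field(default_factory=set)    # which blocks are cached
    dirty_timers: dict = field(default_factory=dict)   # dirty block -> time of last write
    version: int = 0                                   # version the cached copy came from


@dataclass
class ServerFileEntry:
    readers: set = field(default_factory=set)   # clients currently reading
    writers: set = field(default_factory=set)   # clients currently writing
    version: int = 0                            # current version of the file


def cache_is_current(client: ClientFileEntry, server: ServerFileEntry) -> bool:
    """Assumed check: a cached copy is usable only if the versions match."""
    return client.cached and client.version == server.version
```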
12
Q

What is a common compromise between partitioning and replication?

A

A common compromise is to partition the files among machines, then replicate each partition.
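One way to picture this is to hash each file to a partition and let a small group of machines hold every file in that partition. The function `replica_group`, the machine names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib


def replica_group(path: str, groups: list) -> list:
    """Return the machines that hold `path`.

    `groups` has one entry per partition; each entry lists the machines
    that replicate that partition.
    """
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    partition = int.from_bytes(digest[:8], "big") % len(groups)
    return groups[partition]


# Three partitions, each replicated on two machines (hypothetical names).
GROUPS = [["m0", "m1"], ["m2", "m3"], ["m4", "m5"]]
holders = replica_group("/home/alice/report.txt", GROUPS)
```

Adding a group adds capacity (the partitioning benefit), while a single machine failure still leaves another copy of its partition reachable (the replication benefit).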

13
Q

What are the pros/cons of the upload/download model?

A

PROS
- Modifications are done locally, hence quickly, with no network overhead.

CONS

  • Downloading the entire file is an expensive operation for making potentially small changes.
  • Takes control away from the server: once the server gives a file up, it has no idea what the client is doing or when it will be given back.