System Performance Flashcards

1
Q

What are the main performance metrics used?

A

Capacity: A consistent measure of a service's size or the amount of a resource it has
Utilization: Percentage of capacity used for a given workload
Overhead: Percentage of that utilization spent on bookkeeping
Useful Work: Percentage of that utilization spent on what we actually need to do

Throughput: Number of operations completed per unit of time
Latency: Time to complete a single operation (response time)
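
A quick worked example (the numbers are purely illustrative): a disk service with a capacity of 200 MB/s that is currently moving 120 MB/s is at 60% utilization; if 30 MB/s of that traffic is bookkeeping (e.g. metadata), overhead is 25% of the utilization and useful work is the remaining 75%.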

2
Q

Regarding latency and throughput, which metric is more important? What are the requirements for each?

A

It is application-dependent. The requirements will drive the application design.

3
Q

What are some of the factors that can impose a limitation on the performance of the system and that do not depend on the users or the application?

A

Physics: e.g. the speed of light limits how fast signals can travel from one end of a chip to the other.
Economics: e.g. we can't throw infinite money at the problem.
Technology: e.g. we depend on what current technology offers, and each technology generation eventually hits a wall.

4
Q

How can we identify a system’s performance limitations, for instance, latency or throughput?

A

By decomposing the system into its constituent stages, like a pipeline, and identifying the bottlenecks. Depending on the requirements, we can target specific stages over others.
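
A minimal sketch of this idea in Python (the stage names and delays are hypothetical): measure each stage of the pipeline separately so the slowest one, i.e. the bottleneck, stands out.

    import time

    def parse(req):             # hypothetical fast stage
        time.sleep(0.001)

    def lookup(req):            # hypothetical slow stage -> likely bottleneck
        time.sleep(0.010)

    def render(req):            # hypothetical fast stage
        time.sleep(0.002)

    for stage in (parse, lookup, render):
        start = time.perf_counter()
        for req in range(100):
            stage(req)
        elapsed = time.perf_counter() - start
        print(f"{stage.__name__}: {elapsed / 100 * 1000:.2f} ms/request")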

5
Q

In a system, if we have opportunities to improve the system’s throughput or the system’s latency, which should we choose?

A

It is impossible to know because it depends on the application’s requirements.

6
Q

What strategy(-ies)/technique(s) do we have to reduce latency in a request processing pipeline?

A

Exploit the common case by adding a cache.

Exploit request properties by running independent stages at the same time. Sometimes this does not reduce latency much; instead it improves throughput while latency stays roughly the same.
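
A minimal cache sketch in Python (slow_lookup and its cost are made up): the common case is answered from a dictionary, so only misses pay the full latency.

    import time

    def slow_lookup(key):
        time.sleep(0.05)              # stand-in for a slow backend stage
        return key * 2

    cache = {}

    def cached_lookup(key):
        if key in cache:              # common case: served without the slow stage
            return cache[key]
        value = slow_lookup(key)      # miss: pay the full latency once
        cache[key] = value
        return value

    print(cached_lookup(7))           # slow (miss)
    print(cached_lookup(7))           # fast (hit)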

7
Q

What strategy(-ies)/technique(s) do we have to hide latency in a request processing pipeline?

A

Instead of taking actions to reduce latency, we can hide it by improving the throughput instead. For example, give each stage the ability to process multiple requests at the same time. Each request takes the same time (or slightly more) but we can process more requests at the same time.
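
A sketch of this in Python, assuming a hypothetical handle() stage: each request still takes about 0.1 s, but a pool of workers processes several at once, so throughput improves while per-request latency is unchanged.

    import time
    from concurrent.futures import ThreadPoolExecutor

    def handle(req):
        time.sleep(0.1)                # per-request latency is unchanged
        return req

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(handle, range(20)))
    print(f"20 requests in {time.perf_counter() - start:.2f} s")   # roughly 0.2 s instead of 2 s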

8
Q

What is the difference between a concurrent and a parallel design?

A

Concurrency: the ability to make progress on more than one task at the same time (i.e. concurrently)

Parallelism: Ability to make progress on a task by splitting it into multiple subtasks that can be processed at the same time (i.e. in parallel)
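
A rough illustration in Python (the task names are made up): concurrency interleaves progress on several independent tasks, while parallelism splits one task into subtasks processed at the same time.

    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def fetch(url):                    # several independent tasks
        return f"fetched {url}"

    def partial_sum(chunk):            # one subtask of a single larger task
        return sum(chunk)

    if __name__ == "__main__":
        # Concurrency: make progress on more than one independent task at a time.
        with ThreadPoolExecutor() as pool:
            print(list(pool.map(fetch, ["a", "b", "c"])))

        # Parallelism: split one task (a big sum) into subtasks processed in parallel.
        data = list(range(1_000_000))
        chunks = [data[i::4] for i in range(4)]
        with ProcessPoolExecutor() as pool:
            print(sum(pool.map(partial_sum, chunks)))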

9
Q

What strategy(-ies) are there to improve throughput in a request processing pipeline?

A

It is possible to run stages concurrently or in parallel.

If a stage operates at full capacity, we can apply a queueing strategy. The queue size depends on the workload, and it can be sized to absorb short or longer overload bursts.
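
A bounded-queue sketch in Python (the stage work and sizes are illustrative): the queue sits between two stages and absorbs a short burst while the worker drains it.

    import queue, threading, time

    q = queue.Queue(maxsize=8)          # bounded buffer between two stages

    def worker():
        while True:
            req = q.get()
            if req is None:             # sentinel: no more requests
                break
            time.sleep(0.01)            # stand-in for the slow stage

    t = threading.Thread(target=worker)
    t.start()

    for req in range(20):               # a short burst of requests
        q.put(req)                      # blocks briefly if the queue is full
    q.put(None)
    t.join()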

10
Q

When we apply a queueing strategy but the system is overloaded for long periods of time, what solutions can be applied?

A

Increase the capacity of the system if possible (within feasible limits), or shed load by reducing or limiting the offered load.

11
Q

What does the Load Shedding technique consist of?

A

Load Shedding is a technique used to fight long-term overload of the system. It works by reducing or limiting the load on the system.
Reducing load: refuse to serve some requests by exploiting workload properties.
Limiting load: add bounded buffers between stages, starting from the bottleneck stage and cascading back to the beginning of the pipeline if needed.
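
A load-limiting sketch in Python (the buffer size and reject policy are illustrative): requests are admitted only while the bounded buffer in front of the bottleneck stage has room; otherwise they are refused.

    import queue

    pending = queue.Queue(maxsize=4)    # bounded buffer in front of the bottleneck stage

    def admit(req):
        try:
            pending.put_nowait(req)     # accept while there is room
            return True
        except queue.Full:
            return False                # shed the load: refuse the request

    for req in range(10):
        print(req, "accepted" if admit(req) else "rejected")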

12
Q

What are the studied bottleneck removal techniques?

A

Exploit workload properties
Concurrency
Queuing

13
Q

What are the studied bottleneck fighting techniques?

A

Batching
Dallying
Speculation

14
Q

Describe batching technique.

A

Group (batch) several requests into a single request. The cost of sending the request is amortized over the batch size.

Latency increases proportionally to the size of the batch.

Throughput increases, up to a point, because the fixed per-request cost is amortized over the batch.

If the batch size is very large, the system will alternate between periods of overload and periods of idleness.
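
A batching sketch in Python, assuming a hypothetical send(batch) call with a fixed cost per call: the fixed cost is paid once per batch instead of once per request.

    import time

    BATCH_SIZE = 16

    def send(batch):
        time.sleep(0.01)                # fixed cost, paid once per batch
        time.sleep(0.001 * len(batch))  # per-item cost

    buffer = []

    def submit(req):
        buffer.append(req)
        if len(buffer) >= BATCH_SIZE:   # a request may wait here for the batch to fill
            send(buffer)
            buffer.clear()

    for req in range(64):
        submit(req)
    if buffer:                          # flush whatever is left over
        send(buffer)
        buffer.clear()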

15
Q

What kind of optimizations can be done when using batching?

A

If we have multiple writes on the same object, we can drop all except the last one, decreasing utilization. This depends on the workload properties.

For media with sequential access patterns or high locality, we can reorder the requests within the batch.
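
A sketch of both optimizations on one batch of writes (the request format is made up): keep only the last write per object (write absorption), then issue the survivors in address order for sequential media.

    # A batch of (object_id, value) writes, some hitting the same object twice.
    batch = [(3, "a"), (1, "b"), (3, "c"), (2, "d"), (1, "e")]

    # Write absorption: only the last write to each object matters.
    last_write = {}
    for obj, value in batch:
        last_write[obj] = value

    # Reordering: issue the surviving writes in address order.
    for obj in sorted(last_write):
        print(f"write({obj}, {last_write[obj]!r})")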

16
Q

What kind of optimizations can be done when using dallying?

A

Dallying increases batching opportunities. If the batch is not full, wait for more requests to arrive by delaying requests on purpose. This helps workloads with sequential access patterns or high locality, and it also lets us take advantage of write absorption.

Basically, we can reschedule the operations/requests in a way that better suits the given workload.
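
A dallying sketch in Python (send, the batch size, and the wait bound are hypothetical): if the batch is not yet full, wait a bounded amount of time for more requests before flushing.

    import queue, time

    BATCH_SIZE = 8
    MAX_WAIT = 0.05                       # how long we are willing to dally

    def drain(incoming, send):
        batch = [incoming.get()]          # block for the first request
        deadline = time.monotonic() + MAX_WAIT
        while len(batch) < BATCH_SIZE:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break                     # dallied long enough; flush a partial batch
            try:
                batch.append(incoming.get(timeout=remaining))
            except queue.Empty:
                break
        send(batch)

    requests = queue.Queue()
    for r in range(3):
        requests.put(r)
    drain(requests, lambda b: print("flushed", b))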

17
Q

What problems can we have when using dallying?

A

Knowing how long to wait for additional requests; the right delay depends on the workload.

18
Q

Describe the speculation technique.

A

Performing an operation in advance of receiving a request for it. It can decrease latency, or increase it if the speculation is wrong. It should be done with idle resources, so that a wrong guess does not harm the other ongoing requests.

It can be based on forecasting techniques, i.e. identifying well-known patterns in the system.
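
A speculation (read-ahead) sketch in Python (slow_read and the sequential-access guess are assumptions): on a read of block n we prefetch block n+1 with otherwise idle resources, betting that the workload is sequential.

    import threading, time

    cache = {}

    def slow_read(block):
        time.sleep(0.02)                  # stand-in for a slow device
        return f"data-{block}"

    def prefetch(block):                  # speculative work, kept off the request path
        cache[block] = slow_read(block)

    def read(block):
        value = cache.pop(block, None)
        if value is None:                 # speculation missed or was wrong: pay full cost
            value = slow_read(block)
        threading.Thread(target=prefetch, args=(block + 1,)).start()   # guess: sequential access
        return value

    print(read(1))     # slow
    time.sleep(0.05)
    print(read(2))     # served from the speculative prefetch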

19
Q

How can speculation create opportunities for batching and dallying?

A

Batching: if we observe read(1);read(2);read(3);, we can speculate read(4);read(5);read(6);, which allows us to fill the batch sooner.

Dallying: if we observe write(1);write(2);write(3);, we can speculate that further writes to the same objects will arrive, so we dally the pending writes since they might not be needed (they could be absorbed by later writes).

20
Q

What are the drawbacks of using speculation?

A

Speculative work that is not useful represents overhead.

Speculation can also increase the load on later stages, which have less information; the load can become higher than capacity (negative impact on latency).

21
Q

As a means of fighting the I/O bottleneck, in what ways do VMMs use exceptions?

A

Memory-mapped files
Copy-on-write
On-demand zero-filled pages
One zero-filled page
Virtual Shared Memory

22
Q

Describe the Memory-mapped files technique

A

Maps a file (residing on disk) into the application's address space. The application can read and write the file as if it were located in RAM (i.e. as if it were a regular array).
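
A memory-mapped file sketch using Python's mmap module (the file name is arbitrary): the file's bytes are read and written through what looks like an in-memory array.

    import mmap

    with open("example.bin", "wb") as f:      # create a small file to map
        f.write(b"\x00" * 4096)

    with open("example.bin", "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:   # map the whole file into the address space
            m[0:5] = b"hello"                 # write through the mapping, like a regular array
            print(m[0:5])                     # read it back the same way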

23
Q

Describe the Copy-on-write technique

A

If two threads are working on the same data concurrently, then the data can be stored once in memory by mapping the pages that hold the data with only read permissions. If one of the threads attempts to write a shared page, the virtual memory hardware will interrupt the processor with a permission exception. The handler can demultiplex this exception as an indirection exception of the type copy-on-write. In response to the indirection exception, the virtual memory manager transparently makes a copy of the page and maps the copy with read and write permissions in the address space of the thread that wants to write the page. With this technique, only changed pages must be copied.
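
Copy-on-write is done by the kernel's virtual memory manager, but its effect can be observed from user code; a rough sketch under Unix assumptions using os.fork(): after the fork, parent and child share the same physical pages, and the child's write causes a private copy, leaving the parent's data untouched.

    import os

    data = bytearray(b"original")             # one copy in memory before the fork

    pid = os.fork()                           # Unix only; pages are now shared copy-on-write
    if pid == 0:                              # child
        data[0:8] = b"modified"               # write triggers a copy of the affected page
        os._exit(0)
    else:                                     # parent
        os.waitpid(pid, 0)
        print(data)                           # still b'original': the child wrote its own copy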

24
Q

Describe the On-demand zero-filled pages technique

A

When an application starts, a large part of its address space must be filled with zeros—for instance, the parts of the address space that aren’t preinitialized with instructions or initial data values. Instead of allocating zero-filled pages in RAM or on disk, the virtual memory manager can map those pages without read and write permissions. When the application refers to one of those pages, the virtual memory hardware will interrupt the processor with a memory reference exception. The exception handler can demultiplex this exception as an indirection exception of the type zero-fill. In response to this zero-fill exception, the virtual memory manager allocates a page dynamically and fills it with zeros. This technique can save storage in RAM or on disk because the parts of the address space that the application doesn’t use will not take up space.

25
Q

Describe the One zero-filled page technique

A

The virtual memory manager allocates just one page filled with zeros and maps that one page in all page-map entries for pages that should contain all zeros, but granting only read permission. Then, if a thread writes to this read-only zero-filled page, the exception handler will demultiplex this indirection exception as a copy-on-write exception, and the virtual memory manager will make a copy and update that thread's page table to have write permission for the copy.

26
Q

Describe the Virtual Shared Memory technique

A

Several threads running on different computers can share a single address space. When a thread refers to a page that isn’t in its local RAM, the virtual memory manager can fetch the page over the network from a remote computer’s RAM.

27
Q

What are the differences between blocking, non-blocking and async I/O?

A
  • Blocking I/O: CPU processing blocks until I/O request completes
  • Non-Blocking I/O: I/O request returns immediately; the operation may complete as a series of partial reads and writes
  • Async I/O: I/O request returns immediately, I/O subsystem notifies when complete.
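
A small contrast of blocking vs non-blocking I/O on a socket pair (the payload is arbitrary); async I/O would instead have the I/O subsystem notify us on completion, e.g. through an asyncio event loop.

    import socket

    a, b = socket.socketpair()

    # Blocking: recv() would not return until data is available.
    a.sendall(b"ping")
    print(b.recv(4))                  # returns here because data is already waiting

    # Non-blocking: recv() returns at once, possibly with nothing.
    b.setblocking(False)
    try:
        b.recv(4)                     # no data pending, so this raises BlockingIOError
    except BlockingIOError:
        print("no data yet; the caller must retry or wait for readiness")
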
28
Q

What are the main scheduler objectives?

A

Maximize throughput
Minimize response time
Fairness: each request obtains an equal share of the service. An unfair scheduler is not necessarily bad; it may have higher throughput and better response time than a fair scheduler.

It is impossible to have all three at the same time.