Big Data Lecture 10: Performance at Large Scales Flashcards

(27 cards)

1
Q

With thousands of nodes in a system, what will the performance distribution over them look like?

A

Most nodes will take around the mean time, but some nodes will take extremely long: the phenomenon of tail latency.

2
Q

CPU, memory, disk, bandwidth: which one is usually the bottleneck?

A

It depends, but it is only one of them! It is almost never two or more at once!

3
Q

When should we use systems such as Spark or MapReduce?

A

When I/O is the system's limit! On a single machine the disk is the bottleneck, so we need parallel reads and writes across many machines.

4
Q

What are two different definitions of latency?

A

Some define latency as the time until the data starts arriving, and some as the time until all of the data has arrived (i.e., plus delivery time).

5
Q

Prefix 0.001 in words?

A

Milli (m).

6
Q

Prefix 0.000 001 in words?

A

Micro (μ).

7
Q

Prefix 0.000 000 001 in words?

A

Nano (n).

8
Q

Prefix 0.000 000 000 001 in words?

A

Pico (p).

9
Q

What is the latency of executing one instruction on the CPU?

A

1 ns

10
Q

What is the latency of a fetch from main memory?

A

100 ns

11
Q

What is the latency of fetching from a new disk location (a disk seek)?

A

8 ms

12
Q

What is the latency of sending an internet packet to the US and back (round trip)?

A

150 ms

13
Q

What is the throughput of standard Fast Ethernet?

A

100 Mbit/s

14
Q

What is the write/read throughput of an SSD?

A

Write ~240 MB/s, read ~270 MB/s.

15
Q

What is the definition of total response time?

A

Latency plus delivery time: the time until all of the data has arrived.

16
Q

What is speedup?

A

Old latency / new latency.

17
Q

What is Amdahl's law for speedup?

A

For constant problem size:

speedup = 1 / ((1 - p) + p/s)

where p is the fraction of the work that can be parallelized and s is the speedup on the parallelizable part.
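
For illustration, a minimal Python sketch of Amdahl's law (the function name and the example numbers are made up, not from the lecture):

    def amdahl_speedup(p, s):
        """Speedup for a constant problem size, where a fraction p of the
        work is parallelizable and that part is sped up by a factor s."""
        return 1.0 / ((1.0 - p) + p / s)

    # Even 1000-way parallelism cannot beat the serial 5%:
    print(amdahl_speedup(p=0.95, s=1000))   # ~19.6
    print(amdahl_speedup(p=0.95, s=10**9))  # approaches 1/(1 - p) = 20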

18
Q

What is Gustafson’s law for speedup?

A

For constant execution time (the problem size scales with the machine):

speedup = (1 - p) + s·p

where p is the fraction of the work that can be parallelized and s is the speedup on the parallelizable part.
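
For comparison, the same made-up numbers under Gustafson's law (again a hedged Python sketch, not lecture code):

    def gustafson_speedup(p, s):
        """Scaled speedup when the problem size grows with the machine; p is
        the parallelizable fraction and s the speedup on that part."""
        return (1.0 - p) + s * p

    # With a scaled workload, the 95% parallel fraction keeps paying off:
    print(gustafson_speedup(p=0.95, s=1000))  # 950.05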

19
Q

How to avoid memory scale up?

A

Optimize classes that are instantiated billions of times: remove redundancy, inheritance, and unused fields, and use efficient data structures.
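
As one illustration (a Python sketch with made-up class names, not code from the lecture), __slots__ removes the per-instance dictionary, which matters for classes instantiated billions of times:

    import sys

    class FatPoint:                      # a plain class carries a __dict__
        def __init__(self, x, y):
            self.x = x
            self.y = y

    class SlimPoint:                     # __slots__ drops the per-instance dict
        __slots__ = ("x", "y")
        def __init__(self, x, y):
            self.x = x
            self.y = y

    print(sys.getsizeof(FatPoint(1, 2).__dict__))  # dict overhead per object
    print(sys.getsizeof(SlimPoint(1, 2)))          # fixed, smaller footprint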

20
Q

How to avoid Disk I/O scale up?

A

Use efficient formats and compression!
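
A minimal Python sketch of the idea, using JSON plus gzip as stand-ins (real pipelines would typically use a columnar format such as Parquet with built-in compression):

    import gzip
    import json

    # Made-up example records; the point is the on-disk size, not the values.
    records = [{"id": i, "value": i * 0.5} for i in range(100_000)]
    payload = json.dumps(records).encode()  # row-oriented text format

    compressed = gzip.compress(payload)     # same data, far fewer bytes
    print(len(payload), len(compressed))    # fewer bytes => less disk I/O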

21
Q

How to avoid network I/O scale up?

A

Push computation down as close to the data as possible.

Batch process: send fewer, larger packets when possible.

22
Q

What is a good rule of thumb for how many nodes we should have in a system? Why?

A

Usually around the square root of the number of available cores and of the memory capacity (e.g., with 10,000 available cores, on the order of 100 nodes).

We want liquidity of tasks in slots, but at the same time we do not want to overload the network with an extreme amount of communication.

23
Q

Describe the phenomenon of tail latency.

A

As the number of tasks rises, the probability of some task taking longer than expected goes to 1.

24
Q

What are the reasons for the tail latency phenomenon?

A

Queues, power limits (hyperthreading), garbage collection, and energy management.

25

Q

What formula describes the probability of some node taking extremely long?

A

The probability of one node taking long is p.

The probability of one node not taking long is 1 - p.

Hence the probability of no node taking long is (1 - p)^n.

Hence the probability of at least one node taking long is 1 - (1 - p)^n.

For any p > 0, this goes to 1 as n goes to infinity.
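
The formula is easy to evaluate; a small Python sketch with a made-up per-node probability:

    def prob_some_node_slow(p, n):
        """Probability that at least one of n nodes is slow, if each node is
        independently slow with probability p."""
        return 1.0 - (1.0 - p) ** n

    for n in (1, 10, 100, 1000, 10_000):
        print(n, prob_some_node_slow(0.001, n))
    # Even p = 0.1% per node gives ~63% at 1000 nodes, ~99.995% at 10,000.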

26

Q

How to solve tail latency?

A

• Hedged requests: duplicate the call and terminate when one copy finishes (though this duplicates the load).
• Deferred hedged requests: if a task takes longer than the 95th percentile of typical task durations, restart it (in practice this increases load by about 2% and improves the tail roughly tenfold).
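
A hedged-request sketch in Python (the backend call and replica names are hypothetical; real systems hedge RPCs, which, unlike threads, can be cancelled mid-flight):

    import concurrent.futures
    import random
    import time

    def backend_call(replica):
        time.sleep(random.uniform(0.01, 0.5))   # simulated variable latency
        return f"reply from replica {replica}"

    # Fire the same request at two replicas and keep the first reply.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(backend_call, r) for r in ("A", "B")]
        done, pending = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        print(next(iter(done)).result())
        for f in pending:
            f.cancel()  # best effort: a running thread cannot be interrupted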

27

Q

How to avoid CPU scale up?

A

Remove gigantic loops; avoid overriding, classes, casting, and exceptions.