Big Data Lecture 10: Performance at Large Scales Flashcards

(27 cards)

1
Q

With thousands of nodes in a system, what will the performance distribution over them look like?

A

Most nodes will take around the mean time, but some nodes will take extremely long: the phenomenon of tail latency.

2
Q

CPU, memory, disk, bandwidth: which one is usually the bottleneck?

A

It depends, but it is only one of them! It is almost never two or more at once!

3
Q

When should we use systems such as Spark or MapReduce?

A

When I/O is the system's limit! On a single machine the disk is the bottleneck, so we need parallel reads and writes across many machines.

4
Q

What are two different definitions of latency?

A

Some define latency as the time until the data starts arriving, and some as the time until all of the data has arrived (i.e., plus delivery time).

5
Q

Prefix 0.001 in words?

A

Milli (m).

6
Q

Prefix 0.000 001 in words?

A

Micro (μ).

7
Q

Prefix 0.000 000 001 in words?

A

Nano (n).

8
Q

Prefix 0.000 000 000 001 in words?

A

Pico (p).

9
Q

What is the latency of executing one instruction on the CPU?

A

1 ns

10
Q

What is the latency of a fetch from main memory?

A

100 ns

11
Q

What is the latency of fetching from a new disk location (a disk seek)?

A

8 ms

12
Q

What is the latency of sending an internet packet to the US and back (round trip)?

A

150 ms

13
Q

What is the throughput of standard Fast Ethernet?

A

100 Mbit/s

14
Q

What is the write/read throughput of an SSD?

A

Write ~240 MB/s, read ~270 MB/s.

15
Q

What is the definition of total response time?

A

Latency plus delivery time: the time until all of the data has arrived.

16
Q

What is speedup?

A

Old latency / new latency.

17
Q

What is Amdahl's law for speedup?

A

For constant problem size:

speedup = 1 / ((1 - p) + p/s)

where p is the fraction of the work that can be parallelized and s is the speedup on the parallelizable part.
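
For illustration, a minimal Python sketch of Amdahl's law (the function name and the example numbers are made up, not from the lecture):

    def amdahl_speedup(p, s):
        """Speedup for a constant problem size, where a fraction p of the
        work is parallelizable and that part is sped up by a factor s."""
        return 1.0 / ((1.0 - p) + p / s)

    # Even 1000-way parallelism cannot beat the serial 5%:
    print(amdahl_speedup(p=0.95, s=1000))   # ~19.6
    print(amdahl_speedup(p=0.95, s=10**9))  # approaches 1/(1 - p) = 20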

18
Q

What is Gustafson’s law for speedup?

A

For constant execution time (the problem size scales with the machine):

speedup = (1 - p) + s·p

where p is the fraction of the work that can be parallelized and s is the speedup on the parallelizable part.
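
For comparison, the same made-up numbers under Gustafson's law (again a hedged Python sketch, not lecture code):

    def gustafson_speedup(p, s):
        """Scaled speedup when the problem size grows with the machine; p is
        the parallelizable fraction and s the speedup on that part."""
        return (1.0 - p) + s * p

    # With a scaled workload, the 95% parallel fraction keeps paying off:
    print(gustafson_speedup(p=0.95, s=1000))  # 950.05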

19
Q

How to avoid memory scale up?

A

Optimize classes that are instantiated billions of times: remove redundancy, inheritance, and unused fields, and use efficient data structures.
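
As one illustration (a Python sketch with made-up class names, not code from the lecture), __slots__ removes the per-instance dictionary, which matters for classes instantiated billions of times:

    import sys

    class FatPoint:                      # a plain class carries a __dict__
        def __init__(self, x, y):
            self.x = x
            self.y = y

    class SlimPoint:                     # __slots__ drops the per-instance dict
        __slots__ = ("x", "y")
        def __init__(self, x, y):
            self.x = x
            self.y = y

    print(sys.getsizeof(FatPoint(1, 2).__dict__))  # dict overhead per object
    print(sys.getsizeof(SlimPoint(1, 2)))          # fixed, smaller footprint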

20
Q

How to avoid Disk I/O scale up?

A

Use efficient formats and compression!
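
A minimal Python sketch of the idea, using JSON plus gzip as stand-ins (real pipelines would typically use a columnar format such as Parquet with built-in compression):

    import gzip
    import json

    # Made-up example records; the point is the on-disk size, not the values.
    records = [{"id": i, "value": i * 0.5} for i in range(100_000)]
    payload = json.dumps(records).encode()  # row-oriented text format

    compressed = gzip.compress(payload)     # same data, far fewer bytes
    print(len(payload), len(compressed))    # fewer bytes => less disk I/O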

21
Q

How to avoid network I/O scale up?

A

Push computation down as close to the data as possible.

Batch process: send fewer, larger packets when possible.

22
Q

What is a good rule of thumb for how many nodes we should have in a system? Why?

A

Usually around the square root of the number of available cores and of the memory capacity (e.g., with 10,000 available cores, on the order of 100 nodes).

We want liquidity of tasks in slots, but at the same time we do not want to overload the network with an extreme amount of communication.

23
Q

Describe the phenomenon of tail latency.

A

As the number of tasks rises, the probability of some task taking longer than expected goes to 1.

24
Q

What are the reasons for the tail latency phenomenon?

A

Queues, power limits (hyperthreading), garbage collection, and energy management.

25

Q

What formula describes the probability of some node taking extremely long?

A

The probability of one node taking long is p.

The probability of one node not taking long is 1 - p.

Hence the probability of no node taking long is (1 - p)^n.

Hence the probability of at least one node taking long is 1 - (1 - p)^n.

For any p > 0, this goes to 1 as n goes to infinity.
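
The formula is easy to evaluate; a small Python sketch with a made-up per-node probability:

    def prob_some_node_slow(p, n):
        """Probability that at least one of n nodes is slow, if each node is
        independently slow with probability p."""
        return 1.0 - (1.0 - p) ** n

    for n in (1, 10, 100, 1000, 10_000):
        print(n, prob_some_node_slow(0.001, n))
    # Even p = 0.1% per node gives ~63% at 1000 nodes, ~99.995% at 10,000.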

26

Q

How to solve tail latency?

A

• Hedged requests: duplicate the call and terminate when one copy finishes (though this duplicates the load).
• Deferred hedged requests: if a task takes longer than the 95th percentile of typical task durations, restart it (in practice this increases load by about 2% and improves the tail roughly tenfold).
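
A hedged-request sketch in Python (the backend call and replica names are hypothetical; real systems hedge RPCs, which, unlike threads, can be cancelled mid-flight):

    import concurrent.futures
    import random
    import time

    def backend_call(replica):
        time.sleep(random.uniform(0.01, 0.5))   # simulated variable latency
        return f"reply from replica {replica}"

    # Fire the same request at two replicas and keep the first reply.
    with concurrent.futures.ThreadPoolExecutor() as pool:
        futures = [pool.submit(backend_call, r) for r in ("A", "B")]
        done, pending = concurrent.futures.wait(
            futures, return_when=concurrent.futures.FIRST_COMPLETED)
        print(next(iter(done)).result())
        for f in pending:
            f.cancel()  # best effort: a running thread cannot be interrupted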

27

Q

How to avoid CPU scale up?

A

Remove gigantic loops; avoid overriding, classes, casting, and exceptions.