MapReduce Flashcards

1
Q

Which technology is MapReduce part of?

A

Hadoop. Its two original core components are HDFS (distributed storage) and MapReduce (distributed processing).

2
Q

What is the data format of the input in MapReduce?

A

(key, value) pairs of arbitrary serializable types. Each individual pair should fit in memory; the input as a whole does not need to.
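As an illustration of "input as (key, value) pairs", the sketch below turns raw text into one pair per line, using the line's starting byte offset as the key and the line itself as the value. This mirrors the convention of Hadoop's default text input, but the function name `text_to_records` is hypothetical and this is a simplified local stand-in, not Hadoop code.

```python
from typing import Iterator, Tuple


def text_to_records(data: bytes) -> Iterator[Tuple[int, str]]:
    """Hypothetical helper: yield (byte_offset, line) pairs, one per input line."""
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Key: byte offset where the line starts; value: the line without its newline.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


if __name__ == "__main__":
    for record in text_to_records(b"w1 w2\nw3 w2\n"):
        print(record)  # (0, 'w1 w2') then (6, 'w3 w2')
```

Each pair is small, so it fits in memory even when the whole file does not.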

3
Q

What strategy should be used when cluster components fail during a MapReduce computation?

A

Split the computation into many small, independent tasks. If a task fails to deliver its result (for example, because its machine crashed), restart only that task instead of the whole job.
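A minimal, single-process sketch of the restart strategy, assuming tasks are independent callables: each task is retried on failure up to a limit, and a successful task is never re-run. `run_with_restarts`, `make_flaky_task`, and the `max_attempts` limit are hypothetical names for illustration; a real framework runs tasks in parallel on different machines and reschedules failed ones elsewhere.

```python
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


def run_with_restarts(tasks: Dict[str, Callable[[], T]],
                      max_attempts: int = 3) -> Dict[str, T]:
    """Run independent tasks; re-run only the ones that raise, up to max_attempts."""
    results: Dict[str, T] = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            try:
                results[name] = task()
                break  # success: this task is never executed again
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(f"task {name!r} failed {max_attempts} times") from exc
    return results


def make_flaky_task(failures: int, value: int) -> Callable[[], int]:
    """Hypothetical task that fails `failures` times before returning `value`."""
    calls = 0

    def task() -> int:
        nonlocal calls
        calls += 1
        if calls <= failures:
            raise IOError("simulated machine failure")
        return value

    return task


if __name__ == "__main__":
    print(run_with_restarts({"a": make_flaky_task(0, 1), "b": make_flaky_task(2, 2)}))
```

Because the tasks are small and independent, losing one costs only that task's work.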

4
Q

Where does the input data come from in MapReduce?

A

The (H)DFS

5
Q

What are the 4 steps in the Map task?

A
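  • Reads its input (key, value) pairs from the DFS; each Map task handles one chunk (split) of the input and is scheduled on or near the machine that stores that chunk
  • Calls the map function once per input pair
  • The map function emits zero or more intermediate (key, value) pairs, as defined by you
  • Writes its output to the local disk (not the DFS), buffered and partitioned by key for the Reduce tasks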
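The Map task steps above can be sketched locally for word count, under these assumptions: `map_fn` is a hypothetical user-defined map that emits (word, 1) for each word, and `partition` is a simple hash partitioner (crc32, chosen so results are stable across runs) standing in for the framework's. The in-memory buckets play the role of the buffered, per-reducer output that a real framework would spill to local disk.

```python
import zlib
from typing import Iterable, Iterator, List, Tuple


def map_fn(key: int, value: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical user-defined map for word count: emit (word, 1) per word."""
    for word in value.split():
        yield word, 1


def partition(key: str, num_reducers: int) -> int:
    """Deterministic hash partitioner: the same key always goes to the same reducer."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_task(split: Iterable[Tuple[int, str]],
                 num_reducers: int) -> List[List[Tuple[str, int]]]:
    """One Map task: call map_fn once per input pair, buffering output per reducer."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(num_reducers)]
    for key, value in split:
        for out_key, out_value in map_fn(key, value):
            buckets[partition(out_key, num_reducers)].append((out_key, out_value))
    # A real framework would write these buffers to the map machine's local disk.
    return buckets


if __name__ == "__main__":
    print(run_map_task([(0, "w1 w2"), (6, "w3 w2")], num_reducers=2))
```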
6
Q

What is the main operation of the Shuffle (Master controller) task?

A

It collects the (key, value) pairs output by all Map tasks and performs a distributed group-by-key, producing each key together with the list of all its values, i.e. (k, [v1, v2, ...]).
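A single-machine sketch of the group-by-key performed during the shuffle, assuming each Map task's output is a list of (key, value) pairs. The function name `shuffle` is hypothetical; the real operation is distributed across machines. Keys are returned in sorted order here because Hadoop also presents keys to reducers sorted.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def shuffle(map_outputs: Iterable[Iterable[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group the values of all Map outputs by key: key -> [v1, v2, ...]."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for output in map_outputs:  # the output of one Map task
        for key, value in output:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))  # keys in sorted order


if __name__ == "__main__":
    print(shuffle([[("w1", 1), ("w2", 1)], [("w3", 1), ("w2", 1)]]))
    # {'w1': [1], 'w2': [1, 1], 'w3': [1]}
```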

7
Q

What 3 qualities define the Reduce task?

A
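  • Each Reduce task processes one key at a time, together with that key's list of values
  • Combines the values for each key into an output value, as defined by you (the reduce function)
  • Writes its output to (H)DFS files (one output file per Reduce task)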
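A local sketch of one Reduce task for word count, assuming the shuffle has already produced a key-to-values mapping. `reduce_fn` (which sums the values) and `run_reduce_task` are hypothetical; a temporary local directory stands in for HDFS, and the `part-r-NNNNN` file name imitates Hadoop's naming of per-reducer output files.

```python
import os
import tempfile
from typing import Dict, Iterator, List, Tuple


def reduce_fn(key: str, values: List[int]) -> Iterator[Tuple[str, int]]:
    """Hypothetical user-defined reduce for word count: sum the counts."""
    yield key, sum(values)


def run_reduce_task(task_id: int, grouped: Dict[str, List[int]], out_dir: str) -> str:
    """One Reduce task: handle one key at a time and write a single output file."""
    path = os.path.join(out_dir, f"part-r-{task_id:05d}")
    with open(path, "w", encoding="utf-8") as out:
        for key in sorted(grouped):  # one key (and its value list) at a time
            for k, v in reduce_fn(key, grouped[key]):
                out.write(f"{k}\t{v}\n")
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = run_reduce_task(0, {"w2": [1, 1], "w1": [1]}, tmp)
        with open(path, encoding="utf-8") as f:
            print(os.path.basename(path))
            print(f.read(), end="")
```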
8
Q

What are the 3 network (switch) levels at which a MapReduce task can acquire its input from HDFS, ordered from fastest to slowest?

A
  • Data-local (the data is on the same machine as the task)
  • Rack-local (the data is on a different machine in the same rack)
  • Off-rack (the data must be transferred between racks)
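A simplified, hypothetical placement rule based on these three levels: given the machines holding replicas of a task's input and the rack of each machine, pick the free machine with the best locality. `Locality`, `locality`, and `pick_machine` are illustrative names, and this greedy choice is a sketch of the idea, not Hadoop's actual scheduler.

```python
from enum import IntEnum
from typing import Dict, Optional, Set, Tuple


class Locality(IntEnum):
    DATA_LOCAL = 0  # fastest: no network transfer
    RACK_LOCAL = 1  # transfer through the rack switch
    OFF_RACK = 2    # slowest: transfer between racks


def locality(machine: str, replicas: Set[str], rack_of: Dict[str, str]) -> Locality:
    """Classify a machine relative to the replicas of the task's input."""
    if machine in replicas:
        return Locality.DATA_LOCAL
    if rack_of[machine] in {rack_of[r] for r in replicas}:
        return Locality.RACK_LOCAL
    return Locality.OFF_RACK


def pick_machine(free: Set[str], replicas: Set[str],
                 rack_of: Dict[str, str]) -> Optional[Tuple[str, Locality]]:
    """Choose the free machine with the best locality (ties broken by name)."""
    if not free:
        return None
    best = min(free, key=lambda m: (locality(m, replicas, rack_of), m))
    return best, locality(best, replicas, rack_of)


if __name__ == "__main__":
    racks = {"n1": "r1", "n2": "r1", "n3": "r2"}
    print(pick_machine({"n2", "n3"}, {"n1"}, racks))  # n2 is rack-local
```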
9
Q

In general terms, what does a MapReduce job do?

A

It groups input entries that share the same key and reduces each group to a single user-defined (key, value) result, often a count. Example (word count): given the input “w1, w2, w3, w2, w3, w3, w3”, the output could be “(w1, 1), (w2, 2), (w3, 4)”.
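The word-count example can be reproduced with an in-memory sketch of the whole pipeline: map each entry to (entry, 1), group by key, then sum each group. `map_fn`, `reduce_fn`, and `map_reduce` are hypothetical names, and everything runs in one process; a real job distributes each phase across machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(entry: str) -> Iterator[Tuple[str, int]]:
    """Emit (entry, 1) for each input entry."""
    yield entry.strip(), 1


def reduce_fn(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a count."""
    return key, sum(values)


def map_reduce(entries: Iterable[str]) -> List[Tuple[str, int]]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for entry in entries:                  # map phase
        for key, value in map_fn(entry):
            groups[key].append(value)      # shuffle: group by key
    return [reduce_fn(k, groups[k]) for k in sorted(groups)]  # reduce phase


if __name__ == "__main__":
    print(map_reduce("w1, w2, w3, w2, w3, w3, w3".split(",")))
    # [('w1', 1), ('w2', 2), ('w3', 4)]
```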
