4 - MapReduce Flashcards

1
Q

What kind of model is MapReduce, and who developed it?

A

A simple data-parallel programming model designed for scalability and fault tolerance

Google
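The following is a minimal, single-process sketch of the MapReduce model, using word counting as an assumed example task. The names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical. A real framework would run the map and reduce calls across many machines and handle worker failures.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    # Emit an intermediate (word, 1) pair for every word in one input record.
    return [(word, 1) for word in document.split()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Combine all values for one key into a single result.
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each map call is independent, so on a cluster they could run in parallel.
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, vs) for k, vs in grouped.items())


print(word_count(["the cat sat", "the dog sat"]))
# {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```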

2
Q

How does Google use MapReduce?

A

Index construction for Google Search
Article clustering for Google News
Statistical machine translation
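Index construction fits this model well. The sketch below builds a toy inverted index that maps each word to a sorted list of document IDs. The helper names and document IDs are illustrative assumptions, and this is not Google's actual pipeline.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def map_doc(doc_id: int, text: str) -> List[Tuple[str, int]]:
    # Emit (word, doc_id) once per distinct word in the document.
    return [(word, doc_id) for word in set(text.lower().split())]


def reduce_word(word: str, doc_ids: List[int]) -> Tuple[str, List[int]]:
    # The posting list for a word is the sorted set of documents containing it.
    return word, sorted(set(doc_ids))


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc_id, text in docs.items():          # map phase
        for word, d in map_doc(doc_id, text):
            grouped[word].append(d)            # shuffle: group by word
    return dict(reduce_word(w, ids) for w, ids in grouped.items())  # reduce phase


index = build_index({1: "MapReduce at Google", 2: "Google News"})
print(index["google"])  # [1, 2]
```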

3
Q

Functional Programming

A

Computation as the application of functions

4
Q

How is functional programming different from imperative programming?

A

Traditional notions of data and instructions do not apply
Data flows are implicit in the program
Different orders of execution are possible
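Pure functions have no side effects, so independent calls can be evaluated in any order without changing the result. Here is a small sketch of that idea. The function `square` and the helper `evaluate_in_order` are hypothetical.

```python
import random
from typing import Dict, List


def square(x: int) -> int:
    # Pure: the result depends only on the argument, and nothing else changes.
    return x * x


def evaluate_in_order(xs: List[int], order: List[int]) -> List[int]:
    # Evaluate each element following an arbitrary order of indices,
    # then put the results back in their original positions.
    results: Dict[int, int] = {}
    for i in order:
        results[i] = square(xs[i])
    return [results[i] for i in range(len(xs))]


xs = [1, 2, 3, 4, 5]
forward = evaluate_in_order(xs, list(range(len(xs))))
shuffled_order = list(range(len(xs)))
random.shuffle(shuffled_order)
shuffled = evaluate_in_order(xs, shuffled_order)
print(forward == shuffled)  # True: evaluation order does not change the result
```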

5
Q

Lisp

A

Lisp stands for "List Processing"

Lists are primitive data types, and functions are written in prefix notation

6
Q

Concept: Map

A

Do something to everything on a list to make a new list from the results
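A hand-written version of map makes this definition concrete. It applies a function to every element and collects the results in a new list, leaving the input untouched. The name `my_map` is hypothetical. Python's built-in `map` computes the same values but returns a lazy iterator.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def my_map(f: Callable[[A], B], items: List[A]) -> List[B]:
    # Build a new list; the input list is never modified.
    result: List[B] = []
    for item in items:
        result.append(f(item))
    return result


nums = [1, 2, 3]
doubled = my_map(lambda x: x * 2, nums)
print(doubled)                           # [2, 4, 6]
print(nums)                              # [1, 2, 3] (unchanged)
print(list(map(lambda x: x * 2, nums)))  # built-in equivalent: [2, 4, 6]
```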

7
Q

Concept: Fold

A

Combine (accumulate) the elements of a list into a single result

8
Q

How does “fold” work?

A

The accumulator is set to an initial value
The function is applied to a list element and the accumulator
The result is stored in the accumulator
This is repeated for every item in the list
The result is the final value of the accumulator
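The steps above correspond directly to a left fold. Below is a minimal sketch that uses a hypothetical `fold_left` helper. It is checked against Python's `functools.reduce`, which performs the same left-to-right accumulation when given an initial value.

```python
from functools import reduce
from typing import Callable, List, TypeVar

T = TypeVar("T")
Acc = TypeVar("Acc")


def fold_left(f: Callable[[Acc, T], Acc], init: Acc, items: List[T]) -> Acc:
    acc = init               # accumulator set to the initial value
    for item in items:       # repeated for every item in the list
        acc = f(acc, item)   # apply f to accumulator and element; store the result
    return acc               # the result is the final accumulator value


total = fold_left(lambda acc, x: acc + x, 0, [1, 2, 3, 4, 5])
print(total)                                               # 15
print(reduce(lambda acc, x: acc + x, [1, 2, 3, 4, 5], 0))  # 15
```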

9
Q

Map example
(map (lambda (x) (* x x))
     '(1 2 3 4 5)) -> ???

A

'(1 4 9 16 25)

10
Q

Fold example

Ignore this card for now

A
11
Q

When can we reorder folding?

A

If the fold function is commutative and associative (so the order of application is irrelevant)
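Here is a quick check of why this matters. The sketch folds a list in chunks, as independent workers might (regrouping, which relies on associativity). It then combines the partial results in reverse order, as if they arrived out of order (reordering, which relies on commutativity). Addition has both properties, so it matches the sequential fold. Subtraction has neither, so it does not. The `chunked_fold` helper is a hypothetical illustration.

```python
from functools import reduce
from operator import add, sub
from typing import Callable, List


def chunked_fold(f: Callable[[int, int], int], items: List[int], chunk: int) -> int:
    # Fold each chunk separately, as independent workers might (regrouping).
    partials = [reduce(f, items[i:i + chunk]) for i in range(0, len(items), chunk)]
    # Combine partial results in reverse order, as if they arrived out of order.
    return reduce(f, reversed(partials))


xs = [10, 1, 2, 3, 4, 5]
print(reduce(add, xs), chunked_fold(add, xs, 2))  # 25 25
print(reduce(sub, xs), chunked_fold(sub, xs, 2))  # -5 -9
```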

12
Q

Is there a limit to map parallelisation?

A

No (beyond the number of elements), since each application of the map function is independent of the others
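Each application is independent, so the elements can be handed to a pool of workers without any coordination between them. The sketch below uses the standard library's `concurrent.futures`. The worker count of 4 is an arbitrary assumption. Threads keep the example portable. For CPU-bound work in CPython, `ProcessPoolExecutor` (same interface) would give true parallelism.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def square(x: int) -> int:
    # Each call touches only its own input, so calls can run on any worker.
    return x * x


def parallel_map(xs: List[int], workers: int = 4) -> List[int]:
    # Executor.map returns results in input order, even if calls finish out of order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, xs))


print(parallel_map([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```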

13
Q

Apache Hadoop Architecture: Master

A

Communicates with workers, tracks resources, and orchestrates work

14
Q

Apache Hadoop Architecture: Worker

A

Launches and tracks processes spawned on worker host
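The next example is a toy, single-machine analogue of the master/worker split. It is not Hadoop's API. The `Master` class, its task queue, and the worker threads are hypothetical stand-ins: the master hands out work and records completions, and each thread plays the role of a worker running tasks. Resource tracking is omitted for brevity.

```python
import queue
import threading
from typing import Callable, Dict, List


class Master:
    """Hypothetical master: hands out tasks and tracks which have finished."""

    def __init__(self, tasks: Dict[str, int], func: Callable[[int], int]) -> None:
        self.inputs = tasks
        self.func = func
        self.pending: "queue.Queue[str]" = queue.Queue()
        for task_id in tasks:
            self.pending.put(task_id)
        self.results: Dict[str, int] = {}
        self.lock = threading.Lock()

    def run(self, num_workers: int) -> Dict[str, int]:
        # Orchestrate the work: start the workers, wait for all of them, return results.
        workers: List[threading.Thread] = [
            threading.Thread(target=self._worker_loop) for _ in range(num_workers)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        return self.results

    def _worker_loop(self) -> None:
        # Stand-in for a worker: repeatedly take a task, run it, report the result.
        while True:
            try:
                task_id = self.pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            output = self.func(self.inputs[task_id])
            with self.lock:  # the master records the completed task
                self.results[task_id] = output


master = Master({"t1": 1, "t2": 2, "t3": 3}, lambda x: x * 10)
results = master.run(num_workers=2)
print(dict(sorted(results.items())))  # {'t1': 10, 't2': 20, 't3': 30}
```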

15
Q

For a job to be parallelisable, its tasks need to be …

A

Independent

16
Q

What does it mean for a task to be independent?

A

Must not depend on the input or output of other tasks
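The example below contrasts independent and dependent tasks using two hypothetical computations. Squaring each element is independent, because each task needs only its own input. A running total is dependent, because each step needs the previous step's output, so those steps cannot simply be run in parallel as written.

```python
from typing import List


def independent_tasks(xs: List[int]) -> List[int]:
    # Task i reads only xs[i]: the tasks could run in any order or in parallel.
    return [x * x for x in xs]


def dependent_tasks(xs: List[int]) -> List[int]:
    # Step i needs the output of step i - 1: a chain of dependent tasks.
    totals: List[int] = []
    running = 0
    for x in xs:
        running += x
        totals.append(running)
    return totals


print(independent_tasks([1, 2, 3, 4]))  # [1, 4, 9, 16]
print(dependent_tasks([1, 2, 3, 4]))    # [1, 3, 6, 10]
```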