Distributed Machine Learning Flashcards
(16 cards)
What are the 7 different communication patterns?
- Push
- Pull
- Broadcast
- Reduce
- All-reduce
- Wait
- Barrier
What is the push communication pattern?
Machine A sends data to Machine B.
What is the pull communication pattern?
Machine B requests data from Machine A
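A minimal mpi4py sketch of push and pull, assuming an MPI runtime (e.g. launched with `mpiexec -n 2`); here a blocking recv stands in for the pull request:
```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Push: machine A (rank 0) sends data to machine B (rank 1).
    comm.send({"weights": [0.1, 0.2]}, dest=1, tag=0)
elif rank == 1:
    # Pull: machine B (rank 1) obtains the data from rank 0.
    data = comm.recv(source=0, tag=0)
```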
What is the broadcast communication pattern?
Machine A sends data to many machines.
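A minimal broadcast sketch with mpi4py (the parameter dictionary is a made-up example):
```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Rank 0 ("Machine A") holds the data; all other ranks receive a copy.
params = {"lr": 0.01} if rank == 0 else None
params = comm.bcast(params, root=0)   # every rank now holds the same params
```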
What is the reduce communication pattern?
Compute some reduction of data from multiple machines and materialise the result on one machine.
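A minimal reduce sketch with mpi4py, summing a made-up per-machine value onto rank 0 only:
```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

local_loss = float(rank + 1)                          # each machine's local value
total = comm.reduce(local_loss, op=MPI.SUM, root=0)   # materialised on rank 0 only
if rank == 0:
    print("global loss:", total)
else:
    assert total is None   # other ranks do not receive the result
```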
What is the all reduce communication pattern?
Compute some reduction of data from multiple machines and materialise the result on all those machines.
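A minimal all-reduce sketch with mpi4py; every rank ends up holding the global sum:
```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
local = comm.Get_rank() + 1.0               # each machine's local contribution
total = comm.allreduce(local, op=MPI.SUM)   # result materialised on every rank
```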
What is the wait communication pattern?
One machine pauses its computation and waits for a signal from another machine.
What is the barrier communication pattern?
Many machines wait until all of them have reached a given point in their execution, then all continue from there.
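A minimal sketch of wait and barrier with mpi4py (the rank numbers and the "go" signal are made-up):
```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Wait: rank 1 blocks until it receives a signal message from rank 0.
if rank == 0:
    comm.send("go", dest=1, tag=1)
elif rank == 1:
    signal = comm.recv(source=0, tag=1)   # pauses here until the signal arrives

# Barrier: no rank continues past this line until all ranks have reached it.
comm.Barrier()
```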
What is the key principle in distributed computing?
Overlapping computation and communication.
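One possible illustration, assuming an MPI-3 build that supports non-blocking collectives via mpi4py's Iallreduce; the layer names and shapes are placeholders:
```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
grad_layer2 = np.random.rand(1024)   # gradient that is already computed
grad_layer1 = np.empty(1024)

# Start communicating the layer-2 gradient in the background...
req = comm.Iallreduce(MPI.IN_PLACE, grad_layer2, op=MPI.SUM)

# ...while still computing the layer-1 gradient (placeholder computation).
grad_layer1[:] = np.random.rand(1024)

req.Wait()   # layer-2 gradient is now summed across all machines
```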
How does Stochastic Gradient Descent work with All-reduce?
1. The SGD gradient equation is split into M sub-equations.
2. Each of the M machines is assigned one of the summations.
3. After all local gradients are computed, the outer sum is performed using all-reduce.
4. After the all-reduce, the whole sum is present on all machines and can be used to update the model parameters (see the sketch below).
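A minimal data-parallel SGD sketch with mpi4py and numpy; the least-squares model, data shards, and learning rate are placeholder assumptions:
```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()
lr = 0.1

w = np.zeros(10)             # identical initial parameters on every machine
X = np.random.rand(32, 10)   # this machine's shard of the minibatch
y = np.random.rand(32)

for step in range(100):
    # Local gradient of a least-squares loss on this machine's shard.
    grad = X.T @ (X @ w - y) / len(y)

    # All-reduce: sum the partial gradients, then average over machines.
    comm.Allreduce(MPI.IN_PLACE, grad, op=MPI.SUM)
    grad /= size

    # Every machine applies the same update, so parameters stay in sync.
    w -= lr * grad
```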
What is the advantage of SGD with All-reduce?
The algorithm is easy to reason about as it’s equivalent to minibatch SGD. The same hyperparameters can be used. The algorithm is easy to implement.
What is the disadvantage of SGD with All-reduce?
While the communication for the all-reduce is happening, the workers are idle. We are not overlapping computation and communication.
How is parallel k-means implemented?
Parallel k-means is split into 3 phases:
1. Map
2. Combine
3. Reduce
How is the map step done in parallel k-means?
- Compute distances between the local points and the k centroids
- Assign each point to its nearest cluster
- Send the intermediate data to the combiner (see the sketch below)
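A possible map-step sketch in numpy (the function name and array shapes are assumptions):
```python
import numpy as np

def kmeans_map(points, centroids):
    """Assign each local point to its nearest centroid."""
    # Pairwise distances between local points and the k current centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    labels = np.argmin(dists, axis=1)   # cluster assignment per local point
    return labels                       # intermediate data passed to the combiner
```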
How is the combine step done in parallel k-means?
- Compute the partial centroid information for each cluster from the local points
- Send the local per-cluster sums across dimensions and the number of samples (see the sketch below)
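A possible combine-step sketch in numpy, computing local per-cluster sums and counts before any communication (names and shapes are assumptions):
```python
import numpy as np

def kmeans_combine(points, labels, k):
    sums = np.zeros((k, points.shape[1]))   # local sum of coordinates per cluster
    counts = np.zeros(k)                    # local number of points per cluster
    for j in range(k):
        mask = labels == j
        sums[j] = points[mask].sum(axis=0)
        counts[j] = mask.sum()
    return sums, counts                     # sent on to the reduce step
```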
How is the reduce step done in parallel k-means?
- Test convergence
- Update the cluster centroids
- Return to the Map step until convergence is reached (see the sketch below)
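A possible reduce-step sketch with mpi4py, aggregating the partial sums with an all-reduce, updating the centroids, and testing convergence (the tolerance is an assumption):
```python
from mpi4py import MPI
import numpy as np

def kmeans_reduce(comm, partial_sums, partial_counts, old_centroids, tol=1e-4):
    # In-place all-reduce turns the local partial results into global totals.
    comm.Allreduce(MPI.IN_PLACE, partial_sums, op=MPI.SUM)
    comm.Allreduce(MPI.IN_PLACE, partial_counts, op=MPI.SUM)

    # Update centroids from the global sums and counts (guard against empty clusters).
    new_centroids = partial_sums / np.maximum(partial_counts, 1)[:, None]

    # Converged when the centroids stop moving; otherwise go back to the map step.
    converged = np.linalg.norm(new_centroids - old_centroids) < tol
    return new_centroids, converged
```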