Lecture 6 Flashcards

1
Q

What’s the original MapReduce mitigation approach for Stragglers?

A
  • Run a speculative copy of the slow task (called a backup task) on another machine
  • Whichever of the original or the copy finishes first is used
    • Without speculative execution, a job would be as slow as its slowest task; with it, job response times improve
2
Q

When a node has an empty slot, Hadoop chooses a task from three categories. What are they, in priority order?

A
  • Failed tasks are given the highest priority
  • Unscheduled tasks come next; those with data local to the node are chosen first
  • Finally, a speculative task is run
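
The priority order above can be sketched in a few lines of Python. This is only an illustration, not Hadoop's actual code: the function name pick_task, its parameters, and the fallback to a non-local unscheduled task when no local one exists are all assumptions made for the example.

```python
from typing import Optional, Sequence, Set, Tuple


def pick_task(
    node: str,
    failed: Sequence[str],
    unscheduled: Sequence[Tuple[str, Set[str]]],
    speculative: Sequence[str],
) -> Optional[str]:
    """Return the id of the task to run in a free slot on `node`.

    `unscheduled` pairs each task id with the set of nodes holding its input.
    All names here are hypothetical, chosen for illustration only.
    """
    # 1. Failed tasks have the highest priority.
    if failed:
        return failed[0]
    # 2. Unscheduled tasks: prefer one whose data is local to this node.
    for task_id, data_nodes in unscheduled:
        if node in data_nodes:
            return task_id
    if unscheduled:
        # Assumed fallback: run a non-local unscheduled task.
        return unscheduled[0][0]
    # 3. Otherwise, run a speculative copy of a running task, if any.
    if speculative:
        return speculative[0]
    return None
```

For example, with no failed tasks and unscheduled tasks "m1" (data on n2) and "m2" (data on n1), a free slot on n1 would receive "m2".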
3
Q

Describe the progress score used to detect stragglers

A
  • The progress score ranges from 0 to 1
  • For mappers: the score is the fraction of input data read
  • For reducers: the execution is divided into 3 equal phases, each worth 1/3 of the score:
    • Copy phase: fraction of map outputs that have been copied
    • Sort phase (map outputs are sorted by key): fraction of data merged
    • Reduce phase: fraction of data passed through the reduce function
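
As a rough sketch of how these scores could be computed (the function names and parameters are hypothetical, and each phase is assumed to report its own completion as a fraction between 0 and 1):

```python
def _clamp(x: float) -> float:
    """Keep a fraction inside the range [0, 1]."""
    return max(0.0, min(1.0, x))


def map_progress(bytes_read: int, total_bytes: int) -> float:
    """Mapper score: fraction of the input data read so far."""
    if total_bytes <= 0:
        return 0.0
    return _clamp(bytes_read / total_bytes)


def reduce_progress(copy_frac: float, sort_frac: float, reduce_frac: float) -> float:
    """Reducer score: three equal phases, each contributing 1/3."""
    phases = (copy_frac, sort_frac, reduce_frac)
    return sum(_clamp(p) for p in phases) / 3.0
```

A reducer that has finished copying and is halfway through sorting would score (1 + 0.5 + 0) / 3 = 0.5.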
4
Q

Describe the Scheduler’s Assumptions

A
  1. Nodes can perform work at roughly the same rate
  2. Tasks progress at a constant rate throughout time
  3. There is no cost to starting a speculative task on a node that would otherwise have an idle slot
  4. Too many speculative tasks can take away resources from other running tasks
  5. Tasks finish in “waves”, so a task with a low progress score is likely a straggler
  6. A task’s progress score is equal to the fraction of its total work completed
5
Q

Describe the LATE scheduler

A

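Speculatively execute the task with the longest estimated time to finish

LATE stands for “Longest Approximate Time to End”

Look forward rather than backward: rank tasks by estimated time remaining, not by progress made so far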
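
A minimal sketch of the estimate, assuming the usual LATE heuristic: progress rate = progress score / elapsed time, and time left = (1 - progress score) / progress rate. The function names and the shape of the tasks dictionary are hypothetical.

```python
from typing import Dict, Optional, Tuple


def estimated_time_left(progress_score: float, elapsed_seconds: float) -> float:
    """Estimate remaining seconds, assuming a constant progress rate."""
    if progress_score <= 0 or elapsed_seconds <= 0:
        return float("inf")  # no measurable progress yet
    progress_rate = progress_score / elapsed_seconds
    return (1.0 - progress_score) / progress_rate


def pick_speculation_candidate(tasks: Dict[str, Tuple[float, float]]) -> Optional[str]:
    """tasks maps task id -> (progress score, elapsed seconds).

    Returns the task expected to finish farthest in the future.
    """
    if not tasks:
        return None
    return max(tasks, key=lambda tid: estimated_time_left(*tasks[tid]))
```

With tasks a = (0.9, 90 s), b = (0.5, 100 s) and c = (0.2, 10 s), the estimates are about 10 s, 100 s and 40 s, so b is chosen even though c has the lowest progress score.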

6
Q

Sanity thresholds of LATE scheduler

A
  • Cap number of backup tasks
  • Launch backups on fast nodes
  • Only back up tasks that are sufficiently slow
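
The three thresholds can be combined into one decision, sketched below. This is an illustration under stated assumptions: the names (Task, choose_backup, backup_cap, slow_rate_threshold) are hypothetical, the fast-node check is passed in as a boolean, and the slow-task threshold is given as an absolute progress rate for simplicity.

```python
from typing import NamedTuple, Optional, Sequence


class Task(NamedTuple):
    task_id: str
    progress_score: float   # between 0 and 1
    elapsed_seconds: float
    has_backup: bool = False


def progress_rate(task: Task) -> float:
    """Progress made per second of running time."""
    if task.elapsed_seconds <= 0:
        return 0.0
    return task.progress_score / task.elapsed_seconds


def time_left(task: Task) -> float:
    """Estimated seconds remaining at the current progress rate."""
    rate = progress_rate(task)
    return float("inf") if rate <= 0 else (1.0 - task.progress_score) / rate


def choose_backup(
    running: Sequence[Task],
    running_backups: int,
    node_is_fast: bool,
    backup_cap: int,
    slow_rate_threshold: float,
) -> Optional[Task]:
    """Pick a task to back up on the requesting node, or None."""
    if running_backups >= backup_cap:   # cap the number of backup tasks
        return None
    if not node_is_fast:                # launch backups only on fast nodes
        return None
    # Only back up tasks that are sufficiently slow and not already backed up.
    slow = [t for t in running
            if not t.has_backup and progress_rate(t) < slow_rate_threshold]
    if not slow:
        return None
    return max(slow, key=time_left)     # longest approximate time to end
```

Any one of the three checks failing means no backup is launched for that request.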
7
Q

What is Hadoop?

A
  • An open-source framework for data-intensive distributed applications
  • Inspired by Google’s MapReduce + GFS
  • Implemented in Java
8
Q

Describe the Hadoop master and slaves

A
  • Masters: one machine in the cluster is designated as the ‘NameNode’ (HDFS master) and one as the ‘JobTracker’ (MapReduce master).
  • Slaves (worker nodes): the rest of the machines in the cluster each act as both a ‘DataNode’ and a ‘TaskTracker’.