EMR Flashcards

1
Q

What are the node types in an EMR cluster?

A

Master Node
Core Node
Task Node

2
Q

What file system can EMR use?

A
  1. HDFS
  2. EMRFS
  3. Local file system (instance store or EBS volumes)
3
Q

Is HDFS data stored on Task Nodes?

A

No. Task Nodes only run tasks; they do not store HDFS data.

4
Q

Should you use HDFS for a large number of small files?

A

No. The NameNode keeps metadata for every file in memory, so many small files are inefficient. Use HBase instead.

5
Q

What are the two types of clusters?

A
  1. Persistent Clusters
  2. Transient Clusters
6
Q

What processing paradigm does EMR use?

A

MapReduce (the Hadoop processing model; EMR can also run other frameworks such as Spark)

7
Q

What are the 7 main steps of a MapReduce job?

A

Input, Split, Map, Shuffle, Sort, Reduce, Output
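
To make the seven steps concrete, here is a minimal single-process sketch of a word count in plain Python. It is not how Hadoop or EMR implements MapReduce; the function names (split_input, map_phase, shuffle, sort_keys, reduce_phase, word_count) are hypothetical and chosen only to mirror the step names on this card.

```python
from collections import defaultdict


def split_input(text, n_splits):
    """Split: divide the input lines into n_splits chunks."""
    lines = text.splitlines()
    return [lines[i::n_splits] for i in range(n_splits)]


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the chunk."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_chunks):
    """Shuffle: gather the values from all mappers under their key."""
    groups = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            groups[key].append(value)
    return groups


def sort_keys(groups):
    """Sort: order the grouped keys before they reach the reducer."""
    return sorted(groups.items())


def reduce_phase(sorted_groups):
    """Reduce: collapse each key's list of values into one count."""
    return [(key, sum(values)) for key, values in sorted_groups]


def word_count(text, n_splits=2):
    chunks = split_input(text, n_splits)            # Input, Split
    mapped = [map_phase(chunk) for chunk in chunks]  # Map
    grouped = shuffle(mapped)                        # Shuffle
    ordered = sort_keys(grouped)                     # Sort
    return reduce_phase(ordered)                     # Reduce, Output


if __name__ == "__main__":
    print(word_count("the cat sat\nthe dog sat\nthe end"))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the shuffle moves data over the network; the sketch only shows the order of the steps.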

8
Q

Why use HDFS?

A

It is fault tolerant: HDFS replicates data blocks across nodes, so data survives the loss of a node.
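
The effect of replication can be illustrated with a toy model. This is a rough sketch, not HDFS's actual placement policy: place_blocks and readable_blocks are hypothetical helpers, the round-robin placement is an assumption made for simplicity, and the replication factor of 3 is assumed as a common HDFS default (the real value depends on cluster configuration).

```python
def place_blocks(n_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an assumption for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one live replica."""
    return [
        block
        for block, replicas in placement.items()
        if any(node not in failed_nodes for node in replicas)
    ]


if __name__ == "__main__":
    nodes = ["n1", "n2", "n3", "n4"]
    placement = place_blocks(6, nodes)
    # With three replicas per block, losing any two nodes leaves every block readable.
    print(len(readable_blocks(placement, {"n1", "n2"})))
```

With three replicas, a block becomes unreadable only when all three nodes holding it fail at the same time.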
