Scale - EMR Flashcards

1
Q

What is Amazon EMR (Elastic MapReduce)?

A

A managed Hadoop framework for processing huge amounts of data. It also supports Apache Spark, HBase, Presto, and Flink.

2
Q

What are the common EMR use cases?

A

Most commonly used for log analysis, financial analysis, and extract, transform, and load (ETL) activities. More broadly, it performs data-intensive tasks for applications such as web indexing, data mining, log file analysis, machine learning, financial analysis, scientific simulation, and bioinformatics research.

3
Q

What is a step in EMR?

A

A Step is a programmatic task that performs some processing on the data (e.g. counting words).
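A word-count step could be described roughly as follows. This is a minimal sketch: it only builds the step definition as a plain dictionary, in the shape that boto3's `add_job_flow_steps` call accepts. The helper name `word_count_step`, the S3 paths, and the script name are hypothetical.

```python
def word_count_step(script_uri, input_uri, output_uri):
    """Build a step definition for a hypothetical Spark word-count script."""
    return {
        "Name": "Word count",
        # What EMR should do if this step fails.
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit", "--deploy-mode", "cluster",
                script_uri, input_uri, output_uri,
            ],
        },
    }


# Hypothetical S3 locations, for illustration only.
step = word_count_step(
    "s3://example-bucket/scripts/word_count.py",
    "s3://example-bucket/input/",
    "s3://example-bucket/output/",
)
```

With boto3 installed and AWS credentials configured, a definition like this could be submitted to a running cluster with `boto3.client("emr").add_job_flow_steps(JobFlowId=..., Steps=[step])`.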

4
Q

What is a cluster in EMR?

A

A Cluster is a collection of EC2 instances provisioned by EMR to run your Steps.
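To make the idea of a provisioned cluster concrete, here is a minimal sketch of a cluster request, built as a plain dictionary in the shape that boto3's `run_job_flow` call accepts. The helper name `cluster_request`, the release label, the instance type, and the use of the default EMR role names are all assumptions chosen for illustration.

```python
def cluster_request(name, instance_type="m5.xlarge", instance_count=3):
    """Build keyword arguments describing a small, transient EMR cluster."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release; pick one that suits you
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "MasterInstanceType": instance_type,
            "SlaveInstanceType": instance_type,
            # Total number of EC2 instances in the cluster.
            "InstanceCount": instance_count,
            # Terminate the cluster once all steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        # Assumed: the default EMR roles already exist in the account.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = cluster_request("flashcard-demo")
```

With boto3 and credentials in place, `boto3.client("emr").run_job_flow(**request)` would ask EMR to provision the EC2 instances; steps can be passed in the same call through a `Steps` list.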

5
Q

What are the components of EMR?

A

Master, task and core nodes

6
Q

What is AWS Glue?

A

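AWS Glue is a flexible, easily scalable ETL platform that runs on AWS's serverless infrastructure, so there are no servers to manage. Amazon EMR, by contrast, typically runs on clusters of EC2 instances that you provision and configure yourself, which gives you more control but also more operational work.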

7
Q

Why Glue?

A

As a serverless, ETL-focused service, AWS Glue is generally quicker to get running for ETL jobs than Amazon EMR, because there is no cluster to provision or manage. That gives Glue the edge over EMR in operational flexibility.

8
Q

What is a core node?

A

Runs Hadoop tasks and can host persistent data in the Hadoop Distributed File System (HDFS).

9
Q

What is a task node?

A

Only runs Hadoop tasks; it does not store data in HDFS.

10
Q

What is a master node?

A

Uses Hadoop to coordinate computations and distributes processing across the other nodes. A cluster usually has only one master node, but you can launch a cluster with 3 master nodes for high availability (HA).
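The three node roles can be sketched as instance groups. This is an illustrative helper only: it builds a list in the shape used by the `InstanceGroups` field of boto3's `run_job_flow` call, where the role values `MASTER`, `CORE`, and `TASK` are the ones the EMR API uses. The function name `instance_groups`, the group names, and the instance type are assumptions.

```python
def instance_groups(core_count, task_count=0, ha=False, instance_type="m5.xlarge"):
    """Describe master, core, and (optional) task groups for an EMR cluster."""

    def group(name, role, count):
        return {
            "Name": name,
            "InstanceRole": role,
            "InstanceType": instance_type,
            "InstanceCount": count,
        }

    groups = [
        # One master node normally; three when high availability is wanted.
        group("Master", "MASTER", 3 if ha else 1),
        # Core nodes run tasks and hold HDFS data.
        group("Core", "CORE", core_count),
    ]
    if task_count > 0:
        # Task nodes only run tasks and hold no HDFS data.
        groups.append(group("Task", "TASK", task_count))
    return groups


single = instance_groups(core_count=2)
ha_cluster = instance_groups(core_count=2, task_count=4, ha=True)
```

Here `single` describes one master and two core nodes, while `ha_cluster` describes three masters, two core nodes, and four task nodes.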
