Processing: 24% (EMR, Spark, Hive, Lambda, Glue, ECS) Flashcards

Be able to:
a) determine appropriate data processing solution requirements
b) design a solution for transforming and preparing data for analysis
c) automate and operationalize a data processing solution

1
Q

Name the four data processing methods

A

a) batch, for processing of massive datasets at once
b) periodic, for unpredictable workloads
c) near real-time, for small bursts of data that must be collected and processed within minutes
d) real-time, for tiny bursts of data that must be processed continually

2
Q

Name the four Hadoop modules

A

a) Common (or Core)
b) the Hadoop Distributed File System (or HDFS)
c) Yet Another Resource Negotiator (or YARN)
d) MapReduce
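
To make the MapReduce module more concrete, here is a minimal single-process sketch of its three phases (map, shuffle, reduce) applied to a word count. It is an illustration only, not Hadoop code: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and a real job would run distributed across a cluster with YARN scheduling the tasks and HDFS holding the input.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    # Hypothetical helper that chains the three phases in order.
    return reduce_phase(shuffle_phase(map_phase(lines)))


counts = word_count(["spark hive spark", "hive presto"])
print(counts)
```

In this sketch the shuffle step happens in memory; in Hadoop it is the stage that moves intermediate pairs between nodes so that each reducer sees all values for its keys.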

3
Q

Name the difference in purpose between Hive and Presto

A

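Hive is optimised for query throughput, which suits long-running batch queries over large datasets, whereas Presto is optimised for interactivity, which suits low-latency ad hoc queries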
