Big Data Flashcards

1
Q

Can you use Redshift in place of RDS?

A

No

3
Q

What is Redshift?

A

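A fully managed, petabyte-scale data warehouse: essentially a relational database tuned for business intelligence (analytic) workloads rather than transactional ones.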

4
Q

What is Amazon EMR?

A

Amazon Elastic MapReduce

A managed big data platform for processing large volumes of data with open-source frameworks such as Apache Hadoop and Apache Spark.

5
Q

What is Amazon EMR built out of?

A

Clusters of EC2 instances

6
Q

Do EC2 rules apply to your Amazon EMR instance?

A

Yes

7
Q

What is Amazon Kinesis?

A

A service that allows you to ingest, process, and analyze real-time streaming data.

8
Q

What is the purpose of Kinesis Data Streams?

A

Real-time streaming for ingesting data

8
Q

What does ETL stand for?

A

Extract Transform Load
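
As a rough illustration of the three stages, the sketch below extracts rows from CSV text, transforms them, and loads them into an in-memory SQLite table. The sample data, table name, and function names are made up for this example and are not tied to any AWS service.

```python
import csv
import io
import sqlite3

RAW_CSV = "user,amount\nalice,10.50\nbob,3.25\nalice,4.00\n"  # made-up sample data


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and convert amounts to integer cents."""
    return [(r["user"].strip().title(), round(float(r["amount"]) * 100)) for r in rows]


def load(records, conn):
    """Load: write the transformed records into a target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS purchases (user TEXT, cents INTEGER)")
    conn.executemany("INSERT INTO purchases VALUES (?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run all three stages and return the connection holding the loaded data."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `purchases` table can then be queried with ordinary SQL.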

9
Q

What is the speed of Kinesis Data Streams?

A

Real time

10
Q

With Kinesis Data Streams are you responsible for creating the consumer and scaling the stream?

A

Yes
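
Scaling the stream means choosing how many shards it has. Kinesis Data Streams assigns each record to a shard by taking the MD5 hash of its partition key as a 128-bit integer and finding the shard whose hash key range contains it. The sketch below imitates that mapping locally, assuming the hash space is split evenly across shards; `shard_ranges`, `shard_for_key`, and the even split are illustrative assumptions, not an AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces 128-bit values


def shard_ranges(shard_count):
    """Split the 128-bit hash space into contiguous, roughly equal ranges (assumed layout)."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, shard_count):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(shard_ranges(shard_count)):
        if start <= hashed <= end:
            return index
    raise AssertionError("hash ranges must cover the whole hash space")
```

Because the mapping depends only on the key, records with the same partition key always land on the same shard, which is why a single hot key cannot be spread out just by adding shards.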

11
Q

What is the purpose of Kinesis Data Firehose?

A

Data transfer tool to get information to S3, Redshift, Elasticsearch, or Splunk

12
Q

What is the speed of Kinesis Data Firehose?

A

Near Real time (Within 1 minute)
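
The near-real-time delay comes from buffering: Firehose collects incoming records and delivers them as a batch once a size or time threshold is reached. The class below is a simplified local model of that behavior; the class name, default thresholds, and the `deliver` callback are hypothetical and do not reflect the service's actual limits or API.

```python
import time


class BufferedDelivery:
    """Toy model: flush when the buffer reaches max_bytes or max_seconds has elapsed."""

    def __init__(self, deliver, max_bytes=1024, max_seconds=60.0, clock=time.monotonic):
        self.deliver = deliver          # callback that receives a list of records
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock              # injectable so the example can be tested
        self.records = []
        self.size = 0
        self.started = None

    def put(self, record: bytes):
        """Buffer one record and flush if a threshold has been reached."""
        if not self.records:
            self.started = self.clock()  # the interval starts with the first buffered record
        self.records.append(record)
        self.size += len(record)
        self.flush_if_due()

    def flush_if_due(self):
        """Deliver the buffered batch if it is large enough or old enough."""
        if not self.records:
            return
        too_big = self.size >= self.max_bytes
        too_old = self.clock() - self.started >= self.max_seconds
        if too_big or too_old:
            self.deliver(self.records)
            self.records, self.size, self.started = [], 0, None
```

In this model a lightly loaded buffer waits for the time threshold, so a record can sit for up to `max_seconds` before delivery.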

13
Q

With Kinesis Data Firehose are you responsible for creating the consumer and scaling the stream?

A

No

14
Q

What is Athena?

A

Serverless SQL

An interactive query service that makes it easy to analyze data in S3 using SQL.

15
Q

What is Glue?

A

Serverless ETL

A serverless data integration service that makes it easy to discover, prepare, and combine data. It allows you to perform ETL workloads without managing underlying servers.

16
Q

Can you use Athena to query logs stored in an S3 bucket?

A

Yes
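
As a hedged illustration, the helper below assembles the parameters for an Athena query over log data. The database, table, column, and bucket names are placeholders; the parameter names follow boto3's `start_query_execution` call, which is shown only in a comment so the snippet runs without AWS credentials.

```python
def build_log_query(database, table, status_code, output_location):
    """Build keyword arguments for an Athena query counting requests by path."""
    if not isinstance(status_code, int):
        raise TypeError("status_code must be an int")
    # Table and column names are placeholders for whatever schema the logs use.
    query = (
        f"SELECT request_path, COUNT(*) AS hits "
        f"FROM {table} WHERE status = {status_code} "
        f"GROUP BY request_path ORDER BY hits DESC LIMIT 10"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


params = build_log_query("weblogs", "access_logs", 500, "s3://example-bucket/athena-results/")
# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("athena").start_query_execution(**params)
```

Athena writes query results to the S3 location given in `OutputLocation`, so that bucket must be writable by the caller.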

17
Q

Are Athena and Glue serverless?

A

Yes

18
Q

Can Glue help you build a schema for your Athena queries?

A

Yes

19
Q

What is Amazon QuickSight?

A

A fully managed Business Intelligence (BI) data visualization service. It allows you to easily create dashboards and share them within your company.

20
Q

What is AWS Data Pipeline?

A

A managed ETL (Extract, Transform, Load) service for automating movement and transformation of your data

21
Q

What is the Pipeline Definition in AWS Data Pipeline?

A

Where you specify the business logic of your data management needs.

22
Q

What is the Managed Compute service in AWS Data Pipeline?

A

This service will create EC2 instances to perform your activities.

23
Q

What are Task Runners in AWS Data Pipeline?

A

Agent applications, typically running on EC2 instances, that poll for scheduled tasks and perform them when found.

24
Q

What are Data Nodes in AWS Data Pipeline?

A

Pipeline components that define the locations and types of the input and output data.

25
Q

What are Activities in AWS Data Pipeline?

A

Pipeline components that define the work to perform.
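
To make the pieces concrete, here is a sketch of how data nodes, an activity, and a compute resource reference one another in a pipeline definition, written as a Python dict that mirrors the JSON layout. The ids and S3 paths are invented, and the field names are given from memory of the definition format, so check them against the AWS documentation before relying on them. The helper `unresolved_refs` is hypothetical and only checks that every reference points at a defined object.

```python
PIPELINE_DEFINITION = {
    "objects": [
        # Data nodes: where the input lives and where the output goes.
        {"id": "InputData", "type": "S3DataNode", "directoryPath": "s3://example-bucket/in/"},
        {"id": "OutputData", "type": "S3DataNode", "directoryPath": "s3://example-bucket/out/"},
        # Managed compute: an EC2 instance created to run the activity.
        {"id": "Worker", "type": "Ec2Resource", "terminateAfter": "30 Minutes"},
        # Activity: the work to perform, wired to the objects above by reference.
        {
            "id": "CopyLogs",
            "type": "CopyActivity",
            "input": {"ref": "InputData"},
            "output": {"ref": "OutputData"},
            "runsOn": {"ref": "Worker"},
        },
    ]
}


def unresolved_refs(definition):
    """Return (object id, field, target) for each reference to an undefined object."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for field, value in obj.items():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append((obj["id"], field, value["ref"]))
    return missing
```

An empty result from `unresolved_refs(PIPELINE_DEFINITION)` means every activity input, output, and resource points at an object defined in the same document.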

26
Q

Is AWS Data Pipeline a data-driven workflow?

A

Yes

27
Q

True or False. AWS Data Pipeline does not integrate with storage solutions like DynamoDB, RDS, Redshift, and S3.

A

False

28
Q

True or False. AWS Data Pipeline does integrate with compute solutions like EC2 and EMR.

A

True

29
Q

True or False. AWS Data Pipeline does not integrate with SNS.

A

False

30
Q

What is Amazon MSK?

A

Amazon Managed Streaming for Apache Kafka (MSK)

A fully managed service for running data streaming applications that use Apache Kafka.

31
Q

What is Amazon OpenSearch Service?

A

The successor to Amazon Elasticsearch Service

A managed service for running search and analytics engines for use cases such as log analytics and application search.

32
Q

_____ allows you to transform data using SQL as it’s being passed through Kinesis.

A

Kinesis Data Analytics

33
Q

How long are automatic Redshift backups retained by default?

A

1 Day

34
Q

Which AWS service would be best for analyzing large volumes of data, handling complex queries efficiently, delivering fast query performance, and having the ability to scale effectively to support future data growth?

A

Amazon Redshift

35
Q

What is a valid use case for Amazon EMR?

A

Extract, transform, and load (ETL) jobs.

36
Q

What service would you use to create a logging solution involving visualization of log file analytics or BI reports?

A

Amazon OpenSearch Service (successor to Elasticsearch)

37
Q

What type of database is Redshift?

A

Relational

38
Q

You can use _ to build a schema for your data, and _ to query the data that’s stored in S3.

A

Glue, Athena

39
Q

Which service provides the easiest way to run ad-hoc queries across multiple objects in S3 without the need to setup or manage any servers?

A

Athena

40
Q

How much data can a Redshift database hold per cluster?

A

16 PB

41
Q

Which AWS service offers a fully managed way of running search and analytics engines?

A

Amazon OpenSearch Service

42
Q

What service allows you to directly visualize your data in AWS?

A

QuickSight

43
Q

Which AWS service is a good choice for visualizing and analyzing application logs?

A

Amazon OpenSearch Service

44
Q

Which of the following statements is true about AWS Glue?

A) In AWS Glue, you can specify the number of DPUs (data processing units) you want to allocate to an ETL job.

B) Auto Scaling based on a workload is NOT a serverless feature in AWS Glue.

A

A

45
Q

If you need to create a new streaming application requiring Apache Kafka as the primary component, which AWS service would be the best fit for this requirement?

A

Amazon Managed Streaming for Apache Kafka (MSK)