Big Data Flashcards

1
Q

What is Redshift?

A

A fully managed, petabyte-scale data warehouse service in the cloud.

2
Q

How much information can Redshift hold?

A

Up to 16 petabytes

3
Q

Is Redshift relational?

A

Yes

4
Q

What is a typical use case for Redshift?

A

Business Intelligence

5
Q

Is Redshift a better RDS?

A

No. Redshift is a data warehouse for analytics and is not meant to replace RDS.

6
Q

What is EMR?

A

A managed big data platform that allows you to process vast amounts of data (AWS's ETL tool)

7
Q

What is Kinesis?

A

A service that lets you ingest, process, and analyze real-time streaming data (think of it as a huge data highway).

8
Q

What is Kinesis Data Streams for?

A

Real-time streaming for ingesting data

9
Q

What is Kinesis Data Firehose for?

A

A data transfer tool for delivering data to S3, Redshift, Elasticsearch, or Splunk

10
Q

What is the downside to Kinesis Data Streams?

A

It takes a lot of work to set up (you have to specify shards and build the data consumers)
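
To make the shard setup more concrete, here is a minimal sketch of how a record's partition key can be mapped to a shard. Kinesis documents that it applies an MD5 hash to the partition key and matches the resulting 128-bit integer against each shard's hash key range. The helper names (`shard_ranges`, `shard_for_key`) and the even split of the hash space are assumptions made for illustration, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit integer


def shard_ranges(shard_count):
    """Split the 128-bit hash key space into equal, contiguous ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains the key's MD5 hash."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise ValueError("hash key fell outside every range")


# Hypothetical stream with four shards and a few sample partition keys.
ranges = shard_ranges(4)
for key in ["user-1", "user-2", "user-3"]:
    print(key, "-> shard", shard_for_key(key, ranges))
```

The same partition key always lands on the same shard, which is why picking the shard count and the partition key is part of the setup work.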

11
Q

What can Kinesis Data Firehose be thought of as?

A

A simpler version of Kinesis Data Streams

12
Q

What is Kinesis Data Analytics?

A

A service that lets you analyze data in the pipeline using standard SQL

13
Q

When would you choose Kinesis over SQS for messages?

A

If messages need real-time delivery

14
Q

Does Kinesis Data Streams or Kinesis Data Firehose automatically scale?

A

Kinesis Data Firehose (with Data Streams in provisioned mode, you manage the shard capacity yourself)

15
Q

What is AWS Athena?

A

An interactive query service that makes it easy to analyze data in S3 using SQL. It lets you query data directly in S3 without loading it into a database.
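
As a rough illustration of querying S3 data with SQL, the sketch below starts an Athena query and polls until it reaches a terminal state. It relies on the boto3 Athena client methods `start_query_execution` and `get_query_execution`. The function name `run_athena_query`, the database name, the table name, and the results bucket are all hypothetical placeholders. The client is passed in as an argument so the function can be exercised without AWS credentials.

```python
import time


def run_athena_query(athena_client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for a terminal state, and return (query_id, state)."""
    started = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes result files to this S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        execution = athena_client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


# Example call (needs boto3 and AWS credentials; every name is a placeholder):
#   import boto3
#   run_athena_query(
#       boto3.client("athena"),
#       "SELECT * FROM logs LIMIT 10",
#       "my_database",
#       "s3://my-results-bucket/athena/",
#   )
```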

16
Q

What is AWS Glue?

A

A serverless service that allows you to perform ETL workloads without managing underlying servers

17
Q

If you ever need serverless SQL, what should you think of?

A

Athena

18
Q

What is QuickSight?

A

A fully managed business intelligence (BI) data visualization service

19
Q

What is AWS Data Pipeline?

A

A managed ETL service for automating the movement and transformation of your data

20
Q

What is a pipeline definition in AWS Data Pipeline?

A

Where you specify the business logic of your data management

21
Q

How do you create dependencies between tasks and activities in AWS Data Pipeline?

A

Data-driven workflows
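
A small sketch may help show what a dependency between activities looks like. Pipeline definitions let an activity name a predecessor through a `dependsOn` reference. The objects below are hypothetical and only loosely shaped like definition entries, and the `execution_order` helper is an illustration of dependency ordering, not how the service itself schedules work.

```python
from graphlib import TopologicalSorter

# Hypothetical activities; each "dependsOn" names the activity that must finish first.
pipeline_objects = [
    {"id": "CopyLogs", "type": "CopyActivity"},
    {"id": "TransformLogs", "type": "EmrActivity", "dependsOn": {"ref": "CopyLogs"}},
    {"id": "LoadWarehouse", "type": "RedshiftCopyActivity", "dependsOn": {"ref": "TransformLogs"}},
]


def execution_order(objects):
    """Return activity ids ordered so each runs after the one it depends on."""
    graph = {}
    for obj in objects:
        dependency = obj.get("dependsOn")
        # Map each activity to the set of activities it waits for.
        graph[obj["id"]] = {dependency["ref"]} if dependency else set()
    return list(TopologicalSorter(graph).static_order())


print(execution_order(pipeline_objects))
# prints ['CopyLogs', 'TransformLogs', 'LoadWarehouse']
```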

22
Q

What service can you use with AWS Data Pipeline to alert you of any failures?

A

Amazon SNS

23
Q

Does AWS Data Pipeline have automatic retries for data-driven workflows?

A

Yes

24
Q

What does Amazon MSK stand for?

A

Amazon Managed Streaming for Apache Kafka.

25
Q

What is Amazon MSK?

A

A fully managed service for running data streaming applications that leverage Apache Kafka

26
Q

Does Amazon MSK have automatic detection and recovery?

A

Yes

27
Q

What is Amazon MSK Serverless?

A

A cluster type within Amazon MSK offering serverless cluster management with automatic provisioning and scaling

28
Q

What is MSK Connect?

A

A feature that allows developers to easily stream data to and from Apache Kafka clusters

29
Q

What is Amazon OpenSearch Service?

A

A managed service that lets you run search and analytics engines for various use cases

30
Q

What is the successor to Amazon Elasticsearch Service?

A

Amazon OpenSearch Service

31
Q

What is typically the best tool for visualizing log file analytics or BI reports?

A

Amazon OpenSearch Service

32
Q

What type of database is Redshift?

A

Relational

33
Q

Why can’t you do multi-AZ deployments with Redshift?

A

You can

34
Q

What service offers real-time streaming of data?

A

Kinesis Data Streams