Big Data Flashcards

1
Q

Redshift

A
  • a fully managed, petabyte-scale data warehouse
  • a very large relational database

2
Q

Redshift - size

A
  • up to 16 PB of data per cluster (you don’t have to split up large data sets)
3
Q

Redshift - relational

A
  • use your standard SQL and BI tools to interact with it
4
Q

Redshift use cases

A
  • BI applications
  • not a replacement for standard RDS (built for analytics, not transactional workloads)

5
Q

Limitations of Redshift

A
  • not highly available by default
  • a cluster runs in a single AZ unless a Multi-AZ deployment (RA3 node types) is enabled

6
Q

EMR

A

Elastic MapReduce

  • ETL (Extract Transform Load)
  • an AWS managed big data platform that allows you to process vast amounts of data using open source tools such as Spark, Hive, HBase, Flink, Hudi and Presto
7
Q

EMR exam tips

A
  • open source cluster
  • a managed fleet of EC2 instances running open source tools
  • EC2 rules apply - use spot instances and RIs to reduce your costs
  • it processes and moves data
8
Q

Kinesis

A
  • a big highway to transport stuff
  • allows you to ingest, process and analyze real-time streaming data

9
Q

Kinesis Data Streams

A
  • real-time streaming for ingesting data
  • you’re responsible for creating the consumer & scaling the stream
  • older than Firehose
  • a lot of overhead to configure
  • does not automatically scale
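
Since scaling the stream is your job, a rough sizing rule helps. The sketch below assumes the commonly documented per-shard limits for a provisioned stream (1 MB/s or 1,000 records/s for writes, 2 MB/s for reads); check the current AWS quotas before relying on them. The function name shards_needed is hypothetical.

```python
import math

# Assumed per-shard limits for a provisioned stream (verify against AWS quotas).
WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000
READ_MB_PER_SEC_PER_SHARD = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Estimate the shard count as the largest of the three per-shard constraints."""
    by_write_size = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_write_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD)
    by_read_size = math.ceil(read_mb_per_sec / READ_MB_PER_SEC_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, by_write_size, by_write_count, by_read_size)
```

Whichever constraint is tightest decides the answer: a stream with small but very frequent records is limited by the record count rather than by bytes.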
10
Q

Kinesis Data Firehose

A
  • Data transfer tool to get info to S3, Redshift, Elasticsearch, Splunk
  • Speed: within 60 seconds (near real time)
  • plug & play with AWS architecture
  • automatically scales
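
A minimal sketch of what sending one record to Firehose can look like, assuming boto3's Firehose client and its put_record call (DeliveryStreamName plus a Record with a bytes Data field). The helper name, the stream name and the event fields are hypothetical. The trailing newline is an assumption about the destination: Firehose does not add delimiters itself, so newline-separated JSON is a common choice for data landing in S3.

```python
import json


def build_firehose_record(delivery_stream_name, event):
    """Build keyword arguments for a Firehose put_record call (hypothetical helper)."""
    # Encode the event as one JSON line; sort_keys keeps the output deterministic.
    data = (json.dumps(event, sort_keys=True) + "\n").encode("utf-8")
    return {
        "DeliveryStreamName": delivery_stream_name,
        "Record": {"Data": data},
    }


def send_event(firehose_client, delivery_stream_name, event):
    """Send one event; firehose_client is assumed to be boto3.client("firehose")."""
    return firehose_client.put_record(
        **build_firehose_record(delivery_stream_name, event)
    )
```

There is no shard or consumer to manage here, which is the "plug & play" point of the card: the delivery stream is configured once with its destination and the producer only pushes records.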
11
Q

Kinesis Data Analytics

A
  • paired with Data Firehose or Data Streams
  • lets you analyze data using SQL
  • easy, simple
  • no servers (fully managed)
  • pay per use
12
Q

How long can Kinesis store data?

A

up to one year (365 days); the default retention period is 24 hours

13
Q

When to use SQS over Kinesis?

A
  • slightly delayed message delivery
  • not much configuration needed
  • simple to use
14
Q

When to use Kinesis over SQS?

A
  • real-time message delivery
  • complicated to configure
  • mostly used for big data applications
15
Q

What is the easiest way to process streaming data going through Kinesis using SQL?

A

Kinesis Data Analytics

16
Q

Amazon Athena

A
  • makes it easy to analyze data in S3 using SQL
  • Serverless SQL
  • fully-managed
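
To make "SQL over S3" concrete, here is a minimal sketch assuming boto3's Athena client and its start_query_execution call. The helper names, the database, the table and the S3 output location are all hypothetical placeholders.

```python
def build_athena_query_request(sql, database, output_s3_uri):
    """Build keyword arguments for Athena's start_query_execution (hypothetical helper)."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location you choose.
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_query(athena_client, sql, database, output_s3_uri):
    """Start a query; athena_client is assumed to be boto3.client("athena")."""
    response = athena_client.start_query_execution(
        **build_athena_query_request(sql, database, output_s3_uri)
    )
    # The call is asynchronous: it returns an id that is polled for results later.
    return response["QueryExecutionId"]
```

No cluster or server appears anywhere in the call, which is the serverless point of the card: you supply the SQL, the database name and a place for the results.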
17
Q

Glue

A

  • a serverless data integration service that makes it easy to discover, prepare and combine data
  • ETL

18
Q

Amazon QuickSight

A
  • visualizing data using dashboards
  • fully managed BI data visualization service

19
Q

Elasticsearch

A
  • a fully managed version of open source Elasticsearch
  • allows you to quickly search over stored data and analyze the data you get back
  • primarily used in the ELK (Elasticsearch, Logstash, Kibana) stack
20
Q

Elasticsearch exam tip

A

If an exam scenario wants a third-party logging solution, you can use Elasticsearch as part of the solution

21
Q

When the exam wants to store GBs of data

A

RDS or Aurora

22
Q

When the exam wants to store TBs/PBs of data

A

Redshift, S3