[Developer] Advanced Analytics Topics Flashcards

(18 cards)

1
Q

What does OLAP stand for?

A

Online Analytical Processing

2
Q

What does Columnar Data Storage optimize for?

A

Read performance (analytical queries)

Queries read only the columns they need, and storing similar values together gives better compression.
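The sketch below compares how many values a row store and a column store touch for a query that needs one column, and shows why column data compresses well. The "orders" table, its values, and all helper names are hypothetical, and run-length encoding is only one of several encodings columnar systems use.

```python
# Hypothetical "orders" table: each row is (order_id, customer, region, amount)
COLUMNS = ("order_id", "customer", "region", "amount")
rows = [
    (1, "alice", "us-east", 120.0),
    (2, "bob", "eu-west", 75.5),
    (3, "carol", "us-east", 42.0),
    (4, "dave", "ap-south", 310.25),
]


def to_columnar(rows, columns):
    """Pivot row-oriented records into one list of values per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(columns)}


def values_read_row_store(rows):
    # A row store reads whole rows, so every field of every row is touched
    return sum(len(row) for row in rows)


def values_read_column_store(table, needed_columns):
    # A column store reads only the columns the query references
    return sum(len(table[name]) for name in needed_columns)


def run_length_encode(values):
    """Collapse runs of identical adjacent values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1] = (value, encoded[-1][1] + 1)
        else:
            encoded.append((value, 1))
    return encoded


table = to_columnar(rows, COLUMNS)
# SELECT SUM(amount) FROM orders only needs the "amount" column
total_amount = sum(table["amount"])
# Values in one column share a type and often repeat, so they compress well
region_runs = run_length_encode(sorted(table["region"]))
```

For the SUM query, the row store touches 16 values while the column store touches 4.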

3
Q

When should a developer choose Amazon Redshift over traditional RDS databases?

A

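For OLAP workloads: complex analytical queries and aggregations over large volumes of data, rather than the transactional (OLTP) workloads RDS is built for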

4
Q

What are the two key performance technologies Redshift uses?

A

Massively Parallel Processing (MPP)
and
Columnar Data Storage
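MPP splits data across many workers that each compute a partial result, which a coordinator then merges (in Redshift, compute node slices do the work and the leader node combines results). The sketch below illustrates only that divide-and-combine pattern: the round-robin split is loosely analogous to Redshift's EVEN distribution style, the function names are hypothetical, and Python threads stand in for nodes, so it shows the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(values, n_slices):
    """Distribute values round-robin across n_slices."""
    slices = [[] for _ in range(n_slices)]
    for i, value in enumerate(values):
        slices[i % n_slices].append(value)
    return slices


def partial_aggregate(slice_values):
    # Each worker aggregates only its own slice of the data
    return sum(slice_values), len(slice_values)


def parallel_average(values, n_slices=4):
    """Average computed from per-slice partial sums and counts."""
    slices = partition(values, n_slices)
    with ThreadPoolExecutor(max_workers=n_slices) as pool:
        partials = list(pool.map(partial_aggregate, slices))
    # The coordinator merges partial sums and counts (never averages of averages)
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("cannot average an empty dataset")
    return total / count
```

Merging sums and counts instead of per-slice averages keeps the result correct even when slices hold different numbers of rows.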

5
Q

How can you run standard SQL queries against data stored in S3?

A

Amazon Athena

6
Q

Is Amazon Athena serverless?

A

Yes

7
Q

What is the primary format your data should be in for cost-effective querying with Athena?

A

Columnar Formats (like Parquet or ORC)

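Athena charges by the amount of data each query scans, so columnar formats, which let it read only the referenced columns and compress well, reduce both cost and query time.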
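A rough cost comparison makes the point concrete. This is a sketch under stated assumptions: the price per TB is an assumed figure to check against current Athena pricing, the table size, column count, and 4x compression ratio are hypothetical, all columns are assumed equal in size, and per-query minimums and rounding are ignored.

```python
# ASSUMPTION: price per TB scanned; check current Athena pricing for your region
ASSUMED_PRICE_PER_TB = 5.0


def query_cost(tb_scanned, price_per_tb=ASSUMED_PRICE_PER_TB):
    """Estimated cost of one query; ignores per-query minimums and rounding."""
    return tb_scanned * price_per_tb


def tb_scanned_columnar(table_tb, total_columns, columns_read, compression_ratio):
    """Data scanned when only the referenced columns are read from compressed files.

    Assumes all columns are the same size, which real tables rarely are.
    """
    return table_tb * (columns_read / total_columns) / compression_ratio


# Hypothetical 1 TB table with 16 columns; the query references 2 of them
csv_cost = query_cost(1.0)  # uncompressed CSV: the whole table is scanned
parquet_cost = query_cost(tb_scanned_columnar(1.0, 16, 2, compression_ratio=4.0))
```

Under these assumptions the columnar query scans 1/32 of the data, so it costs 1/32 as much.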

8
Q

What is the best choice for developers needing a managed solution for full-text search, log aggregation, and real-time application monitoring?

A

Amazon OpenSearch Service

(previously called Amazon Elasticsearch Service)

9
Q

What is the key difference between OpenSearch and a traditional database?

A

OpenSearch is a search and analytics engine designed for semi-structured and unstructured data (such as logs and text documents). It prioritizes fast, flexible search over transactional ACID guarantees.
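Search engines like OpenSearch answer full-text queries from an inverted index, which maps each term to the documents containing it, instead of scanning rows. The sketch below is a deliberately tiny illustration of that idea with a hypothetical tokenizer and AND-only matching; real OpenSearch analyzers are configurable and results are also scored by relevance.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on non-alphanumeric characters (a very simple analyzer)
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical log lines used as documents
docs = {
    1: "ERROR payment service timeout",
    2: "INFO payment accepted",
    3: "ERROR auth service unavailable",
}
index = build_index(docs)
```

A query such as "error service" becomes two set lookups and an intersection, returning documents 1 and 3 without reading document 2.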

10
Q

What does EMR stand for?

A

Elastic MapReduce
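MapReduce, the model in the service's original name, splits a job into a map step that emits key/value pairs, a shuffle that groups pairs by key, and a reduce step that combines each group. The single-process word count below is only a hedged sketch of that model with hypothetical function names; on EMR, frameworks such as Hadoop or Spark run these phases across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit (word, 1) for every word in one input record
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    # Group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine every value emitted for one key
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["spark hive spark", "hadoop spark"])
```

Here `counts` is `{"spark": 3, "hive": 1, "hadoop": 1}`.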

11
Q

When should a developer use EMR instead of Redshift, Athena, or Lambda?

A

When you need a customized, complex big data framework (such as Hadoop, Spark, or Hive)

12
Q

Where is input data typically ingested for EMR?

A

S3

13
Q

Where is output data usually written to from EMR?

A

S3

14
Q

What is the serverless ETL service that AWS offers?

A

AWS Glue

15
Q

What specialized component of AWS Glue is essential for services like Athena to understand the structure of data in S3?

A

The Glue Data Catalog

16
Q

What AWS service simplifies setting up a secure data lake and centrally manages security and access permissions for data stored in S3?

A

AWS Lake Formation

17
Q

What is the key benefit of Lake Formation’s security model?

A

It provides fine-grained access control (down to the column/row level) for various analytics services (Athena, Redshift, EMR) reading from the S3 data lake.
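Column- and row-level control means a principal sees only some columns and only the rows that match a filter (Lake Formation calls the row-level part data filters). The sketch below models only the effect of such a grant on query results. `Grant` and `apply_grant` are hypothetical names, not a Lake Formation API; in practice the integrated engines enforce permissions you define in Lake Formation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Grant:
    """Hypothetical permission: visible columns plus a row filter."""
    columns: frozenset
    row_filter: Callable[[dict], bool]


def apply_grant(rows, grant):
    """Return only the permitted rows, each trimmed to the permitted columns."""
    return [
        {col: value for col, value in row.items() if col in grant.columns}
        for row in rows
        if grant.row_filter(row)
    ]


# Hypothetical data lake table with a sensitive "email" column
rows = [
    {"customer": "alice", "region": "eu-west", "email": "a@example.com", "amount": 120.0},
    {"customer": "bob", "region": "us-east", "email": "b@example.com", "amount": 75.5},
]
# An analyst may see EU rows only, and never the email column
analyst_grant = Grant(
    columns=frozenset({"customer", "region", "amount"}),
    row_filter=lambda row: row["region"] == "eu-west",
)
visible = apply_grant(rows, analyst_grant)
```

Because the rules are defined once, centrally, the same restriction applies whether the analyst queries through Athena, Redshift, or EMR.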

18
Q

What is AWS’s Business Intelligence offering?

A

Amazon QuickSight