Data and Analytics Flashcards

1
Q

What is Amazon Athena?

A

A serverless query service used to analyse data stored in Amazon S3 using standard SQL
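
As a rough illustration of "SQL over S3", the sketch below builds the arguments for Athena's StartQueryExecution call and, separately, submits them with boto3. The database name, table name and results bucket are hypothetical placeholders, and actually running the query assumes boto3 is installed and AWS credentials are configured.

```python
def build_athena_request(sql: str, database: str, output_s3_uri: str) -> dict:
    """Build keyword arguments for Athena's StartQueryExecution API."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def run_query(request: dict) -> str:
    """Submit the query with boto3 and return the query execution ID."""
    import boto3  # imported lazily so the builder works without boto3

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",                            # hypothetical database
    output_s3_uri="s3://example-athena-results/",  # hypothetical bucket
)
```

Athena writes query results to the S3 location given in `OutputLocation`, which is why a results bucket is part of the request.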

2
Q

What are federated queries?

A

Queries that can be run across multiple data sources rather than just what is in S3, such as relational, non-relational, object and custom data sources

3
Q

What is one method that can be used to improve the performance of Athena?

A

Partitioning the data
Using columnar data formats (e.g. Parquet or ORC)
Using fewer, larger files rather than many small ones, as this reduces the per-file overhead when Athena scans the data
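
To make the partitioning point concrete, here is a minimal sketch of a Hive-style (`key=value`) S3 key layout, which is one common way to partition data for Athena. The `logs` prefix and the year/month/day scheme are assumptions chosen for illustration.

```python
from datetime import date


def partitioned_key(prefix: str, day: date, filename: str) -> str:
    """Return an S3 key that uses Hive-style partition folders (key=value)."""
    return (
        f"{prefix}/year={day.year:04d}/month={day.month:02d}/"
        f"day={day.day:02d}/{filename}"
    )


def prefixes_to_scan(prefix: str, days: list[date]) -> list[str]:
    """Distinct partition prefixes a query filtered to `days` would read."""
    return sorted({partitioned_key(prefix, d, "").rstrip("/") for d in days})


key = partitioned_key("logs", date(2024, 1, 15), "part-0000.parquet")
print(key)  # logs/year=2024/month=01/day=15/part-0000.parquet
```

A query that filters on the partition columns only needs to read the matching prefixes, so less data is scanned.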

4
Q

What is Amazon Redshift used for?

A

Data warehousing and analytics

5
Q

Is Redshift columnar or row-based?

A

Columnar
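
The difference between the two layouts can be sketched in plain Python. This is a toy model with made-up order data, not how Redshift stores blocks on disk; it only shows why aggregating one column touches fewer values in a columnar layout.

```python
# Row-based layout: one record per row, all fields stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 20.0},
    {"order_id": 2, "region": "US", "amount": 35.5},
    {"order_id": 3, "region": "EU", "amount": 12.5},
]

# Columnar layout: one list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row-based: every field of every row is read to sum a single column.
values_read_row_based = sum(len(row) for row in rows)
total_row_based = sum(row["amount"] for row in rows)

# Columnar: only the "amount" column is read.
values_read_columnar = len(columns["amount"])
total_columnar = sum(columns["amount"])

print(total_row_based, total_columnar)              # 68.0 68.0
print(values_read_row_based, values_read_columnar)  # 9 3
```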

6
Q

What engine is Redshift based on?

A

PostgreSQL

7
Q

What are the two snapshot modes of Redshift and what are the differences?

A

Automated and manual.
With automated, the snapshot is retained for a period that the user sets, whereas with manual the snapshot is kept until it is deleted.
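
The retention rule can be modelled in a few lines. This is a conceptual sketch only: the `Snapshot` class and `is_expired` function are hypothetical names for illustration and are not part of any Redshift API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snapshot:
    created: datetime
    kind: str                      # "automated" or "manual"
    retention_days: Optional[int]  # None means "keep until deleted"


def is_expired(snapshot: Snapshot, now: datetime) -> bool:
    """Return True once a snapshot has outlived its retention period."""
    if snapshot.retention_days is None:
        return False  # kept until someone deletes it
    return now >= snapshot.created + timedelta(days=snapshot.retention_days)


automated = Snapshot(datetime(2024, 1, 1), "automated", retention_days=7)
manual = Snapshot(datetime(2024, 1, 1), "manual", retention_days=None)

now = datetime(2024, 1, 10)
print(is_expired(automated, now))  # True
print(is_expired(manual, now))     # False
```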

8
Q

What are the two node types within a Redshift cluster?

A

Leader and compute

9
Q

What is Redshift Spectrum?

A

A Redshift feature that allows the user to query data that is already in S3 without having to load it into Redshift tables

10
Q

What is the principal benefit of Redshift Spectrum?

A

It allows the user to leverage far more computing power than their cluster has provisioned, and it avoids having to load the S3 data first

11
Q

What is OpenSearch?

A

A search and analytics service (the successor to Amazon Elasticsearch Service) that allows the user to search any field of the indexed data, including partial matches
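
One way a partial match can be expressed is a `wildcard` query in the OpenSearch query DSL. The sketch below only builds the request body as a Python dict; the `title` field and the search fragment are hypothetical, and sending the request to a cluster is left out.

```python
def partial_match_query(field: str, fragment: str) -> dict:
    """Build a query body matching documents whose field contains `fragment`."""
    return {"query": {"wildcard": {field: {"value": f"*{fragment}*"}}}}


body = partial_match_query("title", "shift")  # hypothetical field and fragment
print(body)
```

Leading wildcards can be slow on large indexes, so this is a sketch of the idea rather than a recommended pattern.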

12
Q

What is EMR?

A

Elastic MapReduce - a service that allows the user to create Hadoop clusters for big data analytics

13
Q

How does EMR scale?

A

Automatically, by adding or removing instances (nodes) within the cluster

14
Q

What are the node types within an EMR cluster?

A

Master, core and task.
The master node manages the cluster and co-ordinates the other nodes. There is one per cluster by default.
Core nodes run tasks and store data.
Task nodes are optional and just run tasks but don’t store data.
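
The three roles map onto the `InstanceRole` values (`MASTER`, `CORE`, `TASK`) used in EMR instance group definitions. The sketch below builds such a list in the shape boto3's `run_job_flow` accepts under `Instances["InstanceGroups"]`; the group names, instance type and counts are assumptions for illustration, and nothing is sent to AWS.

```python
def emr_instance_groups(core_count: int, task_count: int = 0,
                        instance_type: str = "m5.xlarge") -> list[dict]:
    """Build EMR instance group definitions for the three node roles."""
    groups = [
        # One master node that manages the cluster.
        {"Name": "Master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes run tasks and store data.
        {"Name": "Core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes are optional: they run tasks but store no data.
        groups.append({"Name": "Task", "InstanceRole": "TASK",
                       "InstanceType": instance_type,
                       "InstanceCount": task_count})
    return groups


groups = emr_instance_groups(core_count=2, task_count=4)
print([g["InstanceRole"] for g in groups])  # ['MASTER', 'CORE', 'TASK']
```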

15
Q

What service would be used to make ML-powered interactive dashboards?

A

QuickSight

16
Q

When can QuickSight not use SPICE in-memory computation?

A

When it queries another database directly instead of importing the data into QuickSight

17
Q

What granularity of security can you set in QuickSight?

A

Column-level security

18
Q

What does ETL stand for?

A

Extract, transform and load
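
A tiny end-to-end illustration of the three steps, using only the Python standard library. The CSV content and the `payments` table are made up, and an in-memory SQLite database stands in for the target data store.

```python
import csv
import io
import sqlite3

RAW_CSV = """name,amount
alice, 10.50
bob,7
,3.00
"""


def extract(text: str) -> list[dict]:
    """Extract: read raw records from the source (here, a CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records: list[dict]) -> list[tuple]:
    """Transform: drop rows without a name, tidy names, parse amounts."""
    cleaned = []
    for record in records:
        name = record["name"].strip()
        if not name:
            continue
        cleaned.append((name.title(), float(record["amount"])))
    return cleaned


def load(rows: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
result = conn.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
print(result)  # [('Alice', 10.5), ('Bob', 7.0)]
```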

19
Q

What is Glue?

A

A fully managed, serverless ETL service for analytics

20
Q

What are Glue Crawlers?

A

Scripts that crawl databases or data and write metadata to Glue Data Catalog, e.g. the type of data and its format
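
As a toy model of what a crawler does, the sketch below infers column types from a CSV sample and returns a catalog-style table entry. It does not call the Glue API; the function names, the type names chosen and the S3 location are assumptions for illustration.

```python
import csv
import io


def infer_type(values: list[str]) -> str:
    """Pick the narrowest type that every sample value fits."""
    for caster, type_name in ((int, "bigint"), (float, "double")):
        try:
            for value in values:
                caster(value)
            return type_name
        except ValueError:
            continue
    return "string"


def crawl_csv(table_name: str, text: str, location: str) -> dict:
    """Return a catalog-style entry describing the data's format and schema."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = [
        {"Name": field, "Type": infer_type([row[field] for row in rows])}
        for field in reader.fieldnames
    ]
    return {"Name": table_name, "Location": location,
            "Format": "csv", "Columns": columns}


sample = "id,price,city\n1,9.99,Leeds\n2,12,York\n"
entry = crawl_csv("sales", sample, "s3://example-bucket/sales/")  # hypothetical
print(entry["Columns"])
```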

21
Q

What are Glue Job Bookmarks?

A

Bookmarks that show where a job was up to, preventing the re-processing of data
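
The idea behind a bookmark can be sketched as follows. This is a simplified model, not Glue's implementation: it assumes input file names sort in arrival order and uses the last processed name as the bookmark.

```python
from typing import Optional


def run_job(available_keys: list[str],
            bookmark: Optional[str]) -> tuple[list[str], Optional[str]]:
    """Process only keys newer than the bookmark; return them and the new bookmark."""
    new_keys = sorted(k for k in available_keys if bookmark is None or k > bookmark)
    new_bookmark = new_keys[-1] if new_keys else bookmark
    return new_keys, new_bookmark


# First run: nothing has been processed yet.
processed_1, bookmark = run_job(["2024-01-01.csv", "2024-01-02.csv"], None)

# Second run: one new file has arrived; the old ones are skipped.
processed_2, bookmark = run_job(
    ["2024-01-01.csv", "2024-01-02.csv", "2024-01-03.csv"], bookmark
)
print(processed_2)  # ['2024-01-03.csv']
```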

22
Q

What is Glue DataBrew?

A

A service that cleans and normalises data for analytics and ML without having to write code - many pre-written transformations

23
Q

What are data lakes and Lake Formation?

A

A data lake is a central place to keep all data of different types for analytics purposes.
Lake Formation is an AWS service that simplifies the process of creating a data lake through the automation of many complex processes.

24
Q

What level of granularity does Lake Formation have in terms of security?

A

Row/column level

25
Q

What service would be used for real-time analytics using SQL?

A

Kinesis Data Analytics for SQL

26
Q

Where can Kinesis Data Analytics for SQL read from?

A

Kinesis Data Streams and Kinesis Data Firehose

27
Q

What is a benefit of using Kinesis Data Analytics for Apache Flink over Kinesis Data Analytics for SQL?

A

Flink is more powerful, supporting more advanced querying than SQL alone

28
Q

What is Amazon MSK?

A

Managed Streaming for Apache Kafka (a data streaming alternative to Kinesis) - fully managed Kafka on AWS

29
Q

Are Kinesis Data Streams' streams encrypted?

A

Yes, in-flight using TLS

30
Q

When would KDA (Kinesis Data Analytics) be used over Athena?

A

For scenarios when the analysis needs to happen before the data is written to storage
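
To illustrate analysing records as they arrive rather than after they land in storage, here is a minimal sketch of a tumbling-window count in plain Python. It is not Kinesis Data Analytics code; the event tuples and the 60-second window are assumptions for illustration.

```python
from collections import Counter
from typing import Iterable


def tumbling_window_counts(events: Iterable[tuple[int, str]],
                           window_seconds: int) -> dict:
    """Count events per (window_start, key) in fixed, non-overlapping windows."""
    counts = Counter()
    for timestamp, key in events:  # consumed one record at a time
        window_start = timestamp - (timestamp % window_seconds)
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, event type)
events = [(0, "click"), (12, "click"), (59, "view"), (60, "click"), (61, "view")]
result = tumbling_window_counts(events, window_seconds=60)
print(result)
# {(0, 'click'): 2, (0, 'view'): 1, (60, 'click'): 1, (60, 'view'): 1}
```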