Section 22: Data & Analytics Flashcards

(68 cards)

1
Q

What is AWS Athena?

A

AWS Athena is an interactive query service that allows you to analyze data directly in Amazon S3 using standard SQL.

2
Q

True or False: AWS Athena is a serverless service.

A

True.

3
Q

Fill in the blank: AWS Athena charges you based on the amount of ______ processed by your queries.

A

data
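
As a rough illustration of scan-based billing, the sketch below turns bytes scanned into an estimated charge. The function name `estimate_query_cost`, the default rate of 5.00 USD per TB, and the 200 GB and 20 GB figures are assumptions for illustration only; check the current price list for your Region.

```python
TB = 1024 ** 4  # bytes per terabyte; assumed billing unit for this sketch


def estimate_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Return an estimated charge for a query that scans `bytes_scanned` bytes.

    `usd_per_tb` is an assumed example rate, not an authoritative price.
    """
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / TB * usd_per_tb


# Hypothetical comparison: the same query over raw CSV and over a
# smaller columnar copy of the data.
csv_cost = estimate_query_cost(200 * 1024 ** 3)
columnar_cost = estimate_query_cost(20 * 1024 ** 3)
```

Because the charge scales with bytes scanned, anything that shrinks the scan (compression, columnar formats, partitioning) lowers the estimate proportionally.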

4
Q

What types of data formats does AWS Athena support?

A

Athena supports formats such as CSV, JSON, ORC, Parquet, and Avro.

5
Q

How does AWS Athena handle security?

A

Athena uses AWS Identity and Access Management (IAM) for access control and integrates with AWS Key Management Service (KMS) for data encryption.

6
Q

What is the maximum query result size in AWS Athena?

A

The maximum query result size in AWS Athena is 30 MB.

7
Q

Which AWS service is often used in conjunction with Athena for data cataloging?

A

AWS Glue.

8
Q

True or False: You need to provision servers to use AWS Athena.

A

False.

9
Q

What is the primary use case for AWS Athena?

A

Athena is primarily used for querying large datasets stored in Amazon S3 without the need for data loading.

10
Q

What SQL dialect does AWS Athena use?

A

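Athena's query engine is based on Presto (engine version 3 is built on Trino), so queries use ANSI-style SQL. DDL statements follow Apache Hive conventions.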

11
Q

Can AWS Athena query data stored in formats like Parquet and ORC?

A

Yes, Athena can query data stored in Parquet and ORC formats.

12
Q

What are the two main components of AWS Athena?

A

The two main components are the query engine and the data catalog.

13
Q

Fill in the blank: AWS Athena can be accessed via the ______ console, the AWS CLI, and the AWS SDKs.

A

AWS Management

14
Q

How does AWS Athena integrate with Amazon QuickSight?

A

Athena can be used as a data source for Amazon QuickSight to visualize data.

15
Q

What is the role of AWS Glue Data Catalog in relation to AWS Athena?

A

AWS Glue Data Catalog serves as a central repository to store metadata for the data queried in Athena.

16
Q

Can you use AWS Athena to join data from multiple S3 buckets?

A

Yes, you can join data from multiple S3 buckets in AWS Athena.

17
Q

True or False: AWS Athena supports partitioned tables.

A

True.

18
Q

What is the benefit of using partitioning in AWS Athena?

A

Partitioning improves query performance by reducing the amount of data scanned.
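
A minimal sketch of why this works, assuming Hive-style `name=value` key prefixes in S3. The object keys, sizes, and helper names below are hypothetical, and the code only simulates pruning; it does not call Athena.

```python
# Hypothetical S3 object keys in Hive-style partitions, with sizes in bytes.
OBJECTS = {
    "logs/year=2023/month=12/part-0.parquet": 400,
    "logs/year=2024/month=01/part-0.parquet": 300,
    "logs/year=2024/month=02/part-0.parquet": 500,
}


def partition_values(key: str) -> dict:
    """Extract `name=value` partition segments from an object key."""
    return dict(seg.split("=", 1) for seg in key.split("/") if "=" in seg)


def bytes_scanned(objects: dict, **filters: str) -> int:
    """Sum the sizes of objects whose partition values match every filter."""
    total = 0
    for key, size in objects.items():
        values = partition_values(key)
        if all(values.get(name) == wanted for name, wanted in filters.items()):
            total += size
    return total


full_scan = bytes_scanned(OBJECTS)  # no filter: every object is read
pruned = bytes_scanned(OBJECTS, year="2024", month="01")  # one partition is read
```

A query that filters on the partition columns only has to read the matching prefixes, so both runtime and the scan-based charge drop.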

19
Q

What is the default retention period for query results in AWS Athena?

A

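Athena keeps query history for 45 days. The result files themselves are written to an S3 location and remain there until you delete them or an S3 lifecycle rule expires them.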

20
Q

Which AWS service can be used to schedule queries in AWS Athena?

A

AWS Lambda, invoked on a schedule by an Amazon EventBridge rule, can be used to start queries in AWS Athena.

21
Q

Fill in the blank: AWS Athena can be used to analyze data in _____ time.

A

real

22
Q

What type of queries can be run on AWS Athena?

A

You can run ad-hoc queries, complex queries, and analytical queries on AWS Athena.

23
Q

True or False: AWS Athena allows you to create views.

A

True.

24
Q

In Athena, what type of data format can save costs and improve performance?

A

Columnar formats, such as Parquet and ORC.

25
What file size should you aim for in Athena to improve performance?
128 MB or larger.
26
Which Athena feature runs SQL queries across other data sources?
Federated Query
27
What is AWS Redshift?
AWS Redshift is a fully managed, petabyte-scale data warehouse service in the cloud.
28
True or False: AWS Redshift is designed for real-time data analytics.
False: AWS Redshift is optimized for complex queries and large-scale data analysis rather than real-time analytics.
29
What type of storage does AWS Redshift use?
AWS Redshift uses columnar storage, which organizes data by columns rather than rows.
30
Fill in the blank: AWS Redshift is based on __________ technology.
PostgreSQL
31
Which of the following is a key feature of AWS Redshift? A) Real-time streaming B) Columnar storage C) Unmanaged services D) On-premises deployment
B) Columnar storage
32
What is the maximum size of a single Redshift cluster?
The maximum size of a single Redshift cluster can be up to 128 petabytes.
33
What is the purpose of the Redshift Spectrum feature?
Redshift Spectrum allows you to run queries against data stored in S3 without loading it into Redshift.
34
True or False: AWS Redshift can automatically scale based on workload.
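False for the node count of a provisioned cluster, which you resize manually. Note that Concurrency Scaling can add temporary capacity automatically, and Redshift Serverless adjusts capacity on its own.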
35
What is the primary benefit of using AWS Redshift for data warehousing?
The primary benefit is its ability to handle large datasets and perform complex queries quickly and efficiently.
36
What is a Redshift cluster?
A Redshift cluster is a set of nodes that work together to store and analyze data.
37
Fill in the blank: The main component of a Redshift cluster that manages the database is called the __________.
leader node
38
What is the purpose of the distribution style in Redshift?
Distribution style determines how data is distributed across the nodes in a cluster to optimize query performance.
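To make the idea concrete, here is a small simulation contrasting an EVEN (round-robin) layout with a KEY layout. It is a sketch only: the function names and sample rows are hypothetical, and CRC32 stands in for Redshift's internal hash, which is not something this code reproduces.

```python
import zlib
from collections import defaultdict


def distribute_even(rows, node_count):
    """EVEN style: assign rows round-robin, ignoring their values."""
    nodes = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % node_count].append(row)
    return nodes


def distribute_key(rows, node_count, key):
    """KEY style: rows sharing a key value always land on the same node."""
    nodes = defaultdict(list)
    for row in rows:
        # CRC32 is an assumed stand-in hash, chosen because it is deterministic.
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        nodes[digest % node_count].append(row)
    return nodes


# Hypothetical sample data: 12 orders spread over 3 customers.
orders = [{"order_id": i, "customer_id": i % 3} for i in range(12)]
even = distribute_even(orders, node_count=4)
by_customer = distribute_key(orders, node_count=4, key="customer_id")
```

EVEN balances storage but scatters related rows; KEY keeps rows with the same join column together, which can avoid moving data between nodes during joins at the risk of skew when some key values dominate.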
39
Which SQL type does AWS Redshift support?
AWS Redshift supports a variant of SQL called Redshift SQL, which is based on PostgreSQL.
40
What is the function of the COPY command in Redshift?
The COPY command is used to load data from S3, DynamoDB, or other data sources into Redshift tables.
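As an illustration, the helper below assembles a COPY statement for loading Parquet files from S3. The helper name `build_copy_statement`, the table, the bucket, and the role ARN are all hypothetical placeholders; the sketch only builds the SQL text and does not connect to a cluster.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str,
                         fmt: str = "PARQUET") -> str:
    """Build a Redshift COPY statement that loads files from an S3 prefix.

    Values are interpolated directly, so this is only suitable for
    trusted, hard-coded inputs in an example like this one.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS {fmt};"
    )


# All identifiers below are placeholders, not real resources.
statement = build_copy_statement(
    table="sales",
    s3_uri="s3://example-bucket/sales/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```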
41
True or False: AWS Redshift provides built-in machine learning capabilities.
True: AWS Redshift has integrated machine learning features that allow users to create and run models directly within the data warehouse.
42
Is Redshift OLAP or OLTP?
OLAP (online analytical processing).
43
Which is generally faster for queries, Athena or Redshift?
Redshift
44
How do Redshift snapshots support disaster recovery?
Snapshots can be copied to another AWS Region.
45
What are three ways to load data into Redshift?
Kinesis Data Firehose, the COPY command from S3, and an EC2 instance writing over JDBC.
46
Which type of matches does OpenSearch let you search for?
Partial matches.
47
Which services and mechanisms secure OpenSearch?
Cognito, IAM, KMS encryption, and TLS.
48
What does EMR stand for?
Elastic MapReduce
49
What is the main function of Amazon EMR?
It helps create Hadoop clusters to analyze and process vast amounts of data.
50
What type of AWS service is Amazon EMR?
A managed big data platform.
51
What AWS resource does an EMR cluster use to scale?
EC2 instances (can be hundreds of them).
52
Name some frameworks bundled with Amazon EMR.
Apache Spark, HBase, Presto, Flink
53
What does Amazon EMR handle automatically during cluster setup?
Provisioning and configuration.
54
What feature allows EMR to adjust resources based on workload?
Auto-scaling.
55
Can Amazon EMR use Spot Instances?
Yes, it's integrated with Spot Instances for cost savings.
56
What are common use cases for Amazon EMR?
Data processing, machine learning, web indexing, and big data workloads.
57
What is the role of the Master Node in EMR?
Manages the cluster, coordinates tasks, and monitors health.
58
What does the Core Node do in EMR?
Runs tasks and stores data (long-running).
59
What is the purpose of the Task Node in EMR?
Only runs tasks and is often configured as a Spot Instance.
60
What are the three purchasing options for EMR instances?
On-demand, Reserved, and Spot Instances.
61
What are the characteristics of On-Demand instances in EMR?
Reliable, predictable, and won't be terminated.
62
What are the benefits of Reserved Instances in EMR?
Cost savings and automatically used by EMR if available (minimum 1-year commitment).
63
What is the trade-off when using Spot Instances in EMR?
They are cheaper but can be terminated and are less reliable.
64
What are the two types of EMR clusters based on duration?
Long-running clusters and transient (temporary) clusters.
65
Which serverless service creates interactive, business-intelligence dashboards?
QuickSight
66
In QuickSight, what is SPICE?
SPICE (Super-fast, Parallel, In-memory Calculation Engine) is QuickSight's in-memory computation engine.
67
With Glue, what prevents re-processing old data?
Glue Job Bookmarks
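The bookmark idea can be pictured with the toy model below. It is a conceptual sketch, not Glue's implementation: the `Bookmark` class, the `run_job` function, and the file names are hypothetical, and Glue keeps its own state inside the service.

```python
class Bookmark:
    """Remembers which input files earlier runs already handled."""

    def __init__(self):
        self.processed = set()


def run_job(available_files, bookmark):
    """Process only files not seen in earlier runs, then update the bookmark."""
    new_files = sorted(set(available_files) - bookmark.processed)
    bookmark.processed.update(new_files)
    return new_files


bookmark = Bookmark()
# The first run sees two files and handles both.
first_run = run_job(["day1.json", "day2.json"], bookmark)
# The second run sees the old files plus one new file and handles only the new one.
second_run = run_job(["day1.json", "day2.json", "day3.json"], bookmark)
```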
68
Which Glue feature cleans and normalizes data using pre-built transformations?
Glue DataBrew