7. Analytics Flashcards

(65 cards)

1
Q

What does AWS stand for?

A

Amazon Web Services

2
Q

What is the primary purpose of AWS Glue?

A

To prepare and transform data for analytics.

3
Q

True or False: Amazon Redshift is a data warehouse service.

A

True

4
Q

Which AWS service is used for real-time data streaming?

A

Amazon Kinesis

5
Q

What type of data model does Amazon DynamoDB use?

A

NoSQL database model

6
Q

Fill in the blank: Amazon _____ is used for data lake storage.

A

S3

7
Q

What is the purpose of Amazon QuickSight?

A

To create visualizations and business intelligence dashboards.

8
Q

Which service would you use to perform ETL operations in AWS?

A

AWS Glue

9
Q

What is the maximum size of an object that can be stored in Amazon S3?

A

5 TB per object

10
Q

True or False: Amazon Athena allows you to run SQL queries on data stored in S3.

A

True

11
Q

What is Amazon EMR primarily used for?

A

Processing large amounts of data using Apache Hadoop and Spark.

12
Q

Which AWS service provides a managed Apache Kafka service?

A

Amazon MSK (Managed Streaming for Apache Kafka)

13
Q

What does the term ‘data lake’ refer to?

A

A centralized repository that allows you to store all your structured and unstructured data at any scale.

14
Q

Fill in the blank: AWS _____ is a serverless data integration service.

A

Glue

15
Q

What does Amazon Redshift Spectrum allow you to do?

A

Query data directly in S3 without loading it into Redshift.

16
Q

Which service provides a fully managed data warehouse solution?

A

Amazon Redshift

17
Q

True or False: AWS Data Pipeline is used for data orchestration.

A

True

18
Q

What is the primary function of Amazon RDS?

A

To provide a managed relational database service.

19
Q

Which service is best suited for storing time-series data?

A

Amazon Timestream

20
Q

What is the benefit of using Amazon Aurora?

A

It offers high performance and availability for relational databases.

21
Q

Fill in the blank: Amazon _____ is used to visualize data and create dashboards.

A

QuickSight

22
Q

Which service is designed for batch processing of data?

A

AWS Batch

23
Q

What does the term ‘data wrangling’ mean?

A

The process of cleaning and transforming raw data into a usable format.
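
As an illustration of the definition above, here is a minimal Python sketch of data wrangling. The field names (name, amount), the sample records, and the function name wrangle are hypothetical and chosen only for this example; real pipelines would typically use a library or an AWS service rather than hand-written loops.

```python
def wrangle(rows):
    """Clean raw records: trim whitespace, normalize case,
    drop incomplete rows, and cast amounts to numbers."""
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip().title()
        amount = (row.get("amount") or "").strip()
        if not name or not amount:
            continue  # drop rows missing a required field
        try:
            value = float(amount.replace(",", ""))  # "1,200.50" -> 1200.5
        except ValueError:
            continue  # drop rows whose amount is not numeric
        cleaned.append({"name": name, "amount": value})
    return cleaned


# Hypothetical raw input with typical quality problems.
raw = [
    {"name": "  alice ", "amount": "1,200.50"},
    {"name": "BOB", "amount": ""},
    {"name": "carol", "amount": "n/a"},
]
result = wrangle(raw)
print(result)
```

Only the first record survives: the second has an empty amount and the third has a non-numeric one.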

24
Q

Which AWS service allows for serverless data analytics?

A

Amazon Athena

25
True or False: Amazon S3 is a block storage service.
False
26
What is the purpose of AWS Lake Formation?
To simplify the process of building and managing data lakes.
27
Which AWS service is used for data cataloging?
AWS Glue Data Catalog
28
What is the main benefit of using Amazon SageMaker?
To build, train, and deploy machine learning models at scale.
29
Fill in the blank: Amazon _____ is used for sending and receiving messages between distributed systems.
SQS (Simple Queue Service)
30
What does the term 'OLAP' stand for?
Online Analytical Processing
31
Which AWS service is primarily used for data archiving?
Amazon S3 Glacier
32
True or False: Amazon Kinesis Data Firehose can transform data before loading it into storage.
True
33
What is the purpose of Amazon CloudWatch in data engineering?
To monitor and manage AWS resources and applications.
34
Which service would you use to create a scalable data processing pipeline?
AWS Data Pipeline
35
What is the primary use case for Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)?
Real-time search and analytics on large datasets.
36
Fill in the blank: Amazon _____ provides a managed service for data warehousing.
Redshift
37
Which AWS service allows you to run code in response to events without provisioning servers?
AWS Lambda
38
True or False: Amazon DynamoDB is a relational database.
False
39
What is the main advantage of using a NoSQL database like DynamoDB?
Scalability and a flexible schema for handling semi-structured and unstructured data.
40
What does the term 'data fidelity' refer to?
The accuracy and precision of data.
41
Which service would you use for batch data processing with Apache Spark?
Amazon EMR
42
Fill in the blank: AWS _____ is a fully managed data integration service.
Glue
43
What is the purpose of Amazon Comprehend?
To analyze text and extract insights using natural language processing.
44
True or False: Amazon Athena charges based on the amount of data scanned per query.
True
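
Because billing follows the amount of data scanned, the cost of a query can be estimated with simple arithmetic. The sketch below is an assumption-laden illustration, not official pricing logic: the function name, the per-terabyte rate passed in, and the choice of 1 TB = 1024**4 bytes are all hypothetical, and any per-query minimums or rounding rules are ignored. Check current Athena pricing for real figures.

```python
BYTES_PER_TB = 1024 ** 4  # assumption: binary terabyte


def estimate_scan_cost(bytes_scanned: int, price_per_tb: float) -> float:
    """Estimate query cost as (terabytes scanned) * (price per terabyte)."""
    return (bytes_scanned / BYTES_PER_TB) * price_per_tb


# Hypothetical rate of 5.0 currency units per TB, used only for illustration.
full_scan = estimate_scan_cost(2 * BYTES_PER_TB, price_per_tb=5.0)

# Scanning a quarter of the data (for example, by reading fewer columns
# or partitions) costs a quarter as much under this model.
partial_scan = estimate_scan_cost(BYTES_PER_TB // 2, price_per_tb=5.0)

print(full_scan, partial_scan)
```

This is why columnar formats and partitioning, which reduce the bytes a query has to read, also reduce its cost.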
45
Which AWS service allows for the creation of serverless data lakes?
AWS Lake Formation
46
What is the main function of AWS Step Functions?
To coordinate components of distributed applications and microservices.
47
Fill in the blank: Amazon _____ is used for data visualization and reporting.
QuickSight
48
What is the primary use of Amazon SageMaker Data Wrangler?
To simplify data preparation for machine learning.
49
True or False: AWS Glue can automatically generate ETL code.
True
50
Which service would you use to send notifications based on AWS events?
Amazon SNS (Simple Notification Service)
51
What does the term 'data governance' refer to?
The management of data availability, usability, integrity, and security.
52
Fill in the blank: Amazon _____ is used for scalable and durable object storage.
S3
53
What is the main role of a data engineer?
To design, build, and maintain data processing systems.
54
True or False: Amazon Timestream is optimized for storing relational data.
False
55
Which AWS service is best for running SQL queries against large data sets stored in S3?
Amazon Athena
56
What is the primary benefit of using Amazon Redshift for analytics?
It allows for complex queries on large datasets with high performance.
57
Fill in the blank: Amazon _____ provides a fully managed NoSQL database.
DynamoDB
58
What does the term 'ETL' stand for?
Extract, Transform, Load
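
The three stages can be sketched in a few lines of standard-library Python. This is only a toy illustration under stated assumptions: the CSV content, the orders table, and the function names extract, transform, and load are hypothetical, and an in-memory SQLite database stands in for a real warehouse. It is not how AWS Glue itself is implemented.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for a file in object storage.
RAW_CSV = """order_id,region,amount
1,us-east,10.50
2,eu-west,20.00
3,us-east,4.50
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize the region code."""
    return [(int(r["order_id"]), r["region"].upper(), float(r["amount"]))
            for r in rows]


def load(conn, records):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(conn, transform(extract(RAW_CSV)))

# Once loaded, the data can be queried analytically.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```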
59
Which AWS service allows you to run machine learning models in real time?
Amazon SageMaker
60
True or False: AWS Glue can only work with data stored in S3.
False
61
What is the primary function of AWS Data Wrangler (now AWS SDK for pandas)?
To simplify working with data between pandas and AWS services.
62
Which AWS service is designed for running distributed data processing jobs?
Amazon EMR
63
Fill in the blank: Amazon _____ is a managed service for Apache Kafka.
MSK (Managed Streaming for Apache Kafka)
64
What is the main purpose of Amazon Kinesis Data Streams?
To collect and process real-time streaming data.
65
True or False: Amazon Redshift is not suitable for real-time analytics.
True