1. Data Engineering Fundamentals Flashcards

(68 cards)

1
Q

What is the primary purpose of Amazon S3?

A

Object storage

2
Q

Which AWS service is used for data warehousing?

A

Amazon Redshift

3
Q

True or False: AWS Glue is a fully managed ETL service.

A

True

4
Q

What does ETL stand for?

A

Extract, Transform, Load
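
The three stages can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not how AWS Glue or any other managed service implements ETL; the sample CSV, the `orders` table, and the function names are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would come from a source system.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, upper-case currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three stages in order against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(text)), conn)
    return conn
```

Calling `run_etl()` returns a connection whose `orders` table holds only the two rows that had an amount.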

5
Q

Which service is used for real-time data streaming in AWS?

A

Amazon Kinesis

6
Q

What type of storage does Amazon EBS provide?

A

Block storage

7
Q

Fill in the blank: AWS Lambda allows you to run code without _____ provisioning servers.

A

manually

8
Q

Which AWS service provides data lake capabilities?

A

AWS Lake Formation

9
Q

What is the main function of Amazon Athena?

A

Querying data in S3 using SQL

10
Q

What is AWS CloudFormation used for?

A

Infrastructure as code

11
Q

Which AWS service is primarily used for data analytics?

A

Amazon EMR

12
Q

Fill in the blank: AWS Data Pipeline is used for _____ data processing workflows.

A

orchestrating

13
Q

What does the AWS Well-Architected Framework help with?

A

Building secure, high-performing, resilient, and efficient infrastructure

14
Q

Which service is best for batch data processing in AWS?

A

AWS Batch

15
Q

What is the key benefit of using Amazon Redshift Spectrum?

A

Querying data directly from S3 without loading it into Redshift

16
Q

Fill in the blank: AWS Glue Data Catalog is a _____ for metadata.

A

repository

17
Q

What is the purpose of Amazon SageMaker?

A

Building, training, and deploying machine learning models

18
Q

Which AWS service is used to manage and analyze large datasets?

A

Amazon EMR

19
Q

True or False: AWS Step Functions enable you to coordinate multiple AWS services into serverless workflows.

A

True

20
Q

Fill in the blank: Amazon Aurora is a _____ database service.

A

relational

21
Q

Which AWS service is used for data migration?

A

AWS Database Migration Service

22
Q

What is the main use case for AWS Glue Crawlers?

A

Discovering and cataloging data in S3

23
Q

True or False: Amazon EMR can process data using Apache Spark.

A

True

24
Q

What is the role of AWS Glue DataBrew?

A

Visual data preparation for analytics

25
Fill in the blank: AWS Lake Formation simplifies the process of building a _____ data lake.
secure
26
What is Amazon Kinesis Data Firehose used for?
Loading streaming data into data lakes and warehouses
27
True or False: AWS Glue is not a serverless service.
False
28
What is the purpose of Amazon Timestream?
Time series database service
29
Fill in the blank: Amazon OpenSearch Service is used for _____ and analytics.
search
30
Which service provides a fully managed Elasticsearch solution?
Amazon OpenSearch Service
31
What is the main benefit of using Amazon S3 Select?
Retrieving a subset of data from S3 objects
32
True or False: Amazon S3 is designed for high durability.
True
33
What is AWS Data Exchange used for?
Finding, subscribing to, and using third-party data
34
Fill in the blank: AWS Glue ETL jobs can be triggered on a _____ schedule.
defined
35
What is the purpose of Amazon Comprehend?
Natural language processing service
36
Which AWS service is used for data transformation?
AWS Glue
37
True or False: Amazon Redshift can scale automatically.
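True: Redshift Serverless adjusts capacity automatically, and provisioned clusters can add capacity through Concurrency Scaling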
38
What is the primary use case for Amazon SageMaker Ground Truth?
Building high-quality training datasets
39
Fill in the blank: AWS Data Pipeline supports _____ data processing.
complex
40
What is Amazon Managed Streaming for Apache Kafka (MSK)?
A fully managed service for Apache Kafka
41
True or False: AWS Glue supports Python and Scala for ETL jobs.
True
42
What is the main benefit of using Amazon Redshift RA3 nodes?
Separate compute and storage scaling
43
Fill in the blank: Amazon Kinesis Data Streams is designed for _____ data processing.
real-time
44
What is the purpose of AWS Glue Schema Registry?
Managing schemas for streaming applications
45
True or False: AWS Data Wrangler (now called AWS SDK for pandas) is a Python library for data processing.
True
46
What is the primary function of AWS Snowball?
Data transfer appliance
47
Fill in the blank: Amazon QuickSight provides _____ analytics capabilities.
business intelligence
48
What is the purpose of AWS CodePipeline?
Continuous integration and delivery service
49
True or False: AWS Data Pipeline can automate data movement.
True
50
What is the main use of Amazon Rekognition?
Image and video analysis
51
What is the purpose of AWS Glue Jobs?
Running ETL operations
52
True or False: Amazon EMR is priced based on the resources used.
True
53
What is the role of AWS CloudTrail?
Logging AWS account activity
54
Fill in the blank: Amazon Redshift uses _____ to store data.
columnar storage
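
The idea behind columnar storage can be illustrated with plain Python data structures. This is only a conceptual sketch, not Redshift's actual on-disk format; the sample records and the `to_columnar` helper are hypothetical.

```python
# Hypothetical sample records in a row-oriented layout.
rows = [
    {"id": 1, "region": "eu", "sales": 120},
    {"id": 2, "region": "us", "sales": 80},
    {"id": 3, "region": "eu", "sales": 50},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columnar(rows)

# Row layout: every full record is visited even though only "sales" is needed.
total_from_rows = sum(record["sales"] for record in rows)

# Column layout: only the "sales" list is read; "id" and "region" are untouched.
total_from_columns = sum(columns["sales"])
```

Both totals are the same, but the column layout gets there by reading a single list, which is why analytic queries that aggregate a few columns over many rows tend to favor it.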
55
What does the AWS SDK allow developers to do?
Interact with AWS services programmatically
56
True or False: AWS Glue can handle both batch and streaming data.
True
57
What is the primary use case for Amazon EMR Notebooks?
Interactive data analysis and visualization
58
Fill in the blank: Amazon Kinesis enables the processing of _____ data streams.
real-time
59
What is the main benefit of using Amazon CloudWatch?
Monitoring AWS resources and applications
60
True or False: AWS Glue can automatically generate ETL code.
True
61
What is the purpose of Amazon Personalize?
Building real-time recommendation systems
62
Fill in the blank: AWS Lake Formation helps manage _____ in a data lake.
permissions
63
What is the main function of Amazon Athena?
Ad-hoc querying of data stored in S3
64
True or False: AWS Glue can connect to various data sources.
True
65
What does the AWS Data Pipeline service allow you to do?
Orchestrate data workflows
66
Fill in the blank: Amazon Kinesis Data Analytics enables you to analyze _____ data.
streaming
67
What is the main advantage of using Amazon RDS?
Automated backups and scaling
68
True or False: AWS Glue supports both serverless and provisioned resources.
True