1. Data Engineering Fundamentals Flashcards
(68 cards)
What is the primary purpose of Amazon S3?
Object storage
Which AWS service is used for data warehousing?
Amazon Redshift
True or False: AWS Glue is a fully managed ETL service.
True
What does ETL stand for?
Extract, Transform, Load
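Note: the three ETL stages can be sketched in a few lines of plain Python. This is a minimal illustration only, not how AWS Glue or any other service implements ETL. The sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a source file (for example, an object in S3).
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""

def extract(text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, cast types, upper-case currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned

def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()

def run_etl(text, conn):
    """Run the three stages in order."""
    load(transform(extract(text)), conn)

# In-memory SQLite is an assumption used here in place of a real data warehouse.
conn = sqlite3.connect(":memory:")
run_etl(RAW_CSV, conn)
```

Keeping each stage in its own function means the transform logic can be tested without touching the source or the target.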
Which AWS service is used for real-time data streaming?
Amazon Kinesis
What type of storage does Amazon EBS provide?
Block storage
Fill in the blank: AWS Lambda lets you run code without provisioning or managing _____.
servers
Which AWS service is used to build, secure, and manage data lakes?
AWS Lake Formation
What is the main function of Amazon Athena?
Querying data in S3 using SQL
What is AWS CloudFormation used for?
Infrastructure as code
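Note: "infrastructure as code" means the resources are described in a template file rather than created by hand. The sketch below builds a small CloudFormation-style template as a Python dictionary and serializes it to JSON. The logical ID `RawDataBucket` is a hypothetical name, and the template is illustrative rather than a recommended configuration.

```python
import json

# A minimal template declaring one S3 bucket with versioning enabled.
# "RawDataBucket" is a hypothetical logical ID chosen for this example.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Example stack with a single versioned S3 bucket",
    "Resources": {
        "RawDataBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"}
            },
        }
    },
}

# The JSON text is what would be saved to a file and kept in version control.
template_json = json.dumps(template, indent=2)

def resource_types(tmpl):
    """List the resource types a template declares, keyed by logical ID."""
    return {name: res["Type"] for name, res in tmpl["Resources"].items()}
```

Because the template is plain text, it can be reviewed and versioned like any other source file.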
Which AWS service provides managed Hadoop clusters for large-scale data analytics?
Amazon EMR
Fill in the blank: AWS Data Pipeline is used for _____ data processing workflows.
orchestrating
What does the AWS Well-Architected Framework help with?
Building secure, high-performing, resilient, and efficient infrastructure
Which AWS service runs batch computing jobs on managed compute resources?
AWS Batch
What is the key benefit of using Amazon Redshift Spectrum?
Querying data directly from S3 without loading it into Redshift
Fill in the blank: AWS Glue Data Catalog is a _____ for metadata.
repository
What is the purpose of Amazon SageMaker?
Building, training, and deploying machine learning models
Which AWS service is used to process and analyze large datasets with open-source frameworks such as Hive and Presto?
Amazon EMR
True or False: AWS Step Functions enable you to coordinate multiple AWS services into serverless workflows.
True
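Note: a Step Functions workflow is defined as a state machine in Amazon States Language, a JSON format. The sketch below writes a two-step definition as a Python dictionary. The state names, the job name `example-etl-job`, and the function name `example-notify-function` are hypothetical, and the `Resource` values follow the service-integration ARN pattern as best understood here, so check them against the current documentation before use.

```python
import json

# Hypothetical workflow: run a Glue job, then invoke a Lambda function.
definition = {
    "Comment": "Hypothetical two-step workflow",
    "StartAt": "RunGlueJob",
    "States": {
        "RunGlueJob": {
            "Type": "Task",
            # Assumed service-integration ARN; ".sync" waits for the job to finish.
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "example-etl-job"},
            "Next": "NotifyWithLambda",
        },
        "NotifyWithLambda": {
            "Type": "Task",
            "Resource": "arn:aws:states:::lambda:invoke",
            "Parameters": {"FunctionName": "example-notify-function"},
            "End": True,
        },
    },
}

def execution_order(defn):
    """Follow Next links from StartAt and return the states in run order.

    Handles only a linear chain of states, which is all this example needs.
    """
    order, name = [], defn["StartAt"]
    while True:
        order.append(name)
        state = defn["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]

# JSON text of the definition, as it would be supplied when creating the state machine.
definition_json = json.dumps(definition, indent=2)
```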
Fill in the blank: Amazon Aurora is a _____ database service.
relational
Which AWS service is used to migrate databases to AWS?
AWS Database Migration Service
What is the main use case for AWS Glue Crawlers?
Discovering data in stores such as S3 and cataloging its schema in the Glue Data Catalog
True or False: Amazon EMR can process data using Apache Spark.
True
What is the role of AWS DataBrew?
Visual data preparation for analytics