Rapid Fire Exam Questions Flashcards
What is the ReplaceUnhealthy process used for in auto-scaling groups?
The ReplaceUnhealthy process is used to terminate/replace EC2 instances which have been marked as unhealthy during a health check performed by a load balancer or the EC2 service.
Processes in auto-scaling groups can be suspended/resumed at any time. This can be useful when performing maintenance on EC2 instances which are part of the ASG without triggering undesired actions.
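A minimal boto3 sketch (hypothetical ASG name) of suspending and later resuming ReplaceUnhealthy during maintenance:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Suspend ReplaceUnhealthy so the ASG does not terminate instances
# while maintenance temporarily makes them fail health checks.
autoscaling.suspend_processes(
    AutoScalingGroupName="my-asg",            # hypothetical ASG name
    ScalingProcesses=["ReplaceUnhealthy"],
)

# ... perform maintenance on the instances ...

# Resume the process so unhealthy instances are replaced again.
autoscaling.resume_processes(
    AutoScalingGroupName="my-asg",
    ScalingProcesses=["ReplaceUnhealthy"],
)
```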
What happens when an EC2 instance’s status is modified from InService to Standby?
The Standby status is mainly used for updating + troubleshooting EC2 instances which are part of an auto-scaling group. Instances which are on Standby are still part of the Auto Scaling group, but they do not actively handle load balancer traffic.
When you put an instance on Standby, you can either decrement the desired capacity through this operation, or keep it at the same value. If you choose to decrement the desired capacity of the Auto Scaling group, this prevents the launch of an instance to replace the one on Standby. If you choose not to decrement the desired capacity of the Auto Scaling group, Amazon EC2 Auto Scaling launches an instance to replace the one on Standby.
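A hedged boto3 sketch (hypothetical instance ID and ASG name) of moving an instance to Standby without launching a replacement, then returning it to service:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Decrementing desired capacity prevents the ASG from launching a replacement.
autoscaling.enter_standby(
    InstanceIds=["i-0123456789abcdef0"],      # hypothetical instance ID
    AutoScalingGroupName="my-asg",            # hypothetical ASG name
    ShouldDecrementDesiredCapacity=True,
)

# ... update / troubleshoot the instance ...

# Return the instance to service (desired capacity is incremented again).
autoscaling.exit_standby(
    InstanceIds=["i-0123456789abcdef0"],
    AutoScalingGroupName="my-asg",
)
```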
What is Kinesis Data Streams?
Amazon Kinesis Data Streams is a service which enables real-time processing of streaming big data. It provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis Applications.
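A minimal producer sketch with boto3 (hypothetical stream name); records that share a partition key are routed to the same shard, which is what preserves their ordering:

```python
import boto3
import json

kinesis = boto3.client("kinesis")

# Records with the same PartitionKey go to the same shard,
# so they are read back in the order they were written.
kinesis.put_record(
    StreamName="my-data-stream",              # hypothetical stream name
    Data=json.dumps({"event": "page_view", "user": "alice"}).encode(),
    PartitionKey="alice",
)
```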
List 4 advantages/applications of Kinesis Data Streams.
1) Routing related records to the same record consumer (as in streaming MapReduce). For example, counting and aggregation are simpler when all records for a given key are routed to the same record processor.
2) Ordering of records. For example, you want to transfer log data from the application host to the processing/archival host while maintaining the order of log statements.
3) Ability for multiple applications to consume the same stream concurrently. For example, you have one application that updates a real-time dashboard and another that archives data to Amazon Redshift. You want both applications to consume data from the same stream concurrently and independently.
4) Ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. Because Amazon Kinesis Data Streams stores data for up to 365 days, you can run the audit application up to 365 days behind the billing application.
What software tools can be used to create or retrieve records from a shard in a Kinesis Data Stream?
The Amazon Kinesis Producer Library (KPL) can be used for creating/delivering records to a particular shard in a data stream.
The Amazon Kinesis Client Library (KCL) can be used for retrieving records stored in a particular shard.
Both the KPL/KCL are high-level libraries built on top of the AWS SDK.
What are the min/max retention periods for records stored in a Kinesis Data Stream?
Between 1 and 365 days (the default retention period is 24 hours).
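The retention period is adjusted in hours via the Kinesis API; a sketch assuming boto3 and a hypothetical stream name:

```python
import boto3

kinesis = boto3.client("kinesis")

# Raise retention from the 24-hour default to 7 days (specified in hours).
kinesis.increase_stream_retention_period(
    StreamName="my-data-stream",              # hypothetical stream name
    RetentionPeriodHours=168,
)
```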
List 3 different AWS services which can be set as a shard consumer in a Kinesis Data Stream.
1) AWS Lambda
2) Kinesis Data Firehose
3) Kinesis Data Analytics
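For example, AWS Lambda (the first consumer listed above) can be attached to a stream through an event source mapping; a sketch with a hypothetical stream ARN and function name:

```python
import boto3

lambda_client = boto3.client("lambda")

# Lambda polls the stream and invokes the function with batches of records.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/my-data-stream",  # hypothetical ARN
    FunctionName="process-stream-records",    # hypothetical function name
    StartingPosition="LATEST",
    BatchSize=100,
)
```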
What is an Amazon S3 event notification? List 4 different AWS services which can be used as target destinations.
The Amazon S3 event notification feature enables AWS services to receive notifications when certain API calls are made and events are triggered in an S3 bucket (Ex: object creation). To enable notifications, you must first add a notification configuration which identifies the events you want Amazon S3 to publish and the destination where you want Amazon S3 to send the notifications. To send S3 event notifications from a single bucket to multiple destinations, a separate event notification must be configured for each destination.
Amazon S3 supports the following event destinations:
SNS Topics (not FIFO)
SQS Queues (not FIFO)
AWS Lambda Functions
Amazon EventBridge
Note that the SNS topic, SQS queue, or Lambda function receiving S3 event notifications must have a resource-based policy attached that allows Amazon S3 to publish to it (or invoke it, in the case of Lambda).
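A minimal boto3 sketch (hypothetical bucket name and queue ARN) of configuring object-creation notifications to an SQS queue whose access policy already allows s3.amazonaws.com to send messages:

```python
import boto3

s3 = boto3.client("s3")

# The SQS queue policy must grant s3.amazonaws.com sqs:SendMessage from this
# bucket before S3 will accept the configuration.
s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",               # hypothetical bucket name
    NotificationConfiguration={
        "QueueConfigurations": [
            {
                "QueueArn": "arn:aws:sqs:us-east-1:123456789012:s3-events",  # hypothetical ARN
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```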
What is object key name filtering and how is it used when configuring S3 event notifications?
Object key name filtering allows S3 event notifications to be configured which only send event notifications related to objects whose key names (prefix or suffix) match a particular filtering condition. Ex: only sending notifications originating from objects with a particular file extension (*.jpg).
Note that when an S3 bucket is configured to send event notifications to Amazon EventBridge, all events generated in the bucket are delivered to EventBridge. It is not possible to limit which events are sent by event type (Ex: s3:ObjectCreated:*) or with object key name filtering in the bucket's notification configuration; filtering must instead be done with EventBridge rules.
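For the other destinations, key name filtering is expressed as a Filter block in the notification configuration; a sketch (hypothetical bucket and Lambda ARN) that only notifies on newly created .jpg objects under an images/ prefix:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",               # hypothetical bucket name
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:resize-image",  # hypothetical ARN
                "Events": ["s3:ObjectCreated:*"],
                # Only objects like images/photo.jpg trigger a notification.
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": "images/"},
                            {"Name": "suffix", "Value": ".jpg"},
                        ]
                    }
                },
            }
        ]
    },
)
```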
Describe the AWS Glue service.
AWS Glue is a managed service for performing extract, transform, and load (ETL) operations using a serverless architecture and is commonly used to transform data in preparation for data analytics.
Ex: a Glue job could extract data from an S3 bucket or an RDS database, transform it with a Glue PySpark script, and load it into an Amazon Redshift data warehouse.
Which AWS service can be used to convert data into the Apache Parquet or ORC file formats, and why is this beneficial?
The AWS Glue service can be used to convert file formats (Ex: CSV) into the ORC/Parquet formats. These are both columnar file formats designed for efficient data storage and retrieval. This is useful when employing AWS services such as Amazon Athena, since columnar formats improve query performance and save costs by reducing the amount of data scanned during an SQL query.
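A hedged sketch of a Glue PySpark job script (hypothetical S3 paths) that reads CSV objects and writes them back out as Parquet; it only runs inside a Glue job environment:

```python
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glueContext = GlueContext(SparkContext.getOrCreate())

# Read CSV objects from a (hypothetical) raw data bucket.
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-raw-bucket/csv/"]},       # hypothetical path
    format="csv",
    format_options={"withHeader": True},
)

# Write the same data back out as Parquet so Athena scans less data per query.
glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://my-curated-bucket/parquet/"},  # hypothetical path
    format="parquet",
)
```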
Describe an AWS architecture which can be used to automatically trigger a Glue job after uploading a file to an S3 bucket.
One architecture could involve using S3 event notifications triggered on object creation events and attached to either a Lambda function or Amazon EventBridge. This in turn could be used to trigger a Glue job on the S3 object which might transform the file and push it to another destination.
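A sketch of the Lambda piece of that architecture, assuming an S3 ObjectCreated event notification as the trigger and a hypothetical Glue job named csv-to-parquet:

```python
import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Invoked by an S3 ObjectCreated event notification."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # Pass the uploaded object's location to the (hypothetical) Glue job.
        glue.start_job_run(
            JobName="csv-to-parquet",          # hypothetical Glue job name
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
```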
What are Glue job bookmarks?
AWS Glue tracks data which has already been processed during a previous run of an ETL job by persisting information from the job run, known as a job bookmark. This helps AWS Glue maintain state information and prevent the reprocessing of old data.
With job bookmarks, you can process new data when rerunning on a scheduled interval. Ex: an ETL job might read only new partitions in an Amazon S3 file. AWS Glue tracks which partitions the job has processed successfully to prevent duplicate processing and duplicate data in the job’s target data store.
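A hedged boto3 sketch (hypothetical job name, role, and script location) of enabling bookmarks when the job is defined:

```python
import boto3

glue = boto3.client("glue")

# Bookmarks are switched on through the job-bookmark-option argument; the job
# script must also call job.init()/job.commit() and pass a transformation_ctx
# on its sources for state to be tracked between runs.
glue.create_job(
    Name="incremental-etl",                   # hypothetical job name
    Role="arn:aws:iam::123456789012:role/GlueJobRole",           # hypothetical role
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-scripts/incremental_etl.py",  # hypothetical path
        "PythonVersion": "3",
    },
    DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
)
```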
List 3 types of data sources which can be tracked using a Glue job bookmark.
Glue job bookmarks are implemented for: JDBC data sources, the Relationalize transform, and S3 buckets.
Describe the Amazon SageMaker service.
Amazon SageMaker is a fully managed service that simplifies the process of building and training machine-learning models for data scientists, without requiring them to provision or manage the underlying infrastructure.
SageMaker can automate many common ML tasks, including: data labeling, ML model building, training, and deployment. This is all done using training data provided by the data scientist.
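A hedged sketch using the SageMaker Python SDK, with a hypothetical execution role and S3 paths; the built-in XGBoost algorithm is chosen purely as an example:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()

# Built-in XGBoost container image for the current region.
image_uri = sagemaker.image_uris.retrieve(
    framework="xgboost", region=session.boto_region_name, version="1.5-1"
)

estimator = Estimator(
    image_uri=image_uri,
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-ml-bucket/models/",    # hypothetical bucket
)

# Train on labeled data the data scientist has staged in S3.
estimator.fit({"train": "s3://my-ml-bucket/training-data/"})
```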
What AWS service should be used when you’d like to analyze data stored in an S3 bucket using serverless SQL?
Amazon Athena.
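For example, a hedged boto3 sketch (hypothetical database, table, and results location):

```python
import boto3

athena = boto3.client("athena")

# Run serverless SQL directly against data in S3; query results are written
# to the (hypothetical) output location below.
athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "my_database"},                  # hypothetical database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
)
```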
What are the advantages of launching EC2 instances using dedicated hardware?
Dedicated Hosts and Dedicated Instances are EC2 purchasing options that are useful for companies with strict regulatory/compliance requirements or software licenses that demand dedicated hardware. This can include legal requirements such as HIPAA, which may require dedicated infrastructure for storing patient information. EC2 instances launched on dedicated hardware do not share their physical resources with any other AWS accounts.
Dedicated purchasing options are also useful for software with complicated licensing models (BYOL - Bring Your Own License).
What are the differences between the dedicated host and dedicated instance options when launching an EC2 instance?
Dedicated Instances are Amazon EC2 instances which run on hardware dedicated to a single customer. Dedicated Instances may share hardware with other instances from the same AWS account that are not Dedicated Instances.
With Dedicated Hosts, the entire physical server is reserved for a single AWS account. It does not change, it’s always the same physical machine for as long as you are paying. As soon as you ‘allocate’ a Dedicated Host, you start paying for the entire host.
A host computer is very large. In fact, it is the size of the largest instance of the selected family, but it can be divided up into smaller instances of the same family. (“You can run any number of instances up to the core capacity associated with the host.”)
Any instances that run on that Host are not charged, since you are already being billed for the Host. That is why a Dedicated Host is more expensive than a Dedicated Instance – the charge is for the whole host.
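A hedged boto3 sketch (hypothetical AMI ID) contrasting the two options: allocating a Dedicated Host and launching onto it, versus launching a Dedicated Instance:

```python
import boto3

ec2 = boto3.client("ec2")

# Dedicated Host: allocate (and start paying for) an entire physical server.
host_id = ec2.allocate_hosts(
    AvailabilityZone="us-east-1a",
    InstanceType="m5.large",
    Quantity=1,
)["HostIds"][0]

# Launch onto that specific host (hypothetical AMI ID).
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "host", "HostId": host_id},
)

# Dedicated Instance: single-tenant hardware, but no control over which host.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",
    InstanceType="m5.large",
    MinCount=1,
    MaxCount=1,
    Placement={"Tenancy": "dedicated"},
)
```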
What are the minimum and maximum retention periods for messages stored in an SQS queue? What is the default retention period?
The default retention period for messages stored in an SQS queue is 4 days. The min/max range is between 1 minute and 14 days.
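A minimal boto3 sketch (hypothetical queue name) of setting retention when creating a queue:

```python
import boto3

sqs = boto3.client("sqs")

# Retention is set in seconds: 60 (1 minute) to 1209600 (14 days); default is 345600 (4 days).
sqs.create_queue(
    QueueName="orders-queue",                 # hypothetical queue name
    Attributes={"MessageRetentionPeriod": "1209600"},   # keep messages for 14 days
)
```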
What are the minimum and maximum sizes allowed when submitting messages to an SQS queue?
The minimum message size is 1 byte and the maximum is 256 KB.
How many messages can be stored simultaneously in an SQS message queue?
A single SQS message queue can contain an unlimited number of messages. However, there is a limit on the number of in-flight messages for both standard and FIFO queues.
Messages are in flight after they have been received from the queue by a consuming component but have not yet been deleted from the queue. The limit is 120,000 in-flight messages for a standard queue and 20,000 for a FIFO queue.
An IAM user successfully creates a Route 53 CNAME record for a domain called ‘www.example.com’ but when trying to create a similar record for ‘example.com’, the request failed. Why is this?
‘example.com’ is an example of a second-level domain (SLD) and, in this hosted zone, the zone apex. DNS does not allow a CNAME record at the zone apex because the apex already holds the zone’s SOA and NS records; Route 53 alias records are the standard workaround for pointing the apex at an AWS resource.
What is a Route 53 hosted zone? What are the two types of hosted zones available in AWS?
A Route 53 hosted zone is a container for records which define how to route traffic for a particular domain and any of its subdomains. Hosted zones come in two varieties: public and private.
Public hosted zones contain records specifying how to route traffic over the internet (Ex: www.google.com). Public hosted zones connect public domain names (which must be purchased) to public IP addresses.
Private hosted zones instead contain records specifying how to route traffic within one or more VPCs. Private hosted zones connect private domain names to private IP addresses.
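A minimal boto3 sketch (hypothetical domain and VPC ID) of creating a private hosted zone associated with a VPC:

```python
import boto3
import uuid

route53 = boto3.client("route53")

# A private hosted zone resolves this domain only inside the associated VPC.
route53.create_hosted_zone(
    Name="internal.example.com",              # hypothetical private domain
    CallerReference=str(uuid.uuid4()),        # any unique string
    VPC={"VPCRegion": "us-east-1", "VPCId": "vpc-0123456789abcdef0"},  # hypothetical VPC
    HostedZoneConfig={"PrivateZone": True},
)
```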
What is the default TTL for records returned in Route 53 DNS queries?
300 seconds.