AWS Certified Developer Associate Flashcards
(258 cards)
Kinesis: What is it? can it be used for real-time operations? what two operations are used to write records into Kinesis and how do they work?
Kinesis - stream data operations
can be used for real-time applications
writing records to Kinesis:
PutRecord: writes a single record to the stream
PutRecords: writes multiple records to the stream in a batch. a single failure in one being written does not halt the entire operation.
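The partial-failure behavior of PutRecords can be sketched without a live call. This is a minimal sketch using a stubbed response shaped like the real Kinesis API response (FailedRecordCount, Records, ErrorCode); the records and response values are illustrative:

```python
# Sketch: retry only the failed records from a PutRecords-style response.
# The response dict below is a stub shaped like the real Kinesis API
# response; no AWS call is made.
def failed_records(records, response):
    """Pair each submitted record with its result entry and keep the failures."""
    return [
        rec
        for rec, result in zip(records, response["Records"])
        if "ErrorCode" in result  # present only on a per-record failure
    ]

records = [{"Data": b"a", "PartitionKey": "k1"},
           {"Data": b"b", "PartitionKey": "k2"},
           {"Data": b"c", "PartitionKey": "k3"}]

response = {  # stubbed: one record was throttled, the batch call still succeeded
    "FailedRecordCount": 1,
    "Records": [
        {"SequenceNumber": "1", "ShardId": "shardId-0"},
        {"ErrorCode": "ProvisionedThroughputExceededException",
         "ErrorMessage": "Rate exceeded"},
        {"SequenceNumber": "2", "ShardId": "shardId-0"},
    ],
}

to_retry = failed_records(records, response)  # only the throttled record
```

Results in the Records list are positional (same order as the submitted records), which is why zipping them works.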
Kinesis Data Firehose: what does it do? where does it store data?
Kinesis Data Firehose
captures streaming data
can encrypt, transform, batch, convert it to a columnar data format, or perform other operations on it before storing it
stores data into S3, Redshift, Elasticsearch (OpenSearch), or Splunk
*sink types
Kinesis Data Analytics
Kinesis Data Analytics
allows you to run SQL queries on stream data
Kinesis data streams: what does it do? what is the kinesis agent? what is the kinesis producer library? how do you resolve a provisioned throughput capacity error in kinesis data streams? what is a partition key? what does it mean for the partition key if you are getting ProvisionedThroughputExceeded errors?
Kinesis Data Streams
collects huge amounts of streaming data in real time from website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.
enables real-time dashboards, real-time anomaly detection, dynamic pricing, etc.
Kinesis Agent:
stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams.
Kinesis Producer Library (KPL):
The KPL is an easy-to-use, highly configurable library that helps you write to a Kinesis data stream
resolve a ProvisionedThroughputExceeded exception:
to resolve a ProvisionedThroughputExceeded exception, configure the producer to retry with exponential backoff and increase the number of shards within your data streams to provide enough capacity
partition key: used by Kinesis data streams to distribute data across shards. if you have ProvisionedThroughputExceeded errors, your partition key probably isn't distributed enough
if you are well below your provisioned throughput capacity but still receiving ProvisionedThroughputExceeded errors, you likely have a "hot" shard caused by an unevenly distributed partition key
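The retry-with-exponential-backoff advice above can be sketched in a few lines. This is a minimal sketch: `flaky_send` and the `ProvisionedThroughputExceeded` class are stand-ins simulating a throttled Kinesis producer call, not real SDK objects:

```python
import random
import time

# Stand-in for the SDK's ProvisionedThroughputExceededException.
class ProvisionedThroughputExceeded(Exception):
    pass

def send_with_backoff(send, max_attempts=5, base_delay=0.1):
    """Call `send`, retrying throttled attempts with exponential backoff + jitter."""
    for attempt in range(max_attempts):
        try:
            return send()
        except ProvisionedThroughputExceeded:
            if attempt == max_attempts - 1:
                raise
            # full jitter: sleep a random time up to base_delay * 2^attempt
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))

# simulated producer: throttled twice, then succeeds
attempts = {"n": 0}
def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ProvisionedThroughputExceeded()
    return "ok"

result = send_with_backoff(flaky_send)
```

In practice the KPL handles this retry logic for you; this sketch is only to show the mechanics.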
kinesis video streams
Kinesis Video Streams
enables you to stream video data from connected devices to aws
enables video playback (enables live and on-demand video playback), storage (storage, indexing, and encryption of video data), analytics/machine learning (take advantage of AWS Rekognition and other ML libraries).
Kinesis Adapter
Kinesis Adapter
recommended way to consume streams from DynamoDB for real-time processing
OpenSearch Service
OpenSearch Service
Analyze and monitor activity logs, data from aws services (cloudwatch logs, kinesis data streams, dynamodb), product usage data, CRM data, social media sentiments, and mobile app usage
Glue
Glue
point Glue at data you have stored in aws and Glue will discover it and store the associated metadata in the Glue Data Catalog, at which point the data is ready to have queries run on it
AppSync
AppSync
handles all data-driven application management tasks (online and offline data access, data synchronization, and data manipulation across multiple data sources.)
uses GraphQL
EventBridge
EventBridge
responds to event sources like Zendesk or Shopify and forwards events to targets like Lambda or SaaS applications
SNS: what is it? what are filter policies? what are sns topics? what are the types of sns topics and the differences between them?
SNS
fully managed pub/sub messaging service
can send real-time messages to services
filter policies:
by default subscribers receive all messages from publishers. filter policies can be placed on topics, which is a json policy that limits which messages the subscriber receives
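The effect of a filter policy can be sketched locally. The policy and message attributes below are illustrative, and SNS evaluates the real JSON policy server-side; this sketch only mimics the simplest exact-match semantics:

```python
# Sketch: an SNS-style filter policy limiting which messages a subscriber
# receives. Values here are illustrative, not a real topic's policy.
filter_policy = {"event_type": ["order_placed", "order_cancelled"]}

def matches(policy, attributes):
    """True if every policy key has a message attribute with an allowed value."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())

m1 = {"event_type": "order_placed"}   # delivered to the subscriber
m2 = {"event_type": "user_signed_up"} # filtered out
```

Real filter policies also support prefix, numeric-range, and anything-but matching; exact string matching is just the most common case.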
topics:
logical access points for producer systems to send messages across to message consumer systems
can be a standard or FIFO topic (name/topic type can’t be changed once created)
you subscribe to a topic to receive messages from it
*configuring topics so that lambda functions can communicate with them and they can send messages to different people
SQS: what is it? what do the DeleteQueue, CreateQueue, PurgeQueue, and RemovePermission api calls do? what are backlog per instance variables? is there a limit to the number of messages that can be in an sqs queue? what are the differences between dead letter queues, FIFO queues, standard queues, and delay queues? what is the difference between long polling and short polling? what is the message size limit, and how can you send a larger message? what must you do to enable scaling of sqs? how is encryption achieved in sqs?
SQS
fully managed message queueing service
DeleteQueue, CreateQueue, RemovePermission, PurgeQueue api calls:
DeleteQueue: Deletes the queue specified by the QueueUrl, regardless of the queue’s contents
RemovePermission: Revokes any permissions in the queue policy that matches the specified Label parameter
PurgeQueue: Deletes available messages in a queue (including in-flight messages) specified by the QueueURL parameter
CreateQueue: creates a new standard or FIFO queue (defaults to standard). can’t change the queue type after creating it. visibility timeout default is 30 seconds
backlog per instance variables
backlog per instance: ApproximateNumberOfMessagesVisible divided by the number of running instances; used instead of the raw ApproximateNumberOfMessagesVisible as the scaling metric for an EC2 Auto Scaling group
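The backlog-per-instance calculation AWS recommends for SQS-driven target tracking can be sketched with illustrative numbers:

```python
# Sketch: "backlog per instance" vs an acceptable target backlog.
# All numbers are illustrative.
def backlog_per_instance(visible_messages, running_instances):
    return visible_messages / running_instances

def target_backlog(acceptable_latency_s, seconds_per_message):
    # how many queued messages one instance can clear within the latency budget
    return acceptable_latency_s / seconds_per_message

current = backlog_per_instance(visible_messages=1500, running_instances=10)
target = target_backlog(acceptable_latency_s=100, seconds_per_message=0.5)
scale_out_needed = current > target
```

Target tracking then scales the group so that the current backlog per instance stays at or below the target value.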
limit to the amount of messages in an SQS queue:
the message capacity in an sqs queue is unlimited
*delay queues, dead-letter queues (and when will SQS add a message to a dead-letter queue), FIFO queues, standard queues
Delay queues: let you postpone the delivery of new messages to consumers for a number of seconds. can delay from 0-15 minutes (DelaySeconds parameter). useful when a consumer needs additional time to process messages
Dead-Letter Queues: where other queues can send their messages after messages are processed unsuccessfully. useful for debugging
FIFO Queues: high (but limited) throughput, messages sent exactly once, and ordering is exact (single-lane highway)
Standard Queues: unlimited throughput, messages sent are delivered at least once, and ordering is best-effort (many-lane highway)
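Wiring a queue to a dead-letter queue is done through a RedrivePolicy attribute. This is a minimal sketch of the attribute shape; the DLQ ARN is a placeholder, and no AWS call is made:

```python
import json

# Sketch: queue attributes wiring a source queue to a dead-letter queue.
# After maxReceiveCount unsuccessful receives, SQS moves the message to
# the dead-letter queue. The ARN below is a placeholder.
redrive_policy = {
    "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:my-dlq",
    "maxReceiveCount": "5",
}
queue_attributes = {
    "RedrivePolicy": json.dumps(redrive_policy),  # SQS expects a JSON string
    "VisibilityTimeout": "30",
}
```

These attributes would be passed to CreateQueue or SetQueueAttributes; the key detail is that RedrivePolicy is a JSON string, not a nested object.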
*long polling vs short polling
long polling: SQS returns the messages queried, or waits (up to 20 seconds) until messages appear in the queue if none are present (this is almost always preferable as it reduces the number of calls made to query the queue)
short polling: SQS returns the messages queried, or returns immediately with an empty response if no messages are present (this can easily generate a large number of wasteful API calls)
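The only difference at the API level is the WaitTimeSeconds parameter on a receive call. This sketch shows the parameter shape only; the queue URL is a placeholder and no call is made:

```python
# Sketch: receive-message parameters for long vs short polling.
# The queue URL is a placeholder.
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

long_poll = {
    "QueueUrl": QUEUE_URL,
    "MaxNumberOfMessages": 10,
    "WaitTimeSeconds": 20,  # wait up to 20 s for messages (long polling)
}
short_poll = {**long_poll, "WaitTimeSeconds": 0}  # return immediately
```

Long polling can also be set queue-wide via the ReceiveMessageWaitTimeSeconds queue attribute instead of per call.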
*sending a message larger than 256kb
must use the SQS Extended Client Library for Java, which stores the message payload in S3
*know that SQS scales automatically and that nothing has to be done for scaling
SQS KMS: allows messages sent to SQS to be encrypted, with a key managed by AWS KMS
max message size: 256KB
step functions: what are they? what are the states? what are the two different workflows available?
Step Functions
coordinate the components of distributed applications and microservices using visual workflows
provides a graphical console to arrange and visualize the components of microservice application
automatically triggers and tracks each step, and retries when there are errors
logs each step's state for debugging
*know each type of state (success, fail, etc.)
pass: passes input to output without doing any work
task: represents a single unit of work performed by a state machine. uses an activity or aws service
choice: represents a branching point in the state machine; directs execution to one of several possible next states based on its input
wait: delays the state machine from continuing until a specified time
success: represents the successful execution of the step function
fail: represents the failed execution of the step function
parallel: can be used to evaluate separate branches of execution in parallel in the state machine
map: used to run a set of workflow steps on each item in a dataset. runs in parallel
*Standard Workflows vs Express Workflows and the uses cases for each
standard (default): ideal for long-running (up to one year), durable, and auditable workflows
express: ideal for high-volume, event-processing workloads such as IoT data ingestion, streaming data processing and transformation, and mobile application backends
EC2: what is it?
EC2
virtual servers (resizable compute capacity) in the cloud
EC2 autoscaling: can AGs span regions?
where are new EC2 nodes launched when an AZ containing EC2 nodes in an autoscaling group becomes unhealthy? An Auto Scaling group has a maximum capacity of 3, a current capacity of 2, and a scaling policy that adds 3 instances. what is the outcome of the scaling policy? cloudwatch metric integration: what type of metrics exist for AGs?
Autoscaling
can autoscaling groups span regions?:
AGs can span AZs, but not regions
where are new EC2 nodes launched when an AZ containing EC2 nodes in an autoscaling group becomes unhealthy?
Autoscaling attempts to use instance distribution to spread instances in the Auto Scaling group as evenly across AZs as it can
if an autoscaling group is provisioned to spread across 3 AZs and 2 instances are added, it will provision those instances in 2 of the 3 AZs
An Auto Scaling group has a maximum capacity of 3, a current capacity of 2, and a scaling policy that adds 3 instances. what is the outcome of the scaling policy?
1 instance is added
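The reason only 1 instance is added is that the desired capacity is clamped to the group's maximum. A minimal sketch of that arithmetic:

```python
# Sketch: desired capacity is clamped to the group's min/max, so a
# "+3 instances" policy on a group at 2 of max 3 adds only 1 instance.
def apply_scaling_policy(current, adjustment, min_size, max_size):
    return max(min_size, min(current + adjustment, max_size))

desired = apply_scaling_policy(current=2, adjustment=3, min_size=0, max_size=3)
added = desired - 2
```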
cloudwatch metric integration
cloudwatch metrics exist for Auto Scaling groups including GroupMinSize, GroupMaxSize, GroupTotalInstances, GroupPendingCapacity, WarmPoolPendingCapacity, etc.
EC2: what purchasing options exist?
purchasing options (on-demand, savings plans, reserved instances, spot instances, dedicated hosts, dedicated instances)
on demand: Pay, by the second, for the instances that you launch.
savings plans: reduce costs by making a usage agreement for 1 or 3 year periods
reserved instances: make a commitment to a consistent instance configuration, including instance type and Region, for a term of 1 or 3 years.
zonal reserved instances: reserved instances specific to a certain availability zone. eligible for reserved instance discounts and a capacity reservation
regional reserved instances: a reserved instance for a specific region
spot instances: Request unused EC2 instances, which can reduce your Amazon EC2 costs significantly
dedicated hosts: Pay for a physical host that is fully dedicated to running your instances, and bring your existing per-socket, per-core, or per-VM software licenses to reduce costs.
dedicated instances: Pay, by the hour, for instances that run on single-tenant hardware
EC2: user data: what is it? what types are there?
details on EC2 user data
perform common automated configuration tasks and even run scripts after the instance starts.
add users, groups, install/update packages, start/stop systemd services, create simple web pages, modify file ownership/permissions
types…
shell scripts:
cloud-init directives:
EC2: security groups: what are they? are they stateful or stateless?
control inbound/outbound traffic to an ec2 instance
stateful: if a requests is allowed into a security group, then the response generated from that request is allowed out of the security group regardless of any potential outbound rules on the group.
EC2: do instance key pairs need to be created by the root user? how are you charged for reserved instances based on how much you use them? how do you import the same ssh key into multiple regions? EC2 T family: how is someone charged for using 35 seconds of a burstable instance over the course of a month?
know that EC2 instance key pairs do not need to be created by a root user
how is someone charged for using 35 seconds of a burstable instance over the course of a month?
billing is per second with a 60-second minimum per instance run, so 35 seconds of usage is billed as 60 seconds
burstable instances:
T instance family. lower baseline CPU performance at reduced cost
provides a baseline CPU performance with the ability to burst above the baseline at any time for as long as required
you can use T2.micro burstable instances for free within certain usage parameters if your account is less than 12 months old
how are you charged for reserved instances based on how much you use them?
you are charged for reserved instances independently of usage
you can apply reserved instance billing to an instance, but if multiple matching instances run concurrently, the billing benefit applies to only one of them; the others run at on-demand pricing
reserved instance benefits are applied to a maximum of 3600 seconds per clock hour
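The 3600-seconds-per-clock-hour cap can be illustrated with simple arithmetic (numbers are illustrative):

```python
# Sketch: one reserved instance covers at most 3600 instance-seconds per
# clock hour, so two matching instances running concurrently for a full
# hour leave one instance-hour billed at on-demand rates.
RI_SECONDS_PER_HOUR = 3600

def on_demand_seconds(instances, seconds_each, reserved_instances=1):
    total = instances * seconds_each
    covered = min(total, reserved_instances * RI_SECONDS_PER_HOUR)
    return total - covered

uncovered = on_demand_seconds(instances=2, seconds_each=3600)
```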
importing the same ssh key into multiple regions
generate a public ssh key (.pub) from a private ssh key (.pem), then, select the aws region you want to import it into and import it
*max IOPS for a general purpose (gp2) EC2 SSD volume: 16,000
*valid GiB size for a 15,000 IOPS Provisioned IOPS SSD (io1) volume: io1 allows up to 50 IOPS per GiB, so the volume must be at least 300 GiB
EC2: what is elastic IP?
Elastic IP Address:
static, region specific IP address
allocated to a specific resource (EC2 instance, network interface, etc.)
Elastic Beanstalk: what is it?
Elastic Beanstalk
tool for deploying and scaling web applications
for apps written in Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on Apache, Nginx, Passenger, etc. servers
upload application code, and Elastic Beanstalk handles deployment, capacity provisioning, load balancing, auto scaling and application health monitoring
Elastic Beanstalk: what are the deployment methods?
*what deployment methods are available?
All at once – Deploy the new version to all instances simultaneously. All instances in your environment are out of service for a short time while the deployment occurs.
Rolling – Deploy the new version in batches. Each batch is taken out of service during the deployment phase, reducing your environment’s capacity by the number of instances in a batch.
NOTE: in rolling deployment, instances are not actually lost, just divided into groups and deployed in batches
Rolling with additional batch – Deploy the new version in batches, but first launch a new batch of instances to ensure full capacity during the deployment process.
Immutable – Deploy the new version to a fresh group of instances by performing an immutable update. Elastic Beanstalk creates a temporary Auto Scaling group behind your load balancer and launches a single instance running the new application version in it.
once that instance passes health checks, more instances are added to the temporary group until its instance count equals that of the original group. once these instances pass health checks,
the new instances are transferred to the original Auto Scaling group, and the old instances and the temporary group are terminated
Traffic splitting – Deploy the new version to a fresh group of instances and temporarily split incoming client traffic between the existing application version and the new one.
Blue/green – create a new environment (a duplicate of the old one) and swap the CNAMEs of the environments, switching traffic at the DNS level to redirect traffic to the new version almost instantly
Linear – traffic is shifted in equal increments with an equal number of minutes between each increment
Canary – the new version of an application is deployed and traffic is randomly directed between the old and new version according to a preconfigured ratio. this continues until confidence is gained in the new application and traffic is shifted completely over
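A linear traffic-shifting schedule (equal increments at equal intervals) can be sketched as follows; the 25%-every-5-minutes values are illustrative, not a default:

```python
# Sketch: compute a linear traffic-shift schedule as (minute, cumulative %)
# pairs. Increment and interval values are illustrative.
def linear_schedule(increment_pct, interval_min):
    steps, shifted, minute = [], 0, 0
    while shifted < 100:
        shifted = min(100, shifted + increment_pct)
        steps.append((minute, shifted))
        minute += interval_min
    return steps

schedule = linear_schedule(increment_pct=25, interval_min=5)
# [(0, 25), (5, 50), (10, 75), (15, 100)]
```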
Elastic Beanstalk: what is .ebextensions? what is the naming convention of files under .ebextensions? what will happen to resources created as part of your .ebextensions if the environment is terminated?
what is .ebextensions/? what happens to resources created there when the environment is terminated? what is the config file naming convention in .ebextensions? (.ebextensions/*.config)
directory for Elastic Beanstalk environment configuration
files in this directory follow the naming convention .ebextensions/*.config
any resources created as part of your .ebextensions are part of your Elastic Beanstalk template and will get deleted if the environment is terminated.
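.ebextensions files are YAML (or JSON) documents. A minimal hypothetical example; the filename, package, and command are all illustrative:

```yaml
# .ebextensions/01-setup.config  (hypothetical file)
packages:
  yum:
    git: []
commands:
  01_mark_configured:
    command: echo "configured" > /tmp/eb-configured
```

Files are processed in alphabetical order, which is why numeric prefixes like `01-` are a common convention.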
Elastic Beanstalk: what will happen to instances that failed to deploy correctly after being manually terminated?
*status of instances of an application that failed to deploy correctly after being manually terminated
elastic beanstalk will replace the instances with instances running the application version of the most recent successful deployment