Decoupling Workflows Flashcards
What is SQS?
Simple Queue Service (SQS) is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications.
- Allows asynchronous processing of work. One resource will write a message to an SQS queue, and then another resource will retrieve that message from SQS.
- At-least-once delivery for each message in the queue.
- Supports resource policies.
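The at-least-once behavior above is worth internalizing: a received message is only hidden for a visibility timeout, and it reappears unless the consumer deletes it. A toy in-memory sketch of those semantics (this is an illustration of the concept, not the real SQS API — real code would use boto3's `send_message`/`receive_message`/`delete_message`):

```python
import time

class MiniQueue:
    """Toy in-memory queue mimicking SQS at-least-once semantics:
    a received message is hidden for `visibility_timeout` seconds and
    is redelivered unless the consumer deletes it in time."""

    def __init__(self, visibility_timeout=30.0):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # receipt handle -> (body, visible_at)
        self._next = 0

    def send_message(self, body):
        handle = str(self._next)
        self._next += 1
        self._messages[handle] = (body, 0.0)  # visible immediately
        return handle

    def receive_message(self, now=None):
        """Return (handle, body) of a visible message, hiding it for the
        visibility timeout; None if nothing is visible."""
        now = time.monotonic() if now is None else now
        for handle, (body, visible_at) in self._messages.items():
            if now >= visible_at:
                self._messages[handle] = (body, now + self.visibility_timeout)
                return handle, body
        return None

    def delete_message(self, handle):
        """Consumers must explicitly delete, or the message comes back."""
        self._messages.pop(handle, None)
```

Note how a consumer that crashes before deleting simply lets the message reappear for another consumer — that redelivery is exactly why processing must be idempotent.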
What is SNS?
Simple Notification Service (SNS) is a fully managed messaging service for
both application-to-application (A2A) and application-to-person (A2P) communication.
- SNS is a push-based messaging service: it proactively delivers messages to the endpoints subscribed to it. This can be used to alert a system or a person.
- Delivery retries and reliable delivery.
- Cross-account access is granted via a TOPIC POLICY.
- Supports cross region replication.
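The push-based fan-out above can be sketched in a few lines: a topic holds a list of subscriber endpoints and proactively delivers each published message to all of them (a conceptual toy, not the SNS API — real code would call boto3's `publish` on a topic ARN):

```python
class MiniTopic:
    """Toy pub/sub topic: publishing pushes the message to every
    subscriber, mirroring SNS fan-out to SQS, Lambda, email, etc."""

    def __init__(self):
        self.subscribers = []  # callables standing in for endpoints

    def subscribe(self, deliver):
        self.subscribers.append(deliver)

    def publish(self, message):
        # Push-based: the topic delivers; subscribers never poll.
        for deliver in self.subscribers:
            deliver(message)
```

A common pattern is subscribing several SQS queues to one topic ("fan-out"), so each queue's consumers get their own copy of every message.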
What is API Gateway?
API Gateway is a fully managed service that makes it easy for developers to
create, publish, maintain, monitor, and secure APIs at any scale.
What are the SQS Settings?
- Delivery Delay: Default is 0; can be set up to 15 minutes.
- Message Size: Messages can be up to 256 KB of text in any format.
- Encryption: Messages are encrypted in transit by default, but you can add at-rest.
- Message Retention: Default is 4 days; can be set between 1 minute and 14 days.
- Long vs. Short: Long polling isn’t the default, but it should be.
- Queue Depth: This can be a trigger for autoscaling.
What is the difference between long and short polling in SQS?
- Short polling returns a response immediately, even if the queue is empty. This means that if there are no messages in the queue, the consumer will receive an empty response and will need to poll again.
- Long polling waits (up to a maximum of 20 seconds) for a message to arrive in the queue before returning a response. The consumer avoids empty responses unless the wait times out, which reduces the number of API calls.
With batching, 1 request = 1-10 messages; for billing, each 64 KB chunk of a payload counts as one request.
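The polling choice comes down to one parameter on the ReceiveMessage call. A small sketch of the request parameters you would pass (parameter names match the real SQS API; the queue URL is a placeholder):

```python
def receive_params(queue_url, long_poll=True):
    """Build ReceiveMessage parameters. WaitTimeSeconds > 0 enables
    long polling; 20 is the maximum. 0 means short polling, which
    returns immediately even when the queue is empty."""
    return {
        "QueueUrl": queue_url,
        "MaxNumberOfMessages": 10,  # batching: up to 10 messages per request
        "WaitTimeSeconds": 20 if long_poll else 0,
    }
```

Long polling can also be set queue-wide via the `ReceiveMessageWaitTimeSeconds` queue attribute instead of per call.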
What is a Dead Letter Queue?
- A dead letter queue (DLQ) is used to hold messages that were not successfully processed. These messages might have failed due to errors, invalid data, or other issues. The purpose of a dead letter queue is to provide a way to review and troubleshoot these problematic messages.
- When ReceiveCount > maxReceiveCount and the message has not been deleted, it is moved to a DLQ.
- The retention period of the DLQ should be longer than that of the source queue, because the enqueue timestamp is unchanged when a message enters a DLQ (it keeps the old timestamp).
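The DLQ hookup is configured on the *source* queue as a RedrivePolicy attribute. A minimal sketch building that attribute value (field names match the real SQS redrive policy; the ARN is a placeholder):

```python
import json

def redrive_policy(dlq_arn, max_receive_count=5):
    """Build the JSON string for the SQS RedrivePolicy queue attribute:
    after a message has been received max_receive_count times without
    being deleted, SQS moves it to the dead letter queue."""
    return json.dumps({
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    })
```

You would pass this string as the `RedrivePolicy` entry in the source queue's attributes; the DLQ itself is just an ordinary queue.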
What are FIFO Queues?
- FIFO queues do not match the throughput of standard queues (roughly 300 messages per second per API action, or 3,000 with batching, without high-throughput mode).
- You can order messages with SQS standard, but it’s on you to do it.
- A message group ID ensures messages within the same group are processed in order, one at a time.
- It costs more since AWS must spend computing power to deduplicate messages.
- Exactly-once processing for each message in the queue.
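Sending to a FIFO queue requires a MessageGroupId and a deduplication ID. A sketch of the message parameters (names match the real SendMessage API for FIFO queues; hashing the body here mimics content-based deduplication):

```python
import hashlib

def fifo_message(body, group_id, dedup_id=None):
    """Build SendMessage parameters for a FIFO queue. Messages sharing
    a MessageGroupId are delivered in order, one at a time. The
    deduplication ID (here, a content hash) lets SQS drop duplicate
    sends within its 5-minute deduplication window."""
    return {
        "MessageBody": body,
        "MessageGroupId": group_id,
        "MessageDeduplicationId": dedup_id
        or hashlib.sha256(body.encode()).hexdigest(),
    }
```

Two identical sends produce the same deduplication ID, so the second is discarded — this is the extra computing work the card above says you pay for.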
What topic types are supported in SNS?
FIFO or Standard:
- FIFO topics only support SQS queues as subscribers.
- Standard topics support: Kinesis Data Firehose, SQS, Lambda, email, HTTP(S), SMS, and platform application endpoints.
- There is also DLQ Support.
What are API Gateway's features?
- Security: This service allows you to easily protect your endpoints by attaching a web application firewall (WAF).
- Stop Abuse: Users can easily implement DDoS protection and rate limiting to curb abuse of their endpoints.
- Ease of Use: API Gateway is simple to get started with. Easily build out the calls that will kick off other AWS services in your account.
What are the components of AWS Batch?
- Jobs: Units of work that are submitted to AWS Batch (e.g., shell scripts, executables, and Docker images).
- Job Definitions: Specify how your jobs are to be run (essentially, the blueprint for the resources in the job).
- Job Queues: Jobs get submitted to specific queues and reside there until scheduled to run in a compute environment.
- Compute Environments: Sets of managed or unmanaged compute resources used to run your jobs.
Fargate or EC2 Compute Environments for AWS Batch?
Fargate is the recommended way of launching most batch jobs.
Sometimes, EC2 is the best choice!
- Custom AMIs can only be run via EC2.
- Anything needing more than four vCPUs needs to use EC2.
- EC2 is recommended for anything needing more than 30 GiB of memory.
- If your jobs require a GPU, they must run on EC2!
- Arm-based Graviton CPUs can only be leveraged via EC2 for AWS Batch.
- When using linuxParameters, you must run on EC2 compute.
- For a large number of jobs, it's best to run on EC2. Jobs are dispatched at a higher rate (more concurrency) than on Fargate!
AWS Batch or AWS Lambda?
- AWS Lambda currently has a 15-minute execution time limit. Batch does not have this.
- AWS Lambda has limited disk space, and EFS requires functions live within a VPC.
- Lambda is fully serverless, but it has natively limited runtimes! Batch uses Docker, so any runtime can be used.
What is Amazon MQ?
- Message broker service allowing easier migration of existing applications to the AWS Cloud.
- Leverages multiple programming languages, operating systems, and messaging protocols.
- Currently supports both Apache ActiveMQ and RabbitMQ engine types.
- Allows you to easily leverage existing apps without managing and maintaining your own system.
SNS with SQS vs. Amazon MQ
- Each offers architectures with topics and queues, allowing for one-to-one or one-to-many messaging designs.
- If migrating existing applications with messaging systems in place, you likely want to consider Amazon MQ.
- If creating new applications, look at SNS and SQS: simpler to use, highly scalable, and with simple APIs. A good fit for most new use cases!
- Amazon MQ REQUIRES private networking like VPC, Direct Connect, or VPN. SNS and SQS are publicly accessible by default.
What are Step Functions?
- Comes with a graphical console for easier application workflow views and flows.
- Main components are state machines and tasks.
- Specific states within a workflow (state machine) representing a single unit of work
- Every single step within a workflow is considered a state
- Standard workflows (the default) have a maximum duration of 1 year. Express workflows are for high-event-rate workloads, with a 5-minute maximum duration.
What are the 2 types of workflow that AWS Step Functions support?
Each workflow has executions.
Executions are instances where you run your workflows in order to perform your tasks.
STANDARD
- Have an exactly-once execution
- Can run for up to one year
- Useful for long-running workflows that need to have an auditable history
- Rates up to 2,000 executions per second
- Pricing based per state transition
EXPRESS
- At-least-once workflow execution
- Can run for up to five minutes
- Useful for high-event-rate workloads. Example uses are IoT data streaming and ingestion
- Pricing based on number of executions, duration, and memory consumed
- Think about an online pickup order: each step in that workflow is considered a state.
What are the different states of step functions?
- Pass: Passes any input directly to its output — no work done
- Task: Single unit of work performed (e.g., Lambda, Batch, and SNS)
- Choice: Adds branching logic to state machines
- Wait: Creates a specified time delay within the state machine
- Succeed: Stops executions successfully
- Fail: Stops executions and marks them as failures
- Parallel: Runs parallel branches of executions within state machines
- Map: Runs a set of steps based on elements of an input array
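The states above compose into an Amazon States Language definition. A minimal sketch as a Python dict using Choice, Pass, Succeed, and Fail states (the state names and fields here are illustrative; a real workflow would use a Task state pointing at an actual Lambda ARN):

```python
import json

# Toy pickup-order workflow: branch on whether the order is paid,
# then either prepare it (Pass stands in for a real Task) or fail.
definition = {
    "StartAt": "CheckOrder",
    "States": {
        "CheckOrder": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.paid", "BooleanEquals": True,
                 "Next": "PrepareOrder"}
            ],
            "Default": "OrderFailed",
        },
        "PrepareOrder": {
            "Type": "Pass",           # placeholder for a Task state
            "Result": {"status": "ready"},
            "Next": "Done",
        },
        "Done": {"Type": "Succeed"},  # stops execution successfully
        "OrderFailed": {"Type": "Fail", "Error": "PaymentError"},
    },
}

# The JSON form is what you'd pass to CreateStateMachine.
asl_json = json.dumps(definition)
```

Each key under `"States"` is one state; `"Next"` wires them into the workflow graph the console renders graphically.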
What Is AppFlow?
- Fully managed integration service for exchanging data between SaaS apps and AWS services
- Pulls data records from third-party SaaS vendors and stores them in Amazon S3
- Bi-directional data transfers with limited combinations
- Can run on demand, on event, or on a schedule.
What Is Redshift?
Redshift is a fully managed, petabyte-scale data warehouse service in the cloud. It’s a very large relational database traditionally used in big data applications.
- It can hold up to 16 PB of data.
What is EMR?
EMR (Elastic Map Reduce) is a managed big data platform that allows you to process vast amounts of data using open-source tools, such as Spark, Hive, HBase, Flink, Hudi, and Presto.
- It is AWS’s ETL tool.
- It’s an Open-Source Cluster. EMR is a managed fleet of EC2 instances running open-source tools.
What is ETL?
Extract, Transform, Load: pull data from a source, reshape it, and load it into a destination store.
What Is Kinesis Data Streams?
Kinesis Data Streams allows you to ingest, process, and analyze real-time streaming data (records are available within ~200 ms). You can think of it as a huge data highway connected to your AWS account. Great for analytics and dashboards.
- Streams store a 24-hour moving window of data that can be increased to a maximum of 365 days at an additional cost.
- Supports multiple producers and consumers (you must configure the consumers). Consumers can access the data in different ways from the moving window (per second, per hour, etc.)
- To improve the performance of a Kinesis stream, the number of shards needs to be changed (resharding).
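Shards matter because Kinesis routes each record to a shard by MD5-hashing its partition key into the 128-bit key space, which is split across shards. A simplified sketch of that routing (an even key-space split is assumed; real streams can have uneven shard ranges after resharding):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Map a partition key to a shard index the way Kinesis does
    conceptually: MD5-hash the key into the 128-bit space, then see
    which shard's hash-key range it lands in (even split assumed)."""
    h = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    shard_size = 2 ** 128 // num_shards
    return min(h // shard_size, num_shards - 1)
```

The same key always lands on the same shard (preserving per-key ordering), which is why adding shards — resharding — is how you scale a hot stream.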
What Is Kinesis Data Firehose?
- Data transfer tool to get Kinesis Data Streams data into S3 (or data directly from producers), into Redshift (uses S3 as an intermediate), Elasticsearch, or Splunk (or HTTP endpoints, meaning third-party applications).
- Offers persistence beyond the moving window of Data Streams.
- Always near real-time (within 60 seconds), even when the producers are directly connected to Firehose (and not to a stream).
- Supports transformation of the data on the fly (via Lambda).
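That on-the-fly transformation is a Lambda function that receives base64-encoded records and must return each one with a `recordId`, a `result`, and re-encoded `data`. A sketch of the per-record logic (the uppercase transform is just an example; the three-field record shape matches the real Firehose transformation contract):

```python
import base64

def transform_record(record):
    """Transform one Firehose record: decode the payload, modify it
    (here: uppercase it), and return it in the shape Firehose expects —
    same recordId, a result of "Ok", and base64-encoded data."""
    payload = base64.b64decode(record["data"]).decode("utf-8")
    transformed = payload.upper()  # stand-in for real enrichment logic
    return {
        "recordId": record["recordId"],
        "result": "Ok",  # or "Dropped" / "ProcessingFailed"
        "data": base64.b64encode(transformed.encode("utf-8")).decode("ascii"),
    }
```

A real handler would map this function over `event["records"]` and return `{"records": [...]}`.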
What Is Athena?
Athena is an interactive query service that makes it easy to analyze data in S3 using SQL. This allows you to directly query data in your S3 bucket without loading it into a database (schema on read).
- It supports all AWS logs (CloudTrail, VPC Flow Logs, ELB logs, cost reports, etc.)
- AWS Glue Data Catalog & web server logs
- Athena Federated Query also supports sources other than S3 (a newer feature that uses Lambda)
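Running a query boils down to one StartQueryExecution call. A sketch of the parameters you would pass (parameter names match the real Athena API; the database, table, and bucket names are placeholders):

```python
def athena_query_params(database, output_s3):
    """Build StartQueryExecution parameters: the SQL runs directly
    against files in S3 (schema on read — no load step), and results
    are written to the given S3 output location."""
    sql = "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status"
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }
```

With boto3 you would pass this dict to the Athena client's `start_query_execution`, then poll `get_query_execution` until the query finishes.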