Messaging and Kinesis Flashcards
What is SQS?
Simple Queue Service
Q: How is Amazon SQS different from Amazon Simple Notification Service (SNS)?
Amazon SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism, eliminating the need to check periodically or “poll” for updates. Amazon SQS is a message queue service used by distributed applications to exchange messages through a polling model and can be used to decouple sending and receiving components.
Does Amazon SQS provide message ordering?
Yes. FIFO (first-in-first-out) queues preserve the exact order in which messages are sent and received. If you use a FIFO queue, you don’t have to place sequencing information in your messages. For more information, see FIFO Queue Logic in the Amazon SQS Developer Guide.
Standard queues provide a loose-FIFO capability that attempts to preserve the order of messages. However, because standard queues are designed to be massively scalable using a highly distributed architecture, receiving messages in the exact order they are sent is not guaranteed.
Does Amazon SQS guarantee delivery of messages?
Standard queues provide at-least-once delivery, which means that each message is delivered at least once.
FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it. Duplicates are not introduced into the queue.
What is a visibility timeout?
The visibility timeout is a period of time during which Amazon SQS prevents other consuming components from receiving and processing a message.
When a message is polled by a consumer, it becomes invisible to other consumers. For example, if the visibility timeout is 30 seconds, the message will not be visible to other consumers for 30 seconds after it is polled. After 30 seconds, the message becomes visible again for other consumers to process.
If a consumer needs more time to process the message, it can call the ChangeMessageVisibility API. This tells SQS to extend the visibility timeout and keep the message hidden from other consumers.
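The visibility-timeout behavior above can be sketched with a tiny in-memory queue (a toy model, not the real SQS API; class and method names are made up for illustration):

```python
import time

class TinyQueue:
    """Toy in-memory queue illustrating SQS-style visibility timeouts."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0.0)

    def receive(self, now=None):
        now = time.time() if now is None else now
        for msg_id, (body, invisible_until) in self.messages.items():
            if now >= invisible_until:
                # Hide the message from other consumers for the timeout period.
                self.messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None  # nothing visible right now

    def change_visibility(self, msg_id, extra_seconds, now=None):
        # Analogue of ChangeMessageVisibility: ask for more processing time.
        now = time.time() if now is None else now
        body, _ = self.messages[msg_id]
        self.messages[msg_id] = (body, now + extra_seconds)

    def delete(self, msg_id):
        # A consumer deletes the message after successful processing.
        self.messages.pop(msg_id, None)
```

Receiving a message hides it; if the consumer neither deletes it nor extends visibility, it reappears after the timeout, which is how SQS achieves at-least-once delivery.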
How can SQS scale?
Consumers polling messages from the queue can be deployed on EC2 instances as part of an Auto Scaling group (ASG). CloudWatch can be configured to trigger an alarm if the queue length reaches a certain threshold; the ApproximateNumberOfMessagesVisible metric can be used for this purpose. The alarm can then trigger the Auto Scaling group to add more EC2 instances.
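A sketch of the alarm described above, as a helper that builds CloudWatch PutMetricAlarm parameters (the queue name, policy ARN, and threshold are hypothetical; real use would pass the result to `boto3.client("cloudwatch").put_metric_alarm(**params)`):

```python
def backlog_alarm_params(queue_name, scaling_policy_arn, threshold=1000):
    """Build PutMetricAlarm parameters that fire when the SQS backlog
    grows past a threshold, triggering an ASG scale-out policy."""
    return {
        "AlarmName": f"{queue_name}-backlog-high",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 60,                 # evaluate the metric every minute
        "EvaluationPeriods": 2,       # require 2 consecutive breaches
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [scaling_policy_arn],  # scale-out policy to invoke
    }
```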
How can SQS help in decoupling different application tiers?
Take a video processing application as an example. When a user submits a request to process a video, the request can be captured as an SQS message. The back-end processing application can then process the message when it has resources available. This is how application tiers can be decoupled.
What is a long poll?
The consumer can be configured to wait for a message to arrive if there is none in the queue. This decreases the number of API calls made to SQS while increasing efficiency and reducing latency for the application.
The wait time can be between 1 and 20 seconds.
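A minimal long-polling sketch using the boto3 `receive_message` call (the helper name and wait-time validation are our own; `WaitTimeSeconds` and `MaxNumberOfMessages` are real SQS parameters):

```python
def long_poll(sqs_client, queue_url, wait_seconds=20, max_messages=10):
    """Receive messages using long polling; per the card above,
    the long-poll wait time is 1-20 seconds."""
    if not 1 <= wait_seconds <= 20:
        raise ValueError("long-poll wait time must be 1-20 seconds")
    resp = sqs_client.receive_message(
        QueueUrl=queue_url,
        WaitTimeSeconds=wait_seconds,      # block until a message arrives or timeout
        MaxNumberOfMessages=max_messages,  # up to 10 messages per call
    )
    return resp.get("Messages", [])
```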
What is an SQS FIFO queue?
A FIFO queue ensures the ordering of messages in the queue: messages are delivered to the consumer in the order they were received. Throughput is limited to 300 messages/s without batching and 3,000 messages/s with batching. FIFO queues also implement exactly-once send capability (deduplication) for messages.
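A sketch of the SendMessage parameters for a FIFO queue (the helper is ours; `MessageGroupId` and `MessageDeduplicationId` are real FIFO parameters, and here we derive the deduplication ID from a content hash, assuming content-based deduplication fits the use case):

```python
import hashlib

def fifo_send_params(queue_url, body, group_id):
    """Parameters for SendMessage to a FIFO queue. MessageGroupId orders
    messages within a group; MessageDeduplicationId enables exactly-once
    send by rejecting duplicates within the dedup window."""
    return {
        "QueueUrl": queue_url,  # FIFO queue URLs end in .fifo
        "MessageBody": body,
        "MessageGroupId": group_id,
        "MessageDeduplicationId": hashlib.sha256(body.encode()).hexdigest(),
    }
```

Two sends of the same body produce the same deduplication ID, so SQS drops the second one within the deduplication interval.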
How can SQS be used as a buffer for database writes?
If an application writes transactions directly to a database, the database has to be available; if something goes wrong in the database, the transaction may fail. SQS can be used as middleware: the application sends requests as messages to SQS (which is infinitely scalable), and a consumer polls the messages and inserts them into the database. This ensures that all requests or transactions eventually get written to the database.
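The buffering pattern can be sketched in a few lines (a toy simulation, with a plain deque standing in for the SQS queue and `write_to_db` as a hypothetical callback that returns False when the database is unavailable):

```python
from collections import deque

def buffered_writes(requests, write_to_db):
    """Sketch of the SQS-buffer pattern: enqueue every write request,
    then a consumer polls and retries until the database accepts each one."""
    queue = deque(requests)      # stands in for the SQS queue
    written = []
    while queue:
        msg = queue.popleft()    # consumer polls a message
        if write_to_db(msg):
            written.append(msg)  # success: "delete" the message
        else:
            queue.append(msg)    # failure: message stays for a later retry
    return written
```

With real SQS, a failed write simply means the consumer does not delete the message, so it reappears after the visibility timeout.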
What is the Amazon SNS service?
SNS is a pub/sub service in which the event producer sends messages to one SNS topic. Many subscribers can listen to those messages by subscribing to the SNS topic, and each subscriber receives all the messages. There can be up to 12.5 million subscriptions per topic.
What kind of subscribers can there be for SNS?
Subscribers can be email addresses, SMS, mobile push notifications, HTTP(S) endpoints, SQS queues, and Lambda functions.
What is SNS Fanout pattern?
The Fanout scenario is where a message published to an SNS topic is replicated and pushed to multiple endpoints, such as Kinesis Data Firehose delivery streams, Amazon SQS queues, HTTP(S) endpoints, and Lambda functions. This allows for parallel asynchronous processing.
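The fanout pattern can be illustrated with a toy topic that pushes every published message to all of its subscribers (plain Python lists stand in for the SQS queues; none of this is the SNS API):

```python
class Topic:
    """Toy SNS-style topic: every published message is pushed to all endpoints."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        for handler in self.subscribers:  # fan out: each subscriber gets a copy
            handler(message)

# Two "SQS queues" (plain lists here) subscribed to one topic.
orders, analytics = [], []
topic = Topic()
topic.subscribe(orders.append)
topic.subscribe(analytics.append)
topic.publish({"order_id": 1})
```

Each queue receives its own copy of the message and can process it independently and in parallel.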
How does SNS message filtering work?
A JSON filter policy can be used to filter the messages sent to SNS topic subscriptions. If a subscription does not have a filter policy, it receives every message. For example, if an order has the state "new", it can be sent to the SQS queue for new orders; if the order is cancelled, it can be sent to the SQS queue for cancelled orders.
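A simplified matcher for the exact-match case of filter policies (real SNS policies also support prefix, numeric-range, and anything-but operators, which this sketch omits):

```python
def matches_filter(policy, attributes):
    """Simplified SNS filter-policy check: every policy key must appear
    in the message attributes with a value from the allowed list."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())

# Example policies for the order-routing scenario above.
new_orders_policy = {"state": ["new"]}
cancelled_orders_policy = {"state": ["cancelled"]}
```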
What is Kinesis Data Streams?
Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale. It is used to stream big data into your system.
A Kinesis data stream is made up of multiple numbered shards (shard 1, shard 2, etc.). Shards need to be provisioned ahead of time, and the incoming data is split across all the shards. The number of shards defines the ingestion and consumption rate of the stream.
Producers send data into the Kinesis data stream, and there can be many consumers. Producers can be built using the AWS SDK or the Kinesis Producer Library (KPL).
The producer creates a record in the Kinesis data stream. A record is made up of two things: the partition key and the data blob (up to 1 MB in size). The partition key determines which shard the record goes to.
How does a consumer receive data from the Kinesis data stream?
The consumer can be an application, a Lambda function, Kinesis Data Firehose, or Kinesis Data Analytics. The consumer receives records from the Kinesis data stream; each record consists of a partition key, a sequence number (which represents where the record was in the shard), and the data. There are different consumption modes for Kinesis Data Streams: 2 MB/s of throughput per shard shared across all consumers, or 2 MB/s per shard per consumer if you enable the enhanced consumer mode (enhanced fan-out).
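The two consumption modes boil down to simple arithmetic, sketched here (the function name is ours):

```python
def per_consumer_throughput_mbps(num_consumers, enhanced_fan_out=False):
    """Per-consumer read throughput for one shard: 2 MB/s shared across
    all consumers in classic mode, or a dedicated 2 MB/s per consumer
    with enhanced fan-out."""
    return 2.0 if enhanced_fan_out else 2.0 / num_consumers
```

So with three classic consumers each one effectively gets about 0.67 MB/s per shard, while with enhanced fan-out each gets the full 2 MB/s.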
What are the key properties of a Kinesis data stream?
The retention period is between 1 and 365 days, which provides the ability to reprocess (replay) data. Once data is inserted into Kinesis, it cannot be deleted. Data with the same partition key goes to the same shard.
Producers: the AWS SDK or the Kinesis Producer Library (KPL).
Consumers: you can write your own consumer using the Kinesis Client Library (KCL) or the AWS SDK, or use managed consumers like AWS Lambda, Kinesis Data Firehose, or Kinesis Data Analytics.
What are the different capacity modes of a Kinesis data stream?
Provisioned mode:
1. You choose the number of shards provisioned and scale them manually or via API.
2. Each shard gets 1 MB/s in (or 1,000 records per second) and 2 MB/s out (shared across consumers, or 2 MB/s per consumer with enhanced fan-out). You pay per shard provisioned per hour.
On-demand mode: no need to provision capacity. The default capacity is 4 MB/s in (or 4,000 records per second). It scales automatically based on the observed throughput peak during the last 30 days. You pay per stream per hour plus data in/out per GB.
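Provisioned-mode sizing follows directly from the per-shard limits above; a small sketch (the function name is ours):

```python
import math

def shards_needed(in_mb_per_s, records_per_s):
    """Each shard ingests 1 MB/s or 1,000 records/s, so provision enough
    shards to cover the larger of the two ratios."""
    return max(math.ceil(in_mb_per_s / 1.0), math.ceil(records_per_s / 1000.0))
```

For example, 5 MB/s at 2,000 records/s needs 5 shards (bandwidth-bound), while 1 MB/s at 4,000 records/s needs 4 shards (record-rate-bound).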
What is Kinesis Data Firehose?
Amazon Kinesis Data Firehose is an extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services. Firehose receives records up to 1 MB in size, can transform the data with Lambda functions, and writes it to multiple destinations.
It is a fully managed, serverless service with no administration and automatic scaling. You pay only for the data that goes through Firehose.
It can send failed data, or all data, to a backup S3 bucket.
What are the AWS destinations for Kinesis Data Firehose?
Amazon S3, Amazon Redshift, and Amazon OpenSearch Service (formerly Elasticsearch). For Redshift, Firehose first writes the data to S3 and then issues a COPY command to load the data into Redshift.
What are the other Kinesis Data Firehose destinations?
3rd party partner destinations, AWS destinations, and custom destinations.
What's the difference between Kinesis Data Streams and Kinesis Data Firehose?
Kinesis Data Streams is a streaming service for ingestion at scale. You can write custom code for producers and consumers. It is real-time (~200 ms latency). You manage the scaling yourself (shard splitting/merging). Data is stored for 1 to 365 days, which supports replay capability.
Kinesis Data Firehose loads streaming data into S3, Redshift, third-party partner tools, or custom HTTP endpoints. It can also transform data using Lambda. It is fully managed and near real-time (look for this phrase in the exam), scales automatically, has no data storage, and doesn't support replay.
How are partition keys used in a Kinesis data stream?
Partition keys are used to decide which shard a record goes to. The data record contains the partition key, which is used to route the record to the appropriate shard.
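Kinesis routes records by MD5-hashing the partition key to a 128-bit integer and mapping it onto shard hash-key ranges. A simplified sketch assuming equally sized ranges (real streams can have unequal ranges after shard splits/merges):

```python
import hashlib

def shard_for_key(partition_key, num_shards):
    """Map a partition key onto one of num_shards equally sized hash-key
    ranges. Records with the same key always land on the same shard."""
    h = int.from_bytes(hashlib.md5(partition_key.encode()).digest(), "big")
    range_size = 2 ** 128 // num_shards
    return min(h // range_size, num_shards - 1)
```

This is why a low-cardinality partition key (e.g. one device ID dominating the traffic) creates a "hot shard": all of its records hash to the same shard.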