Messaging and Kinesis Flashcards

1
Q

What is SQS?

A

Amazon Simple Queue Service (SQS), a fully managed message queue service.

2
Q

How is Amazon SQS different from Amazon Simple Notification Service (SNS)?

A

Amazon SNS allows applications to send time-critical messages to multiple subscribers through a “push” mechanism, eliminating the need to check periodically or “poll” for updates. Amazon SQS is a message queue service used by distributed applications to exchange messages through a polling model and can be used to decouple sending and receiving components.

3
Q

Does Amazon SQS provide message ordering?

A

Yes. FIFO (first-in-first-out) queues preserve the exact order in which messages are sent and received. If you use a FIFO queue, you don’t have to place sequencing information in your messages. For more information, see FIFO Queue Logic in the Amazon SQS Developer Guide.

Standard queues provide a loose-FIFO capability that attempts to preserve the order of messages. However, because standard queues are designed to be massively scalable using a highly distributed architecture, receiving messages in the exact order they are sent is not guaranteed.

4
Q

Does Amazon SQS guarantee delivery of messages?

A

Standard queues provide at-least-once delivery, which means that each message is delivered at least once.

FIFO queues provide exactly-once processing, which means that each message is delivered once and remains available until a consumer processes it and deletes it. Duplicates are not introduced into the queue.

5
Q

What is a visibility timeout?

A

The visibility timeout is a period of time during which Amazon SQS prevents other consuming components from receiving and processing a message.

When a consumer polls a message, the message becomes invisible to other consumers. For example, if the visibility timeout is 30 seconds, the message is hidden from other consumers for 30 seconds after it is received. If it has not been deleted by then, it becomes visible again and another consumer can process it.
If a consumer needs more time to process the message, it can call the ChangeMessageVisibility API. This tells SQS to extend the visibility timeout so that the message stays hidden from other consumers.
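
A minimal sketch of a consumer that asks for more time before doing slow work. It assumes a boto3 SQS client is passed in as `sqs`; the function name `process_with_extension`, the `handler` callback, and the timeout values are illustrative and not part of the original card.

```python
def process_with_extension(sqs, queue_url, handler, extra_seconds=60):
    """Receive one message, extend its visibility timeout, process it, then delete it.

    `sqs` is assumed to be a boto3 SQS client, e.g. boto3.client("sqs").
    """
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        VisibilityTimeout=30,  # hidden from other consumers for 30 seconds
    )
    messages = resp.get("Messages", [])
    if not messages:
        return None
    msg = messages[0]
    # Ask SQS to keep the message invisible for longer before the slow work starts.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=extra_seconds,
    )
    result = handler(msg["Body"])
    # Deleting the message is what stops it from reappearing after the timeout.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
    return result
```

If the handler raises, the message is not deleted and becomes visible again once the extended timeout expires, so another consumer can retry it.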

6
Q

How can SQS consumers scale?

A

Consumers that poll messages from the queue can run on EC2 instances in an Auto Scaling group (ASG). CloudWatch can be configured to trigger an alarm when the queue length reaches a certain threshold. The CloudWatch metric ApproximateNumberOfMessagesVisible can be used for this purpose (ApproximateNumberOfMessages is the corresponding queue attribute). The alarm can trigger the Auto Scaling group to add more EC2 instances.
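
As a rough illustration, the alarm described above could be built from parameters like these. The CloudWatch metric for the visible backlog is named `ApproximateNumberOfMessagesVisible` in the `AWS/SQS` namespace. The helper name, the alarm name, the period, and the ARN in the usage line are placeholders chosen for this sketch.

```python
def queue_depth_alarm_kwargs(queue_name, threshold, scale_out_policy_arn):
    """Build keyword arguments for CloudWatch put_metric_alarm on SQS queue depth."""
    return {
        "AlarmName": f"{queue_name}-backlog-high",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Average",
        "Period": 60,            # seconds per evaluation window (assumed value)
        "EvaluationPeriods": 2,  # require two consecutive breaches (assumed value)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # The scaling policy that adds EC2 instances to the Auto Scaling group.
        "AlarmActions": [scale_out_policy_arn],
    }

# Usage, assuming boto3 is installed and credentials are configured:
#   boto3.client("cloudwatch").put_metric_alarm(**kwargs)
kwargs = queue_depth_alarm_kwargs("orders", 1000, "arn:aws:autoscaling:placeholder")
print(kwargs["MetricName"], kwargs["Threshold"])
```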

7
Q

How can SQS help decouple application tiers?

A

Take a video processing application as an example. When a user submits a request to process a video, the request is placed on the queue as an SQS message. The back-end processing application picks up the message when it has resources available. The front end and back end never call each other directly, which is how the application tiers are decoupled.

8
Q

What is long polling?

A

With long polling, a consumer waits for a message to arrive if there is none in the queue. This decreases the number of API calls made to SQS while improving the efficiency and reducing the latency of the application.
The wait time can be between 1 and 20 seconds.
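
A short sketch of a long-polling receive, assuming `sqs` is a boto3 SQS client. The wrapper name `long_poll` and its defaults are illustrative; `WaitTimeSeconds` is the request parameter that turns on long polling.

```python
def long_poll(sqs, queue_url, wait_seconds=20, max_messages=10):
    """Wait up to `wait_seconds` for messages instead of returning immediately."""
    if not 1 <= wait_seconds <= 20:
        raise ValueError("long-poll wait time must be between 1 and 20 seconds")
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,  # SQS returns at most 10 per call
        WaitTimeSeconds=wait_seconds,      # 0 would mean short polling
    )
    # The "Messages" key is absent when the wait expires with an empty queue.
    return resp.get("Messages", [])
```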

9
Q

What is an SQS FIFO queue?

A

A FIFO queue preserves the ordering of messages in the queue. Messages are delivered to the consumer in the order in which they were sent. Throughput is limited to 300 messages per second without batching and 3,000 messages per second with batching. FIFO queues also provide an exactly-once send capability by removing duplicates.
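
A sketch of the parameters a producer might pass to `send_message` on a FIFO queue. The helper name and the choice to derive the deduplication ID from a SHA-256 hash of the body are assumptions for illustration; a queue with content-based deduplication enabled computes a similar hash on its own.

```python
import hashlib

def fifo_send_kwargs(queue_url, body, group_id):
    """Build send_message arguments for an SQS FIFO queue."""
    if not queue_url.endswith(".fifo"):
        raise ValueError("FIFO queue names must end with .fifo")
    return {
        "QueueUrl": queue_url,
        "MessageBody": body,
        # Messages sharing a group ID are delivered in the order they were sent.
        "MessageGroupId": group_id,
        # A repeated ID within the deduplication window is treated as a duplicate.
        "MessageDeduplicationId": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }

# Usage, assuming a boto3 client: boto3.client("sqs").send_message(**kwargs)
kwargs = fifo_send_kwargs("https://example.invalid/123/orders.fifo", "order-1", "customer-42")
print(kwargs["MessageGroupId"])
```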

10
Q

How can SQS be used as a buffer for database writes?

A

If an application writes transactions directly to a database, the database has to be available for every transaction. If something goes wrong in the database, the transaction may fail. SQS can be placed in the middle as a buffer. The application sends each request to SQS as a message (SQS scales virtually without limit), and a consumer polls the messages and inserts them into the database, deleting each message only after the write succeeds. A message whose write fails stays in the queue and is retried, so no request is lost while the database is unavailable.
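
The buffering idea can be sketched locally without AWS. In this toy version a `deque` stands in for the SQS queue and an in-memory SQLite database stands in for the real database; the `orders` table and the function name are made up for the example.

```python
import sqlite3
from collections import deque

def drain_to_database(buffer, conn):
    """Insert buffered messages; a message stays queued if its write fails."""
    written = 0
    while buffer:
        body = buffer[0]  # peek: the message is not removed yet
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute("INSERT INTO orders (payload) VALUES (?)", (body,))
        except sqlite3.Error:
            break  # database unavailable: stop and retry later, nothing is lost
        buffer.popleft()  # analogous to deleting the SQS message after success
        written += 1
    return written

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (payload TEXT)")
buffer = deque(["order-1", "order-2", "order-3"])  # stands in for the SQS queue
print(drain_to_database(buffer, conn))  # 3
```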

12
Q

What is Amazon SNS?

A

Amazon Simple Notification Service (SNS) is a pub/sub service in which the event producer sends each message to a single SNS topic. Many subscribers can listen for those messages by subscribing to the topic. Each subscriber receives all the messages. A topic can have up to 12,500,000 subscriptions.

13
Q

What kinds of subscribers can an SNS topic have?

A

Subscribers can be email addresses, SMS, mobile push notifications, and HTTP(S) endpoints, as well as AWS services such as SQS queues and Lambda functions.

14
Q

What is SNS Fanout pattern?

A

The Fanout scenario is where a message published to an SNS topic is replicated and pushed to multiple endpoints, such as Kinesis Data Firehose delivery streams, Amazon SQS queues, HTTP(S) endpoints, and Lambda functions. This allows for parallel asynchronous processing.

15
Q

How does SNS message filtering work?

A

A JSON filter policy can be attached to a subscription to filter the messages delivered to it. If a subscription does not have a filter policy, it receives every message. For example, if an order has the state “new”, it can be sent to the SQS queue for new orders. If the order is canceled, it can be sent to the SQS queue for canceled orders.
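
To make the order example concrete, here is a filter policy for the "new orders" subscription together with a deliberately simplified local matcher. The matcher handles only exact string matching on message attributes, which is a small subset of what SNS supports; the attribute name `state` and the function name `matches` are assumptions for this sketch.

```python
import json

# Policy for the subscription that feeds the "new orders" SQS queue.
filter_policy = {"state": ["new"]}

def matches(policy, attributes):
    """Simplified check: every policy key must be present in the message
    attributes with a value that is one of the allowed strings."""
    return all(attributes.get(key) in allowed for key, allowed in policy.items())

# The JSON string is what would be stored as the subscription's FilterPolicy attribute.
print(json.dumps(filter_policy))
print(matches(filter_policy, {"state": "new"}))       # True
print(matches(filter_policy, {"state": "canceled"}))  # False
```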

16
Q

What is a Kinesis data stream?

A

Amazon Kinesis Data Streams is a serverless streaming data service that makes it easy to capture, process, and store data streams at any scale. It is used to stream big data into your system.
A Kinesis data stream is made up of multiple numbered shards (shard 1, shard 2, and so on). In provisioned mode, shards need to be provisioned ahead of time. Incoming data is split across all the shards. The number of shards determines the ingestion and consumption rate of the stream.
Producers send data into the Kinesis data stream, and there can be many consumers. A producer can be built with the AWS SDK or the Kinesis Producer Library (KPL).
The producer creates a record in the Kinesis data stream. A record is made up of two things: the partition key and the data blob (up to 1 MB in size). The partition key determines which shard the record goes to.

17
Q

How does a consumer receive data from a Kinesis data stream?

A

The consumer can be an application, a Lambda function, Kinesis Data Firehose, or Kinesis Data Analytics. The consumer receives records from the Kinesis data stream, and each record consists of a partition key, a sequence number (which represents the position of the record in the shard), and the data. There are two consumption modes for Kinesis Data Streams: in shared mode, all consumers share 2 MB per second of throughput per shard; with enhanced fan-out, each consumer gets 2 MB per second per shard.
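
The two consumption modes come down to simple arithmetic. The sketch below uses the 2 MB per second per shard figure from the card and assumes that shared consumers split it evenly; the function name is illustrative.

```python
def read_mb_per_s_per_consumer(consumers, enhanced_fan_out=False):
    """Read throughput each consumer gets from a single shard, in MB per second."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    per_shard_limit = 2.0  # MB per second per shard, from the card
    if enhanced_fan_out:
        return per_shard_limit            # each consumer gets the full 2 MB/s
    return per_shard_limit / consumers    # shared mode: assume an even split

print(read_mb_per_s_per_consumer(4))                         # 0.5
print(read_mb_per_s_per_consumer(4, enhanced_fan_out=True))  # 2.0
```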

18
Q

What are the key properties of a Kinesis data stream?

A

Retention can be set between 1 day and 365 days, which provides the ability to reprocess (replay) data. Once data is inserted into Kinesis, it cannot be deleted. Records with the same partition key go to the same shard.

Producers: can use the AWS SDK or the Kinesis Producer Library (KPL).

Consumers: you can write your own consumer using the Kinesis Client Library (KCL) or the AWS SDK. You can also use managed consumers such as AWS Lambda, Kinesis Data Firehose, or Kinesis Data Analytics.

19
Q

What are the different capacity modes in Kinesis Data Streams?

A

Provisioned mode:
1. You choose the number of shards provisioned and scale manually or through the API.
2. Each shard gets 1 MB per second in (or 1,000 records per second) and 2 MB per second out. You pay per shard provisioned per hour.

On-demand mode: There is no need to provision or manage capacity. The default capacity provisioned is 4 MB per second in (or 4,000 records per second). The stream scales automatically based on the observed throughput peak during the last 30 days. You pay per stream per hour and for data in and out per GB.
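
For provisioned mode, the per-shard limits give a quick way to estimate a shard count. This sketch assumes 1 MB per second or 1,000 records per second of ingestion per shard and 2 MB per second of shared read throughput per shard; the function name and the sample workload are made up.

```python
import math

def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / 1.0),  # 1 MB/s in per shard
        math.ceil(records_per_s / 1000),  # 1,000 records/s in per shard
        math.ceil(read_mb_per_s / 2.0),   # 2 MB/s out per shard (shared)
    )

# 3.5 MB/s in, 2,500 records/s, 6 MB/s out: ingestion bandwidth is the bottleneck.
print(shards_needed(3.5, 2500, 6.0))  # 4
```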

20
Q

What is Kinesis Data Firehose?

A

Amazon Kinesis Data Firehose is an extract, transform, and load (ETL) service that reliably captures, transforms, and delivers streaming data to data lakes, data stores, and analytics services. Kinesis Data Firehose receives records of up to 1 MB in size, optionally transforms the data with the help of Lambda functions, and writes it to the configured destination.
It is a fully managed, serverless service with no administration and automatic scaling. You pay for the data that goes through Firehose.
It can send failed records or all data to a backup S3 bucket.

21
Q

What are the AWS destinations for Kinesis Data Firehose?

A

Amazon S3, Amazon Redshift, and Amazon Elasticsearch (now Amazon OpenSearch Service). To load data into Redshift, Firehose first writes the data to S3 and then issues a COPY command to copy it into Redshift.

22
Q

What destination types does Kinesis Data Firehose support?

A

Third-party partner destinations, AWS destinations, and custom destinations (HTTP endpoints).

23
Q

What’s the difference between Kinesis Data Streams and Kinesis Data Firehose?

A

Kinesis Data Streams is a streaming service for ingesting data at scale. You write custom code for producers and consumers. It is real-time (about 200 ms latency). You manage the scaling yourself (shard splitting and merging). Data is stored for 1 to 365 days, and replay is supported.

Kinesis Data Firehose loads streaming data into S3, Redshift, third-party tools, or custom HTTP endpoints. It can also transform data using Lambda. It is fully managed, near real-time (look for this phrase in the exam), scales automatically, has no data storage, and does not support replay.

24
Q

How are partition keys used in a Kinesis data stream?

A

The partition key decides which shard a record goes to. Each data record contains a partition key, which Kinesis hashes to route the record to the appropriate shard.
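
A rough local model of the routing: Kinesis maps each partition key to a 128-bit integer with MD5 and picks the shard whose hash key range contains that value. The sketch below assumes the shards split the hash range evenly, which holds for a freshly created stream but not necessarily after resharding; the function name and the truck IDs are illustrative.

```python
import hashlib
from collections import Counter

def shard_for_key(partition_key, shard_count):
    """Return a zero-based shard index, assuming evenly split hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest, "big")  # 128-bit integer
    return hash_value * shard_count // 2**128

# The same key always lands on the same shard, which is what gives per-key ordering.
print(shard_for_key("truck-17", 5) == shard_for_key("truck-17", 5))  # True

# Spread of 100 hypothetical truck IDs across 5 shards.
print(Counter(shard_for_key(f"truck-{i}", 5) for i in range(100)))
```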

25
Q

How to group data in SQS (not in Kinesis)?

A

In SQS standard queues, there is no ordering. In SQS FIFO queues, you can use a message group ID to group related messages; messages in the same group are delivered in order, and different groups can be processed by different consumers in parallel. If every message in a FIFO queue uses the same group ID, all messages are consumed in the order they were sent, by one consumer at a time.

26
Q

How does SNS FIFO work?

A
  • Similar features as SQS FIFO:
  • Ordering by Message Group ID (all messages in the same group are ordered)
  • Deduplication using a Deduplication ID or Content Based Deduplication
  • Can only have SQS FIFO queues as subscribers
  • Limited throughput (same throughput as SQS FIFO)
27
Q

What is a dead-letter queue?

A

Dead-letter queues can be used by other queues (source queues) as a target for messages that can’t be processed (consumed) successfully. Dead-letter queues are useful for debugging your application or messaging system because they let you isolate problematic messages to determine why their processing doesn’t succeed. You cannot use dead-letter queues to postpone the delivery of new messages to the queue for a few seconds.

Dead letter queue is supported by both SQS and SNS.

In SNS: A dead-letter queue is an Amazon SQS queue that an Amazon SNS subscription can target for messages that can’t be delivered to subscribers successfully. Messages that can’t be delivered due to client errors or server errors are held in the dead-letter queue for further analysis or reprocessing.
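
For the SQS case, a source queue is pointed at its dead-letter queue through the `RedrivePolicy` queue attribute. The sketch below only builds that attribute; the helper name, the default receive count, and the ARN are placeholders.

```python
import json

def redrive_attributes(dlq_arn, max_receive_count=5):
    """Build the Attributes dict that attaches a dead-letter queue to a source queue."""
    policy = {
        "deadLetterTargetArn": dlq_arn,
        # After this many failed receives, SQS moves the message to the DLQ.
        "maxReceiveCount": str(max_receive_count),
    }
    # Queue attribute values are strings, so the policy is JSON-encoded.
    return {"RedrivePolicy": json.dumps(policy)}

# Usage, assuming a boto3 client:
#   boto3.client("sqs").set_queue_attributes(QueueUrl=source_url, Attributes=attrs)
attrs = redrive_attributes("arn:aws:sqs:us-east-1:123456789012:orders-dlq")
print(attrs["RedrivePolicy"])
```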

28
Q

What is a delay queue?

A

Delay queues in SQS let you postpone the delivery of new messages to consumers for a number of seconds, for example, when your consumer application needs additional time to process messages. If you create a delay queue, any messages that you send to the queue remain invisible to consumers for the duration of the delay period.
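
A delay queue is configured through the queue-level `DelaySeconds` attribute, which SQS accepts in the range 0 to 900 seconds (15 minutes). The helper below is a sketch that only validates and formats that attribute; its name and the queue name in the usage comment are illustrative.

```python
def delay_queue_attributes(delay_seconds):
    """Build the Attributes dict for creating a queue with a default delivery delay."""
    if not 0 <= delay_seconds <= 900:
        raise ValueError("DelaySeconds must be between 0 and 900")
    # Queue attribute values are passed as strings.
    return {"DelaySeconds": str(delay_seconds)}

# Usage, assuming a boto3 client:
#   boto3.client("sqs").create_queue(QueueName="orders-delayed", Attributes=attrs)
attrs = delay_queue_attributes(45)
print(attrs)  # {'DelaySeconds': '45'}
```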

29
Q

What’s the difference between Kinesis and SQS ordering?

A

Assume there are 100 trucks sending data, five Kinesis shards, and one SQS FIFO queue.
Kinesis data stream: on average, you will have 20 trucks per shard. Each truck’s data is ordered within its shard. The maximum number of consumers working in parallel is 5 (one per shard). The stream can receive up to 5 MB per second of data (1 MB per second per shard).

SQS FIFO:
You have only one SQS FIFO queue. You will have 100 group IDs (one per truck). You can have up to 100 consumers, since there are 100 group IDs. You get up to 300 messages per second, or 3,000 if you use batching.

30
Q

What’s the difference between SQS, SNS, and Kinesis?

A

SQS: consumers pull data, data is deleted after being consumed, you can have as many workers (consumers) as you want, there is no need to provision throughput, ordering is guaranteed only on FIFO queues, and it offers individual message delay capability.

SNS: pushes data to many subscribers (up to 12,500,000 subscriptions per topic), data is not persisted (lost if not delivered), it follows the pub/sub model, you can have up to 100,000 topics, there is no need to provision throughput, it integrates with SQS for the fan-out architecture pattern, and it offers FIFO capability when paired with SQS FIFO.

Kinesis: standard consumers pull data (2 MB per second per shard), enhanced fan-out pushes data (2 MB per second per shard per consumer), data can be replayed, it is meant for real-time big data, analytics, and ETL, ordering is at the shard level, data expires after X days, and it offers provisioned and on-demand capacity modes.

31
Q

What is Amazon MQ?

A

Amazon MQ is a managed message broker service that makes it easy to migrate to a message broker in the cloud. A message broker allows software applications and components to communicate using various programming languages, operating systems, and formal messaging protocols. Currently, Amazon MQ supports Apache ActiveMQ and RabbitMQ engine types.

Amazon MQ works with your existing applications and services without the need to manage, operate, or maintain your own messaging system.

32
Q

How does Amazon MQ scale?

A

Amazon MQ doesn’t scale as much as SQS or SNS. It runs on servers and can run in Multi-AZ with failover. It has both queue features (like SQS) and topic features (like SNS).

33
Q

How is failover managed on Amazon MQ?

A

There are two MQ broker instances in a region, in different Availability Zones: one active and one on standby. Both are connected to the same Amazon EFS storage (a network file system). Because the data is saved on Amazon EFS, clients can switch to the standby instance if anything goes wrong with the active instance.

34
Q

Which services can consume data directly from Kinesis Data Stream?

A

Lambda, Kinesis Data Firehose, Kinesis Data Analytics, and container services.