Messaging Flashcards

1
Q

List some features of Amazon SQS

A

Unlimited Throughput

Unlimited number of messages in the queue

default retention of messages of 4 days, maximum of 14 days

low latency (< 10 ms on publish and receive)

limit of 256 KB per message sent

can have duplicate messages

can have out-of-order messages

at least once delivery

2
Q

How does message consumption work?

A

Consumers are computing entities (EC2 instances, on-premises servers, Lambda functions) that poll messages, up to 10 at a time. After processing a message, the consumer deletes it using the DeleteMessage API (see the sketch below).
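A minimal boto3 (Python) sketch of this consume-then-delete loop; the queue URL is a placeholder and the processing step is just a print:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

# Poll up to 10 messages in a single call
resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)

for msg in resp.get("Messages", []):
    print(msg["Body"])  # process the message here
    # Delete the message only after it was processed successfully
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```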

3
Q

SQS with ASG (Auto Scaling Group) -

Which metric is used to trigger the scaling?

A

Queue Length - ApproximateNumberOfMessages

A CloudWatch alarm is set on this metric; once the threshold is breached, the alarm triggers scaling of the consumers (see the sketch below)
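A sketch in boto3 of such an alarm; note CloudWatch exposes the queue length as ApproximateNumberOfMessagesVisible, and the alarm name, threshold, and scaling-policy ARN are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="sqs-queue-depth-high",                 # placeholder name
    Namespace="AWS/SQS",
    MetricName="ApproximateNumberOfMessagesVisible",  # CloudWatch name for the queue length
    Dimensions=[{"Name": "QueueName", "Value": "my-queue"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=1,
    Threshold=1000,                                   # example threshold
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:autoscaling:eu-west-1:123456789012:scalingPolicy/placeholder"],  # ASG scale-out policy ARN (placeholder)
)
```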

4
Q

SQS and Security

A

Encryption:

In-flight encryption using the HTTPS API

At-rest encryption using KMS Key

Access Controls:

IAM policies to regulate access to the SQS API

SQS access policies (similar to S3 bucket policies):

useful for cross-account access

useful for other services (SNS, S3, …) to write to SQS queues

5
Q

How do Access Policies work with SQS?

A

Cross Account access:

Poll messages

attach an access policy to the queue in the source account that grants the consumer in the target account permission to poll

the policy has to specify the account ID of the target account, the allowed actions, and the ARN of the SQS queue

Letting S3 write messages to an SQS queue:

specify in the policy the SQS queue ARN, the allowed action, and, in the condition section, the ARN of the S3 bucket as well as the bucket owner's account ID (see the sketch below)
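A boto3 sketch of the second case: an access policy that lets a hypothetical S3 bucket send event notifications to the queue. The queue URL, ARNs, and account ID are placeholders:

```python
import boto3, json

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/111122223333/my-queue"  # placeholder

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:eu-west-1:111122223333:my-queue",       # queue ARN (placeholder)
        "Condition": {
            "ArnLike": {"aws:SourceArn": "arn:aws:s3:::my-bucket"},      # the S3 bucket ARN (placeholder)
            "StringEquals": {"aws:SourceAccount": "111122223333"},       # the bucket owner's account ID
        },
    }],
}

sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})
```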

6
Q

What is the message visibility timeout in SQS?

A

After a message is polled by a consumer, it is invisible to other consumers for 30 seconds (the default visibility timeout)

If the message is not deleted during the timeout, it returns to the queue

Hence, if a message is not processed and deleted within the timeout, it may be processed twice

The ChangeMessageVisibility API lets the consumer increase the timeout (sketched below)

too high a timeout => if processing fails, re-processing is slow

too small a timeout => the message may be processed multiple times
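A boto3 sketch of extending the visibility timeout from the consumer side (queue URL is a placeholder); the receipt handle comes from the ReceiveMessage response:

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # Processing will take longer than the default 30 s: extend the timeout to 2 minutes
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=120,
    )
```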

7
Q

What are Dead Letter Queues in SQS?

A

A threshold can be set on how many times a message can be returned to the queue after the visibility timeout expires

After the MaximumReceives threshold is exceeded, the message is put into the dead letter queue

useful for debugging

make sure to process the messages before they expire

set the DLQ's message retention to 14 days (see the sketch below)
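A boto3 sketch of attaching a DLQ via the RedrivePolicy attribute; the queue URLs, the DLQ ARN, and the maxReceiveCount value are placeholders:

```python
import boto3, json

sqs = boto3.client("sqs")

redrive_policy = {
    "deadLetterTargetArn": "arn:aws:sqs:eu-west-1:123456789012:my-dlq",  # placeholder DLQ ARN
    "maxReceiveCount": "5",  # after 5 receives without deletion the message moves to the DLQ
}

sqs.set_queue_attributes(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue",  # placeholder
    Attributes={"RedrivePolicy": json.dumps(redrive_policy)},
)

# On the DLQ itself, set retention to the maximum of 14 days (in seconds)
sqs.set_queue_attributes(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/my-dlq",   # placeholder
    Attributes={"MessageRetentionPeriod": str(14 * 24 * 3600)},
)
```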

8
Q

What is a delay queue in SQS?

A

Delay a message for up to 15 minutes before consumers can see it

Default is 0 seconds

Can be set at queue level

Can override the default on send using the DelaySeconds parameter (see the sketch below)
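A boto3 sketch of a per-message delay (queue URL is a placeholder):

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue",  # placeholder
    MessageBody="hello",
    DelaySeconds=300,  # hide this message from consumers for 5 minutes (max 900 s = 15 min)
)
```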

9
Q

What is Long Polling in SQS?

A

When a consumer requests messages from a queue, it can optionally wait until a message arrives

Reduces API calls made to the SQS queue while improving latency and increasing the efficiency of the application

Wait time between 1 and 20 seconds

Long polling is preferred to short polling; it is enabled with the WaitTimeSeconds parameter (see the sketch below)
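A boto3 sketch of long polling per request with WaitTimeSeconds (it can also be enabled at queue level via ReceiveMessageWaitTimeSeconds); the queue URL is a placeholder:

```python
import boto3

sqs = boto3.client("sqs")

resp = sqs.receive_message(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue",  # placeholder
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,  # long poll: wait up to 20 s for a message instead of returning immediately
)
```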

10
Q

What is SQS Extended Client?

A

A Java library to send messages larger than the 256 KB limit, up to 1 GB

A small metadata message is sent to the queue that contains a pointer to the S3 object holding the large payload

Hence, the large message is sent to and retrieved from S3

11
Q

Useful SQS API calls

A

CreateQueue (MessageRetentionPeriod), DeleteQueue

PurgeQueue: delete all messages in queue

SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage

MaxNumberOfMessages: default 1, max 10 (for the ReceiveMessage API)

ReceiveMessageWaitTimeSeconds: Long Polling

ChangeMessageVisibility: change the message visibility timeout

Batch APIs for SendMessage, DeleteMessage, and ChangeMessageVisibility help to reduce costs (see the sketch below)
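A boto3 sketch of the batch variant of SendMessage, which cuts the number of API calls (queue URL and entries are placeholders):

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message_batch(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue",  # placeholder
    Entries=[  # up to 10 entries per batch call
        {"Id": "1", "MessageBody": "first"},
        {"Id": "2", "MessageBody": "second", "DelaySeconds": 60},
    ],
)
```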

12
Q

What about the FIFO queue in SQS?

A

Ordering of messages is guaranteed (unlike in a standard queue)

Limited throughput: 300 msg/s without batching, 3,000 msg/s with batching

Exactly-once send capability (by removing duplicates) - a feature of FIFO queues

Messages are processed in order by the consumer

13
Q

What is Deduplication in a FIFO queue?

A

If the same message arrives within 5 minutes, the second one is refused

Two methods:

Content-based deduplication using a SHA-256 hash of the message body

Explicitly provide a Message Deduplication ID (see the sketch below)
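A boto3 sketch of an explicit deduplication ID on a FIFO queue (with content-based deduplication enabled on the queue, the SHA-256 of the body is used instead); queue URL and IDs are placeholders:

```python
import boto3

sqs = boto3.client("sqs")

sqs.send_message(
    QueueUrl="https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue.fifo",  # placeholder FIFO queue
    MessageBody='{"order_id": 42}',
    MessageGroupId="orders",            # required for FIFO queues
    MessageDeduplicationId="order-42",  # a second send with this ID within 5 minutes is ignored
)
```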

14
Q

What is Message Grouping in a FIFO queue?

A

If the same value for MessageGroupID is provided, then all of those messages will be processed by the same consumer in the order they arrive

To get ordering for subsets of messages, use different values for MessageGroupID

Each Group id can have a different consumer

Ordering across groups is not guaranteed

Often a total ordering of all messages is not needed, only ordering within each group (see the sketch below)
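A boto3 sketch of sending into two message groups; ordering is guaranteed within each MessageGroupId but not across them (queue URL and IDs are placeholders):

```python
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue.fifo"  # placeholder

# All messages for user-A stay ordered relative to each other; same for user-B.
for user, body in [("user-A", "a1"), ("user-B", "b1"), ("user-A", "a2")]:
    sqs.send_message(
        QueueUrl=queue_url,
        MessageBody=body,
        MessageGroupId=user,
        MessageDeduplicationId=f"{user}-{body}",  # needed unless content-based dedup is enabled
    )
```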

15
Q

What is Amazon Simple Notification Service (SNS)?

A

Want to send one message to many receivers

uses Pub/Sub

one topic posts to many subscribers

event producer sends messages to only one SNS topic

as many receivers as we want will listen to the SNS topic notifications

each subscriber to the topic will get all the messages (note: message filtering is a newer feature)

up to 10 million subscriptions per topic

limit of 100,000 topics

Subscribers can be:

Lambda, SQS, Emails, HTTP/S, SMS messages, Mobile Notifications

16
Q

How is SNS integrated into AWS?

A

Many AWS services can send data directly to SNS

CloudWatch Alarms

AutoScaling Actions

Amazon S3

17
Q

How to publish a message with SNS?

A

Topic Publish:

Create a topic

Create at least one subscription

Publish to the topic

Direct Publish:

Create a platform app

Create a platform endpoint

Publish to the platform endpoint

Works with Google GCM, Apple APNS, Amazon ADM
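A boto3 sketch of the Topic Publish flow above; the topic name and email endpoint are placeholders:

```python
import boto3

sns = boto3.client("sns")

# 1. Create a topic
topic_arn = sns.create_topic(Name="my-topic")["TopicArn"]  # placeholder name

# 2. Create at least one subscription
sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint="me@example.com")  # placeholder endpoint

# 3. Publish to the topic; every subscriber receives the message
sns.publish(TopicArn=topic_arn, Subject="Hello", Message="Published via SNS")
```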

18
Q

Security with SNS

A

Encryption:

In-flight encryption with HTTPS API

At-rest encryption using KMS keys

Client-side encryption if the client wants to perform encryption/decryption itself

Access Controls:

IAM policies to regulate access to the SNS API

SNS Access Policies:

Useful for cross-account access

Useful for other services to write to an SNS topic

19
Q

What is Fan Out in SNS and SQS?

A

Push once in SNS, receive in all SQS queues that are subs

Fully decoupled, no data loss

SQS allows for: data persistence, delayed processing, and retries of work

Ability to add more SQS subs over time

Make sure your SQS queue access policy allows SNS to write (see the sketch below)

and more
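A boto3 sketch of wiring one SQS queue into an SNS fan-out: grant SNS permission to write to the queue, then subscribe the queue to the topic. The ARNs and queue URL are placeholders:

```python
import boto3, json

sns = boto3.client("sns")
sqs = boto3.client("sqs")

topic_arn = "arn:aws:sns:eu-west-1:123456789012:my-topic"                # placeholder
queue_url = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"  # placeholder
queue_arn = "arn:aws:sqs:eu-west-1:123456789012:my-queue"                # placeholder

# Allow the topic to send messages to the queue
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "sns.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": queue_arn,
        "Condition": {"ArnEquals": {"aws:SourceArn": topic_arn}},
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})

# Subscribe the queue to the topic; every publish is now pushed into the queue
sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)
```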

20
Q

What is AWS Kinesis?

A

Makes it easy to collect, process, and analyze streaming data in real time

4 Kinesis Services:

Kinesis Data Streams: capture, process, and store data streams

Kinesis Data Firehose: load data streams into AWS data stores

Kinesis Data analytics: analyze data streams with Apache Flink or SQL

Kinesis Video Streams: not in the exam

21
Q

How does Kinesis Data Streams work?

A

Producers (apps, clients, SDKs, Kinesis Agent) write records (partition key, sequence number, data blob) to the shards of a Kinesis Data Stream at up to 1 MB/sec or 1,000 msg/sec per shard

Consumers (apps, Lambda, Kinesis Data Firehose, Kinesis Data Analytics) read records (partition key, sequence number, data blob) from the shards at either 2 MB/sec per shard shared across all consumers (classic) or 2 MB/sec per shard per consumer (enhanced)

22
Q

Some more info on Kinesis Data Stream

A

Billed per shard

Can have as many shards as desired

Retention of data between 1 day (default) and 365 days

Ability to reprocess/replay data

Once data is inserted in Kinesis, it can't be deleted (immutability)

data in the same partition goes to the same shard (ordering)

Producers: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent

Consumers:

my own: Kinesis Client Library, SDK

managed: Lambda, Kinesis Firehose, Kinesis Analytics

23
Q

Kinesis Data Security

A

Control Access/Authorization using IAM Policies

Encryption in flight with HTTPS endpoints

Encryption at rest using KMS

Encryption on client side can be implemented (harder)

VPC endpoints available to access over VPC network

Monitor API calls using CloudTrail

24
Q

What do Kinesis Producers do?

A

Put data records into data streams

A data record consists of: partition key, sequence number, data blob (up to 1 MB)

Producers: AWS SDK; Kinesis Producer Library (KPL): C++/Java, batching, compression, retries; Kinesis Agent (monitors log files)

Write throughput: 1 MB/sec or 1,000 records/sec per shard

PutRecord API

Use batching with the PutRecords API to reduce costs and increase throughput (see the sketch below)

ProvisionedThroughputExceeded: too many records sent to one shard in too short a time; choose the partition key wisely, retry with exponential backoff, or increase the number of shards
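A boto3 sketch of PutRecord and the batched PutRecords; the stream name is a placeholder, and the partition key determines the target shard:

```python
import boto3, json

kinesis = boto3.client("kinesis")
stream = "my-stream"  # placeholder

# Single record: all records with the same partition key land in the same shard
kinesis.put_record(
    StreamName=stream,
    Data=json.dumps({"sensor": "a", "value": 1}).encode(),
    PartitionKey="sensor-a",
)

# Batching with PutRecords reduces cost and increases throughput
kinesis.put_records(
    StreamName=stream,
    Records=[
        {"Data": b'{"sensor": "b", "value": 2}', "PartitionKey": "sensor-b"},
        {"Data": b'{"sensor": "c", "value": 3}', "PartitionKey": "sensor-c"},
    ],
)
```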

25
Q

What do Kinesis Consumers do?

A

Get data records from data streams and process them

AWS Lambda, Kinesis Data Analytics, Kinesis Data Firehose

Custom consumer (AWS SDK) - classic or enhanced fan-out

Kinesis Client Library - library to simplify reading from data stream

26
Q

Kinesis Consumer Types

A

Shared Fan-out consumer - pull

low number of consuming applications

Read throughput: 2 MB/sec per shard across all consumers

Max 5 GetRecords API calls per second per shard

Latency around 200 ms

cheap

Consumers poll data from shards using the GetRecords API call

Returns up to 10 MB (then throttles for 5 seconds) or up to 10,000 records (see the sketch below)

Enhanced fan-out consumer - push

2 MB/sec per shard per consumer

latency around 70 ms

more expensive

Kinesis pushes data to consumers over HTTP/2 - SubscribeToShard API

Soft limit of 5 apps (KCL) per data stream (default)
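A boto3 sketch of the classic (shared fan-out) pull model: get a shard iterator, then call GetRecords in a loop. The stream name and shard ID are placeholders:

```python
import time
import boto3

kinesis = boto3.client("kinesis")

# Start reading a single shard from the oldest available record
iterator = kinesis.get_shard_iterator(
    StreamName="my-stream",            # placeholder stream name
    ShardId="shardId-000000000000",    # placeholder shard id
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=1000)
    for record in resp["Records"]:
        print(record["PartitionKey"], record["Data"])
    iterator = resp.get("NextShardIterator")  # continue from where we left off
    time.sleep(0.2)  # stay under the 5 GetRecords calls per second per shard limit
```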

27
Q

Lambda as a Kinesis Consumer

A

Supports classic and enhanced fan-out consumers

read records in batches

can configure batch size and batch window

the event source mapping polls the shard using the GetRecords API

on error, lambda retries until data expires

can process up to 10 batches per shard simultaneously (see the sketch below)
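A boto3 sketch of wiring Lambda to a stream via an event source mapping, which is where batch size, batching window, and per-shard parallelism are configured; the stream ARN and function name are placeholders:

```python
import boto3

lam = boto3.client("lambda")

lam.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:eu-west-1:123456789012:stream/my-stream",  # placeholder
    FunctionName="my-consumer-fn",          # placeholder
    StartingPosition="LATEST",
    BatchSize=100,                          # records per invocation
    MaximumBatchingWindowInSeconds=5,       # wait up to 5 s to fill a batch
    ParallelizationFactor=10,               # up to 10 concurrent batches per shard
)
```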

28
Q

What is the Kinesis Client Library?

A

A Java library that helps distributed applications share the read workload of a Kinesis Data Stream

At most one KCL instance per shard; fewer is okay

Progress is checkpointed to DynamoDB (requires IAM permissions)

Tracks other workers and shares work among shards using DynamoDB

KCL can run on EC2, Elastic Beanstalk, and on-premises

Records are read in order at the shard level

Versions: KCL 1.x supports only the shared consumer type, KCL 2.x also supports enhanced fan-out

29
Q

Kinesis Shard Splitting

A

Used to increase stream capacity (e.g. split 1 shard into 2); each shard provides 1 MB/sec of write capacity

used to divide a hot shard

old shard is closed and will be deleted once data expires

no automatic scaling

can't split into more than 2 shards in one operation (see the sketch below)
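A boto3 sketch of splitting a hot shard with the SplitShard API; the new starting hash key (here the midpoint of the parent shard's hash range) decides how partition keys are divided between the two children. The stream name is a placeholder:

```python
import boto3

kinesis = boto3.client("kinesis")

# Look up the hash key range of the shard to split
shard = kinesis.describe_stream(StreamName="my-stream")["StreamDescription"]["Shards"][0]
lo = int(shard["HashKeyRange"]["StartingHashKey"])
hi = int(shard["HashKeyRange"]["EndingHashKey"])

kinesis.split_shard(
    StreamName="my-stream",                  # placeholder
    ShardToSplit=shard["ShardId"],
    NewStartingHashKey=str((lo + hi) // 2),  # split the hash range roughly in half
)
```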

30
Q

Kinesis Shard Merging

A

Decrease stream capacity and reduce costs

can be used to group two shards with low traffic (cold shards)

old shards are closed and will be deleted once data expires

can't merge more than two shards in one operation

31
Q

Kinesis Data Firehose

A

Can take data from Kinesis Data Streams, but also from other producers, CloudWatch, …

writes data to Destinations:

S3, Redshift (COPY through S3), Amazon Elasticsearch

or 3rd-party destinations (MongoDB, etc.) or a custom HTTP endpoint

fully managed service, serverless, autoscaling, no administration

pay for data going through firehose

near real time, not real time

support custom data transformation with lambda

can send failed or all data to a backup S3 bucket
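A boto3 sketch of sending a record directly to a Firehose delivery stream (the delivery stream name is a placeholder); Firehose then buffers it and delivers it to the configured destination in near real time:

```python
import boto3, json

firehose = boto3.client("firehose")

firehose.put_record(
    DeliveryStreamName="my-delivery-stream",  # placeholder
    Record={"Data": (json.dumps({"event": "click", "ts": 1700000000}) + "\n").encode()},
)
```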

32
Q

Kinesis Data Analytics

A

Gets data from Kinesis Data Streams or Kinesis Data Firehose

enables applying SQL to data from these sources

Data Analytics can then send this data to a Data Stream or Firehose, and thus to all destinations those two support

real-time analytics for stream data using SQL

fully managed, no servers to provision

automatic scaling

real-time analytics

pay for actual consumption rate

can create streams out of the real-time queries

use cases: time series analytics, real-time dashboards, real-time metrics

33
Q

Ordering Data in Kinesis and SQS FIFO

A