Kinesis Flashcards

1
Q

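What is the default shard limit in Kinesis Data Streams?

A

The shard quota is set per AWS account per Region, not per stream. It has historically defaulted to 500 shards in the largest Regions and 200 elsewhere, and can be raised on request.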

2
Q

What is a Kinesis shard?

A

A shard is a uniquely identified sequence of data records in a stream. Each record consists of a partition key, a sequence number, and a data payload.

3
Q

What are the read limits of a Kinesis shard?

A

5 read transactions per second and up to 2 MB per second, shared by all consumers of the shard

4
Q

What are the write limits of a Kinesis shard?

A

1,000 records per second and up to 1 MB per second
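
The per-shard read and write limits can be turned into a rough sizing estimate. The sketch below is only an illustration: the function name `shards_needed` and the sample workload are hypothetical, and it assumes traffic is spread evenly across shards.

```python
import math

# Per-shard limits quoted in the cards above.
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_bytes = math.ceil(write_mb_per_sec / WRITE_MB_PER_SEC)
    by_write_count = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC)
    by_read_bytes = math.ceil(read_mb_per_sec / READ_MB_PER_SEC)
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


# Hypothetical workload: 3.5 MB/s in, 2,500 records/s in, 8 MB/s read out.
print(shards_needed(3.5, 2500, 8.0))
```

Whichever limit is hit first drives the shard count, so a stream of many tiny records can need more shards than its byte rate alone suggests.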

5
Q

What is the size limit of a data payload in KDS?

A

1MB

6
Q

How do you scale a KDS?

A

You add or subtract shards in a process called resharding.

7
Q

By default how long is data retained in Kinesis Data Streams?

A

24 hours

8
Q

What is the minimum and maximum retention period of data in Kinesis Data Streams?

A

24 hours min, 365 days maximum

9
Q

What is a partition key in KDS?

A

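The attribute that determines which shard a record is sent to. Kinesis hashes the key, so records with the same partition key land on the same shard. Each shard is processed by exactly one worker at a time within a consumer application.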
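
Kinesis maps a partition key to a shard by taking the MD5 hash of the key as a 128-bit integer and picking the shard whose hash key range contains it. A minimal sketch of that idea follows. It assumes the shards split the hash space into equal ranges, which need not hold after resharding, and `shard_for_key` is a hypothetical helper, not an AWS API.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields a 128-bit integer


def shard_for_key(partition_key, shard_count):
    """Map a partition key to a shard index, assuming equal hash key ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_SPACE


# The same key always maps to the same shard index.
for key in ("user-1", "user-2", "user-1"):
    print(key, "->", shard_for_key(key, 4))
```

Because the mapping is deterministic, a key that receives a disproportionate share of traffic produces a hot shard, which is why high-cardinality partition keys are usually preferred.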

10
Q

Why use Kinesis Firehose over KDS?

A

Firehose scales automatically, is fully managed, and integrates directly with AWS services, but it is only near real time, holds data for at most 24 hours, and does not support replay.

KDS is real time with low latency, suits custom applications, stores data, and supports replaying records, but scaling (resharding) requires custom work.

11
Q

What are the 4 main benefits of using KCL?

A

The Kinesis Client Library (KCL):
1. Integrates automatically with KPL to de-aggregate records
2. Checkpoints processed records for you
3. Rebalances shard leases across workers when the worker or shard count changes
4. Sends custom metrics to CloudWatch automatically
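
The lease rebalancing in point 3 can be pictured with a toy assignment function. This is only a sketch of the outcome (an even spread of shards over workers), not KCL's real lease-taking algorithm. `balance_leases` and the shard and worker names are hypothetical.

```python
def balance_leases(shard_ids, worker_ids):
    """Spread shards across workers so per-worker counts differ by at most one."""
    assignment = {worker: [] for worker in worker_ids}
    for index, shard in enumerate(sorted(shard_ids)):
        worker = worker_ids[index % len(worker_ids)]
        assignment[worker].append(shard)
    return assignment


shards = ["shard-0", "shard-1", "shard-2", "shard-3", "shard-4"]
print(balance_leases(shards, ["worker-a", "worker-b"]))
# A third worker joins, so the leases are spread again.
print(balance_leases(shards, ["worker-a", "worker-b", "worker-c"]))
```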

12
Q

What languages does KCL support?

A

KCL is written in Java but allows you to use other runtimes like Python via MultiLangDaemon

13
Q

What is a record processor in KCL?

A

It holds the logic for how data is processed. A worker instantiates one record processor per shard.

14
Q

How many workers are there in KCL?

A

There is 1 worker per KCL application instance, with 1 or more application instances running in a distributed fashion

15
Q

How would you resolve issues with throttling on a shard with multiple consumers?

A

The read limit applies per shard and is shared by all of its consumers. Enabling enhanced fan-out gives each registered consumer its own dedicated 2 MB per second per shard instead of a share of the common limit.
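
The effect on throughput is simple arithmetic. The sketch below assumes the shared limit is split evenly between consumers, which is a simplification, and `per_consumer_read_mb` is a hypothetical helper.

```python
SHARD_READ_MB_PER_SEC = 2.0


def per_consumer_read_mb(consumer_count, enhanced_fan_out):
    """Approximate read throughput each consumer gets from one shard."""
    if enhanced_fan_out:
        # Each registered consumer gets its own dedicated throughput.
        return SHARD_READ_MB_PER_SEC
    # Shared limit, assumed here to be split evenly.
    return SHARD_READ_MB_PER_SEC / consumer_count


for consumers in (1, 2, 4):
    shared = per_consumer_read_mb(consumers, enhanced_fan_out=False)
    dedicated = per_consumer_read_mb(consumers, enhanced_fan_out=True)
    print(consumers, "consumers:", shared, "MB/s shared vs", dedicated, "MB/s dedicated")
```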

16
Q

Is KPL synchronous or async?

A

KPL is asynchronous by default, which is the recommended mode. A synchronous pattern is also possible by waiting on each result.

17
Q

How many consumers can there be of a shard?

A

Multiple consumers can read from a shard

18
Q

What are the 4 benefits of KPL?

A
  1. Increases performance by aggregating small records
  2. Provides automatic retry logic if there is record failure
  3. Handles multi-threading, batching, aggregation
  4. Sends metrics to CloudWatch automatically
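
To illustrate point 1, the sketch below greedily packs small payloads into larger batches. It shows the idea of aggregation only and does not reproduce KPL's actual aggregated record format. The `aggregate` function and the sizes used are hypothetical, and it assumes every payload fits within `max_bytes` on its own.

```python
MAX_RECORD_BYTES = 1024 * 1024  # 1 MB Kinesis record limit from the cards


def aggregate(user_records, max_bytes=MAX_RECORD_BYTES):
    """Greedily pack small byte payloads into batches of at most max_bytes."""
    batches, current, current_size = [], [], 0
    for payload in user_records:
        # Start a new batch when adding this payload would exceed the limit.
        if current and current_size + len(payload) > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(payload)
        current_size += len(payload)
    if current:
        batches.append(current)
    return batches


small_records = [b"x" * 300 for _ in range(10)]
batches = aggregate(small_records, max_bytes=1000)
print(len(small_records), "user records ->", len(batches), "aggregated records")
```

Fewer, larger records make better use of the per-shard limit of 1,000 records per second.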
19
Q

What is a downside of KPL?

A

It adds latency: KPL buffers records before sending them, which can delay a record by up to RecordMaxBufferedTime

20
Q

What happens in Firehose if a data producer sends more data than Firehose is able to deliver to S3?

A

Firehose raises the buffer size dynamically to try to catch up

21
Q

Does Firehose support KPL de-aggregation from a KDS?

A

Yes, de-aggregates before delivering to a destination or before Lambda pre-processing

22
Q

Which Kinesis option supports native S3 Backup integration?

A

Firehose supports S3 backup of original source data as well as failed data (processing or delivery failure)

23
Q

What is a common Firehose task?

A

Converting record formats from JSON to Parquet or ORC, then storing in S3

A Firehose Lambda transformation can first convert the source data to JSON, e.g. CSV to JSON

https://docs.aws.amazon.com/firehose/latest/dev/record-format-conversion.html
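
A minimal sketch of such a transformation Lambda is shown below. The event shape (`records`, `recordId`, base64 `data`) and the response fields (`recordId`, `result`, `data`) follow the Firehose data transformation contract. The column names in `COLUMNS` are hypothetical, and the code assumes one CSV row per record.

```python
import base64
import csv
import json

# Hypothetical column names for the incoming CSV rows.
COLUMNS = ["ticker", "price", "volume"]


def lambda_handler(event, context):
    """Convert each CSV record in a Firehose transformation event to JSON."""
    output = []
    for record in event["records"]:
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        fields = next(csv.reader([line]))
        payload = json.dumps(dict(zip(COLUMNS, fields))) + "\n"
        output.append({
            "recordId": record["recordId"],  # must echo the incoming ID
            "result": "Ok",
            "data": base64.b64encode(payload.encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}
```

All values stay strings here. A real transformation would likely cast numeric columns so that the later Parquet or ORC conversion matches the table schema.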

24
Q

What are possible data sources for Firehose?

A
  1. KDS
  2. Kinesis Agent
  3. AWS SDK
  4. CloudWatch (Logs and Events)
  5. AWS IoT
25
Q

What are the possible output destinations for KDA?

A
  1. KDS
  2. Kinesis Firehose
  3. Lambda
26
Q

What are the 2 interfaces to write KDA apps?

A

The SQL interface or the Apache Flink interface (Java, Scala, or Python)

27
Q

Can you write KDA to multiple outputs? What is the limit?

A

Yes you can write to multiple destinations, up to 3

28
Q

What are the 3 windowed query types for KDA?

A
  1. Stagger: time-based windows that open when the first matching event arrives, which suits late or out-of-order data
  2. Tumbling: fixed-size, non-overlapping windows that open and close at regular intervals
  3. Sliding: windows over a fixed time or row-count interval that overlap and aggregate continuously
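
The difference between tumbling and sliding windows can be seen in a small simulation. This is plain Python, not KDA SQL. The function names and the sample events are hypothetical, and stagger windows are left out for brevity.

```python
from collections import defaultdict


def tumbling_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window, keyed by window start."""
    counts = defaultdict(int)
    for timestamp, _value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(counts)


def sliding_counts(events, window_seconds):
    """For each event, count events in the window_seconds ending at that event."""
    times = sorted(timestamp for timestamp, _value in events)
    return [
        (t, sum(1 for other in times if t - window_seconds < other <= t))
        for t in times
    ]


# Hypothetical (timestamp_seconds, value) events.
events = [(1, "a"), (4, "b"), (11, "c"), (12, "d"), (25, "e")]
print(tumbling_counts(events, 10))
print(sliding_counts(events, 10))
```

Each event falls into exactly one tumbling window, while a sliding window is re-evaluated per event and so the same event can contribute to several results.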
29
Q

Why use MSK over Kinesis?

A
  1. MSK has unlimited retention period
  2. MSK allows a larger payload size (configurable above Kafka's roughly 1 MB default) vs. the 1 MB Kinesis limit
30
Q

What are the possible data sources for KDA Flink vs. KDA SQL?

A
  Flink:
  1. KDS
  2. MSK
  SQL:
  1. KDS
  2. Firehose
31
Q

What are the downsides of MSK?

A
  1. Cluster provisioning model
  2. 3rd party tooling not integrated with AWS natively
  3. Scaling is not seamless to clients
32
Q

What is the Firehose buffer size min/max for S3 and ES?

A
  1. S3 is 1 MB to 128 MB
  2. ES is 1 MB to 100 MB
33
Q

What is the Firehose buffer interval?

A

60 to 900 seconds
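
Firehose delivers a batch when either the buffer size or the buffer interval is reached, whichever comes first. The toy class below sketches that rule. `DeliveryBuffer` is hypothetical, time is passed in explicitly to keep the example deterministic, and it only checks the interval when a record arrives, whereas the real service also flushes without new data.

```python
class DeliveryBuffer:
    """Toy buffer that flushes on size or elapsed time, whichever comes first."""

    def __init__(self, size_limit_bytes, interval_seconds):
        self.size_limit_bytes = size_limit_bytes
        self.interval_seconds = interval_seconds
        self.records = []
        self.buffered_bytes = 0
        self.window_start = None

    def add(self, payload, now):
        """Buffer one payload at time `now`; return a flushed batch or None."""
        if self.window_start is None:
            self.window_start = now
        self.records.append(payload)
        self.buffered_bytes += len(payload)
        size_hit = self.buffered_bytes >= self.size_limit_bytes
        time_hit = now - self.window_start >= self.interval_seconds
        if size_hit or time_hit:
            batch = self.records
            self.records, self.buffered_bytes, self.window_start = [], 0, None
            return batch
        return None


buffer = DeliveryBuffer(size_limit_bytes=1000, interval_seconds=60)
print(buffer.add(b"x" * 400, now=0))    # below both thresholds
print(buffer.add(b"x" * 700, now=10))   # size threshold reached, batch flushed
print(buffer.add(b"x" * 10, now=20))    # a new buffering window starts
print(buffer.add(b"x" * 10, now=85))    # interval threshold reached, batch flushed
```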

34
Q

What is the payload limit for MSK?

A
  1. 8MB
35
Q

What is the payload limit for Firehose?

A
  1. 1,000 KiB (about 1 MB) per record, before base64 encoding
36
Q

What are the two resharding operations in KDS?

A
  1. Merge Shards
  2. Split Shards
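
Both operations work on hash key ranges: a split needs a new starting hash key inside the parent's range, and a merge requires two shards with adjacent ranges. The sketch below computes an even split point and checks adjacency. The helper names are hypothetical and no AWS call is made.

```python
def split_point(start_hash, end_hash):
    """Return the starting hash key of the second child for an even split."""
    return (start_hash + end_hash + 1) // 2


def can_merge(first_range, second_range):
    """True if second_range starts right after first_range ends (adjacent)."""
    return first_range[1] + 1 == second_range[0]


MAX_HASH = 2 ** 128 - 1  # top of the 128-bit hash key space
parent = (0, MAX_HASH)
middle = split_point(*parent)
children = [(parent[0], middle - 1), (middle, parent[1])]
print(children)
print(can_merge(children[0], children[1]))
```

An uneven split point is also valid and can be used to isolate a hot range of keys onto its own shard.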
37
Q

What are the 5 consumers that can read from KDS?

A
  1. Lambda
  2. Kinesis Firehose
  3. Kinesis Data Analytics
  4. KCL
  5. Glue Streaming
38
Q

What are the 3 data sources for KDS?

A
  1. KPL
  2. Kinesis Agent
  3. PutRecord / PutRecords calls to the Kinesis API (AWS SDK)
39
Q

What are the 5 destinations for Kinesis Firehose?

A
  1. Redshift
  2. S3
  3. OpenSearch (formerly Elasticsearch, ES)
  4. HTTP endpoint
  5. Third-party vendor integrations
40
Q

What are the 2 data sources for KDA?

A
  1. KDS
  2. Kinesis Firehose
41
Q

How would you get realtime events from CloudWatch? What are the valid destinations?

A
  1. Use CloudWatch Logs with subscription filters
  2. Destinations are Lambda, KDS, Firehose
    https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Subscriptions.html
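
Log data delivered through a subscription filter arrives gzip-compressed, and base64-encoded when it reaches a Lambda function. A minimal decoding sketch follows. The sample payload is built locally and its log group name is hypothetical, while `logGroup` and `logEvents` are fields of the real payload.

```python
import base64
import gzip
import json


def decode_subscription_payload(encoded):
    """Decode base64 text wrapping gzip-compressed JSON into a dict."""
    compressed = base64.b64decode(encoded)
    return json.loads(gzip.decompress(compressed))


# Build a fake payload locally so the sketch runs without AWS access.
sample = {
    "logGroup": "/hypothetical/app",
    "logEvents": [{"id": "1", "timestamp": 0, "message": "hello"}],
}
encoded = base64.b64encode(gzip.compress(json.dumps(sample).encode("utf-8")))
decoded = decode_subscription_payload(encoded)
print([event["message"] for event in decoded["logEvents"]])
```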
42
Q

What streaming service should you use if you have reference data in S3 that needs to be joined/merged?

A

Kinesis Data Analytics