What are the 4 options to send data into a Kinesis Stream?
- Kinesis SDK
- Kinesis Producer Library (KPL)
- Kinesis Agent
- 3rd party libraries
What is the Kinesis Producer Library (KPL)?
It is a more advanced, higher-throughput way to send data to the stream than the plain SDK, provided as a C++/Java library
What is the Kinesis Agent?
It is an agent that runs on a Linux server, monitors log files, and sends their contents to Kinesis Data Streams
What are some examples of 3rd party libraries to connect to Kinesis?
- Apache Spark
- Kafka Connect
- Apache NiFi
- Log4J Appenders
- Apache Flume
When would you decide to use Kinesis SDK?
Cases with low throughput, tolerance for higher latency, and a need for a simple API, e.g. sending from AWS Lambda
What would a “ProvisionedThroughputExceeded” exception indicate?
It indicates you have exceeded your data-sending limits (the per-shard capacity of 1 MB/s or 1,000 records/s)
What are the 3 possible solutions to resolve “ProvisionedThroughputExceeded”?
- Retry with backoff
- Increase shards (scaling)
- Ensure your partition key is well distributed (avoid hot shards)
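The first solution, retry with backoff, can be sketched in Python. This is a minimal illustration, not AWS code: `put_record` here is a hypothetical zero-argument callable standing in for the real SDK call, and the exception is modelled as a plain `Exception`.

```python
import random
import time

def put_with_backoff(put_record, max_retries=5, base_delay=0.1):
    """Retry a put with exponential backoff plus jitter.

    `put_record` is a hypothetical callable that raises an exception
    (standing in for ProvisionedThroughputExceeded) when the shard
    limit is hit, and returns a result on success.
    """
    for attempt in range(max_retries):
        try:
            return put_record()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the error
            # Exponential backoff: base, 2x base, 4x base, ... plus jitter
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

Each retry doubles the wait, so a briefly overloaded shard gets time to drain before the next attempt, and the jitter keeps many producers from retrying in lockstep.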
What are the 8 features of the Kinesis Producer Library (KPL)?
- Easy to use and highly configurable with C++/Java
- Used for building high performance, long-running producers
- Automated and configurable retry mechanism
- Synchronous or Asynchronous API
- Submits metrics to CloudWatch for monitoring
- 2 Batch features enabled by default
- Compression is not built in; it must be implemented by the user
- KPL records must be decoded by KCL
What are the 2 Kinesis Producer Library (KPL) Batch features enabled by default?
- Collect records and write to multiple shards in the same PutRecords API call
- Aggregate records
What are the 2 things that happen when Kinesis Producer Library (KPL) Aggregates records?
- Store multiple records in 1 record, allowing you to go over the 1,000 records per second per shard limit
- Increase payload size and improve throughput (up to the 1 MB/s per-shard limit)
What is an example of Kinesis Producer Library (KPL) Batching records?
The per-record limit in Kinesis is 1 MB. If the KPL finds, say, 3 records whose combined size is under that limit, it will combine them into 1 record before sending.
How does KPL know how long to wait before sending batched records?
You can set this with the “RecordMaxBufferedTime”, which defaults to 100ms
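The buffering behaviour can be modelled with a toy producer. This is an illustrative sketch only, not the KPL's implementation: timestamps are passed in explicitly so the example is deterministic, and `max_records` is a made-up secondary flush trigger.

```python
class BufferingProducer:
    """Toy model of the KPL's RecordMaxBufferedTime behaviour.

    Records accumulate in a buffer; the buffer is flushed once its age
    reaches `max_buffered_ms` (default 100 ms, matching the KPL default)
    or it holds `max_records` entries.
    """
    def __init__(self, max_buffered_ms=100, max_records=500):
        self.max_buffered_ms = max_buffered_ms
        self.max_records = max_records
        self.buffer = []
        self.first_buffered_at = None
        self.flushed = []  # batches that have been "sent"

    def put(self, record, now_ms):
        if not self.buffer:
            self.first_buffered_at = now_ms  # start the buffer's clock
        self.buffer.append(record)
        age = now_ms - self.first_buffered_at
        if len(self.buffer) >= self.max_records or age >= self.max_buffered_ms:
            self.flushed.append(self.buffer)  # flush as one batch
            self.buffer = []
```

A larger `RecordMaxBufferedTime` means bigger batches and better throughput, at the cost of extra latency on each record, which is the trade-off the setting exists to tune.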
What are the 4 features of Kinesis Agent?
- Write from multiple directories and write to multiple streams
- Routing feature based on directory/log file
- Pre-processing data before sending to streams (CSV to JSON, etc.)
- Handles log rotation and retry
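These features come together in the agent's JSON configuration. The sketch below is illustrative: the stream name, file path, and field names are made up, but the `flows` / `filePattern` / `kinesisStream` / `dataProcessingOptions` structure and the `CSVTOJSON` option follow the agent's documented config format.

```json
{
  "flows": [
    {
      "filePattern": "/var/log/app/*.log",
      "kinesisStream": "my-example-stream",
      "dataProcessingOptions": [
        {
          "optionName": "CSVTOJSON",
          "customFieldNames": ["timestamp", "level", "message"]
        }
      ]
    }
  ]
}
```

Each entry in `flows` routes one directory/file pattern to one stream, which is how the routing and multi-stream features above are expressed.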