AWS Kinesis Data Firehose Flashcards

1
Q

What is Kinesis Data Firehose?

A

A fully managed, serverless service: no administration and automatic scaling. It takes data from producers, can optionally transform it using Lambda, and then writes it in batches to a destination.

Destinations:
* AWS: Amazon Redshift / Amazon S3 / Amazon OpenSearch
* 3rd-party partners: Splunk / MongoDB / Datadog / New Relic
* Custom: send to any HTTP endpoint

  • Pay only for the data going through Firehose

Near real time (key exam hint)
* Minimum 60 seconds of latency for batches that are not full
* Or a minimum of 1 MB of data at a time

  • Supports many data formats, conversions, transformations, compression
  • Supports custom data transformations using AWS Lambda
  • Can send failed or all data to a backup S3 bucket

Firehose scales automatically, does not support replay (unlike Kinesis Data Streams), and does not store data.
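
The Lambda transformation step can be sketched as below. This is a minimal example, assuming the records carry JSON payloads; the `processed` field is a hypothetical transformation added for illustration. The `recordId` / `result` / `data` fields and the `Ok` / `ProcessingFailed` statuses follow the Firehose data-transformation contract.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and report a per-record status."""
    output = []
    for record in event["records"]:
        try:
            # Firehose delivers each payload base64-encoded.
            payload = json.loads(base64.b64decode(record["data"]))
            # Hypothetical transformation: tag the record as processed.
            payload["processed"] = True
            data = base64.b64encode(
                (json.dumps(payload) + "\n").encode("utf-8")
            ).decode("utf-8")
            output.append(
                {"recordId": record["recordId"], "result": "Ok", "data": data}
            )
        except (ValueError, TypeError):
            # Not valid JSON (or not a JSON object): hand the original data
            # back marked as failed so it can go to the backup S3 bucket.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every incoming `recordId` must appear in the response, which is why failed records are returned rather than skipped.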

2
Q

Explain Firehose Buffer Sizing

A

Firehose accumulates records in a buffer.
The buffer is flushed based on time and size rules, whichever is reached first:
* Buffer Size (e.g. 32 MB): if that buffer size is reached, it's flushed
OR
* Buffer Time (e.g. 1 minute): if that time is reached, it's flushed

  • Firehose can automatically increase the buffer size to increase throughput
  • High throughput => Buffer Size will be hit
  • Low throughput => Buffer Time will be hit
  • If a real-time flush from Kinesis Data Streams to S3 is needed, use Lambda instead
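
The size-or-time rule can be illustrated with a toy buffer. This is only a sketch of the flushing logic, not how Firehose is implemented; the class name `FirehoseLikeBuffer` and its parameters are hypothetical.

```python
import time


class FirehoseLikeBuffer:
    """Toy buffer that flushes on size OR age, whichever comes first."""

    def __init__(self, max_bytes, max_seconds, clock=time.monotonic):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.clock = clock
        self.records = []
        self.size = 0
        self.started = None  # arrival time of the first record in the batch

    def add(self, record: bytes):
        """Add a record; return the flushed batch if a rule fired, else None."""
        if self.started is None:
            self.started = self.clock()
        self.records.append(record)
        self.size += len(record)
        return self._flush_if_due()

    def tick(self):
        """Check the time rule without adding data (e.g. from a timer)."""
        return self._flush_if_due()

    def _flush_if_due(self):
        if not self.records:
            return None
        too_big = self.size >= self.max_bytes  # high throughput hits this
        too_old = self.clock() - self.started >= self.max_seconds  # low throughput hits this
        if too_big or too_old:
            batch = self.records
            self.records, self.size, self.started = [], 0, None
            return batch
        return None
```

With a high record rate the size check fires first; with a trickle of data the batch waits until the time limit, which is why delivery is near real time rather than real time.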