Domain 1: Collection Flashcards

1
Q

Which Kinesis producer option offers an asynchronous API and high throughput?

A

Kinesis Producer Library

2
Q

Where must compression be implemented in Kinesis?

A

By the end user

3
Q

How many GetRecords API calls are allowed per second by Kinesis streams in Classic mode?

A
  • Maximum of 5 GetRecords API calls per shard per second (about 200 ms latency)
  • If 5 consumer applications read from the same shard, each consumer can poll once per second and receives less than 400 KB/s
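
The arithmetic behind this card can be sketched in a few lines. This is only an illustration: the function name is made up, and 2 MB/s is treated as 2000 KB/s to get round numbers.

```python
# Classic (shared) consumer limits per shard, taken from the cards in this deck.
MAX_GETRECORDS_CALLS_PER_SEC = 5
SHARD_READ_KB_PER_SEC = 2000  # 2 MB/s, assuming 1 MB = 1000 KB for round numbers


def classic_consumer_share(num_consumers: int) -> tuple[float, float]:
    """Return (polls per second, KB/s) available to each consumer on one shard."""
    polls = MAX_GETRECORDS_CALLS_PER_SEC / num_consumers
    kb_per_sec = SHARD_READ_KB_PER_SEC / num_consumers
    return polls, kb_per_sec


for n in (1, 2, 5):
    polls, kb = classic_consumer_share(n)
    print(f"{n} consumer(s): {polls:.1f} polls/s each, up to {kb:.0f} KB/s each")
```

Because both limits are shared per shard, adding consumers in Classic mode divides both the polling rate and the bandwidth each one gets.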
4
Q

What is the average latency in Kinesis Streams Enhanced Fan-Out mode?

A

70ms

5
Q

What is the throughput of Kinesis Consumer Classic mode?

A

2 MB/s per shard, shared across all consumers

6
Q

What are the 4 services that Kinesis Firehose can write to?

A

S3, Redshift, ElasticSearch, Splunk

7
Q

Describe the key features of the Kinesis Producer Library (KPL)

A
  • Used for building high performance, long-running producers
  • Automated and configurable retry mechanism
  • Synchronous or Asynchronous API (better performance for async)
  • Submits metrics to CloudWatch for monitoring
  • Batching (Collect and Aggregate)
  • Compression must be implemented by the user
  • KPL records must be decoded with the KCL or a special helper library
8
Q

Which protocol is not supported by IoT Device Gateway?

A

FTP

9
Q

What is the minimum latency for Firehose with non-full batches?

A

60 seconds

10
Q

What data conversions are possible using Firehose with S3?

A

JSON to Parquet/ORC

11
Q

What data transformations are possible using Firehose with Lambda?

A

CSV to JSON

12
Q

What compression algorithms are supported by Firehose with S3?

A

GZIP, ZIP, SNAPPY

13
Q

What compression algorithm is supported by Firehose with Redshift?

A

GZIP

14
Q

How are you charged on Firehose?

A

Amount of data going through Firehose

15
Q

Can Spark and KCL read from Firehose?

A

No. They can only read from Kinesis Data Streams

16
Q

What is the minimum buffer time in Firehose?

A

60 seconds

17
Q

Can resharding be done in parallel?

A

No

18
Q

To how many AZs is data replicated in Kinesis Data Streams?

A

3

19
Q

What is the default retention period in Kinesis Data Streams?

A

24 hours

(configurable up to 365 days)

20
Q

Can data be deleted from Kinesis streams?

A

No

21
Q

What is a key best practice with partition keys in Kinesis Streams?

A

Highly distributed keys
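
Why distribution matters can be shown with a small sketch. Kinesis maps each partition key to a 128-bit integer with MD5 and routes the record to the shard whose hash key range contains it. The helper names below are illustrative, and the sketch assumes the shards split the hash space evenly, which is not guaranteed after resharding.

```python
import hashlib
from collections import Counter

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit hash key


def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * num_shards // HASH_SPACE


def distribution(keys, num_shards):
    """Count how many records land on each shard."""
    return Counter(shard_for_key(k, num_shards) for k in keys)


# One repeated key sends every record to the same (hot) shard.
hot = distribution(["tenant-1"] * 1000, 4)
# Many distinct keys spread the records across all shards.
spread = distribution([f"device-{i}" for i in range(1000)], 4)
print("single key:", dict(hot))
print("distinct keys:", dict(spread))
```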

22
Q

What is the maximum size of data blobs in Kinesis?

A

1 MB

23
Q

What are the throughput limits for Kinesis producers?

A

1 MB/s or 1,000 records/s per shard on write
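
These two limits can be turned into a rough shard estimate. The sketch below is a sizing aid under stated assumptions, not an official formula: the function name is hypothetical, and it assumes traffic is spread evenly across shards.

```python
import math

# Per-shard write limits from the card above.
MAX_WRITE_MB_PER_SEC = 1.0
MAX_WRITE_RECORDS_PER_SEC = 1000


def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    """Estimate the shard count needed to stay under both write limits."""
    by_bytes = math.ceil(mb_per_sec / MAX_WRITE_MB_PER_SEC)
    by_records = math.ceil(records_per_sec / MAX_WRITE_RECORDS_PER_SEC)
    # Whichever limit is hit first determines the shard count.
    return max(1, by_bytes, by_records)


print(shards_needed(3.5, 2000))   # bandwidth-bound workload
print(shards_needed(0.5, 4500))   # many small records: record-count-bound
```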

24
Q

What happens if you exceed throughput limits on Kinesis producers?

A

ProvisionedThroughputExceededException

25
What are the throughput limits for Kinesis consumers in Classic mode?
  • 2 MB/s per shard across all consumers
  • 5 API calls per second per shard across all consumers
26
What are the throughput limits for Kinesis consumers in EFO mode?
  • 2 MB/s per shard per enhanced consumer
  • No API calls needed (data is pushed to the consumer)
27
Is Kinesis EFO a push or pull model?
Push
28
What are the use cases for Kinesis Producer SDK?
Low throughput, tolerance for higher latency, a need for a simple API, or producing from AWS Lambda
29
Can Kinesis Data Analytics produce back into Kinesis Data Streams?
Yes
30
What is the first troubleshooting step if you get a ProvisionedThroughputExceededException?
Check for hot shards (bad partition key)
31
How do you remediate a ProvisionedThroughputExceededException?
- Retry with backoff
- Scale up your shards
- Improve your partition keys
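
The first remediation, retries with backoff, can be sketched as follows. Everything here is a stand-in: `ThroughputExceeded` plays the role of the real throughput exception, and `put` is any callable that sends one record. It is not the AWS SDK API.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the service's throughput-exceeded error."""


def put_with_backoff(put, record, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call put(record), retrying with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return put(record)
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # The delay ceiling doubles on each attempt; the jitter spreads out retries.
            sleep(random.uniform(0, base_delay * 2 ** attempt))


# Demo with a fake producer that is throttled twice before succeeding.
attempts = []


def flaky_put(record):
    attempts.append(record)
    if len(attempts) < 3:
        raise ThroughputExceeded()
    return "ok"


result = put_with_backoff(flaky_put, b"payload", sleep=lambda seconds: None)
print(result, "after", len(attempts), "attempts")
```

Backoff only smooths over short bursts. If throttling persists, the other two remedies (more shards, better partition keys) address the cause.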
32
In what languages is the Kinesis Producer Library available?
C++ and Java
33
Which Kinesis producer should be used for asynchronous requirements (or more performant requirements)?
Kinesis Producer Library (KPL)
34
By what methods can Kinesis Producer Library records be decoded?
Kinesis Client Library or special helper library (Lambda)
35
What configuration item can be used to adjust buffer times for KPL batches? What is the default configuration?
RecordMaxBufferedTime; 100 ms
36
Which Kinesis consumer option offers checkpointing using DynamoDB?
Kinesis Client Library (KCL)
37
On what service must Kinesis Connector Libraries run?
EC2
38
What services can Kinesis Connector Libraries write to?
S3, DynamoDB, Elasticsearch, Redshift
39
Which two tools have mostly replaced the use case for Kinesis Connector Libraries?
Firehose and Lambda
40
What happens to the data in the old shard after it has been split or merged?
It will be deleted once the shard expires
41
Which protocols are supported by IoT Gateway?
MQTT, WebSockets, HTTP 1.1
42
What is IoT Message Broker?
Pub/Sub messaging tool, used for devices to communicate with each other
43
What is IoT Thing Registry?
IAM for IoT, supports metadata, creates X.509 certificates, provides IoT Groups
44
What are the three authentication methods for IoT Things?
X.509 certs, AWS SigV4, Custom tokens
45
Where are IoT Rules Engine rules defined?
On the MQTT topics
46
What is IoT Greengrass?
Allows compute (Lambda functions) to be executed on the IoT Thing itself.
47
What are the 3 types of Collection (frequency)?
1. Real-time (KDS, SQS, IoT)
2. Near real-time (KDF, DMS)
3. Batch (Snowball, Data Pipeline)
48
What are the three parts of a Kinesis Stream Record?
1. Data blob: where the data is stored; at most 1 MB.
2. Record key (partition key): sent alongside a record; it groups records into shards. Same key = same shard.
3. Sequence number: unique identifier for each record put in a shard. Added by Kinesis after ingestion.
49
What are the use cases for Kinesis Agent?
* Monitors log files and sends them to Kinesis Data Streams
* Java-based agent, built on top of the KPL
* Installed in Linux-based server environments
50
What consumers are available for Kinesis Streams (Classic)?
* Kinesis SDK
* Kinesis Client Library (KCL)
* Kinesis Connector Library
* Third-party libraries: Spark, Log4J Appenders, Flume, Kafka Connect…
* Kinesis Firehose
* AWS Lambda
51
When consuming data to DynamoDB, using KCL, what should you do if you get an ExpiredIteratorException?
The KCL raises this exception because DynamoDB is not fast enough to keep up with the writes. To solve that, increase the WCU (Write Capacity Units) of the DynamoDB table.
52
What are the key differences between Enhanced Fan-Out (EFO) and Standard consumers?
**Standard consumers:**
* Low number of consuming applications (1, 2, 3…)
* Can tolerate ~200 ms latency
* Minimize cost
**Enhanced Fan-Out consumers:**
* Multiple consumer applications for the same stream
* Low latency requirements (~70 ms)
* Higher costs (see Kinesis pricing page)
* Default limit of 5 consumers using enhanced fan-out per data stream
53
What are “out-of-order” records after resharding, and how do you avoid them?
* If you start reading the child shard before you finish reading the parent, you could read data for a particular hash key out of order
* To avoid that, after a reshard, read entirely from the parent until it has no new records
* Note: the Kinesis Client Library (KCL) has this logic built in, even after resharding operations
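
The "parent before child" rule amounts to ordering shards by lineage. Here is a minimal sketch, assuming a hypothetical `parents` mapping from each shard ID to the IDs of the shards it was split or merged from. It is not how the KCL is implemented, only the ordering idea.

```python
def reading_order(parents):
    """Return shard IDs ordered so every parent comes before its children.

    `parents` maps shard_id -> list of parent shard IDs (empty for original shards).
    """
    order, done = [], set()

    def visit(shard_id):
        if shard_id in done:
            return
        # Finish every parent before the shard itself.
        for parent_id in parents.get(shard_id, []):
            visit(parent_id)
        done.add(shard_id)
        order.append(shard_id)

    for shard_id in sorted(parents):
        visit(shard_id)
    return order


# shard-0 was split into shard-1 and shard-2.
lineage = {"shard-0": [], "shard-1": ["shard-0"], "shard-2": ["shard-0"]}
order = reading_order(lineage)
print(order)
```

A consumer that drains shards in this order never sees a child's records for a hash key before the parent's older records for the same key.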
54
How can consumers handle duplicates created by producers?
* Producers may embed a unique record ID so consumers can recognize duplicates
* Consumers can be idempotent, meaning that processing the same record twice has no additional effect
* Or you can handle the duplication at the final destination, for example in a database
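
The first two options can be combined in a short sketch: the producer embeds an ID, and the consumer skips IDs it has already processed. The record layout (`id`, `value`) and the in-memory `seen_ids` set are assumptions for illustration. A real consumer would keep the seen IDs in durable storage.

```python
def process_once(records, seen_ids, handle):
    """Process each record at most once, keyed on its producer-assigned ID."""
    processed = 0
    for record in records:
        record_id = record["id"]
        if record_id in seen_ids:
            continue  # duplicate caused by a producer retry; skip it
        handle(record)
        seen_ids.add(record_id)
        processed += 1
    return processed


# A producer retry delivered record "a-1" twice.
incoming = [
    {"id": "a-1", "value": 10},
    {"id": "a-2", "value": 20},
    {"id": "a-1", "value": 10},
]
seen = set()
handled = []
count = process_once(incoming, seen, handled.append)
print(count, [r["id"] for r in handled])
```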
55
What are the key differences between Kinesis Data Streams and Firehose?
**Streams**
* Requires writing custom code (producer / consumer)
* Real time (~200 ms latency for classic, ~70 ms latency for enhanced fan-out)
* Must manage scaling (shard splitting / merging)
* Data storage for 1 to 365 days, replay capability, multiple consumers
* Use with Lambda to insert data in real time into ElasticSearch (for example)
**Firehose**
* Fully managed; sends to S3, Splunk, Redshift, ElasticSearch
* Serverless data transformations with Lambda
* Near real time (lowest buffer time is 1 minute)
* Automated scaling
* No data storage
56
What are the 3 services that can receive a stream of CloudWatch Logs Subscription Filters?
1. Firehose, for near real-time delivery (data may be cleaned or enriched with Lambda)
2. Lambda, for real-time processing
3. Kinesis Streams, if analytics is needed
57
How can you get High Resiliency and Maximum Resiliency with Direct Connect?
**High Resiliency:** one connection at multiple locations
**Maximum Resiliency:** separate connections terminating on separate devices in more than one location
58
What does MSK stand for?
* Managed Streaming for Apache Kafka
* It is an alternative to Kinesis Data Streams
59
What are the key differences between Kinesis Data Streams and MSK?
60
Which of the following does Firehose not write to?
* S3
* Redshift
* DynamoDB
* ElasticSearch / OpenSearch
* Splunk
DynamoDB