D4: Analysis Flashcards

1
Q

What are the main sources of data for Kinesis Analytics?

A

Kinesis Data Streams (KDS) and Kinesis Data Firehose (KDF)
S3 can store reference tables that are joined against the incoming data to enrich it

2
Q

How can Kinesis Analytics Reference Tables be used?

A
  • You store a mapping file in S3 and make that available as a reference table in Kinesis Analytics
  • You use a JOIN in SQL to join the reference table against the data that comes in.
  • E.g., you have zip codes in the data, but want to enrich that with the city names.
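
The zip code enrichment above can be sketched in plain Python. This is only an illustration of the join logic, not Kinesis Analytics code: the CSV contents, the field names (`zip`, `city`, `order_id`) and both helper functions are hypothetical, and the lookup keeps unmatched records (a left join), which is an assumption made here for clarity.

```python
import csv
import io

# Hypothetical mapping file, standing in for the CSV object stored in S3.
REFERENCE_CSV = """zip,city
10001,New York
94105,San Francisco
"""


def load_reference_table(csv_text):
    """Build a zip -> city lookup from the mapping file contents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["zip"]: row["city"] for row in reader}


def enrich(records, reference):
    """Join each incoming record to the reference table on its zip code.

    Records with no match keep city=None (left-join behaviour, an assumption).
    """
    for record in records:
        yield {**record, "city": reference.get(record["zip"])}


reference = load_reference_table(REFERENCE_CSV)
stream = [{"order_id": 1, "zip": "10001"}, {"order_id": 2, "zip": "99999"}]
enriched = list(enrich(stream, reference))
```

The reference table is loaded once and reused for every record, which mirrors why the mapping lives in S3 rather than in the stream itself.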
3
Q

How are errors handled in Kinesis Analytics?

A

There is an error stream that records are written to when error conditions occur.

4
Q

What are the possible destinations for Kinesis Analytics?

A

KDS, KDF, and Lambda
Once you send data to Lambda, that opens up several other destinations that Lambda integrates with (e.g. S3, DynamoDB, Redshift, SNS, SQS, CloudWatch, etc.)

5
Q

How does KDA for Apache Flink work?

A
  • Flink is an open source framework for handling data streams.
  • You develop your Flink application, store it in S3, and reference it when you set up KDA for Flink
  • Serverless, you don't need to worry about where/how Flink runs
6
Q

What are some common use cases for KDA?

A
  • Streaming ETL
  • Continuous metric generation
  • Responsive analytics - e.g. computing the availability or success rate of a customer-facing API over time and sending that to CloudWatch
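
The API availability example can be sketched as a windowed aggregation. This is a minimal illustration in plain Python rather than KDA code. It assumes events arrive as `(timestamp_seconds, http_status)` pairs, that any non-5xx response counts as a success, and that fixed 60-second (tumbling) windows are used; the function name and sample events are hypothetical.

```python
from collections import defaultdict


def availability_per_window(events, window_seconds=60):
    """Group (timestamp, status) events into tumbling windows.

    Returns {window_start: fraction of responses that were not 5xx}.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for timestamp, status in events:
        # Align each event to the start of its fixed-size window.
        window_start = int(timestamp // window_seconds) * window_seconds
        totals[window_start] += 1
        if status < 500:
            successes[window_start] += 1
    return {w: successes[w] / totals[w] for w in sorted(totals)}


# Made-up sample events: four in the first minute, two in the second.
events = [(0, 200), (10, 500), (30, 200), (59, 200), (61, 503), (90, 200)]
result = availability_per_window(events)
```

Each per-window value is the kind of metric that would then be published to CloudWatch.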
7
Q

KDA Cost Model, Security

A

Serverless, you pay for what you consume, measured in Kinesis Processing Units (KPUs)
1 KPU = 1 vCPU + 4 GB memory
Use IAM permissions to access the streaming source
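
The KPU arithmetic can be worked through in a few lines. This sketch only uses the 1 vCPU + 4 GB per KPU ratio from the card; the `estimate` helper, the `KpuEstimate` type and the hourly rate are hypothetical, and the rate in particular is a placeholder, not a published AWS price.

```python
from dataclasses import dataclass

VCPU_PER_KPU = 1        # from the card: 1 KPU = 1 vCPU
MEMORY_GB_PER_KPU = 4   # from the card: 1 KPU = 4 GB memory


@dataclass
class KpuEstimate:
    vcpus: int
    memory_gb: int
    cost: float


def estimate(kpus, hours, price_per_kpu_hour):
    """Translate a KPU count into resources and a pay-per-use cost."""
    return KpuEstimate(
        vcpus=kpus * VCPU_PER_KPU,
        memory_gb=kpus * MEMORY_GB_PER_KPU,
        cost=kpus * hours * price_per_kpu_hour,
    )


# Hypothetical rate of 0.10 per KPU-hour, chosen only to make the math easy.
est = estimate(kpus=2, hours=10, price_per_kpu_hour=0.10)
```

With these placeholder inputs, 2 KPUs correspond to 2 vCPUs and 8 GB of memory, and the cost scales linearly with both KPUs and hours.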

8
Q

What is KDA Schema Discovery?

A

KDA can analyze an incoming stream to discover the schema
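
To make the idea concrete, here is a rough sketch of inferring a schema from a sample of records. It is not how KDA implements discovery; it assumes the sample is newline-style JSON objects, and the widening rule (int plus float becomes float, any other conflict becomes str) and the function name are assumptions for illustration.

```python
import json


def infer_schema(sample_lines):
    """Infer {column: type name} from a sample of JSON records."""
    schema = {}
    for line in sample_lines:
        record = json.loads(line)
        for key, value in record.items():
            observed = type(value).__name__
            previous = schema.get(key)
            if previous is None:
                schema[key] = observed
            elif previous != observed:
                # Conflicting types: widen int/float to float, else use str.
                if {previous, observed} == {"int", "float"}:
                    schema[key] = "float"
                else:
                    schema[key] = "str"
    return schema


# Made-up sample records.
sample = [
    '{"ticker": "AMZN", "price": 101, "volume": 5}',
    '{"ticker": "MSFT", "price": 99.5, "volume": 7}',
]
schema = infer_schema(sample)
```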

9
Q

What is RANDOM_CUT_FOREST?

A

It is a SQL function used for anomaly/outlier detection on numeric columns in the stream.
Example: detect anomalous subway ridership during the NYC Marathon.
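
The idea of assigning each numeric value an anomaly score can be illustrated with a much simpler stand-in. The sketch below uses a z-score (distance from the mean in standard deviations), which is not the random cut forest algorithm and only shows the "higher score means more unusual" pattern. The ridership numbers are made up.

```python
from statistics import mean, stdev


def anomaly_scores(values):
    """Score each value by its distance from the mean, in standard deviations.

    A z-score stand-in, NOT the random cut forest algorithm.
    """
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [abs(v - mu) / sigma for v in values]


# Made-up hourly ridership counts with one obvious spike.
ridership = [100, 102, 98, 101, 99, 250, 100]
scores = anomaly_scores(ridership)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

Unlike this batch calculation, RANDOM_CUT_FOREST scores records as they stream in, so treat the sketch as an intuition aid only.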
