Data Warehousing Flashcards

(91 cards)

1
Q

Is Redshift good for ELT?

A

Yes

2
Q

Can a Lambda function be triggered by AWS IoT?

A

Yes

3
Q

Can a Lambda function be triggered by Kinesis?

A

Yes

4
Q

Can Apache Spark notebooks run on EMR?

A

Yes

5
Q

Can Apache Spark read from S3?

A

Yes

6
Q

Can Apache Zeppelin be used to visualize data in Amazon Redshift?

A

Yes

7
Q

Is Redshift a columnar database?

A

Yes

8
Q

Is Redshift MPP?

A

Yes

9
Q

Is Redshift ANSI SQL Compliant?

A

Yes

10
Q

In addition to data compression and columnar storage, how is I/O reduced in Redshift?

A

Zone maps. A zone map exists for each 1 MB block and consists of in-memory metadata that tracks the minimum and maximum values within the block. If a column is sorted (e.g. a date column), Redshift can use the zone maps to skip blocks that cannot contain the requested values, so it finds the relevant blocks faster. Amazon Redshift does not use indexes the way a conventional database does.
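
A minimal sketch of the idea, not Redshift's actual implementation: each block keeps only its minimum and maximum, and a range filter skips any block whose range cannot overlap the query. The `Block` class and `blocks_to_scan` function are hypothetical names, and the three-value blocks stand in for 1 MB blocks.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list  # column values stored in this block

    @property
    def zone(self):
        # Zone map entry: (min, max) of the values in the block
        return min(self.values), max(self.values)


def blocks_to_scan(blocks, lo, hi):
    """Return indexes of blocks whose [min, max] range overlaps [lo, hi]."""
    hits = []
    for i, block in enumerate(blocks):
        bmin, bmax = block.zone
        if bmax >= lo and bmin <= hi:
            hits.append(i)
    return hits


# Sorted column: each block covers a narrow, non-overlapping range
sorted_blocks = [Block([1, 2, 3]), Block([4, 5, 6]), Block([7, 8, 9])]
# Unsorted column: every block spans most of the value range
unsorted_blocks = [Block([1, 5, 9]), Block([2, 6, 7]), Block([3, 4, 8])]

print(blocks_to_scan(sorted_blocks, 4, 5))    # [1]
print(blocks_to_scan(unsorted_blocks, 4, 5))  # [0, 1, 2]
```

With the sorted column only one block has to be read; with the unsorted column the zone maps cannot rule out any block.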

11
Q

Can Redshift Clusters be managed via API?

A

Yes

12
Q

Does Redshift support ODBC and JDBC?

A

Yes

13
Q

Describe the Redshift architecture.

A

One leader node that communicates with multiple compute nodes, which house the data.

14
Q

Does Redshift encrypt data at rest?

A

Yes, using AES-256

15
Q

Does Amazon Redshift take care of key management?

A

Yes

16
Q

Anti-Patterns for Redshift

A

Small datasets, OLTP, Unstructured data, BLOB data

17
Q

What are the 2 methods used to write to Kinesis Firehose?

A

PutRecord and PutRecordBatch

18
Q

What is the max size for a Firehose PutRecord?

A

1,000 KB

19
Q

Kinesis Agent

A

A stand-alone Java application that sends data to Kinesis Streams and Kinesis Firehose. It can be installed on Linux servers.

20
Q

Can the Kinesis Agent monitor multiple files and write to multiple streams?

A

Yes

21
Q

What is the max buffer size for Kinesis Firehose?

A

3 MB

22
Q

Can Kinesis Firehose invoke a Lambda function?

A

Yes

23
Q

Why should a record separator be added to Kinesis Stream data?

A

Kinesis bundles records together on delivery. If you don’t add a record separator (such as a newline), you can’t split the records later.
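
A small sketch of why the separator matters, assuming JSON records and a newline as the separator (both are illustrative choices; `with_separator` and `split_records` are hypothetical helper names, not part of any AWS SDK).

```python
import json


def with_separator(record: dict) -> bytes:
    # Append a newline so records stay distinguishable after concatenation
    return (json.dumps(record) + "\n").encode("utf-8")


def split_records(blob: bytes) -> list:
    # Recover the individual records from a concatenated payload
    return [json.loads(line) for line in blob.decode("utf-8").splitlines() if line]


events = [{"id": 1}, {"id": 2}]

# With separators, the bundled payload splits cleanly back into records
delivered = b"".join(with_separator(e) for e in events)
print(split_records(delivered))  # [{'id': 1}, {'id': 2}]

# Without separators, the records run together into one unparseable string
glued = b"".join(json.dumps(e).encode("utf-8") for e in events)
print(glued)  # b'{"id": 1}{"id": 2}'
```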

24
Q

What are the Firehose buffer sizes for S3?

A

1 MB - 128 MB

25
What are the buffer intervals for S3?
60 to 900 seconds
26
Can Kinesis Firehose dynamically raise the buffer size?
Yes
27
What does the Redshift COPY command do?
Copies data from DynamoDB or S3 into an existing Redshift table
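As a rough illustration, a COPY statement for loading JSON files from S3 can be assembled like this. The helper name, table, bucket, and role ARN are placeholders invented for the example, and the function does no escaping, so it is a sketch rather than something to run against untrusted input.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement that loads JSON files from S3."""
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto';"
    )


# Placeholder values, not real resources
statement = build_copy_statement(
    table="events",
    s3_uri="s3://example-bucket/events/",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

The target table must already exist; COPY loads into it and does not create it.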
28
Before you send a record to Kinesis Firehose, what do you need to do?
Flatten the record into a single JSON object and make sure it is UTF-8 encoded
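One possible way to do the flattening, sketched with the standard library only. The underscore-joined key names and the helper names `flatten` and `to_firehose_bytes` are assumptions made for the example, not a Firehose requirement.

```python
import json


def flatten(record: dict, parent_key: str = "", sep: str = "_") -> dict:
    """Collapse nested dictionaries into a single-level dictionary."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def to_firehose_bytes(record: dict) -> bytes:
    # Serialize the flattened record as one JSON object, encoded as UTF-8
    return json.dumps(flatten(record), ensure_ascii=False).encode("utf-8")


nested = {"user": {"id": 7, "name": "Zoë"}, "action": "login"}
print(to_firehose_bytes(nested).decode("utf-8"))
# {"user_id": 7, "user_name": "Zoë", "action": "login"}
```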
29
What is the Elasticsearch buffer size range?
1 MB to 100 MB
30
What is the buffer interval for Elasticsearch?
60 to 900 seconds
31
Describe Kinesis Analytics
A SQL-based query service that can aggregate data in a stream and output to a Kinesis stream or a Lambda function
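Kinesis Analytics expresses this kind of aggregation in SQL. The sketch below is only a Python stand-in for the idea of aggregating a stream over fixed (tumbling) time windows; the function name and the `(timestamp, key)` event shape are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_seconds):
    """Count events per key within fixed, non-overlapping time windows."""
    counts = defaultdict(int)
    for timestamp, key in events:
        # Align each event to the start of the window it falls into
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# (timestamp in seconds, key) pairs standing in for stream records
events = [(0, "a"), (30, "a"), (59, "b"), (60, "a"), (119, "b")]
print(tumbling_window_counts(events, 60))
# {(0, 'a'): 2, (0, 'b'): 1, (60, 'a'): 1, (60, 'b'): 1}
```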
32
What is the maximum time a Lambda Function can run?
15 minutes (the limit was 5 minutes until October 2018)
33
How do Kinesis Stream and Kinesis Firehose differ?
Kinesis Streams: The more customizable option, Streams is best suited for developers building custom applications or streaming data for specialized needs. That customizability, however, requires manual scaling and provisioning. Data is typically available in a stream for 24 hours, but for an additional cost, users can extend availability to up to seven days.
Kinesis Firehose: The simpler approach, Firehose handles loading data streams directly into AWS products for processing. Scaling is handled automatically, up to gigabytes per second, and Firehose allows for batching, encrypting, and compressing. It can deliver to S3, Elasticsearch Service, or Redshift, where data can be copied for processing through additional services.
34
What are some destinations for Kinesis Analytics?
Firehose, Streams, S3, Redshift, Elasticsearch
35
Can data be enriched via Kinesis Analytics?
Yes, but the reference data must be stored in S3, and Kinesis Analytics then creates an in-application reference table from it
36
What is a common use case for Kinesis Stream?
Read streaming data, analyze and aggregate it, and load the results into EMR or Redshift
37
Why would one use the KPL and KCL?
The KPL (Kinesis Producer Library) and KCL (Kinesis Client Library) are the Kinesis libraries that take care of load balancing, multi-threading, aggregation and de-aggregation, retries, scaling, and other functionality not in the Kinesis API. They sit between the producer and consumer programs and the streams.
38
How else is data placed into a Kinesis Stream?
Via the API or via an agent that is installed on each client. The agent monitors for file changes (e.g. log files)
39
What are the two modes of operation for the KPL?
Synchronous and asynchronous
40
Which KPL mode is the preferred practice?
Asynchronous
41
If you had to reduce end-to-end latency would you use KPL, Kinesis Agent, or the Kinesis API?
The Kinesis API
42
What languages does Lambda support?
AWS Lambda supports code written in Node.js (JavaScript), Python, Java (Java 8 compatible), C# (.NET Core), and Go. Your code can include existing libraries, even native ones.
43
What are the 3 ways to provision I/O capacity in a Kinesis stream?
Capacity can be provisioned in 1 MB increments via the API, console, or SDK
44
What can you tell me about data in a Kinesis stream?
It is stored for 24 hours by default, and replicated across 3 AZs.
45
Ideal Patterns for Kinesis Stream?
Real-time data analytics, log and data intake and processing, Real-time metrics and reporting
46
Is a Kinesis stream made up of shards?
Yes
47
How many read transactions per second does each shard give you?
5
48
How many MB per second can those 5 read transactions deliver?
2 MB
49
How many writes per second can a shard support?
1,000 records
50
How much data can be written to a shard per second?
1 MB
51
What determines the data capacity of your stream?
The number of shards
52
Each shard can capture how many MB per second?
1 MB
53
Each shard can be read at how many MB per second?
2 MB
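The per-shard figures from the cards above (1 MB and 1,000 records written per second, 2 MB read per second) can be turned into a rough shard estimate. This is a back-of-the-envelope sketch; the function name and the sample workload are made up for illustration.

```python
import math

# Per-shard limits quoted in the cards above
WRITE_MB_PER_SHARD = 1
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2


def shards_needed(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    return max(
        1,
        math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD),
        math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD),
        math.ceil(read_mb_per_s / READ_MB_PER_SHARD),
    )


# Hypothetical workload: 3.5 MB/s in, 2,500 records/s, 7 MB/s out
print(shards_needed(write_mb_per_s=3.5, records_per_s=2500, read_mb_per_s=7))  # 4
```

Whichever limit is hit first decides the shard count; here write volume and read volume both require four shards.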
54
In case of failure, where can you store the cursor for Kinesis?
DynamoDB
55
What is the Kinesis Storm Spout?
The Amazon Kinesis Storm Spout helps developers use Amazon Kinesis with Storm, an open source, distributed real-time computation system. This version of the Amazon Kinesis Storm Spout fetches data from the Amazon Kinesis stream and emits it as tuples that Storm topologies can process. Developers can add the Spout to their existing Storm topologies, and leverage Amazon Kinesis as a reliable, scalable, stream capture, storage, and replay service that powers their Storm processing applications.
56
Name two anti-patterns for Kinesis?
Long-term storage and small-scale consistent throughput
57
Name 5 ideal patterns for lambda?
real-time processing, real-time file processing, cron, AWS events, ETL
58
In what two modes can Lambda functions be invoked?
Synchronously and Asynchronously
59
What happens when a synchronously called Lambda function fails?
It throws an exception
60
What happens when an asynchronously invoked Lambda function fails?
It is attempted up to 3 times in total (the initial call plus two retries).
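A toy model of the retry behavior described in this card, not how the Lambda service is implemented (the real service also waits between attempts). `invoke_async` and `flaky` are hypothetical names used only for this illustration.

```python
def invoke_async(handler, event, max_attempts=3):
    """Call handler until it succeeds or max_attempts is reached."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Return the result together with the attempt that succeeded
            return handler(event), attempt
        except Exception as exc:
            last_error = exc
    raise RuntimeError(f"failed after {max_attempts} attempts") from last_error


calls = {"n": 0}


def flaky(event):
    # Fails on the first two calls, succeeds on the third
    calls["n"] += 1
    if calls["n"] < 3:
        raise ValueError("transient failure")
    return "ok"


print(invoke_async(flaky, {}))  # ('ok', 3)
```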
61
How many Lambda functions can run concurrently per account?
1,000 per Region by default (a soft limit that can be raised; older material cites 100)
62
What are the 3 anti-patterns for Lambda?
Long running apps. Dynamic websites. Stateful apps.
63
Ideal usage patterns for EMR?
log processing, ETL, Big Data, data mining
64
Is EMR fault-tolerant for core node failure?
Yes
65
Does EMR provision replacements for failed slave nodes?
No
66
Amazon EMR with MapR distribution has what advantage?
A no-NameNode architecture that can tolerate failure
67
Does EMR integrate with S3 and DynamoDB?
Yes
68
What is Spark?
An open-source, in-memory analytics engine
69
What is Impala?
A SQL query engine for Hadoop
70
What is HBase?
An open-source distributed database running on top of Hadoop
71
What is S3DistCp?
Apache DistCp is an open-source tool you can use to copy large amounts of data. S3DistCp is an extension of DistCp that is optimized to work with AWS, particularly Amazon S3. During a copy operation, S3DistCp stages a temporary copy of the output in HDFS on the cluster.
72
What is EMRFS?
An implementation of HDFS on S3. You can enable client-side and server-side encryption. Metadata is stored in DynamoDB.
73
Name 2 anti-patterns for EMR?
small data sets and ACID transactions
74
Name 2 anti-patterns for ML?
Very large datasets and unsupported learning tasks
75
What is DynamoDB Streams?
DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table, and stores this information in a log for up to 24 hours. ... A DynamoDB stream is an ordered flow of information about changes to items in an Amazon DynamoDB table.
76
What is the limit of data storage for DynamoDB?
None
77
What are the anti-patterns for DynamoDB?
Joins, ad hoc queries, BLOBs, and large data with a low I/O rate
78
What service would you use for OLAP/BI?
Redshift, because it has columnar storage. It is scalable and works with BI tools.
79
Where do Redshift clusters reside?
Within an AZ
80
Can Redshift clusters reside across multiple AZs?
If you set it up for replication manually, yes.
81
Name 4 Redshift anti-patterns.
ACID, BLOB, Unstructured and small datasets
82
What types of searches are done with Elasticsearch?
Text, structured data, analytics
83
Is Elasticsearch self-healing?
Yes, failed nodes are detected and replaced automatically
84
What does ES integrate with?
Logstash (log pipeline) and Kibana (analytics and visualization)
85
What is Elasticsearch suited for?
Log analysis and streaming data
86
Elasticsearch anti-patterns
OLTP and Petabyte Storage
87
QuickSight
Cloud-powered BI for visualization and ad hoc queries
88
AWS Shield
Managed DDoS protection
89
What is Cost Explorer?
Service that lets you gain insight into where costs are spent.
90
Spark Streaming
Extends the Spark API for stream processing; can be installed on EMR.
91
Spark SQL
Extends the Spark API; allows SQL queries alongside complex calculations.