Domain 4: Analysis Flashcards

1
Q

RANDOM_CUT_FOREST

A

Kinesis Data Analytics SQL (or Flink) Function for anomaly detection in numeric columns

2
Q

Kinesis Firehose Buffer Limits

A

1 to 128 MB

60 to 900 seconds
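A minimal Python sketch of how the two limits interact, assuming Firehose delivers a batch as soon as either the size or the interval threshold is reached, whichever comes first. The function name, parameter names, and default values are hypothetical and only model the rule; they are not part of any AWS SDK.

```python
def should_flush(buffered_mb: float, elapsed_s: float,
                 size_limit_mb: int = 5, interval_limit_s: int = 300) -> bool:
    """Return True when a buffered batch would be delivered."""
    # Configured limits must fall inside the ranges stated on the card.
    if not 1 <= size_limit_mb <= 128:
        raise ValueError("buffer size must be 1 to 128 MB")
    if not 60 <= interval_limit_s <= 900:
        raise ValueError("buffer interval must be 60 to 900 seconds")
    # Whichever threshold is hit first triggers delivery.
    return buffered_mb >= size_limit_mb or elapsed_s >= interval_limit_s
```

With these illustrative defaults, a slow trickle of data is still delivered once the interval elapses, and a burst is delivered as soon as the size limit fills.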

3
Q

Kinesis Data Analytics Supported Sources

A

Kinesis Streams and Kinesis Firehose

4
Q

Kinesis Data Analytics Supported Destinations

A

Kinesis Streams, Kinesis Firehose, Lambda

5
Q

What happens if a record arrives late to a Kinesis Data Analytics application

A

Record is written to the error stream

6
Q

In what form does Kinesis Data Analytics provision capacity?

A

Kinesis Processing Units

7
Q

How much memory is provided per KPU?

A

4GB

8
Q

What is the default number of KPU per Kinesis Data Analytics application?

A

8

9
Q

What is the name of the visualization tool in the Elastic Stack?

A

Kibana

10
Q

Is ElasticSearch Serverless?

A

No, you still have to scale servers

11
Q

What should ElasticSearch NOT be used for?

A

- OLTP (RDS or DynamoDB instead)
- Ad-Hoc Querying (Athena instead)

12
Q

How can data be imported to ElasticSearch?

A

Kinesis, DynamoDB, Logstash, Beats, ElasticSearch API

13
Q

What query engine does Athena use?

A

Presto

14
Q

What data formats does Athena support?

A

CSV, JSON, Parquet, ORC, Avro

15
Q

Is Athena serverless?

A

Yes

16
Q

Does Athena support unstructured data?

A

Yes

17
Q

Which data formats are columnar?

A

ORC and Parquet

18
Q

Which data formats are splittable?

A

ORC, Parquet, Avro

19
Q

Which notebooks can Athena integrate with?

A

Jupyter, Zeppelin, RStudio

20
Q

What is the cost rate for Athena?

A

$5 per TB scanned
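A rough Python sketch of the arithmetic behind this rate. It assumes 1 TB = 2**40 bytes and ignores any per-query minimum or rounding rules; the function name and sample sizes are hypothetical.

```python
def athena_query_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Estimate the charge for one query from the bytes it scanned."""
    tb_scanned = bytes_scanned / 1024 ** 4  # assumption: binary terabytes
    return tb_scanned * usd_per_tb


# Scanning a whole 2 TB dataset versus scanning only 200 GB of it.
full_scan = athena_query_cost(2 * 1024 ** 4)
pruned_scan = athena_query_cost(200 * 1024 ** 3)
```

Because the charge follows bytes scanned, anything that reduces the scan (partitioning, compression, columnar formats) reduces the bill proportionally.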

21
Q

Do cancelled queries count toward Athena charges?

A

Yes

22
Q

Do failed queries count toward Athena charges?

A

No

23
Q

What data format will be the most cost effective in Athena?

A

Columnar (ORC, Parquet)
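A toy Python illustration of why a columnar layout scans less data when a query needs only one column. This models the idea with in-memory lists and is not how ORC or Parquet are actually encoded; all names and sample rows are made up.

```python
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "DE", "amount": 7.5},
    {"id": 3, "country": "US", "amount": 3.2},
]

# Row-oriented: each record is stored whole, so reading one field
# still touches every value of every record.
row_store = [tuple(r.values()) for r in rows]

# Column-oriented: each column is stored contiguously and can be read alone.
column_store = {name: [r[name] for r in rows] for name in rows[0]}

values_read_row = sum(len(record) for record in row_store)
values_read_columnar = len(column_store["amount"])
```

Here summing `amount` touches 9 values in the row layout but only 3 in the columnar one, and fewer bytes scanned means a lower per-query charge.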

24
Q

Does Athena charge for DDL processing?

A

No

25
How can Athena results be encrypted?
Encrypt at rest in S3 using SSE-S3, SSE-KMS, CSE-KMS
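A small Python sketch of how one of these three options might be selected when submitting a query. It builds a dictionary in the shape boto3's Athena `start_query_execution` accepts for `ResultConfiguration`; the helper name, bucket, and key ARN are hypothetical placeholders.

```python
from typing import Optional

VALID_OPTIONS = {"SSE_S3", "SSE_KMS", "CSE_KMS"}


def result_configuration(output_location: str, option: str,
                         kms_key: Optional[str] = None) -> dict:
    """Build a ResultConfiguration dict with the chosen encryption option."""
    if option not in VALID_OPTIONS:
        raise ValueError(f"unsupported encryption option: {option}")
    encryption = {"EncryptionOption": option}
    if option != "SSE_S3":
        # The two KMS-based options need a key to be named.
        if not kms_key:
            raise ValueError("SSE_KMS and CSE_KMS require a KMS key")
        encryption["KmsKey"] = kms_key
    return {"OutputLocation": output_location,
            "EncryptionConfiguration": encryption}
```

The returned dictionary would be passed as `ResultConfiguration=...` alongside the query string.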
26
Can Athena access S3 in another account?
Yes
27
How are Athena results encrypted in transit?
Transport Layer Security (TLS)
28
Is Redshift Serverless or Fully Managed?
Fully Managed
29
What is the maximum number of compute nodes in a Redshift cluster?
128
30
What are the two types of compute nodes that can be selected for a Redshift cluster?
Dense Storage (DS) - uses HDDs for large size at low cost
Dense Compute (DC) - uses SSD and lots of memory for faster performance at a higher cost
31
How many HDDs on a ds2.xlarge Redshift compute node?
3 for a total of 2TB storage
32
How many HDDs on a ds2.8xlarge Redshift compute node?
24 for a total of 16TB storage
33
How much SSD storage and RAM does a dc2.large Redshift compute node have?
160GB SSD storage, 15GB RAM
34
How much SSD storage and RAM does a dc2.8xlarge Redshift compute node have?
2.6TB SSD, 244GB RAM
35
What determines the number of Node Slices on a Compute Node?
The size of the Compute Node
36
What kind of data storage does Redshift use for high performance?
Columnar
37
Can you change the compression encoding for a column after a table is created in Redshift?
No
38
How many copies of your data are stored within Redshift?
Three - one main on cluster, one backup on cluster, one snapshot in S3
39
Can Redshift data be backed up to another region?
Yes - asynchronously in S3
40
How many AZs is Redshift limited to?
One
41
What is the default Redshift distribution style?
AUTO
42
What is the EVEN Redshift distribution style?
Steps through each slice and assigns data in round-robin fashion
43
What is the KEY Redshift distribution style?
Assigns data to each slice based on a selected key column. Ideal if you plan to query data on a specific column.
44
What is the ALL Redshift distribution style?
All data is replicated on every node in the cluster. Multiplies storage by the number of nodes in the cluster.
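The EVEN, KEY, and ALL styles above can be sketched with a toy Python model. It treats slices (or nodes, for ALL) as plain buckets and uses CRC32 as a stand-in hash; Redshift's real hashing and storage are not shown here, and every name is hypothetical.

```python
import zlib


def distribute(rows, style, num_buckets, key=None):
    """Assign rows to buckets following a simplified distribution style."""
    buckets = [[] for _ in range(num_buckets)]
    for i, row in enumerate(rows):
        if style == "EVEN":
            # Round-robin across buckets regardless of content.
            buckets[i % num_buckets].append(row)
        elif style == "KEY":
            # Equal key values always hash to the same bucket.
            digest = zlib.crc32(str(row[key]).encode())
            buckets[digest % num_buckets].append(row)
        elif style == "ALL":
            # Every bucket receives a full copy of the table.
            for bucket in buckets:
                bucket.append(row)
        else:
            raise ValueError(f"unknown style: {style}")
    return buckets
```

KEY keeps matching values together, which helps joins on that column but can skew storage if a few values dominate; ALL multiplies storage by the number of buckets.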
45
What are Redshift Sort Keys?
Similar to an index, makes for fast range queries
46
What are the three types of Redshift Sort Keys?
Single, Compound, Interleaved
47
What is the default type of Redshift Sort Key?
Compound
48
Does the order of Compound Sort Keys matter in Redshift?
Yes - first will be primary
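A short Python sketch of why column order matters in a compound sort key. It models only the resulting row order with a tuple sort, not Redshift's storage; the sample rows are made up.

```python
rows = [
    ("2024-01-02", "b"),
    ("2024-01-01", "b"),
    ("2024-01-02", "a"),
    ("2024-01-01", "a"),
]

# Compound key (date, name): date is primary, name only breaks ties.
by_date_then_name = sorted(rows, key=lambda r: (r[0], r[1]))

# Compound key (name, date): the same rows end up in a different order.
by_name_then_date = sorted(rows, key=lambda r: (r[1], r[0]))
```

In the first ordering all rows for one date are adjacent, so a range filter on date touches a contiguous run; in the second ordering the same filter would hit rows scattered across the table.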
49
What is required when performing COPY from S3 to Redshift?
IAM role (or other credentials) for authorization; a manifest file is optional but lets you list exactly which files to load
50
What is the command to copy Redshift data into S3?
UNLOAD
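A hedged Python sketch that builds the SQL text for a COPY with a manifest and an IAM role, and for the matching UNLOAD. The helper names, table, bucket, and role ARN are hypothetical, and the string formatting shown is for illustration only (it is not safe for untrusted input).

```python
def copy_from_manifest(table: str, manifest_uri: str, iam_role_arn: str) -> str:
    """Build a COPY statement that loads the files listed in a manifest."""
    return (f"COPY {table} FROM '{manifest_uri}' "
            f"IAM_ROLE '{iam_role_arn}' MANIFEST;")


def unload_to_s3(select_sql: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build an UNLOAD statement that writes a query result to S3."""
    # The query is passed as a quoted string, so inner quotes are doubled.
    escaped = select_sql.replace("'", "''")
    return (f"UNLOAD ('{escaped}') TO '{s3_prefix}' "
            f"IAM_ROLE '{iam_role_arn}';")
```

The generated text would then be executed through whatever SQL client is connected to the cluster.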
51
How can you configure S3 to Redshift connections without going over public internet?
Enhanced VPC routing
52
Can COPY decrypt S3 data as it is loaded into Redshift?
Yes, using hardware accelerated SSL
53
If loading a tall but narrow table to Redshift, what should you attempt to do for efficiency?
Try to use only one COPY command (metadata is added for each COPY command)
54
How do you copy a Redshift snapshot to another region?
1. Create a KMS key in the destination region
2. Specify a unique name for your snapshot copy grant
3. Specify the KMS key for which you're creating the copy grant
4. In the source region, enable copying of snapshots to the copy grant you created
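The steps in this answer can be sketched in Python against boto3-style clients. This is an assumption-laden outline, not a tested deployment script: the function name and arguments are hypothetical, the clients are passed in so the destination-region and source-region clients are explicit, and it assumes the grant is created in the destination region.

```python
def enable_cross_region_snapshot_copy(kms_dest, redshift_dest, redshift_src,
                                      cluster_id, dest_region, grant_name):
    """Wire up KMS-encrypted snapshot copy from a source to a destination region."""
    # Step 1: create a KMS key in the destination region.
    key = kms_dest.create_key(Description="Redshift snapshot copy")
    key_id = key["KeyMetadata"]["KeyId"]

    # Steps 2 and 3: create a uniquely named copy grant tied to that key.
    redshift_dest.create_snapshot_copy_grant(
        SnapshotCopyGrantName=grant_name,
        KmsKeyId=key_id,
    )

    # Step 4: in the source region, enable snapshot copy using the grant.
    redshift_src.enable_snapshot_copy(
        ClusterIdentifier=cluster_id,
        DestinationRegion=dest_region,
        SnapshotCopyGrantName=grant_name,
    )
    return key_id
```

In practice the three clients might be created with `boto3.client("kms", region_name=...)` and `boto3.client("redshift", region_name=...)` for the respective regions.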
55
What is Redshift DBLINK?
Connects Redshift to PostgreSQL (which could be on RDS). The two MUST be in the same Availability Zone.
56
Can data be imported from DynamoDB to Redshift?
Yes
57
What is Redshift Workload Management (WLM)?
Prioritizes short, fast queries vs long, slow queries
58
How can you configure Redshift WLM?
Redshift Console, CLI, or API
59
What is Redshift Concurrency Scaling?
Automatically adds cluster capacity to handle increases in concurrent read queries
60
How do Redshift WLM and Concurrency Scaling interact?
WLM queues can manage which queries are sent to concurrency scaling clusters
61
How many queues can be created with Redshift Automatic WLM?
8 (default of 5)
62
Is concurrency raised or lowered on large queries in Automatic WLM?
Lowered
63
How many queues can be created with Redshift Manual WLM?
8 (default 1)
64
What is the default concurrency of the default queue in Redshift Manual WLM?
5
65
What is the maximum concurrency level in Redshift Manual WLM?
50
66
What is query queue hopping?
Timed out queries automatically hop to another queue and retry
67
What is Redshift Short Query Acceleration (SQA)?
Prioritizes short queries. Alternative to WLM.
68
What statements does Redshift SQA support?
CREATE TABLE AS, and SELECT statements
69
How does Redshift SQA predict query execution time?
Machine Learning
70
What is the Redshift VACUUM command?
Recovers space from deleted rows
71
What are the four types of Redshift VACUUM commands?
FULL, DELETE ONLY, SORT ONLY, REINDEX (reanalyzes interleaved sort keys)
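A tiny Python helper that builds the statement text for each of the four variants listed on this card. The helper and table name are hypothetical; it only formats SQL and does not run anything.

```python
VACUUM_MODES = ("FULL", "DELETE ONLY", "SORT ONLY", "REINDEX")


def vacuum_statement(table: str, mode: str = "FULL") -> str:
    """Build a VACUUM statement for one of the four modes."""
    if mode not in VACUUM_MODES:
        raise ValueError(f"unknown VACUUM mode: {mode}")
    return f"VACUUM {mode} {table};"
```

For example, `vacuum_statement("sales", "REINDEX")` yields the form used for tables with interleaved sort keys.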
72
What is Elastic Resize in Redshift?
Quickly add or remove nodes of the same type. Low downtime. For some types, you can only double or halve the nodes.
73
What is Classic Resize in Redshift?
Change node type and/or number of nodes. Can lead to hours or days of read-only.
74
What is Redshift Snapshot, restore, resize?
Used to keep cluster available during a Classic resize. Minimizes downtime.
75
What are Redshift RA3 nodes?
Allow you to scale compute and storage capacity independently
76
What is Redshift Data Lake Export?
Unloads Redshift to S3 in Parquet format
77
What are some advantages of Parquet?
2x faster, 6x smaller, automatically partitioned, compatible with many services (Spectrum, Athena, EMR, SageMaker)
78
What does ACID stand for?
Atomicity, Consistency, Isolation, Durability
79
What port does Kibana run on?
5601
80
Do you have to use Glue Data Catalogs when using Athena?
No, you can use standard Athena Data Catalogs
81
What language does Glue's ETL engine use?
Python (PySpark) or Scala
82
Can Athena invoke SageMaker models?
Yes
83
Is Athena's Data Catalog Hive metastore compatible?
Yes
84
When should you use Redshift?
Many different sources, highly structured, single source of truth, stored for long periods of time, performant on large sizes of data
85
When should you use Athena?
Don't want to worry about formatting or infrastructure, quick queries for troubleshooting, ad-hoc
86
When should you use EMR?
Need a wide variety of custom processing tasks, fine grained control over your clusters, custom code
87
What is Federated query in Athena?
Allows you to run SQL queries across a variety of relational, non-relational, and custom data sources, giving you a unified way to query multiple data stores.
88
Can Athena read from compressed files?
Yes
89
Can Hive Query be run on Athena?
No (only Presto is supported)
90
What is SerDe?
Serializer/Deserializer; libraries that tell Hive how to interpret data formats; also used by Athena
91
What needs to be done to add data to a partitioned table in Athena?
ALTER TABLE ADD PARTITION
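A minimal Python sketch that builds the DDL text for registering one new partition. The helper name, table, partition columns, and S3 path are hypothetical; the string formatting is illustrative and not safe for untrusted input.

```python
def add_partition_ddl(table: str, partition: dict, location: str) -> str:
    """Build an ALTER TABLE ADD PARTITION statement for one partition."""
    # Partition columns are rendered in the order given, e.g. year then month.
    spec = ", ".join(f"{col}='{value}'" for col, value in partition.items())
    return (f"ALTER TABLE {table} ADD IF NOT EXISTS "
            f"PARTITION ({spec}) LOCATION '{location}'")
```

Running the generated statement after new files land under that prefix makes the partition visible to queries.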
92
Can Athena access an S3 bucket in another account?
Yes