Tutorials Dojo Flashcards

1
Q

AWS Data Exchange

A

-third-party datasets, delivered to S3
-accessed via the API (e.g. GetDataSet)

2
Q

Redshift concurrency scaling
and workload management (WLM)

A

-concurrency scaling handles many concurrent users and unpredictable load, such as BI workloads
-WLM can set query priority
-manual WLM: up to 8 queues, each with a concurrency level of up to 50

3
Q

Athena workgroups

A

-organize and manage queries
-can use Apache Spark for analytics
-security and access control

4
Q

AWS DataSync

A

-transfers data from on-prem storage to AWS storage services such as S3 or EFS

5
Q

S3 event notification

A

-event type, e.g. s3:ObjectCreated:*
-can trigger Lambda based on a key filter, e.g. suffix .csv
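
A minimal sketch of what such a notification configuration might look like, built as a plain Python dict so it runs without AWS access. The function name `build_csv_notification` and the ARN are hypothetical, used only for illustration.

```python
import json


def build_csv_notification(lambda_arn: str) -> dict:
    """Return an S3 notification configuration that invokes a Lambda
    function for every newly created object whose key ends in .csv."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                # Fire on any object-creation event (PUT, POST, COPY, ...).
                "Events": ["s3:ObjectCreated:*"],
                # Only keys with this suffix trigger the function.
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": ".csv"}]}
                },
            }
        ]
    }


# Hypothetical ARN, for illustration only.
config = build_csv_notification(
    "arn:aws:lambda:us-east-1:123456789012:function:process-csv"
)
print(json.dumps(config, indent=2))
```

With boto3, a dict of this shape would typically be passed as `NotificationConfiguration` to `put_bucket_notification_configuration`; the Lambda function also needs a resource policy that lets S3 invoke it.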

6
Q

AWS Glue for Ray

A

-job type for scaling AI and Python workloads, including native Python libraries
-Ray datasets are based on Apache Arrow

7
Q

Amazon Managed Service for Apache Flink

A

-for real-time, time-series analysis
-sliding windows: fixed-length intervals that can overlap
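
To make the overlapping-window idea concrete, here is a small pure-Python sketch (not Flink code). It assumes events are `(timestamp, value)` pairs and starts the first window at the earliest event, which is a simplification of how a real stream processor aligns windows. The function name is hypothetical.

```python
def sliding_window_sums(events, size, slide):
    """Sum values in windows [start, start + size) that advance by `slide`.

    When slide < size, consecutive windows overlap, so one event can be
    counted in several windows.
    """
    if not events:
        return []
    first = min(t for t, _ in events)
    last = max(t for t, _ in events)
    results = []
    start = first
    while start <= last:
        end = start + size
        total = sum(v for t, v in events if start <= t < end)
        results.append((start, end, total))
        start += slide
    return results


readings = [(0, 1.0), (5, 2.0), (10, 3.0), (15, 4.0)]
for window in sliding_window_sums(readings, size=10, slide=5):
    print(window)
```

With a size of 10 and a slide of 5, the reading at timestamp 5 lands in both the `[0, 10)` and `[5, 15)` windows, which is the overlap the card refers to.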

8
Q

S3 Object Lambda

A

-adds your own code to GET requests, enabling real-time transformation as data is retrieved
-transformation happens on the fly

9
Q

AWS Graviton instance

A

-custom AWS-designed Arm processors, for best price performance across workloads

10
Q

Lambda provisioned concurrency

A

-setting that keeps execution environments pre-initialized, so functions scale without cold-start latency

11
Q

Redshift data sharing

A

-share read access across clusters, workgroups, accounts, regions
-live data

12
Q

What to check when a Step Functions state machine fails to start a step?

A

-the state machine's IAM role
13
Q

Glue’s sensitive data detection feature

A

-automatically recognizes PII and can redact it

14
Q

S3 VPC gateway endpoint

A

-specified as a route target in the route table for traffic destined for S3

15
Q

Athena federated query

A

-data source connectors run on Lambda
-query NoSQL, SQL, Timestream, etc.

16
Q

Kinesis real-time reporting with Redshift

A

-create an external schema for the data stream
-create a materialized view that references the schema, with auto refresh
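
The two steps can be sketched as SQL text generated from Python. The statement shapes follow my understanding of Redshift streaming ingestion syntax and should be checked against the Redshift documentation before use. The helper name, schema, stream, view, and role ARN are all hypothetical.

```python
def streaming_ingestion_sql(schema, stream, view, iam_role_arn):
    """Return the two statements: external schema, then materialized view."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM KINESIS\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )
    create_view = (
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS\n"
        f"SELECT approximate_arrival_timestamp,\n"
        f"       JSON_PARSE(kinesis_data) AS payload\n"
        f'FROM {schema}."{stream}";'
    )
    return [create_schema, create_view]


# Hypothetical names, for illustration only.
statements = streaming_ingestion_sql(
    schema="kds",
    stream="orders-stream",
    view="orders_live",
    iam_role_arn="arn:aws:iam::123456789012:role/redshift-streaming",
)
for sql in statements:
    print(sql, end="\n\n")
```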

17
Q

STL_ALERT_EVENT_LOG

A

-Redshift system view that helps identify query performance issues and suggests solutions

18
Q

Glue resource policy

A

-think finance and HR each running their own ETL and accessing only their own databases

19
Q

S3 access point

A

-for access by multiple applications
-for cross-account access
-access point policy works together with the bucket policy

20
Q

MSCK REPAIR TABLE

A

-Athena command to run when new partitions are added to an existing partitioned table
-makes the new partitions visible but does not necessarily speed up performance

21
Q

EFS and lambda

A

-Lambda functions can mount an EFS file system for shared, persistent storage

22
Q

Improve Kinesis performance when processing

A

-add shards
-configure the parallelization factor
-register the Lambda function as a consumer with enhanced fan-out
-exponential backoff and retry?
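
For the last bullet, a generic backoff-and-retry loop might look like the sketch below. `ThrottledError` and `call_with_backoff` are hypothetical names standing in for whatever throttling exception and client call are in use; this is not an AWS SDK API.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throughput-exceeded error (hypothetical name)."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.1,
                      max_delay=5.0, sleep=time.sleep):
    """Call `operation`, retrying on ThrottledError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Delay cap doubles each attempt: base, 2*base, 4*base, ...
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(random.uniform(0, delay))


attempts = {"count": 0}


def flaky_put():
    """Fails twice with throttling, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ThrottledError("throttled")
    return "ok"


print(call_with_backoff(flaky_put, sleep=lambda seconds: None))
```

Retrying only hides short bursts of throttling; sustained throttling still points back to the first bullets (more shards or more parallelism).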

23
Q

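Glue catalog partition predicates (DynamicFrame)
&
Pushdown predicate

A

-catalog partition predicate: server-side partition filtering in the Data Catalog during frame creation (before any data is loaded)
-faster than client-side filtering, where data is loaded into memory first

-pushdown predicate is similar and also filters on partition columns, but it is applied after the partitions have been listed from the catalog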

24
Q

Transient EMR clusters

A

-think batch jobs
-cluster created then terminated after

25
26
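Q

SQS settings

A

-DelaySeconds: how long a new message waits before it becomes visible in the queue
-VisibilityTimeout: how long a received message stays hidden from other consumers; prevents it being received/processed multiple times
-maxReceiveCount: number of times a message can be received before it is moved to the dead-letter queue
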
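
A minimal sketch of how these three settings might be expressed as queue attributes, assuming a dead-letter queue exists. The helper name and the ARN are hypothetical. In the SQS API, attribute values are strings, and `maxReceiveCount` lives inside the JSON-encoded `RedrivePolicy` attribute.

```python
import json


def build_queue_attributes(delay_seconds, visibility_timeout,
                           max_receive_count, dlq_arn):
    """Build an SQS attribute map covering the three settings."""
    redrive_policy = {
        "deadLetterTargetArn": dlq_arn,
        "maxReceiveCount": max_receive_count,
    }
    return {
        "DelaySeconds": str(delay_seconds),            # hide new messages this long
        "VisibilityTimeout": str(visibility_timeout),  # hide in-flight messages this long
        "RedrivePolicy": json.dumps(redrive_policy),   # receive limit and DLQ target
    }


# Hypothetical ARN, for illustration only.
attrs = build_queue_attributes(
    delay_seconds=10,
    visibility_timeout=60,
    max_receive_count=5,
    dlq_arn="arn:aws:sqs:us-east-1:123456789012:orders-dlq",
)
print(json.dumps(attrs, indent=2))
```

With boto3, a map like this would typically be passed as `Attributes` to `create_queue` or `set_queue_attributes`.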
27
Q

Athena notebooks

A

-interactive Python coding environment
-execute Spark code visually

28
Q

CloudWatch Container Insights

A

-monitoring for microservices and containerized apps

29
Q

OpenSearch storage

A

-hot = fastest access, most expensive
-UltraWarm = less frequently accessed, cheaper
-cold = infrequently accessed; can be attached to UltraWarm when it needs to be queried

30
Q

Stored procedures and Aurora

A

-a stored procedure in Aurora can trigger Lambda, for example when a loan is approved

31
Q

MSK Kafka ACLs

A

-think microservices
-control which apps can read/write which topics

32
Q

CloudTrail data events vs management events

A

-data events = operations on or within a resource, e.g. Lambda executions or S3 PutObject
-management events = operations on the resources themselves, e.g. deleting resources

33
Q

Glue DataBrew masking techniques

A

-substitution = e.g. Aron changed to Donny
-probabilistic encryption = different ciphertext each time
-nulling out or deleting the value

34
35
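Q

Athena partition projection

A

-helps query performance by focusing on subsets of partitions; partition values are computed from table properties instead of looked up in the catalog
-good to use when data is already partitioned and keeps growing
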
36
Q

Redshift distribution style

A

-EVEN = rows spread evenly across nodes. Good when there are no joins or no clear distribution key
-KEY = rows with the same key value are stored together. Good for queries that frequently filter or join on a specific column
-ALL = full copy of the table on each node. Best for small, static tables
-AUTO = Redshift picks the style and may change it over time. Good when the right choice is not clear

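
The placement rules can be illustrated with a toy simulation. This is only a sketch of the idea, not how Redshift is implemented: the CRC32 hash, the node count, and the function name `distribute` are assumptions made for the example.

```python
import zlib


def distribute(rows, style, nodes=4, key=None):
    """Place rows on nodes the way each distribution style describes."""
    placement = {n: [] for n in range(nodes)}
    for i, row in enumerate(rows):
        if style == "EVEN":
            # Round robin: ignore row contents entirely.
            placement[i % nodes].append(row)
        elif style == "KEY":
            # Same key value always hashes to the same node.
            target = zlib.crc32(str(row[key]).encode("utf-8")) % nodes
            placement[target].append(row)
        elif style == "ALL":
            # Every node gets a full copy.
            for n in placement:
                placement[n].append(row)
        else:
            raise ValueError(f"unknown style: {style}")
    return placement


orders = [{"order_id": i, "customer_id": i % 3} for i in range(12)]
for style in ("EVEN", "KEY", "ALL"):
    sizes = [len(v) for v in distribute(orders, style, key="customer_id").values()]
    print(style, sizes)
```

The KEY case shows the trade-off: joins on `customer_id` stay local to a node, but with few distinct key values the rows can pile up unevenly.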
37
Q

SageMaker Canvas

A

-no-code visual canvas
-simplifies the whole process, from data cleaning to prediction

38
Q

Redshift VACUUM commands

A

-VACUUM FULL: same as plain VACUUM (the default); sorts rows and reclaims disk space
-VACUUM DELETE ONLY: reclaims disk space without sorting
-VACUUM REINDEX: analyzes the interleaved sort key columns, then performs a full vacuum
-VACUUM SORT ONLY: sorts without reclaiming disk space. Used when rows are unsorted but space is not an issue

39
Q

SageMaker workflows/lineage tracking

A

-saves the steps in a workflow
-visually, think of the Step Functions editor

40
Q

CloudWatch Contributor Insights & DynamoDB

A

-view of DynamoDB traffic trends, such as the most accessed and most throttled keys

41
Q

DynamoDB partition key cardinality

A

-when there are throttling issues, use a high-cardinality partition key so requests are spread more evenly
-fixes hot partition issues

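
A toy simulation of why cardinality matters. DynamoDB's internal hashing is not MD5 and the partition count is not fixed at 8; both are assumptions made here purely to show the effect, and the function names are hypothetical.

```python
import hashlib
from collections import Counter


def partition_for(key: str, partitions: int) -> int:
    """Map a partition key to a partition number via a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions


def spread(keys, partitions=8):
    """Count how many items land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)


# Low cardinality: only two distinct key values across 10,000 items.
low = ["status#CLOSED" if i % 10 == 0 else "status#ACTIVE" for i in range(10_000)]
# High cardinality: every item has its own key value.
high = [f"user#{i}" for i in range(10_000)]

print("low  cardinality, busiest partition:", max(spread(low).values()))
print("high cardinality, busiest partition:", max(spread(high).values()))
```

With two distinct key values, at most two partitions ever receive traffic, so one of them becomes hot; with one key per item, the load is spread across all partitions.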
42
Q

RDS Performance Insights

A

-gathers database performance metrics to help find bottlenecks