Tutorial Dojo Flashcards by Aron Dinneen

AWS Data Exchange

-find and subscribe to third-party datasets delivered to S3
-accessed via the API, e.g. GetDataSet

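Redshift concurrency scaling and workload management (WLM)

-concurrency scaling adds transient cluster capacity to handle many concurrent users and unpredictable load, like BI workloads
-WLM can set query priority
-manual WLM supports up to 8 queues, with a combined concurrency of up to 50 across all queues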

Athena workgroups

-organize and manage queries
-can be enabled for Apache Spark analytics
-security and access control

AWS DataSync

-transfers data from on-premises storage to AWS storage services such as S3, EFS, and FSx

S3 event notifications

-configured per event type, e.g. ObjectCreated
-can trigger a Lambda function filtered by key prefix or suffix, e.g. suffix .csv
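
A minimal sketch of the card above, assuming a Lambda target and a .csv suffix filter. `build_notification_config` returns a dict in the shape S3 expects for a bucket notification configuration. `would_trigger` is a hypothetical, simplified local model of S3's event and prefix/suffix matching, written only so the idea can be tested offline; it is not an AWS API. The function ARN is made up.

```python
# Sketch: S3 notification config (Lambda on ObjectCreated, .csv suffix) plus a
# simplified local check of which events/keys it would match.


def build_notification_config(function_arn: str, suffix: str = ".csv") -> dict:
    """Invoke a Lambda on any ObjectCreated event for keys ending in `suffix`."""
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": function_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {"FilterRules": [{"Name": "suffix", "Value": suffix}]}
                },
            }
        ]
    }


def would_trigger(config: dict, event_name: str, key: str) -> bool:
    """Hypothetical local model of S3 matching: event type, then prefix/suffix rules."""
    for cfg in config.get("LambdaFunctionConfigurations", []):
        # "s3:ObjectCreated:*" matches any ObjectCreated sub-type, e.g. "s3:ObjectCreated:Put"
        event_ok = any(
            event_name == pattern
            or (pattern.endswith("*") and event_name.startswith(pattern[:-1]))
            for pattern in cfg["Events"]
        )
        rules = cfg.get("Filter", {}).get("Key", {}).get("FilterRules", [])
        key_ok = all(
            key.startswith(rule["Value"])
            if rule["Name"].lower() == "prefix"
            else key.endswith(rule["Value"])
            for rule in rules
        )
        if event_ok and key_ok:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical function ARN, for illustration only
    config = build_notification_config("arn:aws:lambda:us-east-1:123456789012:function:process-csv")
    print(would_trigger(config, "s3:ObjectCreated:Put", "raw/sales.csv"))   # True
    print(would_trigger(config, "s3:ObjectCreated:Put", "raw/sales.json"))  # False
```

With boto3, the same dict would be passed as `NotificationConfiguration` to `put_bucket_notification_configuration`. The Lambda function also needs a resource-based permission that allows S3 to invoke it.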

AWS Glue for Ray

-job type for scaling Python workloads, including AI/ML and native Python libraries, with the Ray framework
-Ray Datasets are built on Apache Arrow

Amazon Managed Service for Apache Flink

-for real-time and time-series analysis
-sliding windows: fixed-size windows that overlap, because each new window starts after a slide interval shorter than the window size
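
The difference between sliding and tumbling windows can be shown with a plain-Python toy model. This does not use Flink's API. Windows are assumed to be aligned to t = 0, and the event times are made-up data. When `slide` is smaller than `size`, one event is counted in more than one window.

```python
def sliding_window_counts(timestamps, size, slide):
    """Count events in windows [start, start + size) that begin every `slide` time units.

    When slide < size the windows overlap, so one event can be counted in several windows.
    """
    if not timestamps:
        return []
    results = []
    start = 0  # simplification: windows are aligned to t = 0
    last = max(timestamps)
    while start <= last:
        end = start + size
        count = sum(1 for t in timestamps if start <= t < end)
        results.append((start, end, count))
        start += slide
    return results


def tumbling_window_counts(timestamps, size):
    """A tumbling window is the special case where slide == size (no overlap)."""
    return sliding_window_counts(timestamps, size, size)


if __name__ == "__main__":
    events = [1, 2, 6, 7, 11]  # event times in seconds (made-up data)
    print(sliding_window_counts(events, size=10, slide=5))
    # [(0, 10, 4), (5, 15, 3), (10, 20, 1)]
    print(tumbling_window_counts(events, size=10))
    # [(0, 10, 4), (10, 20, 1)]
```

The sliding counts add up to 8 for only 5 events, which is the overlap at work.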

S3 Object Lambda

-adds your own code to S3 GET requests, enabling real-time transformation as data is retrieved
-transforms on the fly, without storing a modified copy of the object
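
A minimal sketch of an S3 Object Lambda handler. S3 passes a `getObjectContext` that contains a presigned `inputS3Url` for the original object, plus `outputRoute` and `outputToken`. The transformed bytes go back through `write_get_object_response`. The email-redacting `transform` is a hypothetical example. The `s3_client` and `fetch` parameters are hypothetical hooks added so the sketch can be tested without AWS.

```python
import re
import urllib.request

# Hypothetical transformation: redact anything that looks like an email address
EMAIL_PATTERN = re.compile(rb"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def transform(original: bytes) -> bytes:
    return EMAIL_PATTERN.sub(b"[REDACTED]", original)


def lambda_handler(event, context, s3_client=None, fetch=urllib.request.urlopen):
    # S3 Object Lambda supplies a presigned URL for the original object plus a route/token
    ctx = event["getObjectContext"]
    with fetch(ctx["inputS3Url"]) as response:
        original = response.read()

    if s3_client is None:  # created lazily so the sketch can be tested without AWS
        import boto3
        s3_client = boto3.client("s3")

    # Return the transformed bytes to whoever called GetObject
    s3_client.write_get_object_response(
        Body=transform(original),
        RequestRoute=ctx["outputRoute"],
        RequestToken=ctx["outputToken"],
    )
    return {"statusCode": 200}
```

Callers send a normal GET to the Object Lambda access point and receive the redacted object. The stored object is never changed.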

AWS Graviton instances

-custom AWS-designed Arm-based processors offering the best price performance for many workloads

Lambda provisioned concurrency

-keeps a set number of execution environments initialized, so functions scale without cold-start latency

Redshift data sharing

-share read access across clusters, workgroups, accounts, regions
-consumers query live data without copying it

Step Functions state machine fails to start a step: what to check?

-the state machine's IAM role (permissions for the service that step calls)

Glue’s sensitive data detection feature

-automatically recognizes PII and can redact it

S3 VPC gateway endpoint

-specify the endpoint as the target in the route table for traffic destined for S3

Athena federated query

-data source connectors run on Lambda
-query NoSQL, relational (SQL), Timestream, and other sources

Real-time Kinesis reporting with Redshift (streaming ingestion)

-create an external schema that maps to the Kinesis data stream
-create a materialized view referencing that schema, with auto refresh
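
The two steps above can be sketched as a small Python helper that renders the Redshift SQL. The statement shapes follow the pattern used for Redshift streaming ingestion: `CREATE EXTERNAL SCHEMA ... FROM KINESIS` and a materialized view with `AUTO REFRESH YES` over the `kinesis_data` column. Check the column list against current docs. The schema, stream, view, and role names are hypothetical placeholders.

```python
def streaming_ingestion_sql(schema: str, stream: str, view: str, iam_role_arn: str) -> list:
    """Render the two Redshift statements for Kinesis streaming ingestion.

    All names are placeholders and are interpolated directly, so only use trusted values.
    """
    # Step 1: external schema that maps to Kinesis Data Streams
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        "FROM KINESIS\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )
    # Step 2: materialized view over one stream, refreshed automatically
    create_view = (
        f"CREATE MATERIALIZED VIEW {view} AUTO REFRESH YES AS\n"
        "SELECT approximate_arrival_timestamp,\n"
        "       partition_key,\n"
        "       shard_id,\n"
        "       sequence_number,\n"
        "       JSON_PARSE(kinesis_data) AS payload\n"
        f'FROM {schema}."{stream}"\n'
        "WHERE CAN_JSON_PARSE(kinesis_data);"
    )
    return [create_schema, create_view]


if __name__ == "__main__":
    for statement in streaming_ingestion_sql(
        schema="kds",
        stream="orders-stream",
        view="orders_live",
        iam_role_arn="arn:aws:iam::123456789012:role/redshift-streaming-role",
    ):
        print(statement, end="\n\n")
```

Reports and dashboards then query the materialized view. The stream name is quoted because Kinesis stream names can contain characters such as hyphens.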

STL_ALERT_EVENT_LOG

-Redshift system table that logs an alert when the query optimizer identifies conditions that might indicate performance issues, with suggested solutions

Glue resource policy

-think finance and HR each running their own ETL jobs and accessing only their own databases in the Data Catalog

S3 access point

-for multiple application access
-for cross-account access
-works with bucket policy

MSCK REPAIR TABLE

-Athena statement to run when new Hive-style partitions (e.g. dt=2024-01-01/) are added to S3, so the catalog registers them
-makes new partitions visible but does not necessarily speed up performance

EFS and Lambda

-Lambda functions can mount an EFS file system through an EFS access point (the function must be connected to the VPC)

Improve Kinesis performance when processing

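-add shards
-configure the parallelization factor on the Lambda event source mapping
-register the Lambda function as a consumer with enhanced fan-out
-exponential backoff and retry (handles throttling errors, but does not add throughput)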
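
The "add shards" and "enhanced fan-out" bullets can be checked with quick arithmetic. This sketch uses the documented per-shard quotas for provisioned Kinesis Data Streams: 1 MB/s or 1,000 records/s for writes, and 2 MB/s for reads shared by standard consumers. With enhanced fan-out, each registered consumer gets its own 2 MB/s per shard. `shards_needed` is a hypothetical helper. It ignores headroom, read call limits, and uneven partition keys, so treat it as a lower bound.

```python
import math

# Per-shard quotas for Kinesis Data Streams (provisioned mode)
WRITE_MB_PER_SHARD = 1.0        # 1 MB/s ingest
WRITE_RECORDS_PER_SHARD = 1000  # 1,000 records/s ingest
READ_MB_PER_SHARD = 2.0         # 2 MB/s egress, shared by standard consumers


def shards_needed(write_mb_s, write_records_s, consumers, enhanced_fan_out=False):
    """Rough lower bound: enough shards for the write rate and for every consumer's reads."""
    write_shards = max(
        math.ceil(write_mb_s / WRITE_MB_PER_SHARD),
        math.ceil(write_records_s / WRITE_RECORDS_PER_SHARD),
    )
    if enhanced_fan_out:
        # Each registered consumer gets its own 2 MB/s per shard
        read_shards = math.ceil(write_mb_s / READ_MB_PER_SHARD)
    else:
        # Standard consumers share 2 MB/s per shard, and each one reads the full stream
        read_shards = math.ceil(write_mb_s * consumers / READ_MB_PER_SHARD)
    return max(1, write_shards, read_shards)


if __name__ == "__main__":
    print(shards_needed(3.5, 2500, consumers=3))                         # 6
    print(shards_needed(3.5, 2500, consumers=3, enhanced_fan_out=True))  # 4
```

In this example, enhanced fan-out removes the read bottleneck, so only the write rate drives the shard count.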

Glue catalog partition predicates (frame) & push down predicate

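-catalog partition predicate (catalogPartitionPredicate): server-side filtering of partitions in the Data Catalog, using partition indexes, during frame creation (before data is even loaded)
-faster than client-side filtering, where data is loaded into memory first

-push down predicate (push_down_predicate): similar, but partitions are pruned after they are listed from the catalog and before files are read from S3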
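
Why pruning partitions before reading is cheaper can be shown with a plain-Python toy model. This is not Glue code. The table, its (year, month) partitions, and the row values are made up. In a real Glue job, the predicate would be passed to `create_dynamic_frame.from_catalog`, either as `push_down_predicate` or as `catalogPartitionPredicate` in `additional_options`.

```python
def read_then_filter(partitions, predicate):
    """Client-side style: read every partition's rows, then filter in memory."""
    loaded = [(key, row) for key, rows in partitions.items() for row in rows]
    kept = [row for key, row in loaded if predicate(*key)]
    return kept, len(loaded)  # (result, rows read)


def prune_then_read(partitions, predicate):
    """Push-down style: evaluate the predicate on partition keys, then read only matches."""
    selected = [key for key in partitions if predicate(*key)]
    kept = [row for key in selected for row in partitions[key]]
    return kept, len(kept)  # every row read is kept, so rows read == len(kept)


def wanted(year, month):
    return year == 2024 and month == 2


if __name__ == "__main__":
    # Made-up table partitioned by (year, month); values are row lists
    table = {(2024, 1): ["a", "b"], (2024, 2): ["c"], (2023, 12): ["d", "e", "f"]}
    print(read_then_filter(table, wanted))  # (['c'], 6)
    print(prune_then_read(table, wanted))   # (['c'], 1)
```

Both approaches return the same rows. The push-down version reads 1 row instead of 6.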

Transient EMR clusters

-think batch jobs
-cluster is created for the job, then terminated when the job finishes

SQS settings

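-DelaySeconds = how long a new message stays invisible in the queue before it can first be received
-VisibilityTimeout = hides a received message from other consumers, preventing it from being received and processed multiple times
-maxReceiveCount (redrive policy) = number of times a message can be received before it is moved to the dead-letter queue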
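
The interaction of these three settings can be traced with a toy in-memory queue. `ToyQueue` and `Message` are hypothetical classes, not the SQS API. Time is passed in explicitly as `now`, in seconds. The defaults mirror SQS defaults (no delay, 30 s visibility timeout).

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare messages by identity
class Message:
    body: str
    visible_at: float
    receive_count: int = 0


@dataclass
class ToyQueue:
    delay_seconds: float = 0
    visibility_timeout: float = 30
    max_receive_count: int = 3
    messages: list = field(default_factory=list)
    dead_letter: list = field(default_factory=list)

    def send(self, body, now):
        # DelaySeconds: a new message is invisible until now + delay_seconds
        self.messages.append(Message(body, visible_at=now + self.delay_seconds))

    def receive(self, now):
        for msg in list(self.messages):
            if msg.visible_at > now:
                continue  # still delayed or hidden by a visibility timeout
            if msg.receive_count >= self.max_receive_count:
                # Redrive policy: after max_receive_count receives, move to the DLQ
                self.messages.remove(msg)
                self.dead_letter.append(msg)
                continue
            msg.receive_count += 1
            # VisibilityTimeout: hide the message from other consumers while it is processed
            msg.visible_at = now + self.visibility_timeout
            return msg
        return None

    def delete(self, msg):
        # A consumer deletes the message after processing it successfully
        self.messages.remove(msg)


if __name__ == "__main__":
    q = ToyQueue(delay_seconds=10, visibility_timeout=30, max_receive_count=2)
    q.send("job-1", now=0)
    print(q.receive(now=5))                 # None (DelaySeconds not elapsed)
    print(q.receive(now=10).receive_count)  # 1
    print(q.receive(now=20))                # None (hidden by VisibilityTimeout)
    print(q.receive(now=40).receive_count)  # 2 (never deleted, so received again)
    print(q.receive(now=70))                # None (moved to the dead-letter queue)
    print(len(q.dead_letter))               # 1
```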

Athena notebooks

-interactive Python notebook environment
-run Spark code interactively, with results shown in the notebook

CloudWatch Container Insights

-collects metrics and logs for containerized applications and microservices

OpenSearch storage tiers

-hot = fastest access, most expensive
-UltraWarm = less frequently accessed data, cheaper
-cold = infrequent access; indexes can be reattached to UltraWarm to query them

Stored procedures and Aurora

-a stored procedure in Aurora can invoke a Lambda function, e.g. when a loan is approved

MSK Kafka ACLs

-for microservices
-control which apps can read/write which topics

CloudTrail data events vs management events

-data events = operations on data inside resources, e.g. Lambda function executions or S3 PutObject
-management events = control-plane operations, e.g. deleting resources

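Glue DataBrew masking techniques

-substitution = replace a value with a realistic substitute, e.g. Aron changed to Donny
-probabilistic encryption = different ciphertext each time, even for the same input
-nulling/deletion = replace the value with null or delete it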
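
The three techniques can be compared side by side in a toy Python sketch. This is not DataBrew's implementation. The substitution table is hypothetical. The "encryption" is a toy HMAC-SHA256 keystream cipher chosen only to show the probabilistic property: a fresh random nonce per call gives a different ciphertext for the same input. It has no authentication and is not meant for real data.

```python
import hashlib
import hmac
import os

# Hypothetical substitution table
SUBSTITUTES = {"Aron": "Donny"}


def substitute(value):
    """Substitution: swap the real value for a realistic stand-in."""
    return SUBSTITUTES.get(value, value)


def null_out(value):
    """Nulling: drop the value entirely."""
    return None


def _keystream(key, nonce, length):
    # Toy keystream: HMAC-SHA256(key, nonce || counter) blocks (illustration only)
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def probabilistic_encrypt(value, key):
    """Probabilistic: a fresh random nonce makes every ciphertext different."""
    nonce = os.urandom(16)
    data = value.encode()
    cipher = bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))
    return nonce + cipher


def decrypt(token, key):
    nonce, cipher = token[:16], token[16:]
    return bytes(a ^ b for a, b in zip(cipher, _keystream(key, nonce, len(cipher)))).decode()


if __name__ == "__main__":
    key = os.urandom(32)
    print(substitute("Aron"))  # Donny
    print(null_out("Aron"))    # None
    first, second = probabilistic_encrypt("Aron", key), probabilistic_encrypt("Aron", key)
    print(first != second, decrypt(first, key), decrypt(second, key))  # True Aron Aron
```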

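Athena partition projection

-Athena computes partition values and locations from table properties instead of looking them up in the Data Catalog
-improves query performance on highly partitioned tables, especially when data is already partitioned and keeps growing (e.g. new date partitions)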
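
A sketch of what "computing partitions from table properties" means. The property keys (`projection.enabled`, `projection.dt.type`, `projection.dt.range`, `projection.dt.format`, `projection.dt.interval`, `projection.dt.interval.unit`, `storage.location.template`) follow Athena's partition projection naming. The table, the `dt` column, the bucket, and the date range are hypothetical. `projected_locations` is a toy that only handles a `yyyy-MM-dd` format with a DAYS interval.

```python
from datetime import date, timedelta

# Hypothetical table properties for a table partitioned by a `dt` date column
TABLE_PROPERTIES = {
    "projection.enabled": "true",
    "projection.dt.type": "date",
    "projection.dt.range": "2024-01-01,2024-01-05",
    "projection.dt.format": "yyyy-MM-dd",
    "projection.dt.interval": "1",
    "projection.dt.interval.unit": "DAYS",
    "storage.location.template": "s3://example-bucket/events/dt=${dt}/",
}


def projected_locations(props, query_start, query_end):
    """List the S3 prefixes a query over [query_start, query_end] would read.

    Partitions are computed from the properties; nothing is looked up in a catalog.
    Simplification: only a yyyy-MM-dd format with a DAYS interval is handled.
    """
    range_start, range_end = (date.fromisoformat(d) for d in props["projection.dt.range"].split(","))
    step = timedelta(days=int(props["projection.dt.interval"]))
    lo, hi = date.fromisoformat(query_start), date.fromisoformat(query_end)
    locations = []
    current = range_start
    while current <= min(range_end, hi):
        if current >= lo:
            locations.append(props["storage.location.template"].replace("${dt}", current.isoformat()))
        current += step
    return locations


if __name__ == "__main__":
    for location in projected_locations(TABLE_PROPERTIES, "2024-01-03", "2024-01-10"):
        print(location)
```

New date partitions need no MSCK REPAIR TABLE or catalog update, because they are derived from the range and template.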

Redshift distribution styles

-EVEN = rows spread evenly across slices (round-robin). Good when there are no joins or no clear distribution key
-KEY = rows with the same key value stored together. Good when queries frequently join or aggregate on a specific column
-ALL = full copy of the table on each node. Best for small, static tables
-AUTO = Redshift chooses the style and may change it over time. Use when the pattern isn't clear
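
A toy model of where rows land under EVEN, KEY, and ALL. The sample rows and `customer_id` column are made up. `zlib.crc32` stands in for Redshift's internal hash function, which is an assumption. AUTO is not modeled because Redshift makes that choice itself.

```python
import zlib


def distribute(rows, style, num_slices, key=None):
    """Return {slice_id: [rows]} for EVEN, KEY or ALL distribution (toy model)."""
    slices = {i: [] for i in range(num_slices)}
    for i, row in enumerate(rows):
        if style == "EVEN":
            slices[i % num_slices].append(row)  # round-robin
        elif style == "KEY":
            # Same key value -> same hash -> same slice, so joins on `key` stay local
            target = zlib.crc32(str(row[key]).encode()) % num_slices
            slices[target].append(row)
        elif style == "ALL":
            for rows_on_slice in slices.values():
                rows_on_slice.append(row)  # full copy everywhere
        else:
            raise ValueError(f"unsupported style: {style}")
    return slices


if __name__ == "__main__":
    # Made-up sales rows keyed by customer_id
    rows = [{"customer_id": c, "amount": n} for n, c in enumerate([1, 2, 1, 3, 2, 1])]
    for style in ("EVEN", "KEY", "ALL"):
        slices = distribute(rows, style, num_slices=2, key="customer_id")
        print(style, {s: len(r) for s, r in slices.items()})
```

EVEN balances row counts, KEY keeps each customer's rows together, and ALL multiplies storage by the number of slices. That trade-off is why ALL is reserved for small tables.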

SageMaker Canvas

-no-code visual interface
-simplifies the whole process, from data cleaning to prediction

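Redshift VACUUM commands

-VACUUM FULL = the default (same as plain VACUUM): sorts rows and reclaims space from deleted rows
-VACUUM DELETE ONLY = reclaims disk space without sorting; doesn't speed up queries
-VACUUM REINDEX = analyzes the distribution of values in interleaved sort key columns, then performs a full vacuum
-VACUUM SORT ONLY = sorts without reclaiming disk space. Used when rows are unsorted but space isn't an issue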

SageMaker workflows/lineage tracking

-save steps in a workflow
-visual, think Step Functions editor

CloudWatch Contributor Insights & DynamoDB

-view of DynamoDB traffic trends, including the most accessed and most throttled keys

DynamoDB partition key cardinality

-for throttling and hot partition issues, use a high-cardinality partition key so traffic is distributed more evenly
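
Why a high-cardinality key spreads load can be shown with a toy hash-partitioning model. SHA-256 stands in for DynamoDB's internal partitioning hash (an assumption). The partition count of 10 and the request keys are made up. With only two distinct key values, at least half of all requests must hit one partition.

```python
import hashlib
from collections import Counter


def partition_for(partition_key, num_partitions):
    """Map a key to a partition with a stable hash (stand-in for DynamoDB's internal hash)."""
    digest = hashlib.sha256(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def busiest_partition_share(keys, num_partitions):
    """Fraction of all requests that land on the single busiest partition."""
    counts = Counter(partition_for(k, num_partitions) for k in keys)
    return max(counts.values()) / len(keys)


if __name__ == "__main__":
    # Made-up request logs: 1,000 requests each
    low_cardinality = ["ACTIVE" if i % 2 else "INACTIVE" for i in range(1000)]
    high_cardinality = [f"user-{i}" for i in range(1000)]
    print(busiest_partition_share(low_cardinality, 10))   # at least 0.5: only 2 keys
    print(busiest_partition_share(high_cardinality, 10))  # much closer to 0.1
```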

RDS Performance Insights

-gathers database performance metrics (DB load) to help find bottlenecks

(42 cards)