DynamoDB Flashcards

1
Q

What are some NoSQL characteristics?

A

NoSQL dbs are distributed

NoSQL dbs generally do NOT support joins

NoSQL dbs generally do not perform aggregations such as SUM

NoSQL dbs scale horizontally

2
Q

What are some nice features about DynamoDB?

A

Fully managed NoSQL database, highly available with replication across 3 AZs

Distributed database

Scales to massive workloads

Millions of requests per second, trillions of rows, 100 TBs of storage

Fast and consistent in performance (low latency retrieval)

Integrated with IAM for security, authorization, administration

Enables event-driven programming with Dynamo DB Streams

Low cost and auto-scaling capabilities

3
Q

What are some Dynamo DB Table properties?

A

Each table has a primary key, that must be chosen at creation time

Each table can have an infinite number of items aka rows

Each item has attributes that are added over time and can be null

Maximum size of an item is 400 KB

4
Q

What data types can a Dynamo DB item attribute have?

A

Scalar types: String, Number, Binary, Boolean, NULL

Document Type: List, Map

Set Types: String Set, Number Set, Binary Set

5
Q

What options does Dynamo DB offer as Primary Keys?

A

Option 1:

Partition Key Only (HASH):

Partition Key must be unique for each item

Partition Key must be as diverse as possible to distribute the data

Option 2:

Partition Key + Sort Key:

The combination of the two must be unique

Data is grouped by partition key

Data is sorted after the partition key by the sort key

6
Q

What are some features of Dynamo DB's Provisioned Throughput?

A

Tables must have provisioned RCUs and WCUs

An option exists to auto-scale throughput on demand

Throughput can be exceeded temporarily using burst capacity (“burst credit”)

However, once the burst capacity is used up, a ProvisionedThroughputExceededException is returned

It is then advised to retry with exponential backoff

7
Q

What is the formula of WCU?

A

One WCU corresponds to 1 write per second for an item up to 1 KB in size (larger items are rounded up to the next KB)

10 objects per second, 2 KB each => 10*2=20 WCUs

6 objects per second, 4.5 KB each => 6*ceil(4.5)=6*5=30 WCUs

120 objects per minute, 2 KB each => (120/60)*2=4 WCUs
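
The WCU arithmetic above can be sketched as a small helper. This is only an illustration of the formula on this card; the function name and parameters are made up for the example.

```python
import math

def wcus_needed(writes_per_second: float, item_size_kb: float) -> int:
    """Estimate WCUs for standard writes.

    Each write costs one WCU per 1 KB of item size, and the item size
    is rounded up to the next whole KB before multiplying.
    """
    units_per_write = math.ceil(item_size_kb)
    return math.ceil(writes_per_second * units_per_write)

# The three examples from the card
print(wcus_needed(10, 2))        # 10 writes/s of 2 KB items
print(wcus_needed(6, 4.5))       # 4.5 KB is billed as 5 KB
print(wcus_needed(120 / 60, 2))  # 120 writes per minute = 2 per second
```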

8
Q

What types of Reads does Dynamo DB offer?

A

Eventually Consistent Read:

If we read just after a write, we could get an unexpected response due to replication

Strongly Consistent Read:

If we read just after a write, we will get the correct data

Default:

Eventually Consistent Read

but

GetItem, Query and Scan provide a ConsistentRead parameter that can be set to True

9
Q

What is the formula for RCUs?

A

Depends on read option

One RCU equals:

2 Eventually Consistent Reads per second for an item up to 4 KB in size

1 Strongly Consistent Read per second for an item up to 4 KB in size

10 Strongly Consistent Reads per second for items of 4 KB each => 10*4/4=10 RCUs

16 Eventually Consistent Reads per second for items of 12 KB each => (16/2)*(12/4)=24 RCUs

10 Strongly Consistent Reads per second for items of 6 KB each => 10*ceil(6/4)=20 RCUs
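
A minimal sketch of the RCU arithmetic on this card, for checking worked examples. The function name and parameters are hypothetical, chosen for the illustration.

```python
import math

def rcus_needed(reads_per_second: float, item_size_kb: float,
                strongly_consistent: bool = False) -> int:
    """Estimate RCUs: item size is rounded up to the next 4 KB block,
    and eventually consistent reads cost half as much."""
    units_per_read = math.ceil(item_size_kb / 4)
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        total = total / 2
    return math.ceil(total)

# The three examples from the card
print(rcus_needed(10, 4, strongly_consistent=True))
print(rcus_needed(16, 12, strongly_consistent=False))
print(rcus_needed(10, 6, strongly_consistent=True))
```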

10
Q

Is Dynamo DB data divided into partitions?

A

Yes.

Partition Keys go through a hashing algorithm to determine which partition they belong to

WCUs and RCUs are spread evenly across partitions

To compute the number of partitions:

by capacity: (TOTAL RCU/3000) + (TOTAL WCU/1000)

by size: Total Size/10 GB

Total Partitions = CEIL(MAX(capacity, size))
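
The partition estimate can be sketched as follows. This is a rule-of-thumb calculation, not something DynamoDB exposes; the per-partition limits (3000 RCUs, 1000 WCUs, 10 GB) are passed in as parameters because they are assumptions of this sketch, and the function name is hypothetical.

```python
import math

def estimate_partitions(total_rcu: float, total_wcu: float, total_size_gb: float,
                        rcu_per_partition: int = 3000,
                        wcu_per_partition: int = 1000,
                        gb_per_partition: int = 10) -> int:
    """Rough partition count: the larger of the capacity-based and
    size-based estimates, rounded up (at least one partition)."""
    by_capacity = total_rcu / rcu_per_partition + total_wcu / wcu_per_partition
    by_size = total_size_gb / gb_per_partition
    return max(1, math.ceil(max(by_capacity, by_size)))

print(estimate_partitions(total_rcu=6000, total_wcu=1000, total_size_gb=5))
print(estimate_partitions(total_rcu=300, total_wcu=100, total_size_gb=45))
```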

11
Q

What is Throttling in Dynamo DB?

A

ProvisionedThroughputExceededException is received if RCUs or WCUs are exceeded

Reasons:

Hot Keys: One partition key is being read too many times

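Hot Partitions: one partition receives a disproportionate share of the traffic

Very large items: RCU and WCU consumption depends on item size

Solutions: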

Exponential backoff if exception is encountered (already in SDK)

Distribute partition keys as much as possible

If it is an RCU issue, we can use DynamoDB Accelerator (DAX)
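
The AWS SDKs already retry with backoff, so the following is only a sketch of the idea for code that has to do it by hand. `ThrottledError` is a hypothetical stand-in for ProvisionedThroughputExceededException, and the delay values are arbitrary assumptions.

```python
import random
import time

class ThrottledError(Exception):
    """Hypothetical stand-in for ProvisionedThroughputExceededException."""

def call_with_backoff(operation, max_attempts=5, base_delay=0.05,
                      max_delay=2.0, sleep=time.sleep):
    """Call operation(); on throttling, wait and retry.

    The wait ceiling doubles on every attempt (exponential backoff) and the
    actual wait is a random value below that ceiling (jitter), so that many
    throttled clients do not retry at the same instant.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists only so the waiting can be replaced in tests.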

12
Q

What ways can you write data to Dynamo DB?

A

PutItem: consumes WCU - creates an item or fully replaces an existing one

UpdateItem: partial update of attributes - can be used to implement and increment Atomic Counters

Conditional Writes: in a distributed system, several clients can write the same row at the same time - a condition is attached so that the write or update only happens if the condition is fulfilled - no performance impact - helps with concurrent access to items

13
Q

What ways can you delete data in Dynamo DB?

A

DeleteItem: delete an individual row - ability to perform conditional delete

DeleteTable: delete a whole table and all its items - quicker deletion than calling DeleteItem on all items

14
Q

What way can you batch-write data to DynamoDB?

A

BatchWriteItem: up to 25 PutItem or DeleteItem requests in one call - up to 16 MB of data written - up to 400 KB of data per item

Batching reduces latency by reducing the number of API calls made against Dynamo DB

Operations are done in parallel for better efficiency

In case part of the batch fails, we have to retry the failed items using an exponential backoff algorithm - the retry is up to the caller to perform
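
Because a single BatchWriteItem call accepts at most 25 requests, a longer list has to be split first. A minimal sketch of that splitting step, with a hypothetical helper name; sending the batches and retrying whatever comes back under `UnprocessedItems` is left out.

```python
def chunk_for_batch_write(requests, batch_size=25):
    """Split a list of write requests into batches of at most batch_size.

    25 is the per-call limit for BatchWriteItem, so larger values are rejected.
    """
    if not 1 <= batch_size <= 25:
        raise ValueError("batch_size must be between 1 and 25")
    return [requests[i:i + batch_size]
            for i in range(0, len(requests), batch_size)]

batches = chunk_for_batch_write(list(range(60)))
print([len(batch) for batch in batches])
```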

15
Q

How to read data from a Dynamo DB table?

A

GetItem: read based on primary key - primary key = HASH or HASH-RANGE - by default eventually consistent read - option to use strongly consistent read, which might take longer and consumes more RCU - ProjectionExpression can be specified to include only specific attributes

BatchGetItem: up to 100 items - up to 16 MB of data - done in parallel to minimize latency

Query: returns items based on partition key (must be ‘=’ operator) - optional: sort key (=, <, <=, >, >=, Between, begins_with) - FilterExpression to further filter results (applied after the items are read, so it does not reduce the RCUs consumed) - returns up to 1 MB of data - can use LIMIT - can query a table, a local secondary index, or a global secondary index - pagination

Scan: scans the entire table, then filters - returns up to 1 MB of data - use pagination to keep on reading - consumes a lot of RCU - can use LIMIT - for better performance use parallel scans (more RCUs, more throughput, multiple machines scan multiple partitions) - can use ProjectionExpression and FilterExpression
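
Since Query and Scan return at most 1 MB per call, reading everything means following the pagination key. A sketch of that loop, assuming `scan_page` is any callable that behaves like a Scan or Query call: it accepts `ExclusiveStartKey` and returns a dict with `Items` and, when more data remains, `LastEvaluatedKey`. The helper name is hypothetical.

```python
def read_all_pages(scan_page, **params):
    """Collect items from every page of a Scan/Query-style call."""
    items = []
    start_key = None
    while True:
        kwargs = dict(params)
        if start_key:
            # Resume where the previous page stopped
            kwargs["ExclusiveStartKey"] = start_key
        page = scan_page(**kwargs)
        items.extend(page.get("Items", []))
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return items  # no key returned means this was the last page
```

With boto3, something like `read_all_pages(client.scan, TableName="my-table")` would be the intended usage, where the table name is a placeholder.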

16
Q

What is a Local Secondary Index in Dynamo DB?

A

Alternate range key for a table - local to the hash key

Up to five LSIs per table

The sort key consists of one scalar attribute

The attribute chosen has to be a scalar String, Number, or Binary

LSI must be defined at creation time

17
Q

What is a Global Secondary Index?

A

To speed up queries on non-key attributes use GSI

GSI = partition key + optional sort key

The index is a “new” table and we can project attributes on it

Partition and sort key of the original table are always projected (KEYS_ONLY)

Can specify extra attributes to project (INCLUDE)

Can use all attributes from the original table (ALL)

Must define RCU/WCU for GSI

Unlike an LSI, a GSI can be created or modified after the table is created

18
Q

What is the connection between Indexes and Throttling?

A

If the writes to the GSI are throttled, then the main table will be throttled as well, even though its WCUs are fine => Choose the GSI partition key and the WCUs carefully

LSI uses the same WCU and RCU as the main table

19
Q

What is the Dynamo DB Concurrency model?

A

Dynamo DB has conditional updates/deletes

You can ensure an item hasn’t changed before altering it

That makes Dynamo DB an optimistic locking/concurrency database

Each item carries a version; if the version has changed since it was read (e.g. another writer increased it), the update is denied
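
One way to sketch the version check is to build the parameters for an UpdateItem call with a ConditionExpression. The helper name, the `version` attribute name, and the assumption that the updated value is a string are all choices made for this illustration; only the parameter names are those of the UpdateItem API.

```python
def build_versioned_update(table_name, key, attribute, new_value, expected_version):
    """Build UpdateItem parameters that succeed only if the item's version
    still equals expected_version, and bump the version on success."""
    return {
        "TableName": table_name,
        "Key": key,
        "UpdateExpression": "SET #attr = :new_value, #ver = :next_version",
        # The write is rejected if another writer changed the version first
        "ConditionExpression": "#ver = :expected_version",
        "ExpressionAttributeNames": {"#attr": attribute, "#ver": "version"},
        "ExpressionAttributeValues": {
            ":new_value": {"S": new_value},
            ":expected_version": {"N": str(expected_version)},
            ":next_version": {"N": str(expected_version + 1)},
        },
    }

params = build_versioned_update(
    "Users", {"user_id": {"S": "u1"}}, "email", "a@example.com", expected_version=3
)
print(params["ConditionExpression"])
```

The dictionary is meant to be passed to a low-level client, e.g. `client.update_item(**params)`; when the condition fails, DynamoDB answers with a ConditionalCheckFailedException and the caller re-reads the item and tries again.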

20
Q

What is Dynamo DB DAX?

A

Seamless caching for Dynamo DB - no application rewrite

Writes go through DAX to Dynamo DB

Microsecond latency for cached reads & queries

Solves the Hot Key problem (too many reads) - Prevents Throttling

5 minutes TTL of cached content by default

Up to 10 nodes in the DAX cluster

Multi AZ

Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail)

21
Q

DAX vs. ElastiCache

A

DAX:

Caching individual objects, queries, or scans is a good fit for DAX, very quick

ElastiCache:

Is not better than DAX at what DAX does

However, it is a good place to cache aggregation results

22
Q

What is Dynamo DB Streams?

A

Changes (create, update, delete) can end up in a Dynamo DB stream

This stream can be read by Lambda or EC2, as triggers, and then could:

react to changes in real time (emails) - Analytics - Create derivative tables/views - Insert into Elasticsearch

Streams can be used to implement cross-region replication

Streams has 24 hours of data retention

Writes logs to CloudWatch Log Groups

23
Q

How does Dynamo DB Streams work?

A

Choose what will be written to the stream whenever the data is modified:

KEYS_ONLY: Only the key attributes of the modified item

NEW_IMAGE: The entire item after it was modified

OLD_IMAGE: The entire item before it was modified

NEW_AND_OLD_IMAGES: New and old images of the item

Dynamo DB Streams are made of shards, like Kinesis Data Streams

Records are not retroactively populated after enabling Streams

Unlike with Kinesis Data Streams, we do not provision the shards; AWS does it for us

24
Q

How to configure Streams with Lambda?

A

In Lambda, define an Event Source Mapping (ESM) to read from a Dynamo DB Stream

The ESM polls the Dynamo DB Stream, and Dynamo DB returns event batches to the ESM

After receiving a batch, Lambda is invoked synchronously with the event batch

Make sure Lambda has the required permissions (AWSLambdaDynamoDBExecutionRole)
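
A minimal sketch of a Lambda handler for such an event batch. It assumes the usual shape of stream records (`Records`, `eventName`, and a `dynamodb` section with `Keys`); what the function does with the records (counting them and collecting removed keys) is invented for the example.

```python
def handler(event, context):
    """Count stream records by event type and collect keys of removed items."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    removed_keys = []
    for record in event.get("Records", []):
        event_name = record["eventName"]  # INSERT, MODIFY or REMOVE
        counts[event_name] += 1
        if event_name == "REMOVE":
            # Keys are always present; NewImage/OldImage depend on the
            # view type chosen when the stream was enabled
            removed_keys.append(record["dynamodb"]["Keys"])
    return {"counts": counts, "removed_keys": removed_keys}

sample_event = {"Records": [
    {"eventName": "INSERT", "dynamodb": {"Keys": {"id": {"S": "1"}}}},
    {"eventName": "REMOVE", "dynamodb": {"Keys": {"id": {"S": "2"}}}},
]}
print(handler(sample_event, None))
```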

25
Q

What is Dynamo DB TTL?

A

Time To Live

Deletes an item after a specific date and time

No extra costs

Deletions do not consume WCUs or RCUs

Background task performed by DynamoDB

Helps reduce storage and manage table size over time

Helps with regulations

Is enabled per row: we define a TTL column and store an epoch timestamp in it

Expired items are typically deleted within 48 hours of expiration

items are also deleted in GSI and LSI

Streams can help recover deleted items
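
The TTL column holds an epoch timestamp in seconds. A small sketch of computing it and attaching it to an item; the attribute name `expires_at` and both helper names are hypothetical, and the item uses the low-level attribute-value format.

```python
import time

def ttl_timestamp(seconds_from_now, now=None):
    """Epoch time in whole seconds at which the item should expire."""
    now = time.time() if now is None else now
    return int(now) + int(seconds_from_now)

def with_ttl(item, seconds_from_now, ttl_attribute="expires_at", now=None):
    """Return a copy of the item with the TTL attribute set as a Number."""
    stamped = dict(item)
    stamped[ttl_attribute] = {"N": str(ttl_timestamp(seconds_from_now, now))}
    return stamped

session = with_ttl({"session_id": {"S": "abc"}}, seconds_from_now=3600)
print(session)
```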

26
Q

What are some good DynamoDB CLI commands to know?

A

--projection-expression: attributes to retrieve

--filter-expression: filter results (use also --expression-attribute-values)

CLI pagination options:

Optimization:

--page-size: full dataset is still received but each API call requests less data - avoids timeouts

Pagination:

--max-items: max number of results to return from the CLI, returns NextToken

--starting-token: specify the last received NextToken to keep on reading

27
Q

What is DynamoDB Transactions?

A

Ability to create, update, delete multiple rows in multiple tables at once

All or nothing operation - either everything succeeds or nothing does

Transactional is a new write mode and a new read mode

Consumes 2x WCUs/RCUs

API names: TransactWriteItems/TransactGetItems

28
Q

How to save Session State Cache in DynamoDB?

A

vs. ElastiCache:

EC is in-memory

DynamoDB is serverless (automatic scaling)

both are key-value stores

vs. EFS:

EFS must be attached as a network drive, e.g. to EC2 instances

vs EBS and Instance store:

These two only store data locally, it is not shared

vs. S3:

S3 is higher latency, and made for bigger objects

29
Q

What is DynamoDB Write Sharding?

A

Add a random suffix to the partition key to scale the writes across many shards
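
A minimal sketch of the suffix idea. The `#` separator, the shard count of 10, and the helper names are assumptions made for the example; reads have to query every suffix and merge the results.

```python
import random

def sharded_key(base_key, shard_count=10, rng=random):
    """Partition key for a write: the base key plus a random shard suffix."""
    return f"{base_key}#{rng.randrange(shard_count)}"

def all_shard_keys(base_key, shard_count=10):
    """Every partition key a reader must query to see all items for base_key."""
    return [f"{base_key}#{i}" for i in range(shard_count)]

print(sharded_key("2024-01-01", shard_count=4))
print(all_shard_keys("2024-01-01", shard_count=4))
```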

30
Q

What are all DynamoDB write types?

A

Concurrent Writes:

First write will be overwritten by second write

Conditional Write:

Write is bound to condition, after first write, second might be declined because condition state changed after first write

Atomic Writes:

Includes an INCREASE_BY or DECREASE_BY in the request

both requests succeed in writing and aggregation

Batch Writes:

Write many items at a time

31
Q

How to use DynamoDB pattern with S3?

A

Large objects (beyond the 400 KB item limit) can be handled with DynamoDB together with S3:

First the object is uploaded to S3, and

the metadata is written to DynamoDB, e.g. location, id, key

How to search for those files?

After the object is uploaded to S3, a Lambda function writes the metadata to DynamoDB, where it can then be queried

32
Q

what are DynamoDB Operations?

A

Table CleanUp:

Option 1: Scan + delete, slow, consumes many RCUs and WCUs

Option 2: Drop table, recreate table: fast, cheap, efficient

Copying a DynamoDB Table:

Option 1: Use AWS Data Pipeline & EMR

Option 2: Create a backup and restore the backup, can take some time

Option 3: Scan + Write => requires own code

33
Q

DynamoDB Security and other features

A

Security:

VPC endpoints are available to access DynamoDB without internet

Access is controlled by IAM

Users identified by federation can obtain temporary tokens to access Dynamo DB data; the attached IAM role can include a Condition with LeadingKeys to limit access by primary key and Attributes to limit the retrievable columns

Encryption at rest using KMS

Encryption in transit using SSL/TLS

Backup and restore features available:

Point in time restore like RDS

No performance impact

Global Tables:

Multi-region, fully replicated, high performance

Migration:

AWS DMS can be used to migrate to DynamoDB from e.g. MongoDB

A local DynamoDB instance can be used for development