Database Specialty - DynamoDB Flashcards

1
Q

Tool for Backup and restore

A

PITR

2
Q

Terminology DynamoDB

A

Tables, Items, Attributes, Primary Keys, Local Secondary Indexes, Global Secondary Indexes

3
Q

Data Types in DynamoDB

A

Scalar, Set, Document

4
Q

Important points Read Consistency

A

Strong, Eventual and Transactional

5
Q

Points Write Consistency

A

Standard and Transactional

6
Q

Modes Pricing Model

A

Provisioned and On-Demand Capacity

7
Q

Types of caches in DAX

A

Item Cache and Query Cache

8
Q

Scaling Options

A

Automatic, Provisioned, Global Replication, Burst Capacity, On-Demand Capacity

9
Q

Amazon DynamoDB – Overview Points

A
  • Non-relational Key-Value store
  • Fully Managed, Serverless, NoSQL database in the cloud
  • Fast, Flexible, Cost-effective, Fault Tolerant, Secure
  • Multi-region, multi-master database (Global Tables)
  • Backup and restore with PITR (Point-in-time Recovery)
  • Single-digit millisecond performance at any scale
  • In-memory caching with DAX (DynamoDB Accelerator, microsecond latency)
  • Supports CRUD (Create/Read/Update/Delete) operations through APIs
  • Supports transactions across multiple tables (ACID support)
  • No direct analytical queries (No joins)
  • Access patterns must be known ahead of time for efficient design and performance
10
Q

DynamoDB Tables

A
  • Tables are top-level entities
  • No strict inter-table relationships (Independent Entities)
  • You control performance at the table level
  • Table items stored as JSON (DynamoDB-specific JSON)
  • Primary keys are mandatory, rest of the schema is flexible
  • Primary Key can be simple or composite
  • Simple Key has a single attribute (=partition key or hash key)
  • Composite Key has two attributes
    (=partition/hash key + sort/range key)
  • Non-key attributes (including secondary key attributes) are
    optional
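
A minimal sketch of simple versus composite keys: it only builds the parameter dictionary that boto3's `create_table` call accepts and sends nothing to AWS. The helper `key_schema_params` and the table and attribute names (`Users`, `GameScores`, `user_id`, `game_ts`) are hypothetical.

```python
def key_schema_params(table_name, partition_key, sort_key=None):
    """Build create_table parameters for a simple or composite primary key.

    partition_key and sort_key are (attribute_name, scalar_type) pairs,
    where scalar_type is "S" (string), "N" (number) or "B" (binary).
    """
    keys = [(partition_key, "HASH")]        # partition/hash key is mandatory
    if sort_key is not None:
        keys.append((sort_key, "RANGE"))    # sort/range key makes it composite
    return {
        "TableName": table_name,
        # Only key attributes are declared; all other attributes are schemaless.
        "AttributeDefinitions": [
            {"AttributeName": name, "AttributeType": type_}
            for (name, type_), _ in keys
        ],
        "KeySchema": [
            {"AttributeName": name, "KeyType": role}
            for (name, _), role in keys
        ],
        "BillingMode": "PAY_PER_REQUEST",   # on-demand, so no RCUs/WCUs needed here
    }


simple = key_schema_params("Users", ("user_id", "S"))
composite = key_schema_params("GameScores", ("user_id", "S"), ("game_ts", "N"))
```

With credentials configured, the dictionary could be passed on as `boto3.client("dynamodb").create_table(**composite)`.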
11
Q

Data Types in DynamoDB

A
  • Scalar Types
    • Exactly one value
    • e.g. string, number, binary, boolean, and null
    • Keys or index attributes only support string, number and binary scalar types
  • Set Types
    • Multiple scalar values
    • e.g. string set, number set and binary set
  • Document Types
    • Complex structure with nested attributes
    • e.g. list and map
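
To make the three type families concrete, the sketch below converts plain Python values into DynamoDB-specific JSON, where every value is tagged with a type descriptor (`S`, `N`, `B`, `BOOL`, `NULL`, `SS`, `NS`, `BS`, `L`, `M`). The function name `to_dynamodb_json` is hypothetical and the conversion is deliberately simplified; the AWS SDKs ship their own serializers.

```python
def to_dynamodb_json(value):
    """Convert a Python value to DynamoDB-specific JSON (simplified sketch)."""
    # Scalar types: exactly one value.
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):            # check before numbers: bool is an int subclass
        return {"BOOL": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}           # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, bytes):
        return {"B": value}
    # Set types: multiple scalar values of one type.
    if isinstance(value, (set, frozenset)):
        if not value:
            raise ValueError("DynamoDB sets cannot be empty")
        if all(isinstance(v, str) for v in value):
            return {"SS": sorted(value)}
        if all(isinstance(v, bytes) for v in value):
            return {"BS": sorted(value)}
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in value):
            return {"NS": [str(v) for v in sorted(value)]}
        raise TypeError("set members must share one scalar type")
    # Document types: nested structures.
    if isinstance(value, list):
        return {"L": [to_dynamodb_json(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_dynamodb_json(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


item = to_dynamodb_json({"user_id": "u-1", "score": 42, "tags": {"a", "b"}})
```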
12
Q

AWS Global Infrastructure

A
  • Has multiple AWS Regions across
    the globe
  • Each region has one or more AZs
    (Availability Zones)
  • Each AZ has one or more
    facilities (= Data Centers)
  • DynamoDB automatically
    replicates data between multiple
    facilities within the AWS region
  • Near Real-time Replication
  • AZs act as independent failure
    domains
13
Q

DynamoDB Consistency

A
  • Read Consistency: strong consistency, eventual consistency, and transactional
  • Write Consistency: standard and transactional
  • Strong Consistency
    • The most up-to-date data
    • Must be requested explicitly
  • Eventual Consistency
    • May or may not reflect the latest copy of
      data
    • Default consistency for all read operations
    • Half the cost of a strongly consistent read
  • Transactional Reads and Writes
    • For ACID support across one or more
      tables within a single AWS account and
      region
    • 2x the cost of strongly consistent reads
    • 2x the cost of standard writes
14
Q

Strongly Consistent Read vs Eventually Consistent Read

A
  • Eventually Consistent Read: If we read just
    after a write, we may get stale data because
    replication has not completed yet
  • Strongly Consistent Read: If we read just
    after a write, we will get the correct data
  • By default, DynamoDB uses Eventually
    Consistent Reads, but GetItem, Query &
    Scan provide a “ConsistentRead” parameter
    you can set to True
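
A small sketch of how the read mode is selected per request. It only builds the parameters that boto3's `get_item` accepts; the helper `get_item_params`, the table name `Users` and the key `user_id` are hypothetical.

```python
def get_item_params(table_name, key, strongly_consistent=False):
    """Build GetItem parameters; reads are eventually consistent unless asked otherwise."""
    params = {"TableName": table_name, "Key": key}
    if strongly_consistent:
        # Strong consistency must be requested explicitly.
        params["ConsistentRead"] = True
    return params


key = {"user_id": {"S": "u-1"}}
eventual = get_item_params("Users", key)
strong = get_item_params("Users", key, strongly_consistent=True)
```

With credentials configured, either dictionary could be passed on as `boto3.client("dynamodb").get_item(**strong)`.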
15
Q

DynamoDB Pricing Model - Provisioned Capacity

A
  • You pay for the capacity you provision
    (= number of reads and writes per second)
  • You can use auto-scaling to adjust the
    provisioned capacity
  • Uses Capacity Units: Read Capacity Units
    (RCUs) and Write Capacity Units (WCUs)
  • Consumption beyond provisioned capacity may
    result in throttling
  • Use Reserved Capacity for discounts over 1 or
    3-year term contracts (you’re charged a one-time
    fee + an hourly fee per 100 RCUs and WCUs)
16
Q

DynamoDB Pricing Model - On-Demand Capacity

A

  • You pay per request (= number of read and
    write requests your application makes)
  • No need to provision capacity units
  • DynamoDB instantly accommodates your
    workloads as they ramp up or down
  • Uses Request Units: Read Request Units and
    Write Request Units
  • Cannot use reserved capacity with On-Demand mode
17
Q

DynamoDB Throughput - Provisioned Capacity mode

A
  • Uses Capacity Units
    • 1 capacity unit = 1 request/sec
  • RCUs (Read Capacity Units)
    • In blocks of 4KB, last block always rounded up
    • 1 strongly consistent table read/sec = 1 RCU
    • 2 eventually consistent table reads/sec = 1 RCU
    • 1 transactional read/sec = 2 RCUs
  • WCUs (Write Capacity Units)
    • In blocks of 1KB, last block always rounded up
    • 1 table write/sec = 1 WCU
    • 1 transactional write/sec = 2 WCUs
18
Q

DynamoDB Throughput - On-Demand Capacity mode

A
  • Uses Request Units
    • Same as Capacity Units for calculation purposes
  • Read Request Units
    • In blocks of 4KB, last block always
      rounded up
    • 1 strongly consistent table read request = 1 RRU
    • 2 eventually consistent table read requests = 1 RRU
    • 1 transactional read request = 2 RRUs
  • Write Request Units
    • In blocks of 1KB, last block always rounded up
    • 1 table write request = 1 WRU
    • 1 transactional write request = 2 WRUs
19
Q

Provisioned Capacity - Points

A
  • Typically used in production environment
  • Use this when you have predictable traffic
  • Consider using reserved capacity if you
    have steady and predictable traffic for
    cost savings
  • Can result in throttling when
    consumption shoots up (use auto-scaling)
  • Tends to be cost-effective as compared
    to the on-demand capacity mode
20
Q

On-Demand Capacity Mode

A
  • Typically used in dev/test environments
    or for small applications
  • Use this when you have variable,
    unpredictable traffic
  • Instantly accommodates up to 2x the
    previous peak traffic on a table
  • Throttling can occur if you exceed 2x
    the previous peak within 30 minutes
  • Recommended to space traffic growth
    over at least 30 mins before driving
    more than 2x
21
Q

Example 1: Calculating Capacity Units

A

Calculate capacity units to read and write a 15KB item

  • RCUs with strong consistency:
    • 15KB/4KB = 3.75 => rounded up => 4 RCUs
  • RCUs with eventual consistency:
    • (1/2) x 4 RCUs = 2 RCUs
  • RCUs for transactional read:
    • 2 x 4 RCUs = 8 RCUs
  • WCUs:
    • 15KB/1KB = 15 WCUs
  • WCUs for transactional write:
    • 2 x 15 WCUs = 30 WCUs
22
Q

Example 2: Calculating Capacity Units

A

Calculate capacity units to read and write a 1.5KB item

  • RCUs with strong consistency:
    • 1.5KB/4KB = 0.375 => rounded up => 1 RCU
  • RCUs with eventual consistency:
    • (1/2) x 1 RCU = 0.5 RCU => rounded up => 1 RCU
  • RCUs for transactional read:
    • 2 x 1 RCU = 2 RCUs
  • WCUs:
    • 1.5KB/1KB = 1.5 => rounded up => 2 WCUs
  • WCUs for transactional write:
    • 2 x 2 WCUs = 4 WCUs
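
The arithmetic in the two capacity-unit examples can be sketched as two small functions. The function names are hypothetical, and the rounding follows the cards: item size is rounded up to whole 4KB (read) or 1KB (write) blocks, and a fractional eventual-consistency result is rounded up to a whole unit.

```python
import math


def read_capacity_units(item_size_kb, mode="strong"):
    """RCUs needed to read one item per second, in 4KB blocks."""
    blocks = math.ceil(item_size_kb / 4)      # last block always rounded up
    if mode == "strong":
        return blocks
    if mode == "eventual":
        return math.ceil(blocks / 2)          # half the cost, rounded up
    if mode == "transactional":
        return 2 * blocks                     # twice the cost
    raise ValueError(f"unknown read mode: {mode}")


def write_capacity_units(item_size_kb, transactional=False):
    """WCUs needed to write one item per second, in 1KB blocks."""
    blocks = math.ceil(item_size_kb / 1)      # last block always rounded up
    return 2 * blocks if transactional else blocks


# 15KB item: 4 / 2 / 8 RCUs and 15 / 30 WCUs
# 1.5KB item: 1 / 1 / 2 RCUs and 2 / 4 WCUs
```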
23
Q

Example 3: Calculating Throughput
A DynamoDB table has provisioned capacity of 10
RCUs and 10 WCUs. Calculate the throughput that
your application can support:

A
  • Read throughput with strong consistency = 4KB x 10 = 40KB/sec
  • Read throughput (eventual) = 2 x (40KB/sec) = 80KB/sec
  • Transactional read throughput = (1/2) x (40KB/sec) = 20KB/sec
  • Write throughput = 1KB x 10 = 10KB/sec
  • Transactional write throughput = (1/2) x (10KB/sec) = 5KB/sec
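
The same throughput calculation as a short sketch, going from provisioned units to KB/sec. The function name `throughput_kb_per_sec` is hypothetical.

```python
def throughput_kb_per_sec(rcus, wcus):
    """Maximum data throughput (KB/sec) supported by the provisioned capacity."""
    strong_read = 4 * rcus     # 1 RCU = one 4KB strongly consistent read/sec
    write = 1 * wcus           # 1 WCU = one 1KB write/sec
    return {
        "strong_read": strong_read,
        "eventual_read": 2 * strong_read,        # eventual reads cost half
        "transactional_read": strong_read / 2,   # transactional reads cost double
        "write": write,
        "transactional_write": write / 2,        # transactional writes cost double
    }


result = throughput_kb_per_sec(10, 10)
```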
24
Q

DynamoDB Burst Capacity

A
  • To provide for occasional bursts
    or spikes
  • 5 minutes or 300 seconds of
    unused read and write capacity
  • Can get consumed quickly
  • Must not be relied upon
25
DynamoDB Adaptive Capacity
  • Example:
    • Total provisioned capacity = 600 WCUs per sec
    • Provisioned capacity per partition = 200 WCUs per sec
    • Unused capacity = 200 WCUs per sec
    • So the hot partition can consume these unused 200 WCUs per sec above its allocated capacity
    • Consumption beyond this results in throttling
  • For non-uniform workloads
  • Works automatically and is applied in real time
  • No guarantees
26
DynamoDB LSI (Local Secondary Index)
  • Can define up to 5 LSIs
  • Has the same partition/hash key attribute as the primary index of the table
  • Has a different sort/range key than the primary index of the table
  • Must have a sort/range key (=composite key)
  • Indexed items must be ≤ 10 GB in total per partition key value (item collection size limit)
  • Can only be created at the time of creating the table and cannot be deleted later
27
DynamoDB GSI (Global Secondary Index)
  • Can define up to 20 GSIs (soft limit)
  • Can have the same or a different partition/hash key than the table’s primary index
  • Can have the same or a different sort/range key than the table’s primary index
  • Can omit the sort/range key (= index key can be simple or composite)
  • No size restrictions for indexed items
  • Can be created or deleted at any time. Can delete only one GSI at a time
  • Can query across partitions (over the entire table)
  • Supports only eventual consistency
  • Has its own provisioned throughput
  • Can only query projected attributes (attributes included in the index)
28
When to choose which index? Local Secondary Indexes
  • When application needs same partition key as the table
  • When you need to avoid additional costs
  • When application needs strongly consistent index reads
29
When to choose which index? Global Secondary Indexes
  • When application needs different or same partition key as the table
  • When application needs finer throughput control
  • When application only needs eventually consistent index reads
30
DynamoDB Indexes and Throttling - Local Secondary Indexes
  • Uses the WCU and RCU of the main table
  • No special throttling considerations
31
DynamoDB Indexes and Throttling - Global Secondary Indexes
  • If the writes are throttled on the GSI, then the main table will be throttled! (even if the WCUs on the main table are fine)
  • Choose your GSI partition key carefully!
  • Assign your WCU capacity carefully!
32
Simple design patterns with DynamoDB
  • You can model different entity relationships like 1:1, 1:N, N:M
  • Store players’ game states – 1:1 modeling, 1:N modeling
    • user_id as PK, game_id as SK (1:N modeling)
  • Players’ gaming history – 1:N modeling
    • user_id as PK, game_ts as SK (1:N modeling)
  • Gaming leaderboard – N:M modeling
    • GSI with game_id as PK and score as SK
33
DynamoDB Write Sharding
  • Imagine we have a voting application with two candidates, candidate A and candidate B
  • If we use a partition key of candidate_id, we will run into hot partition issues, as we only have two partition key values
  • Solution: add a suffix to the partition key (usually a random suffix, sometimes a calculated suffix)
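
A minimal sketch of both suffix strategies for the voting example. The shard count of 10, the `#` separator, the `voter_id` input and the function names are all assumptions made for illustration.

```python
import hashlib
import random

NUM_SHARDS = 10  # hypothetical number of suffixes per candidate


def random_shard_key(candidate_id, num_shards=NUM_SHARDS):
    """Random suffix: spreads writes evenly, but a read must query every suffix."""
    return f"{candidate_id}#{random.randrange(num_shards)}"


def calculated_shard_key(candidate_id, voter_id, num_shards=NUM_SHARDS):
    """Calculated suffix: derived from another attribute, so it can be recomputed."""
    digest = hashlib.sha256(voter_id.encode("utf-8")).digest()
    suffix = int.from_bytes(digest[:4], "big") % num_shards
    return f"{candidate_id}#{suffix}"


def all_shard_keys(candidate_id, num_shards=NUM_SHARDS):
    """Every partition key to query (and sum) when counting a candidate's votes."""
    return [f"{candidate_id}#{n}" for n in range(num_shards)]
```

The trade-off: a random suffix needs no extra input but forces reads to fan out over all suffixes, while a calculated suffix lets a reader who knows the voter go straight to one partition key.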
34
Error and Exceptions in DynamoDB
  • Common Exceptions
    • Access Denied Exception
    • Conditional Check Failed Exception
    • Item Collection Size Limit Exceeded Exception
    • Limit Exceeded Exception
    • Resource In Use Exception
    • Validation Exception
    • Provisioned Throughput Exceeded Exception
  • Error Retries
  • Exponential Backoff
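
A rough sketch of retrying with exponential backoff. `ThrottledError` is a hypothetical stand-in for a retryable error such as a Provisioned Throughput Exceeded Exception, and the delays and jitter strategy are illustrative choices. The AWS SDKs already retry with backoff on their own, so this only shows the idea.

```python
import random
import time


class ThrottledError(Exception):
    """Hypothetical stand-in for a retryable, throttling-style error."""


def backoff_delays(max_retries, base=0.05, cap=5.0):
    """Delay ceiling per retry: base, 2*base, 4*base, ... limited to cap seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def call_with_backoff(operation, max_retries=5, base=0.05, cap=5.0, sleep=time.sleep):
    """Call operation(), sleeping for a random part of each delay between retries."""
    for delay in backoff_delays(max_retries, base, cap):
        try:
            return operation()
        except ThrottledError:
            sleep(random.uniform(0, delay))   # jitter avoids synchronized retries
    return operation()                        # final attempt; errors propagate
```

Non-retryable errors (for example a Validation Exception) should not be retried, which is why only `ThrottledError` is caught here.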
35
DynamoDB Partitions
  • Store DynamoDB table data (physically)
  • Each (physical) partition = 10GB SSD volume
  • Not to be confused with table’s partition/hash key (which is a logical partition)
  • One partition can store items with multiple partition keys
  • A table can have multiple partitions
  • Number of table partitions depends on its size and provisioned capacity
  • Managed internally by DynamoDB
  • Provisioned capacity is evenly distributed across table partitions
  • Partitions once allocated, cannot be deallocated (important!)
36
Calculating DynamoDB Partitions
  • 1 partition = 1000 WCUs or 3000 RCUs (maximum supported throughput per partition)
  • 1 partition = 10GB of data
  • No. of partitions = either the number of partitions based on throughput or the number of partitions based on size, whichever is higher
37
Partition Behavior Example (Scaling up Capacity)
  • Provisioned Capacity: 500 RCUs and 500 WCUs
  • Storage requirement < 10 GB
  • Number of Partitions:
    • PT = (500 RCUs/3000 + 500 WCUs/1000) = 0.67 => rounded up => 1 partition
  • Say, we scale up the provisioned capacity
  • New Capacity: 1000 RCUs and 1000 WCUs
    • PT = (1000 RCUs/3000 + 1000 WCUs/1000) = 1.33 => rounded up => 2 partitions
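
The partition estimate used in this example can be sketched as below. The function names are hypothetical, and the formula is the study-guide rule from these cards (3000 RCUs, 1000 WCUs and 10GB per partition), not a guarantee of how DynamoDB allocates partitions internally.

```python
import math


def partitions_by_throughput(rcus, wcus):
    """Partitions needed for throughput: RCUs/3000 + WCUs/1000, rounded up."""
    return math.ceil(rcus / 3000 + wcus / 1000)


def partitions_by_size(size_gb):
    """Partitions needed for storage: one per 10GB, rounded up."""
    return math.ceil(size_gb / 10)


def estimate_partitions(rcus, wcus, size_gb):
    """Whichever estimate is higher wins; a table has at least one partition."""
    return max(partitions_by_throughput(rcus, wcus), partitions_by_size(size_gb), 1)


before = estimate_partitions(500, 500, size_gb=5)     # 0.67 rounded up
after = estimate_partitions(1000, 1000, size_gb=5)    # 1.33 rounded up
```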
38
DynamoDB Scaling
  • You can manually scale up provisioned capacity as and when needed
  • You can only scale down up to 4 times in a day
    • Additional one scale down if no scale downs in last 4 hours
    • Effectively 9 scale downs per day
  • Scaling affects partition behavior
    • Any increase in partitions on scale up will not result in decrease on scale down (Important!)
    • Partitions once allocated will not get deallocated later
39
DynamoDB Accelerator (DAX)
  • In-Memory Caching, microsecond latency
  • Sits between DynamoDB and Client Application (acts as a proxy)
  • Saves costs due to reduced read load on DynamoDB
  • Helps prevent hot partitions
  • Minimal code changes required to add DAX to your existing DynamoDB app
  • Supports only eventual consistency (strong consistency requests pass through to DynamoDB)
  • Not for write-heavy applications
  • Runs inside the VPC
  • Multi AZ (3 nodes minimum recommended for production)
  • Secure (Encryption at rest with KMS, VPC, IAM, CloudTrail…)
40
DAX architecture
  • DAX has two types of caches (internally)
    • Item Cache
    • Query Cache
  • Item cache stores results of index reads (=GetItem and BatchGetItem)
    • Default TTL of 5 min (specified while creating DAX cluster)
    • When cache becomes full, older and less popular items get removed
  • Query cache stores results of Query and Scan operations
    • Default TTL of 5 min
    • Updates to the Item cache or to the underlying DynamoDB table do not invalidate the query cache. So, TTL value of the query cache should be chosen accordingly.
41
DAX Operations
  • Only for item level operations
  • Table level operations must be sent directly to DynamoDB
  • Write Operations use write-through approach
    • Data is first written to DynamoDB and then to DAX, and write operation is considered as successful only if both writes are successful
  • You can use write-around approach to bypass DAX, e.g. for writing large amount of data, you can write directly to DynamoDB (Item cache goes out of sync)
42
DAX Operations 2
  • Only for item level operations
  • Table level operations must be sent directly to DynamoDB
  • Write Operations use write-through approach
    • Data is first written to DynamoDB and then to DAX, and write operation is considered as successful only if both writes are successful
  • You can use write-around approach to bypass DAX, e.g. for writing large amount of data, you can write directly to DynamoDB (Item cache goes out of sync)
  • For reads, if DAX has the data (=Cache hit), it’s simply returned without going through DynamoDB
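
A toy model of write-through, write-around and cache hits, using plain dictionaries in place of DynamoDB and DAX. The class `ToyDax` is purely illustrative: it ignores TTLs, eviction and the query cache, and it is not the DAX client API.

```python
class ToyDax:
    """Toy in-memory model of a DAX item cache in front of a table (a dict)."""

    def __init__(self, table):
        self.table = table       # stands in for the DynamoDB table
        self.item_cache = {}     # stands in for the DAX item cache

    def put_item(self, key, item):
        # Write-through: write to the table first, then to the cache.
        self.table[key] = item
        self.item_cache[key] = item

    def get_item(self, key):
        if key in self.item_cache:         # cache hit: the table is not read
            return self.item_cache[key]
        item = self.table.get(key)         # cache miss: read from the table
        if item is not None:
            self.item_cache[key] = item    # populate the cache for next time
        return item


table = {}
dax = ToyDax(table)
dax.put_item("user#1", {"score": 10})   # write-through keeps cache and table in sync
table["user#1"] = {"score": 99}         # write-around: goes straight to the table
stale = dax.get_item("user#1")          # cache hit returns the old cached item
```

The last two lines show why the item cache goes out of sync after a write-around: the cached copy is served until it expires or is overwritten.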
43
Implementing DAX
  • To implement DAX, we create a DAX Cluster
  • DAX Cluster consists of one or more nodes (up to 10 nodes per cluster)
  • Each node is an instance of DAX
  • One node is the master node or primary node
  • Remaining nodes act as read replicas
  • DAX internally handles load balancing between these nodes
  • 3 nodes minimum recommended for production
44
Backup and Restore in DynamoDB
  • Automatically encrypted, cataloged and easily discoverable
  • Highly Scalable - create or retain as many backups for tables of any size
  • Backup operations complete in seconds
  • Backups are consistent within seconds across thousands of partitions
  • No provisioned capacity consumption
  • Does not affect table performance or availability
  • Backups are preserved regardless of table deletion
45
Backup and Restore in DynamoDB v2
  • Can back up within the same AWS region as the table
  • Restores can be within same region or cross region
  • Integrated with AWS Backup service (can create periodic backup plans)
  • Periodic backups can be scheduled using Lambda and CloudWatch triggers
  • Cannot overwrite an existing table during restore, restores can be done only to a new table (=new name)
  • To retain the original table name, delete the existing table before running restore
  • You can use IAM policies for access control
46
Backup and Restore in DynamoDB v3
  • Restored table gets the same provisioned RCUs/WCUs as the source table, as recorded at the time of backup
  • PITR RPO = 5 minutes approx.
  • PITR RTO can be longer as restore operation creates a new table
47
Backup and Restore in DynamoDB v4
  • What gets restored:
    • Table data
    • GSIs and LSIs (optional, you can choose)
    • Encryption settings (you can change)
    • Provisioned RCUs / WCUs (with values at the time when backup was created)
    • Billing mode (with value at the time when backup was created)
  • What you must manually set up on the restored table:
    • Auto scaling policies, IAM policies
    • CloudWatch metrics and alarms
    • Stream and TTL settings
    • Tags
48
Continuous Backups with PITR
  • Restore table data to any second in the last 35 days!
  • Priced per GB based on the table size
  • If you disable PITR and re-enable it, the 35 days clock gets reset
  • Works with unencrypted, encrypted tables as well as global tables
  • Can be enabled on each local replica of a global table
  • If you restore a table which is part of global tables, the restored table will be an independent table (won’t be a global table anymore!)
  • Always restores data to a new table
  • What cannot be restored
    • Stream settings
    • TTL options
    • Autoscaling config
    • PITR settings
    • Alarms and tags
  • All PITR API calls get logged in CloudTrail
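
A small sketch of the 35-day restore window and the effect of re-enabling PITR. The function names are hypothetical, and the upper bound is simplified to "now"; in practice the latest restorable time trails the current time by a few minutes.

```python
from datetime import datetime, timedelta, timezone

PITR_WINDOW = timedelta(days=35)


def earliest_restorable(now, pitr_enabled_at):
    """Start of the restore window: 35 days back, but never before PITR was enabled."""
    return max(pitr_enabled_at, now - PITR_WINDOW)


def can_restore_to(target, now, pitr_enabled_at):
    """True if target falls inside the current restore window."""
    return earliest_restorable(now, pitr_enabled_at) <= target <= now


now = datetime(2024, 3, 1, tzinfo=timezone.utc)
long_running = earliest_restorable(now, now - timedelta(days=100))  # 35 days back
re_enabled = earliest_restorable(now, now - timedelta(days=2))      # clock was reset
```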
49
DynamoDB Encryption
  • Server-side Encryption at Rest
    • Enabled by default
    • Uses KMS
    • 256-bit AES Encryption
    • Can use AWS owned CMK, AWS managed CMK, or customer managed CMK
    • Encrypts primary key, secondary indexes, streams, global tables, backups and DAX clusters
  • Encryption in transit
    • Use VPC endpoints for applications running in a VPC
    • Use TLS endpoints for encrypting data in transit
50
DynamoDB Encryption Client
  • For client-side encryption
  • Added protection with encryption in-transit
  • Results in end-to-end encryption
  • Doesn't encrypt the entire table
  • Encrypts the attribute values, but not the attribute names
  • Doesn't encrypt values of the primary key attributes
  • You can selectively encrypt other attribute values
  • You can encrypt selected items in a table, or selected attribute values in some or all items
51
DynamoDB Streams
  • Time-ordered log of all table-write activity, retained for 24 hours
  • React to changes to DynamoDB tables in real time
  • Can be read by AWS Lambda, EC2, ES, Kinesis…
  • DynamoDB Streams are organized into shards
  • Records are not retroactively populated in a stream after enabling it
  • Simply enable streams from DynamoDB console
52
DynamoDB Streams - supported views - Keys only
captures only the key attributes of the changed item
53
DynamoDB Streams - supported views - New image
captures the entire item after changes
54
DynamoDB Streams - supported views - Old image
captures the entire item before changes
55
DynamoDB Streams - supported views - New and old images
captures the entire item before and after changes
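
The four views can be compared side by side with a small sketch that shows which parts of a changed item each one would carry. The function name is hypothetical; the view names match the `StreamViewType` values (`KEYS_ONLY`, `NEW_IMAGE`, `OLD_IMAGE`, `NEW_AND_OLD_IMAGES`), and the output is a simplified version of a stream record.

```python
VIEW_TYPES = ("KEYS_ONLY", "NEW_IMAGE", "OLD_IMAGE", "NEW_AND_OLD_IMAGES")


def stream_record_image(view_type, keys, old_item, new_item):
    """Return the parts of a changed item that a given stream view captures."""
    if view_type not in VIEW_TYPES:
        raise ValueError(f"unknown view type: {view_type}")
    record = {"Keys": keys}                       # key attributes are always present
    if view_type in ("NEW_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["NewImage"] = new_item             # entire item after the change
    if view_type in ("OLD_IMAGE", "NEW_AND_OLD_IMAGES"):
        record["OldImage"] = old_item             # entire item before the change
    return record


keys = {"user_id": "u-1"}
old = {"user_id": "u-1", "score": 10}
new = {"user_id": "u-1", "score": 25}
both = stream_record_image("NEW_AND_OLD_IMAGES", keys, old, new)
```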