DynamoDB Flashcards
(33 cards)
What is DynamoDB?
DynamoDB is a fully managed NoSQL database. It is highly available, with data replicated across multiple Availability Zones (AZs).
It scales to massive workloads: millions of requests per second, trillions of rows, and hundreds of TB of storage.
Key Features:
• NoSQL Database: Supports key-value and document data models.
• Managed Service: No need to manage hardware, setup, or maintenance.
• High Performance: Offers single-digit millisecond latency at any scale.
• Scalable: Automatically scales throughput capacity and storage.
• Durable and Highly Available: Data is replicated across multiple Availability Zones (AZs).
• Serverless: No server provisioning required. Integrated with AWS Lambda for event-driven apps.
• Security: Fine-grained access control using IAM, encryption at rest, and in transit.
⸻
Core Components:
• Tables: Collection of items (similar to rows).
• Items: Individual records (can have different attributes).
• Attributes: Key-value pairs in each item.
• Primary Key: Uniquely identifies each item (can be just a partition key or partition + sort key).
• Indexes:
• Global Secondary Index (GSI): Query flexibility using alternate keys.
• Local Secondary Index (LSI): Same partition key, different sort key.
What are primary keys in DynamoDB and how do you choose them?
- DynamoDB is made of tables
- Each table has a primary key (must be decided at creation time)
- Each table can hold a virtually unlimited number of items (analogous to rows)
- Each item has attributes (can be added over time, and can be null)
- Maximum size of an item is 400 KB
- Supported data types:
Scalar (String, Number, Binary, Boolean, Null)
Document Type (List, Map)
Set Types (String Set, Binary Set, Number Set)
Types of Primary Keys in DynamoDB
1. Simple Primary Key (Partition Key only)
• Format: { partition_key }
• All items must have a unique partition key.
• Good for lookups where each key maps to one item.
2. Composite Primary Key (Partition Key + Sort Key)
• Format: { partition_key, sort_key }
• Partition key groups items; sort key allows range queries within the group.
• Enables storing multiple related items under the same partition key.
⸻
How to Choose a Good Partition Key
1. Uniform Distribution
• Choose a partition key that spreads data evenly across partitions.
• Avoid hot partitions (where one key gets most of the traffic).
2. Access Pattern Awareness
• Design keys based on how your app reads/writes data.
• Ask: “What are the most common queries?” and build keys to support them efficiently.
3. High Cardinality
• Pick a key with many unique values to ensure even distribution.
• E.g., user_id, order_id, device_id are usually good choices.
⸻
When to Use Sort Key
• You need range queries (e.g., fetch orders by date for a customer).
• You want to group related items under a single partition key.
• Example: partition_key = user_id, sort_key = timestamp
⸻
Design Tips
• Avoid hot keys: If one key (e.g., “India”) is accessed too frequently, performance will suffer.
• Consider synthetic keys: If natural keys don’t scale well, concatenate values like user_id#date to ensure uniqueness and distribution.
• Model for read patterns: Think in terms of what exact queries your application needs to run.
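The synthetic-key tip above can be sketched in a few lines of Python. The helper names (composite_key, sharded_key), the "#" separator, and the shard count of 10 are assumptions chosen for illustration, not DynamoDB requirements.

```python
import hashlib


def composite_key(*parts: str, sep: str = "#") -> str:
    """Join natural key parts into one synthetic key, e.g. user_id#date."""
    return sep.join(parts)


def sharded_key(hot_key: str, item_id: str, shards: int = 10) -> str:
    """Spread a hot partition key over `shards` suffixes.

    The suffix comes from a stable hash of item_id, so the same item
    always maps to the same shard and can be located again on read.
    """
    digest = hashlib.sha256(item_id.encode("utf-8")).hexdigest()
    return f"{hot_key}#{int(digest, 16) % shards}"


print(composite_key("user_42", "2024-05-01"))  # user_42#2024-05-01
print(sharded_key("India", "order_1001"))      # India#<shard number>
```

The trade-off of sharding a hot key this way is that reading everything for "India" means querying every shard suffix and merging the results.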
What options are available to configure throughput for DynamoDB, and in what units is throughput measured?
Two options:
1. Provisioned
You calculate the required RCUs (Read Capacity Units) and WCUs (Write Capacity Units) and provision them beforehand.
2. On-Demand
No capacity planning is needed; reads and writes scale automatically with the workload and you pay per request.
How do you calculate WCUs and RCUs in DynamoDB?
WCU (Write Capacity Unit) and RCU (Read Capacity Unit) represent how you provision throughput capacity for your tables when using provisioned mode.
- Write Capacity Unit (WCU)
• Definition: 1 WCU = 1 write per second for an item up to 1 KB in size.
• If item > 1 KB: Each additional KB (rounded up) needs 1 more WCU.
• Example:
• Writing a 2 KB item = 2 WCUs
• Writing 10 items/sec of 0.5 KB each = 10 WCUs (each item is rounded up to 1 KB)
⸻
- Read Capacity Unit (RCU)
• Definition:
• Strongly consistent read: 1 RCU = 1 read/sec for an item up to 4 KB
• Eventually consistent read: 1 RCU = 2 reads/sec for an item up to 4 KB
• Transactional read: 2 RCUs = 1 read/sec for an item up to 4 KB (double the cost of a strongly consistent read)
• If item > 4 KB: Each additional 4 KB chunk (rounded up) needs 1 more RCU.
• Example:
• Strongly consistent read of 8 KB item = 2 RCUs
• Eventually consistent read of 8 KB item = 1 RCU
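The capacity rules above reduce to a little arithmetic. The following is a minimal Python sketch; the function names wcus and rcus and the mode strings are made up for this example. Note that each item is rounded up individually: to the next 1 KB for writes and to the next 4 KB for reads.

```python
import math


def wcus(item_kb: float, writes_per_sec: int, transactional: bool = False) -> int:
    """WCUs needed: every write is rounded up to the next 1 KB."""
    units = math.ceil(item_kb) * writes_per_sec
    return units * 2 if transactional else units


def rcus(item_kb: float, reads_per_sec: int, mode: str = "strong") -> int:
    """RCUs needed: every read is rounded up to the next 4 KB.

    mode is "strong", "eventual" (half cost) or "transactional" (double cost).
    """
    chunks = math.ceil(item_kb / 4) * reads_per_sec
    if mode == "strong":
        return chunks
    if mode == "eventual":
        return math.ceil(chunks / 2)
    if mode == "transactional":
        return chunks * 2
    raise ValueError(f"unknown mode: {mode}")


print(wcus(2, 1))               # 2
print(wcus(0.5, 10))            # 10 (each 0.5 KB item counts as 1 KB)
print(rcus(8, 1, "strong"))     # 2
print(rcus(8, 1, "eventual"))   # 1
```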
What are DynamoDB APIs for writing data?
PutItem
• Creates a new item or fully replaces an existing item with the same Primary Key
• Consumes WCUs (Write Capacity Units)
UpdateItem
• Edits an existing item’s attributes or adds a new item if it doesn’t exist
• Can be used to implement Atomic Counters – a numeric attribute that is incremented unconditionally
Conditional Writes
• Accepts write/update/delete only if conditions are met; otherwise returns an error
• Helps with concurrent access to items
What are DynamoDB APIs for reading data?
- DynamoDB – Reading Data (Query)
• Query returns items based on:
• KeyConditionExpression:
• Partition Key value (must use = operator) – required
• Sort Key value (=, <, <=, >, >=, BETWEEN, begins_with) – optional
• FilterExpression:
• Additional filtering applied after the Query operation reads the items, but before data is returned to you (so it does not reduce the RCUs consumed)
• Use only with non-key attributes (does not allow HASH or RANGE attributes)
• Returns:
• The number of items specified in Limit
• Or up to 1 MB of data
• Supports pagination on the results
• Can query:
• Table
• Local Secondary Index (LSI)
• Global Secondary Index (GSI)
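To make the order of evaluation concrete, here is a small in-memory Python model of Query. It is only a sketch: the function query, the attribute names pk and sk, and the sample orders are assumptions, and no AWS call is made. It shows one subtlety worth remembering: Limit caps the number of items evaluated, and the filter runs afterwards, so a call can return fewer items than Limit.

```python
from typing import Any, Callable, Optional

Item = dict[str, Any]


def query(
    table: list[Item],
    pk_value: Any,
    sk_condition: Optional[Callable[[Any], bool]] = None,
    filter_fn: Optional[Callable[[Item], bool]] = None,
    limit: Optional[int] = None,
) -> list[Item]:
    # 1. Key condition: partition key equality is mandatory.
    matched = [it for it in table if it["pk"] == pk_value]
    # 2. Optional sort key condition (=, <, BETWEEN, begins_with, ...).
    if sk_condition is not None:
        matched = [it for it in matched if sk_condition(it["sk"])]
    # Results come back ordered by sort key.
    matched.sort(key=lambda it: it["sk"])
    # 3. Limit caps the number of items evaluated, before filtering.
    if limit is not None:
        matched = matched[:limit]
    # 4. The filter drops items only after they have been read.
    if filter_fn is not None:
        matched = [it for it in matched if filter_fn(it)]
    return matched


orders = [
    {"pk": "user_1", "sk": "2024-01-05", "total": 20},
    {"pk": "user_1", "sk": "2024-02-10", "total": 75},
    {"pk": "user_1", "sk": "2024-03-15", "total": 90},
    {"pk": "user_2", "sk": "2024-01-07", "total": 60},
]

big = query(orders, "user_1", filter_fn=lambda it: it["total"] > 50, limit=2)
print([it["sk"] for it in big])  # ['2024-02-10']
```

Two items were evaluated (Limit = 2), but only one passed the filter; the March order was never looked at and would be picked up by the next page.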
⸻
- DynamoDB – Reading Data (GetItem)
• GetItem
• Read based on Primary Key
• Primary Key can be HASH or HASH + RANGE
• Default is Eventually Consistent Read
• Option for Strongly Consistent Reads (uses more RCUs, may take longer)
• Use ProjectionExpression to retrieve only specific attributes
⸻
- DynamoDB – Reading Data (Scan)
• Scans the entire table and then filters out data (inefficient)
• Returns up to 1 MB of data per call; use pagination to keep reading
• Consumes a lot of RCUs
• Use Limit to reduce the amount of data read per call
• Use Parallel Scan for higher throughput (at the cost of more RCUs consumed)
• Can use ProjectionExpression and FilterExpression (no change in RCUs consumed)
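Pagination for Scan (and Query) follows one pattern: keep calling until the response no longer carries a LastEvaluatedKey. The Python sketch below shows that loop against a fake scan function, so it runs without AWS. The names scan_all and make_fake_scan are hypothetical, and the fake uses a list offset as its key, whereas the real API returns a key map that you pass back as ExclusiveStartKey.

```python
from typing import Any, Callable, Iterator, Optional

Page = dict[str, Any]


def scan_all(scan_page: Callable[[Optional[int]], Page]) -> Iterator[dict]:
    """Yield every item by following LastEvaluatedKey until it is absent."""
    start_key = None
    while True:
        page = scan_page(start_key)
        yield from page["Items"]
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            break


def make_fake_scan(items: list[dict], page_size: int) -> Callable[[Optional[int]], Page]:
    """Hypothetical stand-in for a Scan call; pages by list offset."""
    def scan_page(start_key: Optional[int]) -> Page:
        start = start_key or 0
        end = start + page_size
        page: Page = {"Items": items[start:end]}
        if end < len(items):
            page["LastEvaluatedKey"] = end  # more data remains
        return page
    return scan_page


table = [{"id": i} for i in range(7)]
print(len(list(scan_all(make_fake_scan(table, page_size=3)))))  # 7
```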
What are DynamoDB APIs for deleting data?
• DeleteItem: Delete individual item (supports conditional delete)
• DeleteTable: Delete entire table (faster than deleting items one by one)
What are DynamoDB APIs for batch operations?
• Reduces latency by minimizing API calls
• Parallel execution improves efficiency
BatchWriteItem:
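• Up to 25 PutItem and/or DeleteItem requests in one call
• Up to 16 MB of data written in total, and up to 400 KB per item
• Cannot update items (use UpdateItem for that)
• Retry any entries returned in UnprocessedItems, ideally with exponential backoff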
BatchGetItem:
• Up to 100 items or 16 MB of data in total
• Items are retrieved in parallel to minimize latency
• Retry any keys returned in UnprocessedKeys, ideally with exponential backoff
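The two batch habits above (split into groups of 25, then retry whatever comes back unprocessed) can be sketched as follows. This is an illustration only: chunks, write_all, and the send callable are hypothetical names, and send stands in for a real BatchWriteItem call by returning the requests it did not process. A production version would also sleep with exponential backoff between attempts.

```python
from typing import Callable


def chunks(requests: list[dict], size: int = 25) -> list[list[dict]]:
    """Split write requests into groups of at most `size` (25 per batch)."""
    return [requests[i:i + size] for i in range(0, len(requests), size)]


def write_all(
    requests: list[dict],
    send: Callable[[list[dict]], list[dict]],
    max_attempts: int = 5,
) -> list[dict]:
    """Send every chunk, retrying whatever `send` reports as unprocessed.

    Returns the requests still unprocessed after max_attempts.
    """
    failed: list[dict] = []
    for batch in chunks(requests):
        pending = batch
        for _ in range(max_attempts):
            if not pending:
                break
            pending = send(pending)
        failed.extend(pending)
    return failed


stored: list[dict] = []


def flaky_send(batch: list[dict]) -> list[dict]:
    """Fake backend that accepts only the first 20 requests per call."""
    stored.extend(batch[:20])
    return batch[20:]


reqs = [{"PutRequest": {"Item": {"id": i}}} for i in range(60)]
leftover = write_all(reqs, flaky_send)
print(len(chunks(reqs)), len(stored), len(leftover))  # 3 60 0
```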
What is DynamoDB – PartiQL?
• SQL-compatible query language for DynamoDB
• Allows select, insert, update, delete using SQL
• Batch statements can touch multiple tables (joins across tables are not supported)
• Run PartiQL queries from:
• AWS Management Console
• NoSQL Workbench for DynamoDB
• DynamoDB APIs
• AWS CLI
• AWS SDK
What is DynamoDB – Conditional Writes?
• For PutItem, UpdateItem, and DeleteItem (BatchWriteItem does not accept condition expressions)
• You can specify a Condition expression to determine which items should be modified:
- attribute_exists
- attribute_not_exists
- attribute_type
- contains (for string)
- begins_with (for string)
- IN and BETWEEN comparisons, e.g. ProductCategory IN (:cat1, :cat2) and Price between :low and :high
- size (string length)
Conditional Writes – provide Example on Update Item
aws dynamodb update-item \
    --table-name ProductCatalog \
    --key '{ "Id": { "N": "456" } }' \
    --update-expression "SET Price = Price - :discount" \
    --condition-expression "Price > :limit" \
    --expression-attribute-values file://values.json
// values.json
{
    ":discount": { "N": "150" },
    ":limit": { "N": "500" }
}
Conditional Writes – provide Example on Delete Item
attribute_not_exists – only succeeds if the attribute doesn’t exist
aws dynamodb delete-item \
    --table-name ProductCatalog \
    --key '{ "Id": { "N": "456" } }' \
    --condition-expression "attribute_not_exists(Price)"
attribute_exists – opposite of attribute_not_exists
aws dynamodb delete-item \
    --table-name ProductCatalog \
    --key '{ "Id": { "N": "456" } }' \
    --condition-expression "attribute_exists(ProductReviews.OneStar)"
Conditional Writes – provide Example Complex Condition
aws dynamodb delete-item \
    --table-name ProductCatalog \
    --key '{ "Id": { "N": "456" } }' \
    --condition-expression "(ProductCategory IN (:cat1, :cat2)) and (Price between :lo and :hi)" \
    --expression-attribute-values file://values.json
// values.json
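The command above reads the placeholders :cat1, :cat2, :lo and :hi from values.json, whose contents are not shown on this card. As a hedged illustration, the Python sketch below prints a file of the right shape in DynamoDB JSON. The category names and price bounds are placeholder values assumed for the example, and attr is a hypothetical helper.

```python
import json


def attr(value):
    """Encode a Python bool, str or number as a DynamoDB-JSON attribute value."""
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are sent as strings
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Placeholder values (assumptions, not taken from the original card).
values = {
    ":cat1": attr("Sporting Goods"),
    ":cat2": attr("Gardening Supplies"),
    ":lo": attr(500),
    ":hi": attr(600),
}

print(json.dumps(values, indent=4))
```

Redirecting the output to values.json gives a file that the --expression-attribute-values file://values.json option can read.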
Conditional Writes – provide Example of String Comparisons?
begins_with – check if prefix matches
contains – check if string is contained in another string
aws dynamodb delete-item \
    --table-name ProductCatalog \
    --key '{ "Id": { "N": "456" } }' \
    --condition-expression "begins_with(Pictures.FrontView, :v_sub)" \
    --expression-attribute-values file://values.json
// values.json
What are the DynamoDB indexes?
- Local Secondary Index (LSI)
Definition:
An LSI is an index that has the same partition key as the base table but a different sort key.
Use Case:
• When you need multiple sort key options for the same partition key.
• For example, if your base table is UserID (partition key) and OrderDate (sort key), you can create an LSI with UserID and OrderAmount to sort orders by amount instead.
Key Points:
• Defined at table creation.
• Strongly consistent reads supported.
• Shares the same partition space as the base table (same capacity units and storage limits).
⸻
- Global Secondary Index (GSI)
Definition:
A GSI can have completely different partition and sort keys from the base table.
Use Case:
• When you want to query the table based on other attributes, not the original primary key.
• For example, if your base table has UserID as the primary key, and you want to query by Email, you can create a GSI with Email as the partition key.
Key Points:
• Can be created any time (not limited to table creation).
• Supports eventually consistent reads only (strongly consistent reads are not supported).
• Separate throughput and storage from the base table.
What are the throttling considerations for DynamoDB indexes?
Throttling in DynamoDB happens when your table or index exceeds the provisioned or burstable read/write throughput capacity. This can affect both main tables and indexes (LSIs and GSIs). Here’s a breakdown of the throttling considerations specifically for DynamoDB indexes:
⸻
- Global Secondary Index (GSI) Throttling
• Separate Capacity:
GSIs have their own read and write capacity settings (if provisioned mode is used). If you don’t provision enough capacity, the GSI can throttle even if the base table is healthy.
• Write Throttling Risk:
Every write to the base table that touches an indexed attribute also results in a write to the GSI. If the GSI does not have enough write capacity, the GSI write is throttled and the base table write is throttled as well, even if the base table itself has spare capacity.
• Burst Traffic:
GSIs can be overwhelmed by sudden spikes in writes. Using on-demand capacity or enabling auto-scaling can mitigate this.
⸻
- Local Secondary Index (LSI) Throttling
• Shared Capacity:
LSIs share the same provisioned capacity and partition space as the base table. So:
• High query traffic on an LSI can consume the base table’s read capacity.
• Writes to LSIs are tied to the base table’s write capacity.
• Write Size Impact:
Each LSI copy consumes additional write capacity units (WCU) based on the size of the indexed attributes. Large items will consume more WCUs.
What is DynamoDB PartiQL (pronounced “particle”)?
DynamoDB PartiQL is a SQL-compatible query language that allows you to interact with Amazon DynamoDB using familiar SQL-like syntax. It simplifies data operations like reading, inserting, updating, and deleting items without needing to use complex DynamoDB-specific APIs.
PartiQL works with:
• Tables
• Global Secondary Indexes (GSI)
• Local Secondary Indexes (LSI)
⸻
Key Features of PartiQL
• SQL-like syntax for NoSQL data
• Works on JSON-style DynamoDB items
• No need to write ExpressionAttributeNames or ExpressionAttributeValues
• Compatible with DynamoDB’s standard APIs (via SDK, CLI, or Console)
Important PartiQL Statements in DynamoDB
- SELECT – Read items
- INSERT – Add new items
- UPDATE – Modify existing items
- DELETE – Remove items
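For a feel of the syntax, here is one statement of each kind, held as Python strings so they can be passed to whichever client you use. The table name Orders and its attributes UserID, OrderDate and OrderAmount are assumptions for the example; each ? marks a parameter supplied separately at execution time.

```python
# Hypothetical table "Orders": partition key UserID, sort key OrderDate.
STATEMENTS = {
    "select": 'SELECT * FROM "Orders" WHERE "UserID" = ? AND "OrderDate" >= ?',
    "insert": "INSERT INTO \"Orders\" VALUE {'UserID': ?, 'OrderDate': ?, 'OrderAmount': ?}",
    "update": 'UPDATE "Orders" SET "OrderAmount" = ? WHERE "UserID" = ? AND "OrderDate" = ?',
    "delete": 'DELETE FROM "Orders" WHERE "UserID" = ? AND "OrderDate" = ?',
}

for name, statement in STATEMENTS.items():
    print(f"{name}: {statement}")
```

Note that UPDATE and DELETE name the full primary key in the WHERE clause, since they act on a single item.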
What is optimistic locking in DynamoDB?
Optimistic locking builds on DynamoDB’s Conditional Writes feature.
It is a strategy to prevent concurrent write conflicts when multiple users or processes attempt to update the same item simultaneously.
It works by using a special attribute, usually a version number, that tracks changes to an item. Each update carries a condition that the stored version still equals the version the client read, and it increments the version on success. If another writer changed the item in the meantime, the condition fails and the update is rejected, preventing unintended overwrites.
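The version check can be simulated in memory to see the mechanics. The Python sketch below is not DynamoDB code: Table and ConditionalCheckFailed are hypothetical stand-ins, and the attribute name version is an assumption. Two clients read the same item, and only the first update goes through.

```python
class ConditionalCheckFailed(Exception):
    """Raised when the stored version differs from the expected one."""


class Table:
    """Tiny in-memory stand-in for a table with conditional updates."""

    def __init__(self):
        self._items = {}

    def put(self, key, attrs):
        self._items[key] = {**attrs, "version": 1}

    def get(self, key):
        return dict(self._items[key])  # copy, like a client-side read

    def update(self, key, changes, expected_version):
        current = self._items[key]
        # The condition: write only if nobody bumped the version meanwhile.
        if current["version"] != expected_version:
            raise ConditionalCheckFailed(key)
        current.update(changes)
        current["version"] = expected_version + 1


table = Table()
table.put("product#456", {"Price": 650})

alice = table.get("product#456")
bob = table.get("product#456")

table.update("product#456", {"Price": 500}, expected_version=alice["version"])
try:
    table.update("product#456", {"Price": 600}, expected_version=bob["version"])
except ConditionalCheckFailed:
    print("Bob's update was rejected; he must re-read and retry")

print(table.get("product#456"))  # {'Price': 500, 'version': 2}
```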
What is DynamoDB accelerator or DAX?
DynamoDB Accelerator (DAX) is a fully managed, in-memory cache for DynamoDB tables. It can speed up reads by up to 10x, reducing read latency from single-digit milliseconds to microseconds for cached data.
DAX is especially useful for read-heavy and read-latency-sensitive applications.
Key Features of DAX
• Microsecond latency for cached reads
• Fully managed by AWS — handles replication, failover, patching
• Compatible with existing DynamoDB API calls (uses same SDK with minimal changes)
• Write-through caching: Data is written to both DAX and DynamoDB
• Highly available: Supports clustering and automatic failover
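Write-through caching, as listed above, is easy to picture with a toy model. The Python sketch below is a generic write-through cache over a dict, not the DAX client: the class name WriteThroughCache is hypothetical, and real DAX additionally expires entries and runs as a cluster, which this omits.

```python
class WriteThroughCache:
    """Minimal write-through cache in front of a backing store (a dict here)."""

    def __init__(self, store: dict):
        self.store = store
        self.cache: dict = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        value = self.store[key]   # cache miss: read from the table
        self.cache[key] = value   # populate the cache for next time
        return value

    def put(self, key, value):
        self.store[key] = value   # write to the backing table first
        self.cache[key] = value   # then keep the cache in sync


db = {"user#1": "Ada"}
dax = WriteThroughCache(db)

dax.get("user#1")            # miss, served from the table
dax.get("user#1")            # hit, served from memory
dax.put("user#2", "Grace")   # lands in both the table and the cache
dax.get("user#2")            # hit

print(dax.hits, dax.misses)  # 2 1
```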
What are DynamoDB streams?
A DynamoDB stream is an ordered stream of item-level modifications (create/update/delete) in a table.
Stream records can be:
1. Sent to Kinesis Data Streams
2. Read by AWS Lambda
3. Read by Kinesis Client Library applications
Data is retained for up to 24 hours.
Use Cases
1. React to changes in real time (send email to users)
2. Analytics
3. Insert into derivative tables
4. Insert into OpenSearch service
5. Implement cross-region replication
What are various other options to store session state cache apart from DynamoDB?
It’s very common to use DynamoDB to store session state. Other options include:
- ElastiCache (preferred) - in-memory key/value store; DynamoDB is also a key/value store, but serverless
- EFS (preferred) - shared across instances, but must be mounted on each EC2 instance as a network drive
- EBS & Instance Store (not preferred) - good for local caching on one instance, but not for a cache shared across instances
- S3 (not preferred) - higher latency; meant for large objects, not small ones like session data
What is DynamoDB TTL (Time to Live)?
DynamoDB TTL (Time to Live) is a feature that automatically deletes expired items from a table based on a timestamp attribute. It helps manage storage costs and keeps data fresh without manual cleanup.
⸻
How TTL Works
1. You choose a specific attribute (e.g., expiryTime) to act as the TTL attribute.
2. The value of that attribute must be a Unix timestamp (in seconds) indicating when the item should expire.
3. DynamoDB deletes the item in the background after the timestamp passes, typically within 48 hours. Deletion is not immediate, so expired items can still show up in reads until then.
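Producing the TTL value is a one-liner, but the unit matters: it must be epoch seconds, not milliseconds. The Python sketch below shows one way to compute it and to check expiry on the client side, which is useful because deletion can lag behind the timestamp. The function names are hypothetical, and expiryTime is the example attribute name from step 1.

```python
import time
from typing import Optional


def ttl_timestamp(seconds_from_now: int, now: Optional[float] = None) -> int:
    """Return a Unix epoch timestamp in whole seconds for a TTL attribute."""
    base = time.time() if now is None else now
    return int(base) + seconds_from_now


def is_expired(item: dict, ttl_attr: str = "expiryTime",
               now: Optional[float] = None) -> bool:
    """True if the item's TTL has passed (it may not be deleted yet)."""
    current = time.time() if now is None else now
    return ttl_attr in item and item[ttl_attr] <= current


# A fixed "now" keeps the example reproducible.
session = {"sessionId": "abc", "expiryTime": ttl_timestamp(3600, now=1_700_000_000)}
print(session["expiryTime"])                    # 1700003600
print(is_expired(session, now=1_700_000_000))   # False
print(is_expired(session, now=1_700_003_601))   # True
```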
List some good to know DynamoDB CLI options
DynamoDB CLI – Good to Know
• --projection-expression: one or more attributes to retrieve
• --filter-expression: filter items before they are returned to you
General AWS CLI pagination options (e.g., DynamoDB, S3, …)
• --page-size: the AWS CLI still retrieves the full list of items, but with a larger number of smaller API calls instead of one large call (default: 1000 items per call); useful to avoid timeouts
• --max-items: maximum number of items to show in the CLI output (returns a NextToken)
• --starting-token: pass the last NextToken to retrieve the next set of items
What are DynamoDB Transactions?
• Transactional Operations: Enable coordinated, all-or-nothing operations (add/update/delete) across one or more items and multiple tables.
• ACID Guarantees: Transactions in DynamoDB support Atomicity, Consistency, Isolation, and Durability.
• Read Modes: Eventual Consistency, Strong Consistency, Transactional Read (as part of transactions)
• Write Modes: Standard, Transactional
• Cost:
• Transactions consume 2x WCUs (Write Capacity Units) and 2x RCUs (Read Capacity Units).
• This is because each operation includes a prepare and commit phase.