DynamoDB Flashcards

(33 cards)

1
Q

What is DynamoDB?

A

DynamoDB is a fully managed NoSQL database. It is highly available, with data replicated across multiple Availability Zones (AZs).

It scales to massive workloads: millions of requests per second, trillions of items, and hundreds of TB of storage.

Key Features:
• NoSQL Database: Supports key-value and document data models.
• Managed Service: No need to manage hardware, setup, or maintenance.
• High Performance: Offers single-digit millisecond latency at any scale.
• Scalable: Automatically scales throughput capacity and storage.
• Durable and Highly Available: Data is replicated across multiple Availability Zones (AZs).
• Serverless: No server provisioning required. Integrated with AWS Lambda for event-driven apps.
• Security: Fine-grained access control using IAM, encryption at rest, and in transit.

Core Components:
• Tables: Collection of items (similar to rows).
• Items: Individual records (can have different attributes).
• Attributes: Key-value pairs in each item.
• Primary Key: Uniquely identifies each item (can be just a partition key or partition + sort key).
• Indexes:
  • Global Secondary Index (GSI): Query flexibility using alternate keys.
  • Local Secondary Index (LSI): Same partition key, different sort key.

2
Q

What are primary keys in DynamoDB and how do you choose them?

A
  1. DynamoDB is made of tables
  2. Each table has a primary key (must be decided at creation time)
  3. Each table can hold a virtually unlimited number of items (analogous to rows)
  4. Each item has attributes (can be added over time, can be null)
  5. Maximum size of an item is 400 KB
  6. Supported data types:
    Scalar types (String, Number, Binary, Boolean, Null)
    Document types (List, Map)
    Set types (String Set, Number Set, Binary Set)

Types of Primary Keys in DynamoDB
1. Simple Primary Key (Partition Key only)
• Format: { partition_key }
• All items must have a unique partition key.
• Good for lookups where each key maps to one item.
2. Composite Primary Key (Partition Key + Sort Key)
• Format: { partition_key, sort_key }
• Partition key groups items; sort key allows range queries within the group.
• Enables storing multiple related items under the same partition key.

How to Choose a Good Partition Key
1. Uniform Distribution
• Choose a partition key that spreads data evenly across partitions.
• Avoid hot partitions (where one key gets most of the traffic).
2. Access Pattern Awareness
• Design keys based on how your app reads/writes data.
• Ask: “What are the most common queries?” and build keys to support them efficiently.
3. High Cardinality
• Pick a key with many unique values to ensure even distribution.
• E.g., user_id, order_id, device_id are usually good choices.

When to Use Sort Key
• You need range queries (e.g., fetch orders by date for a customer).
• You want to group related items under a single partition key.
• Example: partition_key = user_id, sort_key = timestamp

Design Tips
• Avoid hot keys: If one key (e.g., “India”) is accessed too frequently, performance will suffer.
• Consider synthetic keys: If natural keys don’t scale well, concatenate values like user_id#date to ensure uniqueness and distribution.
• Model for read patterns: Think in terms of what exact queries your application needs to run.
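
For illustration, a minimal create-table call with a composite primary key (table and attribute names are assumed, not from the card):

aws dynamodb create-table \
  --table-name Orders \
  --attribute-definitions AttributeName=user_id,AttributeType=S AttributeName=order_ts,AttributeType=S \
  --key-schema AttributeName=user_id,KeyType=HASH AttributeName=order_ts,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST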

3
Q

What options are available to configure throughput for DynamoDB, and what units are used to measure it?

A

Two capacity modes:

1. Provisioned
   Calculate the required RCUs (Read Capacity Units) and WCUs (Write Capacity Units) and provision them beforehand; auto scaling can adjust them within limits.

2. On-Demand
   No capacity planning needed. DynamoDB scales automatically and you pay per request, measured in read request units and write request units. Suited to unpredictable or spiky workloads.
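
A rough CLI sketch of both modes (table and key names assumed):

# Provisioned mode: capacity declared up front in RCUs/WCUs
aws dynamodb create-table \
  --table-name Demo \
  --attribute-definitions AttributeName=pk,AttributeType=S \
  --key-schema AttributeName=pk,KeyType=HASH \
  --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5

# On-demand mode: no capacity planning, billed per request
aws dynamodb update-table --table-name Demo --billing-mode PAY_PER_REQUEST
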
4
Q

How do you calculate WCUs and RCUs in DynamoDB?

A

WCU (Write Capacity Unit) and RCU (Read Capacity Unit) represent how you provision throughput capacity for your tables when using provisioned mode.

  1. Write Capacity Unit (WCU)
    • Definition: 1 WCU = 1 write per second for an item up to 1 KB in size.
    • If item > 1 KB: each additional KB (rounded up) needs 1 more WCU.
    • Examples:
      • Writing one 2 KB item per second = 2 WCUs
      • Writing ten 0.5 KB items per second = 10 WCUs (each item is rounded up to 1 KB)

  2. Read Capacity Unit (RCU)
    • Definition:
      • Strongly consistent read: 1 RCU = 1 read/sec for an item up to 4 KB
      • Eventually consistent read: 1 RCU = 2 reads/sec for an item up to 4 KB
      • Transactional read: double the cost (2 RCUs per read/sec for an item up to 4 KB)
    • If item > 4 KB: each additional 4 KB chunk (rounded up) needs 1 more RCU.
    • Examples:
      • Strongly consistent read of an 8 KB item = 2 RCUs
      • Eventually consistent read of an 8 KB item = 1 RCU
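
To check what a request actually consumes, a read or write can return its consumed capacity (table and key assumed):

aws dynamodb get-item \
  --table-name Demo \
  --key '{ "pk": { "S": "user#1" } }' \
  --consistent-read \
  --return-consumed-capacity TOTAL
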
5
Q

What are DynamoDB APIs for writing data?

A

PutItem
• Creates a new item or fully replaces an existing item with the same Primary Key
• Consumes WCUs (Write Capacity Units)

UpdateItem
• Edits an existing item’s attributes or adds a new item if it doesn’t exist
• Can be used to implement Atomic Counters – a numeric attribute that is incremented unconditionally

Conditional Writes
• Accepts write/update/delete only if conditions are met; otherwise returns an error
• Helps with concurrent access to items
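
Rough sketches of PutItem and an atomic counter via UpdateItem (table, key, and attribute names assumed):

aws dynamodb put-item \
  --table-name Demo \
  --item '{ "pk": { "S": "user#1" }, "FirstName": { "S": "Alice" } }'

# Atomic counter: increments unconditionally, no prior read required
aws dynamodb update-item \
  --table-name Demo \
  --key '{ "pk": { "S": "user#1" } }' \
  --update-expression "ADD LoginCount :inc" \
  --expression-attribute-values '{ ":inc": { "N": "1" } }'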

6
Q

What are DynamoDB APIs for reading data?

A
  1. DynamoDB – Reading Data (Query)
    • Query returns items based on:
      • KeyConditionExpression:
        • Partition Key value (must use = operator) – required
        • Sort Key value (=, <, <=, >, >=, Between, Begins with) – optional
      • FilterExpression:
        • Additional filtering after the query operation (before data is returned)
        • Use only with non-key attributes (does not allow HASH or RANGE attributes)
    • Returns:
      • The number of items specified in Limit
      • Or up to 1 MB of data
    • Supports pagination on the results
    • Can query:
      • Table
      • Local Secondary Index (LSI)
      • Global Secondary Index (GSI)

  2. DynamoDB – Reading Data (GetItem)
    • Read based on Primary Key
    • Primary Key can be HASH or HASH + RANGE
    • Default is Eventually Consistent Read
    • Option for Strongly Consistent Reads (uses more RCUs, may take longer)
    • Use ProjectionExpression to retrieve only specific attributes

  3. DynamoDB – Reading Data (Scan)
    • Scans the entire table and then filters (inefficient)
    • Returns up to 1 MB of data; use pagination to keep reading
    • Consumes a lot of RCUs
    • Use Limit to control scan size
    • Parallel Scan improves throughput
    • Use ProjectionExpression and FilterExpression (no change in RCU)
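
A Query sketch (table, attribute names, and values are assumed):

aws dynamodb query \
  --table-name Orders \
  --key-condition-expression "user_id = :u AND begins_with(order_ts, :prefix)" \
  --expression-attribute-values '{ ":u": { "S": "user#1" }, ":prefix": { "S": "2024-06" } }'
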
7
Q

What are DynamoDB APIs for deleting data?

A

• DeleteItem: Delete individual item (supports conditional delete)
• DeleteTable: Delete entire table (faster than deleting items one by one)

8
Q

What are DynamoDB APIs for batch operations?

A

• Reduces latency by minimizing API calls
• Parallel execution improves efficiency

BatchWriteItem:
• Up to 25 PutItem/DeleteItem
• Up to 16 MB of data total; up to 400 KB per item
• Cannot update items
• Use UnprocessedItems for retries

BatchGetItem:
• Up to 100 items or 16 MB total
• Parallel reads
• Use UnprocessedKeys for retries
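
A BatchWriteItem sketch (table name and items assumed); anything returned under UnprocessedItems should be retried, ideally with exponential backoff:

aws dynamodb batch-write-item --request-items file://requests.json

------------// requests.json //------------
{
  "Demo": [
    { "PutRequest": { "Item": { "pk": { "S": "user#1" }, "FirstName": { "S": "Alice" } } } },
    { "DeleteRequest": { "Key": { "pk": { "S": "user#2" } } } }
  ]
}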

9
Q

What is DynamoDB – PartiQL

A

• SQL-compatible query language for DynamoDB
• Allows select, insert, update, delete using SQL
• Run queries across multiple tables
• Run PartiQL queries from:
• AWS Management Console
• NoSQL Workbench for DynamoDB
• DynamoDB APIs
• AWS CLI
• AWS SDK

10
Q

What is DynamoDB – Conditional Writes?

A

• For PutItem, UpdateItem, DeleteItem, and BatchWriteItem
• You can specify a Condition expression to determine which items should be modified:

  • attribute_exists
  • attribute_not_exists
  • attribute_type
  • contains (for strings and sets)
  • begins_with (for strings)
  • IN and BETWEEN, e.g., ProductCategory IN (:cat1, :cat2) AND Price BETWEEN :low AND :high
  • size (e.g., string length)
11
Q

Conditional Writes – provide an example with UpdateItem

A

aws dynamodb update-item \
  --table-name ProductCatalog \
  --key '{ "Id": { "N": "456" } }' \
  --update-expression "SET Price = Price - :discount" \
  --condition-expression "Price > :limit" \
  --expression-attribute-values file://values.json

------------// values.json //------------
{
  ":discount": { "N": "150" },
  ":limit": { "N": "500" }
}

12
Q

Conditional Writes – provide an example with DeleteItem

A

attribute_not_exists – only succeeds if the attribute doesn’t exist

aws dynamodb delete-item \
  --table-name ProductCatalog \
  --key '{ "Id": { "N": "456" } }' \
  --condition-expression "attribute_not_exists(Price)"

attribute_exists – opposite of attribute_not_exists

aws dynamodb delete-item \
  --table-name ProductCatalog \
  --key '{ "Id": { "N": "456" } }' \
  --condition-expression "attribute_exists(ProductReviews.OneStar)"

13
Q

Conditional Writes – provide an example with a complex condition

A

aws dynamodb delete-item \
  --table-name ProductCatalog \
  --key '{ "Id": { "N": "456" } }' \
  --condition-expression "(ProductCategory IN (:cat1, :cat2)) and (Price between :lo and :hi)" \
  --expression-attribute-values file://values.json
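
The original card leaves values.json empty; an illustrative set of values matching the placeholders might be:

------------// values.json //------------
{
  ":cat1": { "S": "Sporting Goods" },
  ":cat2": { "S": "Gardening Supplies" },
  ":lo": { "N": "500" },
  ":hi": { "N": "600" }
}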

14
Q

Conditional Writes – provide an example of string comparisons

A

begins_with – check if prefix matches

contains – check if string is contained in another string

aws dynamodb delete-item \
  --table-name ProductCatalog \
  --key '{ "Id": { "N": "456" } }' \
  --condition-expression "begins_with(Pictures.FrontView, :v_sub)" \
  --expression-attribute-values file://values.json
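
The original card leaves values.json empty; an illustrative value for the :v_sub placeholder might be:

------------// values.json //------------
{
  ":v_sub": { "S": "http://" }
}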

15
Q

What are the DynamoDB indexes?

A
  1. Local Secondary Index (LSI)

Definition:
An LSI is an index that has the same partition key as the base table but a different sort key.

Use Case:
• When you need multiple sort key options for the same partition key.
• For example, if your base table is UserID (partition key) and OrderDate (sort key), you can create an LSI with UserID and OrderAmount to sort orders by amount instead.

Key Points:
• Defined at table creation.
• Strongly consistent reads supported.
• Shares the same partition space as the base table (same capacity units and storage limits).

  2. Global Secondary Index (GSI)

Definition:
A GSI can have completely different partition and sort keys from the base table.

Use Case:
• When you want to query the table based on other attributes, not the original primary key.
• For example, if your base table has UserID as the primary key, and you want to query by Email, you can create a GSI with Email as the partition key.

Key Points:
• Can be created any time (not limited to table creation).
• Supports eventual consistency by default (strong consistency not supported).
• Separate throughput and storage from the base table.
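
For example, querying by email through a GSI might look like this (index and attribute names assumed):

aws dynamodb query \
  --table-name Users \
  --index-name EmailIndex \
  --key-condition-expression "Email = :e" \
  --expression-attribute-values '{ ":e": { "S": "alice@example.com" } }'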

16
Q

What are throttling considerations for DynamoDB indexes?

A

Throttling in DynamoDB happens when your table or index exceeds the provisioned or burstable read/write throughput capacity. This can affect both main tables and indexes (LSIs and GSIs). Here’s a breakdown of the throttling considerations specifically for DynamoDB indexes:

  1. Global Secondary Index (GSI) Throttling
    • Separate Capacity:
    GSIs have their own read and write capacity settings (if provisioned mode is used). If you don’t provision enough capacity, the GSI can throttle even if the base table is healthy.
    • Write Throttling Risk:
    Every write to the base table that includes GSI keys results in a write to the GSI. If the GSI write capacity is lower than the base table’s, the GSI will throttle and the base table write will also fail.
    • Burst Traffic:
    GSIs can be overwhelmed by sudden spikes in writes. Using on-demand capacity or enabling auto-scaling can mitigate this.

  2. Local Secondary Index (LSI) Throttling
    • Shared Capacity:
    LSIs share the same provisioned capacity and partition space as the base table. So:
    • High query traffic on an LSI can consume the base table’s read capacity.
    • Writes to LSIs are tied to the base table’s write capacity.
    • Write Size Impact:
    Each LSI copy consumes additional write capacity units (WCU) based on the size of the indexed attributes. Large items will consume more WCUs.
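
If a provisioned GSI is the bottleneck, its capacity can be raised independently of the base table; a rough sketch (index name and numbers assumed):

aws dynamodb update-table \
  --table-name Demo \
  --global-secondary-index-updates '[{ "Update": { "IndexName": "EmailIndex", "ProvisionedThroughput": { "ReadCapacityUnits": 50, "WriteCapacityUnits": 50 } } }]'
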
17
Q

What is DynamoDB PartiQL (pronounced “particle”)

A

DynamoDB PartiQL is a SQL-compatible query language that allows you to interact with Amazon DynamoDB using familiar SQL-like syntax. It simplifies data operations like reading, inserting, updating, and deleting items without needing to use complex DynamoDB-specific APIs.

PartiQL works with:
• Tables
• Global Secondary Indexes (GSI)
• Local Secondary Indexes (LSI)

Key Features of PartiQL
• SQL-like syntax for NoSQL data
• Works on JSON-style DynamoDB items
• No need to write ExpressionAttributeNames or ExpressionAttributeValues
• Compatible with DynamoDB’s standard APIs (via SDK, CLI, or Console)

Important PartiQL Statements in DynamoDB

  1. SELECT – Read items
  2. INSERT – Add new items
  3. UPDATE – Modify existing items
  4. DELETE – Remove items
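
A sketch of each statement through the CLI (table and attribute names assumed):

aws dynamodb execute-statement --statement "SELECT * FROM Demo WHERE pk = 'user#1'"
aws dynamodb execute-statement --statement "INSERT INTO Demo VALUE {'pk': 'user#2', 'FirstName': 'Bob'}"
aws dynamodb execute-statement --statement "UPDATE Demo SET FirstName = 'Bobby' WHERE pk = 'user#2'"
aws dynamodb execute-statement --statement "DELETE FROM Demo WHERE pk = 'user#2'"
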
18
Q

What is optimistic locking in DynamoDB?

A

Optimistic locking builds on DynamoDB's Conditional Writes feature.

Optimistic locking in DynamoDB is a strategy to prevent concurrent write conflicts when multiple users or processes attempt to update the same item simultaneously.

It works by using a special attribute—usually called a version number—that tracks changes to an item. Before updating the item, the client checks that the version hasn’t changed. If it has, the update is rejected, preventing unintended overwrites.
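
A rough sketch of the version check using a conditional write (table, key, and attribute names assumed):

aws dynamodb update-item \
  --table-name Demo \
  --key '{ "pk": { "S": "user#1" } }' \
  --update-expression "SET FirstName = :newname, #v = #v + :one" \
  --condition-expression "#v = :expected" \
  --expression-attribute-names '{ "#v": "Version" }' \
  --expression-attribute-values '{ ":newname": { "S": "Alice" }, ":one": { "N": "1" }, ":expected": { "N": "3" } }'

If another writer has already bumped Version, the condition fails with a ConditionalCheckFailedException and the client re-reads and retries.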

19
Q

What is DynamoDB Accelerator (DAX)?

A

DynamoDB Accelerator (DAX) is a fully managed, in-memory caching service designed to speed up read operations for DynamoDB tables by 10x, reducing read latency from milliseconds to microseconds.

DAX is especially useful for read-heavy and read-latency-sensitive applications.

Key Features of DAX
• Microsecond latency for cached reads
• Fully managed by AWS — handles replication, failover, patching
• Compatible with existing DynamoDB API calls (uses same SDK with minimal changes)
• Write-through caching: Data is written to both DAX and DynamoDB
• Highly available: Supports clustering and automatic failover
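
Creating a DAX cluster is a separate step from the table; a rough sketch (cluster name, node type, and role ARN are assumptions):

aws dax create-cluster \
  --cluster-name demo-dax \
  --node-type dax.t3.small \
  --replication-factor 3 \
  --iam-role-arn arn:aws:iam::123456789012:role/DAXServiceRole

The application then uses the DAX client SDK against the cluster endpoint instead of the regular DynamoDB endpoint.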

20
Q

What are DynamoDB streams?

A

DynamoDB Streams is an ordered stream of item-level modifications (create/update/delete) in a table.

Stream records can be
1. Sent to Kinesis Data Streams
2. Read by AWS Lambda
3. Read by Kinesis Client Library applications

Data is retained for up to 24 hours.

Use Cases
1. React to changes in real time (send email to users)
2. Analytics
3. Insert into derivative tables
4. Insert into OpenSearch service
5. Implement cross-region replication
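
Enabling a stream on an existing table (table name assumed; the view type shown is one of several options):

aws dynamodb update-table \
  --table-name Demo \
  --stream-specification StreamEnabled=true,StreamViewType=NEW_AND_OLD_IMAGES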

21
Q

What are various other options to store session state cache apart from DynamoDB?

A

It's very common to use DynamoDB to store session state. Other options include:

  1. ElastiCache (preferred) - in-memory, whereas DynamoDB is serverless; both are key/value stores
  2. EFS (preferred) - but it must be attached to EC2 instances as a network drive
  3. EBS & Instance Store (not preferred) - good for local caching, not for shared/global caching
  4. S3 (not preferred) - high latency; meant for large objects, not small items like session state
22
Q

What is DynamoDB TTL (Time To Live)?

A

DynamoDB TTL (Time to Live) is a feature that automatically deletes expired items from a table based on a timestamp attribute. It helps manage storage costs and keeps data fresh without manual cleanup.

How TTL Works
1. You choose a specific attribute (e.g., expiryTime) to act as the TTL attribute.
2. The value of that attribute must be a Unix timestamp (in seconds) indicating when the item should expire.
3. DynamoDB automatically deletes the item within 48 hours after the timestamp is reached.
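
Enabling TTL on a table (attribute name assumed):

aws dynamodb update-time-to-live \
  --table-name Demo \
  --time-to-live-specification "Enabled=true,AttributeName=expiryTime"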

23
Q

List some good to know DynamoDB CLI options

A

DynamoDB CLI – Good to Know
• --projection-expression: one or more attributes to retrieve
• --filter-expression: filter items before they are returned to you

General AWS CLI pagination options (e.g., DynamoDB, S3, …)
• --page-size: still retrieve the full list of items, but with a larger number of smaller API calls instead of one call (default: 1000 items)
• --max-items: maximum number of items to show in the CLI (returns NextToken)
• --starting-token: pass the last received NextToken to retrieve the next set of items
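
A quick pagination sketch (table name assumed):

aws dynamodb scan --table-name Demo --max-items 100
# The response includes a NextToken; pass it back to continue where the previous call stopped
aws dynamodb scan --table-name Demo --max-items 100 --starting-token <NextToken-from-previous-response>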

24
Q

What are DynamoDB Transactions?

A

• Transactional Operations: enable coordinated, all-or-nothing operations (add/update/delete) across one or more items and multiple tables.
• ACID Guarantees: transactions in DynamoDB support Atomicity, Consistency, Isolation, and Durability.
• Read Modes:
  • Eventual Consistency
  • Strong Consistency
  • Transactional Read (as part of transactions)
• Write Modes:
  • Standard
  • Transactional
• Cost:
  • Transactions consume 2x WCUs (Write Capacity Units) and 2x RCUs (Read Capacity Units).
  • This is because each operation includes a prepare and a commit phase.
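
A sketch of an all-or-nothing two-item write (table, keys, and values assumed):

aws dynamodb transact-write-items --transact-items file://transaction.json

------------// transaction.json //------------
[
  { "Put": { "TableName": "Accounts", "Item": { "pk": { "S": "account#1" }, "Balance": { "N": "100" } } } },
  { "Update": { "TableName": "Accounts", "Key": { "pk": { "S": "account#2" } },
                "UpdateExpression": "SET Balance = Balance - :amt",
                "ExpressionAttributeValues": { ":amt": { "N": "100" } } } }
]
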
25
Q

List out some DynamoDB transaction APIs

A

Transaction APIs:
  1. TransactGetItems: for one or more GetItem operations.
  2. TransactWriteItems: for one or more PutItem, UpdateItem, and DeleteItem operations.
26
Q

How is capacity computed for transactions?

A

Example 1: Write Transactions
• Scenario: 3 transactional writes/sec, item size = 5 KB
• WCU = 3 × (5 KB / 1 KB) × 2 = 30 WCUs

Example 2: Read Transactions
• Scenario: 5 transactional reads/sec, item size = 5 KB
• RCU = 5 × (8 KB / 4 KB) × 2 = 20 RCUs
(Note: 5 KB is rounded up to 8 KB for the read capacity calculation)
27
Q

What are DynamoDB partition strategies?

A

DynamoDB Write Sharding

Problem:
• Imagine a voting app with two candidates: Candidate A and Candidate B.
• If the partition key is Candidate_ID, all votes go into only two partitions.
• This leads to hot partitions, which cause performance bottlenecks.

Solution: Write Sharding
To distribute writes more evenly across partitions:
• Add a suffix to the partition key.
• This creates many distinct partition keys (e.g., Candidate_A_1, Candidate_A_2, …), allowing data to spread across more partitions.

Sharding Methods:
1. Sharding using a random suffix: append a random number/string to the Candidate_ID (e.g., Candidate_A_17). Useful for uniform distribution.
2. Sharding using a calculated suffix: use a deterministic hash or modulo logic to generate suffixes, allowing controlled and repeatable partitioning.

This technique improves write scalability and avoids throttling issues in high-write applications like vote counters, logs, or clickstreams.
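
A minimal shell sketch of a calculated suffix (table name, shard count, and attribute names assumed):

# Derive a shard number (0-9) from the voter ID, then write under a sharded partition key
SHARD=$(( $(echo -n "voter-123" | cksum | cut -d' ' -f1) % 10 ))
aws dynamodb put-item \
  --table-name Votes \
  --item "{ \"Candidate_ID\": { \"S\": \"Candidate_A_${SHARD}\" }, \"voter_id\": { \"S\": \"voter-123\" } }"
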
28
Q

Explain different write types in DynamoDB?

A

1. Concurrent Writes
• Description: two clients write to the same item at the same time.
• Behavior: the first write sets value = 1, the second sets value = 2; the second write overwrites the first because there is no built-in conflict resolution unless explicitly handled.
• Risk: data inconsistency due to race conditions.

2. Atomic Writes
• Description: operations are applied safely without overwriting each other.
• Behavior: one write increases the value by 1, another increases it by 2; both succeed independently and the final value is cumulative (0 + 1 + 2 = 3).
• Use case: counters and concurrent updates on numeric values.
• Achieved using: UpdateExpression with ADD or similar operations.

3. Conditional Writes
• Description: a write happens only if a specified condition is true.
• Behavior: both writers try to update the item only if value = 0; the first write succeeds and updates the value, the second fails because the condition value = 0 is no longer true.
• Use case: enforce uniqueness, implement optimistic locking, avoid overwrites.
29
Q

What are some common DynamoDB patterns used with Amazon S3 to manage and index large objects and their metadata?

A

DynamoDB – Large Objects Pattern

Purpose: efficiently store large files (like images or videos) by separating their binary data from their metadata.

How it works:
• The application uploads a large object (e.g., 617055.jpg) to an Amazon S3 bucket (media-assets-bucket).
• Instead of storing the large binary in DynamoDB, only metadata (like Product_ID, Product_Name, and the Image_URL) is stored in the DynamoDB Products table.
• Later, the application can:
  • Query DynamoDB to get the image URL.
  • Download the actual image from S3 using the stored URL.

Use case:
• Media-heavy applications (e.g., e-commerce, social media) where storing images/files directly in DynamoDB would exceed its item size limit (400 KB).

DynamoDB – Indexing S3 Objects Metadata

Purpose: automatically index S3 object metadata in DynamoDB for querying and reporting.

How it works:
1. An application uploads an object to an S3 bucket.
2. S3 triggers a Lambda function using an event notification.
3. The Lambda function extracts metadata (e.g., filename, timestamp, size, custom tags).
4. The metadata is stored in a DynamoDB table.
5. A client or application can now query DynamoDB for:
  • Searching by upload date.
  • Calculating total storage used.
  • Listing objects with specific attributes.
  • Finding objects uploaded within a date range.

Use case:
• Building an index or search engine for files in S3.
• Analytics on file uploads (usage, history, billing reports, etc.).
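
A rough CLI sketch of the large-objects pattern, reusing the names from the card (the product name is assumed):

# 1. Upload the large object to S3
aws s3 cp ./617055.jpg s3://media-assets-bucket/images/617055.jpg

# 2. Store only the metadata and the object URL in DynamoDB
aws dynamodb put-item \
  --table-name Products \
  --item '{ "Product_ID": { "N": "617055" }, "Product_Name": { "S": "Example Product" }, "Image_URL": { "S": "s3://media-assets-bucket/images/617055.jpg" } }'
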
30
Q

Outline strategies for DynamoDB Operations, specifically around table cleanup and copying tables

A

Table Cleanup

Option 1: Scan + DeleteItem
• How it works: scan the entire table to get all items, then delete each item individually.
• Cons: very slow; consumes both RCUs (Read Capacity Units) and WCUs (Write Capacity Units); expensive due to high resource usage.
• When to use: only when you can't afford to drop the table or need selective cleanup.

Option 2: Drop Table + Recreate Table
• How it works: delete the table entirely and create a new one with the same schema.
• Pros: fast, efficient, and cheap.
• When to use: best when you want to delete all data and can afford to lose everything instantly.

Copying a DynamoDB Table

Option 1: AWS Data Pipeline
• AWS-native service for copying data between tables (or even regions/accounts).
• Automates the process, but setup may be involved.

Option 2: Backup and Restore
• How it works: use DynamoDB's built-in on-demand backup feature and restore the backup into a new table.
• Pros: easy, native solution.
• Cons: takes some time.

Option 3: Scan + PutItem or BatchWriteItem
• How it works: write custom code to scan the source table and write the items to the destination.
• Pros: highly customizable.
• Cons: requires manual effort and handling of throttling, batching, and retries.
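
For reference, rough CLI sketches of the drop-table and backup/restore paths (table and backup names assumed):

# Cleanup option 2: drop the table (recreate it afterwards with create-table)
aws dynamodb delete-table --table-name Demo

# Copy option 2: on-demand backup, then restore into a new table
aws dynamodb create-backup --table-name Demo --backup-name demo-backup
aws dynamodb restore-table-from-backup --target-table-name DemoCopy --backup-arn <backup-arn-from-create-backup>
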
31
Q

Explain these features in DynamoDB: 1. Security 2. Backup and Restore 3. Global Tables 4. DynamoDB Local 5. AWS Database Migration Service

A

Security
• VPC Endpoints: access DynamoDB privately (no internet exposure).
• IAM policies: full control over access.
• Encryption:
  • At rest: via AWS KMS.
  • In transit: via SSL/TLS.

Backup and Restore
• Point-in-time recovery (PITR), similar to RDS.
• No impact on performance.

Global Tables
• Multi-region, multi-active setup.
• Fully replicated with high availability and performance.

DynamoDB Local
• Local testing and development without needing internet access to AWS services.

AWS DMS (Database Migration Service)
• Can migrate from MongoDB, Oracle, MySQL, S3, etc. to DynamoDB.
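
With DynamoDB Local running (it listens on port 8000 by default), the CLI simply points at the local endpoint:

aws dynamodb list-tables --endpoint-url http://localhost:8000
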
32
Q

How does direct client access to DynamoDB using temporary AWS credentials work?

A

Flow:
1. The user logs in via an identity provider such as:
  • Amazon Cognito
  • Google
  • Facebook
  • OpenID Connect
  • SAML
2. The app receives temporary AWS credentials.
3. These credentials allow the user to interact directly with DynamoDB.

Use case:
• Common in mobile/web apps where users interact with data directly, without a backend API layer.
33
Q

How does Fine-Grained Access Control work with DynamoDB?

A

Key features:
• Web Identity Federation / Cognito Identity Pools: each user receives unique AWS credentials.
• IAM role with conditions: use IAM policies to restrict access based on user identity or context.
• Fine-grained controls:
  • LeadingKeys: restrict access based on the partition key (e.g., users can access only their own data).
  • Attributes: limit which fields (attributes) a user can access.

Example IAM policy:
• Grants GetItem, PutItem, etc.
• Restricts access to items whose partition key matches the user's ID (via ${cognito-identity.amazonaws.com:sub}).
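
The card refers to an example IAM policy; a sketch of such a policy (table ARN and actions assumed) using the dynamodb:LeadingKeys condition key to restrict each Cognito identity to its own items:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/UserData",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${cognito-identity.amazonaws.com:sub}"]
        }
      }
    }
  ]
}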