Amazon DynamoDB Flashcards

1
Q

What is Amazon DynamoDB?

A

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability.

Amazon DynamoDB stores three geographically distributed replicas of each table to enable high availability and data durability.

Data is synchronously replicated across 3 facilities (AZs) in a region.

2
Q

DynamoDB is a serverless service – there are no instances to provision or manage. True or false?

A

True

3
Q

On what storage medium is data stored?

A

Data is stored on SSD storage.

4
Q

What is the redundancy strategy?

A

Multi-AZ redundancy and Cross-Region Replication option.

5
Q

What is DynamoDB made up of?

A
  • Tables.
  • Items.
  • Attributes.

Tables are a collection of items and items are made up of attributes (columns).

Attributes consist of a name and a value or set of values.

6
Q

Size Limitations

A

The aggregate size of an item cannot exceed 400KB including keys and all attributes.

For items over 400KB, you can store the object in S3 and keep a pointer to it in DynamoDB.

7
Q

Which formats can documents be written in?

A

Documents can be written in JSON, HTML, or XML.

8
Q

DynamoDB Features ?

A

  • Fully managed, serverless NoSQL database with push-button scaling.
  • Data stored on SSDs and replicated synchronously across multiple AZs.
  • Provisioned and on-demand capacity modes, with auto scaling.
  • DynamoDB Streams, global tables, transactions, and TTL.
  • In-memory caching with DynamoDB Accelerator (DAX).
  • Encryption at rest (KMS) and in transit (SSL / TLS).

9
Q

Anti-Patterns

Amazon DynamoDB is not ideal for which situations?

A
  • Traditional relational (RDBMS) applications.
  • Joins and/or complex transactions.
  • BLOB data.
  • Large data with low I/O rate.
10
Q

Access control principles?

A

All authentication and access control is managed using IAM.

DynamoDB supports identity-based policies:

  • Attach a permissions policy to a user or a group in your account.
  • Attach a permissions policy to a role (grant cross-account permissions).

DynamoDB doesn’t support resource-based policies.

You can use a special IAM condition key (dynamodb:LeadingKeys) to restrict user access to only their own records, as in the sketch below.
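
A hedged sketch of that last point: the policy below uses the dynamodb:LeadingKeys condition key so callers can only touch items whose partition key matches their own identity. The table name, account ID, and the web-identity substitution variable are illustrative placeholders, not from the cards.

```python
import json
import boto3

# Hypothetical table and policy names for illustration.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
        "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/MyUserData",
        "Condition": {
            # Every partition key touched by the request must equal the caller's ID.
            "ForAllValues:StringEquals": {
                "dynamodb:LeadingKeys": ["${www.amazon.com:user_id}"]
            }
        }
    }]
}

iam = boto3.client("iam")
iam.create_policy(PolicyName="DynamoDBOwnRecordsOnly",
                  PolicyDocument=json.dumps(policy_document))
```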

11
Q

DynamoDB resources and subresources have unique Amazon Resource Names (ARNs). What are their formats?

A

Table ARN:

arn:aws:dynamodb:region:account-id:table/table-name

Index ARN:

arn:aws:dynamodb:region:account-id:table/table-name/index/index-name

Stream ARN:

arn:aws:dynamodb:region:account-id:table/table-name/stream/stream-label

12
Q

Explain partitions?

A

Amazon DynamoDB stores data in partitions.

A partition is an allocation of storage for a table that is automatically replicated across multiple AZs within an AWS Region.

Partition management is handled entirely by DynamoDB—you never have to manage partitions yourself.

DynamoDB allocates sufficient partitions to your table so that it can handle your provisioned throughput requirements.

DynamoDB allocates additional partitions to a table in the following situations:

  • If you increase the table’s provisioned throughput settings beyond what the existing partitions can support.
  • If an existing partition fills to capacity and more storage space is required.
13
Q

Primary key principles?

A

DynamoDB stores and retrieves data based on a Primary key.

There are two types of Primary key:

Partition key – unique attribute (e.g. user ID).

  • Value of the Partition key is input to an internal hash function which determines the partition or physical location on which the data is stored.
  • If you are using the Partition key as your Primary key, then no two items can have the same partition key.
14
Q

What is a composite key?

A

Composite key – Partition key + Sort key in combination.

  • Example: a user posting to a forum. The Partition key would be the user ID; the Sort key would be the timestamp of the post.
  • Two items may have the same Partition key, but they must have a different Sort key.
  • All items with the same Partition key are stored together, then sorted according to the Sort key value.
  • This allows you to store multiple items with the same Partition key (see the sketch below).
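
A minimal sketch of defining such a composite primary key with boto3 (the ForumPosts table and attribute names are hypothetical):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Composite primary key: user_id (partition key) + post_timestamp (sort key).
dynamodb.create_table(
    TableName="ForumPosts",  # hypothetical table name
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "post_timestamp", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},          # partition key
        {"AttributeName": "post_timestamp", "KeyType": "RANGE"},  # sort key
    ],
    BillingMode="PAY_PER_REQUEST",
)
```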
15
Q

Partitions and performance limitations?

A

DynamoDB evenly distributes provisioned throughput – read capacity units (RCUs) and write capacity units (WCUs) – among partitions.

If your access pattern exceeds 3000 RCU or 1000 WCU for a single partition key value, your requests might be throttled.

16
Q

Partitions and performance: reading or writing above the limit can be caused by which issues?

A
  • Uneven distribution of data due to the wrong choice of partition key.
  • Frequent access of the same key in a partition (the most popular item, also known as a hot key).
  • A request rate greater than the provisioned throughput.
17
Q

Best practices for partition keys?

A

Use high-cardinality attributes – e.g. email_id, employee_no, customer_id, session_id, order_id, and so on.

Use composite attributes – e.g. customer_id + product_id + country_code as the partition key and order_date as the sort key.

Cache popular items – use DynamoDB Accelerator (DAX) for caching reads.

Add random numbers or digits from a predetermined range for write-heavy use cases – e.g. add a random suffix to an invoice number, such as INV00023-04593 (see the sketch below).
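
A minimal sketch of that suffixing idea, assuming a hypothetical Invoices table keyed on invoice_id:

```python
import random
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Invoices")  # hypothetical table

def put_invoice(invoice_number: str, payload: dict, shards: int = 10) -> None:
    """Spread writes for a hot invoice number across several partition key values."""
    suffix = random.randint(0, shards - 1)
    item = {"invoice_id": f"{invoice_number}-{suffix:05d}", **payload}
    table.put_item(Item=item)

put_invoice("INV00023", {"amount": "199.99", "currency": "USD"})
```

Reads then have to fan out across all suffixes, so this trades read-side complexity for write throughput.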

18
Q

What are the consistency models?

A

DynamoDB supports eventually consistent and strongly consistent reads.

19
Q

What are eventually consistent reads?

A

When you read data from a DynamoDB table, the response might not reflect the results of a recently completed write operation.

The response might include some stale data.

If you repeat your read request after a short time, the response should return the latest data.

20
Q

What are strongly consistent reads?

A
  • When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful.
  • A strongly consistent read might not be available if there is a network delay or outage. In this case, DynamoDB may return a server error (HTTP 500).
  • Strongly consistent reads may have higher latency than eventually consistent reads.
  • Strongly consistent reads are not supported on global secondary indexes.
  • Strongly consistent reads use more throughput capacity than eventually consistent reads.

DynamoDB uses eventually consistent reads by default.

21
Q

You can configure strongly consistent reads with the GetItem, Query, and Scan APIs by setting the --consistent-read (CLI) or ConsistentRead parameter to true. True or false?

A

True.
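
A minimal boto3 sketch (table and key names are illustrative):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Request a strongly consistent read; the default (False) is eventually consistent.
response = dynamodb.get_item(
    TableName="ForumPosts",  # hypothetical table
    Key={
        "user_id": {"S": "user-123"},
        "post_timestamp": {"S": "2021-01-01T00:00:00Z"},
    },
    ConsistentRead=True,
)
print(response.get("Item"))
```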

22
Q

What are DynamoDB transactions?

A

Amazon DynamoDB transactions simplify the developer experience of making coordinated, all-or-nothing changes to multiple items both within and across tables.

Transactions provide atomicity, consistency, isolation, and durability (ACID) in DynamoDB.

Transactions enable reading and writing of multiple items across multiple tables as an all-or-nothing operation.

23
Q

What can you do with the transactional write API?

A

With the transaction write API, you can group multiple Put, Update, Delete, and ConditionCheck actions.

You can then submit the actions as a single TransactWriteItems operation that either succeeds or fails as a unit.
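
A hedged sketch of a TransactWriteItems call, assuming a hypothetical Accounts table: a debit and a credit that succeed or fail together.

```python
import boto3

dynamodb = boto3.client("dynamodb")

# All-or-nothing: debit one account and credit another, or neither happens.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "alice"}},
                "UpdateExpression": "SET balance = balance - :amt",
                # Fail the whole transaction if the balance would go negative.
                "ConditionExpression": "balance >= :amt",
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
        {
            "Update": {
                "TableName": "Accounts",
                "Key": {"account_id": {"S": "bob"}},
                "UpdateExpression": "SET balance = balance + :amt",
                "ExpressionAttributeValues": {":amt": {"N": "100"}},
            }
        },
    ]
)
```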

24
Q

There is no additional cost to enable transactions for DynamoDB tables. True or false?

A

True – there is no additional cost to enable transactions for DynamoDB tables.

You pay only for the reads or writes that are part of your transaction.

25

Q

DynamoDB performs two underlying reads or writes of every item in the transaction. True or false?

A

True. DynamoDB performs two underlying reads or writes of every item in the transaction: one to prepare the transaction and one to commit the transaction.

These two underlying read/write operations are visible in your Amazon CloudWatch metrics.
26

Q

What does the Scan API do?

A

The Scan operation returns one or more items and item attributes by accessing every item in a table or a secondary index.

A single Scan operation reads up to the maximum number of items set (if using the Limit parameter) or a maximum of 1 MB of data.

Scan API calls can use a lot of RCUs as they access every item in the table.

You can use the ProjectionExpression parameter so that Scan only returns some of the attributes, rather than all of them.

If you need to further refine the Scan results, you can optionally provide a filter expression. A filter expression is applied after a Scan finishes but before the results are returned.

Scan operations proceed sequentially. For faster performance on a large table or secondary index, applications can request a parallel Scan operation by providing the Segment and TotalSegments parameters.

Scan uses eventually consistent reads when accessing the data in a table. If you need a consistent copy of the data as of the time the Scan begins, you can set the ConsistentRead parameter to true.
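A minimal paginated Scan sketch with boto3; the table name and the likes attribute are hypothetical. The filter runs server-side after the read, so filtered-out items still consume RCUs:

```python
import boto3
from boto3.dynamodb.conditions import Attr

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ForumPosts")  # hypothetical table

# Paginate through the whole table, projecting two attributes.
scan_kwargs = {
    "ProjectionExpression": "user_id, post_timestamp",
    "FilterExpression": Attr("likes").gt(10),  # hypothetical attribute
}
items = []
while True:
    response = table.scan(**scan_kwargs)
    items.extend(response["Items"])
    if "LastEvaluatedKey" not in response:
        break
    scan_kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]
```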
27

Q

How does the Query API work?

A

A Query operation finds items in your table based on the primary key attribute and a distinct value to search for. For example, you might search for a user ID value, and all attributes related to that item would be returned.

You can use an optional sort key name and value to refine the results. For example, if your sort key is a timestamp, you can refine the query to only select items with a timestamp from the last 7 days.

By default, a Query returns all the attributes for the items, but you can use the ProjectionExpression parameter if you want the query to only return the attributes you want to see.

Results are always sorted by the sort key. Numeric order is used – ascending by default (e.g. 1, 2, 3, 4). For strings, ASCII character code values are used.

You can reverse the order by setting the ScanIndexForward (yes, it’s a query, not a scan) parameter to false.

By default, queries are eventually consistent. To use strongly consistent reads you need to explicitly set this in the query.
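A minimal Query sketch against the same hypothetical ForumPosts table, refining by sort key, reversing the sort order, and opting in to a strongly consistent read:

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ForumPosts")  # hypothetical table

# All of one user's posts from 2021 onward, newest first.
response = table.query(
    KeyConditionExpression=Key("user_id").eq("user-123")
        & Key("post_timestamp").gte("2021-01-01T00:00:00Z"),
    ScanIndexForward=False,  # reverse the sort key order (descending)
    ConsistentRead=True,     # opt in to a strongly consistent read
)
print(response["Items"])
```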
28

Q

Scan vs Query?

A

Query is more efficient than Scan.

Scan dumps the entire table, then filters out the values that provide the desired result (removing unwanted data). This adds an extra step of removing the data you don’t want.

As the table grows, the Scan operation takes longer. A Scan operation on a large table can use up the provisioned throughput for that table in just a single operation.
29

Q

What can you do to enhance performance?

A

You can reduce the impact of a query or scan by setting a smaller page size, which uses fewer read operations. A larger number of smaller operations will allow other requests to succeed without throttling.

Avoid using Scan operations if you can: design tables in a way that you can use the Query, GetItem, or BatchGetItem APIs.

Scan performance optimization: by default, a Scan operation processes data sequentially and returns data in 1 MB increments before moving on to retrieve the next 1 MB of data. It can only scan one partition at a time.

You can configure DynamoDB to use parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel.

Note: it is best to avoid parallel scans if your table or index is already incurring heavy read/write activity from other applications.
30

Q

An index is a data structure which does what?

A

An index is a data structure which allows you to perform fast queries on specific columns in a table. You select the columns that you want included in the index and run your searches on the index instead of the entire dataset.

There are two types of index supported for speeding up queries in DynamoDB:

  • Local Secondary Index (LSI).
  • Global Secondary Index (GSI).
31

Q

What is a Local Secondary Index (LSI)?

A

An LSI provides an alternative sort key to use for scans and queries. It provides an alternative range key for your table, local to the hash key.

You can have up to five LSIs per table.

The sort key consists of exactly one scalar attribute. The attribute that you choose must be a scalar String, Number, or Binary.

An LSI must be created at table creation time. You cannot add, remove, or modify it later.

It has the same partition key as your original table (but a different sort key). It gives you a different view of your data, organized by the alternative sort key.

Any queries based on this sort key are much faster using the index than the main table. An example might be having a user ID as the partition key and account creation date as the sort key.

The key benefit of an LSI is that you can query on additional values in the table other than the partition key / sort key.
32

Q

What is a Global Secondary Index (GSI)?

A

A GSI is used to speed up queries on non-key attributes.

You can create a GSI when you create your table or at any time later.

A GSI has a different partition key as well as a different sort key, giving a completely different view of the data. It speeds up any queries relating to this alternative partition and sort key. An example might be an email address as the partition key and last login date as the sort key.

With a GSI the index is a new “table”, and you can project attributes onto it:

  • The partition key and sort key of the original table are always projected (KEYS_ONLY).
  • You can specify extra attributes to project (INCLUDE).
  • You can use all attributes from the main table (ALL).

You must define RCUs / WCUs for the index.

If writes are throttled on the GSI, the main table will be throttled (even if there are enough WCUs on the main table). LSIs do not cause any special throttling considerations.

Exam tip: you typically need to ensure that you have at least the same, or more, RCU/WCU specified in your GSI as in your main table to avoid throttling on the main table.
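A minimal sketch of adding a GSI to an existing table with boto3 (table, index, and attribute names are illustrative, and the table is assumed to use provisioned capacity):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Add a GSI keyed on email + last_login to an existing (hypothetical) table.
dynamodb.update_table(
    TableName="Users",
    AttributeDefinitions=[
        {"AttributeName": "email", "AttributeType": "S"},
        {"AttributeName": "last_login", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[{
        "Create": {
            "IndexName": "email-last_login-index",
            "KeySchema": [
                {"AttributeName": "email", "KeyType": "HASH"},
                {"AttributeName": "last_login", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
            "ProvisionedThroughput": {
                "ReadCapacityUnits": 5,
                "WriteCapacityUnits": 5,
            },
        }
    }],
)
```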
33

Q

What is DynamoDB provisioned capacity?

A

With provisioned capacity mode you specify the number of data reads and writes per second that you require for your application.

When you create your table you specify your requirements using Read Capacity Units (RCUs) and Write Capacity Units (WCUs).

Note: WCUs and RCUs are spread evenly between partitions.

You can also use Auto Scaling with provisioned capacity to automatically adjust your table’s capacity based on a specified utilization rate, ensuring application performance while reducing costs.
34

Q

Read capacity unit (RCU) factors?

A
  • Each API call to read data from your table is a read request.
  • Read requests can be strongly consistent, eventually consistent, or transactional.
  • For items up to 4 KB in size, one RCU can perform one strongly consistent read request per second.
  • Items larger than 4 KB require additional RCUs.
  • For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second.
  • Transactional read requests require two RCUs to perform one read per second for items up to 4 KB.
  • For example, a strongly consistent read of an 8 KB item would require two RCUs, an eventually consistent read of an 8 KB item would require one RCU, and a transactional read of an 8 KB item would require four RCUs.
35

Q

Write capacity unit (WCU) factors?

A
  • Each API call to write data to your table is a write request.
  • For items up to 1 KB in size, one WCU can perform one standard write request per second.
  • Items larger than 1 KB require additional WCUs.
  • Transactional write requests require two WCUs to perform one write per second for items up to 1 KB.
  • For example, a standard write request of a 1 KB item would require one WCU, a standard write request of a 3 KB item would require three WCUs, and a transactional write request of a 3 KB item would require six WCUs. (A calculator sketch follows below.)
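A minimal sketch of the RCU/WCU arithmetic from the last two cards; the rounding up per item is the key step:

```python
import math

def rcus_per_read(item_kb: float, mode: str = "strong") -> float:
    """RCUs consumed by one read per second of an item of the given size."""
    units = math.ceil(item_kb / 4)   # one RCU covers 4 KB, rounded up per item
    if mode == "eventual":
        return units / 2             # eventually consistent reads cost half
    if mode == "transactional":
        return units * 2             # transactional reads cost double
    return units                     # strongly consistent

def wcus_per_write(item_kb: float, transactional: bool = False) -> float:
    """WCUs consumed by one write per second of an item of the given size."""
    units = math.ceil(item_kb / 1)   # one WCU covers 1 KB, rounded up per item
    return units * 2 if transactional else units

# The worked examples from the cards:
assert rcus_per_read(8, "strong") == 2
assert rcus_per_read(8, "eventual") == 1
assert rcus_per_read(8, "transactional") == 4
assert wcus_per_write(3) == 3
assert wcus_per_write(3, transactional=True) == 6
```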
36

Q

Replicated write capacity unit (rWCU) factors?

A

When using DynamoDB global tables, your data is written automatically to multiple AWS Regions of your choice. Each write occurs in the local Region as well as the replicated Regions.
37

Q

Streams read request unit factors?

A

Each GetRecords API call to DynamoDB Streams is a streams read request unit. Each streams read request unit can return up to 1 MB of data.
38

Q

Transactional read/write request factors?

A

In DynamoDB, a transactional read or write differs from a standard read or write because it guarantees that all operations contained in a single transaction set succeed or fail as a set.
39

Q

DynamoDB on-demand capacity factors?

A

With on-demand, you don’t need to specify your requirements. DynamoDB instantly scales up and down based on the activity of your application.

Great for unpredictable / spiky workloads or new workloads that aren’t well understood.

You pay for what you use (pay per request).

You can switch between the provisioned capacity and on-demand pricing models once per day.
40

Q

Performance and throttling factors?

A

Throttling occurs when the configured RCUs or WCUs are exceeded. You may receive the ProvisionedThroughputExceededException error. This error indicates that your request rate is too high for the read/write capacity provisioned for the table.

The AWS SDKs for DynamoDB automatically retry requests that receive this exception. Your request is eventually successful, unless your retry queue is too large to finish.

Possible causes of performance issues:

  • Hot keys – one partition key is being read too often.
  • Hot partitions – when data access is imbalanced, a “hot” partition can receive a higher volume of read and write traffic compared to other partitions.
  • Large items – large items consume more RCUs and WCUs.

Resolution:

  • Reduce the frequency of requests and use exponential backoff.
  • Try to design your application for uniform activity across all logical partition keys in the table and its secondary indexes.
  • Use burst capacity effectively – DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity, which can be consumed quickly.
41

Q

DynamoDB Accelerator (DAX) factors?

A

Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second.

Provides managed cache invalidation, data population, and cluster management.

DAX is used to improve read performance (not writes).

You do not need to modify application logic, since DAX is compatible with existing DynamoDB API calls.

Ideal for read-heavy and bursty workloads such as auction applications, gaming, and retail sites when running special sales / promotions.
42

Q

How does DAX work?

A
  • DAX is a write-through caching service – this means the data is written to the cache as well as the back-end store at the same time.
  • You point your DynamoDB API calls at the DAX cluster, and if the item is in the cache (a cache hit), DAX returns the result to the application.
  • If the item requested is not in the cache (a cache miss), DAX performs an eventually consistent GetItem operation against DynamoDB.
  • Retrieval of data from DAX reduces the read load on DynamoDB tables.
  • This may allow you to reduce the provisioned read capacity on the table.
43

Q

DAX vs ElastiCache?

A

DAX is optimized for DynamoDB. DAX does not support lazy loading (it uses write-through caching).

With ElastiCache you have more management overhead (e.g. invalidation), and you need to modify application code to point to the cache.

ElastiCache supports more datastores.
44

Q

DynamoDB Time To Live (TTL) principles?

A

TTL automatically deletes an item after an expiry date / time. Expired items are marked for deletion.

Great for removing irrelevant or old data such as:

  • Session data.
  • Event logs.
  • Temporary data.

The TTL is enabled per row (you define a TTL column and add the expiry date / time there).

DynamoDB typically deletes expired items within 48 hours of expiration.

DynamoDB Streams can help recover expired items.
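A minimal boto3 sketch, assuming a hypothetical Sessions table: enable TTL on an attribute, then write an item that expires about an hour from now.

```python
import time
import boto3

dynamodb = boto3.client("dynamodb")

# Tell DynamoDB which attribute holds the expiry time (epoch seconds).
dynamodb.update_time_to_live(
    TableName="Sessions",  # hypothetical table
    TimeToLiveSpecification={"Enabled": True, "AttributeName": "expires_at"},
)

# Write a session item that expires roughly one hour from now.
dynamodb.put_item(
    TableName="Sessions",
    Item={
        "session_id": {"S": "sess-abc123"},
        "expires_at": {"N": str(int(time.time()) + 3600)},
    },
)
```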
45

Q

What is exponential backoff?

A

Many components in a network can generate errors when overloaded. In addition to simple retries, all AWS SDKs use exponential backoff: progressively longer waits occur between retries for improved flow control.

If after 1 minute this does not work, your request size may be exceeding the throughput for your read/write capacity.

If your workload is mainly reads, consider offloading using DAX or ElastiCache. If your workload is mainly writes, consider increasing the WCUs for the table.
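A minimal sketch of the idea, assuming a hypothetical do_request callable (the AWS SDKs already do this internally):

```python
import random
import time

def call_with_backoff(do_request, max_retries: int = 5):
    """Retry a throttled call with exponentially growing, jittered waits."""
    for attempt in range(max_retries):
        try:
            return do_request()
        except Exception:  # e.g. ProvisionedThroughputExceededException
            if attempt == max_retries - 1:
                raise
            # 0.1s, 0.2s, 0.4s, 0.8s ... plus random jitter.
            time.sleep((2 ** attempt) * 0.1 + random.uniform(0, 0.05))
```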
46

Q

What is DynamoDB Streams?

A

DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. Applications can access this log and view the data items as they appeared before and after they were modified, in near-real time.

Logs are encrypted at rest, and data is stored in the stream for 24 hours only.

The StreamSpecification parameter determines how the stream is configured:

  • StreamEnabled – specifies whether a stream is enabled (true) or disabled (false) for the table.
  • StreamViewType – specifies the information that will be written to the stream whenever data in the table is modified:
  • KEYS_ONLY – only the key attributes of the modified item.
  • NEW_IMAGE – the entire item, as it appears after it was modified.
  • OLD_IMAGE – the entire item, as it appeared before it was modified.
  • NEW_AND_OLD_IMAGES – both the new and the old images of the item.
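A minimal boto3 sketch of enabling a stream on an existing (hypothetical) table:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Enable a stream that captures both the old and new item images
# for every modification.
dynamodb.update_table(
    TableName="ForumPosts",  # hypothetical table
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```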
47

Q

DynamoDB writing API factors?

A
  • PutItem – create data or do a full replacement (consumes WCUs).
  • UpdateItem – update data, a partial update of attributes (can use atomic counters).
  • Conditional writes – accept a write / update only if conditions are met.
  • DeleteItem – delete an individual row (can perform a conditional delete).
  • DeleteTable – delete a whole table (quicker than using DeleteItem on all items).
  • BatchWriteItem – can put or delete up to 25 items in one call (max 16 MB per write / 400KB per item). Batching saves latency by reducing the number of API calls, and operations are done in parallel for better efficiency.
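A minimal sketch of an atomic counter via UpdateItem, using the hypothetical ForumPosts table and a hypothetical likes attribute:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("ForumPosts")  # hypothetical table

# UpdateItem as an atomic counter: increment a like count without reading first.
table.update_item(
    Key={"user_id": "user-123", "post_timestamp": "2021-01-01T00:00:00Z"},
    UpdateExpression="ADD likes :inc",  # hypothetical numeric attribute
    ExpressionAttributeValues={":inc": 1},
)
```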
48

Q

DynamoDB reading API facets?

A
  • GetItem – read based on the primary key (eventually consistent by default; you can request a strongly consistent read). A projection expression can be specified to include only certain attributes.
  • BatchGetItem – up to 100 items, up to 16 MB of data in total. Items are retrieved in parallel to minimize latency.
  • Query – return items based on a partition key value and optionally a sort key. A FilterExpression can be used for filtering. Returns up to 1 MB of data or the number of items specified in Limit. Can paginate results. Can query a table, a local secondary index, or a global secondary index.
  • Scan – scans the entire table (inefficient). Returns up to 1 MB of data – use pagination to view more results. Consumes a lot of RCUs. Can use a ProjectionExpression + FilterExpression.
49

Q

What is optimistic locking?

A

Optimistic locking is a strategy to ensure that the client-side item that you are updating (or deleting) is the same as the item in Amazon DynamoDB.

It protects database writes from being overwritten by the writes of others, and vice versa.
50

Q

What are conditional updates?

A

To manipulate data in an Amazon DynamoDB table, you use the PutItem, UpdateItem, and DeleteItem operations. You can optionally specify a condition expression to determine which items should be modified. If the condition expression evaluates to true, the operation succeeds; otherwise, the operation fails.
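A minimal sketch combining the last two cards: a condition expression on a version attribute implements optimistic locking. The Documents table and attribute names are illustrative:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Documents")  # hypothetical table

def save_if_unchanged(doc_id: str, body: str, expected_version: int) -> bool:
    """Write only if nobody else bumped the version since we read the item."""
    try:
        table.update_item(
            Key={"doc_id": doc_id},
            UpdateExpression="SET #b = :b, #v = :new",
            ConditionExpression="#v = :expected",
            ExpressionAttributeNames={"#b": "body", "#v": "version"},
            ExpressionAttributeValues={
                ":b": body,
                ":new": expected_version + 1,
                ":expected": expected_version,
            },
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # somebody else won the race; re-read and retry
        raise
```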
51

Q

Security principles?

A

VPC endpoints are available for DynamoDB.

Encryption at rest can be enabled using AWS KMS.

Encryption in transit uses SSL / TLS.
52

Q

What are DynamoDB best practices?

A

Keep item sizes small.

If you are storing serial data in DynamoDB that will require actions based on date/time, use separate tables for days, weeks, and months.

Store more frequently and less frequently accessed data in separate tables.

If possible, compress larger attribute values.

Store objects larger than 400KB in S3 and use pointers (S3 Object IDs) in DynamoDB.
53

Q

Integration practices with DynamoDB?

A

ElastiCache can be used in front of DynamoDB to improve the performance of reads on infrequently changed data.

Triggers integrate with AWS Lambda to respond to item-level changes.

Integration with Redshift:

  • Redshift complements DynamoDB with advanced business intelligence.
  • When copying data from a DynamoDB table into Redshift you can perform complex data analysis queries, including joins with other tables.
  • A copy operation from a DynamoDB table counts against the table’s read capacity.
  • After data is copied, SQL queries do not affect the data in DynamoDB.

DynamoDB is integrated with Apache Hive on EMR. Hive allows you to:

  • Read and write data in DynamoDB tables, allowing you to query DynamoDB data using a SQL-like language (HiveQL).
  • Copy data from a DynamoDB table to an S3 bucket and vice versa.
  • Copy data from a DynamoDB table into HDFS and vice versa.
  • Perform join operations on DynamoDB tables.
54

Q

DynamoDB scalability?

A

Push-button scaling without downtime.

You can scale down only 4 times per calendar day.

AWS places default limits on the throughput you can provision.

DynamoDB can throttle requests that exceed the provisioned throughput for a table. DynamoDB can also throttle read requests for an index to prevent your application from consuming too many capacity units.

When a request is throttled, it fails with an HTTP 400 code (Bad Request) and a ProvisionedThroughputExceededException.
55

Q

Cross-region replication with global tables?

A

Amazon DynamoDB global tables provide a fully managed solution for deploying a multi-region, multi-master database.

When you create a global table, you specify the AWS Regions where you want the table to be available. DynamoDB performs all the necessary tasks to create identical tables in these Regions and propagate ongoing data changes to all of them.

DynamoDB global tables are ideal for massively scaled applications with globally dispersed users. Global tables provide automatic multi-master replication to AWS Regions worldwide, so you can deliver low-latency data access to your users no matter where they are located.
56

Q

Cross-region replication with global tables definitions?

A

A global table is a collection of one or more replica tables, all owned by a single AWS account.

A replica table (or replica, for short) is a single DynamoDB table that functions as part of a global table. Each replica stores the same set of data items. Any given global table can only have one replica table per Region.

You can add replica tables to the global table so that it can be available in additional AWS Regions.

DynamoDB does not support partial replication of only some of the items.

An application can read and write data to any replica table. If your application only uses eventually consistent reads and only issues reads against one AWS Region, it will work without any modification. However, if your application requires strongly consistent reads, it must perform all of its strongly consistent reads and writes in the same Region. DynamoDB does not support strongly consistent reads across AWS Regions.

It is important that each replica table and secondary index in your global table has identical write capacity settings to ensure proper replication of data.
57

Q

DynamoDB auto scaling?

A

DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns.

This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic without throttling. When the workload decreases, Application Auto Scaling decreases the throughput so that you don’t pay for unused provisioned capacity.

How Application Auto Scaling works:

  • You create a scaling policy for a table or a global secondary index.
  • The scaling policy specifies whether you want to scale read capacity or write capacity (or both), and the minimum and maximum provisioned capacity unit settings for the table or index.
  • The scaling policy also contains a target utilization – the percentage of consumed provisioned throughput at a point in time.
  • It uses a target-tracking algorithm to adjust the provisioned throughput of the table (or index) upward or downward in response to actual workloads, so that the actual capacity utilization remains at or near your target utilization.

Currently, auto scaling does not scale down your provisioned capacity if your table’s consumed capacity becomes zero.

If you use the AWS Management Console to create a table or a global secondary index, DynamoDB auto scaling is enabled by default.
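A minimal sketch of wiring this up with boto3 and Application Auto Scaling (table name, capacity limits, and target value are illustrative):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's read capacity as a scalable target.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/ForumPosts",  # hypothetical table
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    MinCapacity=5,
    MaxCapacity=500,
)

# Target-tracking policy: keep consumed/provisioned RCUs near 70%.
autoscaling.put_scaling_policy(
    PolicyName="ForumPostsReadScaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/ForumPosts",
    ScalableDimension="dynamodb:table:ReadCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
        },
    },
)
```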
58

Q

High availability approaches for databases?

A

If possible, choose DynamoDB over RDS because of its inherent fault tolerance.

If DynamoDB can’t be used, choose Aurora because of its redundancy and automatic recovery features.

If Aurora can’t be used, choose Multi-AZ RDS.

Frequent RDS snapshots can protect against data corruption or failure, and they won’t impact the performance of a Multi-AZ deployment.

Regional replication is also an option, but it will not be strongly consistent.

If the database runs on EC2, you must design for HA yourself.