Storage Flashcards

1
Q

S3

A
  • Bucket must have a globally unique name
  • Buckets are defined at the region level
  • Naming convention
    • No uppercase
    • No underscore
    • 3-63 characters long
    • Not an IP
    • Must start with lowercase letter or number
  • Object key is its full path
  • Max object size is 5TB
  • Objects larger than 5GB must use “multi-part upload”
  • Strong Consistency
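The size limits above can be encoded in a tiny helper. This is an illustrative sketch (the function and constant names are my own); only the 5 TB object cap and the 5 GB multi-part threshold come from the card.

```python
GB = 1024 ** 3
TB = 1024 ** 4

MAX_OBJECT_SIZE = 5 * TB     # hard S3 limit per object
MULTIPART_REQUIRED = 5 * GB  # a single PUT cannot exceed 5 GB

def upload_strategy(size_bytes: int) -> str:
    """Pick an S3 upload strategy based on object size."""
    if size_bytes > MAX_OBJECT_SIZE:
        raise ValueError("object exceeds the 5 TB S3 limit")
    if size_bytes > MULTIPART_REQUIRED:
        return "multi-part"  # required above 5 GB
    return "single-put"

print(upload_strategy(1 * GB))  # single-put
print(upload_strategy(6 * GB))  # multi-part
```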
2
Q

S3 Standard General Purpose

A
  • Use for frequently accessed data
  • Low latency and high throughput
  • Sustain 2 concurrent facility failures
3
Q

Infrequent Access

A
  • Use for data that is less frequently accessed but requires rapid access when needed
  • Standard IA
    • Use Cases : Disaster Recovery and Backups
  • One Zone IA
    • Use Cases : Secondary backup
4
Q

Glacier

A
  • Glacier Instant Retrieval
    • millisecond retrieval
    • min 90 days storage
    • Use Case : Data access once a quarter
  • Glacier Flexible Retrieval
    • Expedited 1-5 mins
    • Standard 3-5 hours
    • Bulk 5-12 hours
    • min 90 days storage
  • Glacier Deep Archive
    • Standard 12 hours
    • Bulk 48 hours
    • min 180 days storage
5
Q

S3 Intelligent Tiering

A
  • Small monthly monitoring and auto-tiering fee
  • Moves objects automatically between Access Tiers based on usage
6
Q

S3 Moving Between Storage Classes

A
  • For infrequently accessed objects, move them to STANDARD_IA
  • For archive objects, move them to GLACIER or DEEP_ARCHIVE
  • Moving objects can be automated using a lifecycle configuration
7
Q

S3 Lifecycle Rules

A
  • Transition Rules
    • defines when objects are transitioned to another storage class
  • Expiration Rules
    • Configure objects to expire after some time
      • can be used to delete old versions of files if versioning is enabled
      • can be used to delete incomplete multi part uploads
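Transition and expiration rules like the ones above are expressed as a single lifecycle configuration document. Below is a sketch of the dict shape accepted by boto3's `put_bucket_lifecycle_configuration`; the rule ID, prefix, and day counts are illustrative assumptions, not values from the cards.

```python
# Shape passed as LifecycleConfiguration=... to boto3's
# s3.put_bucket_lifecycle_configuration. Rule ID, prefix,
# and day counts are made-up examples.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-logs",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # transition rules: when objects move to another storage class
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # expiration rule: delete current versions after a year
            "Expiration": {"Days": 365},
            # expiration rule use case: clean up incomplete multi-part uploads
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}
```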
8
Q

S3 Versioning

A
  • Enable at bucket level
  • Overwriting the same key creates a new version
  • Use Cases : Protect against unintended deletes, easy rollback to a previous version
  • Any file that is not versioned prior to enabling versioning will have version “null”
  • Suspending versioning does not delete the previous versions
9
Q

S3 Replication

A
  • Must enable versioning in source and destination
  • Cross Region Replication
  • Same Region Replication
  • Buckets can be in different AWS accounts
  • Copying is asynchronous
  • After activation, only new objects are replicated
  • Optionally, use S3 Batch Replication to replicate existing objects and objects that failed replication
  • For DELETE operation
    • can replicate delete markers from source to target
    • deletions with a version ID are not replicated
  • There is no chaining replication
10
Q

S3 Performance

A
  • Durability 99.999999999%
  • Availability 99.99%
  • 100-200ms latency
  • 3,500 PUT / COPY / POST / DELETE requests per second per prefix in a bucket
  • 5,500 GET / HEAD requests per second per prefix in a bucket
  • Multi Part Upload
    • Recommended for files > 100 MB
    • Required for files > 5GB
  • S3 Transfer Acceleration
    • Increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
    • Compatible with multi-part upload
  • S3 Byte-Range Fetches
    • Parallelize GETs by requesting specific byte ranges
    • Better resilience in case of failures
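Byte-range fetches work by issuing several GETs, each with its own `Range` header. A minimal sketch of how the ranges can be computed (the function name is my own); each returned value is a valid HTTP `Range` header that could be passed as `Range=...` to `s3.get_object` in parallel workers.

```python
def byte_ranges(object_size: int, chunk_size: int):
    """Yield HTTP Range header values covering the whole object."""
    for start in range(0, object_size, chunk_size):
        # Range headers use inclusive end offsets
        end = min(start + chunk_size, object_size) - 1
        yield f"bytes={start}-{end}"

print(list(byte_ranges(10, 4)))  # ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']
```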
11
Q

S3 KMS

A
  • SSE-KMS will be impacted by KMS limits
  • Uploading calls the GenerateDataKey KMS API
  • Downloading calls the Decrypt KMS API
12
Q

SSE-S3

A
  • “x-amz-server-side-encryption” : “AES256”
13
Q

SSE-KMS

A
  • Pros : User Control + Audit Trail
  • “x-amz-server-side-encryption” : “aws:kms”
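The two headers from the SSE-S3 and SSE-KMS cards map directly onto boto3 `put_object` parameters. A sketch of the extra keyword arguments for each mode; the `alias/my-key` KMS alias is a made-up placeholder.

```python
# Extra kwargs for s3.put_object(Bucket=..., Key=..., Body=..., **args).
# boto3 translates these into the x-amz-server-side-encryption header.
sse_s3_args = {
    "ServerSideEncryption": "AES256",   # x-amz-server-side-encryption: AES256
}
sse_kms_args = {
    "ServerSideEncryption": "aws:kms",  # x-amz-server-side-encryption: aws:kms
    "SSEKMSKeyId": "alias/my-key",      # hypothetical KMS key alias
}
```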
14
Q

SSE-C

A
  • HTTPS must be used
  • Encryption key must be provided in HTTP header
15
Q

S3 Bucket Settings for Block Public Access

A
  • Block public access to buckets and objects
  • Block public and cross-account access to buckets and objects through any public bucket or access point policies
16
Q

S3 Event Notification with Amazon EventBridge

A
  • Advanced Filtering options with JSON rules (metadata, object size, name, …)
  • Multiple Destinations → Step Functions, Kinesis Streams / Firehose
  • EventBridge Capabilities → Archive, Replay Events, Reliable delivery
17
Q

NoSQL

A
  • Distributed
  • Limited support for query joins
  • NoSQL databases scale horizontally
18
Q

DynamoDB

A
  • Fully managed
  • Highly available with replication across multiple AZs
  • Scales to massive workloads and distributed databases
  • Millions of requests per second
  • 100s of TBs of storage
  • Standard and Infrequent Access Table Class
19
Q

DynamoDB Basics

A
  • DynamoDB is made of Tables
  • Each table has a primary key
  • Each table can have an infinite number of items
  • Each item has attributes
  • max size of an item is 400KB
  • 3 Data Types (Number, String, Binary in both scalar and multi-valued sets)
  • Supports document stores such as JSON, XML, HTML
20
Q

DynamoDB Primary Key

A
  • Choose the attribute which has the highest cardinality
  • Partition Key (HASH)
    • partition key must be unique for each item
    • partition key must be diverse so that the data is distributed
  • Partition Key+Sort Key (HASH+RANGE)
    • Combination must be unique for each item
    • Data is grouped by partition key
21
Q

DynamoDB Read Write Capacity Mode

A
  • Provisioned Mode
    • Throughput can be exceeded temporarily using “Burst Capacity”
    • If Burst Capacity has been consumed, you will get a “ProvisionedThroughputExceeded” Exception
  • On-Demand Mode
  • Switch between different modes once every 24 hours
22
Q

DynamoDB Write Capacity Unit

A
  • 1 WCU represents 1 write per second for an item up to 1KB in size
23
Q

DynamoDB Strongly Consistent Read vs Eventually Consistent Read

A
  • SCR
    • If we read after a write, we will get the correct data
    • Set ConsistentRead to True
    • Consumes twice the RCU
  • ECR
    • If we read just after a write, it is possible to get some stale data because of replication
24
Q

Read Capacity Unit

A
  • 1 RCU represents 1 Strongly Consistent Read per second
  • 1 RCU represents 2 Eventually Consistent Read per second
  • for an item up to 4KB
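The WCU and RCU rules above reduce to simple arithmetic: writes round the item size up to the next 1KB, reads round it up to the next 4KB, and eventually consistent reads cost half. A sketch with hypothetical function names:

```python
import math

def wcu(item_size_kb: float, writes_per_sec: int) -> int:
    """1 WCU = one 1 KB write per second; size rounds up to the next KB."""
    return writes_per_sec * math.ceil(item_size_kb / 1)

def rcu(item_size_kb: float, reads_per_sec: int, strongly_consistent: bool) -> int:
    """1 RCU = one strongly consistent 4 KB read per second,
    or two eventually consistent 4 KB reads per second."""
    units = reads_per_sec * math.ceil(item_size_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

print(wcu(3, 5))         # 15: each 3 KB write costs 3 WCU
print(rcu(6, 10, True))  # 20: 6 KB rounds up to 2 read units each
print(rcu(6, 10, False)) # 10: eventually consistent costs half
```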
25
Q

DynamoDB Throttling

A
  • If we exceed provisioned WCU or RCU, we get a “ProvisionedThroughputExceeded” Exception
  • Solutions
    • Distribute the Partition Key
    • Use DynamoDB Accelerator (DAX)
26
Q

DynamoDB Writing Data

A
  • PutItem
    • Creates a new item or fully replaces an old item
  • UpdateItem
    • Edits an existing item’s attributes or adds a new item if it does not exist
27
Q

DynamoDB Reading Data

A
  • GetItem
    • Read based on Primary Key
    • Primary Key can be HASH or HASH+RANGE
    • Eventually Consistent Read by default
  • Scan
    • Scans the entire table and then filters out data
    • Returns up to 1MB of data per call
    • Can use Parallel Scan
28
Q

DynamoDB Batch Operations

A
  • Save latency by reducing the number of API calls
  • Operations are done in parallel for better efficiency
  • BatchWriteItem
    • Up to 25 PutItem or DeleteItem requests in one call
    • Up to 16MB of data written and up to 400KB per item
    • Cannot update items
  • BatchGetItem
    • Returns items from one or more tables
    • Up to 100 items and up to 16MB of data
    • Items are retrieved in parallel to minimize latency
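Because BatchWriteItem accepts at most 25 put/delete requests per call, larger workloads have to be chunked client-side. A minimal sketch (the function name is my own):

```python
def batches(items, batch_size=25):
    """Split write requests into BatchWriteItem-sized chunks (max 25)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Each chunk would become one dynamodb.batch_write_item call.
chunks = batches(list(range(60)))
print([len(c) for c in chunks])  # [25, 25, 10]
```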
29
Q

DynamoDB Local Secondary Index

A
  • Alternative Sort Key
  • Up to 5 Local Secondary Indexes per table
  • Must be defined at table creation time
30
Q

DynamoDB Global Secondary Index

A
  • Alternative Primary Key (HASH or HASH+RANGE) from the base table
  • Speeds up queries on non-key attributes
  • Must provision RCUs and WCUs for the index
  • Can be added / modified after table creation
31
Q

DynamoDB Indexes and Throttling

A
  • GSI
    • If writes are throttled on the GSI, the main table will be throttled
  • LSI
    • Uses the WCUs and RCUs of the main table
    • No special throttling considerations
32
Q

DynamoDB PartiQL

A
  • Use a SQL-like syntax to manipulate DynamoDB tables
  • Supports some INSERT, UPDATE, SELECT and DELETE statements
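A parameterized PartiQL statement as it would be passed to boto3's `execute_statement`; the table name and key value are made up for illustration.

```python
# Request shape for dynamodb.execute_statement(**request).
# "Users" and "user-123" are hypothetical examples.
request = {
    "Statement": 'SELECT * FROM "Users" WHERE user_id = ?',
    "Parameters": [{"S": "user-123"}],  # AttributeValue-typed placeholder
}
```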
33
Q

DynamoDB Accelerator

A
  • Fully managed in-memory cache for DynamoDB
  • Microseconds latency for cached reads and queries
  • Does not require application logic modification (compatible with existing DynamoDB APIs)
  • 5 minutes TTL for the cache (default)
  • Up to 10 nodes in the cluster
  • Multi-AZ
34
Q

DynamoDB Streams

A
  • Ordered stream of item-level modifications (create/update/delete) in a table
  • Stream records can be sent to Kinesis Data Streams
  • Retention up to 24 hours
  • Ability to choose the information that will be written to the stream
    • KEYS_ONLY : only the key attributes of the modified item
    • NEW_IMAGE : the entire new item
    • OLD_IMAGE : the entire old item
    • NEW_AND_OLD_IMAGES : both the new and the old images of the item
  • DynamoDB Streams are made of shards, just like Kinesis Data Streams
  • Records are not retroactively populated in a stream after enabling it
35
Q

DynamoDB Streams and AWS Lambda

A
  • We need to define an Event Source Mapping to read from DynamoDB Streams
  • We need to ensure the Lambda function has the appropriate permissions
  • The Lambda function is invoked synchronously
36
Q

DynamoDB Time To Live

A
  • Automatically delete items after an expiry timestamp
  • Does not consume WCUs
  • The TTL attribute must be a “Number” data type with a “Unix Epoch timestamp” value
  • Expired items are deleted within 48 hours of expiration
  • Expired items that have not yet been deleted still appear in reads/queries/scans
  • Expired items are deleted from both LSIs and GSIs
  • A delete operation for each expired item enters DynamoDB Streams
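The TTL attribute is just a Number holding a Unix epoch timestamp. A sketch of computing one (the function name is my own); the item would then be written with this value in its TTL attribute.

```python
from datetime import datetime, timedelta, timezone

def ttl_epoch(now: datetime, days: int) -> int:
    """TTL attribute value: Unix epoch seconds, stored as a DynamoDB Number."""
    return int((now + timedelta(days=days)).timestamp())

# Deterministic example: expire 30 days after 2024-01-01 UTC
expires_at = ttl_epoch(datetime(2024, 1, 1, tzinfo=timezone.utc), 30)
print(expires_at)  # 1706659200
```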
37
Q

DynamoDB Security

A
  • Security
    • VPC Endpoints
    • Encryption at rest using KMS
    • Encryption in flight using SSL / TLS
  • Backup
    • Point-in-time restore like RDS
38
Q

AWS ElastiCache

A
  • Managed Redis or Memcached
  • Caches are in-memory databases with high performance and low latency
  • Helps reduce load off databases for read-intensive workloads
  • Multi-AZ with Failover Capability
  • AWS takes care of OS maintenance / patching, optimizations, setup, etc.
39
Q

Redis

A
  • In-memory key-value store
  • Super low latency (sub-ms)
  • Cache survives reboots by default
  • Multi-AZ with Automatic Failover for disaster recovery
  • Support for Read Replicas
  • Use Cases : Gaming, relieve pressure on databases, etc.
40
Q

Memcached

A
  • Memcached is an in-memory object store
  • Cache does not survive reboots
  • Overall, Redis is better