Storage Flashcards

1
Q

S3

A
  • Bucket must have a globally unique name
  • Buckets are defined at the region level
  • Naming convention
    • No uppercase
    • No underscore
    • 3-63 characters long
    • Not an IP
    • Must start with lowercase letter or number
  • Object key is its full path
  • Max 5TB
  • Files larger than 5GB must use “multi-part upload”
  • Strong Consistency
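The 5GB single-PUT limit interacts with the multi-part upload limits (at most 10,000 parts per upload), which constrains the minimum part size for a maximum-size object. A small sketch of that arithmetic, assuming the published S3 limits:

```python
import math

MAX_OBJECT = 5 * 1024**4        # 5 TB object size cap
MAX_PARTS = 10_000              # multi-part upload part limit
SINGLE_PUT_LIMIT = 5 * 1024**3  # above this, multi-part upload is mandatory

def min_part_size(object_size: int) -> int:
    """Smallest part size (bytes) that fits the object in <= 10,000 parts."""
    return math.ceil(object_size / MAX_PARTS)

def needs_multipart(object_size: int) -> bool:
    return object_size > SINGLE_PUT_LIMIT

# A full 5 TB object needs parts of at least ~524.3 MB.
print(round(min_part_size(MAX_OBJECT) / 1024**2, 1))  # 524.3
```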
2
Q

S3 Standard General Purpose

A
  • Use for frequently accessed data
  • Low latency and high throughput
  • Sustain 2 concurrent facility failures
3
Q

Infrequent Access

A
  • Use for less frequently accessed data that requires rapid access when needed
  • Standard IA
    • Use Cases : Disaster Recovery and Backups
  • One Zone IA
    • Use Cases : Secondary backup
4
Q

Glacier

A
  • Glacier Instant Retrieval
    • Millisecond retrieval
    • min 90 days storage
    • Use Case : Data access once a quarter
  • Glacier Flexible Retrieval
    • Expedited 1-5 mins
    • Standard 3-5 hours
    • Bulk 5-12 hours
    • min 90 days storage
  • Glacier Deep Archive
    • Standard 12 hours
    • Bulk 48 hours
    • min 180 days storage
5
Q

S3 Intelligent Tiering

A
  • Small monthly monitoring and auto-tiering fee
  • Moves objects automatically between Access Tiers based on usage
6
Q

S3 Moving Between Storage Classes

A
  • For infrequently accessed objects, move them to STANDARD_IA
  • For archive objects, move them to GLACIER or DEEP_ARCHIVE
  • Moving objects can be automated using a lifecycle configuration
7
Q

S3 Lifecycle Rules

A
  • Transition Rules
    • defines when objects are transitioned to another storage class
  • Expiration Rules
    • Configure objects to expire after some time
      • can be used to delete old versions of files if versioning is enabled
      • can be used to delete incomplete multi part uploads
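Transition and expiration rules map directly onto the JSON payload a bucket lifecycle configuration expects. A sketch of such a payload (bucket name, prefix, and day counts are illustrative), which would be passed to `put_bucket_lifecycle_configuration` in boto3:

```python
# Illustrative lifecycle configuration: transition to STANDARD_IA after 30 days,
# to GLACIER after 90, expire old versions, and abort stale multi-part uploads.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",            # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "NoncurrentVersionExpiration": {"NoncurrentDays": 365},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}

# With boto3 (not executed here):
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```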
8
Q

S3 Versioning

A
  • Enabled at the bucket level
  • Overwriting the same key increments the version
  • Use Cases : 1 Protect against unintended deletes 2 Easy rollback
  • Any file that existed before versioning was enabled will have version “null”
  • Suspending versioning does not delete the previous versions
9
Q

S3 Replication

A
  • Must enable versioning in source and destination
  • Cross Region Replication
  • Same Region Replication
  • Buckets can be in different accounts
  • Copying is asynchronous
  • After activation, only new objects are replicated
  • Optionally, you can use S3 Batch Replication to replicate existing objects and objects that failed replication
  • For DELETE operation
    • can replicate delete markers from source to target
    • deletions with a version ID are not replicated
  • There is no chaining replication
10
Q

S3 Performance

A
  • Durability 99.999999999%
  • Availability 99.99%
  • 100-200ms latency
  • 3,500 PUT/COPY/POST/DELETE per second per prefix in a bucket
  • 5,500 GET/HEAD per second per prefix in a bucket
  • Multi Part Upload
    • Recommended for files > 100 MB
    • Required for files > 5GB
  • S3 Transfer Acceleration
    • Increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
    • Compatible with multi-part upload
  • S3 Byte-Range Fetches
    • Parallelize GETs by requesting specific byte ranges
    • Better resilience in case of failures
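Byte-range fetches simply set the HTTP `Range` header on parallel GETs. A sketch of splitting an object into ranges (pure arithmetic; the actual `get_object(..., Range=...)` call is shown commented, and the bucket/key names are illustrative):

```python
def byte_ranges(size: int, chunk: int) -> list[str]:
    """Range header values covering `size` bytes in `chunk`-byte pieces."""
    return [f"bytes={start}-{min(start + chunk, size) - 1}"
            for start in range(0, size, chunk)]

ranges = byte_ranges(size=10_000_000, chunk=4_000_000)
# ['bytes=0-3999999', 'bytes=4000000-7999999', 'bytes=8000000-9999999']

# Each range would then be fetched in parallel, e.g. with boto3 (not executed):
# s3.get_object(Bucket="my-bucket", Key="big-file", Range=ranges[0])
```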
11
Q

S3 KMS

A
  • SSE-KMS will be impacted by KMS limit
  • When uploading, it calls the GenerateDataKey KMS API
  • When downloading, it calls the Decrypt KMS API
12
Q

SSE-S3

A
  • “x-amz-server-side-encryption” : “AES256”
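With boto3 that header is set via the `ServerSideEncryption` parameter on the upload call. A minimal sketch (bucket, key, and body are illustrative; the call itself is commented, not executed):

```python
# Request parameters that result in the "x-amz-server-side-encryption: AES256"
# header being sent with the upload (bucket and key are illustrative).
put_kwargs = {
    "Bucket": "my-bucket",
    "Key": "report.csv",
    "Body": b"hello",
    "ServerSideEncryption": "AES256",   # SSE-S3
}

# boto3.client("s3").put_object(**put_kwargs)  # not executed here
```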
13
Q

SSE-KMS

A
  • Pros : User Control + Audit Trail
  • “x-amz-server-side-encryption” : “aws:kms”
14
Q

SSE-C

A
  • HTTPS must be used
  • Encryption key must be provided in HTTP header
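With SSE-C the client supplies its own key on every request over HTTPS; boto3 exposes this through the `SSECustomerAlgorithm`/`SSECustomerKey` parameters on `put_object`/`get_object` and derives the required headers from them. A sketch of those headers (the key here is random and purely illustrative):

```python
import base64
import hashlib
import os

customer_key = os.urandom(32)  # 256-bit key managed by the client, not AWS

# Headers S3 requires for SSE-C requests (boto3 computes these for you):
sse_c_headers = {
    "x-amz-server-side-encryption-customer-algorithm": "AES256",
    "x-amz-server-side-encryption-customer-key":
        base64.b64encode(customer_key).decode(),
    "x-amz-server-side-encryption-customer-key-MD5":
        base64.b64encode(hashlib.md5(customer_key).digest()).decode(),
}
```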
15
Q

S3 Bucket Settings for Block Public Access

A
  • Block public access to buckets and objects
  • Block public and cross-account access to buckets and objects through any public bucket or access point policies
16
Q

S3 Event Notification with Amazon EventBridge

A
  • Advanced Filtering options with JSON rules (metadata, object size, name, …)
  • Multiple Destinations –> Step Functions, Kinesis Streams/Firehose
  • EventBridge Capability –> Archive, Replay Events, Reliable delivery
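Sending bucket events to EventBridge is a single empty configuration block, and the advanced filtering happens in the EventBridge rule's JSON pattern. A sketch of both payloads (bucket name and size threshold are illustrative; the API call is commented, not executed):

```python
# Enable EventBridge delivery for a bucket's events:
notification_config = {"EventBridgeConfiguration": {}}

# boto3.client("s3").put_bucket_notification_configuration(
#     Bucket="my-bucket", NotificationConfiguration=notification_config)

# Illustrative EventBridge rule pattern: only objects larger than 1 MiB.
rule_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["my-bucket"]},
        "object": {"size": [{"numeric": [">", 1_048_576]}]},
    },
}
```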
17
Q

NoSQL

A
  • Distributed
  • Limited support for query joins
  • NoSQL databases scale horizontally
18
Q

DynamoDB

A
  • Fully managed
  • Highly available with replication across multiple AZs
  • Scales to massive workloads and distributed databases
  • Millions of requests per second
  • 100s of TB of storage
  • Standard and Infrequent Access Table Class
19
Q

DynamoDB Basics

A
  • DynamoDB is made of Tables
  • Each table has a primary key
  • Each table can have an infinite number of items
  • Each item has attributes
  • max size of an item is 400KB
  • 3 Data Types (Number, String, Binary in both scalar and multi-valued sets)
  • Can store documents in formats such as JSON, XML, HTML
20
Q

DynamoDB Primary Key

A
  • Choose the attribute with the highest cardinality
  • Partition Key (HASH)
    • partition key must be unique for each item
    • partition key must be diverse so that the data is distributed
  • Partition Key+Sort Key (HASH+RANGE)
    • Combination must be unique for each item
    • Data is grouped by partition key
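The partition-key / sort-key choice shows up directly in the table definition. A sketch of a `create_table` payload (table and attribute names are illustrative; the call is commented, not executed):

```python
# HASH + RANGE key: items are unique on (user_id, order_date) and
# grouped per user_id partition (all names illustrative).
table_definition = {
    "TableName": "Orders",
    "KeySchema": [
        {"AttributeName": "user_id", "KeyType": "HASH"},      # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "user_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",  # On-Demand mode
}

# boto3.client("dynamodb").create_table(**table_definition)  # not executed
```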
21
Q

DynamoDB Read Write Capacity Mode

A
  • Provisioned Mode
    • Throughput can be exceeded temporarily using “Burst Capacity”
    • If Burst Capacity has been consumed, you will get a “ProvisionedThroughputExceeded” Exception
  • On-Demand Mode
  • Switch between different modes once every 24 hours
22
Q

DynamoDB Write Capacity Unit

A
  • 1 WCU represents 1 write per second for an item up to 1KB in size
23
Q

DynamoDB Strongly Consistent Read vs Eventually Consistent Read

A
  • SCR
    • If we read after a write, we will get the correct data
    • Set ConsistentRead to True
    • Consumes twice the RCU
  • ECR
    • If we read just after a write, it is possible to get some stale data because of replication
24
Q

Read Capacity Unit

A
  • 1 RCU represents 1 Strongly Consistent Read per second
  • 1 RCU represents 2 Eventually Consistent Read per second
  • for an item up to 4KB
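The WCU and RCU definitions above reduce to simple rounding arithmetic; a worked sketch (the example workloads are illustrative):

```python
import math

def wcu(writes_per_sec: int, item_kb: float) -> int:
    # 1 WCU = one write per second for an item up to 1 KB;
    # item size rounds up to the next 1 KB.
    return writes_per_sec * math.ceil(item_kb / 1)

def rcu(reads_per_sec: int, item_kb: float, strongly_consistent: bool) -> int:
    # 1 RCU = one strongly consistent (or two eventually consistent)
    # reads per second for an item up to 4 KB; size rounds up per 4 KB.
    units = reads_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

# 10 writes/sec of 2.5 KB items -> 10 * ceil(2.5) = 30 WCU
assert wcu(10, 2.5) == 30
# 16 eventually consistent reads/sec of 12 KB items -> 16 * 3 / 2 = 24 RCU
assert rcu(16, 12, strongly_consistent=False) == 24
```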
25
Q

DynamoDB Throttling

A
  • If we exceed provisioned WCU or RCU, we get “ProvisionedThroughputExceeded” Exception
  • Solution
    • Distribute partition keys
    • Use DynamoDB Accelerator (DAX)
26
Q

DynamoDB Writing Data

A
  • PutItem
    • Creates a new item or fully replaces an old item
  • UpdateItem
    • Edits an existing item’s attributes or adds a new item if it does not exist
27
Q

DynamoDB Reading Data

A
  • GetItem
    • Read based on Primary Key
    • Primary Key can be HASH or HASH+RANGE
    • Eventually Consistent Read by default
  • Scan
    • Scans the entire table and then filters out data
    • Returns up to 1MB of data per call; use pagination to keep reading
    • Can use Parallel Scan
28
Q

DynamoDB Batch Operation

A
  • Saves latency by reducing the number of API calls
  • Operations are done in parallel for better efficiency
  • BatchWriteItem
    • Up to 25 PutItem or DeleteItem in one call
    • Up to 16MB of data written and up to 400KB per item
    • Cannot update items
  • BatchGetItem
    • Return items from one or more tables
    • Up to 100 items and up to 16MB of data in total
    • Items are retrieved in parallel to minimize latency
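Because BatchWriteItem caps at 25 requests per call, clients chunk their writes; a sketch of that chunking (boto3's high-level `Table.batch_writer()` does this for you, as shown in the comment):

```python
BATCH_WRITE_LIMIT = 25  # max PutItem/DeleteItem requests per BatchWriteItem

def chunk(items: list, size: int = BATCH_WRITE_LIMIT) -> list[list]:
    """Split a list of items into BatchWriteItem-sized groups."""
    return [items[i:i + size] for i in range(0, len(items), size)]

batches = chunk([{"pk": str(n)} for n in range(60)])
# 60 items -> batches of 25, 25, 10
assert [len(b) for b in batches] == [25, 25, 10]

# With boto3's high-level helper the chunking is handled automatically:
# with table.batch_writer() as writer:
#     for item in items:
#         writer.put_item(Item=item)
```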
29
Q

DynamoDB Local Secondary Index

A
  • Alternative Sort Key
  • Up to 5 Local Secondary Indexes per table
  • Must be defined at table creation time
30
Q

DynamoDB Global Secondary Index

A
  • Alternative Primary Key (HASH or HASH+RANGE) from base table
  • Speed up queries on non-key attributes
  • Must provision RCUs and WCUs for the index
  • Can be added / modified after table creation
31
Q

DynamoDB Indexes and Throttling

A
  • GSI
    • If writes are throttled on GSI, then the main table will be throttled
  • LSI
    • Use the WCUs and RCUs of the main table
    • No special throttling considerations
32
Q

DynamoDB PartiQL

A
  • Use a SQL like syntax to manipulate DynamoDB tables
  • Supports some INSERT, UPDATE, SELECT, and DELETE statements
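The SQL-like syntax runs through the `execute_statement` API; a sketch of the statement and parameter shapes (the "Orders" table and its attributes are hypothetical; the call is commented, not executed):

```python
# PartiQL statements against a hypothetical "Orders" table.
select_stmt = 'SELECT * FROM "Orders" WHERE user_id = ?'
insert_stmt = "INSERT INTO \"Orders\" VALUE {'user_id': ?, 'order_date': ?}"

# Parameters are supplied as DynamoDB attribute values, e.g.:
params = [{"S": "u-123"}]

# With boto3 (not executed here):
# dynamodb = boto3.client("dynamodb")
# dynamodb.execute_statement(Statement=select_stmt, Parameters=params)
```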
33
Q

DynamoDB Accelerator

A
  • Fully managed in-memory cache for DynamoDB
  • Microseconds latency for cached reads and queries
  • Does not require application logic modification (compatible with existing DynamoDB APIs)
  • 5 minutes TTL for cache by default
  • Up to 10 nodes in the cluster
  • Multi-AZ
34
Q

DynamoDB Streams

A
  • Ordered stream of item-level modifications (create/update/delete) in a table
  • Stream records can be sent to KDS
  • Retention up to 24 hours
  • Ability to choose the information that will be written to the stream
    • KEYS_ONLY : Only the key attributes of the modified item
    • NEW_IMAGE : the entire new item
    • OLD_IMAGE : the entire old item
    • NEW_AND_OLD_IMAGES : both new and old images of the item
  • DynamoDB Streams are made of shards just like Kinesis Data Streams
  • Records are not retroactively populated in a stream after enabling it
35
Q

DynamoDB Streams and AWS Lambda

A
  • We need to define an Event Source Mapping to read from a DynamoDB Stream
  • We need to ensure the Lambda function has the appropriate permissions
  • Lambda function is invoked synchronously
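A handler wired up through the event source mapping receives batches of stream records. A minimal sketch of the handler shape, exercised with a sample event (the record processing here is purely illustrative):

```python
def handler(event, context):
    """Process a batch of DynamoDB stream records (fields follow the
    DynamoDB Streams event format; what we do with them is illustrative)."""
    seen = []
    for record in event["Records"]:
        if record["eventName"] in ("INSERT", "MODIFY", "REMOVE"):
            keys = record["dynamodb"]["Keys"]
            seen.append((record["eventName"], keys))
    return seen

# Sample event with one INSERT record (trimmed to the fields used above):
sample_event = {
    "Records": [
        {"eventName": "INSERT",
         "dynamodb": {"Keys": {"user_id": {"S": "u-1"}}}},
    ]
}
print(handler(sample_event, context=None))
```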
36
Q

DynamoDB Time To Live

A
  • Automatically delete items after an expiry timestamp
  • Does not consume WCUs
  • TTL attribute must be a “Number” data type with “Unix Epoch timestamp” value
  • Expired items deleted within 48 hours of expiration
  • Expired items that have not been deleted will appear in reads/queries/scans
  • Expired items are deleted from both LSIs and GSIs
  • A delete operation for each expired item enters the DynamoDB Streams
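The Number-typed Unix epoch requirement means the TTL attribute is just seconds-since-1970. A sketch of computing it and shaping the item (attribute and table names are illustrative; the API call is commented, not executed):

```python
import time

def ttl_epoch(days_from_now: int) -> int:
    """Unix epoch timestamp (seconds) `days_from_now` days in the future,
    suitable for a Number-typed TTL attribute."""
    return int(time.time()) + days_from_now * 24 * 60 * 60

item = {
    "user_id": {"S": "u-123"},               # names illustrative
    "expire_at": {"N": str(ttl_epoch(30))},  # TTL attribute must be a Number
}

# Enabling TTL on the table with boto3 (not executed here):
# dynamodb.update_time_to_live(
#     TableName="Orders",
#     TimeToLiveSpecification={"Enabled": True, "AttributeName": "expire_at"})
```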
37
Q

DynamoDB Security

A
  • Security
    • VPC Endpoint
  • Encryption at rest : KMS
  • Encryption in flight : SSL / TLS
  • Backup
    • Point-in-time restore, like RDS
38
Q

AWS ElastiCache

A
  • Managed Redis or Memcached
  • Caches are in-memory databases with high performance and low latency
  • Helps reduce load off databases for read intensive workloads
  • Multi AZ with Failover Capability
  • AWS takes care of OS maintenance / patching, optimizations, setup etc
39
Q

Redis

A
  • In-memory key-value store
  • Super low latency (sub ms)
  • Cache survives reboots by default
  • Multi AZ with Automatic Failover for disaster recovery
  • Support for Read Replicas
  • Use Cases : Gaming, Relieve pressure on databases, etc
40
Q

Memcached

A
  • Memcached is an in-memory object store
  • Cache does not survive reboots
  • Overall, Redis is better