Storage & Access Flashcards

1
Q

What is EBS?

A

Elastic Block Store (EBS) is a storage volume you can attach to your instances while they run to persist data.

  • It’s a network drive, NOT a physical drive
  • locked to a single AZ, but can be copied using a snapshot
  • provisioned by size in GBs and IOPS (I/O Ops Per Sec)
  • can be attached to only one instance at a time
  • EBS root volume is removed when EC2 instance is terminated (default behavior)
2
Q

What are the EBS types?

A
  • GP2 (SSD): General purpose,
    can be used for boot volume,
    3 IOPS/GB baseline, min 100, burst to 3,000, max 16,000,
    1GB-16TB
  • IO1 (SSD): Highest performing, for high-throughput workloads,
    can be used for boot volume,
    provisioned IOPS, min 100, max 64,000 (Nitro) or 32,000 (other),
    4GB-16TB (IOPS provisioned independently of size)
  • ST1 (HDD): low cost, designed for frequently accessed, throughput-intensive workloads,
    500GB-16TB,
    500MB/sec max throughput
  • SC1 (HDD): lowest cost, for less frequently accessed workloads,
    500GB-16TB,
    250MB/sec max throughput
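
The GP2 rule of 3 IOPS per GB with a floor of 100 can be turned into a small calculation. This is only a sketch: the function name is hypothetical, and the ceiling is left as a required argument so you can plug in whichever maximum you are studying against.

```python
def gp2_baseline_iops(size_gb, cap, per_gb=3, floor=100):
    """Estimate GP2 baseline IOPS from the volume size.

    size_gb: volume size in GB
    cap:     upper limit on IOPS (passed in, not hard-coded)
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    # Scale with size, then clamp between the floor and the cap.
    return min(max(size_gb * per_gb, floor), cap)


for size in (10, 500, 2000):
    print(size, "GB ->", gp2_baseline_iops(size, cap=3000), "IOPS")
```

Small volumes sit on the 100 IOPS floor, mid-sized volumes scale linearly, and large volumes are clamped at whatever cap is supplied.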
3
Q

What are the characteristics of an EBS snapshot?

A
  • they are incremental (only changed blocks) snapshots of the data on the volume
  • they use IO so should not be done while app is running
  • stored in S3
  • can be copied across AZ or Region
  • can make AMI from snapshot
  • can be automated using Data Lifecycle Manager
4
Q

How does EBS Encryption work?

A
  • Leverages keys from KMS
  • includes data at rest and in-flight and snapshots
  • steps
    1. create snapshot
    2. encrypt snapshot using copy
    3. create new EBS from snapshot which will automatically encrypt the volume
    4. Attach the encrypted volume to the original instance
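
The four steps above could be scripted roughly as follows. This is a sketch under assumptions: `ec2` is expected to be a boto3 EC2 client, the function name and its arguments are hypothetical, and the default KMS key is used because no key ID is passed. Detaching the original unencrypted volume is left out.

```python
def encrypt_volume(ec2, volume_id, instance_id, device, region, az):
    """Create an encrypted copy of an EBS volume and attach it.

    `ec2` is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    # 1. Create a snapshot of the unencrypted volume.
    snap = ec2.create_snapshot(VolumeId=volume_id)
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    # 2. Encrypt the snapshot by copying it with Encrypted=True.
    enc = ec2.copy_snapshot(
        SourceSnapshotId=snap["SnapshotId"],
        SourceRegion=region,
        Encrypted=True,
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[enc["SnapshotId"]])

    # 3. A volume created from an encrypted snapshot is encrypted automatically.
    vol = ec2.create_volume(SnapshotId=enc["SnapshotId"], AvailabilityZone=az)
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # 4. Attach the encrypted volume to the original instance.
    ec2.attach_volume(VolumeId=vol["VolumeId"], InstanceId=instance_id, Device=device)
    return vol["VolumeId"]
```

Passing the client in as an argument keeps the function easy to exercise with a stub before pointing it at a real account.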
5
Q

What is an Instance Store (IS)?

A

Instance Store is ephemeral block storage, physically attached to the machine

  • provides better I/O performance than any of the EBS types
  • up to 7.5TB, or 30TB with striping
  • backups are our responsibility
  • can’t be resized
  • lost once the instance is stopped or terminated
6
Q

What are the EBS RAID options?

A

RAID 0: 1 logical volume; multiple volumes are combined (striped) to make a larger volume
RAID 1: 1 logical volume; data is written to two or more volumes (mirrored) for fault tolerance
RAID 5: not recommended for EBS
RAID 6: not recommended for EBS

7
Q

What is EFS?

A

A managed Network File System (NFS) that can be mounted on many EC2s.

  • Multi AZ
  • access controlled by security groups
  • scales automatically
  • pay per use
8
Q

What are the performance modes of EFS?

A
  • General purpose (default)
  • Max I/O

9
Q

What are the EFS storage tiers?

A

A lifecycle management feature that moves files to a lower-cost tier after N days without access.

  • Standard for frequently accessed files
  • Infrequent Access (EFS-IA) for lower cost storage of files rarely accessed
10
Q

How can I instantiate an EBS volume quickly?

A

Restore from a snapshot

11
Q

What is S3?

A

Simple Storage Service, an object storage system in which objects are stored in buckets.

  • each bucket must have a globally unique name
  • buckets are defined at Region level
12
Q

What is an S3 Key?

A

The full path to an object within a bucket, composed of a prefix + object name.

  • e.g. s3://my-bucket/my_file.txt
  • Note: the ‘/’ does not indicate a folder path; it is simply part of one long key name.
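
To make the bucket / prefix / object name split concrete, here is a minimal sketch using only the Python standard library. The function name and the returned field names are hypothetical, and keys containing `?` or `#` would need extra care because `urlparse` treats them as query and fragment markers.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split an s3:// URI into bucket, key, prefix and object name."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    key = parsed.path.lstrip("/")
    # Everything up to and including the last "/" is the prefix.
    prefix, slash, name = key.rpartition("/")
    return {
        "bucket": parsed.netloc,
        "key": key,
        "prefix": prefix + slash,
        "object_name": name,
    }


print(split_s3_uri("s3://my-bucket/my_file.txt"))
print(split_s3_uri("s3://my-bucket/my_folder/another_folder/my_file.txt"))
```

For the first URI the prefix is empty; for the second it is `my_folder/another_folder/`, which is just the leading part of the key rather than a real directory.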
13
Q

What is the largest object size that can be uploaded in a S3?

A

5TB. Any object larger than 5GB must be uploaded using a multi-part upload

14
Q

What are the components of an S3 object?

A
  • Metadata (list of text key/value pairs for system or user)
  • Tags (unicode key/value pair for security or lifecycle)
  • Version ID (if versioning is enabled provides easy rollback to previous version)
15
Q

What happens to existing files in an S3 bucket when versioning is turned on?

A

Nothing; existing objects keep a version ID of null.

16
Q

What happens to existing files in an S3 bucket when versioning is suspended?

A

Nothing; all previous versions remain available.

17
Q

What are the 4 methods of encryption for S3?

A
  • SSE-S3
  • SSE-KMS
  • SSE-C: server-side encryption with customer-provided keys
  • Client side encryption
18
Q

What is SSE-S3?

A

AWS encryption of S3 objects

  • uses AES-256 keys
  • handled by AWS
  • server side encryption
19
Q

What is SSE-KMS?

A

Server-side encryption using keys managed in AWS Key Management Service (KMS)

  • handled by AWS
  • provides user control and audit trail
  • server side encryption
20
Q

What is SSE-C?

A

Server-side encryption with customer-provided keys

  • keys fully managed by the customer outside of AWS
  • S3 does not store the key provided by user
  • key must be passed in the request headers (HTTPS only) on every request
  • server side encryption
21
Q

What is Client side encryption?

A

Encryption and Decryption done on client side

  • encryption done before adding to S3
  • decryption done after retrieving from S3
  • client side encryption
22
Q

What is another name for encryption in-flight?

A

SSL/TLS

23
Q

What are the types of S3 Security?

A

USER BASED:

  • IAM policies

RESOURCE BASED:

  • Bucket Policies
  • Object Access Control List (ACL)
  • Bucket Access Control List (ACL) - less common
24
Q

What are the settings in an S3 Bucket Policy?

A

Resources (e.g. buckets and objects)
Actions (set of API methods, e.g. get, put, etc.)
Effect (allow or deny)
Principal (account or user)
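
As an illustration of how the four settings fit together, the sketch below builds a public-read bucket policy as a Python dict and prints it as JSON. The helper name and the bucket name `example-bucket` are hypothetical; the policy grants read access to everyone, which is only appropriate for content meant to be public.

```python
import json


def public_read_policy(bucket_name):
    """Build a bucket policy that lets anyone GET objects in the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicRead",                               # optional label
                "Effect": "Allow",                                 # allow or deny
                "Principal": "*",                                  # who: here, everyone
                "Action": ["s3:GetObject"],                        # which API calls
                "Resource": [f"arn:aws:s3:::{bucket_name}/*"],     # which objects
            }
        ],
    }


print(json.dumps(public_read_policy("example-bucket"), indent=2))
```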

25
Q

How can we prevent company data leaks from an S3 bucket?

A

Block public access to buckets and objects granted through

  • NEW access control lists (ACLs)
  • ANY access control lists (ACLs)
  • NEW public bucket or access point policies
26
Q

What are the most common features supported by S3?

A
  • Networking (e.g. VPC endpoints, for access without the internet)
  • Logging (stored in another S3 bucket)
  • Auditing (API calls logged in CloudTrail)
  • MFA (multi-factor authentication)
  • Pre-signed URLs valid for a limited duration (default 3600 sec)
27
Q

Can S3 host a website?

A

Yes, but be sure to make it public if needed externally

e.g. <bucket-name>.s3-website.<AWS-region>.amazonaws.com

28
Q

What is CORS?

A

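Cross-Origin Resource Sharing: a browser mechanism that lets a page loaded from one origin request resources from another origin. The browser sends an Origin header with the request, and the other origin MUST allow it through CORS response headers (e.g. Access-Control-Allow-Origin).

e.g. http://www.example.com to http://other.example.com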
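
A rough sketch of the idea, assuming the resources at other.example.com live in an S3 bucket: the dict follows the shape of an S3 CORS configuration, while `is_allowed` is a hypothetical and deliberately simplified check (exact origin match or `*` only) meant to show what "the other website must allow access" amounts to.

```python
# Hypothetical CORS configuration for the bucket behind other.example.com.
cors_config = {
    "CORSRules": [
        {
            "AllowedOrigins": ["http://www.example.com"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
            "MaxAgeSeconds": 3000,
        }
    ]
}


def is_allowed(config, origin, method):
    """Simplified check: does any rule allow this origin and method?"""
    for rule in config["CORSRules"]:
        origins = rule["AllowedOrigins"]
        origin_ok = origin in origins or "*" in origins
        if origin_ok and method in rule["AllowedMethods"]:
            return True
    return False


print(is_allowed(cors_config, "http://www.example.com", "GET"))
print(is_allowed(cors_config, "http://unknown.example.net", "GET"))
```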

29
Q

What is the consistency model for S3?

A
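  • Strong read-after-write consistency for PUTs of new objects (PUT 200 -> GET 200)
  • Strong consistency for overwrite PUTs and DELETEs (a GET after the write returns the latest state)
  • This has applied automatically to all buckets since December 2020, with nothing to enable
  • Before that change, overwrite PUTs, DELETEs, and a GET issued before the first PUT (GET 404 -> PUT 200 -> GET 404) were only eventually consistent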
30
Q

What is S3 MFA-Delete?

A

A means for the bucket owner (root account) to enable MFA before deletes can be performed on an object in an S3 bucket. Requires versioning to be enabled as well.

31
Q

What security is evaluated first? Default Encryption or Bucket Policies?

A

Bucket Policies

32
Q

What are S3 Access logs?

A

A log of any requests made to S3, from any account, authorized or denied. Must be stored in a separate S3 Bucket to avoid an infinite logging loop.

33
Q

What are the characteristics of an S3 Bucket replication?

A
  • Can be SRR (Same Region Replication) or CRR (Cross Region Replication)
  • Versioning must be enabled in source and destination bucket
  • Buckets can be in different accounts
  • Copying is asynchronous
  • Must have proper IAM permissions to S3
  • only new objects are replicated after activation
  • Deletes are not replicated
  • Does not support replication chaining (i.e. Bucket 1 -> Bucket 2 -> Bucket 3)
34
Q

What are the storage classes of an S3 Bucket?

A
  • Standard (General Purpose)
  • Standard Infrequent Access (IA)
  • One Zone Infrequent Access
  • Intelligent Tiering
  • Glacier
  • Glacier Deep Archive
  • Reduced Redundancy Storage (deprecated)
35
Q

What are the characteristics of S3 Standard?

A
  • Durability: HIGH (99.999999999%) Multi-AZ
  • Availability: 99.99% over a given year
  • Duration: N/A
  • Cost: Retrieval + GB/month fees
  • Resilience: sustains 2 concurrent facility failures
  • Use Cases: Big Data Analytics, mobile & gaming apps, and content distribution
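
Availability and durability percentages are easier to compare once converted into concrete numbers. The sketch below does that with hypothetical helper names; it assumes a 365-day year and treats the percentages as simple annual rates.

```python
from decimal import Decimal

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_pct):
    """Minutes per year a service may be unavailable at a given availability %."""
    unavailable = (Decimal(100) - Decimal(availability_pct)) / Decimal(100)
    return unavailable * MINUTES_PER_YEAR


def expected_objects_lost_per_year(object_count, durability_pct):
    """Average number of objects lost per year at a given durability %."""
    loss_rate = (Decimal(100) - Decimal(durability_pct)) / Decimal(100)
    return loss_rate * object_count


print(downtime_minutes_per_year("99.99"))
print(downtime_minutes_per_year("99.5"))
print(expected_objects_lost_per_year(10_000_000, "99.999999999"))
```

With these assumptions, 99.99% availability allows about 52.56 minutes of downtime per year and 99.5% allows 2,628 minutes (roughly 43.8 hours). At 99.999999999% durability, storing 10 million objects gives an expected loss of 0.0001 objects per year, i.e. one object every 10,000 years on average.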
36
Q

What are the characteristics of S3 IA?

A
  • Durability: HIGH (99.999999999%) Multi-AZ
  • Availability: 99.9% over a given year
  • Duration: min 30 days
  • Cost: Retrieval + GB/month fees (min 128KB per object)
  • Resilience: sustains 2 concurrent facility failures
  • Use Cases: data store for disaster recovery, backups
37
Q

What are the characteristics of S3 One Zone IA?

A
  • Durability: HIGH (99.999999999%) Single-AZ
  • Availability: 99.5% over a given year
  • Duration: min 30 days
  • Cost: Retrieval + GB/month fees (min 128KB per object)
  • Latency: Low
  • Throughput: High
  • Supports: SSL for data transit and encryption at rest
  • Use Cases: Secondary backups of on-prem data, storing data that can be recreated
38
Q

What are the characteristics of S3 Intelligent Tiering?

A
  • Durability: HIGH (99.999999999%) Multi-AZ
  • Availability: 99.9% over a given year
  • Duration: min 30 days
  • Cost: monitoring + Retrieval + GB/month fees
  • Latency: Low
  • Throughput: High
  • Automatically moves objects between two access tiers based on changing access patterns
39
Q

What are the characteristics of S3 Glacier?

A
  • Durability: HIGH (99.999999999%) Multi-AZ
  • Availability: 99.9% over a given year
  • Duration: min 90 days
  • Cost: Retrieval + GB/month fees (min 40KB per object)
  • Use Cases: longer term storage (10+ years)
  • Each item stored is called an ‘archive’ (up to 40TB)
  • Archives are stored in ‘vaults’
40
Q

What are the characteristics of S3 Glacier Deep Archive?

A
  • Durability: HIGH (99.999999999%) Multi-AZ
  • Availability: 99.9% over a given year
  • Duration: min 180 days
  • Cost: Retrieval + GB/month fees (min 40KB per object)
  • Use Cases: long-term archival of data that is rarely if ever accessed, such as compliance and retention archives
41
Q

What are the retrieval options for S3 Glacier?

A
  • Expedited (1-5 min)
  • Standard (3-5 hours)
  • Bulk (5-12 hours)
42
Q

What are the retrieval options for S3 Glacier Deep Archive?

A
  • Standard (12 hours)
  • Bulk (48 hours)

43
Q

How can you automate moving between storage classes?

A
  • Using a lifecycle configuration (using rules managed by us)
  • Using an S3 Intelligent Tiering class (limited to movement between two tiers)
44
Q

What are the characteristics of an S3 Lifecycle configuration?

A
  • Transition action defines when objects are moved to another storage class
  • Expiration action defines when objects expire and are deleted
  • Prefix defines the object path (aka prefix) of the objects to be actioned (e.g. s3://mybucket/mp3/*)
  • Tags define which objects are actioned, based on the tags set on them
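
For illustration, a lifecycle configuration combining a prefix filter, two transition actions, and an expiration action might look like the dict below. The rule ID and all day counts are made-up values; the structure follows the shape the S3 API uses for lifecycle rules.

```python
import json

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-mp3",                  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "mp3/"},         # applies to keys starting with mp3/
            "Transitions": [
                # Transition actions: move to cheaper classes over time.
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            # Expiration action: delete the object after a year.
            "Expiration": {"Days": 365},
        }
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```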
45
Q

Are there limits to the number of prefixes in a bucket?

A

No

46
Q

What is the request rate per prefix within a bucket?

A
  • 3500 PUT/COPY/POST/DELETE requests per second per prefix
  • 5500 GET/HEAD requests per second per prefix

So spreading objects across multiple prefixes gives you a higher aggregate request rate. Note: these rates can be limited by KMS request quotas if KMS encryption is used
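
A quick way to see the effect of spreading across prefixes is to multiply the per-prefix limits by the number of prefixes in use. The helper below is a hypothetical sketch that assumes requests are spread evenly and ignores any KMS limits.

```python
WRITES_PER_PREFIX = 3500   # PUT/COPY/POST/DELETE per second
READS_PER_PREFIX = 5500    # GET/HEAD per second


def aggregate_request_rate(prefix_count):
    """Best-case requests per second when load is spread evenly over prefixes."""
    return {
        "write": prefix_count * WRITES_PER_PREFIX,
        "read": prefix_count * READS_PER_PREFIX,
    }


print(aggregate_request_rate(1))
print(aggregate_request_rate(4))
```

With four prefixes the best-case figures become 14,000 writes and 22,000 reads per second.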

47
Q

How can I increase the upload performance on an S3 bucket?

A
  • Use Multi-part uploads, which break a large file into multiple parts that can be uploaded in parallel
  • Use S3 Transfer Acceleration, which sends uploads to an AWS edge location that forwards the data to the S3 bucket in the target region. Note: it is also compatible with Multi-part uploads
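
To show what breaking a file into parts involves, here is a small planning sketch. It does no uploading; it only computes (offset, length) pairs. The function name and the 8 MiB default are hypothetical choices, while the 5 MiB minimum part size and the 10,000-part ceiling are the S3 multi-part limits.

```python
import math

MIB = 1024 ** 2
MIN_PART_SIZE = 5 * MIB    # minimum part size (the last part may be smaller)
MAX_PARTS = 10_000         # maximum number of parts in one upload


def plan_parts(object_size, part_size=8 * MIB):
    """Return a list of (offset, length) pairs covering object_size bytes."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    part_size = max(part_size, MIN_PART_SIZE)
    # Grow the part size if the object would otherwise need too many parts.
    part_size = max(part_size, math.ceil(object_size / MAX_PARTS))
    return [
        (offset, min(part_size, object_size - offset))
        for offset in range(0, object_size, part_size)
    ]


parts = plan_parts(20 * MIB)
print(len(parts), "parts:", parts)
```

Each pair can then be handed to a separate worker, which is where the speed-up from simultaneous uploads comes from.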
48
Q

How can I increase the download (GET) performance on an S3 bucket?

A
  • Use specific byte range requests
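
A byte range request is an ordinary GET with an HTTP `Range` header, so a large object can be fetched as several ranges in parallel. The generator below is a minimal sketch with a hypothetical name; it only produces the header values and leaves the actual requests to whatever client you use.

```python
def byte_ranges(object_size, chunk_size):
    """Yield HTTP Range header values that together cover the whole object."""
    if object_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    for start in range(0, object_size, chunk_size):
        # Range end positions are inclusive, hence the -1.
        end = min(start + chunk_size, object_size) - 1
        yield f"bytes={start}-{end}"


print(list(byte_ranges(10, 4)))
```

For a 10-byte object read in 4-byte chunks this yields `bytes=0-3`, `bytes=4-7`, and `bytes=8-9`. Requesting only the first range is also a cheap way to read a file header without downloading the whole object.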
49
Q

How can I reduce network transfer and CPU cost when using an S3 Glacier bucket?

A
  • Use Glacier Select. It can retrieve the data using SQL by performing server side filtering and allows us to filter by rows and columns
50
Q

What are S3 event notifications?

A

Notifications that let you know when specific events have occurred on your objects, such as object creation, deletion, restoration, or replication.

51
Q

What is Athena?

A

A serverless service that performs analytics DIRECTLY against S3 files using a SQL query language

52
Q

What are the characteristics of Athena?

A
  • has JDBC/ODBC driver
  • Cost: per query + amount of data scanned
  • Supports: CSV, JSON, ORC, Avro and Parquet
53
Q

How can I keep an S3 object from being deleted?

A
  • S3 Object Lock
    • Adopt WORM (Write Once Read Many)
    • Block an object version deletion for a specified time
  • S3 Glacier Vault Lock
    • Adopt WORM (Write Once Read Many)
    • Lock the policy for future edits (can no longer be changed)
54
Q

What is Snowball?

A

A physical device for moving large amounts of data (TB or PB) into AWS without sending it over the network. If it would take more than a week to transfer the data over the network, use Snowball.
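
The one-week rule of thumb can be checked with simple arithmetic. This sketch uses hypothetical function names, assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second), and by default assumes the link is fully available for the transfer.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb, link_mbps, utilization=1.0):
    """Days needed to send data_tb terabytes over a link of link_mbps."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / SECONDS_PER_DAY


def should_use_snowball(data_tb, link_mbps, utilization=1.0):
    """Apply the rule of thumb: more than a week over the network."""
    return transfer_days(data_tb, link_mbps, utilization) > 7


print(round(transfer_days(100, 1000), 2), should_use_snowball(100, 1000))
print(round(transfer_days(10, 1000), 2), should_use_snowball(10, 1000))
```

Under these assumptions, 100 TB over a 1 Gbps link takes a little over 9 days, so Snowball is the better choice, while 10 TB takes under a day.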

55
Q

What are the characteristics of Snowball?

A
  • Secure, tamper resistant, uses KMS 256 bit encryption
  • tracked using SNS and text messages; ships with an E-ink shipping label
  • Cost: per data transfer job
  • Use Cases: large data cloud migrations, DC decommission, disaster recovery
56
Q

How does Snowball work?

A
  1. Request snowball device from the AWS console for delivery
  2. Install the snowball client on our servers
  3. Connect the snowball to your servers and copy files using the client
  4. Ship back the device when you’re done (goes to the right AWS facility)
  5. Data will be loaded into an S3 bucket
  6. Snowball is completely wiped
  7. Tracking is done using SNS, text messages and the AWS console
57
Q

What is Snowball Edge?

A

The same as Snowball but with added computational capability on the device. Very useful for pre-processing data while it is moving.

  • supports a custom EC2 AMI so you can perform processing on the go
  • supports custom Lambda functions
  • capacity: 100 TB
    • storage optimized (24 vCPU)
    • compute optimized (52 vCPU & optional GPU)
58
Q

What is Snowmobile?

A

The same as Snowball but has a much larger capacity.

  • Each Snowmobile has 100 PB capacity
  • Transfers exabytes of data ( 1 EB = 1,000 PB = 1,000,000 TB) when you use multiple Snowmobiles in parallel.
59
Q

Can Snowball data be imported into Glacier directly?

A

No, it requires the use of S3 first and a lifecycle policy

60
Q

What does Hybrid Cloud mean?

A

Part of your infrastructure is on the cloud and part of it is on-premise.

  • Use Cases: Security requirements, compliance requirements, IT strategy, long cloud migrations
61
Q

How do I expose S3 data on-premise?

A

AWS Storage Gateway

62
Q

What are Cloud Native storage options?

A
  • Block: EBS, EC2 Instance Store
  • File: EFS
  • Object: S3, Glacier
63
Q

What is the AWS Storage Gateway?

A

A bridge between on-premise data and cloud data in S3.

64
Q

What are the Storage Gateway types?

A
  • File Gateway: NFS (backed by S3)
  • Volume Gateway: iSCSI (backed by S3 and EBS snapshots)
  • Tape Gateway: iSCSI (backed by S3 and Glacier)
65
Q

What are the characteristics of a File Gateway?

A
  • Configured S3 buckets (std, IA, OneZone IA) are accessible through NFS and SMB protocol
  • Uses IAM roles for each gateway
  • most recently used data is cached
  • can be mounted on many servers
66
Q

What are the characteristics of a Volume Gateway?

A
  • uses Block Storage (iSCSI protocol backed by S3)
  • backed by EBS snapshots which can help restore on-premise volumes
  • cached volumes: low latency access to most recent data
  • stored volumes: entire dataset is on premise, scheduled backups to S3
67
Q

What are the characteristics of a Tape Gateway?

A
  • backed by S3 and Glacier
  • back up data using existing tape-based processes (and iSCSI interface)
  • works with leading backup software vendors
68
Q

What is Amazon FSx for Windows?

A

A fully managed Windows file system share drive

  • Microsoft AD integration, ACLs, user quotas
  • Built on SSD, scales to 10s of GB/s of throughput, millions of IOPS, 100s PB of data
  • can be accessed from your on-premise infrastructure
  • can be configured to be Multi AZ for high availability
  • data is backed-up daily to S3
69
Q

What is Amazon FSx for Lustre?

A

A type of parallel distributed file system for large-scale computing. Name is derived from ‘Linux’ and ‘cluster’

  • Use Cases: Machine Learning, High Performance Computing (HPC), Video Processing, Financial Modeling, Electronic Design Automation
  • Scales up to 100s of GB/s of throughput, millions of IOPS, sub-ms latencies
  • Seamless integration with S3
  • can be used from your on-premise infrastructure
70
Q

What are the current storage options from AWS?

A
  • S3: Object Storage
  • Glacier: Object Archival
  • EFS: NFS for Linux, POSIX file system
  • FSx for Windows: Network file system (SMB) for Windows servers
  • FSx for Lustre: High Performance computing Linux file system
  • EBS Volume: Network storage for one EC2 instance at a time (high IOPS)
  • Instance Storage: Physical storage for EC2
  • Storage Gateway: File, Volume (cached & stored), or Tape Gateway
  • Snowball/Snowmobile: to move large amounts of data to the cloud, physically
  • Database: for querying and index data