Module 5 - Storage & Transfer Flashcards
What are the 3 kinds of storage? What AWS services handle each?
Block (EBS, instance store)
File (AWS EFS, FSx)
Object (S3, Glacier)
What are some ways to migrate data online?
AWS Storage Gateway, Kinesis (Firehose and Streams), DataSync, S3 Transfer Acceleration, AWS Direct Connect
What are some ways to migrate data offline?
AWS Snow family
What is Amazon S3? How do you change data?
Simple Storage Service. It is object-level storage: you cannot edit part of an object in place, so to change a file you must make the change locally and then re-upload the entire modified file. Max 5 TB per object.
How can you access S3?
Through the web-based AWS Management Console,
or programmatically through the AWS CLI, the REST API, and the SDKs.
What is an S3 object made up of?
- key (full path to file including folders),
- the file itself (value),
- version ID (if enabled)
- any metadata that describes the file (key/value pairs),
- tags (Unicode key/value pair, for security/lifecycle)
Identify the parts of this S3 URL
http://doc.s3-us-west-1.amazonaws.com/2006-03-01/AmazonS3.html
“doc” is the NAME of the bucket
“2006-03-01/AmazonS3.html” is the KEY
(“2006-03-01/” in the object key is called the PREFIX)
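The split above can be sketched in a few lines of Python. This is a minimal sketch, not a general S3 URL parser: the helper name split_s3_url is hypothetical, and it assumes the virtual-hosted style shown in the card, where the bucket name is the first part of the hostname.

```python
from urllib.parse import urlparse


def split_s3_url(url):
    """Split a virtual-hosted-style S3 URL into bucket, key, and prefix.

    Assumption: the hostname looks like <bucket>.s3<...>.amazonaws.com.
    Bucket names that themselves contain ".s3" are not handled.
    """
    parsed = urlparse(url)
    # Everything before the first ".s3" in the hostname is the bucket name.
    bucket, _, _ = parsed.netloc.partition(".s3")
    # The key is the full path without the leading slash.
    key = parsed.path.lstrip("/")
    # The prefix is the "folder" part of the key, including the trailing slash.
    prefix = key.rsplit("/", 1)[0] + "/" if "/" in key else ""
    return {"bucket": bucket, "key": key, "prefix": prefix}


parts = split_s3_url("http://doc.s3-us-west-1.amazonaws.com/2006-03-01/AmazonS3.html")
print(parts)
```

For the URL in this card, the sketch yields bucket "doc", key "2006-03-01/AmazonS3.html", and prefix "2006-03-01/".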
How do buckets work?
How many buckets can an account have?
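• Objects are stored in buckets.
• 1 AWS account can have up to 100 buckets by default (a soft limit; AWS has raised this quota over time, so check the current value in Service Quotas).
• Each bucket lives in one REGION that you choose, and you control who can access it.
• You can enable access logging for a bucket.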
How do you control access to a bucket? What is the default access to a bucket?
Everything is PRIVATE by default. The account that created the resource can grant permissions by writing access policies (CONTROLLED ACCESS).
You can also make it PUBLIC if necessary (uncommon).
How do you make sure your buckets are never exposed to public access?
Turn on “block all public access” at the account level. These settings apply account-wide for all current and future buckets.
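"Block all public access" is made up of four individual settings. The sketch below lists them as a Python dict in the shape the AWS APIs use for a public access block configuration, plus a hypothetical helper (blocks_all_public_access) that checks whether all four are on. It only builds and checks the configuration; applying it to an account is left out.

```python
# The four settings behind "block all public access".
# Assumption: with boto3 this dict would be passed as
# PublicAccessBlockConfiguration to the s3control client's
# put_public_access_block call; that call is not made here.
BLOCK_ALL_PUBLIC_ACCESS = {
    "BlockPublicAcls": True,        # reject new public ACLs
    "IgnorePublicAcls": True,       # ignore public ACLs that already exist
    "BlockPublicPolicy": True,      # reject new public bucket policies
    "RestrictPublicBuckets": True,  # restrict access to buckets with public policies
}


def blocks_all_public_access(config):
    """Return True only if every one of the four settings is enabled."""
    required = (
        "BlockPublicAcls",
        "IgnorePublicAcls",
        "BlockPublicPolicy",
        "RestrictPublicBuckets",
    )
    return all(config.get(name) is True for name in required)


print(blocks_all_public_access(BLOCK_ALL_PUBLIC_ACCESS))
```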
What are the ways that access is granted to a bucket?
ACLs and bucket policies.
How can you recover objects from accidental deletion or overwrite?
1) Enable versioning on your bucket.
2) S3 Object Lock - uses the write once, read many (WORM) model. Requires versioning.
• Use a retention period to lock an object for a fixed period of time.
• Use a legal hold to lock an object until the hold is explicitly removed.
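The difference between a retention period and a legal hold can be pictured with a toy model. This is only an illustration of the rule described above, not how S3 implements Object Lock; the function name version_is_locked and its parameters are hypothetical, and retention modes and permissions are ignored.

```python
from datetime import datetime, timezone


def version_is_locked(now, retain_until=None, legal_hold=False):
    """Toy model: is an object version still protected from deletion?

    - A legal hold protects the version until the hold is removed.
    - A retention period protects it until the retain-until date passes.
    """
    if legal_hold:
        return True
    return retain_until is not None and now < retain_until


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
retain_until = datetime(2025, 1, 1, tzinfo=timezone.utc)

print(version_is_locked(now, retain_until=retain_until))  # inside the retention period
print(version_is_locked(now, legal_hold=True))            # legal hold, no end date
```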
What are the different S3 storage classes? What are they used for?
- S3 Standard - general-purpose storage of frequently accessed data
- S3 Intelligent-Tiering - data with unknown or changing access patterns (monitors access and automatically moves objects between access tiers)
- S3 Standard-Infrequent Access (S3 Standard-IA) and S3 One Zone-Infrequent Access (S3 One Zone-IA, which is cheaper still) - long-lived but less frequently accessed data. Cost-effective if you access the data less than about once a month. Storage costs roughly 50% less than S3 Standard, but there is a retrieval fee.
- S3 Glacier and S3 Glacier Deep Archive - long-term archive and digital preservation
What is an archive?
Any object stored in a vault in S3 Glacier. Each archive has an optional description and a unique archive ID, which Glacier returns when you store the archive. The ID is unique within the Region.
What do you use to manage S3 Glacier vaults?
Use the management console to create and delete vaults.
For everything else (such as uploading and retrieving archives), use the CLI, REST API, or SDKs.
What is a vault?
A container for storing archives.
You specify its name and region.
You can lock it with Vault Lock (for compliance; the data and the lock can’t be removed).
What are the retrieval times for S3 Glacier?
Instant Retrieval - milliseconds
Flexible Retrieval:
• Expedited - 1-5 minutes
• Standard - 3-5 hours
• Bulk - 5-12 hours
Deep Archive:
• Standard - 12 hours
• Bulk - 48 hours
What is an S3 Lifecycle Policy in Lifecycle Management?
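An automated set of rules that moves (transitions) objects to cheaper storage classes or deletes (expires) them based on age. This saves you money on storage.
Rules are set on a bucket and can apply to every object or to a subset filtered by prefix or tags.
Works with versioning (rules can target current and noncurrent versions).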
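A lifecycle rule is easier to picture as data. The sketch below writes one rule as a Python dict in the shape the S3 lifecycle API uses, then walks an object's age through it. The rule ID, the "logs/" prefix, and the day counts are made-up examples, and class_at_age is a hypothetical helper that only imitates the rule; S3 itself evaluates lifecycle rules on its own schedule.

```python
# Assumption: with boto3 this dict would be passed as LifecycleConfiguration
# to put_bucket_lifecycle_configuration; that call is not made here.
LIFECYCLE = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",   # hypothetical rule name
            "Filter": {"Prefix": "logs/"},      # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def class_at_age(rule, age_days):
    """Return the storage class the rule implies at a given age.

    Returns None once the object would have expired (been deleted).
    Objects are assumed to start in STANDARD.
    """
    expire_after = rule.get("Expiration", {}).get("Days")
    if expire_after is not None and age_days >= expire_after:
        return None
    current = "STANDARD"
    for step in sorted(rule.get("Transitions", []), key=lambda s: s["Days"]):
        if age_days >= step["Days"]:
            current = step["StorageClass"]
    return current


rule = LIFECYCLE["Rules"][0]
for age in (10, 45, 100, 400):
    print(age, class_at_age(rule, age))
```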
What are 3 ways to encrypt data at rest in S3?
1) SSE-S3: Server-side encryption (SSE) with Amazon S3-managed keys. AWS handles the key and uses the AES-256 algorithm. How? Put in the header: "x-amz-server-side-encryption": "AES256"
2) SSE-KMS: Server-side encryption with AWS KMS keys (KMS keys) stored in AWS KMS. Uses envelope encryption; key management is shared between you and AWS. Why? You can control who has access and you get an audit trail. Put in the header: "x-amz-server-side-encryption": "aws:kms"
3) SSE-C: Server-side encryption with customer-provided keys. You manage the keys. Must use HTTPS. Not available in the console; use the CLI, SDKs, or REST API. You must include the key in the headers of every request because S3 discards it after each operation.
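The three options differ mainly in the request headers you send. The sketch below builds those headers as plain Python dicts. The function names are hypothetical, the KMS key ID is whatever you pass in, and nothing is sent to S3; with SSE-KMS the header value is "aws:kms", and with SSE-C the key travels base64-encoded together with a base64-encoded MD5 digest of the key.

```python
import base64
import hashlib


def sse_s3_headers():
    """Headers asking S3 to encrypt with S3-managed keys (SSE-S3)."""
    return {"x-amz-server-side-encryption": "AES256"}


def sse_kms_headers(kms_key_id=None):
    """Headers asking S3 to encrypt with a KMS key (SSE-KMS).

    kms_key_id is optional; without it S3 uses the AWS managed key for S3.
    """
    headers = {"x-amz-server-side-encryption": "aws:kms"}
    if kms_key_id:
        headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
    return headers


def sse_c_headers(key_bytes):
    """Headers for a customer-provided key (SSE-C); send only over HTTPS."""
    if len(key_bytes) != 32:
        raise ValueError("SSE-C expects a 256-bit (32-byte) key")
    key_b64 = base64.b64encode(key_bytes).decode("ascii")
    md5_b64 = base64.b64encode(hashlib.md5(key_bytes).digest()).decode("ascii")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key": key_b64,
        "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
    }


demo_key = bytes(range(32))  # demo only; never use a predictable key
print(sse_s3_headers())
print(sse_kms_headers())
print(sorted(sse_c_headers(demo_key)))
```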
How does S3 handle replication?
All data is replicated in at least 3 AZs (except for S3 One Zone-IA).
What is S3 Transfer Acceleration? When would you choose this?
It’s a way to move data faster over long distances. Clients send data to the nearest CloudFront global edge location using a distinct endpoint URL (…s3-accelerate…). From the edge location, the data is automatically routed to S3 over an optimized network path (the AWS backbone network).
You only get charged if there is a performance improvement. Enabled at the bucket level.
Good for when:
• you have customers worldwide using the same bucket
• you transfer giga- or terabytes of data worldwide
⭐️ What is S3 multipart upload? When would you want to do this?
This is a way to break a large object into manageable parts. Once all the parts are uploaded, S3 reassembles the object. You control it through the CLI, SDKs, or REST API, not through the console. Recommended for files > 100 MB. Required if > 5 GB.
Use for:
• Improved throughput: You can upload parts in parallel to improve throughput.
• Quick recovery from any network issues: Smaller part sizes minimize the impact of restarting a failed upload due to a network error.
• Pausing and resuming object uploads: You can upload object parts over time. After you initiate a multipart upload, it does not expire on its own. You must explicitly complete or abort the multipart upload.
• Beginning an upload before you know the final object size: You can upload an object as you are creating it.
• Uploading large objects: Using the multipart upload API, you can upload large objects, up to 5 TB.
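To see how an object gets split, the sketch below plans the parts for a given object size. It only does the arithmetic and uploads nothing. The helper name plan_parts and the 100 MiB default part size are assumptions for illustration; the limits it checks (parts between 5 MiB and 5 GiB, at most 10,000 parts, last part may be smaller) are the documented multipart limits.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3

MIN_PART_SIZE = 5 * MIB   # every part except the last must be at least this big
MAX_PART_SIZE = 5 * GIB
MAX_PARTS = 10_000


def plan_parts(object_size, part_size=100 * MIB):
    """Return a list of (part_number, offset, length) tuples."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    if not MIN_PART_SIZE <= part_size <= MAX_PART_SIZE:
        raise ValueError("part_size must be between 5 MiB and 5 GiB")
    part_count = -(-object_size // part_size)  # ceiling division
    if part_count > MAX_PARTS:
        raise ValueError("too many parts; choose a larger part_size")
    parts = []
    for number in range(1, part_count + 1):
        offset = (number - 1) * part_size
        length = min(part_size, object_size - offset)  # last part may be short
        parts.append((number, offset, length))
    return parts


plan = plan_parts(250 * MIB)
for number, offset, length in plan:
    print(number, offset, length)
```

Each part in the plan could then be uploaded independently (and in parallel), which is where the throughput and recovery benefits listed above come from.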
What are S3 Access Points?
Named network endpoints that you can use to perform S3 object operations, such as GetObject and PutObject.
They each have their own policy for permissions and network controls.
They only support object operations, not bucket-level operations such as modifying or deleting a bucket.
What is Object Lambda?
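S3 Object Lambda lets you attach your own code (a Lambda function) to an Object Lambda Access Point, so the data returned by a GET request is processed before it reaches the caller.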
E.g. convert data format (XML to JSON), resize images, augment data with another service, etc.
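As an illustration of the XML to JSON case, the sketch below shows only the transformation step such a function might run. The function name xml_to_json and the sample record are hypothetical, it handles only a flat record with one level of child elements, and the wiring to the Object Lambda Access Point (fetching the original object and returning the response) is left out.

```python
import json
import xml.etree.ElementTree as ET


def xml_to_json(xml_text):
    """Convert a flat XML record into a JSON string.

    Assumption: the root element has simple child elements with text only.
    Values stay strings because XML carries no type information.
    """
    root = ET.fromstring(xml_text)
    record = {child.tag: child.text for child in root}
    return json.dumps({root.tag: record})


original = "<user><name>Ann</name><age>30</age></user>"
print(xml_to_json(original))
```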