Storage Flashcards
(37 cards)
Your big data application is taking a lot of files from your local on-premises NFS storage and inserting them into S3. As part of the data integrity verification process, the application downloads the files right after they’ve been uploaded. What will happen?
The application will receive a 200, because S3 is strongly consistent for new PUTs
You are gathering various files from providers and plan on analyzing them once every month using Athena, which must return the query results immediately. You do not want to run a high risk of losing files and want to minimise costs. Which storage type do you recommend?
S3 Infrequent Access
As part of your compliance as a bank, you must archive all logs created by all applications and ensure they cannot be modified or deleted for at least 7 years. Which solution should you use?
Glacier with a Vault Lock Policy
You are generating thumbnails in S3 from images. Images are in the images/ directory while thumbnails are in the thumbnails/ directory. After running some analytics, you realized that images are rarely read and you could optimise your costs by moving them to another S3 storage tier. What do you recommend that requires the least amount of changes?
Create a Lifecycle Rule for the images/ prefix
In order to perform fast big data analytics, it has been recommended by your analysts in Japan to continuously copy data from your S3 bucket in us-east-1. How do you recommend doing this at a minimal cost?
Enable Cross Region Replication
Your big data application is taking a lot of files from your local on-premise NFS storage and inserting them into S3. As part of the data integrity verification process, you would like to ensure the files have been properly uploaded at minimal cost. How do you proceed?
Compute the local ETag for each file and compare them with AWS S3’s ETag
Your application plans to have 15,000 reads and writes per second to S3 from thousands of device ids. Which naming convention do you recommend?
/<device-id>/yyyy-mm-dd/… (S3 supports roughly 3,500 writes and 5,500 reads per second per prefix, so leading with the device id creates many prefixes and parallelizes your reads and writes)
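A minimal sketch of a key layout that leads with the device id, as the explanation on this card suggests. The function name object_key and the file name used in the example are hypothetical.

```python
from datetime import date


def object_key(device_id, day, filename):
    """Build an S3 key whose first path segment is the device id.

    Leading with the device id spreads traffic over many prefixes
    instead of concentrating one day's requests under a single date prefix.
    """
    return f"{device_id}/{day.isoformat()}/{filename}"


print(object_key("device-42", date(2020, 1, 31), "reading.json"))
```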
You are looking to have your files encrypted in S3 and do not want to manage the encryption yourself. You would like to have control over the encryption keys and ensure they’re securely stored in AWS. What encryption do you recommend?
SSE-KMS
Your website is deployed and sources its images from an S3 bucket. Everything works fine on the internet, but when you start the website locally to do some development, the images are not getting loaded. What’s the problem?
S3 CORS
What’s the maximum number of fields that can make a primary key in DynamoDB?
2 (partition key + sort key)
What’s the maximum size of a row (item) in DynamoDB?
400 KB
You are writing items of 8 KB in size at a rate of 12 per second. How many WCU do you need?
96 (8 KB x 12 writes per second; 1 WCU covers one write per second of an item up to 1 KB)
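The WCU arithmetic can be sketched as a small function. This assumes standard (non-transactional) writes in provisioned mode; the name wcu_needed is hypothetical.

```python
import math


def wcu_needed(item_size_kb, writes_per_second):
    """Write capacity units for standard writes.

    One WCU covers one write per second of an item up to 1 KB,
    and item sizes are rounded up to the next whole KB.
    """
    return math.ceil(item_size_kb) * writes_per_second


print(wcu_needed(8, 12))
```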
You are doing strongly consistent reads of 10 KB items at a rate of 10 per second. How many RCU do you need?
30 (10 KB gets rounded up to 12 KB, divided by 4 KB = 3, times 10 per second = 30)
You are doing 12 eventually consistent reads per second, and each item has a size of 16 KB. How many RCU do you need?
24 (16 KB divided by 4 KB = 4, times 12 per second = 48; 1 RCU covers 2 eventually consistent reads per second of a 4 KB item, so halve it to get 24)
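Both RCU cards follow the same rule, which can be sketched as below. This assumes provisioned mode and non-transactional reads; the name rcu_needed is hypothetical.

```python
import math


def rcu_needed(item_size_kb, reads_per_second, strongly_consistent=True):
    """Read capacity units for a steady read rate.

    One RCU covers one strongly consistent read per second of up to 4 KB,
    or two eventually consistent reads per second of up to 4 KB.
    """
    # Item size is rounded up to the next multiple of 4 KB.
    units_per_read = math.ceil(item_size_kb / 4)
    rcu = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        rcu = math.ceil(rcu / 2)
    return rcu


print(rcu_needed(10, 10))                             # strongly consistent card
print(rcu_needed(16, 12, strongly_consistent=False))  # eventually consistent card
```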
We are getting ProvisionedThroughputExceededExceptions, but after checking the metrics, we see we haven’t exceeded the total RCU we had provisioned. What happened?
We have a hot partition / hot key (remember RCU and WCU are spread across all partitions)
You are about to enter the Christmas sale and you know a few items on your website are very popular and will be read often. Last year you had a ProvisionedThroughputExceededException. What should you do this year?
Create a DAX cluster
You would like to react in real-time to users de-activating their account and send them an email to try to bring them back. The best way of doing it is to…
Integrate Lambda with a DynamoDB stream
You would like to have DynamoDB automatically delete old data for you. What should you use?
Use TTL
You are looking to improve the performance of your RDS database by caching some of the most common rows and queries. Which technology do you recommend?
ElastiCache
Which operation/feature or service would you use to locate all items in a table with a particular sort key value? (Choose 2)
- GetItem
- Query with a local secondary index
- Scan against a table, with filters
- Query with a global secondary index
- Query
- Scan against a table, with filters
- Query with a global secondary index
(Local secondary indexes can’t be used: they only allow an alternative sort key, and Query only works against a single partition key, with a single sort key value or a range of sort key values. A global secondary index can use the table’s sort key as its partition key, so Query will work against it. Scan will work, but is very inefficient. GetItem won’t work: it needs a single partition key and sort key.)
You have an application based on the Amazon Kinesis Streams API, and you are not using the Kinesis Producer Library as part of your application. While you won’t be taking advantage of all the benefits of the KPL in your application, you still need to ensure that you add data to a stream efficiently. Which API operation allows you to do this?
- PutItems
- PutRecord
- PutItem
- PutRecords
PutRecords
(The PutRecords operation writes multiple data records into an Amazon Kinesis stream in a single call. Use this operation to send data into the stream for data ingestion and processing.)
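A minimal sketch of batching with PutRecords. It assumes client is a boto3 Kinesis client (or anything exposing the same put_records method), that each record is a dict with Data and PartitionKey entries, and that the per-request limit is 500 records. The helper name put_in_batches is hypothetical, and retrying failed records is left out.

```python
def put_in_batches(client, stream_name, records, batch_size=500):
    """Send records with PutRecords, at most batch_size per call.

    Returns the total number of records the service reported as failed.
    Assumption: batch_size reflects the 500-records-per-request limit.
    """
    failed = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        response = client.put_records(StreamName=stream_name, Records=batch)
        # PutRecords can partially succeed, so check the failure count.
        failed += response.get("FailedRecordCount", 0)
    return failed
```

With boto3 this would be called as put_in_batches(boto3.client("kinesis"), "my-stream", records), where "my-stream" is a placeholder name. A non-zero return value means some records need to be resent.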
What is the maximum that one DynamoDB partition can deliver?
1,000 WCU, 3,000 RCU, 10 GB data volume
(DynamoDB is capable of delivering 1,000 WCU, 3,000 RCU and 10GB of data from a single partition – any more causes additional partitions to be created and data split between them. Further information: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GuidelinesForTables.html#GuidelinesForTables.Partitions)
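The per-partition limits on this card give a rough way to estimate how many partitions a table needs. This is only an estimate under those stated limits: DynamoDB manages partitions internally and the real count may differ. The name estimated_partitions is hypothetical.

```python
import math


def estimated_partitions(rcu, wcu, table_size_gb):
    """Rough partition estimate from the limits on this card.

    Assumption: a partition delivers up to 3,000 RCU, 1,000 WCU and 10 GB.
    """
    by_throughput = math.ceil(rcu / 3000 + wcu / 1000)
    by_size = math.ceil(table_size_gb / 10)
    # A table always has at least one partition.
    return max(by_throughput, by_size, 1)


print(estimated_partitions(rcu=6000, wcu=1000, table_size_gb=5))
```

Since provisioned RCU and WCU are spread across all partitions, dividing the provisioned total by this estimate shows how little capacity a single hot key can actually draw on.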
Which of the following statements is true?
- A shard supports up to 1000 transactions per second for reads, and 5 transactions per second for writes.
- A shard supports up to 5 transactions per second for reads, and 10 records per second for writes.
- A shard supports up to 5 transactions per second for reads, and 100 records per second for writes.
- A shard supports up to 5 transactions per second for reads, and 1000 records per second for writes.
A shard supports up to 5 transactions per second for reads, and 1000 records per second for writes.
(Each shard can support up to 5 transactions per second for reads, and up to 1,000 records per second for writes.)
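The per-shard limits on this card can be turned into a simple shard-count estimate. This sketch uses only the record and transaction rates stated above and ignores the per-shard byte throughput limits, which can also be the deciding factor. The name shards_needed is hypothetical.

```python
import math


def shards_needed(write_records_per_second, read_transactions_per_second):
    """Estimate shards from record-rate limits only.

    Assumption: each shard handles up to 1,000 records per second for writes
    and up to 5 transactions per second for reads, as stated on the card.
    """
    for_writes = math.ceil(write_records_per_second / 1000)
    for_reads = math.ceil(read_transactions_per_second / 5)
    # A stream always has at least one shard.
    return max(for_writes, for_reads, 1)


print(shards_needed(4500, 10))
```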
The Kinesis Connector Library allows you to emit data from a stream to various AWS services. Which of the following services can receive data emitted from such a stream? (Choose 4)
- DynamoDB
- S3
- Elasticsearch
- Redshift
- Lambda
- RDS
- DynamoDB
- S3
- Elasticsearch
- Redshift
(The Kinesis Connector Library includes implementations for use with Amazon DynamoDB, Amazon Redshift, Amazon S3, and Elasticsearch. If you want to use Lambda with Kinesis Streams, you need to create Lambda functions to automatically read batches of records off your Amazon Kinesis stream and process them if records are detected on the stream. AWS Lambda then polls the stream periodically (once per second) for new records.)