Data Stores Flashcards

1
Q

What are the 3 types (concepts) of data store in AWS?

A

1) Persistent datastore
2) Transient datastore
3) Ephemeral datastore

2
Q

Define persistent data storage and give 2 examples…

A

Data that is durable and survives reboots, restarts and power cycles

e.g. Glacier, RDS

3
Q

Define a transient data store and give 2 examples…

A

Data is just temporarily stored and passed along to another process or persistent store

e.g. SQS, SNS

4
Q

Define an ephemeral data store and give 2 examples…

A

Data is lost when the resource is stopped.

e.g. EC2 instance store, ElastiCache for Memcached

5
Q

What does IOPS stand for and what does it measure?

A

IOPS- Input/Output Operations Per Second

It is a measure of how many read and write operations a device can perform each second

6
Q

What does throughput measure?

A

It is a measure of how much data can be moved per unit of time, e.g. MB/s
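IOPS and throughput are linked by the size of each I/O: throughput ≈ IOPS × I/O size. A minimal Python sketch of that arithmetic; the 3,000 IOPS and the 16 KiB / 256 KiB I/O sizes are illustrative assumptions, and a real volume also has its own throughput cap, so treat the result as an upper bound.

```python
def throughput_mib_per_s(iops: int, io_size_kib: int) -> float:
    """Throughput = IOPS x I/O size, converted from KiB/s to MiB/s."""
    return iops * io_size_kib / 1024


# Same IOPS, different I/O sizes (illustrative numbers only):
print(throughput_mib_per_s(3000, 16))   # 46.875 (small, random I/O)
print(throughput_mib_per_s(3000, 256))  # 750.0  (large, sequential I/O)
```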

7
Q

What are the two types of data storage consistency models?

A

1) ACID

2) BASE

8
Q

What does ACID stand for?

A

Atomic- Transactions are all or nothing
Consistent- Transactions must be valid
Isolated- Transactions can’t mess with one another
Durable- Completed transactions must stick around
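Atomicity (all or nothing) is the easiest ACID property to see in practice. A minimal sketch using Python's built-in sqlite3 module as a stand-in for any ACID database (not an AWS service); the `accounts` table and `transfer` helper are hypothetical. The CHECK constraint makes an over-large transfer fail, and both UPDATEs are rolled back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Credit dst and debit src in ONE transaction: both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (balance >= 0) violated
        return False


transfer(conn, "alice", "bob", 30)   # succeeds
transfer(conn, "alice", "bob", 500)  # debit fails, so bob's credit is rolled back too
print(dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name")))
# {'alice': 70, 'bob': 30}
```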

9
Q

What does BASE stand for?

A

Basically Available- Values available even if stale
Soft-state- Might not be instantly consistent across stores
Eventually consistent- Will achieve consistency at some point

10
Q

Why would you want a model (BASE) that was not consistent?

A

Because, however accurate and precise ACID databases are, they don’t scale very well.

BASE is not inconsistent, it just isn’t consistent everywhere at the same instant
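To see what "not instantly consistent across stores" means, here is a toy Python model of a BASE-style store; the class and its replica/sync mechanics are hypothetical and far simpler than any real replication protocol. A write is acknowledged by one replica straight away, the other replicas serve stale data until `sync()` runs, and then every replica agrees.

```python
class EventuallyConsistentStore:
    """Toy BASE store: writes land on one replica and reach the others later."""

    def __init__(self, replicas: int = 3):
        self.replicas = [dict() for _ in range(replicas)]
        self.pending = []  # (replica_index, key, value) updates not yet applied

    def write(self, key, value):
        self.replicas[0][key] = value  # acknowledged immediately (basically available)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))  # soft state: other replicas lag behind

    def read(self, key, replica: int):
        return self.replicas[replica].get(key)  # may return stale data (or None)

    def sync(self):
        for i, key, value in self.pending:  # replication catches up
            self.replicas[i][key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("price", 42)
print(store.read("price", 0), store.read("price", 2))  # 42 None  (stale read)
store.sync()
print(store.read("price", 0), store.read("price", 2))  # 42 42    (eventually consistent)
```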

11
Q

What type of store is S3?

A

An Object store

12
Q

What is the maximum object size in S3 and what is the largest object in a single PUT?

A

Max object size is 5TB

Largest single put 5GB

13
Q

How can you increase the efficiency of uploads with files larger than 100MB?

A

Use multipart uploads: the file is split into parts that are uploaded in parallel, then reassembled by S3
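Choosing a part size is the main decision. A Python sketch, assuming S3's published limits of a 5 MiB minimum part size (except the last part), a 5 GiB maximum part size and 10,000 parts per upload; the 8 MiB preferred part size and the `plan_multipart` helper are hypothetical.

```python
MIN_PART = 5 * 1024**2   # 5 MiB minimum part size (the last part may be smaller)
MAX_PART = 5 * 1024**3   # 5 GiB maximum part size
MAX_PARTS = 10_000       # maximum number of parts per upload


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on large sizes)."""
    return -(-a // b)


def plan_multipart(object_size: int, preferred_part: int = 8 * 1024**2) -> tuple[int, int]:
    """Return (part_size, part_count) in bytes that stays within the multipart limits."""
    # Parts must be at least MIN_PART and big enough that no more than MAX_PARTS are needed.
    part = max(preferred_part, MIN_PART, ceil_div(object_size, MAX_PARTS))
    if part > MAX_PART:
        raise ValueError("object is too large for a multipart upload")
    return part, ceil_div(object_size, part)


print(plan_multipart(100 * 1024**2))  # (8388608, 13): 100 MiB in 8 MiB parts
print(plan_multipart(1024**4))        # (109951163, 10000): 1 TiB forces bigger parts
```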

14
Q

How are objects referenced in S3?

A

By a KEY, which looks like a URL path, e.g.

s3:///finance/April/16/invoice.pdf
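In code, an S3 location is just a bucket name plus a key string; the slashes in the key only look like folders. A minimal sketch using Python's urllib.parse; `example-bucket` is a hypothetical bucket name, and keys containing `?` or `#` would need extra handling because urlparse treats those as query/fragment markers.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not a valid S3 URI (missing bucket?): {uri!r}")
    # Drop the single '/' that separates bucket and key; other '/' are part of the key.
    return parsed.netloc, parsed.path[1:]


bucket, key = split_s3_uri("s3://example-bucket/finance/April/16/invoice.pdf")
print(bucket)  # example-bucket
print(key)     # finance/April/16/invoice.pdf
```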

15
Q

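What is S3’s consistency model for read-after-write, and what does this mean in lay terms?

A

S3 provides read-after-write consistency for PUTs of new objects.

If a new object is added that S3 has never seen before, you can read it immediately once it has been written. (Since December 2020, S3 provides strong read-after-write consistency for all GET, PUT, LIST and DELETE operations, including overwrites.)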

16
Q

What is S3’s consistency model for HEAD or GET requests of a KEY before the object exists, and what does this mean in lay terms?

A

HEAD or GET requests for a KEY before the object exists result in eventual consistency (legacy behaviour, before December 2020).

Until the object has been fully written and replicated across AZs, S3 may say it doesn’t know the object: "I’ll let you read it eventually."

17
Q

What is S3’s consistency model for overwrite PUTs and DELETEs of objects, and what does this mean in lay terms?

A

S3 offers eventual consistency for overwrite PUTs (updates) and DELETEs (legacy behaviour, before December 2020).

S3 may keep serving the original object until the update or delete has been replicated across all AZs. Once the change is fully replicated, S3 serves the updated object (or reports it as deleted).

18
Q

What is S3’s consistency model for updates to a single KEY, and what does this mean in lay terms?

A

Updates to a single KEY are atomic.

Whoa there, you’ll never see a half-written object: you get either the old version or the new one. I don’t lock objects, so if I get two PUTs to the same key at the same time, the one with the latest timestamp wins, and you’ll see it as soon as I replicate it elsewhere.

19
Q

What are the 3 methods of securing objects in an S3 bucket (plus an optional extra safeguard)?

A

1) Resource-based (bucket policy)
2) User-based (IAM policies)
3) Object-based (object ACL)
Optional: MFA before delete (MFA Delete)

20
Q

In what order does S3 evaluate the security access of an object?

A

User-based (IAM policy) > Resource-based (bucket policy) > Object-based (object ACL)

21
Q

What does versioning in S3 enable?

A

Enables “roll-back” and “un-delete” capabilities

22
Q

Do you get charged for old versions of objects?

A

Yes

23
Q

Why use MFA in S3?

A

1) To safeguard against accidental deletion of an object (version)
2) To require MFA before the versioning state of your bucket can be changed

24
Q

Why use cross-region replication in S3?

A

1) increased durability
2) reduced latency
3) To meet compliance requirements

25
Q

What are the 7 storage classes of S3? and what types of data are they suited for?

A

1) Standard- Frequently accessed
2) Standard-IA- Long-lived, infrequently accessed
3) One Zone-IA- Long-lived, infrequently accessed, non-critical
4) Reduced Redundancy- Frequently accessed, non-critical
5) Intelligent-Tiering- Long-lived, with changing or unknown access patterns
6) Glacier- Long-term data archiving, retrieval in minutes to hours
7) Glacier Deep Archive- Long-term data archiving, retrieval within 12-48 hours

26
Q

Why use lifecycle management in S3?

A

1) To optimise storage costs
2) To adhere to a data retention policy
3) To keep S3 buckets well-maintained
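The first two reasons map directly onto lifecycle rules: transitions to cheaper storage classes cut costs, and expiration enforces retention. A Python sketch of a lifecycle configuration in the dict shape used by the S3 API (for example boto3's `put_bucket_lifecycle_configuration`); the rule ID, `logs/` prefix, bucket name and day counts are hypothetical.

```python
def lifecycle_rules(prefix: str = "logs/") -> dict:
    """Build an S3 lifecycle configuration (hypothetical rule values)."""
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},  # cheaper for infrequent access
                    {"Days": 90, "StorageClass": "GLACIER"},      # archive
                ],
                "Expiration": {"Days": 365},  # retention policy: delete after a year
            }
        ]
    }


config = lifecycle_rules()
# With boto3 installed and credentials configured, this could be applied via:
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="example-bucket", LifecycleConfiguration=config)
```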

27
Q

Name 4 ways S3 can be used in analytics…

A

1) Data lake concept- S3 data used as a data lake, accessible to Athena, Redshift or QuickSight
2) IoT streaming data repo- Stream data into S3 via Kinesis Data Firehose
3) Machine learning and AI storage- Rekognition, Lex, MXNet
4) Storage class analysis- Analyses current access patterns so S3 management analytics can recommend where you can save (e.g. moving data to a cheaper storage class)

28
Q

Name the 3 server-side encryption at rest options available with S3 (plus the client-side alternative).

A

1) SSE-S3 - S3 manages the encryption keys (AES-256)
2) SSE-C - Upload your own custom AES-256 encryption key, which S3 uses when it writes the objects
3) SSE-KMS - Use a key generated and managed by AWS Key Management Service (KMS)
4) Client-side - Encrypt objects with your own local encryption process before uploading to S3 (e.g. PGP, GPG)
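With the SDKs, the three server-side options are selected per request. A Python sketch that builds the extra PutObject arguments, assuming boto3's parameter names (`ServerSideEncryption`, `SSEKMSKeyId`, `SSECustomerAlgorithm`, `SSECustomerKey`); the `sse_put_args` helper and any key ID passed to it are hypothetical. Client-side encryption has no S3 parameter: you encrypt the bytes before calling S3.

```python
from typing import Optional


def sse_put_args(mode: str, kms_key_id: Optional[str] = None,
                 customer_key: Optional[bytes] = None) -> dict:
    """Extra PutObject arguments for each S3 server-side encryption option."""
    if mode == "SSE-S3":
        return {"ServerSideEncryption": "AES256"}      # S3-managed keys
    if mode == "SSE-KMS":
        args = {"ServerSideEncryption": "aws:kms"}     # key managed in AWS KMS
        if kms_key_id:
            args["SSEKMSKeyId"] = kms_key_id           # omit to use the AWS managed key
        return args
    if mode == "SSE-C":
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 256-bit (32-byte) key")
        # Customer-provided key: it must be supplied again to read the object back.
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown server-side encryption mode: {mode}")
```

Usage with boto3 could look like `boto3.client("s3").put_object(Bucket="example-bucket", Key="report.pdf", Body=data, **sse_put_args("SSE-S3"))`, where the bucket, key and `data` are placeholders.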

29
Q

What is transfer acceleration in S3?

A

A way of speeding up data uploads by routing them through CloudFront edge locations (CloudFront in reverse)

30
Q

What does Requester Pays mean in S3?

A

The requester, rather than the bucket owner, pays for requests and data transfer (the owner still pays for storage).

31
Q

What is a tag in the context of S3?

A

A key/value label you assign to objects, used for costing, billing, security etc…

32
Q

What is an event in the context of S3?

A

Event notifications fire when certain things happen in your S3 bucket (e.g. objects are added, modified or deleted). These events can send notifications to SNS, SQS or Lambda.
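A sketch of a Lambda-style Python handler that reads the event type, bucket and key out of an S3 event notification. The sample event is trimmed to just the fields used (real events carry many more), and the bucket and key are hypothetical. Object keys arrive URL-encoded, hence `unquote_plus`.

```python
from urllib.parse import unquote_plus


def handler(event, context=None):
    """Return (event name, bucket, key) for each S3 record in the event."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # e.g. '+' back to a space
        results.append((record["eventName"], bucket, key))
    return results


sample = {"Records": [{"eventName": "ObjectCreated:Put",
                       "s3": {"bucket": {"name": "example-bucket"},
                              "object": {"key": "uploads/my+photo.jpg"}}}]}
print(handler(sample))  # [('ObjectCreated:Put', 'example-bucket', 'uploads/my photo.jpg')]
```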

33
Q

What is static web hosting in S3?

A

Simple and massively scalable static website hosting

34
Q

How can BitTorrent be used with S3?

A

You can use the BitTorrent protocol to retrieve any publicly available object; S3 automatically generates the .torrent file

35
Q

What type of data is AWS Glacier useful for?

A

Rarely accessed data (cold storage)

36
Q

Which hybrid cloud service uses Glacier for storage?

A

AWS Storage Gateway virtual tape library (Tape Gateway)

37
Q

Is Glacier integrated with S3 lifecycle management?

A

Yes

38
Q

What is a glacier vault?

A

A container that groups archives together in S3 Glacier

39
Q

What is an archive in Glacier?

A

Any object such as a photo, video or document. It is the base unit of Glacier storage. Each archive has a unique ID and an optional description; the archive ID is unique within the AWS Region where the archive is stored.

40
Q

What is the max size of an archive?

A

40TB

41
Q

What are the two ways access to a vault is controlled?

A

1) Resource-based- Vault access policy

2) Identity-based- IAM policies

42
Q

What is a vault access policy? Give an example of its use…

A

A resource-based policy attached to a vault that sets the access rules the vault must abide by.

e.g. no one can delete an archive, or anyone must use MFA before deleting an archive

43
Q

How are IAM policies used for access to vaults? Also, Vault locks are ___….

A

Access is managed through IAM: policies give users permissions to administer a vault, or to overwrite or delete a vault lock while it is still in progress.

Immutable… once confirmed, they cannot be changed

44
Q

What are the steps of locking a vault?

A

1) Create a vault lock policy
2) Initiate the vault lock (the lock enters an in-progress state)
3) Within 24 hours, test the lock and confirm it is performing as expected
a) If the lock is confirmed, it is applied forever… no changes
b) If the lock is not confirmed within 24 hours, the lock dissolves

45
Q

What is EBS? (2 points)

A

Elastic Block Store. Essentially virtual hard drives. A volume can be detached and attached to a different instance

46
Q

Can EBS volumes be used multi-AZ?

A

No, a volume is confined to a single AZ. By default only one instance can access a volume at a time.

47
Q

What backup strategy can you use with EBS volumes?

A

EBS snapshots

48
Q

When would you use an instance store over an EBS volume?

A

When you want very fast access, e.g. cache/buffer/scratch data.

EBS is accessed over the network, so it is not as fast

49
Q

What are the 3 benefits of using EBS snapshots?

A

1) Provides cost-effective and easy backups
2) Easy to share data sets with other users/accounts
3) Easy to migrate a volume to a new AZ or region

50
Q

What are the 4 steps to convert an unencrypted volume to an encrypted volume?

A

1) Take a snapshot of the unencrypted volume
2) Use the snapshot to create a new volume
3) Enable encryption when creating the new volume
4) Attach and mount the new volume on the EC2 instance

51
Q

What information is stored in a volume snapshot?

A

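Changes only: after the first full snapshot, each snapshot stores only the blocks that changed since the previous one (snapshots are incremental)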

52
Q

Given that we have incremental snapshots 1, 2 and 3, if we delete 2, do we lose 3?

A

No, we still have 1 and 3 (3 is still fully restorable), but we can no longer restore the volume to the point in time of snapshot 2
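A toy Python model of why deleting a middle snapshot is safe: each snapshot holds only changed blocks, and on deletion any blocks the next snapshot still needs are handed forward. This is a simplification for illustration; the `SnapshotChain` class is hypothetical and not how EBS stores snapshot data internally.

```python
class SnapshotChain:
    """Toy incremental snapshots: each holds only blocks changed since the previous one."""

    def __init__(self):
        self.snaps = {}  # snapshot id -> {block: data}
        self.order = []  # snapshot ids, oldest first

    def take(self, snap_id, changed_blocks):
        self.snaps[snap_id] = dict(changed_blocks)
        self.order.append(snap_id)

    def restore(self, snap_id):
        """Rebuild the full volume by replaying every delta up to snap_id."""
        volume = {}
        for sid in self.order[: self.order.index(snap_id) + 1]:
            volume.update(self.snaps[sid])
        return volume

    def delete(self, snap_id):
        """Delete a snapshot, moving blocks the next snapshot still needs forward."""
        i = self.order.index(snap_id)
        if i + 1 < len(self.order):
            nxt = self.snaps[self.order[i + 1]]
            for block, data in self.snaps[snap_id].items():
                nxt.setdefault(block, data)  # skip blocks the next snapshot already overwrote
        del self.snaps[snap_id]
        self.order.pop(i)


chain = SnapshotChain()
chain.take(1, {"a": "v1", "b": "v1"})  # first snapshot: full copy
chain.take(2, {"b": "v2"})             # only block b changed
chain.take(3, {"c": "v3"})             # only block c is new
before = chain.restore(3)
chain.delete(2)
print(chain.restore(3) == before)      # True: snapshot 3 is still complete
print(chain.order)                     # [1, 3]: the point in time of snapshot 2 is gone
```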

53
Q

What is a snapshot?

A

A collection of pointers to data blocks, stored in S3

54
Q

What are the 2 ways we can use Amazon Data Lifecycle Manager to manage EBS snapshots?

A

1) Schedule snapshots to be created for volumes e.g. every hour
2) Set retention rules to remove stale snapshots

55
Q

What is EFS? and what is it an implementation of?

A

Elastic File System. An implementation of the NFS (Network File System) protocol

56
Q

What is the pay model for EBS?

A

You pay for a provisioned amount of GB per month, regardless of use!

57
Q

What is the pay model for EFS?

A

You only pay for the amount of storage you use

58
Q

Is EFS multi-AZ?

A

Yes

59
Q

How do EC2 instances access files on EFS?

A

Through mount points in one or many AZs

60
Q

Can you mount EFS on prem?

A

Yes, but with caution… you would need a stable connection, e.g. Direct Connect, or use AWS DataSync to sync data with EFS

61
Q

How does EFS compare price-wise to EBS and S3?

A

Roughly 3x more expensive than EBS

Roughly 20x more expensive than S3 (per GB stored; exact ratios depend on current pricing)

62
Q

What is Amazon Storage Gateway?

A

A virtual machine that you run on prem (e.g. with VMware). It provides local storage resources backed by S3 and Glacier

63
Q

What are 2 common use cases for Amazon Storage Gateway?

A

1) Disaster recovery

2) Cloud migrations

64
Q

What are the four types of Amazon Storage Gateway?

A

1) File gateway
2) Volume gateway stored mode
3) Volume gateway cached mode
4) Tape gateway

65
Q

What is file gateway and which interfaces does it allow?

A

Allows on-prem systems to store objects in S3 via NFS or SMB mount points

NFS, SMB

66
Q

What is volume gateway stored mode and which interface does it use?

A

Primary data stored on prem, asynchronously replicated (backed up) to S3

iSCSI

67
Q

What is volume gateway cached mode and which interface does it use?

A

Primary data stored in S3 with frequently accessed data cached locally on prem

iSCSI

68
Q

What is the tape gateway and which interface does it use?

A

A virtual media changer and tape library for use with existing backup software

iSCSI

69
Q

What is Amazon WorkDocs?

A

AWS’s version of Dropbox or Google Drive

70
Q

When would you run a database on EC2?

A

When you want ultimate flexibility, or a database that is not currently supported by RDS, e.g. SAP HANA

71
Q

What are the disadvantages of running your own database on EC2?

A

You are responsible for backup, patching and scaling…

72
Q

What is RDS?

A

A managed database option for MySQL, PostgreSQL, MariaDB, Aurora…

73
Q

What type of data is RDS most suited to?

A

Structured and relational data

74
Q

What are the benefits of using RDS? (3 points)

A

1) Automates backup and patching in customer-defined maintenance windows
2) push-button scaling
3) redundancy

75
Q

What service do you use if you need to store large binary objects (BLOBS)?

A

S3

76
Q

What service do you use if you need automated scalability for the data you want to store?

A

DynamoDB

77
Q

What service do you use if you need to store name/value data?

A

DynamoDB

78
Q

Which service do you use if you want to store data that is not well structured or is unpredictable?

A

DynamoDB

79
Q

Which service do you use if you require a non-supported database such as SAP HANA, or you want complete control?

A

EC2

80
Q

What are the two types of multi-AZ replication available for RDS databases?

A

1) Synchronous replication

2) Asynchronous replication

81
Q

What is synchronous replication and how do a master and standby use this type of replication in a multi-AZ architecture?

A

Data written to the master is replicated instantly to a standby in a different AZ; a write is only acknowledged once the standby has it too.

82
Q

What happens if a master RDS fails?

A

The standby RDS gets promoted to the master and has ALL of the data that the master had

83
Q

What is asynchronous replication and how does a read replica use this type of replication in a multi-AZ architecture?

A

Changes are copied to read replicas after they are committed on the master, so read replicas can be seconds or minutes behind the master.

84
Q

What happens if a region fails that contains a master and standby? (Read replicas are in a different region)

A

Read replicas are promoted to master and new standby created (This will be done manually)

85
Q

What is DynamoDB?

A

DynamoDB is a managed multi-AZ noSQL datastore with cross-region replication option.

86
Q

What is the consistency model for DynamoDB?

A

BASE, Eventual consistency by default.

87
Q

How does the pricing model work for DynamoDB?

A

Based on throughput: provisioned read and write capacity units (or per request in on-demand mode), plus storage
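For provisioned capacity the bill is driven by read and write capacity units. A Python sketch of the standard sizing arithmetic, assuming 1 RCU = one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads) and 1 WCU = one write per second of an item up to 1 KB; transactional requests, which cost double, are not modelled, and the helper names are hypothetical.

```python
import math


def read_capacity_units(reads_per_sec: int, item_kb: float,
                        strongly_consistent: bool = True) -> int:
    """RCUs: one per 4 KB (rounded up) per strongly consistent read per second."""
    total = reads_per_sec * math.ceil(item_kb / 4)
    # Eventually consistent reads (the default) cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(writes_per_sec: int, item_kb: float) -> int:
    """WCUs: one per 1 KB (rounded up) per write per second."""
    return writes_per_sec * math.ceil(item_kb)


print(read_capacity_units(100, 6))                             # 200: 6 KB rounds up to 2 x 4 KB
print(read_capacity_units(100, 6, strongly_consistent=False))  # 100
print(write_capacity_units(10, 2.5))                           # 30: 2.5 KB rounds up to 3 x 1 KB
```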

88
Q

How does autoscaling work for DynamoDB? What alternative can you use to allow full scaling?

A

Set min/max capacity levels in anticipation of need. If you do not know how much capacity you need, you can use on-demand capacity instead.

89
Q

Can DynamoDB be ACID?

A

Yes, you can force ACID behaviour using DynamoDB transactions (TransactWriteItems and TransactGetItems)

90
Q

What is an attribute (DynamoDB)?

A

A name and value pair

91
Q

What is an item (similar to a record) (DynamoDB)?

A

A collection of attributes

92
Q

What is a table (DynamoDB)?

A

A collection of items

93
Q

Each item has a partition (aka primary) key associated with it. What does DynamoDB do with this key?

A

It creates a HASH of the key value, which is used to assign the partition (the underlying physical storage) the item is stored in. The partition key is AKA the hash attribute.
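DynamoDB's actual hash function is internal and not documented, so this Python sketch uses MD5 modulo a partition count purely as a stand-in. The point it shows: the same partition key always maps to the same partition, while different keys spread across partitions. The `partition_for` helper and the key values are hypothetical.

```python
import hashlib


def partition_for(partition_key: str, num_partitions: int) -> int:
    """Toy stand-in for DynamoDB's internal hash: map a key to a partition number."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


keys = ["customer#1", "customer#2", "customer#3", "customer#4"]
for k in keys:
    print(k, "-> partition", partition_for(k, 3))
```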

94
Q

What is a composite key (DynamoDB)?

A

Partition key + sort key

95
Q

What is the role of the partition key and the sort key?

A

Partition key- the location the data will be physically stored
Sort key- The order in which items with the same partition key are stored

96
Q

Name 2 secondary indexes…

A

1) Global secondary index

2) Local secondary index

97
Q

What is a Global secondary index?

A

Partition key and sort key can be different from those on the table

I AM GLOBAL BABY!

98
Q

What is a local secondary index?

A

Same partition key as the table, but a different sort key

99
Q

When would you use a global secondary index?

A

When you want a fast query of attributes outside of the primary key without having to do a table scan

e.g. querying sales orders by customer number rather than sales by order number
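Conceptually, a GSI is a second copy of the data re-keyed by another attribute. A toy in-memory Python model (DynamoDB builds and maintains a real GSI for you); the order and customer data and the `build_gsi` helper are hypothetical, and `projected` mirrors the choice of which attributes to project into the index.

```python
from collections import defaultdict

# Hypothetical base table keyed by order number (the table's partition key).
orders = {
    "ORD-1": {"customer": "C-7", "total": 120},
    "ORD-2": {"customer": "C-9", "total": 40},
    "ORD-3": {"customer": "C-7", "total": 75},
}


def build_gsi(table: dict, gsi_key: str, projected: list) -> dict:
    """Re-key items by a non-key attribute, copying only the projected attributes."""
    index = defaultdict(list)
    for order_id, item in table.items():
        # The table's key is always carried into the index alongside projected attributes.
        index[item[gsi_key]].append({"order": order_id, **{a: item[a] for a in projected}})
    return index


by_customer = build_gsi(orders, "customer", projected=["total"])
print(by_customer["C-7"])  # [{'order': 'ORD-1', 'total': 120}, {'order': 'ORD-3', 'total': 75}]
```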

100
Q

When would you use a local secondary index?

A

When you already know the partition key and want to quickly query on some other attribute

e.g. I have a sales order number but I would like to retrieve only those records with a certain material number

101
Q

(DynamoDB use case- solution, cost and benefit) What would you do if you need to… access just A FEW attributes the fastest way possible?

A

solution- project just those few attributes in a global secondary index
cost- minimal
benefit- lowest possible latency access for non-key items

102
Q

(DynamoDB use case- solution, cost and benefit) What would you do if you need to… frequently access SOME non-key attributes

A

solution- project those attributes in a global secondary index
cost- moderate, aims to offset table scan cost
benefit- low latency for access to non-key items

103
Q

(DynamoDB use case- solution, cost and benefit) What would you do if you need to… frequently access MOST non-key attributes

A

Solution- Project those attributes or even the entire table into a global secondary index
cost- up to double
benefit- Maximum flexibility

104
Q

(DynamoDB use case- solution, cost and benefit) What would you do if you need to… rarely query but write or update frequently

A

Solution- Project keys only for the global secondary index
cost- minimal
benefit- very fast writes or updates for non-partition key items

105
Q

Why would you use a global secondary index as a table replica?

A

To apply different WCU (write capacity units) and RCU (read capacity units) to the same data, e.g. for free and premium customers.

106
Q

What is Redshift?

A

A cost-effective, scalable data warehouse. You use it to query large data sets and identify correlations between disparate datasets. You can also query data in S3 using Redshift Spectrum

107
Q

What is Neptune?

A

A graph database. Allows you to store and query relationship data.

108
Q

What is elasticache?

A

An in memory data store (not persistent in traditional sense)

109
Q

Which two in-memory stores does ElastiCache provide?

A

1) Memcached

2) Redis

110
Q

Which is faster, ElastiCache or DynamoDB?

A

Elasticache

111
Q

Which in-memory store is most appropriate for (and why?)…. Web session storage?

A

Redis: session data is kept in Redis rather than on the individual web servers

112
Q

Which in-memory store is most appropriate for (and why?)…. database caching?

A

Memcached, cheap and fast!
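Database caching with Memcached is usually done with the cache-aside (lazy loading) pattern: check the cache, and only on a miss query the database and populate the cache. A minimal Python sketch; a dict stands in for a real Memcached client, and `CacheAside` and `slow_query` are hypothetical names.

```python
import time


class CacheAside:
    """Lazy-loading (cache-aside) pattern with a per-entry TTL."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}  # key -> (value, expires_at); stand-in for a Memcached client

    def get(self, key):
        hit = self.cache.get(key)
        if hit and hit[1] > self.clock():
            return hit[0]                    # cache hit: no database round trip
        value = self.load_from_db(key)       # cache miss (or expired): query the database
        self.cache[key] = (value, self.clock() + self.ttl)
        return value


calls = []


def slow_query(key):
    calls.append(key)  # record each real database call
    return f"row for {key}"


cache = CacheAside(slow_query, ttl_seconds=30)
cache.get("user:1")
cache.get("user:1")
print(len(calls))  # 1: the second read was served from the cache
```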

113
Q

Which in-memory store is most appropriate for (and why?)…. leader boards

A

Redis, it uses sorted sets! It can keep millions of users ranked in order, updated instantly

114
Q

Which in-memory store is most appropriate for (and why?)…. streaming

A

Use either! e.g. a landing spot for streaming sensor data on the factory floor

115
Q

What are the 4 key reasons to choose Memcached?

A

1) simple and straightforward
2) you need to scale out and in as demand changes
3) you need multiple CPU cores and threads
4) you need to cache objects like database queries

116
Q

What are the 8 reasons to choose Redis?

A

1) you need encryption
2) you need HIPAA compliance
3) you need support for clustering
4) you need complex data types
5) you need HA
6) you need pub/sub compatibility
7) you need geospatial indexing
8) you need backup and restore

117
Q

What is Amazon Managed Blockchain and what is QLDB?

A

A managed blockchain framework

QLDB (Quantum Ledger Database) is a ledger database that maintains a complete, immutable and verifiable history of all transactions

118
Q

What is Amazon timestream database? and when would you use it?

A

Fully managed database designed to manage time-series data e.g. industrial machinery

119
Q

What is Amazon DocumentDB

A

AWS MongoDB-compatible document database- HA, multi-AZ, scalable

120
Q

What is Amazon Elasticsearch Service?

A

A search engine that is also a document store, commonly used as part of an ELK stack (Elasticsearch, Logstash, Kibana)… basically a way to search and perform analytics on data.

121
Q

Choose a database option based on the scenario below….

You need ultimate control over the database and the preferred DB is not available on RDS

A

Database on EC2

122
Q

Choose a database option based on the scenario below….

Need traditional relational database for OLTP (online transactional processing), data is well structured

A

Amazon RDS

123
Q

Choose a database option based on the scenario below….

Your data is in name/value pairs or in an unpredictable structure.
you also need in-memory performance with persistence

A

DynamoDB

124
Q

Choose a database option based on the scenario below….

You have massive amounts of data that will primarily be used for OLAP workloads

A

Amazon Redshift

125
Q

Choose a database option based on the scenario below….

Relationships between objects are a major portion of the data’s value

A

Amazon Neptune

126
Q

Choose a database option based on the scenario below….

You need fast temporary storage for small amounts of data

The data is highly volatile

A

Amazon Elasticache

127
Q

What does file gateway expose its interface as?

A

NFS (or SMB) only; unlike volume and tape gateways, it does not expose iSCSI!

128
Q

How would you improve the performance of queries against your DynamoDB table if most of the queries do not use the partition key?

A

Create a global secondary index with the most commonly queried attribute as the hash key (partition key)

129
Q

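You try to get a file that doesn’t exist, then you add the file and try to fetch it again… What are the two possible outcomes of fetching metadata for the newly added file in S3, and why?

A

1) You get a 404 error, as the upload has not yet propagated
2) You get the metadata

Because a GET or HEAD made before the object existed makes the following read-after-write only eventually consistent (legacy behaviour; S3 has offered strong read-after-write consistency since December 2020)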

130
Q

What is a lazy write?

A

Another name for eventual consistency

131
Q

What does FQDN stand for and which service is this used in?

A

FULLY QUALIFIED DOMAIN NAME

e.g. when specifying a mount point in EFS

132
Q

How would you ensure that EFS can tolerate an AZ failure?

A

Create EFS mount targets in each AZ and configure each EC2 instance to mount the common file system via its FQDN

133
Q

How do EC2 instances use the FQDN for EFS?

A

The EC2 instances use the common FQDN as the mount target. The FQDN resolves to the EFS mount target in each instance’s own AZ.
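A Python sketch of building the regional EFS DNS name and an NFS mount command from it, assuming the documented `file-system-id.efs.region.amazonaws.com` name format and the NFSv4.1 mount options AWS recommends for EFS. The file system ID, region and mount point are hypothetical, so check the EFS documentation before relying on the options.

```python
def efs_fqdn(file_system_id: str, region: str) -> str:
    """Regional DNS name; it resolves to the mount target in the caller's own AZ."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(file_system_id: str, region: str, mount_point: str = "/mnt/efs") -> str:
    """Build (but do not run) an NFSv4.1 mount command for the file system."""
    options = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
    return f"sudo mount -t nfs4 -o {options} {efs_fqdn(file_system_id, region)}:/ {mount_point}"


# Every instance, in every AZ, can use the same command:
print(mount_command("fs-12345678", "us-east-1"))
```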

134
Q

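What type of database is SAP HANA or Neo4j?

A

Neo4j is a graph database. SAP HANA is primarily an in-memory, column-oriented relational database (it also includes a graph engine).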

135
Q

What 3 formats does Amazon Athena support?

A

1) JSON
2) Apache Parquet
3) Apache ORC

(Also CSV and Avro, among others.) NOT XML

136
Q

What 2 features can be used to increase the speed of read operations?

A

1) DynamoDB Accelerator (DAX)- in memory cache in front of DynamoDB
2) Secondary indexes