AWS Redshift Flashcards

Question 1

Q

When you create a cluster, what do you get as a base configuration?

Answer

A

You get two nodes, a leader node and a data node, giving 160 GB of storage.

Question 2

Q

Do you get to select the disk size for Redshift?

Answer

A

No, you do not get to select the disk size. You do get to select the overall size of the Redshift cluster, through a slider in the console or a parameter in the CLI and API. AWS then works out the number of disks in each data node.

Question 3

Q

I need to add capacity to my Redshift cluster. How can I do this?

Answer

A

You have two options: scale up or scale out. Scaling up means changing the size of the instance; scaling out means adding more nodes.

Question 4

Q

What interfaces does Redshift support?

Answer

A

ODBC
JDBC
Postgres

Question 5

Q

What is Redshift built on?

Answer

A

PostgreSQL. AWS separated the storage from the query engine and then replaced the storage engine with a columnar database.

Question 6

Q

What is Redshift used for?

Answer

A

Data warehouse

- Analytics

Question 7

Q

I have data in S3. Is it possible to query this data from Redshift?

Answer

A

Yes. Redshift has a feature called Redshift Spectrum that queries the data in place in S3. The data can be in formats such as CSV, Parquet, or ORC.
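A minimal sketch of the usual Spectrum setup: an external schema that points at a data catalog database, then an external table over an S3 prefix. The schema, database, table, bucket, and IAM role ARN below are hypothetical placeholders, and the SQL is only held in Python strings here; it would be run through whatever Redshift client you use.

```python
# Hypothetical names: spectrum_schema, spectrum_db, s3://my-bucket/sales/, and the role ARN.
IAM_ROLE = "arn:aws:iam::123456789012:role/MySpectrumRole"

CREATE_SCHEMA = f"""
CREATE EXTERNAL SCHEMA spectrum_schema
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE '{IAM_ROLE}'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
""".strip()


def create_external_table(fmt: str) -> str:
    # Only the storage clause changes with the file format; the columns stay the same.
    storage = {
        "csv": "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\nSTORED AS TEXTFILE",
        "parquet": "STORED AS PARQUET",
    }[fmt]
    return (
        "CREATE EXTERNAL TABLE spectrum_schema.sales (\n"
        "    sale_id BIGINT,\n"
        "    amount DECIMAL(12,2)\n"
        ")\n"
        f"{storage}\n"
        "LOCATION 's3://my-bucket/sales/';"
    )


# Once the external table exists, it is queried like any other table.
QUERY = "SELECT SUM(amount) FROM spectrum_schema.sales;"

print(create_external_table("parquet"))
```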

Question 8

Q

What type of database is Redshift?

Answer

A

It is a columnar database, designed to scan columns of data fast. With columnar data, it is easy to sum a column or find its min and max quickly.
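A toy illustration of why a column layout helps aggregates, using hypothetical in-memory data rather than anything Redshift-specific: the row layout has to walk whole records to read one field, while the column layout reads only the column it needs.

```python
# Hypothetical sample data: three rows of (region, sales).
rows = [
    ("us-east", 120),
    ("eu-west", 75),
    ("us-west", 210),
]

# Row-oriented layout: each record is stored together.
row_store = list(rows)

# Column-oriented layout: each column is stored together as its own list.
column_store = {
    "region": [r[0] for r in rows],
    "sales": [r[1] for r in rows],
}


def sum_sales_row_store(store):
    # Walks every full record just to read one field.
    return sum(record[1] for record in store)


def sum_sales_column_store(store):
    # Reads only the "sales" column; "region" is never touched.
    return sum(store["sales"])


print(sum_sales_row_store(row_store))        # 405
print(sum_sales_column_store(column_store))  # 405
print(min(column_store["sales"]), max(column_store["sales"]))  # 75 210
```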

Question 9

Q

What is the architecture of a Redshift cluster?

Answer

A

You have a leader node and data nodes. Data nodes are divided into slices, and these slices are where data is stored and searched.

Question 10

Q

What is the purpose of the leader node?

Answer

A

The leader node is the query planner node: it distributes the query to the data nodes in the cluster.
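A simplified scatter-gather model of the leader node's role, assuming an AVG query over hypothetical per-slice data. Real Redshift compiles and ships query segments to the compute nodes; this sketch only shows the idea that each slice computes a partial result and the leader combines them.

```python
from typing import List, Tuple

# Hypothetical: each data-node slice holds its own portion of a "sales" column.
slices: List[List[int]] = [
    [10, 20, 30],  # slice 0
    [5, 15],       # slice 1
    [100],         # slice 2
]


def slice_partial_aggregate(values: List[int]) -> Tuple[int, int]:
    # Work each slice does on its local data only: a partial SUM and COUNT.
    return (sum(values), len(values))


def leader_node_avg(all_slices: List[List[int]]) -> float:
    # The leader sends the same step to every slice, then combines the partials.
    partials = [slice_partial_aggregate(s) for s in all_slices]
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count


print(leader_node_avg(slices))  # 30.0
```

Sending back SUM and COUNT rather than per-slice averages is what lets the leader compute the correct overall average.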

Question 11

Q

Is Redshift an OLAP or OLTP system?

Answer

A

It is OLAP (online analytical processing).

Question 12

Q

Is Redshift regional or global?

Answer

A

A Redshift cluster lives in a single subnet in a single AZ. Its components need to communicate fast, and keeping them close together keeps that communication local.

Question 13

Q

Is data compressed in Redshift?

Answer

A

You can have data compressed in Redshift. This is not blanket compression: it is defined per column when you create a table.
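A sketch of per-column compression, assuming a hypothetical `sales` table. The `ENCODE` keywords used (`az64`, `bytedict`, `zstd`, `raw`) are Redshift column compression encodings; the Python helper just assembles the `CREATE TABLE` statement.

```python
# Hypothetical table and columns; each column gets its own compression encoding.
columns = [
    ("sale_id", "BIGINT", "az64"),
    ("sale_date", "DATE", "az64"),
    ("region", "VARCHAR(20)", "bytedict"),
    ("comment", "VARCHAR(256)", "zstd"),
    ("amount", "DECIMAL(12,2)", "raw"),  # raw = this column is left uncompressed
]


def create_table_sql(table: str, cols) -> str:
    # One "name TYPE ENCODE encoding" clause per column.
    col_defs = ",\n    ".join(f"{name} {sql_type} ENCODE {enc}" for name, sql_type, enc in cols)
    return f"CREATE TABLE {table} (\n    {col_defs}\n);"


print(create_table_sql("sales", columns))
```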

Question 14

Q

Is Redshift a service or do you get a cluster of nodes?

Answer

A

You get a cluster of nodes, one leader and the rest are data nodes.

Question 15

Q

Does Redshift support encryption?

Answer

A

Yes. You can use KMS or CloudHSM; with KMS you can use AWS managed CMKs or your own CMK.

Question 16

Q

Can I resize a cluster?

Answer

A

You have two options, elastic resize and classic resize. Classic resize creates a new cluster and copies the data across from the old one. Elastic resize just adds (or removes) nodes and rebalances the data.
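A minimal sketch of how the two resize types map onto the Redshift `ResizeCluster` API, whose `Classic` flag chooses between them. The helper only builds the keyword arguments (as accepted by boto3's `resize_cluster`); the cluster name is hypothetical, and it deliberately makes no AWS call.

```python
from typing import Optional


def resize_request(cluster_id: str, number_of_nodes: int, classic: bool,
                   node_type: Optional[str] = None) -> dict:
    """Build keyword arguments for the Redshift ResizeCluster API."""
    if number_of_nodes < 2:
        raise ValueError("this sketch only covers multi-node clusters")
    request = {
        "ClusterIdentifier": cluster_id,
        "ClusterType": "multi-node",
        "NumberOfNodes": number_of_nodes,
        # True = classic resize (new cluster, data copied across);
        # False = elastic resize (nodes added or removed, data rebalanced).
        "Classic": classic,
    }
    if node_type is not None:
        request["NodeType"] = node_type
    return request


elastic = resize_request("my-cluster", 4, classic=False)
classic = resize_request("my-cluster", 4, classic=True, node_type="dc2.8xlarge")
# With boto3 installed, a request would be sent as:
#   boto3.client("redshift").resize_cluster(**elastic)
```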

Question 17

Q

I want to increase the size of the Redshift cluster nodes. How can I do this?

Answer

A

You have to use classic resize as it enables the resizing of nodes. A new cluster will be created and the data will be copied over to the new cluster.

Question 18

Q

How is Redshift backed up?

Answer

A

Automatic backups are the default when a cluster is created: Redshift takes snapshots of the cluster automatically, and you can also take manual snapshots. Snapshot data is stored in S3.

Question 19

Q

What services can push or load data into Redshift?

Answer

A

Kinesis
S3
Data Pipeline

Question 20

Q

How often does AWS take snapshots of the Redshift cluster?

Answer

A

Every 6 to 8 hours, or after every 5 GB of data changes per node, whichever comes first.
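The "whichever comes first" rule can be modelled as a simple check. This is only a simplified model using the figures on these cards (8 hours, 5 GB per node) as default thresholds; it is not how AWS schedules snapshots internally.

```python
def automatic_snapshot_due(hours_since_last: float, gb_changed_per_node: float,
                           hours_threshold: float = 8.0, gb_threshold: float = 5.0) -> bool:
    # Whichever limit is reached first triggers the next automatic snapshot.
    return hours_since_last >= hours_threshold or gb_changed_per_node >= gb_threshold


print(automatic_snapshot_due(2, 6))  # True: 5 GB per node reached before 8 hours
print(automatic_snapshot_due(8, 0))  # True: time limit reached
print(automatic_snapshot_due(3, 1))  # False: neither limit reached
```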

Question 21

Q

Is it possible to take a manual snapshot of the Redshift cluster?

Answer

A

Yes. You also set how long you want the snapshot to be retained; -1 keeps it forever.
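A sketch of a manual snapshot request with a retention period, shaped as keyword arguments for the Redshift `CreateClusterSnapshot` API (boto3 `create_cluster_snapshot`). The cluster and snapshot names are hypothetical; the accepted range used in the check (-1, or 1 to 3653 days) follows the API's documented values for `ManualSnapshotRetentionPeriod`.

```python
def manual_snapshot_request(cluster_id: str, snapshot_id: str, retention_days: int = -1) -> dict:
    # -1 keeps the manual snapshot indefinitely; otherwise 1 to 3653 days.
    if retention_days != -1 and not 1 <= retention_days <= 3653:
        raise ValueError("retention_days must be -1 or between 1 and 3653")
    return {
        "SnapshotIdentifier": snapshot_id,
        "ClusterIdentifier": cluster_id,
        "ManualSnapshotRetentionPeriod": retention_days,
    }


keep_forever = manual_snapshot_request("my-cluster", "before-migration")
keep_30_days = manual_snapshot_request("my-cluster", "weekly-check", retention_days=30)
```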

Question 22

Q

I am concerned about DR for my Redshift, what options do I have?

Answer

A

You can configure snapshots to be replicated to another region; you select the region and the retention period.
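Cross-region snapshot copy is turned on per cluster. The sketch below builds keyword arguments for the Redshift `EnableSnapshotCopy` API (boto3 `enable_snapshot_copy`); the cluster name, region, and grant name are hypothetical, and the 7-day default here is just a choice for the example.

```python
from typing import Optional


def cross_region_copy_request(cluster_id: str, destination_region: str,
                              retention_days: int = 7,
                              copy_grant_name: Optional[str] = None) -> dict:
    request = {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": destination_region,
        # Days to keep copied automated snapshots in the destination region.
        "RetentionPeriod": retention_days,
    }
    if copy_grant_name:
        # Needed when the source cluster is encrypted with a KMS key.
        request["SnapshotCopyGrantName"] = copy_grant_name
    return request


dr_copy = cross_region_copy_request("my-cluster", "us-west-2")
```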

Question 23

Q

If I want to restore a table from a snapshot/backup, is this possible?

Answer

A

Yes, you can select the backup/snapshot and then the database and the table.
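The snapshot, database, and table choices correspond to fields of the Redshift `RestoreTableFromClusterSnapshot` API (boto3 `restore_table_from_cluster_snapshot`). A sketch of the request, with hypothetical cluster, snapshot, database, and table names:

```python
def table_restore_request(cluster_id: str, snapshot_id: str, database: str, table: str,
                          new_table_name: str, schema: str = "public") -> dict:
    return {
        "ClusterIdentifier": cluster_id,
        "SnapshotIdentifier": snapshot_id,
        "SourceDatabaseName": database,
        "SourceSchemaName": schema,
        "SourceTableName": table,
        # Name of the table to create from the snapshot copy.
        "NewTableName": new_table_name,
    }


restore = table_restore_request("my-cluster", "nightly-snap", "dev", "sales", "sales_restored")
```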

Question 24

Q

I want to be able to restore my cluster in the event of a disaster. What options do I have?

Answer

A

You can have the Redshift cluster take snapshots/backups and then you will be able to restore.

Question 25

Q

What is the maximum amount of data Redshift can manage?

Question 26

Q

What type of database is Redshift?

Answer

A

Columnar database

Question 27

Q

Is Redshift an OLAP or OLTP?

Question 28

Q

I want to increase the amount of data in my Redshift cluster. How can I do this?

Answer

A

Increase the number of nodes, as each node is a compute and storage unit.

Question 29

Q

What types of nodes do you get in a Redshift cluster?

Answer

A

You get a leader node and data nodes

Question 30

Q

For data nodes, are there different types of nodes?

Answer

A

Yes, you have two instance type options,

Instance DC2 (SSD)
Instance DS2 (Magnetic)

Question 31

Q

I have one large file (1 TB). What should I do when loading it into Redshift, and why?

Answer

A

Split the file into multiple smaller files so that they can be loaded in parallel across the separate nodes (and their slices) in the Redshift cluster.
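A sketch of the splitting step, assuming the file's lines can be loaded in any order (for example, headerless CSV). It makes the number of parts a multiple of the slice count so every slice gets work; the `part_` naming is a hypothetical convention so that one S3 prefix covers all parts.

```python
from typing import List


def split_lines(lines: List[str], total_slices: int, files_per_slice: int = 1) -> List[List[str]]:
    # Number of output files is a multiple of the slice count.
    n_files = total_slices * files_per_slice
    parts: List[List[str]] = [[] for _ in range(n_files)]
    for i, line in enumerate(lines):
        parts[i % n_files].append(line)  # round-robin keeps part sizes roughly equal
    return parts


def part_names(prefix: str, n_files: int) -> List[str]:
    # Hypothetical naming: a shared prefix lets a single COPY pick up every part.
    return [f"{prefix}{i:03d}" for i in range(n_files)]


# Example: 2 dc2.large nodes x 2 slices per node = 4 slices.
lines = [f"row{i}" for i in range(10)]
parts = split_lines(lines, total_slices=4)
names = part_names("sales/part_", len(parts))
print([len(p) for p in parts])  # [3, 3, 2, 2]
```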

Question 32

Q

What are the two operations you perform on a Redshift cluster to get data in and out?

Answer

A

load and unload
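In SQL these two operations are the `COPY` and `UNLOAD` commands. A sketch that builds both statements, with a hypothetical bucket, table, and IAM role ARN; note that `UNLOAD` takes its query as a quoted string, so single quotes inside the query are doubled.

```python
# Hypothetical role ARN; bucket and table names below are also placeholders.
IAM_ROLE = "arn:aws:iam::123456789012:role/MyRedshiftRole"


def copy_sql(table: str, s3_prefix: str) -> str:
    # Load: COPY reads every S3 object whose key starts with s3_prefix.
    return f"COPY {table} FROM '{s3_prefix}' IAM_ROLE '{IAM_ROLE}' FORMAT AS CSV;"


def unload_sql(query: str, s3_prefix: str) -> str:
    # Unload: escape single quotes in the query by doubling them.
    quoted = query.replace("'", "''")
    return f"UNLOAD ('{quoted}') TO '{s3_prefix}' IAM_ROLE '{IAM_ROLE}' FORMAT AS PARQUET;"


print(copy_sql("sales", "s3://my-bucket/sales/part_"))
print(unload_sql("SELECT * FROM sales WHERE region = 'eu-west'", "s3://my-bucket/unload/sales_"))
```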

Question 33

Q

Where is Redshift deployed to?

Question 34

Q

Can you purchase reservations?

Question 35

Q

I want to be able to store user information and update individual user data fields. Is Redshift suitable? Give a reason.

Answer

A

Redshift is an OLAP (columnar) database and is not suitable for OLTP-type data.

Question 36

Q

Can you make a Redshift cluster public?

Question 37

Q

How do backups work on Redshift?

Answer

A

You can take snapshots manually and automatically. Snapshots are incremental, and you can restore the cluster from any of them.

Question 38

Q

What are the AWS services that can put data into Redshift?

Answer

A

Data Pipeline
Kinesis Data Firehose
S3

Question 39

Q

How can I increase the DR capabilities of Redshift?

Answer

A

Ensure automatic snapshots are configured, and enable cross-region snapshots to copy the S3 snapshots to another region.

Question 40

Q

Can I restore just a table and not the whole database?

Answer

A

Yes, you have the ability to restore just a table.

Question 41

Q

What is RedShift?

Answer

A

Redshift is a fully managed, fast and powerful, petabyte-scale data warehouse service.

Question 42

Q

What is the smallest Redshift cluster you can have?

Answer

A

One node; it acts as both the compute node and the leader node.

Question 43

Q

What is the purpose of the leader node?

Answer

A

It distributes incoming requests to the data nodes and collects the results.

Question 44

Q

In a Redshift cluster, how many leader nodes will it take to store 1 PB of data?

Answer

A

None. Leader nodes do not store data; data nodes store data in Redshift.

Question 45

Q

How can you query data in a Redshift cluster?

Answer

A

Using SQL (Redshift's SQL dialect is based on PostgreSQL).

Question 46

Q

Can I select from a T2 micro and a T2 standard instance when creating a Redshift cluster?

Answer

A

No, only two instance types are supported:

DC2 instance types
DS2 instance types

Question 47

Q

When we load data, what are we doing?

Answer

A

Putting data into the Redshift cluster.

Question 48

Q

Where is data stored in the Redshift cluster?

Answer

A

Data is stored in slices in the data nodes; a data node can have either 2 or 16 slices. Each slice queries its own data to produce a partial result.
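A sketch of how rows could be placed on slices by hashing a distribution key. It uses `zlib.crc32` only because it is deterministic; Redshift's actual hash function is not assumed here. The point is that rows with the same key always land on the same slice.

```python
import zlib
from collections import defaultdict
from typing import Dict, List


def assign_slice(dist_key: str, total_slices: int) -> int:
    # Deterministic hash of the distribution key, mapped onto a slice number.
    return zlib.crc32(dist_key.encode("utf-8")) % total_slices


def distribute(rows: List[dict], key_column: str, total_slices: int) -> Dict[int, List[dict]]:
    placement: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        placement[assign_slice(str(row[key_column]), total_slices)].append(row)
    return placement


# Example: 2 dc2.large nodes x 2 slices per node = 4 slices (hypothetical rows).
rows = [{"customer_id": c, "amount": a}
        for c, a in [("c1", 10), ("c2", 20), ("c1", 5), ("c3", 7)]]
placement = distribute(rows, "customer_id", total_slices=4)
```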

Question 49

Q

I am using ODBC and I need to load data into Redshift. Do I need a third-party product to load that data?

Answer

A

No, ODBC is supported by Redshift.

Question 50

Q

I am using JDBC and I need to load data into Redshift. Do I need a third-party product to load that data?

Answer

A

No, JDBC is supported by Redshift.

Question 51

Q

As Redshift is a managed service from AWS, I am concerned I will not be able to deploy Redshift to my VPC. Is this concern valid?

Answer

A

No. An RS cluster can be deployed to your VPC.

Question 52

Q

I want to have a public-facing Redshift cluster. How can I do this?

Answer

A

It is an option: deploy the cluster to a public subnet in your VPC.

Question 53

Q

By default where is the RS cluster deployed?

Answer

A

To the default VPC

Question 54

Q

I am architecting a solution. My org does not want any data in transit over the public internet, and I have data in S3 to be loaded into Redshift. How can I architect this solution so that no data goes over the internet?

Answer

A

Put the RS cluster in a private subnet of your VPC, with a gateway VPC endpoint for S3, so the data can be loaded without going over the public internet.
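A sketch of the two pieces of configuration involved, expressed as keyword arguments: an S3 gateway endpoint (EC2 `CreateVpcEndpoint`, boto3 `create_vpc_endpoint`) and the network-related fields of a Redshift `CreateCluster` request. All IDs and names are hypothetical, and nothing here calls AWS.

```python
from typing import List


def s3_gateway_endpoint_request(vpc_id: str, region: str, route_table_ids: List[str]) -> dict:
    # Gateway endpoint so S3 traffic stays on the AWS network.
    return {
        "VpcEndpointType": "Gateway",
        "VpcId": vpc_id,
        "ServiceName": f"com.amazonaws.{region}.s3",
        "RouteTableIds": route_table_ids,  # route tables of the private subnets
    }


def private_cluster_network_settings(subnet_group: str, security_group_ids: List[str]) -> dict:
    # Network-related fields to merge into a Redshift CreateCluster request.
    return {
        "ClusterSubnetGroupName": subnet_group,  # subnet group of private subnets
        "VpcSecurityGroupIds": security_group_ids,
        "PubliclyAccessible": False,             # no public IP for the cluster
        "EnhancedVpcRouting": True,              # COPY/UNLOAD traffic goes through the VPC
    }


endpoint = s3_gateway_endpoint_request("vpc-0abc", "us-east-1", ["rtb-0private"])
network = private_cluster_network_settings("private-subnets", ["sg-0redshift"])
```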

Question 55

Q

I know I am going to be using my RS cluster for the next 3 years and I want to reduce the cost. How can I do this?

Answer

A

You can do a reservation for the nodes in RS, just like EC2 reservations and this will save you on cost.

Question 56

Q

My org requires data at rest encryption, how can I implement this in RS?

Answer

A

Just like other AWS products and services, RS supports encryption of data at rest, using customer managed or AWS managed CMKs.
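A sketch of the encryption-related fields of a Redshift `CreateCluster` request. Leaving out `KmsKeyId` is assumed to fall back to the AWS managed key for Redshift; passing a key ARN (hypothetical below) selects a customer managed CMK.

```python
from typing import Optional


def encryption_settings(kms_key_id: Optional[str] = None) -> dict:
    # Fields to merge into a Redshift CreateCluster request.
    settings = {"Encrypted": True}
    if kms_key_id is not None:
        # Customer managed CMK; without it, the AWS managed key is used.
        settings["KmsKeyId"] = kms_key_id
    return settings


aws_managed = encryption_settings()
customer_managed = encryption_settings("arn:aws:kms:us-east-1:123456789012:key/example-key-id")
```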

Question 57

Q

Can I resize RS?

Answer

A

Yes, two options:

Classic resize: Create a new cluster and copy the data.
Elastic resize: You can just change the number of nodes.

Question 58

Q

When you resize an RS cluster, is there some disruption?

Question 59

Q

When I am using RS to access S3 through a VPC endpoint, what option do I need to enable?

Answer

A

Enhanced VPC routing

Question 60

Q

How are backups created in RS?

Answer

A

Scheduled and manual snapshots

Question 61

Q

How long are automatic snapshots retained for RS?

Answer

A

You set the retention period; afterwards the data is deleted.

Question 62

Q

Can I take manual snapshots for RS?

Question 63

Q

How long are manual snapshots retained for RS?

Answer

A

You set the retention period; afterwards the data is deleted.

Question 64

Q

How can I load data into an RS cluster?

Answer

A

Data Pipeline

Answer 58

A

Data Pipeline has a template to load MySQL data into the RS cluster.

Answer 59

A

Data Pipeline has a template for copying data on a schedule.

Answer 60

A

You can use the Data Pipeline template to copy S3 data to RS and modify it so it runs on a schedule.

Answer 61

A

No, RS has this ability natively.

Answer 62

A

Every 6 to 8 hours or after 5 GB of data changes, whichever comes first.

Answer 63

A

Yes, there is a parameter to have ‘no backup’ on a per-table basis.
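The per-table setting is the `BACKUP NO` clause of `CREATE TABLE`, which excludes the table from snapshots; it suits staging data you can reload. A sketch that builds the statement for a hypothetical staging table:

```python
def create_staging_table_sql(table: str, columns: str, backup: bool = True) -> str:
    # BACKUP NO keeps this table out of snapshots.
    backup_clause = "" if backup else " BACKUP NO"
    return f"CREATE TABLE {table} ({columns}){backup_clause};"


print(create_staging_table_sql("staging_events", "event_id BIGINT, payload VARCHAR(1024)", backup=False))
# CREATE TABLE staging_events (event_id BIGINT, payload VARCHAR(1024)) BACKUP NO;
```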

Answer 64

A

The backup will never expire or be deleted.

Answer 65

A

Enable cross-region replication to have a copy of the data in another region.

Answer 66

A

No, you have two options,

Full restore
Table restore

Answer 67

A

No. Each node also copies its data to another node in the cluster; you do not pay extra for this replicated data.

Answer 68

A

No, RS is a managed service and patching is performed by AWS.

Answer 69

A

Up or out: you can change the size of the instance or add more instances.

Answer 70

A

Now, if you select it, or in the maintenance window.

Answer 71

A

You can use CloudHSM for key management.

Answer 72

A

Add an SQS queue to the ingestion layer to buffer writes to the RDS instance (RDS instance will not support data for 2 years)
Ingest data into a DynamoDB table and move old data to a Redshift cluster (Handle 10K IOPS ingestion and store data into Redshift for analysis)
Replace the RDS instance with a 6 node Redshift cluster with 96TB of storage (Does not handle the ingestion issue)
Keep the current architecture but upgrade RDS storage to 3TB and 10K provisioned IOPS (RDS instance will not support data for 2 years)

Answer 73

A

Yes. RS backs up your data automatically, every 6 to 8 hours or whenever 5 GB of data has changed. Retention can be set from 1 to 35 days.

Answer 74

A

Use reserved instances for RS clusters that run for very long periods.

Answer 75

A

You can use Redshift Spectrum.

Answer 76

A

You can use Redshift Spectrum. Spectrum can be used over ODBC, just like Redshift, and can also query S3.

Answer 77

A

You can't. Spot Instances can be reclaimed by AWS at any point, so they cannot be used for Redshift leader or data nodes, which require always-running instances.

Answer 78

A

In a single region and in a single AZ.

Answer 79

A

No. With Redshift you can select from a limited number of instance types:

ra3.16xlarge (48 vCPU)
dc2.large (2 vCPU)
dc2.8xlarge (32 vCPU)

Answer 80

A

No, you can create a new cluster.

Answer 81

A

It depends. Each node has an instance size and an amount of storage; there are 3 instance sizes, and each has a different amount of storage. You can scale RS up to 8 PB of storage by scaling out the number of data nodes to 128, that is, 128 nodes with 64 TB each.
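The capacity arithmetic, using the figures from this card (128 data nodes, 64 TB per node) as assumptions; per-node storage differs by node type and has changed over time, so treat the constants as placeholders.

```python
import math

# Figures from the card above; adjust for the node type actually in use.
MAX_NODES = 128
TB_PER_NODE = 64


def max_cluster_tb(nodes: int = MAX_NODES, tb_per_node: int = TB_PER_NODE) -> int:
    return nodes * tb_per_node


def nodes_needed(data_tb: float, tb_per_node: int = TB_PER_NODE) -> int:
    # Round up: a partial node's worth of data still needs a whole node.
    n = math.ceil(data_tb / tb_per_node)
    if n > MAX_NODES:
        raise ValueError("exceeds the maximum cluster size")
    return n


print(max_cluster_tb())         # 8192 (TB)
print(max_cluster_tb() / 1024)  # 8.0 (PB)
print(nodes_needed(1000))       # 16
```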

Answer 82

A

Yes. It is just in a VPC, so you can add an internet gateway and give it an Elastic IP (EIP), all configurable through your own network provisioning.

Answer 83

A

You can use,

KMS - AWS Managed CMKs
KMS - Customer managed CMK
CloudHSM

Answer 84

A

1 - 35 days

Answer 85

A

Use CloudFormation to recreate the RS cluster in another region if needed.
Use backups with cross-region replication enabled to ensure backups are offloaded to another region and can be used to recreate the data.

Answer 86

A

RS has two types of backups,

Manual (You take them)
Automatic (every 8 hours or 5 GB per node of data change)

Answer 87

A

You have to create a snapshot schedule, which defines when you want snapshots created for your cluster. The schedule is attached to one or more clusters.
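A sketch of the two steps as keyword arguments: create the schedule (Redshift `CreateSnapshotSchedule`, using a `rate(...)` definition) and attach it to a cluster (`ModifyClusterSnapshotSchedule`). Schedule and cluster names are hypothetical; nothing is sent to AWS.

```python
from typing import List


def snapshot_schedule_request(schedule_id: str, every_hours: int) -> dict:
    # Fields for CreateSnapshotSchedule; one rate() expression.
    if every_hours < 1:
        raise ValueError("every_hours must be at least 1")
    unit = "hour" if every_hours == 1 else "hours"
    definitions: List[str] = [f"rate({every_hours} {unit})"]
    return {"ScheduleIdentifier": schedule_id, "ScheduleDefinitions": definitions}


def attach_schedule_request(cluster_id: str, schedule_id: str) -> dict:
    # Fields for ModifyClusterSnapshotSchedule: associate the schedule with one cluster.
    return {"ClusterIdentifier": cluster_id, "ScheduleIdentifier": schedule_id}


schedule = snapshot_schedule_request("every-12-hours", 12)
attach = attach_schedule_request("my-cluster", "every-12-hours")
```

Attaching the same `ScheduleIdentifier` to several clusters is how one schedule serves more than one cluster.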

Answer 88

A

It is not restored to the existing cluster; you get a new cluster.

Answer 89

A

You can use the cluster straight away, data is streamed as needed.

Answer 90

A

Sort of. Backup storage up to the same size as the storage on your RS data nodes is included; once your snapshots go over that, you are charged at the normal rate.

Answer 91

A

RS snapshots are managed by AWS RS and not visible to you in an S3 bucket.

Answer 92

A

Yes, you select the snapshot, database and table

Answer 93

A

You will need a subnet group and security group
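A sketch of the subnet group request (Redshift `CreateClusterSubnetGroup`, boto3 `create_cluster_subnet_group`), with hypothetical names and subnet IDs. The security group is supplied separately when the cluster is created, through `VpcSecurityGroupIds`.

```python
from typing import List


def cluster_subnet_group_request(name: str, subnet_ids: List[str]) -> dict:
    # Fields for CreateClusterSubnetGroup; Description is required by the API.
    if not subnet_ids:
        raise ValueError("a cluster subnet group needs at least one subnet")
    return {
        "ClusterSubnetGroupName": name,
        "Description": f"Subnets for Redshift cluster group {name}",
        "SubnetIds": subnet_ids,
    }


subnet_group = cluster_subnet_group_request("redshift-private", ["subnet-0aaa", "subnet-0bbb"])
```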
