Storage and Data Management: 22% (Redshift, S3, Lake Formation, Glue Data Catalog, HDFS, EMRFS) Flashcards

Be able to:
a) determine the operational characteristics of a storage solution for analytics
b) determine data access and retrieval patterns
c) select an appropriate data layout, schema, structure, and format
d) define a data lifecycle based on usage patterns and business requirements
e) determine an appropriate system for cataloging data and managing metadata

1
Q

Which services are appropriate for building data lakes on AWS?

A

S3, Lake Formation

2
Q

Which services are appropriate for building data warehouses on AWS?

A

Redshift

3
Q

Which storage service is appropriate for highly structured data serving as a single source of truth?

A

Redshift

4
Q

Name three roles that Lake Formation fills

A

a) organising and curating ingested data
b) securing lake data
c) orchestrating transformation jobs with other services

5
Q

What sort of data can be stored in an S3 data lake: structured, semi-structured or unstructured?

A

All three

6
Q

Is Lake Formation used to create ETL operations?

A

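Not directly. ETL jobs are defined and run in AWS Glue; Lake Formation orchestrates them, for example through blueprints that generate Glue workflows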

7
Q

Name three user-defined components of an S3 object URL

A

a) region
b) bucket name
c) object key
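
As a rough illustration of how these three components fit together, the sketch below assembles a virtual-hosted-style object URL. The function name `s3_object_url` and the bucket, region and key values are hypothetical, chosen only for the example; the sketch assumes a standard commercial AWS region.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build a virtual-hosted-style URL from bucket, region and object key."""
    # quote() percent-encodes unsafe characters such as spaces,
    # but leaves the "/" separators in the key untouched.
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}"


# Hypothetical values, for illustration only.
print(s3_object_url("example-bucket", "eu-west-1", "raw/2024/events 01.json"))
# -> https://example-bucket.s3.eu-west-1.amazonaws.com/raw/2024/events%2001.json
```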

8
Q

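Is Redshift a row-oriented or columnar database?

A

Columnar. Redshift is still a relational (SQL) database, but it stores each column's values together rather than each row's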

9
Q

Name the key difference between columnar and row-oriented databases

A

Row-oriented databases (the layout used by most traditional relational engines) are optimised for fast retrieval of rows, typically for transactional applications

Columnar databases are optimised for fast retrieval of columns, typically for analytical applications
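
The difference can be sketched with plain Python data structures. This is a toy model rather than how any real engine stores data on disk, and the table contents and helper names (`fetch_order`, `total_amount`) are made up for the example.

```python
# Row-oriented layout: each record is kept together.
rows = [
    {"order_id": 1, "customer": "ann", "amount": 20.0},
    {"order_id": 2, "customer": "bob", "amount": 35.0},
    {"order_id": 3, "customer": "cat", "amount": 10.0},
]

# Columnar layout: all values of one column are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def fetch_order(rows, order_id):
    """Transactional-style access: return one whole record."""
    return next(row for row in rows if row["order_id"] == order_id)


def total_amount(columns):
    """Analytical-style access: scan a single column, ignore the rest."""
    return sum(columns["amount"])


print(fetch_order(rows, 2))   # reads one complete row
print(total_amount(columns))  # reads only the "amount" column
```

In the row layout a single lookup returns every field of the order at once, while in the columnar layout an aggregate touches one list and never reads the `customer` or `order_id` values.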

10
Q

Name two Apache columnar (strictly, wide-column) databases that can be hosted on AWS

A

Cassandra and HBase

11
Q

What is the fastest way to load data into Redshift?

A

Bulk loading with the COPY command from multiple compressed files in S3, which Redshift reads in parallel
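
A minimal sketch of what such a load might look like, written as a Python helper that builds the COPY statement. The helper name `build_copy_statement`, the table, the S3 prefix and the role ARN are all hypothetical, and the sketch assumes gzip-compressed, pipe-delimited files that share a common key prefix.

```python
def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement for gzip-compressed, pipe-delimited files.

    Plain string formatting is used for readability only; do not pass
    untrusted input to a helper like this.
    """
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"           # every object under this prefix is loaded
        f"IAM_ROLE '{iam_role_arn}'\n"    # role Redshift assumes to read from S3
        "GZIP\n"                          # source files are gzip-compressed
        "DELIMITER '|';"
    )


# Hypothetical names, for illustration only.
print(build_copy_statement(
    "sales",
    "s3://example-bucket/sales/part-",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
))
```

Splitting the input into several files of similar size lets the slices of the cluster work on different files at the same time.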

12
Q

How can a manifest file be used with the Redshift copy command?

A

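A manifest is a JSON file in S3 that lists the exact objects to load. Passing its path to COPY with the MANIFEST option makes Redshift load only those files, which can sit under different prefixes or in different buckets. Marking an entry as mandatory makes the load fail if that file is missing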
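
For reference, a COPY manifest is a small JSON document with an `entries` list. The sketch below generates one in Python; the helper name `build_manifest` and the S3 URLs are hypothetical, used only for the example.

```python
import json


def build_manifest(urls, mandatory=True):
    """Return the JSON text of a COPY manifest listing the given S3 objects."""
    entries = [{"url": url, "mandatory": mandatory} for url in urls]
    return json.dumps({"entries": entries}, indent=2)


# Hypothetical objects, deliberately spread across two buckets.
manifest = build_manifest([
    "s3://example-bucket-a/sales/2024-01.gz",
    "s3://example-bucket-b/archive/sales-2023.gz",
])
print(manifest)
```

Once the manifest is uploaded to S3, the COPY statement points its FROM clause at the manifest object and adds the MANIFEST keyword.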
