Data Engineering Fundamentals Flashcards

1
Q

What is Avro?

A

A row-oriented binary storage format that stores the schema along with the data.

2
Q

What is Parquet?

A

A columnar storage format optimized for analytics.

3
Q

What does random sampling do?

A

It gives every item in the population an equal chance of being selected.
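
A minimal sketch of simple random sampling, assuming a hypothetical population of 100 record IDs and Python's standard `random` module. The seed is only there to make the draw reproducible.

```python
import random

# Hypothetical population of 100 record IDs.
population = list(range(100))

# A fixed seed makes the draw reproducible; drop it for a new sample each run.
rng = random.Random(42)

# sample() draws without replacement, so every item is equally likely to be picked.
sample = rng.sample(population, k=10)
print(sample)
```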

4
Q

What is stratified sampling?

A

It splits the population into subgroups (strata) and samples from each one, so every subgroup is represented.
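
One possible way to sketch stratified sampling in plain Python. The helper `stratified_sample`, the region data, and the 10% fraction are all hypothetical, chosen only to show that a small subgroup still gets represented.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=0):
    """Sample the same fraction from every subgroup, keeping at least one row each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group_rows in strata.values():
        k = max(1, round(len(group_rows) * fraction))
        sample.extend(rng.sample(group_rows, k))
    return sample

# Hypothetical skewed population: 90 "east" rows and 10 "west" rows.
rows = [("east", i) for i in range(90)] + [("west", i) for i in range(10)]
picked = stratified_sample(rows, key=lambda r: r[0], fraction=0.1)

counts = {region: sum(1 for r in picked if r[0] == region) for region in ("east", "west")}
print(counts)  # {'east': 9, 'west': 1}
```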

5
Q

What is systematic sampling?

A

Selecting every Nth item from the population.
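
A short sketch of taking every Nth item with list slicing. The function name `systematic_sample` and the 20-item list are illustrative; in practice the starting offset is usually chosen at random from the first N positions.

```python
def systematic_sample(items, n, start=0):
    """Return every n-th item, beginning at index `start`."""
    return items[start::n]

items = list(range(1, 21))  # hypothetical population: 1..20
print(systematic_sample(items, n=5))           # [1, 6, 11, 16]
print(systematic_sample(items, n=5, start=4))  # [5, 10, 15, 20]
```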

6
Q

What is data skew?

A

An uneven distribution of data across partitions.

7
Q

What can be done to address data skew?

A

Adaptive partitioning

Salting

Repartitioning
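
To illustrate salting, here is a toy sketch that appends a random suffix to each key so that a single hot key spreads over several partitions. The partitioner `partition_of`, the partition count, and the key names are hypothetical stand-ins, not the behavior of any particular engine.

```python
import random
from collections import Counter

NUM_PARTITIONS = 4
SALT_BUCKETS = 4

def partition_of(key):
    # Simple deterministic stand-in for a hash partitioner (hypothetical).
    return sum(key.encode()) % NUM_PARTITIONS

# Skewed input: one "hot" key dominates the data set.
keys = ["hot"] * 1000 + ["a", "b", "c", "d"] * 10

# Without salting, every "hot" row lands in the same partition.
unsalted = Counter(partition_of(k) for k in keys)

# With salting, a random suffix spreads the hot key across partitions.
rng = random.Random(0)
salted = Counter(partition_of(f"{k}_{rng.randrange(SALT_BUCKETS)}") for k in keys)

print("unsalted:", dict(unsalted))
print("salted:  ", dict(salted))
```

When the salted key is used in a join, the other side of the join has to be replicated once per salt value so that matching rows can still meet.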

8
Q

What does the YEAR() function in SQL do?

A

It extracts the year from a date value.
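
As a rough Python analogue (not SQL itself), the same extraction can be sketched with the standard `datetime` module. The date value below is made up for illustration.

```python
from datetime import date

order_date = date.fromisoformat("2024-03-15")  # hypothetical date value
print(order_date.year)  # 2024
```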

9
Q

What does a pivot table do?

A

It turns row-level data into columns, usually aggregating the values for each row and column combination.
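
A small sketch of the idea in plain Python, assuming hypothetical sales rows of (region, quarter, amount). Each distinct quarter becomes a column and the amounts are summed.

```python
from collections import defaultdict

# Hypothetical row-level sales data: (region, quarter, amount).
rows = [
    ("east", "Q1", 100),
    ("east", "Q2", 150),
    ("west", "Q1", 80),
    ("east", "Q1", 20),
]

# Pivot: one output row per region, one column per quarter, summing amounts.
pivot = defaultdict(lambda: defaultdict(int))
for region, quarter, amount in rows:
    pivot[region][quarter] += amount

quarters = sorted({q for _, q, _ in rows})
table = {region: [cols[q] for q in quarters] for region, cols in pivot.items()}

print(quarters)  # ['Q1', 'Q2']
print(table)     # {'east': [120, 150], 'west': [80, 0]}
```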

10
Q

What is the default SQL join?

A

An inner join.

11
Q

How does inner join work?

A

It selects only the rows from Table A that have a matching identifier in Table B.
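
A minimal sketch using Python's built-in `sqlite3` module with two hypothetical tables, `a` and `b`. The bare `JOIN` keyword is used here, which SQLite treats as an inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, city TEXT);
    INSERT INTO a VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO b VALUES (1, 'Oslo'), (3, 'Lima'), (4, 'Rome');
""")

# Only ids present in both tables (1 and 3) survive the join.
inner_rows = conn.execute(
    "SELECT a.id, a.name, b.city FROM a JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()
print(inner_rows)  # [(1, 'Ann', 'Oslo'), (3, 'Cy', 'Lima')]
conn.close()
```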

12
Q

How does a left outer join work?

A

It returns every row in Table A regardless of whether there is a match in Table B. Columns from Table B are filled in only where a match exists and are NULL otherwise.
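
A minimal sketch with Python's built-in `sqlite3` module and two hypothetical tables, `a` and `b`. The row with no match in `b` is kept, with `None` (SQL NULL) in the `city` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, name TEXT);
    CREATE TABLE b (id INTEGER, city TEXT);
    INSERT INTO a VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO b VALUES (1, 'Oslo'), (3, 'Lima'), (4, 'Rome');
""")

# Every row of a is returned; id 2 has no match in b, so its city is NULL.
left_rows = conn.execute(
    "SELECT a.id, a.name, b.city FROM a LEFT JOIN b ON a.id = b.id ORDER BY a.id"
).fetchall()
print(left_rows)  # [(1, 'Ann', 'Oslo'), (2, 'Bob', None), (3, 'Cy', 'Lima')]
conn.close()
```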

13
Q

How does a right outer join work?

A

It returns every row in Table B regardless of whether there is a match in Table A. Columns from Table A are filled in only where a match exists and are NULL otherwise. It is the mirror image of a left join.

14
Q

How does a full outer join work?

A

Every row from both Table A and Table B is returned. Matching rows are combined; where a row has no match, the columns from the other table are NULL.
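
The same behavior can be sketched in plain Python with two hypothetical lookup tables keyed by id. This emulates a full outer join rather than running one in a database; `None` plays the role of NULL.

```python
# Hypothetical tables keyed by id.
a = {1: "Ann", 2: "Bob", 3: "Cy"}      # id -> name
b = {1: "Oslo", 3: "Lima", 4: "Rome"}  # id -> city

# Take the union of keys; .get() yields None where one side has no match.
full_rows = [(k, a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys())]
print(full_rows)
# [(1, 'Ann', 'Oslo'), (2, 'Bob', None), (3, 'Cy', 'Lima'), (4, None, 'Rome')]
```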

15
Q

What does Regex do?

A

It matches text against a pattern.
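
A brief sketch using Python's `re` module, with a made-up input string. Regex operators in SQL vary by database, so this only illustrates the general idea of pattern matching and case sensitivity.

```python
import re

text = "Order ID: AB-1234 shipped"  # hypothetical input

# Matching is case-sensitive by default, so lowercase "ab" does not match "AB".
print(bool(re.search(r"ab-\d{4}", text)))                 # False

# re.IGNORECASE makes the same pattern match regardless of case.
print(bool(re.search(r"ab-\d{4}", text, re.IGNORECASE)))  # True

# A capture group pulls out just the digits.
match = re.search(r"[A-Z]{2}-(\d{4})", text)
print(match.group(1))  # 1234
```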

16
Q

What is the RegEx operator for case sensitivity?

17
Q

What is the RegEx expression operator?

18
Q

What is the RegEx expression to not match?

19
Q

In Git, how do I get files from the repository to my local workspace?

20
Q

How would you initialize a new Git repository?

A

git init

21
Q

What does git config do?

A

It sets configuration values such as user info and aliases.

22
Q

How do you clone or download a repository from an existing URL?

A

git clone url

23
Q

What does git status do?

A

It shows the state of your working directory and staging area: modified, staged, and untracked files. It only inspects the local repository.

24
Q

How do you view commit logs in git?

A

git log

25
What does git branch do?
It lists the local branches and marks the current one.
26
How would you create a new branch?
git branch newBranchName
27
How do you switch branches?
git checkout branchname
28
How do you create a new branch and switch to it?
git checkout -b newBranchName
29
How do you delete a branch?
git branch -d branchname
30
How do you push your changes to the remote repository?
git push
31
What does git pull do?
It fetches changes from a remote repository branch and merges them into the current local branch.
32
What is a transition action in S3?
It is used to move objects from one storage class to another.
33
What are expiration actions in S3?
They configure objects to expire (be deleted) after a specified period of time.
34
Can lifecycle rules be created based on tags or prefixes?
Yes, on both.
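
A sketch of what a lifecycle configuration with transition and expiration actions might look like, written as the dictionary shape that boto3's `put_bucket_lifecycle_configuration` accepts in its `LifecycleConfiguration` parameter. The rule IDs, the `logs/` prefix, the tag, and the day counts are all hypothetical; the code only builds and prints the structure and does not call AWS.

```python
import json

# Hypothetical rules: IDs, prefix, tag, and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            # Prefix-based rule: transition twice, then expire.
            "ID": "archive-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        },
        {
            # Tag-based rule: expire objects tagged lifecycle=temp after a week.
            "ID": "expire-temp-objects",
            "Status": "Enabled",
            "Filter": {"Tag": {"Key": "lifecycle", "Value": "temp"}},
            "Expiration": {"Days": 7},
        },
    ]
}

print(json.dumps(lifecycle_configuration, indent=2))
```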
35
What is the storage class hierarchy for S3?
Standard, Standard-IA, Intelligent-Tiering, One Zone-IA, Glacier Instant Retrieval, Glacier Flexible Retrieval, Glacier Deep Archive
36
What does S3 analytics do?
Helps you decide when to transition objects to the right storage class.
37
What are the targets for S3 event notifications?
Lambda, SNS, and SQS