Data Engineering Part 4 Flashcards

(19 cards)

1
Q

What is data cleaning?

A

The process of detecting and correcting errors or inconsistencies in data.

2
Q

What are common data quality issues?

A

Missing values, duplicates, inconsistent formats, and outliers.
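A minimal sketch of how three of these issues might be counted in a batch of records. The field names (`id`, `email`, `signup`), the sample rows, and the choice of `YYYY-MM-DD` as the expected date format are assumptions made for illustration only.

```python
import re
from collections import Counter

# Assumed convention for this sketch: dates should look like YYYY-MM-DD.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")

def profile(rows, key, date_field):
    """Count missing values, duplicate keys, and off-format dates."""
    missing = sum(1 for r in rows for v in r.values() if v in (None, ""))
    key_counts = Counter(r[key] for r in rows)
    duplicates = sum(n - 1 for n in key_counts.values())  # extra copies per key
    bad_dates = sum(
        1 for r in rows
        if r[date_field] and not ISO_DATE.match(r[date_field])
    )
    return {
        "missing_values": missing,
        "duplicate_keys": duplicates,
        "inconsistent_dates": bad_dates,
    }

rows = [
    {"id": 1, "email": "a@x.com", "signup": "2024-01-05"},
    {"id": 2, "email": "",        "signup": "01/07/2024"},
    {"id": 2, "email": "b@x.com", "signup": "2024-01-07"},
    {"id": 3, "email": None,      "signup": "2024-01-09"},
]
report = profile(rows, key="id", date_field="signup")
```

For these sample rows the report counts two missing values, one duplicate key, and one inconsistent date.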

3
Q

What is deduplication?

A

Identifying and removing duplicate records.
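One simple way to deduplicate is to keep the first record seen for each key. The sketch below assumes records are dictionaries and that `order_id` identifies a duplicate; both are illustrative choices, and real pipelines often need fuzzier matching.

```python
def deduplicate(records, key_fields):
    """Keep the first record for each distinct combination of key fields."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

orders = [
    {"order_id": 100, "customer": "ana", "amount": 25.0},
    {"order_id": 101, "customer": "ben", "amount": 40.0},
    {"order_id": 100, "customer": "ana", "amount": 25.0},  # repeated record
]
deduped = deduplicate(orders, ["order_id"])
```

Here `deduped` keeps orders 100 and 101 in their original order.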

4
Q

What is schema validation?

A

Ensuring that incoming data matches a predefined schema.
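As a rough illustration, a schema can be expressed as a mapping from field name to expected type, and each incoming record checked against it. `SCHEMA` and the field names are hypothetical; production systems usually rely on a dedicated schema language rather than a hand-rolled check like this.

```python
# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"user_id": int, "email": str, "age": int}

def validate_schema(record, schema):
    """Return a list of schema violations (an empty list means the record conforms)."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    for field in sorted(record.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors

good = {"user_id": 1, "email": "a@x.com", "age": 30}
bad = {"user_id": "1", "email": "a@x.com", "plan": "pro"}

good_errors = validate_schema(good, SCHEMA)
bad_errors = validate_schema(bad, SCHEMA)
```

The second record fails three ways: `user_id` has the wrong type, `age` is missing, and `plan` is not part of the schema.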

5
Q

What is outlier detection?

A

Identifying data points that deviate significantly from others.
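One common heuristic for "deviates significantly" is the interquartile range (IQR) rule: flag values more than 1.5 IQRs below the first quartile or above the third. The sketch below is one possible approach, not the only one, and the latency values are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

latencies_ms = [102, 98, 105, 99, 101, 97, 103, 100, 950]
outliers = iqr_outliers(latencies_ms)
```

With this sample, only the 950 ms reading falls outside the fences.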

6
Q

What is referential integrity?

A

Ensuring that every foreign key value matches an existing primary key in the related table.
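A small sketch using Python's built-in `sqlite3` module shows a database rejecting an orphaned row. The `customers` and `orders` tables are invented for the example. Note that SQLite only enforces foreign keys once `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id)
    )
""")
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders (id, customer_id) VALUES (10, 1)")  # valid parent

try:
    # Customer 99 does not exist, so this row would be an orphan.
    conn.execute("INSERT INTO orders (id, customer_id) VALUES (11, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The second insert raises `sqlite3.IntegrityError`, so only the order with a valid customer is stored.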

7
Q

What is a null constraint?

A

A rule (NOT NULL in SQL) that disallows null values in a column.

8
Q

What is a uniqueness constraint?

A

A rule that ensures all values in a column, or combination of columns, are unique.

9
Q

What is a check constraint?

A

A rule that enforces a specific Boolean condition on column values, such as age >= 0.
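The null, uniqueness, and check constraints from the last three cards can be seen together in one table definition. This is a sketch with Python's built-in `sqlite3` module; the `users` table, its columns, and the `violates` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,        -- null and uniqueness constraints
        age   INTEGER CHECK (age >= 0)     -- check constraint
    )
""")
conn.execute("INSERT INTO users (email, age) VALUES ('a@example.com', 30)")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

null_rejected = violates("INSERT INTO users (email, age) VALUES (NULL, 25)")
duplicate_rejected = violates("INSERT INTO users (email, age) VALUES ('a@example.com', 41)")
check_rejected = violates("INSERT INTO users (email, age) VALUES ('b@example.com', -1)")

user_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

All three inserts are rejected, leaving only the original row in the table.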

10
Q

Why is data integrity important?

A

It keeps data accurate and consistent, so downstream users and systems can trust and use it.

11
Q

What is schema evolution?

A

The ability to handle changes to a data schema over time.

12
Q

What is backward compatibility in schema evolution?

A

A newer schema can read data written with an older schema, so upgraded consumers can still process existing data.

13
Q

What is forward compatibility?

A

An older schema can read data written with a newer schema, so existing consumers keep working after producers upgrade.
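Both compatibility directions come down to how a reader resolves a record that was written under a different schema version. The sketch below is loosely modeled on default-based schema resolution as used by formats such as Avro; the `REQUIRED` sentinel, the two schema dictionaries, and the `read` function are all hypothetical.

```python
REQUIRED = object()  # sentinel meaning "this field has no default"

# Version 2 adds one field and gives it a default value.
SCHEMA_V1 = {"id": REQUIRED, "name": REQUIRED}
SCHEMA_V2 = {"id": REQUIRED, "name": REQUIRED, "country": "unknown"}

def read(record, reader_schema):
    """Resolve a record against the schema the reader expects."""
    out = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]
        elif default is not REQUIRED:
            out[field] = default  # writer did not know this field
        else:
            raise ValueError(f"missing required field: {field}")
    return out  # fields the reader does not know are dropped

old_record = {"id": 1, "name": "Ana"}                   # written with v1
new_record = {"id": 2, "name": "Ben", "country": "PT"}  # written with v2

new_reader_old_data = read(old_record, SCHEMA_V2)
old_reader_new_data = read(new_record, SCHEMA_V1)
```

The v2 reader fills `country` with its default when given v1 data, and the v1 reader ignores `country` when given v2 data. Adding a field without a default would break the first case.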

14
Q

Why is schema evolution important?

A

To support ongoing development and changes in data sources.

15
Q

Which formats support schema evolution?

A

Avro, Parquet, and Protobuf.

16
Q

What is data validation?

A

The process of checking data against rules or constraints.

17
Q

What is a validation rule?

A

A rule used to assess the correctness or quality of data.
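Validation rules can be sketched as named predicates, with data validation being the act of running a record through them. The rule names, fields, and allowed status values below are assumptions for illustration.

```python
# Each hypothetical rule maps a readable name to a predicate over a record.
RULES = {
    "amount is positive": lambda r: r["amount"] > 0,
    "currency is a 3-letter code": lambda r: isinstance(r["currency"], str)
    and len(r["currency"]) == 3,
    "status is known": lambda r: r["status"] in {"pending", "paid", "refunded"},
}

def failed_rules(record, rules):
    """Return the names of every rule the record does not satisfy."""
    return [name for name, check in rules.items() if not check(record)]

ok = {"amount": 12.5, "currency": "USD", "status": "paid"}
bad = {"amount": -3, "currency": "US", "status": "paid"}

ok_failures = failed_rules(ok, RULES)
bad_failures = failed_rules(bad, RULES)
```

The first record passes every rule, while the second fails the amount and currency rules.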

18
Q

What is anomaly detection?

A

Identifying data patterns that do not conform to expected behavior.
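In a pipeline, "expected behavior" is often learned from history, for example the usual number of rows loaded per day. The sketch below uses the modified z-score, based on the median absolute deviation, with the commonly cited 3.5 cutoff. The row counts are invented, and this is only one of many possible detectors.

```python
import statistics

def is_anomalous(history, today, threshold=3.5):
    """Flag today's value if its modified z-score against history exceeds the threshold."""
    median = statistics.median(history)
    mad = statistics.median(abs(x - median) for x in history)
    if mad == 0:
        return today != median  # history is constant, so any change is unusual
    score = 0.6745 * abs(today - median) / mad
    return score > threshold

daily_row_counts = [1000, 1020, 990, 1010, 1005, 995, 1015]

normal_day = is_anomalous(daily_row_counts, today=1012)
broken_load = is_anomalous(daily_row_counts, today=120)
```

A load of 1012 rows is within the usual range, while a load of 120 rows is flagged.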

19
Q

What is the difference between validation and cleaning?

A

Validation checks data quality; cleaning fixes detected issues.
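The split can be illustrated with two functions: one that only reports problems and one that returns a corrected copy. The fields, the normalization choices, and the placeholder `default_age` are hypothetical; how to fill a missing value is a domain decision.

```python
def validate(rows):
    """Report issues without changing the data."""
    issues = []
    for i, row in enumerate(rows):
        if row["email"] != row["email"].strip().lower():
            issues.append((i, "email not normalized"))
        if row["age"] is None:
            issues.append((i, "age missing"))
    return issues

def clean(rows, default_age=0):
    """Return a corrected copy: normalize emails and fill missing ages."""
    return [
        {
            "email": row["email"].strip().lower(),
            "age": default_age if row["age"] is None else row["age"],
        }
        for row in rows
    ]

raw = [
    {"email": " Ana@Example.com ", "age": 31},
    {"email": "ben@example.com", "age": None},
]

issues_before = validate(raw)
cleaned = clean(raw)
issues_after = validate(cleaned)
```

Validation finds two issues in the raw rows and none after cleaning, and the raw input itself is left unchanged.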