
Data Engineering Part 2 Flashcards

(15 cards)

1
Q

What does ETL stand for?

A

Extract, Transform, Load.

2
Q

What is the purpose of ETL?

A

To move data from source systems into a centralized store (such as a data warehouse), transforming it along the way so it is ready for analysis.

3
Q

What is the difference between ETL and ELT?

A

ETL transforms data before loading; ELT loads raw data and transforms later.
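The ordering difference can be sketched with an in-memory SQLite database standing in for the warehouse. The table names, the `RAW_ROWS` sample, and the `etl`/`elt` helpers are all hypothetical, chosen only for illustration; a real pipeline would read from actual source systems.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [(" Alice ", "10.50"), ("BOB", "3"), ("carol", "7.25")]


def etl(conn):
    """ETL: clean the rows in Python first, then load only the clean result."""
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in RAW_ROWS]
    conn.execute("CREATE TABLE orders_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def elt(conn):
    """ELT: load the raw rows untouched, then transform inside the database."""
    conn.execute("CREATE TABLE orders_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", RAW_ROWS)
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT lower(trim(customer)) AS customer, CAST(amount AS REAL) AS amount "
        "FROM orders_raw"
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY customer").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY customer").fetchall()
```

Both paths end with the same clean table. The difference is that the ELT path also keeps `orders_raw`, so the transformation can be rerun or changed later without going back to the source.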

4
Q

What is the ‘extract’ step in ETL?

A

Retrieving raw data from source systems.

5
Q

What is the ‘transform’ step in ETL?

A

Cleaning, standardizing, or reshaping data for analysis or loading.

6
Q

What is batch processing?

A

Processing large volumes of data at once on a scheduled basis.

7
Q

What is stream processing?

A

Processing data in real time as it arrives.

8
Q

When is batch processing preferred?

A

For historical analysis, or whenever results are not needed in real time.

9
Q

What is a micro-batch?

A

A small, time-bounded chunk of data that is processed as a unit in near-real-time pipelines.
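As a rough sketch of the idea, events can be grouped into fixed windows by their timestamp. The `micro_batches` helper and the sample events are hypothetical, and timestamps are plain integers (seconds) to keep the example small; a real engine would process each window as it closes rather than collecting them all first.

```python
from collections import defaultdict


def micro_batches(events, window_seconds):
    """Group (timestamp, value) events into fixed, time-bounded windows."""
    batches = defaultdict(list)
    for timestamp, value in events:
        # Every event falls into the window that starts at a multiple of window_seconds.
        window_start = (timestamp // window_seconds) * window_seconds
        batches[window_start].append(value)
    return dict(sorted(batches.items()))


events = [(0, "a"), (3, "b"), (5, "c"), (9, "d"), (12, "e")]
batches = micro_batches(events, window_seconds=5)
# batches == {0: ["a", "b"], 5: ["c", "d"], 10: ["e"]}
```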

10
Q

What is latency in data pipelines?

A

The delay between when data is generated and when it is processed or made available as output.
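In arithmetic terms, latency is the output time minus the generation time. A minimal sketch, with a hypothetical `latency` helper and made-up timestamps:

```python
from datetime import datetime, timedelta


def latency(generated_at, available_at):
    """Return how long a record waited between generation and output."""
    return available_at - generated_at


generated = datetime(2024, 1, 1, 12, 0, 0)
available = datetime(2024, 1, 1, 12, 0, 2, 500000)  # 2.5 seconds later
delay = latency(generated, available)
```

Here `delay.total_seconds()` is `2.5`.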

11
Q

What is a DAG in data engineering?

A

A Directed Acyclic Graph representing dependencies between tasks.
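One way to see this in practice is Python's standard-library `graphlib` module (Python 3.9+), which derives a valid run order from a dependency mapping. The task names below are hypothetical, and the `has_cycle` helper is an illustrative addition, not part of any orchestrator's API.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# static_order() yields tasks so that every dependency comes before its dependents.
order = list(TopologicalSorter(dependencies).static_order())


def has_cycle(graph):
    """Return True if the dependency graph is not acyclic."""
    try:
        TopologicalSorter(graph).prepare()
    except CycleError:
        return True
    return False
```

The "acyclic" part matters because a cycle such as `{"a": {"b"}, "b": {"a"}}` has no valid run order: neither task could ever start.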

12
Q

What is Apache Airflow?

A

An open-source tool for authoring, scheduling, and monitoring workflows as code.

13
Q

What is task scheduling?

A

Setting when and how often tasks in a pipeline should run.

14
Q

What is task dependency?

A

A rule that defines which task must complete before another can begin.

15
Q

What is task failure recovery?

A

A method to retry or resume failed steps in a pipeline.
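The retry half of this can be sketched as a small wrapper that reruns a failing step a bounded number of times. The `run_with_retries` helper, its parameters, and the `flaky_task` example are hypothetical; orchestration tools typically offer their own retry settings, which this does not attempt to reproduce.

```python
import time


def run_with_retries(task, max_attempts=3, delay_seconds=0.0):
    """Call task() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            time.sleep(delay_seconds)  # wait before the next attempt


calls = {"count": 0}


def flaky_task():
    """Hypothetical step that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"


result = run_with_retries(flaky_task)
```

Retrying is only safe when the step can be rerun without duplicating its effects, which is one reason pipeline tasks are often written to be idempotent.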
