
Data Engineering Part 5 Flashcards

(20 cards)

1
Q

What is observability in data pipelines?

A

The ability to understand system behavior through logs, metrics, and traces.

2
Q

What is pipeline monitoring?

A

Tracking execution status, errors, and performance of ETL jobs.

3
Q

What are metrics commonly tracked in pipelines?

A

Job success rates, duration, latency, and data volume.

4
Q

What is alerting?

A

Automatically notifying users when metrics cross defined thresholds or jobs fail.
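
As a minimal sketch of threshold-based alerting, the snippet below compares observed metrics against limits and collects a message for each breach. The function name `check_thresholds`, the metric names, and the limits are all hypothetical; a real pipeline would send the messages to a paging or chat tool rather than print them.

```python
def check_thresholds(metrics, thresholds):
    """Return one alert message per metric that is above its limit."""
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        # Metrics that were not reported are skipped, not treated as breaches.
        if value is not None and value > limit:
            alerts.append(f"ALERT: {name}={value} exceeds threshold {limit}")
    return alerts


# Hypothetical metrics from one pipeline run, and hypothetical limits.
alerts = check_thresholds(
    {"duration_seconds": 950, "failed_rows": 3},
    {"duration_seconds": 600, "failed_rows": 10},
)
for message in alerts:
    print(message)
```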

5
Q

What is log aggregation?

A

Collecting logs from distributed systems into a searchable central repository.

6
Q

What is horizontal scaling?

A

Adding more machines or nodes to handle increased workload.

7
Q

What is vertical scaling?

A

Increasing the resources (CPU, RAM) of a single machine.

8
Q

What is partitioning in data storage?

A

Splitting data into segments based on attributes like date or region.
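
A small illustration of the idea, assuming records are plain dictionaries: the hypothetical helper `partition_by` groups rows by one attribute, much as a storage layer might place each date or region in its own segment. The sample events are made up.

```python
from collections import defaultdict


def partition_by(records, key):
    """Group records into partitions keyed by the value of one attribute."""
    partitions = defaultdict(list)
    for record in records:
        partitions[record[key]].append(record)
    return dict(partitions)


# Made-up sample data.
events = [
    {"date": "2024-01-01", "region": "eu", "amount": 10},
    {"date": "2024-01-01", "region": "us", "amount": 7},
    {"date": "2024-01-02", "region": "eu", "amount": 3},
]
by_date = partition_by(events, "date")
for date, rows in by_date.items():
    print(date, len(rows))
```

A query that filters on the partition attribute then only needs to read the matching segment instead of scanning everything.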

9
Q

What is parallelism in data processing?

A

Executing multiple tasks or jobs simultaneously.
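
One way to sketch this in Python is with the standard library's `concurrent.futures`. The `transform` function and the chunks are hypothetical stand-ins for real work; threads are used here only for illustration, and CPU-bound work in CPython would usually favor `ProcessPoolExecutor` instead.

```python
from concurrent.futures import ThreadPoolExecutor


def transform(chunk):
    """Stand-in for a per-chunk task: sum of squares."""
    return sum(x * x for x in chunk)


chunks = [[1, 2], [3, 4], [5, 6]]

# Each chunk is submitted as a separate task; map() returns results in input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(transform, chunks))

print(results)
```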

10
Q

What is throughput?

A

The amount of data processed per unit of time.
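
As a worked example with made-up numbers, a job that processes 120,000 records in 60 seconds has a throughput of 2,000 records per second. The helper name `throughput` is hypothetical.

```python
def throughput(records_processed, elapsed_seconds):
    """Records processed per second over the measured interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return records_processed / elapsed_seconds


rate = throughput(120_000, 60)
print(f"{rate:.0f} records/second")
```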

11
Q

What is idempotence?

A

The property that applying an operation multiple times has the same effect as applying it once.
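
A minimal sketch, assuming a table modeled as a dictionary keyed by `id`: writing a row by key (an upsert) is idempotent, so replaying the same write after a retry leaves the table unchanged. Appending to a list would not be, since each replay would add a duplicate. The `upsert` helper and the row are hypothetical.

```python
def upsert(table, row):
    """Insert or overwrite the row stored under its id."""
    table[row["id"]] = row
    return table


table = {}
upsert(table, {"id": 1, "status": "shipped"})
after_first_write = dict(table)

# Replaying the same write, for example after a retry, changes nothing.
upsert(table, {"id": 1, "status": "shipped"})
print(table == after_first_write)
```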

12
Q

What is atomicity in data processing?

A

Operations are all-or-nothing: they either complete entirely or not at all.
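
A sketch of all-or-nothing behavior using a database transaction in Python's built-in `sqlite3` module. The `accounts` table and the simulated failure are invented for illustration; using the connection as a context manager commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    with conn:  # commits on success, rolls back on an exception
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE name = 'a'")
        # Simulated failure before the matching credit to 'b' is written.
        raise RuntimeError("simulated failure before the second update")
except RuntimeError:
    pass

# The partial debit was rolled back, so both balances are unchanged.
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)
```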

13
Q

What is fault tolerance?

A

The ability of a system to continue operating despite failures.

14
Q

What is a retry policy?

A

A rule that defines how failed operations should be re-attempted.

15
Q

What is a backoff strategy in retries?

A

A method to progressively delay retry attempts after failures.
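
Exponential backoff is one common choice: the delay doubles after each failed attempt. The sketch below combines it with a simple retry policy (a maximum number of attempts). The names `retry_with_backoff` and `flaky`, and the delay values, are hypothetical; the `sleep` parameter is injectable so the delays can be recorded instead of actually waited.

```python
import time


def retry_with_backoff(operation, max_attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call operation(), retrying on failure with a delay that doubles each time."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # retry budget exhausted: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


calls = {"count": 0}


def flaky():
    """Hypothetical operation that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays = []
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Production retry logic often adds random jitter to each delay so that many clients do not retry at the same instant.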

16
Q

What is data lineage?

A

The tracking of data’s origin, transformations, and flow through systems.

17
Q

Why is data lineage important?

A

It provides transparency into how data was produced, makes debugging easier, and supports auditability.

18
Q

What is data governance?

A

The policies and practices for managing data quality, access, and usage.

19
Q

What is metadata in data engineering?

A

Descriptive information about data, such as its structure, origin, and type.

20
Q

What is a data catalog?

A

A searchable inventory of data assets and their metadata.
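
To make the idea concrete, here is a toy sketch of a catalog as a list of metadata entries with a keyword search over names and tags. The entries, field names, and the `search_catalog` helper are all made up; real catalogs are dedicated services with far richer metadata.

```python
# Made-up catalog entries: each describes one data asset.
catalog = [
    {"name": "orders", "owner": "sales", "tags": ["transactions", "daily"]},
    {"name": "customers", "owner": "crm", "tags": ["pii", "daily"]},
    {"name": "web_clicks", "owner": "marketing", "tags": ["events", "streaming"]},
]


def search_catalog(entries, term):
    """Return names of assets whose name or tags contain the search term."""
    term = term.lower()
    return [
        entry["name"]
        for entry in entries
        if term in entry["name"].lower()
        or any(term in tag.lower() for tag in entry["tags"])
    ]


print(search_catalog(catalog, "daily"))
```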