Data Engineering Part 5 Flashcards
(20 cards)
What is observability in data pipelines?
The ability to understand system behavior through logs, metrics, and traces.
What is pipeline monitoring?
Tracking execution status, errors, and performance of ETL jobs.
What are metrics commonly tracked in pipelines?
Job success rates, duration, latency, and data volume.
What is alerting?
Automatically notifying users when metrics or jobs exceed thresholds.
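A minimal sketch of threshold-based alerting; the metric names and thresholds are illustrative, not from any particular monitoring tool:

```python
# Hypothetical threshold check: flag metrics that exceed their limits.
def check_alerts(metrics, thresholds):
    """Return the names of metrics whose values exceed their thresholds."""
    return [name for name, value in metrics.items()
            if name in thresholds and value > thresholds[name]]

# Example: a job ran too long, but error count is within bounds.
metrics = {"job_duration_s": 950, "error_count": 2}
thresholds = {"job_duration_s": 600, "error_count": 10}
print(check_alerts(metrics, thresholds))  # ['job_duration_s']
```

In practice a monitoring system would evaluate rules like this on a schedule and send the flagged names to a notifier (email, Slack, PagerDuty).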
What is log aggregation?
Collecting logs from distributed systems into a searchable central repository.
What is horizontal scaling?
Adding more machines or nodes to handle increased workload.
What is vertical scaling?
Increasing the resources (CPU, RAM) of a single machine.
What is partitioning in data storage?
Splitting data into segments based on attributes like date or region.
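A sketch of date-based partitioning using the Hive-style `key=value` path layout that many engines (e.g. Spark, Athena) can prune on; bucket and table names are made up:

```python
from datetime import date

# Build a Hive-style partition path for one day of data.
def partition_path(base, table, dt):
    return f"{base}/{table}/year={dt.year}/month={dt.month:02d}/day={dt.day:02d}"

print(partition_path("s3://bucket", "events", date(2024, 3, 7)))
# s3://bucket/events/year=2024/month=03/day=07
```

Queries filtered on the partition keys can then skip reading every other segment entirely.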
What is parallelism in data processing?
Executing multiple tasks or jobs simultaneously.
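A toy illustration of parallelism over data chunks with the standard library's `concurrent.futures`; the `process` function is a stand-in for real per-chunk work:

```python
from concurrent.futures import ThreadPoolExecutor

def process(chunk):
    # Stand-in for real work on one chunk of data.
    return sum(chunk)

chunks = [[1, 2], [3, 4], [5, 6]]
with ThreadPoolExecutor(max_workers=3) as pool:
    # map() runs process() on the chunks concurrently, preserving order.
    results = list(pool.map(process, chunks))
print(results)  # [3, 7, 11]
```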
What is throughput?
The amount of data processed per unit of time.
What is idempotence?
The property that an operation can be applied multiple times with the same effect as applying it once.
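A sketch of why upserts (insert-or-overwrite by key) are idempotent while blind appends are not; the key and record are illustrative:

```python
# Keyed overwrite: running the same load twice leaves the store
# in the same state as running it once.
def upsert(store, key, value):
    store[key] = value

store = {}
upsert(store, "user_1", {"name": "Ada"})
upsert(store, "user_1", {"name": "Ada"})  # retry of the same load: no duplicate
print(store)  # {'user_1': {'name': 'Ada'}}
```

This is why idempotent writes pair well with retry policies: a re-run after a partial failure cannot corrupt the target.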
What is atomicity in data processing?
Operations are all-or-nothing — they either complete entirely or not at all.
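Atomicity can be demonstrated with a database transaction; this sketch uses an in-memory SQLite database, and the simulated failure shows the whole transaction rolling back:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INT)")
conn.execute("INSERT INTO accounts VALUES ('a', 100), ('b', 0)")
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 'a'")
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass

# The debit never took effect: the transaction was all-or-nothing.
balance = conn.execute("SELECT balance FROM accounts WHERE id = 'a'").fetchone()[0]
print(balance)  # 100
```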
What is fault tolerance?
The ability of a system to continue operating despite failures.
What is a retry policy?
A rule that defines how failed operations should be re-attempted.
What is backoff strategy in retries?
A method to progressively delay retry attempts after failures.
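A common variant is exponential backoff with jitter; the base delay, cap, and jitter fraction below are illustrative parameters, not a standard:

```python
import random

# Delay doubles each attempt, capped, plus random jitter so that
# many failed clients do not all retry at the same instant.
def backoff_delay(attempt, base=1.0, cap=30.0):
    delay = min(cap, base * (2 ** attempt))
    return delay + random.uniform(0, delay * 0.1)

for attempt in range(4):
    print(f"attempt {attempt}: wait ~{backoff_delay(attempt):.1f}s")
```

A retry policy would combine this with a maximum attempt count and a list of retryable error types.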
What is data lineage?
The tracking of data’s origin, transformations, and flow through systems.
Why is data lineage important?
It enables transparency, simplifies debugging, and supports auditing and compliance.
What is data governance?
The policies and practices for managing data quality, access, and usage.
What is metadata in data engineering?
Descriptive information about data — structure, origin, type, etc.
What is a data catalog?
A searchable inventory of data assets and their metadata.