Chapter 2 Flashcards

(7 cards)

1
Q

What are the stages of a data pipeline?

A
  1. Ingestion
  2. Transformation
  3. Storage
  4. Analysis

Note: Not all pipelines have all stages.
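
As a rough illustration, the four stages can be sketched as four small functions chained together. The sensor records, function names, and in-memory "table" are all hypothetical stand-ins chosen for this sketch, not part of any real pipeline service.

```python
from statistics import mean


def ingest():
    # Stage 1: bring raw data in. Hypothetical hard-coded records stand in
    # for files or a message stream.
    return ["sensor-a,21.5", "sensor-b,19.0", "sensor-a,22.5"]


def transform(lines):
    # Stage 2: map the source structure (CSV text) to the target structure (dicts).
    rows = []
    for line in lines:
        sensor, reading = line.split(",")
        rows.append({"sensor": sensor, "celsius": float(reading)})
    return rows


def store(rows, table):
    # Stage 3: persist the rows. A plain list stands in for a database table.
    table.extend(rows)
    return table


def analyze(table):
    # Stage 4: compute something useful from the stored data.
    return mean(row["celsius"] for row in table)


table = []
store(transform(ingest()), table)
average = analyze(table)
print(average)
```

A pipeline that skips a stage would simply omit the corresponding call, for example loading the ingested lines without transforming them.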

2
Q

What is ingestion in a data pipeline?

A

Ingestion (see Figure 3.3) is the process of bringing data into the GCP environment. This
can occur in either batch or streaming mode.

In batch mode, data sets made up of one or more files are copied to GCP. Often these
files will be copied to Cloud Storage first.

Streaming ingestion receives data in increments, typically a single record or small
batches of records, that continuously flow into an ingestion endpoint, typically a Cloud
Pub/Sub topic.
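
The difference between the two modes can be sketched without any cloud services. In this minimal sketch a dict of file contents stands in for the files being copied, and a deque stands in for the ingestion endpoint; both, along with the function names, are assumptions made for illustration and do not use the Cloud Storage or Cloud Pub/Sub APIs.

```python
from collections import deque


def ingest_batch(files):
    """Batch mode: copy every file, in full, into a staging area in one pass."""
    staging = {}
    for name, lines in files.items():
        staging[name] = list(lines)
    return staging


def ingest_stream(endpoint, max_records):
    """Streaming mode: hand over records one at a time as they are available."""
    received = 0
    while endpoint and received < max_records:
        yield endpoint.popleft()
        received += 1


# Batch: whole data sets made up of one or more files.
files = {"day1.csv": ["a,1", "b,2"], "day2.csv": ["c,3"]}
staged = ingest_batch(files)

# Streaming: single records pulled from a (hypothetical) endpoint.
topic = deque(["evt-1", "evt-2", "evt-3"])
streamed = list(ingest_stream(topic, max_records=2))
```

The batch function returns only after every file is copied, while the streaming generator yields each record as soon as it is read and leaves unread records on the endpoint.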

3
Q

What is transformation in a data pipeline?

A

Transformation is the process of mapping data from the structure used in the source system
to the structure used in the storage and analysis stages of the data pipeline.
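
A small sketch of such a mapping follows. The source field names (CustID, First, Last, Signup) and the target field names are hypothetical; the point is only that types, names, and formats change between the source structure and the one used for storage and analysis.

```python
from datetime import datetime


def to_warehouse_row(source):
    # Map a source-system record to the structure used downstream:
    # rename fields, convert the ID to an integer, merge the name parts,
    # and normalize the date from MM/DD/YYYY to ISO 8601.
    return {
        "customer_id": int(source["CustID"]),
        "full_name": f'{source["First"]} {source["Last"]}',
        "signup_date": datetime.strptime(source["Signup"], "%m/%d/%Y").date().isoformat(),
    }


source_record = {"CustID": "0042", "First": "Ada", "Last": "Lovelace", "Signup": "03/15/2021"}
row = to_warehouse_row(source_record)
print(row)
```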

4
Q

What are the four types of data warehousing pipelines?

A
  1. Extraction, transformation, and load (ETL)
  2. Extraction, load, and transformation (ELT)
  3. Extraction and load
  4. Change data capture
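
Change data capture is the least self-explanatory of the four, so here is a minimal sketch of the idea: only the rows that changed are passed on, not the whole table. This version compares two snapshots keyed by primary key, which is a simplification chosen for illustration; production systems often read changes from the database's transaction log instead. The function and variable names are hypothetical.

```python
def capture_changes(previous, current):
    """Return the inserts, updates, and deletes between two snapshots."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


before = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
after = {1: {"name": "Ada L."}, 3: {"name": "Edsger"}}
changes = capture_changes(before, after)
for change in changes:
    print(change)
```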
5
Q

What is a DAG?

A

A DAG (Directed Acyclic Graph) is a graph of tasks where:

Directed: Each connection (edge) has a direction, for example task A → task B.
Acyclic: No cycles are allowed, so following the edges never leads back to a task that has already run.

This ensures that tasks are executed in a defined, non-repeating sequence.
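
One way to see both properties in action is with Python's standard-library graphlib module (Python 3.9+), which orders tasks by their dependencies and rejects graphs that contain a cycle. The task names below are hypothetical; each key maps a task to the set of tasks that must run before it.

```python
from graphlib import CycleError, TopologicalSorter

# Each task lists its predecessors: ingest -> transform -> store -> analyze.
dag = {
    "transform": {"ingest"},
    "store": {"transform"},
    "analyze": {"store"},
}
order = list(TopologicalSorter(dag).static_order())
print(order)

# A graph with a cycle (a depends on b, b depends on a) is not a DAG,
# so no valid execution order exists.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```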

6
Q

What is ETL?

A

Extract, transform, and load.
Data is transformed before it is loaded into the database.

7
Q

What is ELT?

A

Extract, load, and transform.
Data is loaded into the database before it is transformed.
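
The load-then-transform order can be sketched with an in-memory SQLite database standing in for the warehouse. The table names, columns, and sample rows are hypothetical. The raw data is loaded exactly as extracted, and the cleanup happens afterwards in SQL, inside the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Load: store the extracted rows untouched, as text.
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT)")
raw = [("1", " 10.50 "), ("2", "4.25"), ("3", "n/a")]
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", raw)

# Transform: done inside the database after loading. Trim whitespace,
# cast to proper types, and drop rows whose amount does not start with a digit.
conn.execute("""
    CREATE TABLE orders AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           CAST(TRIM(amount) AS REAL) AS amount
    FROM raw_orders
    WHERE TRIM(amount) GLOB '[0-9]*'
""")
rows = conn.execute("SELECT order_id, amount FROM orders ORDER BY order_id").fetchall()
print(rows)
```

In an ETL version of the same job, the trimming, casting, and filtering would run in application code before the INSERT, and only the cleaned rows would reach the database.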
