Chapter 2 Flashcards
(7 cards)
Data pipeline stages
- Ingestion
- Transformation
- Storage
- Analysis
Note: Not all pipelines have all stages.
What is ingestion in terms of a data pipeline
Ingestion (see Figure 3.3) is the process of bringing data into the GCP environment. This
can occur in either batch or streaming mode.
In batch mode, data sets made up of one or more files are copied to GCP. Often these
files will be copied to Cloud Storage first.
Streaming ingestion receives data incrementally, typically a single record or a small
batch of records at a time, flowing continuously into an ingestion endpoint, typically a
Cloud Pub/Sub topic.
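The two ingestion modes can be sketched in plain Python. This is an illustrative simulation only: a real pipeline would copy files to Cloud Storage (batch) or publish messages to a Cloud Pub/Sub topic (streaming); the function names here are hypothetical.

```python
# Minimal sketch contrasting batch vs. streaming ingestion.
# Names are illustrative, not real GCP API calls.

def ingest_batch(files):
    """Batch mode: whole data sets arrive at once as one or more files."""
    records = []
    for f in files:
        records.extend(f)          # copy the full file contents in one step
    return records

def ingest_streaming(source):
    """Streaming mode: records arrive one (or a few) at a time."""
    for record in source:          # each record flows into the endpoint
        yield record               # e.g. published to a Pub/Sub topic

batch = ingest_batch([["a", "b"], ["c"]])
stream = list(ingest_streaming(iter(["x", "y"])))
```

The key difference is that the batch function returns only after all files are copied, while the streaming generator hands over each record as it arrives.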
What is transformation in terms of a data pipeline
Transformation is the process of mapping data from the structure used in the source system
to the structure used in the storage and analysis stages of the data pipeline.
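A transformation step of this kind can be sketched as a function that maps a source-system record onto the schema used downstream. The field names here are hypothetical, chosen only to show the renaming, casting, and reshaping that typically happen at this stage.

```python
# Illustrative sketch of a transformation step: mapping a raw
# source-system record to the structure used for storage/analysis.

def transform(source_record):
    """Map a raw source record to the warehouse schema."""
    return {
        "customer_id": int(source_record["cust_no"]),            # rename + cast
        "full_name": f'{source_record["first"]} {source_record["last"]}',
        "signup_date": source_record["created"][:10],            # keep date part only
    }

raw = {"cust_no": "42", "first": "Ada", "last": "Lovelace",
       "created": "2023-05-01T09:30:00Z"}
print(transform(raw))
```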
What are (4) Data Warehousing Pipelines
- Extraction, transformation, and load (ETL)
- Extraction, load, and transformation (ELT)
- Extraction and load
- Change data capture
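Change data capture, the last pattern above, can be sketched as detecting which rows changed between two snapshots of a source table. This is a simplified illustration: production CDC systems usually read the database's change log rather than diffing snapshots, and the data here is made up.

```python
# Minimal sketch of change data capture (CDC): detecting which rows
# changed between two snapshots of a source table, keyed by id.
# Real CDC tools read the database's change log instead of diffing.

def capture_changes(old, new):
    """Return inserted, updated, and deleted row ids between snapshots."""
    inserted = [k for k in new if k not in old]
    updated = [k for k in new if k in old and new[k] != old[k]]
    deleted = [k for k in old if k not in new]
    return inserted, updated, deleted

old = {1: "alice", 2: "bob"}
new = {1: "alice", 2: "bobby", 3: "carol"}
print(capture_changes(old, new))
```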
What is a DAG
A DAG (Directed Acyclic Graph) is a graph of tasks where:
Directed: Each connection (edge) has a direction—task A → task B.
Acyclic: No cycles are allowed—you cannot loop back to a task already completed.
This ensures that tasks are executed in a defined, non-repeating sequence.
What is ETL
Extract, transform, and load
What is ELT
Extract, load, and transform.
Data is loaded into the database before it is transformed.
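The ELT pattern can be sketched with SQLite standing in for the data warehouse: raw data is loaded first, and the transformation happens afterwards inside the database, in SQL. Table and column names are illustrative.

```python
import sqlite3

# ELT sketch: Load raw data first, then Transform it inside the
# database with SQL (SQLite stands in for the warehouse here).
con = sqlite3.connect(":memory:")

# Load: raw, untyped data goes straight into a staging table
con.execute("CREATE TABLE raw_sales (amount TEXT)")
con.executemany("INSERT INTO raw_sales VALUES (?)", [("10",), ("32",)])

# Transform: runs inside the database, after loading
con.execute("""CREATE TABLE sales AS
               SELECT CAST(amount AS INTEGER) AS amount FROM raw_sales""")
total = con.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(total)
```

In ETL, by contrast, the cast would happen before the data ever reached the database.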