Flows Flashcards
(20 cards)
… (existing content unchanged) …
What is a flow in Lakeflow Declarative Pipelines?
A flow is a component of an ETL pipeline that processes a query and writes its result to a target, either as a batch or a stream.
How is a flow triggered in Lakeflow Declarative Pipelines?
A flow is triggered each time its pipeline is updated. It may perform a full refresh or an incremental refresh based on the flow type and data changes.
How do you create a default flow in SQL?
By using a `CREATE OR REFRESH STREAMING TABLE` statement with a query. For example: `CREATE OR REFRESH STREAMING TABLE customers_silver AS SELECT * FROM STREAM(customers_bronze)`.
How do you create a default flow in Python?
Use the `@dlt.table()` decorator on a function that returns a Spark DataFrame; see the runnable sketch below.
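A runnable version of the card's example, assuming a customers_bronze table already exists in the same pipeline (spark is the session provided by the pipeline runtime):

```python
import dlt

# Default flow: defining the table and its query together creates
# the flow implicitly; no separate flow object is declared.
@dlt.table()
def customers_silver():
    return spark.readStream.table("customers_bronze")
```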
What is an append flow in Lakeflow Declarative Pipelines?
An append flow adds new records to the target during each update; it corresponds to append mode in Structured Streaming.
Can multiple flows write to the same target?
Yes, multiple append flows can write to the same target. This supports cases like appending data from multiple regions or backfilling historical data.
How do you explicitly create an append flow in Python?
Use the `@dlt.append_flow(target="<target_table>")` decorator on a function that returns the source DataFrame.
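A minimal sketch: the target streaming table is created separately (here with dlt.create_streaming_table(), which the expectations card below also assumes), and the decorated function becomes a flow that appends to it. The orders_bronze and orders_silver names are hypothetical:

```python
import dlt

# Create the target streaming table separately from the flows that feed it.
dlt.create_streaming_table("orders_silver")

# Explicit append flow: appends the query result to orders_silver
# on every pipeline update.
@dlt.append_flow(target="orders_silver")
def orders_from_bronze():
    return spark.readStream.table("orders_bronze")
```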
What are some common use cases for multiple append flows?
Ingesting new regional data, backfilling historical records, and combining multiple sources without using UNION in a query.
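As a sketch of the multi-region case, two append flows feeding one shared target in place of a single UNION query (all table names hypothetical):

```python
import dlt

# Shared target that both regional flows append into.
dlt.create_streaming_table("sales_all_regions")

# Each region gets its own flow, which appends independently and
# keeps its own streaming checkpoint keyed by the flow name.
@dlt.append_flow(target="sales_all_regions")
def sales_us():
    return spark.readStream.table("sales_us_bronze")

@dlt.append_flow(target="sales_all_regions")
def sales_eu():
    return spark.readStream.table("sales_eu_bronze")
```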
How does Lakeflow track state for streaming flows?
Flows are tracked using flow names, which serve as identifiers for streaming checkpoints. Renaming a flow creates a new flow context.
What is an Auto CDC flow?
An Auto CDC flow handles change data capture (CDC) events, supports SCD Type 1 and Type 2, and can only target streaming tables.
Can a streaming table be targeted by both Auto CDC and other flows?
No, streaming tables targeted by Auto CDC flows can only receive data from other Auto CDC flows.
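A hedged sketch of an Auto CDC flow in Python. The call shown, dlt.create_auto_cdc_flow, is the newer name for what earlier DLT releases expose as dlt.apply_changes; the source table, key, and sequencing columns are all hypothetical:

```python
import dlt
from pyspark.sql.functions import col

# The target must be a streaming table, and once an Auto CDC flow
# targets it, only other Auto CDC flows may feed it.
dlt.create_streaming_table("customers")

# Apply inserts, updates, and deletes from the CDC feed to the target.
dlt.create_auto_cdc_flow(
    target="customers",
    source="customers_cdc_feed",       # hypothetical change feed
    keys=["customer_id"],              # primary key column(s)
    sequence_by=col("sequence_num"),   # orders out-of-order events
    stored_as_scd_type=1,              # 1 = latest row wins; 2 = keep history
)
```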
What are the types of flows supported in Lakeflow Declarative Pipelines?
The main types are append flows and Auto CDC flows. Append flows handle standard incremental updates, while Auto CDC flows handle change data capture streams.
… (existing content unchanged) …
How do you create an append flow in SQL?
Use `CREATE FLOW <flow_name> AS INSERT INTO <target> BY NAME SELECT * FROM STREAM(<source>)`. This separates flow creation from the target definition and allows multiple flows to write to the same table.
How does Lakeflow handle multiple flows writing to one target?
It supports this pattern using append flows, allowing each to append data independently to a shared target without full refreshes.
How does Lakeflow support backfilling historical data?
You can define an append flow using a batch source (e.g., historical table) that writes once to the target without interfering with ongoing streams.
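A sketch of that backfill pattern, following this card's description of a batch source that writes once alongside an ongoing streaming flow; the table names are hypothetical:

```python
import dlt

# One-time batch append flow: reads the historical table and appends
# it to the same target the streaming flow feeds, leaving the
# streaming flow's checkpoint untouched.
@dlt.append_flow(target="customers_silver", name="customers_backfill")
def customers_backfill():
    return spark.read.table("customers_history")
```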
What is the role of flow names in checkpointing?
Flow names serve as identifiers for streaming checkpoints. Changing a flow name resets its checkpoint, effectively creating a new flow instance.
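Because the checkpoint is keyed by the flow name, pinning the name explicitly keeps streaming state stable even if the function is later refactored (a sketch; names hypothetical):

```python
import dlt

# The explicit name, not the function name, identifies this flow's
# checkpoint, so renaming the function does not reset state.
@dlt.append_flow(target="customers_silver", name="customers_silver_flow")
def load_customers_silver():
    return spark.readStream.table("customers_bronze")
```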
Where should data quality expectations be defined in a flow setup?
Expectations must be defined on the target table when it is created (e.g., using `create_streaming_table()`), not within the `@dlt.append_flow` or `CREATE FLOW` definitions; see the sketch below.
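A sketch of that separation, assuming a hypothetical valid_id rule: the expectation lives on the target table, while the flow that feeds it carries none of its own.

```python
import dlt

# Expectations attach to the target table at creation time...
dlt.create_streaming_table(
    "customers_silver",
    expect_all_or_drop={"valid_id": "customer_id IS NOT NULL"},
)

# ...while the flow that feeds it defines no expectations itself.
@dlt.append_flow(target="customers_silver")
def customers_from_bronze():
    return spark.readStream.table("customers_bronze")
```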