Flows Flashcards
(20 cards)
… (existing content unchanged) …
What is a flow in Lakeflow Declarative Pipelines?
A flow is a component of an ETL pipeline that processes a query and writes its result to a target, either as a batch or a stream.
How is a flow triggered in Lakeflow Declarative Pipelines?
A flow is triggered each time its pipeline is updated. It may perform a full refresh or an incremental refresh based on the flow type and data changes.
How do you create a default flow in SQL?
By using a `CREATE OR REFRESH STREAMING TABLE` statement with a query. For example: `CREATE OR REFRESH STREAMING TABLE customers_silver AS SELECT * FROM STREAM(customers_bronze)`.
How do you create a default flow in Python?
Use the `@dlt.table()` decorator on a function that returns a Spark DataFrame; see the runnable sketch below.
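A runnable version of the card's example, assuming a customers_bronze table already exists in the same pipeline (spark is the session provided by the pipeline runtime):

```python
import dlt

# Default flow: defining the table and its query together creates
# the flow implicitly; no separate flow object is declared.
@dlt.table()
def customers_silver():
    return spark.readStream.table("customers_bronze")
```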
What is an append flow in Lakeflow Declarative Pipelines?
An append flow adds new records to the target during each update; it corresponds to append mode in Structured Streaming.
Can multiple flows write to the same target?
Yes, multiple append flows can write to the same target. This supports cases like appending data from multiple regions or backfilling historical data.
How do you explicitly create an append flow in Python?
Use the `@dlt.append_flow(target="<target_table>")` decorator on a function that returns the source DataFrame.
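A minimal sketch: the target streaming table is created separately (here with dlt.create_streaming_table(), which the expectations card below also assumes), and the decorated function becomes a flow that appends to it. The orders_bronze and orders_silver names are hypothetical:

```python
import dlt

# Create the target streaming table separately from the flows that feed it.
dlt.create_streaming_table("orders_silver")

# Explicit append flow: appends the query result to orders_silver
# on every pipeline update.
@dlt.append_flow(target="orders_silver")
def orders_from_bronze():
    return spark.readStream.table("orders_bronze")
```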
What are some common use cases for multiple append flows?
Ingesting new regional data, backfilling historical records, and combining multiple sources without using UNION in a query.
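As a sketch of the multi-region case, two append flows feeding one shared target in place of a single UNION query (all table names hypothetical):

```python
import dlt

# Shared target that both regional flows append into.
dlt.create_streaming_table("sales_all_regions")

# Each region gets its own flow, which appends independently and
# keeps its own streaming checkpoint keyed by the flow name.
@dlt.append_flow(target="sales_all_regions")
def sales_us():
    return spark.readStream.table("sales_us_bronze")

@dlt.append_flow(target="sales_all_regions")
def sales_eu():
    return spark.readStream.table("sales_eu_bronze")
```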
How does Lakeflow track state for streaming flows?
Flows are tracked using flow names, which serve as identifiers for streaming checkpoints. Renaming a flow creates a new flow context.
What is an Auto CDC flow?
An Auto CDC flow handles change data capture (CDC) events, supports SCD Type 1 and Type 2, and can only target streaming tables.
Can a streaming table be targeted by both Auto CDC and other flows?
No, streaming tables targeted by Auto CDC flows can only receive data from other Auto CDC flows.
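A hedged sketch of an Auto CDC flow in Python. The call shown, dlt.create_auto_cdc_flow, is the newer name for what earlier DLT releases expose as dlt.apply_changes; the source table, key, and sequencing columns are all hypothetical:

```python
import dlt
from pyspark.sql.functions import col

# The target must be a streaming table, and once an Auto CDC flow
# targets it, only other Auto CDC flows may feed it.
dlt.create_streaming_table("customers")

# Apply inserts, updates, and deletes from the CDC feed to the target.
dlt.create_auto_cdc_flow(
    target="customers",
    source="customers_cdc_feed",       # hypothetical change feed
    keys=["customer_id"],              # primary key column(s)
    sequence_by=col("sequence_num"),   # orders out-of-order events
    stored_as_scd_type=1,              # 1 = latest row wins; 2 = keep history
)
```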
What are the types of flows supported in Lakeflow Declarative Pipelines?
The main types are append flows and Auto CDC flows. Append flows handle standard incremental updates, while Auto CDC flows handle change data capture streams.
… (existing content unchanged) …
How do you create an append flow in SQL?
Use `CREATE FLOW <flow_name> AS INSERT INTO <target> BY NAME SELECT * FROM STREAM(<source>)`. This separates flow creation from the target definition and allows multiple flows to write to the same table.
How does Lakeflow handle multiple flows writing to one target?
It supports this pattern using append flows, allowing each to append data independently to a shared target without full refreshes.
How does Lakeflow support backfilling historical data?
You can define an append flow using a batch source (e.g., historical table) that writes once to the target without interfering with ongoing streams.
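A sketch of that backfill pattern, following this card's description of a batch source that writes once alongside an ongoing streaming flow; the table names are hypothetical:

```python
import dlt

# One-time batch append flow: reads the historical table and appends
# it to the same target the streaming flow feeds, leaving the
# streaming flow's checkpoint untouched.
@dlt.append_flow(target="customers_silver", name="customers_backfill")
def customers_backfill():
    return spark.read.table("customers_history")
```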
What is the role of flow names in checkpointing?
Flow names serve as identifiers for streaming checkpoints. Changing a flow name resets its checkpoint, effectively creating a new flow instance.
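Because the checkpoint is keyed by the flow name, pinning the name explicitly keeps streaming state stable even if the function is later refactored (a sketch; names hypothetical):

```python
import dlt

# The explicit name, not the function name, identifies this flow's
# checkpoint, so renaming the function does not reset state.
@dlt.append_flow(target="customers_silver", name="customers_silver_flow")
def load_customers_silver():
    return spark.readStream.table("customers_bronze")
```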
Where should data quality expectations be defined in a flow setup?
Expectations must be defined on the target table when it is created (e.g., using `create_streaming_table()`), not within the `@dlt.append_flow` or `CREATE FLOW` definitions; see the sketch below.
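A sketch of that separation, assuming a hypothetical valid_id rule: the expectation lives on the target table, while the flow that feeds it carries none of its own.

```python
import dlt

# Expectations attach to the target table at creation time...
dlt.create_streaming_table(
    "customers_silver",
    expect_all_or_drop={"valid_id": "customer_id IS NOT NULL"},
)

# ...while the flow that feeds it defines no expectations itself.
@dlt.append_flow(target="customers_silver")
def customers_from_bronze():
    return spark.readStream.table("customers_bronze")
```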