Flows Flashcards

(20 cards)

1
Q

… (existing content unchanged) …

A
2
Q

What is a flow in Lakeflow Declarative Pipelines?

A

A flow is a component of an ETL pipeline that processes a query and writes its result to a target, either as a batch or a stream.

3
Q

How is a flow triggered in Lakeflow Declarative Pipelines?

A

A flow is triggered each time its pipeline is updated. It may perform a full refresh or an incremental refresh based on the flow type and data changes.

4
Q

How do you create a default flow in SQL?

A

By using a CREATE OR REFRESH STREAMING TABLE statement with a query. For example:

CREATE OR REFRESH STREAMING TABLE customers_silver
AS SELECT * FROM STREAM(customers_bronze);

5
Q

How do you create a default flow in Python?

A

Use the @dlt.table() decorator on a function that returns a Spark DataFrame. For example:

@dlt.table()
def customers_silver():
    return spark.readStream.table("customers_bronze")

6
Q

What is an append flow in Lakeflow Declarative Pipelines?

A

An append flow adds new records to the target during each update. It corresponds to append mode in Structured Streaming.

7
Q

Can multiple flows write to the same target?

A

Yes, multiple append flows can write to the same target. This supports cases like appending data from multiple regions or backfilling historical data.
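A minimal Python sketch of this pattern (all table and flow names are hypothetical):

import dlt

# The target is defined once, independently of the flows that feed it.
dlt.create_streaming_table("sales_all_regions")

@dlt.append_flow(target = "sales_all_regions")
def sales_us():
    # Appends US records; this flow keeps its own streaming checkpoint.
    return spark.readStream.table("sales_us_raw")

@dlt.append_flow(target = "sales_all_regions")
def sales_eu():
    # Appends EU records independently of the US flow.
    return spark.readStream.table("sales_eu_raw")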

8
Q

How do you explicitly create an append flow in Python?

A

Use the @dlt.append_flow(target="<target_table>") decorator with a function returning the source DataFrame.
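A minimal sketch (names are hypothetical); the target is created separately, so the flow only describes its source:

import dlt

dlt.create_streaming_table("orders")

@dlt.append_flow(target = "orders")
def orders_ingest():
    # The function name doubles as the flow name unless name= is passed.
    return spark.readStream.table("orders_raw")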

9
Q

What are some common use cases for multiple append flows?

A

Ingesting new regional data, backfilling historical records, and combining multiple sources without using UNION in a query.

10
Q

How does Lakeflow track state for streaming flows?

A

Flows are tracked by name: the flow name identifies the streaming checkpoint. Renaming a flow discards the old checkpoint and starts a new one, effectively creating a new flow.
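A sketch of pinning the flow name so a code refactor does not reset the checkpoint (assumes the name= parameter of @dlt.append_flow; table names are hypothetical):

import dlt

@dlt.append_flow(target = "events", name = "events_ingest")
def any_function_name():
    # The checkpoint is keyed by "events_ingest", not the function name,
    # so the function can be renamed without restarting the stream.
    return spark.readStream.table("events_raw")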

11
Q

What is an Auto CDC flow?

A

An Auto CDC flow processes change data capture (CDC) events, supports SCD Types 1 and 2, and can only target streaming tables.
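A minimal Python sketch (assumes the dlt.create_auto_cdc_flow API; older releases expose the same behavior as dlt.apply_changes; all names are hypothetical):

import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table("customers")

dlt.create_auto_cdc_flow(
    target = "customers",            # streaming table that receives the changes
    source = "customers_cdc_feed",   # stream of CDC events
    keys = ["customer_id"],          # key used to match change events to rows
    sequence_by = col("event_ts"),   # ordering column for out-of-order events
    stored_as_scd_type = 1,          # 1 = update in place; 2 = keep history
)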

12
Q

Can a streaming table be targeted by both Auto CDC and other flows?

A

No, streaming tables targeted by Auto CDC flows can only receive data from other Auto CDC flows.

13
Q

What are the types of flows supported in Lakeflow Declarative Pipelines?

A

The main types are Append and Auto CDC. Append flows are for standard incremental updates, while Auto CDC handles change data capture streams.

14
Q
A
15
Q

… (existing content unchanged) …

A
16
Q

How do you create an append flow in SQL?

A

Use a CREATE FLOW statement. For example:

CREATE FLOW <flow_name>
AS INSERT INTO <target> BY NAME
SELECT * FROM STREAM(<source>);

This separates flow creation from the target definition and allows multiple flows to write to the same table.

17
Q

How does Lakeflow handle multiple flows writing to one target?

A

It supports this pattern through append flows: each flow appends its data to the shared target independently, without requiring full refreshes.

18
Q

How does Lakeflow support backfilling historical data?

A

You can define an append flow using a batch source (e.g., historical table) that writes once to the target without interfering with ongoing streams.
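A sketch of a one-time backfill flow (assumes the once=True option for one-shot append flows; names are hypothetical):

import dlt

@dlt.append_flow(target = "orders", name = "orders_backfill", once = True)
def orders_backfill():
    # Batch read of the historical table; the flow runs once and is skipped
    # on later updates unless the pipeline is fully refreshed.
    return spark.read.table("orders_history")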

19
Q

What is the role of flow names in checkpointing?

A

Flow names serve as identifiers for streaming checkpoints. Changing a flow name resets its checkpoint, effectively creating a new flow instance.

20
Q

Where should data quality expectations be defined in a flow setup?

A

Expectations must be defined on the target table when it is created (e.g., via dlt.create_streaming_table()), not within the @dlt.append_flow or CREATE FLOW definitions.
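A sketch showing expectations attached to the target rather than the flow (assumes the expect_all_or_drop parameter of dlt.create_streaming_table; names and the rule are hypothetical):

import dlt

# Quality rules live on the target table...
dlt.create_streaming_table(
    "payments",
    expect_all_or_drop = {"valid_amount": "amount > 0"},
)

# ...while the flow only describes where the data comes from.
@dlt.append_flow(target = "payments")
def payments_ingest():
    return spark.readStream.table("payments_raw")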