Sources Flashcards

(49 cards)

1
Q

What are sources in dbt used for?

A

Sources allow you to declare external tables loaded by EL tools and define lineage, test assumptions, and track freshness in your dbt project.

2
Q

How do you declare a source in dbt?

A

Define it in a YAML file under a sources: key, specifying the source name, its tables, and optionally the database and schema.
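
A minimal sketch of such a declaration, reusing the jaffle_shop and raw names that appear elsewhere in this deck. The file path and the customers table are assumptions for illustration only.

```yaml
# models/staging/sources.yml (hypothetical path)
version: 2

sources:
  - name: jaffle_shop      # logical name used in {{ source() }}
    database: raw          # optional; defaults to the target database
    schema: jaffle_shop    # optional; defaults to the source name
    tables:
      - name: orders
      - name: customers    # illustrative second table
```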

3
Q

What does the {{ source() }} function do in dbt?

A

It references a declared source table, creating a dependency and ensuring proper lineage and compilation.

4
Q

What does dbt compile {{ source('jaffle_shop', 'orders') }} to?

A

It compiles to the fully qualified name of the source table, e.g., raw.jaffle_shop.orders.

5
Q

Can you document and test sources in dbt?

A

Yes, you can add descriptions and data tests in YAML just like for models.

6
Q

What kind of tests can be applied to source columns?

A

You can apply unique, not_null, and other data tests to ensure data integrity.
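
As a rough illustration, descriptions and column tests on a source might look like the sketch below. The id column and the description text are assumptions, and the data_tests: key is the spelling used from dbt 1.8 onward (earlier versions use tests:).

```yaml
version: 2

sources:
  - name: jaffle_shop
    description: Raw tables replicated from the application database.  # illustrative text
    tables:
      - name: orders
        description: One row per order.  # illustrative text
        columns:
          - name: id          # hypothetical column name
            data_tests:       # spelled `tests:` before dbt 1.8
              - unique
              - not_null
```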

7
Q

What is source freshness in dbt?

A

It checks how recently a table’s data was updated, helping ensure pipeline timeliness and reliability.

8
Q

How do you enable source freshness checks?

A

Add a freshness block and specify loaded_at_field in the source or table config.

9
Q

What values can be set in the freshness block?

A

You can set warn_after and/or error_after, each with a count and a period (minute, hour, or day).
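
A sketch of a freshness block, assuming a timestamp column named _etl_loaded_at (hypothetical). It uses the config: layout found in recent dbt versions; older versions place freshness and loaded_at_field directly under the source or table instead.

```yaml
sources:
  - name: jaffle_shop
    database: raw
    config:
      loaded_at_field: _etl_loaded_at   # hypothetical timestamp column
      freshness:
        warn_after: {count: 12, period: hour}   # warn when the newest row is over 12 hours old
        error_after: {count: 24, period: hour}  # error when it is over 24 hours old
    tables:
      - name: orders
```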

10
Q

What command checks source freshness in dbt?

A

Run dbt source freshness to evaluate the freshness of declared source tables.

11
Q

What query does dbt run behind the scenes for freshness checks?

A

It selects max(loaded_at_field) and compares it to the current time to determine data age.

12
Q

How can you filter rows in freshness checks to avoid full table scans?

A

Add a filter to the freshness block. Its value is the condition of a WHERE clause (e.g., loaded_at_field >= date_sub(...)), which limits the rows dbt scans.
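
One way this could look, assuming a hypothetical _etl_loaded_at column and a warehouse whose SQL dialect supports this datediff form. The filter expression is warehouse-specific, so treat it as a placeholder rather than portable SQL.

```yaml
sources:
  - name: jaffle_shop
    database: raw
    config:
      loaded_at_field: _etl_loaded_at   # hypothetical timestamp column
      freshness:
        warn_after: {count: 12, period: hour}
        # Only the condition is written here; dbt supplies the WHERE keyword.
        filter: datediff('day', _etl_loaded_at, current_timestamp) < 2
    tables:
      - name: orders
```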

13
Q

What happens when a freshness threshold is violated?

A

dbt marks the table as stale and logs a warning or error depending on your configuration.

14
Q

What command builds models downstream of fresher sources?

A

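After running dbt source freshness, use dbt build --select source_status:fresher+ with --state pointing at the artifacts from the previous run. This rebuilds only the models downstream of sources that have newer data than before.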

15
Q

How do freshness configs cascade in dbt?

A

Top-level source configs apply to all tables unless overridden at the table level.
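
The sketch below shows the cascade under assumed names (_etl_loaded_at and product_skus are hypothetical): one table inherits the source-level default, one overrides it, and one opts out with freshness: null.

```yaml
sources:
  - name: jaffle_shop
    database: raw
    config:
      loaded_at_field: _etl_loaded_at
      freshness:                 # default for every table below
        warn_after: {count: 12, period: hour}
        error_after: {count: 24, period: hour}
    tables:
      - name: customers          # inherits the source-level settings
      - name: orders
        config:
          freshness:             # stricter table-level override
            warn_after: {count: 6, period: hour}
            error_after: {count: 12, period: hour}
      - name: product_skus       # illustrative table
        config:
          freshness: null        # excluded from freshness checks
```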

16
Q

What is the role of loaded_at_field in source freshness?

A

It names the timestamp column that records when each source row was last loaded. Freshness checks rely on it, unless your adapter and dbt version can derive freshness from warehouse metadata instead.

17
Q

Can you skip freshness checks for specific tables?

A

Yes, set freshness: null in the table’s config to exclude it from freshness evaluation.

18
Q

What is the benefit of defining source freshness snapshots?

A

Recording freshness results on a schedule lets you monitor source SLAs over time and trigger builds only when a source actually has new data.

19
Q

Why is it important to define the schema and database in source declarations?

A

It ensures that dbt can correctly locate and access the external tables, especially in multi-database environments.

20
Q

How do you declare source freshness in dbt?

A

You add a freshness block under the source or table config, and specify a loaded_at_field to track when data was last loaded.

21
Q

What parameters can be included in a freshness block?

A

You can set warn_after and/or error_after, each with a count and a period (minute, hour, or day).

22
Q

What is the purpose of loaded_at_field in source freshness?

A

It specifies the timestamp column dbt uses to calculate the freshness of the data.

23
Q

What happens if loaded_at_field is not defined for a table?

A

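dbt will not calculate freshness for that table from a column, even if freshness thresholds are set. On adapters that support it, newer dbt versions may fall back to warehouse metadata instead.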

24
Q

Can you override the default freshness config for a specific table?

A

Yes, you can define a separate freshness config in a specific table block, which will override the source-level config.

25
What does setting `freshness: null` for a table do?
It disables freshness checks for that specific table, even if the source has a default freshness config.
26
How does dbt handle freshness config hierarchy?
Freshness and `loaded_at_field` declared at the source level apply to all child tables unless individually overridden.
27
What is an example of a strict freshness configuration?
Setting `warn_after: {count: 6, period: hour}` and `error_after: {count: 12, period: hour}` for a table makes dbt warn once the newest data is more than 6 hours old and error once it is more than 12 hours old.
28
What does the `config:` block do in a table definition when declaring freshness?
It allows you to specify settings such as `freshness` or `loaded_at_field` specific to that table.
29
How can you check if your source freshness configurations are working as intended?
Run the `dbt source freshness` command and review the output for freshness status per table.
30
Where are sources declared in a dbt project?
In `.yml` files, typically under the `models/` directory, using a `sources:` key.
31
What fields are typically included when declaring a source in dbt?
Fields include `name`, optional `database`, optional `schema`, and a list of `tables`.
32
What does the `name` field represent in a source declaration?
It is the logical name for the source, used in the `{{ source() }}` function to reference the source.
33
When do you need to explicitly set the `schema` field in a source?
Only when the actual schema in the warehouse is different from the `name` of the source.
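
For example, a source whose logical name differs from its warehouse schema might be declared as in this sketch. The schema name is a hypothetical placeholder.

```yaml
sources:
  - name: jaffle_shop                       # name used in {{ source() }}
    schema: postgres_backend_public_schema  # hypothetical warehouse schema
    tables:
      - name: orders
```

Models would still call the source by its logical name, while the compiled SQL uses the schema given here.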
34
What does the `tables` field in a source declaration contain?
It contains a list of table definitions under the source, each with at least a `name`.
35
Can a dbt project have multiple sources in the same YAML file?
Yes, you can declare multiple sources under the same `sources:` key in a single YAML file.
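
A short sketch with two sources in one file. The stripe source and its payments table are assumptions for illustration.

```yaml
sources:
  - name: jaffle_shop
    tables:
      - name: orders
      - name: customers

  - name: stripe          # hypothetical second source
    tables:
      - name: payments
```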
36
What function is used in dbt to reference a source in a model?
The `{{ source(source_name, table_name) }}` function is used to reference source tables.
37
Why is declaring sources useful in dbt?
It allows for clear lineage tracking, easier testing, and better documentation of external data tables.
38
What is the purpose of tracking source data freshness in dbt?
To ensure pipelines are delivering up-to-date data and to support SLAs by monitoring data latency.
39
What command checks source freshness in dbt?
`dbt source freshness` evaluates whether declared sources meet configured freshness thresholds.
40
What configuration is required to enable freshness checks on a source table?
You must define a `freshness` block (with `warn_after` and/or `error_after`) and specify `loaded_at_field`.
41
What does the `loaded_at_field` config do in source freshness?
It specifies the timestamp column that indicates when a row was last loaded, which is used to compute freshness.
42
What do the `warn_after` and `error_after` settings in freshness do?
They define how old data can be before warnings or errors are triggered, using a count and time period.
43
Can table-level freshness override source-level freshness?
Yes, table-level configs take precedence over inherited source-level freshness settings.
44
How do you disable freshness checks for a specific source table?
Set `freshness: null` in that table's `config` block.
45
What query does dbt generate to check freshness?
A `SELECT max(loaded_at_field)` with a current timestamp comparison is used to assess freshness.
46
What does the `filter` config do in a freshness check?
It restricts the rows included in the freshness calculation to avoid full table scans, useful for large datasets.
47
How can you trigger downstream model builds only for fresh sources?
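Use `dbt source freshness` followed by `dbt build --select source_status:fresher+`, with `--state` pointing at the artifacts from the previous run so dbt can compare freshness results.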
48
What is a recommended schedule for checking and reacting to freshness?
A common rule of thumb is to check freshness at least twice as often as your tightest SLA, for example every 30 minutes when models are rebuilt hourly based on the results.
49
How does dbt determine whether a source is fresh?
By comparing the latest `loaded_at_field` timestamp to the current time, based on defined thresholds.