Sources Flashcards
(49 cards)
What are sources in dbt used for?
Sources allow you to declare external tables loaded by EL tools and define lineage, test assumptions, and track freshness in your dbt project.
How do you declare a source in dbt?
Define it in a YAML file under the sources: key, specifying the source name, database, schema, and its tables.
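A minimal sketch of such a declaration, assuming the raw data lives in a database named raw and a schema named jaffle_shop (names chosen to match the compiled example in this deck; the customers table is purely illustrative):

```yaml
version: 2

sources:
  - name: jaffle_shop        # name used as the first argument to source()
    database: raw            # assumed database holding the loaded data
    schema: jaffle_shop      # assumed schema; defaults to the source name if omitted
    tables:
      - name: orders
      - name: customers      # illustrative second table
```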
What does the {{ source() }} function do in dbt?
It references a declared source table, creating a dependency and ensuring proper lineage and compilation.
What does dbt compile {{ source('jaffle_shop', 'orders') }} to?
It compiles to the fully qualified name of the source table, e.g., raw.jaffle_shop.orders.
Can you document and test sources in dbt?
Yes, you can add descriptions and data tests in YAML just like for models.
What kind of tests can be applied to source columns?
You can apply unique, not_null, and other data tests to ensure data integrity.
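As a rough illustration, descriptions and column tests on a source might look like the following. The id column and the description strings are assumptions, and older dbt versions spell the data_tests: key as tests:.

```yaml
version: 2

sources:
  - name: jaffle_shop
    description: Raw tables loaded by an EL tool   # illustrative description
    tables:
      - name: orders
        description: One row per order             # illustrative description
        columns:
          - name: id                               # assumed primary key column
            description: Primary key of the orders table
            data_tests:                            # `tests:` on older dbt versions
              - unique
              - not_null
```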
What is source freshness in dbt?
It checks how recently a table’s data was updated, helping ensure pipeline timeliness and reliability.
How do you enable source freshness checks?
Add a freshness block and specify loaded_at_field in the source or table config.
What values can be set in the freshness block?
You can set warn_after and/or error_after with a count and time period (e.g., hours).
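A sketch of a freshness block set at the source level, assuming a timestamp column named _etl_loaded_at (hypothetical) and arbitrary 12-hour and 24-hour thresholds. Newer dbt releases may expect these keys nested under config:, so check the documentation for your version.

```yaml
version: 2

sources:
  - name: jaffle_shop
    database: raw
    loaded_at_field: _etl_loaded_at            # assumed timestamp column
    freshness:
      warn_after: {count: 12, period: hour}    # warn once data is older than 12 hours
      error_after: {count: 24, period: hour}   # error once data is older than 24 hours
    tables:
      - name: orders
```

The period accepts minute, hour, or day.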
What command checks source freshness in dbt?
Run dbt source freshness to evaluate the freshness of declared source tables.
What query does dbt run behind the scenes for freshness checks?
It selects max(loaded_at_field) and compares it to the current time to determine data age.
How can you filter rows in freshness checks to avoid full table scans?
Use the filter config, whose SQL condition dbt applies as a WHERE clause in the freshness query (e.g., loaded_at_field >= date_sub(...)), to limit the rows scanned.
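One way the filter could be written, assuming the same hypothetical _etl_loaded_at column. The condition is passed to the warehouse as-is, so the date function shown here (datediff) is an assumption that depends on your SQL dialect.

```yaml
version: 2

sources:
  - name: jaffle_shop
    tables:
      - name: orders
        loaded_at_field: _etl_loaded_at        # assumed timestamp column
        freshness:
          warn_after: {count: 12, period: hour}
          # Body of the WHERE clause only, without the WHERE keyword
          filter: datediff('day', _etl_loaded_at, current_timestamp) < 2
```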
What happens when a freshness threshold is violated?
dbt marks the table as stale and logs a warning or error depending on your configuration.
What command builds models downstream of fresher sources?
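Use dbt build --select source_status:fresher+ together with --state pointing at the artifacts of a previous run. After running dbt source freshness, this rebuilds only the models downstream of sources that have received newer data since that previous run.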
How do freshness configs cascade in dbt?
Top-level source configs apply to all tables unless overridden at the table level.
What is the role of loaded_at_field in source freshness?
It indicates the column used to track when a source row was last loaded. It’s required for freshness checks.
Can you skip freshness checks for specific tables?
Yes, set freshness: null in the table’s config to exclude it from freshness evaluation.
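For instance, a table that rarely changes could opt out while its siblings keep the source-level default. The product_skus table and the _etl_loaded_at column are hypothetical.

```yaml
version: 2

sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at            # assumed timestamp column
    freshness:
      warn_after: {count: 12, period: hour}
    tables:
      - name: orders                           # inherits the source-level freshness
      - name: product_skus                     # hypothetical static table
        freshness: null                        # excluded from freshness checks
```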
What is the benefit of defining source freshness snapshots?
Running freshness checks on a schedule records how up to date each source is, so stale data is caught early and downstream builds can be triggered only when new data has arrived.
Why is it important to define the schema and database in source declarations?
It ensures that dbt can correctly locate and access the external tables, especially in multi-database environments.
How do you declare source freshness in dbt?
You add a freshness block under the source or table config, and specify a loaded_at_field to track when data was last loaded.
What parameters can be included in a freshness block?
You can set warn_after and/or error_after with a count and period (e.g., hours).
What is the purpose of loaded_at_field in source freshness?
It specifies the timestamp column dbt uses to calculate the freshness of the data.
What happens if loaded_at_field is not defined for a table?
dbt will not calculate freshness for that table, even if freshness thresholds are set.
Can you override the default freshness config for a specific table?
Yes, you can define a separate freshness config in a specific table block, which will override the source-level config.
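A sketch of such an override, assuming a hypothetical payments table that is loaded more often than the rest and a hypothetical _etl_loaded_at column; the thresholds are arbitrary.

```yaml
version: 2

sources:
  - name: jaffle_shop
    loaded_at_field: _etl_loaded_at            # assumed timestamp column
    freshness:                                 # default for every table in the source
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders                           # uses the default above
      - name: payments                         # hypothetical table with stricter needs
        freshness:                             # replaces the source-level block
          warn_after: {count: 1, period: hour}
          error_after: {count: 2, period: hour}
```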