Seeds Flashcards
(14 cards)
What are seeds in dbt?
Seeds are CSV files stored in your dbt project (typically in the seeds directory) that dbt can load into your data warehouse using the dbt seed command.
How does dbt load seed files into your warehouse?
By running the dbt seed command, dbt reads CSVs from the seeds directory and creates corresponding tables in the target schema.
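The load step described above can be pictured with a small Python sketch. This is only a conceptual illustration using an in-memory SQLite database, not how dbt itself is implemented; the helper name load_seed and the CSV contents are made up for the example.

```python
import csv
import io
import sqlite3


def load_seed(conn, table_name, csv_text):
    """Create a table named after the seed and insert every CSV row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first CSV row supplies the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    # Rebuild the table from scratch so it mirrors the CSV exactly
    conn.execute(f'DROP TABLE IF EXISTS "{table_name}"')
    conn.execute(f'CREATE TABLE "{table_name}" ({columns})')
    conn.executemany(
        f'INSERT INTO "{table_name}" VALUES ({placeholders})', reader
    )
    return conn.execute(f'SELECT COUNT(*) FROM "{table_name}"').fetchone()[0]


# Hypothetical contents of seeds/country_codes.csv
COUNTRY_CODES_CSV = "country_code,country_name\nUS,United States\nCA,Canada\n"

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse
loaded = load_seed(conn, "country_codes", COUNTRY_CODES_CSV)
print(f"loaded {loaded} rows into country_codes")
```

The table takes its name from the file name and its columns from the CSV header row, which mirrors the one-table-per-CSV behavior of seeds.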
How can you reference a seed in a downstream model?
Use the ref function, just like with models, e.g., {{ ref('country_codes') }}.
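As a rough illustration, the Python sketch below holds the SQL of a hypothetical downstream model that joins to the seed through ref. The string replacement is a toy stand-in for dbt's compilation step, and the orders table and its columns are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the data warehouse

# Stand-in for the table that dbt seed would build from the seed file
conn.execute("CREATE TABLE country_codes (country_code TEXT, country_name TEXT)")
conn.executemany(
    "INSERT INTO country_codes VALUES (?, ?)",
    [("US", "United States"), ("CA", "Canada")],
)

# Hypothetical upstream table; in a real project this would itself be
# referenced through ref or source rather than by a bare name
conn.execute("CREATE TABLE orders (order_id INTEGER, country_code TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, "US"), (2, "CA"), (3, "US")]
)

# SQL of a hypothetical model that enriches orders with the seed
MODEL_SQL = """
select o.order_id, c.country_name
from orders as o
left join {{ ref('country_codes') }} as c
  on o.country_code = c.country_code
order by o.order_id
"""

# Toy stand-in for compilation: dbt resolves the ref call to the seed's table
compiled_sql = MODEL_SQL.replace("{{ ref('country_codes') }}", "country_codes")
rows = conn.execute(compiled_sql).fetchall()
print(rows)
```

Because the model refers to the seed through ref rather than a hard-coded table name, dbt knows the model depends on the seed.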
What are some good use cases for seeds in dbt?
Examples include lookup tables like country codes to names, test email exclusion lists, and employee account IDs.
What are some poor use cases for seeds?
They include loading large raw data exports and handling sensitive production data like PII or passwords.
Where should seed CSV files be located in a dbt project?
In the seeds directory with a .csv file extension, e.g., seeds/country_codes.csv.
What does the dbt seed command output after successful execution?
It reports how many seeds were found, how many were successfully loaded, and how long it took.
Are seed files version-controlled?
Yes, since they are stored in the dbt repository, they benefit from version control and code review processes.
How do you configure seeds in dbt?
Seed configurations are set in dbt_project.yml (under the seeds: key) and control properties such as the schema, the delimiter, column quoting (quote_columns), and column types (column_types).
Can you document and test seeds in dbt?
Yes, using YAML properties, you can add documentation and schema tests to seed tables just like with models.
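To make concrete what two common column tests assert, here is a minimal Python sketch of not-null and unique checks run directly against seed rows. It only mimics the idea behind the tests; in dbt they are declared in YAML and run against the warehouse table. The function names and CSV contents are hypothetical, and treating an empty CSV field as a missing value is an assumption of this sketch.

```python
import csv
import io
from collections import Counter


def not_null_failures(rows, column):
    """Return the rows whose value in `column` is missing."""
    return [row for row in rows if row[column] in (None, "")]


def unique_failures(rows, column):
    """Return the non-missing values that appear more than once in `column`."""
    counts = Counter(
        row[column] for row in rows if row[column] not in (None, "")
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical seed contents with a duplicate code and a missing name
SEED_CSV = "country_code,country_name\nUS,United States\nCA,Canada\nCA,\n"

rows = list(csv.DictReader(io.StringIO(SEED_CSV)))
duplicate_codes = unique_failures(rows, "country_code")
missing_names = not_null_failures(rows, "country_name")
print("duplicate codes:", duplicate_codes)
print("rows missing a name:", len(missing_names))
```

A test passes when its failure list is empty, so this seed would fail both checks until the duplicate row is fixed.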
What happens if you change a seed file and rerun dbt seed?
dbt reloads the seed table in your data warehouse so that it matches the new contents of the CSV. If the columns themselves changed, run dbt seed --full-refresh to rebuild the table.
Why are seeds considered best for static data?
Because seeds are version-controlled CSVs that should not change often, making them ideal for reference or lookup tables.
What schema does a seed table get created in by default?
The table is created in your target schema, as defined by your dbt profile and project configuration.
What is the primary benefit of using seeds instead of hand-coded lookup tables in SQL?
They provide a clean, maintainable, and version-controlled way to store and deploy reference data.