Seeds Flashcards

Question 1

Q

What are seeds in dbt?

Answer

A

Seeds are CSV files stored in your dbt project (typically in the seeds directory) that dbt can load into your data warehouse using the dbt seed command.

Question 2

Q

How does dbt load seed files into your warehouse?

Answer

A

When you run the dbt seed command, dbt reads each CSV in the seeds directory and loads it into a table with the same name as the file (e.g., seeds/country_codes.csv becomes country_codes) in the target schema.
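
The Python sketch below gives a rough picture of what dbt seed does: it loads every .csv file in a seeds directory into a table named after the file. It is not dbt's implementation. SQLite (through the standard-library sqlite3 module) stands in for the warehouse, and the load_seeds helper is hypothetical. Every column is stored as TEXT for simplicity, whereas dbt infers column types (which you can override with the column_types config). The sketch also drops and recreates each table on every run, which is another simplification.

```python
import csv
import sqlite3
from pathlib import Path


def load_seeds(seed_dir: Path, conn: sqlite3.Connection) -> dict[str, int]:
    """Hypothetical helper imitating dbt seed: one table per CSV file (not dbt's code)."""
    loaded = {}
    for csv_path in sorted(seed_dir.glob("*.csv")):  # only .csv files count as seeds
        table = csv_path.stem  # seeds/country_codes.csv -> table "country_codes"
        with csv_path.open(newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            header = next(reader)  # first row holds the column names
            rows = list(reader)
        # Simplification: every column is TEXT; dbt would infer types instead.
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        loaded[table] = len(rows)  # rows loaded per seed
    conn.commit()
    return loaded
```

For example, load_seeds(Path("seeds"), sqlite3.connect("dev.db")) would create a country_codes table from seeds/country_codes.csv and return a mapping of table names to row counts.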

Question 3

Q

How can you reference a seed in a downstream model?

Answer

A

Use the ref function, just like with models, e.g., {{ ref('country_codes') }}.
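
When dbt compiles a model, ref resolves to the full name of the referenced relation (typically database.schema.identifier, quoted as the adapter requires). It also records the seed as an upstream dependency of the model. The toy Python renderer below imitates only the name substitution, using a regular expression. It is not dbt's Jinja engine. The resolve_refs function, the orders model, and the analytics and dbt_alice names are all hypothetical.

```python
import re

# Matches {{ ref('name') }} or {{ ref("name") }} with optional whitespace (toy pattern).
REF_PATTERN = re.compile(r"""\{\{\s*ref\(\s*['"]([^'"]+)['"]\s*\)\s*\}\}""")


def resolve_refs(sql: str, database: str, schema: str) -> tuple[str, list[str]]:
    """Hypothetical renderer: replace ref() calls with database.schema.name, collect deps."""
    deps: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        deps.append(name)  # remember the upstream node, like an edge in dbt's DAG
        return f"{database}.{schema}.{name}"

    return REF_PATTERN.sub(substitute, sql), deps


model_sql = """
select o.order_id, c.country_name
from {{ ref('orders') }} as o
left join {{ ref('country_codes') }} as c
  on o.country_code = c.country_code
"""

compiled, deps = resolve_refs(model_sql, "analytics", "dbt_alice")
print(compiled)
print(deps)  # ['orders', 'country_codes']
```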

Question 4

Q

What are some good use cases for seeds in dbt?

Answer

A

Small, static datasets, for example a lookup table mapping country codes to country names, a list of test email addresses to exclude from analysis, or a list of employee account IDs.
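
The Python sketch below shows why a lookup seed is useful. It joins a made-up orders table to a country_codes table, with an in-memory SQLite database standing in for the warehouse. In a dbt project the same join would sit in a model that selects from {{ ref('country_codes') }}. Both tables and all of their rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for the table dbt seed would build from seeds/country_codes.csv (sample rows).
conn.execute("CREATE TABLE country_codes (country_code TEXT, country_name TEXT)")
conn.executemany(
    "INSERT INTO country_codes VALUES (?, ?)",
    [("US", "United States"), ("CA", "Canada"), ("MX", "Mexico")],
)
# Stand-in for a raw table that stores only the two-letter code.
conn.execute("CREATE TABLE orders (order_id INTEGER, country_code TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "US"), (2, "MX"), (3, "FR")])

# A left join keeps orders whose code is missing from the seed (FR) with a NULL name.
rows = conn.execute(
    """
    SELECT o.order_id, o.country_code, c.country_name
    FROM orders AS o
    LEFT JOIN country_codes AS c ON o.country_code = c.country_code
    ORDER BY o.order_id
    """
).fetchall()
print(rows)  # [(1, 'US', 'United States'), (2, 'MX', 'Mexico'), (3, 'FR', None)]
```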

Question 5

Q

What are some poor use cases for seeds?

Answer

A

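Loading raw data that has been exported to CSV (especially large exports), and storing production data that contains sensitive information such as personally identifiable information (PII) or passwords. Because seed files are committed to the repository, any sensitive data in them would end up in version control.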

Question 6

Q

Where should seed CSV files be located in a dbt project?

Answer

A

In the seeds directory with a .csv file extension, e.g., seeds/country_codes.csv. The directory is configurable through seed-paths in dbt_project.yml.

Question 7

Q

What does the dbt seed command output after successful execution?

Answer

A

It reports how many seeds were found, how many were successfully loaded, and how long it took.

Question 8

Q

Are seed files version-controlled?

Answer

A

Yes. Because seed files live in your dbt project's repository, they go through the same version control and code review process as the rest of the project.

Question 9

Q

How do you configure seeds in dbt?

Answer

A

Seed configurations can be set in dbt_project.yml (under the seeds: key) or in a properties YAML file. They control settings such as schema, column_types, delimiter, and quote_columns.
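
The delimiter config tells dbt which character separates fields in a seed file; the default is a comma. The Python sketch below uses the standard csv module, not dbt, to show what goes wrong when a semicolon-separated file is read with the wrong delimiter. The sample file contents are made up.

```python
import csv
import io

# Made-up seed content separated by semicolons, e.g. because names contain commas.
raw = "country_code;country_name\nKR;Korea, Republic of\nUS;United States\n"

# Reading with the default comma splits inside the name and ignores the semicolons.
wrong = list(csv.reader(io.StringIO(raw)))
# Reading with the matching delimiter yields the intended two columns.
right = list(csv.reader(io.StringIO(raw), delimiter=";"))

print(wrong[1])  # ['KR;Korea', ' Republic of']
print(right[1])  # ['KR', 'Korea, Republic of']
```

In dbt itself, you would fix this by setting +delimiter: ";" for that seed under the seeds: key in dbt_project.yml, or by setting delimiter in the seed's properties file.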

Question 10

Q

Can you document and test seeds in dbt?

Answer

A

Yes. In a properties YAML file you can add descriptions and tests (e.g., unique and not_null) to seeds and their columns, just as you do for models.
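
dbt's built-in generic tests, such as unique and not_null, compile to queries that return failing rows, and a test passes when its query returns nothing. The Python sketch below imitates that idea against a SQLite stand-in table. The queries are simplified approximations written for this example, not the exact SQL dbt generates. The helper names and table rows are hypothetical.

```python
import sqlite3


def unique_failures(conn, table: str, column: str) -> list:
    """Values appearing more than once (approximates dbt's unique test)."""
    return conn.execute(
        f'SELECT "{column}", COUNT(*) FROM "{table}" '
        f'WHERE "{column}" IS NOT NULL GROUP BY "{column}" HAVING COUNT(*) > 1'
    ).fetchall()


def not_null_failures(conn, table: str, column: str) -> list:
    """Rows where the column is NULL (approximates dbt's not_null test)."""
    return conn.execute(f'SELECT * FROM "{table}" WHERE "{column}" IS NULL').fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE country_codes (country_code TEXT, country_name TEXT)")
conn.executemany(
    "INSERT INTO country_codes VALUES (?, ?)",
    [("US", "United States"), ("CA", "Canada"), ("CA", "Canada (duplicate)"), ("MX", None)],
)

# A test "passes" when its query returns no rows; here both checks find failures.
print(unique_failures(conn, "country_codes", "country_code"))    # [('CA', 2)]
print(not_null_failures(conn, "country_codes", "country_name"))  # [('MX', None)]
```

In a dbt project you would instead declare unique and not_null under the country_code column of the country_codes seed in a properties file, then run them with dbt test.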

Question 11

Q

What happens if you change a seed file and rerun dbt seed?

Answer

A

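By default, dbt truncates the existing seed table and reloads it with the current contents of the CSV. If the CSV's columns have changed (for example, a column was added or renamed), run dbt seed --full-refresh so that dbt drops and recreates the table with the new structure.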
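
The Python sketch below imitates the two rerun behaviors with SQLite. By default it empties the existing table and reloads it; with full_refresh=True it drops and recreates the table. The reseed helper is hypothetical, and SQLite's DELETE stands in for a warehouse TRUNCATE. Raising ValueError on a column change is this sketch's own choice. In a real warehouse, a column change without --full-refresh may instead make the load fail or leave the table with a stale structure.

```python
import sqlite3


def reseed(conn, table, header, rows, full_refresh=False):
    """Hypothetical helper: truncate-and-reload by default, drop-and-recreate on full refresh."""
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (table,)
    ).fetchone() is not None
    if exists and not full_refresh:
        current = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        if current != list(header):
            # This sketch refuses to continue when the column structure changed.
            raise ValueError(f"columns changed {current} -> {list(header)}; use full_refresh=True")
        conn.execute(f'DELETE FROM "{table}"')  # SQLite has no TRUNCATE; DELETE empties the table
    else:
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        cols = ", ".join(f'"{c}" TEXT' for c in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
    marks = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', rows)


conn = sqlite3.connect(":memory:")
reseed(conn, "country_codes", ["code", "name"], [("US", "United States")])
reseed(conn, "country_codes", ["code", "name"], [("CA", "Canada")])  # same columns: rows replaced
print(conn.execute("SELECT * FROM country_codes").fetchall())  # [('CA', 'Canada')]

try:  # adding a column without full_refresh is rejected by this sketch
    reseed(conn, "country_codes", ["code", "name", "region"], [("CA", "Canada", "NA")])
except ValueError as err:
    print(err)

reseed(conn, "country_codes", ["code", "name", "region"], [("CA", "Canada", "NA")], full_refresh=True)
print(conn.execute("SELECT * FROM country_codes").fetchall())  # [('CA', 'Canada', 'NA')]
```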

Question 12

Q

Why are seeds considered best for static data?

Answer

A

Because seeds are version-controlled CSV files that dbt reloads in full on every dbt seed run, they suit small datasets that rarely change, such as reference or lookup tables.

Question 13

Q

What schema does a seed table get created in by default?

Answer

A

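By default, the table is created in the target schema defined in your profile (profiles.yml). If you set a custom schema config for the seed, dbt's default behavior is to append it to the target schema, producing <target_schema>_<custom_schema> rather than replacing the target schema.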
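
dbt chooses the final schema name with its generate_schema_name macro, which projects can override. The Python function below is an illustrative port of the documented default logic, not dbt code: with no custom schema it returns the target schema, and otherwise it joins the two with an underscore. The function name and the schema names in the example are hypothetical.

```python
from typing import Optional


def default_schema_name(target_schema: str, custom_schema: Optional[str] = None) -> str:
    """Illustrative port of dbt's default generate_schema_name behavior."""
    if custom_schema is None:
        return target_schema  # no schema config: use the profile's target schema
    # A custom schema is appended to the target schema, not substituted for it.
    return f"{target_schema}_{custom_schema.strip()}"


print(default_schema_name("dbt_alice"))               # dbt_alice
print(default_schema_name("dbt_alice", "seed_data"))  # dbt_alice_seed_data
```

This appending behavior explains why a seed configured with schema: seed_data usually lands in a schema such as dbt_alice_seed_data during development, rather than in a schema named exactly seed_data.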

Question 14

Q

What is the primary benefit of using seeds instead of hand-coded lookup tables in SQL?

Answer

A

They provide a clean, maintainable, and version-controlled way to store and deploy reference data.
