Tests Flashcards

(81 cards)

1
Q

What is the primary purpose of data tests in dbt?

A

Data tests in dbt are used to make assertions about models and other resources like sources, seeds, and snapshots. They help ensure data integrity by verifying assumptions about the data, such as uniqueness or non-null values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How does dbt determine if a data test passes or fails?

A

A data test passes if its corresponding SQL query returns zero failing records. If the query finds records that disprove the assertion, the test fails.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are some examples of built-in generic data tests in dbt?

A

dbt provides built-in generic data tests for checking non-null values, uniqueness, referential integrity (foreign key relationships), and values from a specified list.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is the difference between a singular and a generic data test in dbt?

A

A singular data test is a custom SQL query saved in a .sql file to find failing records. A generic data test is a reusable, parameterized test defined in a test block and applied via .yml configuration.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why are generic data tests generally more common than singular data tests in dbt?

A

Generic data tests are reusable and flexible, allowing consistent testing across multiple models with minimal effort. Their reusability and configurability make them ideal for frequent use.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What does a data test in dbt typically look for in its SQL query?

A

A data test SQL query looks for failing records—those that violate the test’s assertion. For instance, if testing for uniqueness, the query finds duplicate values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Where are singular data tests typically stored in a dbt project?

A

Singular data tests are stored as .sql files within the test directory of the dbt project.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

How can you define a generic data test in dbt?

A

Generic data tests are defined using test blocks, similar to macros. They accept arguments and are referenced by name in .yml files attached to models, columns, or sources.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What happens when a data test fails in dbt?

A

When a data test fails, dbt returns the set of rows that caused the failure. This helps identify exactly what data violated the assertion.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Can data tests be extended to fit specific business logic in dbt?

A

Yes, any assertion that can be expressed as a SQL SELECT query can be used as a data test, allowing teams to validate business-specific rules.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How are data tests executed in a dbt workflow?

A

Data tests are run using the dbt test command, which evaluates all defined tests and reports on their success or failure.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is the role of schema tests in dbt?

A

Schema tests, also known as generic data tests, validate schema-level properties like nullability, uniqueness, and relationships, and are defined declaratively in YAML files.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

How does dbt test help prevent regressions in your codebase?

A

By running consistent tests on models, dbt ensures that data assumptions hold true even as underlying code changes, helping to catch unintended issues early.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What should a data test SQL query return to indicate a successful test?

A

It should return zero rows—meaning no failing records were found that violate the assertion.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a singular data test in dbt?

A

A singular data test is a custom SQL query stored in a .sql file that returns failing records for a specific assertion. It’s a one-off test created for a unique use case.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Where are singular data tests typically stored in a dbt project?

A

They are stored in the tests directory defined by the test-paths config. Each file should contain one SELECT statement.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

What naming convention does dbt use for singular data tests?

A

The name of the test is derived from the file name, such as assert_total_payment_amount_is_positive.sql.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Can Jinja be used in singular data tests?

A

Yes, you can use Jinja syntax, including ref and source, in singular test SQL files just like in models.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What should a singular test SQL query return?

A

It should return only the failing records. For example, a query that selects rows where total_amount < 0 to test for positive payment totals.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Why should you omit semicolons in singular data test SQL files?

A

Semicolons can cause execution errors during testing in dbt, so they should be left out of the final SELECT statement.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

How do you add a description to a singular data test?

A

You add it to a .yml file in the tests directory, specifying the test name and a description under data_tests.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

What is a generic data test in dbt?

A

A generic data test is a parameterized test defined using a test block. It can be reused across models and columns by passing different arguments.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

How is a generic test defined in dbt?

A

Generic tests are defined using a Jinja {% test %} block with parameters like model and column_name. The body contains a SELECT query using those parameters.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

How are generic data tests applied to dbt resources?

A

They are configured in YAML files under the properties of models, sources, seeds, or snapshots. You specify the test name and any required arguments.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
What makes a data test reusable or generic in dbt?
The use of parameterized inputs such as `model` and `column_name` allows the test to be applied in different contexts without rewriting the SQL.
26
Why might a singular test evolve into a generic test over time?
If you find yourself repeating a similar test structure for different models or columns, it’s better to refactor it into a generic form.
27
Should singular tests be referenced in model_name.yml files?
No, referencing singular tests in model YAML files causes errors, as they are not treated as generic tests or macros.
28
What is an example of a singular data test SQL query?
```\nselect\n order_id,\n sum(amount) as total_amount\nfrom {{ ref('fct_payments') }}\ngroup by 1\nhaving total_amount < 0\n```
29
What built-in generic data tests does dbt include out of the box?
dbt includes four built-in generic tests: `unique`, `not_null`, `accepted_values`, and `relationships`.
30
What does the `unique` test check in dbt?
It checks that each value in the specified column appears only once, ensuring no duplicates.
31
What is the purpose of the `not_null` test in dbt?
It ensures that the column being tested does not contain any NULL values.
32
What does the `accepted_values` test validate in dbt?
It checks that a column only contains values from a specified list, such as `['placed', 'shipped', 'completed', 'returned']`.
33
What does the `relationships` test validate in dbt?
It checks that every value in a column (usually a foreign key) exists in a specified column of another model, ensuring referential integrity.
34
How are generic data tests declared in a dbt project?
They are specified in YAML files under the `tests:` or `data_tests:` key, within the model and column definitions.
35
What YAML syntax is used to define a `relationships` test in dbt?
```\n- relationships:\n to: ref('customers')\n field: id\n```
36
What command runs all defined data tests in a dbt project?
You use the `dbt test` command, which evaluates all configured data tests.
37
What information does dbt provide after running `dbt test`?
It shows the number of models, tests, and the status of each test (PASS, WARN, ERROR, etc.) including execution time.
38
What does dbt's compiled SQL look like for a `unique` test?
It groups non-null values and filters those with count > 1 to find duplicates.\n```\nselect * from (\n select order_id from analytics.orders\n where order_id is not null\n group by order_id\n having count(*) > 1\n) validation_errors\n```
39
What does dbt's compiled SQL for a `not_null` test look like?
```\nselect * from analytics.orders\nwhere order_id is null\n```
40
What does the `--store-failures` flag do when running dbt tests?
It stores the failing records in a separate table in the database, useful for fast debugging.
41
What is the benefit of storing test failures during development?
It allows developers to quickly query and inspect the exact rows that caused the test to fail.
42
What happens to old test results when using store_failures in dbt?
The latest test results replace previous failure records for the same test.
43
What is the difference between `tests:` and `data_tests:` in dbt YAML files?
`tests:` is the legacy key for defining data tests; `data_tests:` is preferred in dbt v1.8 and later to distinguish from unit tests.
44
45
What are unit tests in dbt, and how do they differ from data tests?
Unit tests in dbt validate the SQL logic of a model using small static datasets before materializing the model. Unlike data tests, which assess the structure or quality of the data after building, unit tests operate on test inputs to verify logic.
46
What dbt version introduced unit tests?
Unit tests were introduced in dbt Core v1.8 and in the dbt Cloud 'Latest' release track.
47
What types of models can you unit test in dbt?
Only SQL models in your current dbt project are currently supported for unit testing.
48
What model configurations are not supported for unit testing in dbt v1.8?
Models using materialized views, recursive SQL, introspective queries, or unsupported model versions are not supported.
49
When should you consider adding a unit test to a dbt model?
When your model uses complex SQL logic (e.g., regex, date math, window functions), has a history of bugs, involves edge cases, or before refactoring.
50
What are some examples of SQL logic where unit tests are most beneficial?
Unit tests are useful for complex logic like window functions, `case when` structures, truncation, and custom data processing steps.
51
Why is test-driven development encouraged with dbt unit tests?
It allows developers to write logic with accompanying tests to validate expected outcomes before model deployment, increasing reliability and reducing regressions.
52
Where should unit tests be defined in a dbt project?
They must be defined in a YAML file inside the `models/` directory of your project.
53
What must you do to test join logic in unit tests?
All table names used in joins must be aliased in the SQL model definition.
54
What must be included in the unit test config when using `ref` or `source`?
You must specify all referenced models or sources as `inputs` in the test to avoid compilation errors.
55
What is the purpose of static inputs in dbt unit tests?
Static inputs allow you to test SQL transformations in isolation from production data, making the tests reproducible and controlled.
56
Where and when should unit tests be run according to dbt Labs?
Only in development or CI environments. They shouldn’t run in production since they use static inputs and add compute cost.
57
What environment variables or flags can be used to exclude unit tests from production?
Use the `--exclude-resource-type` flag or set `DBT_EXCLUDE_RESOURCE_TYPES` to prevent unit tests from running in production.
58
What adapter-specific limitations should be considered when writing unit tests?
For BigQuery, all fields in a STRUCT must be specified. Redshift requires sources to be in the same database as the model.
59
What does the unit test for `dim_customers` validate in the provided example?
It checks that the `is_valid_email_address` logic correctly identifies valid and invalid emails based on regex and domain matching.
60
What are some edge cases tested in the `is_valid_email_address` unit test?
The test includes emails without a dot, without an @ symbol, and with invalid domains.
61
How is test data provided in dbt unit tests?
Test data is defined in the `given:` block using inline `rows:` dictionaries, or can be loaded from CSV/SQL fixture files.
62
How are expected results defined in a dbt unit test?
The expected output is specified in the `expect:` block under `rows:` to show the expected results of the transformation.
63
What does the `--empty` flag do when preparing to run unit tests?
It builds an empty version of the input models (e.g., `stg_customers`) to satisfy test dependencies without incurring high warehouse costs.
64
What happens if the logic in the model is incorrect, as seen in the initial run of the `test_is_valid_email_address` unit test?
The test fails and dbt shows a diff between the actual and expected output, helping pinpoint the logic error.
65
How do you fix a logic issue discovered through a failed unit test?
Update the SQL logic (e.g., fix the regex), rerun the unit test, and verify that it now passes.
66
How do you run a specific unit test by name?
Use `dbt test --select test_is_valid_email_address` to run just that test.
67
What does the test run output show when a unit test fails?
It shows the actual vs. expected results, test name, time, and error summary to aid in debugging.
68
What should you verify before running a dbt unit test that uses `ref` inputs?
Ensure that all referenced models (e.g., `stg_customers`) are built or mocked, or run with `--empty` if testing in isolation.
69
Why did the `test_is_valid_email_address` initially fail even though the logic seemed correct?
The regex lacked proper escape characters, causing valid emails like `cool@example.com` to be marked invalid.
70
How can using fixture files instead of inline data help in unit testing?
Fixture files allow for reusable, organized test inputs and outputs, especially for large or complex datasets.
71
How does unit testing improve model reliability before production deployment?
It catches logical errors early, allows test-driven development, and ensures edge cases are handled correctly.
72
How can you unit test dbt incremental models in different modes?
You can override macros like `is_incremental` in the `overrides:` block of the unit test configuration to simulate either 'full refresh' (`false`) or 'incremental' (`true`) mode.
73
What does the `--empty` flag do for incremental model testing?
It builds an empty version of the model in the warehouse to satisfy dependencies without fully materializing the data.
74
When unit testing an incremental model, what should your expected output represent?
It should represent what will be inserted or merged (i.e., the result of the materialization), not the final state of the table.
75
What inputs are used in unit testing incremental models?
You can use `ref('events')` for new rows and `this` to simulate the current state of the model for incremental tests.
76
How do you simulate `is_incremental()` returning `false` in a unit test?
By using `overrides.macros` in your test config: `is_incremental: false`.
77
Can you currently test whether dbt performed the merge/insert correctly in an incremental model?
No, dbt does not yet support verifying whether records were actually merged/inserted; it only verifies the query logic.
78
How do you unit test a model that references an ephemeral model?
You must use `format: sql` in your unit test inputs and provide inline SQL that mimics the ephemeral model's output.
79
What format must be used in a unit test input for an ephemeral model?
Use `format: sql` with the actual SQL query defined in the `rows:` block.
80
What exit codes are used for unit tests in dbt?
Exit code 0 indicates a pass, and exit code 1 indicates a fail, regardless of how many records failed in the test.
81
How do unit test exit codes differ from data test exit codes in dbt?
Unit test exit codes reflect the pass/fail status of each test case, while data tests report the number of failing records.