materialized views Flashcards
(18 cards)
… (existing content unchanged) …
What is a materialized view in Lakeflow Declarative Pipelines?
A materialized view is a cached result of a query that is refreshed on a specified interval and can be queried like a table, providing faster access than standard views.
How does a materialized view stay up to date?
It tracks changes in upstream data and incrementally applies transformations on trigger based on a refresh interval, maintaining synchronization with source data.
What happens when a materialized view is created outside of a pipeline in Databricks SQL?
Databricks creates a hidden pipeline to manage the updates for that materialized view.
How are materialized views stored and managed in Databricks?
Metadata is managed by Unity Catalog and cached data is stored in cloud storage. System views support incremental update tracking.
What is an example use case for a materialized view in SQL?
Joining tables such as partners and sales to create a result table that is automatically kept up-to-date and optimized for query performance.
When do materialized views perform full recomputation instead of incremental updates?
If input changes are complex or can’t be computed incrementally, the materialized view may be fully recomputed for correctness.
What are the limitations of materialized views?
They may require full recomputation, are not suitable for low-latency use cases, and not all computations support incremental updates.
… (existing content unchanged) …
What is an incremental refresh in the context of materialized views?
An incremental refresh processes only the changes in the underlying data since the last refresh, rather than recomputing the entire query result.
When are materialized views incrementally refreshed in Databricks?
When the materialized view is supported by a serverless pipeline and its query supports incrementalization, otherwise a full refresh occurs.
What is a full refresh of a materialized view?
A full refresh recomputes the entire result from scratch, reprocessing all source data regardless of whether changes were detected.
How can you force a full refresh in Databricks SQL?
Use the SQL command REFRESH MATERIALIZED VIEW mv_name FULL
to override the default behavior and trigger a full recomputation.
Which data sources support incremental refresh of materialized views?
Delta tables, materialized views, streaming tables, and targets of AUTO CDC flows support incremental refresh if configured correctly.
What table properties enhance incremental refresh performance?
Enabling deletion vectors, row tracking, and change data feed on the source table improves incremental refresh capability.
What SQL constructs are supported in incrementally refreshed materialized views?
Supported constructs include SELECT expressions, GROUP BY, UNION ALL, INNER/OUTER JOINS, WHERE, HAVING, WITH, QUALIFY, and OVER clauses.
How can you check which refresh strategy was used in a materialized view update?
Query the event_log()
function for the materialized view and filter for event_type = 'planning_information'
.