3H. Optimize a model for performance in Power BI Flashcards
How do report users experience a model that’s performing poorly?
From a report user's perspective, poor performance means report pages that take longer to load and visuals that take more time to update. This poor performance results in a negative user experience.
What are the two most common reasons for poor model performance?
As a data analyst, you will spend approximately 90 percent of your time working with your data, and nine times out of ten, poor performance is a direct result of a bad semantic model, bad Data Analysis Expressions (DAX), or a mix of the two.
Why is it good to design a good semantic model from the very start, even if it might take a bit longer?
If you address performance issues during development, you will have a robust Power BI semantic model that will return better reporting performance and a more positive user experience. Ultimately, you will also be able to maintain optimized performance. As your organization grows, the size of its data grows, and its semantic model becomes more complex. By optimizing your semantic model early, you can mitigate the negative impact that this growth might have on the performance of your semantic model.
What’s usually the biggest factor impacting model performance, and how can you control it?
A smaller semantic model uses fewer resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports. Therefore, the performance optimization process involves minimizing the size of the semantic model and making the most efficient use of the data in the model, which includes:
- Ensuring that the correct data types are used.
- Deleting unnecessary columns and rows.
- Avoiding repeated values.
- Replacing numeric columns with measures (see the sketch after this list).
- Reducing cardinalities.
- Analyzing model metadata.
- Summarizing data where possible.
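As an illustration of replacing numeric columns with measures: rather than storing a pre-computed line total in every row, you can compute it at query time. A minimal sketch, assuming a Sales table with Quantity and Unit Price columns (hypothetical names):

    Sales Amount =
    SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )

Because the measure is evaluated only when a visual queries it, the model no longer has to store the pre-multiplied column, which reduces its size.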
What is a good first step to identifying bottlenecks in model performance?
You can use Performance analyzer in Power BI Desktop to help you find out how each of your report elements is performing when users interact with them.
What does Performance analyzer do?
With it, you can determine how long it takes for a particular visual to refresh when it is initiated by a user interaction. Performance analyzer will help you identify the elements that are contributing to your performance issues, which can be useful during troubleshooting.
What do you have to do before running Performance analyzer?
You have to clear the visual cache and the data engine cache.
To do this, create a blank report page, select it, save and close the file, and re-open the file (now on the blank page).
How do you use Performance analyzer?
From a blank page, click Performance analyzer and select Start recording. Then move to the page you want to analyze, and interact with the elements of the report that you want to measure. When finished, click Stop.
For which tasks involved in loading a given visual does Performance analyzer record the time, and what are those tasks?
- DAX query - The time it took for the visual to send the query, along with the time it took Analysis Services to return the results.
- Visual display - The time it took for the visual to render on the screen, including the time required to retrieve web images or geocoding.
- Other - The time it took the visual to prepare queries, wait for other visuals to complete, or perform other background processing tasks. If this category displays a long duration, the only real way to reduce this duration is to optimize DAX queries for other visuals, or reduce the number of visuals in the report.
If visuals are the bottleneck in model performance, how can you address this?
Consider the number of visuals on the report page; fewer visuals means better performance. Ask yourself if a visual is really necessary and if it adds value to the end user. If the answer is no, you should remove that visual. Rather than using multiple visuals on the page, consider other ways to provide additional details, such as drill-through pages and report page tooltips.
Examine the number of fields in each visual. The more fields a visual uses, the higher the chance of performance issues; in addition, visuals with many fields can appear crowded and lose clarity. The upper limit for a visual is 100 fields (measures or columns), so a visual with more than 100 fields will be slow to load. Ask yourself if you really need all of this data in a visual. You might find that you can reduce the number of fields that you currently use.
If a DAX query is the bottleneck in model performance, how can you address this, and what is a good benchmark for determining if a query is too slow?
A good starting point is any DAX query that is taking longer than 120 milliseconds.
To shorten query time, try rewriting the measure with different functions.
What might be a more resource efficient alternative to FILTER?
KEEPFILTERS(). Used inside CALCULATE, KEEPFILTERS adds a filter that is intersected with the existing filter context, so filters already applied to the visual are preserved. FILTER, by contrast, removes the context filters and iterates again over all rows of a table, requiring more calculation when filters are already applied separately to the visual.
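A sketch of the contrast, assuming a Sales table with a Currency column and an existing [Total Sales] measure (hypothetical names):

    -- Removes any existing filter on Currency and scans all of its values again:
    Sales USD =
    CALCULATE ( [Total Sales], FILTER ( ALL ( Sales[Currency] ), Sales[Currency] = "USD" ) )

    -- Intersects "USD" with whatever Currency filters the visual already applies:
    Sales USD (keep filters) =
    CALCULATE ( [Total Sales], KEEPFILTERS ( Sales[Currency] = "USD" ) )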
How might relationships affect model performance?
You should review the relationships between your tables to ensure that you have established the correct relationships. Check that relationship cardinality properties are correctly configured. For example, a one-side column that contains unique values might be incorrectly configured as a many-side column.
How might columns affect model performance?
It is best practice to not import columns of data that you do not need. Rather than deleting unneeded columns in Power Query Editor, try to deal with them at the source when loading data into Power BI Desktop. However, if it is impossible to remove redundant columns from the source query, or the data has already been imported in its raw state, you can always use Power Query Editor to examine each column. Ask yourself if you really need each column and try to identify the benefit that each one adds to your semantic model. If you find that a column adds no value, you should remove it from your semantic model.
When you remove an unnecessary column, you will reduce the size of the semantic model which, in turn, results in a smaller file size and faster refresh time. Also, because the semantic model contains only relevant data, the overall report performance will be improved.
What is metadata and how can it affect model performance?
Metadata is information about other data. Power BI metadata contains information on your semantic model, such as the name, data type and format of each of the columns, the schema of the database, the report design, when the file was last modified, the data refresh rates, and much more.
When you load data into Power BI Desktop, it is good practice to analyze the corresponding metadata so you can identify any inconsistencies with your semantic model and normalize the data, or spot errors, incorrect data types, unexpected amounts of data, and so on, before you start to build reports.
You can also look at the pbix metadata to see the size of your model.
What is the auto date/time feature, and how can it impact performance?
Another item to consider when optimizing performance is the Auto date/time option in Power BI Desktop. By default, this feature is enabled globally, which means that Power BI Desktop automatically creates a hidden calculated table for each date column, provided that certain conditions are met. The new, hidden tables are in addition to the tables that you already have in your semantic model.
The Auto date/time option allows you to work with time intelligence when filtering, grouping, and drilling down through calendar time periods. We recommend that you keep the Auto date/time option enabled only when you work with calendar time periods and when you have simplistic model requirements in relation to time.
If your data source already defines a date dimension table, that table should be used to consistently define time within your organization, and you should disable the global Auto date/time option. Disabling this option can lower the size of your semantic model and reduce the refresh time.
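If you disable Auto date/time and your source does not provide a date dimension, one common alternative (a sketch, not part of the card) is a dedicated DAX date table that you then mark as a date table:

    Date =
    ADDCOLUMNS (
        CALENDARAUTO (),
        "Year", YEAR ( [Date] ),
        "Month", FORMAT ( [Date], "MMM YYYY" )
    )

A single shared date table replaces the many hidden tables that Auto date/time would otherwise create, one per date column.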
What are the advantages to using variables when writing DAX formulas?
- Improved performance - Variables can make measures more efficient because they remove the need for Power BI to evaluate the same expression multiple times. You can achieve the same results in a query in about half the original processing time.
- Improved readability - Variables have short, self-describing names and are used in place of an ambiguous, multi-worded expression. You might find it easier to read and understand the formulas when variables are used.
- Simplified debugging - You can use variables to debug a formula and test expressions, which can be helpful during troubleshooting.
- Reduced complexity - Variables do not require the use of the EARLIER or EARLIEST DAX functions, which are difficult to understand. These functions were required before variables were introduced and were used in complex expressions that introduced new row contexts. Now that you can use variables instead of those functions, you can write less complex formulas.
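A sketch of such a measure (the [Sales] measure and 'Date' table are assumed names), where the variable ensures the prior-year value is computed once rather than twice:

    Sales YoY Growth % =
    VAR SalesPriorYear =
        CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
    RETURN
        DIVIDE ( [Sales] - SalesPriorYear, SalesPriorYear )

Without the variable, the CALCULATE expression would appear, and be evaluated, in both the numerator and the denominator.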
What is best practice when using variables to improve readability?
When using variables, it is best practice to use descriptive names for the variables. In the previous example, the variable is called SalesPriorYear, which clearly states what the variable is calculating. Consider the outcome of using a variable that was called X, temp or variable1; the purpose of the variable would not be clear at all.
Using clear, concise, meaningful names will help make it easier for you to understand what you are trying to calculate, and it will be much simpler for other developers to maintain the report in the future.
What is cardinality of columns, and how can it impact performance?
Cardinality refers to the number of unique values in a column. A column that has a lot of repeated values (a low unique count) has a low level of cardinality; conversely, a column that has a lot of unique values (a high unique count) has a high level of cardinality.
Lower cardinality leads to more optimized performance, so you might need to reduce the number of high-cardinality columns in your semantic model.
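To gauge a column's cardinality, you can count its distinct values, for example with a DAX query run from DAX query view (the table and column names are assumptions):

    EVALUATE
    ROW ( "OrderID cardinality", DISTINCTCOUNT ( Sales[OrderID] ) )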
What is another word for dimension table?
Lookup table
What column settings could cause a relationship between two tables to break a model?
Always ensure that the two columns participating in a relationship share the same data type. Your model will never work correctly if you build a relationship between a column with a text data type and a column with an integer data type, even though Power BI Desktop does let you create relationships between columns of different data types.
Which data type is better for performance when it comes to columns that create relationships between tables?
Columns with an integer data type perform better than columns with a text data type.
Why might it be worthwhile to try to reduce the cardinality (in this case, read: granularity) of a model even if the data set currently looks small?
You might expect it to grow over time.
What is perhaps the most effective technique to reduce model size (by reducing cardinality/granularity)?
Using a summary table from the data source.
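A sketch of what such a summary looks like, written as a DAX calculated table purely for illustration (table and column names are assumptions); in practice you would build the equivalent GROUP BY at the source so the detail rows are never imported:

    Sales Summary =
    SUMMARIZECOLUMNS (
        Sales[OrderDate],
        Sales[ProductKey],
        "Total Quantity", SUM ( Sales[Quantity] ),
        "Total Sales", SUM ( Sales[SalesAmount] )
    )

Grouping by date and product in this way collapses many detail rows into one row per combination, which is where the size reduction comes from.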