3H. Optimize a model for performance in Power BI Flashcards

1
Q

How do report users experience a model that’s performing poorly?

A

From a report user’s perspective, poor performance is characterized by report pages that take longer to load and visuals taking more time to update. This poor performance results in a negative user experience.

2
Q

What are the two most common reasons for poor model performance?

A

As a data analyst, you will spend approximately 90 percent of your time working with your data, and nine times out of ten, poor performance is a direct result of a bad semantic model, bad Data Analysis Expressions (DAX), or a mix of the two.

3
Q

Why is it good to design a good semantic model from the very start, even if it might take a bit longer?

A

If you address performance issues during development, you will have a robust Power BI semantic model that will return better reporting performance and a more positive user experience. Ultimately, you will also be able to maintain optimized performance. As your organization grows, the size of its data grows, and its semantic model becomes more complex. By optimizing your semantic model early, you can mitigate the negative impact that this growth might have on the performance of your semantic model.

4
Q

What’s usually the biggest factor impacting model performance, and how can you control it?

A

A smaller semantic model uses fewer resources (memory) and achieves faster data refresh, calculations, and rendering of visuals in reports. Therefore, the performance optimization process involves minimizing the size of the semantic model and making the most efficient use of the data in the model, which includes:

  • Ensuring that the correct data types are used.
  • Deleting unnecessary columns and rows.
  • Avoiding repeated values.
  • Replacing numeric columns with measures (see the sketch after this list).
  • Reducing cardinalities.
  • Analyzing model metadata.
  • Summarizing data where possible.
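
For the "replacing numeric columns with measures" point, here is a minimal sketch; the Sales table and its Quantity and UnitPrice columns are hypothetical names. Instead of storing a line-total calculated column on every row and then summing it, you can define a measure, so the value is computed at query time and no extra column takes up memory:

    -- Calculated column (stored for every row of Sales, adding to model size):
    -- LineTotal = Sales[Quantity] * Sales[UnitPrice]

    -- Measure alternative (evaluated at query time, nothing stored per row):
    Total Sales =
    SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )
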
5
Q

What is a good first step to identifying bottlenecks in model performance?

A

You can use Performance analyzer in Power BI Desktop to help you find out how each of your report elements is performing when users interact with them.

6
Q

What does Performance analyzer do?

A

With it, you can determine how long it takes for a particular visual to refresh when it is initiated by a user interaction. Performance analyzer will help you identify the elements that are contributing to your performance issues, which can be useful during troubleshooting.

7
Q

What do you have to do before running Performance analyzer?

A

You have to clear the visual cache and the data engine cache.

To do this, add a blank report page, select it, save the file, and then close and re-open the file so that it opens on the blank page (with nothing cached).

8
Q

How do you use Performance analyzer?

A

From a blank page, click Performance analyzer and select Start recording. Then move to the page you want to analyze, and interact with the elements of the report that you want to measure. When finished, click Stop.

9
Q

For which tasks involved in loading a visual does Performance analyzer record a duration, and what does each task cover?

A
  • DAX query - The time it took for the visual to send the query, along with the time it took Analysis Services to return the results.
  • Visual display - The time it took for the visual to render on the screen, including the time required to retrieve web images or geocoding.
  • Other - The time it took the visual to prepare queries, wait for other visuals to complete, or perform other background processing tasks. If this category displays a long duration, the only real way to reduce this duration is to optimize DAX queries for other visuals, or reduce the number of visuals in the report.
10
Q

If visuals are the bottleneck in model performance, how can you address this?

A

Consider the number of visuals on the report page; fewer visuals means better performance. Ask yourself if a visual is really necessary and if it adds value to the end user. If the answer is no, you should remove that visual. Rather than using multiple visuals on the page, consider other ways to provide additional details, such as drill-through pages and report page tooltips.

Examine the number of fields in each visual. The more visuals you have on the report, the higher the chance of performance issues. In addition, the more visuals there are, the more crowded the report can appear and the more clarity it can lose. The upper limit for a visual is 100 fields (measures or columns), so a visual with more than 100 fields will be slow to load. Ask yourself if you really need all of this data in a visual. You might find that you can reduce the number of fields that you currently use.

11
Q

If a DAX query is the bottleneck in model performance, how can you address this, and what is a good benchmark for determining if a query is too slow?

A

A good starting point is any DAX query that is taking longer than 120 milliseconds.

To shorten query time, try rewriting the measure with different functions.

12
Q

What might be a more resource efficient alternative to FILTER?

A

KEEPFILTERS(), which keeps the existing filter context. FILTER, by contrast, removes context filters and iterates over all rows of a table again, which requires more calculation when filters are already applied separately to the visual.
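
A minimal sketch of the contrast, assuming a hypothetical Sales table with a Color column and an existing [Total Sales] measure:

    -- FILTER over ALL(Sales) ignores existing filters, iterates every row of the
    -- table, and replaces the filters on Sales with only Color = "Red":
    Red Sales Filter =
    CALCULATE ( [Total Sales], FILTER ( ALL ( Sales ), Sales[Color] = "Red" ) )

    -- KEEPFILTERS applies the same condition as a lightweight column predicate and
    -- keeps whatever filters slicers and visuals have already placed on the column:
    Red Sales KeepFilters =
    CALCULATE ( [Total Sales], KEEPFILTERS ( Sales[Color] = "Red" ) )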

13
Q

How might relationships affect model performance?

A

You should review the relationships between your tables to ensure that you have established the correct relationships. Check that relationship cardinality properties are correctly configured. For example, a one-side column that contains unique values might be incorrectly configured as a many-side column.

14
Q

How might columns affect model performance?

A

It is best practice to not import columns of data that you do not need. To avoid deleting columns in Power Query Editor, you should try to deal with them at the source when loading data into Power BI Desktop. However, if it is impossible to remove redundant columns from the source query or the data has already been imported in its raw state, you can always use Power Query Editor to examine each column. Ask yourself if you really need each column and try to identify the benefit that each one adds to your semantic model. If you find that a column adds no value, you should remove it from your semantic model.

When you remove an unnecessary column, you will reduce the size of the semantic model which, in turn, results in a smaller file size and faster refresh time. Also, because the semantic model contains only relevant data, the overall report performance will be improved.

15
Q

What is metadata and how can it affect model performance?

A

Metadata is information about other data. Power BI metadata contains information on your semantic model, such as the name, data type and format of each of the columns, the schema of the database, the report design, when the file was last modified, the data refresh rates, and much more.

When you load data into Power BI Desktop, it is good practice to analyze the corresponding metadata so that you can identify any inconsistencies in your semantic model and normalize the data, or identify errors, incorrect data types, the amount of data, and so on, before you start to build reports.

You can also look at the .pbix file's metadata to see the size of your model.

16
Q

What is the auto date/time feature, and how can it impact performance?

A

Another item to consider when optimizing performance is the Auto date/time option in Power BI Desktop. By default, this feature is enabled globally, which means that Power BI Desktop automatically creates a hidden calculated table for each date column, provided that certain conditions are met. The new, hidden tables are in addition to the tables that you already have in your semantic model.

The Auto date/time option allows you to work with time intelligence when filtering, grouping, and drilling down through calendar time periods. We recommend that you keep the Auto date/time option enabled only when you work with calendar time periods and when you have simplistic model requirements in relation to time.

If your data source already defines a date dimension table, that table should be used to consistently define time within your organization, and you should disable the global Auto date/time option. Disabling this option can lower the size of your semantic model and reduce the refresh time.

17
Q

What are the advantages to using variables when writing DAX formulas?

A
  • Improved performance - Variables can make measures more efficient because they remove the need for Power BI to evaluate the same expression multiple times. You can achieve the same results in a query in about half the original processing time (see the sketch after this list).
  • Improved readability - Variables have short, self-describing names and are used in place of an ambiguous, multi-worded expression. You might find it easier to read and understand the formulas when variables are used.
  • Simplified debugging - You can use variables to debug a formula and test expressions, which can be helpful during troubleshooting.
  • Reduced complexity - Variables do not require the use of EARLIER or EARLIEST DAX functions, which are difficult to understand. These functions were required before variables were introduced, and were written in complex expressions that introduced new filter contexts. Now that you can use variables instead of those functions, you can write fewer complex formulas.
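
A minimal sketch of the performance benefit, assuming a hypothetical [Sales] measure and a 'Date' table; the prior-year value is computed once, stored in a descriptively named variable, and reused:

    -- Without the variable, the CALCULATE/PARALLELPERIOD expression would have to be
    -- written (and evaluated) twice inside DIVIDE.
    Sales YoY Growth % =
    VAR SalesPriorYear =
        CALCULATE ( [Sales], PARALLELPERIOD ( 'Date'[Date], -12, MONTH ) )
    RETURN
        DIVIDE ( [Sales] - SalesPriorYear, SalesPriorYear )
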
18
Q

What is best practice when using variables to improve readability?

A

When using variables, it is best practice to use descriptive names for the variables. In the previous example, the variable is called SalesPriorYear, which clearly states what the variable is calculating. Consider the outcome of using a variable that was called X, temp or variable1; the purpose of the variable would not be clear at all.

Using clear, concise, meaningful names will help make it easier for you to understand what you are trying to calculate, and it will be much simpler for other developers to maintain the report in the future.

19
Q

What is cardinality of columns, and how can it impact performance?

A

A column that has a lot of repeated values in its range (unique count is low) will have a low level of cardinality. Conversely, a column that has a lot of unique values in its range (unique count is high) will have a high level of cardinality.

Lower cardinality leads to more optimized performance, so you might need to reduce the number of high-cardinality columns in your semantic model.

20
Q

What is another word for dimension table?

A

Lookup table

21
Q

What column settings could cause a relationship between two tables to break a model?

A

Always ensure that both of the columns participating in a relationship share the same data type. Your model will never work properly if you try to build a relationship between two columns where one column has a text data type and the other has an integer data type. Note, however, that Power BI Desktop will still let you create a relationship between columns of different data types, so this misconfiguration is easy to miss.

22
Q

Which data type is better for performance when it comes to columns that create relationships between tables?

A

The columns with data type Integer perform better than columns with data type Text.

23
Q

Why might it be worthwhile to try to reduce the cardinality (here you can read granularity) of a model even if the data set looks small?

A

You might expect it to grow over time.

24
Q

What is perhaps the most effective technique to reduce model size (by reducing cardinality/granularity)?

A

Using a summary table from the data source.

25
Q

What is the trade-off to improving model performance by reducing cardinality/granularity?

A

Deciding to summarize fact-type data will always involve a tradeoff with the detail of your data. A disadvantage is that you might lose the ability to drill into data because the detail no longer exists.

26
Q

How can you mitigate losing the ability to drill down into data when you use summary data to reduce cardinality/granularity?

A

In short, have one Import summary table and one DirectQuery detailed fact table, and use them on separate report pages to answer different questions.

In Power BI Desktop, a Mixed mode design produces a composite model. Essentially, it allows you to determine a storage mode for each table. Therefore, each table can have its Storage Mode property set as Import or DirectQuery.

An effective technique to reduce the model size is to set the Storage Mode property for larger fact-type tables to DirectQuery. This design approach can work well in conjunction with techniques that are used to summarize your data. For example, the summarized sales data could be used to achieve high performance “summary” reporting. A drill-through page could be created to display granular sales for specific (and narrow) filter context, displaying all in-context sales orders. The drill-through page would include visuals based on a DirectQuery table to retrieve the sales order data (sales order details).

27
Q

What are big factors in determining performance of DirectQuery models?

A
  • The overall user experience depends heavily on the performance of the underlying data source. Slow query response times will lead to a negative user experience and, in the worst-case scenarios, queries might time out.
  • Also, the number of users who are opening the reports at any one time will impact the load that is placed on the data source. For example, if your report has 20 visuals in it and 10 people are using the report, 200 queries or more will exist on the data source because each visual will issue one or more queries.
  • Network latency; faster networks return data more quickly.
  • The performance of the data source’s server and how many other workloads are on that server. For example, consider the implications of a server refresh taking place while hundreds of people are using the same server for different reasons.
28
Q

Import is the preferred storage mode, but why might you sometimes want to use DirectQuery instead?

A
  • It is suitable in cases where data changes frequently and near real-time reporting is required.
  • It can handle large data without the need to pre-aggregate.
  • It applies data sovereignty restrictions to comply with legal requirements.
  • It can be used with a multidimensional data source that contains measures such as SAP Business Warehouse (BW).
29
Q

An advantage to using DirectQuery is live, up-to-date data in reports and visuals. What is one exception?

A

You can pin visuals, or entire report pages, as dashboard tiles. The tiles are automatically refreshed on a schedule, for example, every hour. You can control the frequency of this refresh to meet your requirements. When you open a dashboard, the tiles reflect the data at the time of the last refresh and might not include the latest changes that are made to the underlying data source. You can always refresh an open dashboard to ensure that it’s up-to-date.

30
Q

What are the high-level limitations of DirectQuery connections?

A
  • Performance - As previously discussed, your overall user experience depends heavily on the performance of the underlying data source.
  • Security - If you use multiple data sources in a DirectQuery model, it is important to understand how data moves between the underlying data sources and the associated security implications. You should also identify if security rules are applicable to the data in your underlying source because, in Power BI, every user can see that data.
  • Data transformation - Compared to imported data, data that is sourced from DirectQuery has limitations when it comes to applying data transformation techniques within Power Query Editor. For example, if you connect to an OLAP source, such as SAP BW, you can’t make any transformations at all; the entire external model is taken from the data source. If you want to make any transformations to the data, you will need to do this in the underlying data source.
  • Modeling - Some of the modeling capabilities that you have with imported data aren’t available, or are limited, when you use DirectQuery.
  • Reporting – Almost all the reporting capabilities that you have with imported data are also supported for DirectQuery models, provided that the underlying source offers a suitable level of performance. However, when the report is published in Power BI service, the Quick Insights and Q&A features are not supported. Also, the use of the Explore feature in Excel will likely result in poorer performance.
31
Q

What is the first step to improving performance of DirectQuery models/connections?

A

Improve the underlying data source itself. This will likely require working together with a data engineer.

Consider the use of the following standard database practices that apply to most situations:
- Avoid the use of complex calculated columns because the calculation expression will be embedded into the source queries. It is more efficient to materialize the calculation as a column in the data source, which avoids pushing the expression down with every query. You could also consider adding surrogate key columns to dimension-type tables.
- Review the indexes and verify that the current indexing is correct. If you need to create new indexes, ensure that they are appropriate.

32
Q

Beyond optimizing the data source, how can you improve the performance of DirectQuery models?

A

The same way as with an Import model (that is, start by running Performance analyzer, and so on).

You can also use Query Reduction (File > Options and settings > Options, scrolling down the page, and then selecting the Query reduction option). The following query reduction options are available:
- Reduce number of queries sent by - By default, every visual interacts with every other visual. Selecting this check box disables that default interaction. You can then optionally choose which visuals interact with each other by using the Edit interactions feature.
- Slicers - By default, the Instantly apply slicer changes option is selected. To force the report users to manually apply slicer changes, select the Add an apply button to each slicer to apply changes when you’re ready option.
- Filters - By default, the Instantly apply basic filter changes option is selected. To force the report users to manually apply filter changes, select one of the alternative options:
  - Add an apply button to all basic filters to apply changes when you’re ready
  - Add a single apply button to the filter pane to apply changes at once (preview)

Applying these options prevents queries from continuously hitting the data source, which should improve performance.

33
Q

Why might your organization decide to use aggregations in their models?

A
  • If you are dealing with a large amount of data (big data), aggregations will provide better query performance and help you analyze and reveal the insights of this large data. Aggregated data is cached and, therefore, uses a fraction of the resources that are required for detailed data.
  • If you are experiencing a slow refresh, aggregations will help you speed up the refresh process. The smaller cache size reduces the refresh time, so data gets to users faster. Instead of refreshing what could be millions of rows, you would refresh a smaller amount of data instead.
  • If you have a large semantic model, aggregations can help you reduce and maintain the size of your model.
  • If you anticipate your semantic model growing in size in the future, you can use aggregations as a proactive step toward future proofing your semantic model by lessening the potential for performance and refresh issues and overall query problems.
34
Q

There are different ways of creating aggregation tables that all yield the same table. What are they?

A
  • If you have access to the database, you could create a table with the aggregation and then import that table into Power BI Desktop.
  • If you have access to the database, you could create a view for the aggregation and then import that view into Power BI Desktop.
  • In Power BI Desktop, you can use Power Query Editor to create the aggregations step-by-step.
35
Q

What is an easy way to create an aggregation table in Power Query?

A

When the selected columns display on the page, select the Group By option on the Home tab. On the window that displays, select the column that you want to group by and enter a name for the new column.

Select the Advanced option and then select the Add aggregation button to display another column row. Enter a name for the aggregation column, select the operation of the column, and then select the column to which you want to link the aggregation. Repeat these steps until you have added all the aggregations and then select OK.

36
Q

How can you manage/edit your aggregation tables?

A

You can open the Manage Aggregations window from any view in Power BI Desktop. In the Fields pane, right-click the table and then select Manage aggregations.

For each aggregation column, you can select an option from the Summarization drop-down list and make changes to the selected detail table and column. When you are finished managing the aggregations, select Apply All.