Model - Design a data model Flashcards

(133 cards)

1
Q

What is a data model?

A

A way to organise tables from data/information systems in a way which makes it easier for people to understand your data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Benefits of a good data model?

A

Data exploration is faster
Aggregations are simpler to build

Power BI Reports
Reports are more accurate
Writing reports takes less time
Reports are easier to maintain in the future

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Explain how data models make data easier to understand?

A

One table with 30 columns would be harder to work with compared to those 30 columns being logically decomposed into 5 data tables. Alternatively, 30 tables with limited columns would be better illustrated in 5 tables.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is best practice for data models?

A

Aim for simplicity: make the data model as intuitive as possible for the user (e.g., could a lay person understand?)
Make the data model as small as possible (i.e., fewer tables and fewer columns)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are relationships in a data model?

A

Relationships are defined between tables through Primary and Foreign keys.
Primary key = columns that identify unique values
Foreign keys = reference rows or primary keys in another table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is Power BI’s data modelling capability?

A

You can pull data from different data sources, create relationships between those tables and treat it as a unified dataset

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the data schema components?

A

Tables: contain fields and values

Relationships between tables: primary keys and foreign keys

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is a database schema?

A

A database schema defines how data is organized within a relational database

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a star schema?

A

A specific type of schema design that is optimized for high performance and usability.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Components of a star schema?

A

Tables are conceptually classified into two types:

Fact table
- Dimension table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Difference between measures and dimensions?

A

Dimensions contain qualitative values (such as names, dates, or geographical data). You can use dimensions to categorize, segment, and reveal the details in your data. Dimensions affect the level of detail in the view.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the differences between fact and dimension tables?

A

Fact table

  • Observational/event data
  • Contains measures and numbers
  • Distinct values in multiple rows

Dimension table

  • Contains details about the fact table
  • Unique values appear in one row
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the relationship between fact and dimension tables?

A

Fact tables are filtered by dimension tables

Example: Total sales can be filtered by the products

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What makes a simple table structure?

A

Simple to navigate and user friendly = logical column and table properties
Have merged or appended tables to simplify the tables within your data structure.
Have good-quality relationships between tables that make sense.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

How to configure data model and build relationships between tables?

A

Manage relationships

Configure relationships between tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Explain the autodetect feature?

A

When you load you Queries into Power BI it will automatically establish relationships between columns based on column name.

When you load data, Power BI automatically looks for relationships that exist within the data by matching column names.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How to configure table and column properties?

A

Manage properties

Shortcut: Ctrl+clicking or Shift+clicking items on this page.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What are the main sections within the table properties pane?

A

General tab
Formatting tab
Advanced tab

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

How to configure many tables and fields at once?

A

Select many fields to configure at once: Power BI has a new functionality to update these properties on many tables and fields by Ctrl+clicking or Shift+clicking items.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Why is creating date tables useful?

A

Common business requirement to make calculations based on data and time

Businesses want to know how their business are doing over months, quarters and fiscal years etc

It is crucial to format time-oriented values correctly to enable time-based calculations

Some businesses might prefer date/time in particular format

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

Challenges with autodetect on date tables?

A

Power BI autodetects for data columns and tables however sometimes you will need to take extra steps to format the dates as the organisation requires

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
22
Q

Why create a common date table?

A

Some tables have their own dates fields so you will need to create a common date table to enable interaction between the tables à need to build a common date table

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
23
Q

What is the best practice around date tables?

A

Use a source date table because it is likely logically structured and shared with other applications

Source databases and data warehouses have their own date tables. These date tables can be used to perform tasks such as:

Identify company holidays
Separate calendar and fiscal year
Identify weekends versus weekdays

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
24
Q

What are the 3 ways to create a common date table?

A

Source data
DAX
Power Query

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
25
How to use DAX to create a date table?
Use DAX functions to build your common date table. CALENDAR(): returns a contiguous range of dates based on a start and end date that are entered as arguments in the function CALENDARAUTO(): returns a contiguous, complete range of dates that are automatically determined from your dataset
26
How to mark as an official date table?
Right click on fields pane | Data table settings: choose specific columns to be marked as the date
27
What does marking as date table do?
Power BI performs validations to ensure: that the data contains zero null values is unique, and contains continuous date values over a period.
28
What is auto hierarchy?
When you are building visuals Power BI automatically enters values of the date type as a hierarchy (if not marked as date table)
29
How to remove auto hierarchy?
- Selecting Mark as date table will remove autogenerated hierarchies from the Date field in the table that you marked as a date table. - Or establish a relationship between that field and the date table - Turn off the Auto Date/Time feature.
30
How to establish a new date hierarchy?
You can manually add a hierarchy to your common date table by right-clicking the year, month, week, or day columns in the Fields pane and then selecting New hierarchy.
31
How can building a visual rely on a common date table?
You will need to connect your existing date columns with the new common date table which will enable the visualisation to work
32
What are dimensions?
Dimensions are details or descriptions collected about events connected in fact tables
33
What are hierarchies?
Data hierarchy refers to the systematic organization of data layers such as a parent-child relationships or tree structure.
34
How do hierarchies relate to dimensions?
You can use hierarchies to help find detail in dimension tables
35
What are hierarchies?
Natural segments in data that are capable of being decomposed Systemic layers such as parent-child relationships or tree structures
36
What is a parent-child hierarchy?
This is a situation when a natural hierarchy exists in your data. For example multiple employees can have the same manager. A specific column will determine the hierarchy One column is only one level
37
What is flattening the parent-child hierarchy?
The process of viewing multiple child levels based on a top-level parent is known as flattening the hierarchy. These uses multiple columns to indicate multiple levels Flatten the hierarchy so you can see multiple individual levels In this process, you are creating multiple columns in a table to show the hierarchical path of the parent to the child in the same record.
38
How to use DAX to flatten the parent-child hierarchy?
PATH function - Returns a delimited text with the identifiers of all the parents to the current row, starting with the oldest or top most until current.
39
What is a role-playing dimension?
Role-playing dimensions have multiple valid relationships with fact tables, meaning that the same dimension can be used to filter multiple columns or tables of data.
40
Why are role-playing dimensions important to understand?
As a result, you can filter data differently depending on what information you need to retrieve
41
What is data granularity?
Data granularity is the level of detail that is represented within your data time-based detail Defining the correct data granularity can have a big impact on the performance and usability of your Power BI reports and visuals.
42
What are the implications of data granularity?
Slower Reports: Generally, the fewer the records that you are working with, the faster your reports and visuals will function. This approach translates to a faster refresh rate for the entire dataset, which might mean that you can refresh more frequently. More scope for analysis: if your users want to drill into every single transaction, summarizing the granularity will prevent them from doing that, which can have a negative impact on the user experience. It is important to negotiate the level of data granularity with report users so they understand the implications of these choices.
43
Why do you need to change data granularity to build a relationship?
Incongruent data granularities need to be reconciled before you build relationships between tables (e.g., weekly vs monthly time). Solution is to transform the granularities so they have common values. This is so you can match the granularity format
44
Why check autodetect relationships?
You need to make sure that the relationships accurately reflect those that exist in your data
45
What is data cardinality?
Within data modelling, the cardinality of a join between two tables is the numerical relationship between rows of one table and rows in the other.
46
What is cardinality best practice?
Avoid one-to-one: Is not recommended because this relationship stores redundant information and suggests that the model is not designed correctly. It is better practice to combine the tables. Avoid many-to-many: a lack of unique values introduces ambiguity and your users might not know which column of values is referring to what.
47
Explain the types of cardinality within Power BI?
Many-to-one (*:1) or one-to-many (1: *) cardinality: Describes a relationship in which you have many instances of a value in one column that are related to only one unique corresponding instance in another column. Describes the directionality between fact and dimension tables. Is the most common type of directionality and is the Power BI default when you are automatically creating relationships. One-to-one (1:1) cardinality: Describes a relationship in which only one instance of a value is common between two tables. Requires unique values in both tables. Many-to-many (.) cardinality: Describes a relationship where many values are in common between two tables. Does not require unique values in either table in a relationship.
48
What is cross-filter direction?
Data can be filtered on one or both sides of a relationship.
49
Explain the two different approaches to cross-filtering?
With a single cross-filter direction: Only one table in a relationship can be used to filter the data. For instance, Table 1 can be filtered by Table 2, but Table 2 cannot be filtered by Table 1. For a one-to-many or many-to-one relationship, the cross-filter direction will be from the "one" side, meaning that the filtering will occur in the table that has many values. With both cross-filter directions or bi-directional cross-filtering: One table in a relationship can be used to filter the other. For instance, a dimension table can be filtered through the fact table, and the fact tables can be filtered through the dimension table. You might have lower performance when using bi-directional cross-filtering with many-to-many relationships.
50
What is best practice for relationships and cardinality?
A word of caution regarding bi-directional cross-filtering: You should not enable bi-directional cross-filtering relationships unless you fully understand the ramifications of doing so. Enabling it can lead to ambiguity, over-sampling, unexpected results, and potential performance degradation. Arrows should point to fact tables many-to-many relationships and/or bi-directional relationships are complicated. Unless you are certain what your data looks like when aggregated, these types of open-ended relationships with multiple filtering directions can introduce multiple paths through the data.
51
Explain the link between cardinality and cross-filter direction?
One-to-one relationships: The only option available is bi-directional cross filtering Data can be filtered on either side of the relationship and result in only one value (e.g. they both return the same value both sides of the relationship) Many-to-many relationships: you can choose to filter in a single direction or in both directions by using bi-directional cross-filtering The ambiguity that is associated with bi-directional cross-filtering is amplified in a many-to-many relationship because multiple paths will exist between different tables. If you create a measure, calculation, or filter, unintended consequences can occur where your data is being filtered and, depending on which relationship that the Power BI engine chooses when applying the filter, the final result might be different.
52
What are modelling challenges?
When the relationships established in the data model are failing to interact with visualisation
53
What is a circular relationship?
In the world of relational databases circular references are schema structures where foreign keys relating the tables create a loop. Circular references cause special types of issues when trying to synchronize two relational database where the foreign keys are enforced This web of relationships is difficult to manage and becomes a daunting task to build visuals because it is no longer clear what relationships exist. Therefore, it is important that you are able to identify circular relationships so that your data is usable. If there are circular references ("loops") in a data structure, the tables are associated in such a way that there is more than one path of associations between two fields. This type of data structure should be avoided as much as possible, since it might lead to ambiguities in the interpretation of data.
54
What is a relational dependency?
When a field depends on other fields to function correctly | For example TotalSales depends on Quantity x Price
55
Name the types of relationship dependencies between tables?
``` you can also have dependencies between Columns Measures Tables Relationships ```
56
What data modelling capabilities exist in Power BI?
- 1) creating relationships - 2) Hierarchies - 3) Calculations
57
When is DAX used?
- As soon as the data is loaded into the data model | - DAX is PBI's native query language
58
What is a data model?
- One or more tables…potentially with relationships
59
Why are relationships important for reports?
- You must establish relationships before you use visuals and calculations that query different tables
60
What happens when queries are loaded?
- Once queries are loaded into Power BI desktop they become tables in Power BI - Tables can then be organised into different data models - Different data model types are called schemas
61
What are 3 of the most common data model types?
- Data models aka schemas - 1) Flat (fully denormalised) schema - 2) Star schema - 3) Snowflake schema
62
What are flat (fully denormalised) schemas?
- All attributes are denormalised into a single table (e.g., one table with no relationship) - Flat schemas often used when connecting to a single simple source (e.g., a table)
63
Evaluate flat schemas?
Advantages - very performant and very efficient - most cases you don't have to establish relationships (no need for keys) Negative - Single tables can be cumbersome and difficult to navigate - Columns and data can often be duplicated leading to large file size (data redundancy) - Mixing facts with different grains results in more complex DAX formulas - Avoid flat schema for complex data models
64
What are some characteristics of fact tables?
Called Fact or data tables - 1) Metrics you want to aggregate - 2) Have Foreign Keys that are required to create relationships with dimension tables
65
What are some characteristics of dimension tables?
Called Dimension or Lookup tables - 1) Contain attributes that help you slice and dice fact tables - 2) Contains a Primary Key (key column + descriptive columns)
66
How many fact tables are in a star schema?
- It is possible to have more than one fact table
67
Evaluate star schemas?
Advantages - Preferred over star schemas due to their shortcomings (e.g., cumbersome) - Fields are logically grouped so the model is easier to understand - There is less duplication resulting in more efficient table storage - You don't need to write complex DAX formulas to work with fact tables with different fact grains
68
What is a snowflake schema?
- Similar to a star schema but has an extra dimension tables that "snowflake" from other dimension tables
69
When would you use a snowflake schema?
- Use when there are fact tables with different grains
70
Can you configure all tables?
- It depends on the storage mode | - Import mode allows more configurability
71
How can you configure table properties?
- Name - Description: stored in models metadata (see it when you hover over the table) - Synonyms: can add synonyms to help Q&A recognise the table - Row label: tells which column values will serve as the row label for the table (e.g., ProductName column being the row label for the product table) - Key column: this is for primary values - Is hidden: can hide a table from fields pane - Is featured table: could be a featured table which will allow it to be used in Excel - Storage mode
72
How can you configure column properties?
- Name - Description: stored in models metadata (see it when you hover over the table) - Synonyms: can add synonyms to help Q&A recognise the column (SalesTargets may not be picked up by Q&A but Sales would) - Display folder: group columns from same table into folders - Is hidden: hide columns - Data type: different data types to Power Query (e.g., percentage, Timezone, Location) - Format: different data types have different format properties (e.g., percentage > decimal places) - Sort by column: sort names by numbers (i.e., give ordinality to nominal labels) - Data category: default is uncategorised…depending on data type (e.g., Location) you can select City, Continent etc - Summarise by: determines how column is aggregated when you put it in the visual - Is nullable: can disallow null values for a column (be careful because this can cause the refresh to fail if a null appears)
73
What does summarise by mean?
- Summarise by: determines how column is aggregated when you put it in the visual - Option depends on data type (e.g., statistical functions for numbers or counts for text) - Default is Don't Summarise/None - Power BI attempts to do this automatically but is not always correct
74
What are the troubleshoots with is nullable?
- Is nullable: can disallow null values for a column (be careful because this can cause the refresh to fail if a null appears)
75
How can you configure measures properties?
The same as column properties except - Sort by column - Summarise by - Is nullable
76
What is the difference between FORMAT in DAX and formatting a column?
- DAX creates a new column and always outputs text | - Formatting a column retains the original data type
77
What is a measure?
- A measure is a dynamic evaluation of a DAX query that will change in response to interactions with other visuals - This allows you to analyse your data further
78
What are the different ways to produce a quick measure?
- Right click on field or table - Select Quick Measure from Home ribbon - Select drop-down arrow next to the field
79
What calculations categories are available in quick measures?
- Aggregation per category - Filters - Time intelligence - Totals - Mathematical operations - Text
80
What is a troubleshoot with hidden fields?
- They will not sure up in the Quick Measure dialog box
81
What are parent-child hierarchies?
Parent-child hierarchies are defined by two columns - Node key = child only has one parent - Parent key = parent can have many children
82
What are role-playing dimensions?
- Occurs when there is more than one way to filter a fact table by a dimension (e.g., data table can be filtered by two different date columns) - You can have multiple physical relationships but only can be active (active relationship propagates filters) - The functionality of role-playing dimensions will be configured for business requirements
83
Explain autodetect?
- Autodetect automatically detects relationships once you load data - Looks for identical column names - Can go wrong so you can turn it off (Options > Current File > Data load)
84
What information is displayed by a relationship in model view?
- Arrow (cardinality: one or many) - Line (active/inactive: straight or dotted) - Cross filter (filter propagation)
85
What is the difference between Merge and establishing relationships?
- Relationships established in PBI desktop and Merge is done in Power Query - Merge can use multiple columns relationships are restricted to just one column
86
What is cardinality?
1) Define relationships - Many = key appears more than once in the column - One = key appears only once 2) - Refers to the number of distinct values in a column
87
Explain and evaluate one-to-one cardinality?
- One-to-one is a special kind of relationship that only appears once on both sides of the relationship - Useful for splitting a single dimension table with many columns into separate tables - Only use if you are confident there will be no duplicates (duplicates would cause immediate errors in your data model)
88
What is the cross-filter direction for many-to-one and one-to-many relationships?
- They can use single and both
89
What is 'Apply security filter in both directions'?
- Apply security filter in both directions enables the flow of row-level security in both directions
90
What is a common troubleshoot of cross-filter direction?
- A table may display just one value | - This is usually because the cross-filter direction is not enable both ways
91
What two things should you have in mind when performance tuning a data model?
- Business requirements: speed, security etc | - Constraints of data source: storage mode, transformations etc
92
What factors affect report performance?
- Storage mode - Relationships - Aggregations - Cardinality
93
Explain how composite mode impacts performance?
- Relationships perform differently depending on the storage mode of the related tables
94
Explain how aggregations impacts performance?
- When using DirectQuery you can import some of the summarised data - This is so some of the most frequently queried data stays in-memory - More detailed data queried from underlying source - This feature is called aggregations
95
Explain the island analogy?
- Import mode = resources on same island (i.e., it is queried from in-memory) - Direct Query = have to swim to nearby island to get resources
96
Rank the relationships from fastest to slowest?
- One-to-many intra-island relationships - Direct many-to-many relationships - Many-to-many relationships with bridge tables - Cross-island relationships
97
How does cardinality impact performance?
- Power BI imports table in columns not rows | - In general the fewer distinct values there are the better the performance
98
What are the most common reasons why many-to-many relationships occur?
- Many-to-many relationships between dimensions | - Relationships between tables at different granularities
99
Evaluate the different kinds of many-to-many relationships?
Direct many-to-many relationships - This method performs well when the number of unique values on each side of the relationship is fewer than 1000 - Cannot use the RELATED function Bridge table - More efficient method when more than 1000 unique values
100
What is a bridge table?
- A table that allows you to create one-to-many relationships with each table that is in a many-to-many relationship
101
What are the different kinds of bridge table?
- One column with unique values: bridge table on the one side of the relationship (normally used for relating fact tables with different grains) - Two column table with unique combination of values: the bridge table is on the many side of each relationship (commonly used for many-to-many relationships between dimensions)
102
Why is it best practice to create your own date table?
- Can use a calendar other than Gregorian - You can have weeks in the calendar - Filter multiple fact tables by using a single date dimension table - Can configure to organisations preference (e.g., mmm-yy)
103
How to create a date table in Power Query?
- You can use the List.Date function | - Returns a list of dates > convert list to table > add columns
104
How can you create a date table in DAX?
- CALENDARAUTO = scans your model for dates and returns an appropriate date range automatically - CALENDAR = requires you to provide start and end date
105
What is data granularity?
- Date grain refers to the level of detail a table can provide
106
Describe a common data granularity problem?
- For unsupported levels of granularity it won't return any meaningful results (will return repeat value) - For example: if you filter a date table by months instead of days - Can use ISFILTERED function
107
What is data model development?
- The enhancements you make after loading your data model and creating relationships between tables
108
What is a common troubleshoot with a bi-directional cross-filter?
- The security filter will not be applied in both direction despite it being bi-directional
109
Evaluate 'apply security filter in both directions'?
- It is by default set in one direction (security filter is passed on to filtered table) - This setting affects performance of the data model so in some cases it may be undesirable
110
What are calculate tables?
- They are defined by DAX and can only be created once data is loaded into the data model or on new data generated by DAX
111
What are the use cases for calculated tables?
- Cloning tables - Creating tables from different data sources - Pre-calculating measures to improve performance
112
How can you clone a table?
- Data View > New table > "Clone table name" = 'Existing table'
113
How can you create tables from different sources?
Use DISTINCT and UNION New table = DISTINCT( UNION( DISTINCT( Table[Column]), DISTINCT(Table2[Column]) ) )
114
Why might you create tables from different sources?
- You can extract distinct values from different tables into one calculated table (improves performance instead of being in different islands) - Use instead of bridge tables
115
Why might you precalculate measures?
- Some complex measures perform poorly - You can precalculate them in a calculated table and then create measures that aggregate the precalculated values - Usually works for additive measures - Aggregations are an example of calculated tables
116
Explain the characteristics of hierarchies?
- One column can be part of multiple hierarchies | - Only use columns in same table
117
Why might you not use hierarchies?
- You can achieve the same result by dragging multiple fields into a table - May be extra hassle to set up hierarchy
118
What main components are involved with developing data models?
- Hierarchies - Quick measures - Calculated table - Calculated column - Row-level security - Relationships (cardinality, cross-filter, aggregations and storage mode)
119
What are calculated columns?
- DAX expression similar to calculated tables | - They add columns which widen the data tables and could slow down performance of the data model
120
Name some attributes of DAX?
- It is columnal and there is no concept of a cell - To get to a specific value you must filter the column - DAX is strongly typed so it is not possible to have different data types in one column
121
Should you use a calculated column or a measure?
- Should only use calculated column when you also want to use the generated values from a calculated column as filters or categories
122
What is row-level security?
- Row-level security restricts data by filtering at the row-level depending rules defined for each user - Different report users can see different subsets of data
123
How to use row-level security?
- Define each role in Power BI desktop | - Assign individual users or Active Directory security groups to the roles in the Power BI service
124
Where does defining roles apply to?
- Only works for imported data and DirectQuery - If you connect to a Power BI dataset or an Analysis Service data model Power BI will rely on row-level security configured at the source (which you cannot override by creating roles in Power BI desktop)
125
How do you configure row-level security?
Report View > Modelling > Manage roles - Create new role (with user-friendly name) - Duplicate - Rename - Delete
126
Explain how to use DAX expressions with row-level security?
DAX: for each role you can define a DAX expression - Define a DAX expression for each role - These expressions are evaluated against each rows of the relevant table - ONLY those that are evaluated as TRUE will be visible - A single user can have multiple security roles Menu offers 3 options: - Add filter - Copy table filter from - Clear table filter - Validate expression = Verify DAX Expressions - N.b. if you duplicate a role before it is verified the table filter will not be copied over to the new role
127
Describe the application of role-level security?
- It will only be applied at query time - This means that the values are already filtered by the time they go into the data model - Measures will therefore be based on the filtered rows
128
What is dynamic row-level security?
- Allows you to show different data to different users within the same role
129
Evaluate dynamic row-level security?
- Can use a single role which is preferable to large-scale implementation where there are many report users (won't have to add or remove roles to the data model) -
130
How to use dynamic row-level security?
- USERNAME = domain and login of the user - USERPRINCIPLENAME = depending on how the active directory was set up the function usually returns the email address of the user
131
What is the Q&A feature?
- In Power BI desktop and service you can create visuals by asking specific questions
132
How to teach the Q&A feature?
- Terms not recognised appear in read - To define terms select the ones not recognised and enter a new field - This can be time consuming
133
Why would you use synonyms?
- Very good for keeping language consistent - For example a firm may say units instead of quantity - Enter singular