Schemas (Data Models) Flashcards

1
Q

5 steps in the data model design process

A
  1. Connect to Data Sources
  2. Prepare and Transform Data
  3. Configure Table and Column Properties
  4. Create Relationships in model
  5. Create Columns and Measures using DAX
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is a schema?

A
  1. A structure that defines the organization and relationships of tables within a dataset
  2. A logical framework of how the data is organized and connected
  3. Defines the rules that govern the relationships

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Benefits of an Optimized Data Model

A
  1. Improve performance significantly
  2. Easier Aggregation of structured data in a data model due to clear relationships and hierarchies
  3. Facilitates complex measures and and predictive analysis
  4. Improved Accuracy and Reliability of Reports
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What are the benefits of creating an appropriately designed schema?

A

Data
1. Data Structure Definition
2. Faster Data Exploration

Analysis
3. Efficient analysis
4. Assists with visualizations
5. Meaningful Insights
6. Easier aggregation

Reporting
7. Precise reporting
8. Quicker report creation
9. Simpler report maintenance

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What are the three types of schemas?

A
  1. Flat Schema
  2. Star Schema
  3. Snowflake Schema

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a flat schema?

A

A flat schema stores all data in a single table

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are three benefits of a flat schema?

A
  1. Easy to retrieve data from
  2. Less complex to analyze
  3. Simpler way to visualize data

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Disadvantages of flat schema

A
  1. Typically has a large dataset
  2. More suited to smaller datasets
  3. Can be slow to query
  4. Can be difficult to maintain
  5. Leads to data redundancy and data inconsistency
  6. Doesn’t allow for complex datasets that require more flexibility and detail

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is a star schema?

A
  1. Allows for multiple tables to be connected through one central table
  2. A central fact table connects to multiple dimension tables

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Example of a Star Schema

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are the advantages of a star schema?

A
  1. Reduces data redundancy
  2. Boosts query performance
  3. Easy to understand
  4. The tables you have to navigate through to answer questions are minimal

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are the disadvantages of a star schema?

A
  1. Lacks flexibility
    Adding or modifying tables can require extensive changes to the schema
  2. Can struggle to manage complex relationships

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a snowflake schma?

A
  1. An extension of the star schema
  2. It breaks down the dimension tables into multiple related tables
  3. Can take advantage of an existing star schema and normalize it to create a snowflake schema

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Example of extending an existing star schema to make it a snowflake schema

A
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are the advantages of a snowflake schema?

A

General
- It provides for more efficient storage and retrieval
- It improves data integrity and consistency
- Eliminates data redundancy
- Offers scalability and flexibility by integrating new tables as required
- Less storage requirements

Analysis
- Allows for more granularity
- Improves data governance
- Easier to identify patterns and relationships

Dimension Hierarchies
- More flexibility with dimension hierarchies (think product to sub category to category)
- Facilitates sub dimensions within hierarchies

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the disadvantages of a snowflake schema?

A
  1. More difficult to analyze because of the extra relationships
  2. Can be more challenging to understand and manage
  3. Slower queries

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How do you validate a schema?

A
  1. Make sure each column has the correct data type
  2. Make sure each column has the correct formatting
  3. Make sure all columns have clear descriptions
  4. Make sure all table and column properties are correctly configured

Data Modeling in Power BI
Intro to Data Models Week 1

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What is an important thing to do around summarization when setting up columns?

A
  • Set the default summarization for the column. This could be sum, count, avg, etc,
    OR
  • You can choose DON’t SUMMARIZE.

Setting the default summarize behavior for the column ensures that when you bring the column into a visualization that automatically aggregates, it will apply the action you want.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What settings are important to set up for a column when working on the data model/schema?

A
  1. Data Type
  2. Format
  3. Summarization (default you want)
  4. Data Category
  5. Sort by Column
  6. Data Groups

Data Modeling in Power BI
Intro to Data Models Week 1

20
Q

What does Sort Order on a column do?

A

Specifies the default sorting of a columns values such as:
- Ascending
- Descending
- Custom

Data Modeling in Power BI
Intro to Data Models Week 1

21
Q

What is the merge function the same as in database language?

A

A Join
Merging in PowerBI joins two data sources based on a common key.

Writing a long sentence so it will left justify

Data Modeling in Power BI
Intro to Data Models Week 1

22
Q

How are the columns from the table you merged with represented right after the merge?

A
  • They are all in one column where every value says Table.
  • You have to expand the column and choose which columns from the table you merged to that you want to include.

Data Modeling in Power BI
Intro to Data Models Week 1

23
Q

How do you know if an aggregation default has been applied to a column just by looking at the column list?

A
  • There will be a ∑ in front of the column
  • If there is no ∑ then the default is DON’t SUMMARIZE for that column

Data Modeling in Power BI
Intro to Data Models Week 1

24
Q

What do fact tables hold?

A
  • Quantifiable, measurable data about a business
  • All foreign keys of the dimension tables
  • Facts or measures (a property on which calculations can be made) such as quantity sold, amount sold and average sales.
25
Q

What do dimension tables hold?

A
  • Description information for all the measurements in the fact table
  • Descriptive attributes related to fact data
    They offer the context surrounding a business event
26
Q

What is normalization?

A
  • A data model design technique that involves structuring data to minimize redundancy to ensure data integrity
  • It divides data into multiple related tables, each with a specific purpose
  • Reduces data duplication
27
Q

Why might you normalize data?

A

To reduce data redundancy

28
Q

What are some advantages of normalization?

A
  • Remove data redundancy
  • Improve data integrity
  • Easily maintain your data model
29
Q

What is denormalization?

A
  • Converting the normalized schema into a schema that has redundant information.
  • Implementing denormalization helps to avoid expensive queries between the tables but at the cost of creating redundant data
30
Q

What are the benefits of denormalization?

A
  • Instead of loading multiple tables you load one large table, which may be more efficient in performance
  • With no relationships between tables, queries to join the data are reduced
  • Filters amongst tables may be less efficient than filters applied to a single table
  • Establishing a hierarchy from fields within a table is simpler
31
Q

Does normalization or denormalization optimize memory?

A

Normalization optimizes memory use since duplicate data is eliminated

32
Q

What is an advantage of normalization over denormalization?

A
  • Data integrity is maintained during the normalization process
  • It is harder to maintain with a denormalized model.
33
Q

What factors should you take into account when choosing normalization or denormalization?

A

What strikes the right balance between:
- Data integrity
- Query performance
- Ease of maintenance.

There is no one-size-fits-all solution, and the choice depends on your project’s specific business and analytical requirements.

34
Q

What is cardinality?

A
  • Cardinality refers to the nature of the relationship between two datasets.
  • In other words, how tables in your database relate to each other
35
Q

What happens if cardinality is not configured correctly?

A

This can lead to:
- innacurate data analysis
- flawed business decisions

36
Q

What are the three types of cardinalities?

A
  1. one-to-one
  2. one-to-many
  3. many-to-many
37
Q

Why is assessing the granularity of data important?

A
  • It is crucial for establishing the correct cardinality
  • It influences how you set up your cross-filter direction in Power BI
38
Q

What can happen if you incorrectly assess granularity?

A

It can lead to:
- Misrepresented data
- Incorrect business insights

39
Q

What does cross-Filter direction in Power BI determine?

A

The direction of automatic filtering between tables

40
Q

How are Power BI relationships different than other database management systems?

A
  • They are directional in nature
  • The direction significantly impacts how filtering operates
41
Q

How does a single direction cross-filter work?

A
  • A filter applied to table B will affect table A, but not vice versa.
  • For example, filtering on a dimension table filters the fact table it has a single direction cross-filter to. However, filtering on the fact table will not filter the dimension table
42
Q

How does a bi-directional cross-filter work?

A

Bi-directional cross-filters
- Sales (Fact table) to Product (Dimension table)
- Sales (Fact table) to Salesperson (Dimension table)

Number of products sold by each salesperson
1. You apply a filter on Salesperson dimension table

  1. This automatically filters the Sales fact table since there is a bi-directional cross-filter between Sales fact table and Salesperson dimension table
  2. Then this automatically filters the Product dimension table since there is a bi-directional cross-filter between Sales fact table and Product dimension table
43
Q

Disadvantages of bi-directional cross-filter direction

A
  • Can negatively impact performance
  • Can result in ambiguous filter propagation (spreading throughout the model) paths
44
Q

What are the steps for setting up a star schema in Power BI

A
  1. Disable auto detect
  2. Load Fact and Dimension tables
  3. Create relationships
  4. Set cardinality
  5. Set cross-filter direction
45
Q

How can you normalize tables?

A

Create calculated tables using DAX