Data Warehouses and Dimensional Modelling Flashcards
(34 cards)
Main goal of DW and Dimensional Modelling
Simplicity.
Ensures that users can easily understand databases.
Allows software to efficiently navigate databases.
Dimensional Modelling (DM) Definition
A logical design technique for structuring data so that it’s intuitive to business users and delivers fast query performance.
Three primary design goals
- Present the required information to users as simply as possible (understandability).
- Return query results to the users as quickly as possible (query performance)
- Provide relevant information that accurately tracks the underlying business processes
Present the required information to users as simply as possible (understandability).
- Make everything as simple as possible, but not simpler. (Einstein)
- DMs are much easier for users to understand - information is grouped into coherent business categories.
- DMs typically contain exactly the same content as a normalized model, but with far fewer tables.
- Simplicity is relative - the model must reflect complex business processes.
Return query results to the users as quickly as possible (query performance)
- Relational environment:
- Help query performance because of denormalization.
- Pre-join hierarchies and lookup tables - fewer join paths and less immediate temporary tables.
- Predictable framework allows the DBMS to make strong assumptions which aid in performance. - OLAP environment:
- Engine designed to support DMs
- Performance due to aggregation within and across dimensions.
Provide relevant information that accurately tracks the underlying business processes
Requires full range of design patterns to create models that accurately capture and track the business.
Benefits of Dimensional Modelling
- Understandability
- Query performance
- Each dimension is an equivalent entry point into the fact table. Therefore, query performance is simple, predictable and controllable.
- Graceful extensibility to accommodate new data. No query or BI application needs to be reprogrammed to accommodate changes.
Star Join
Each business process can be represented by a DM that consists of a normalized fact table surrounded by denormalized dimension tables.
Fact table
- Highly normalized, storing little redundant data.
- Contains the measurements associated with a specific business process.
- Record in fact table = measurement of a business event.
- Primary key is a multi-part/composite key made up of a subset of foreign keys from business process dimensions.
- Fact tables always express many to many relationships between dimensions.
- Every foreign key must match a unique primary key in the corresponding dimension. - Some business processes track events that do not contain facts. Then a factless fact table is created. Add a dummy counter variable with value 1.
Facts
Numeric values (typically continuous) that quantify the magnitude of the event.
Three types of facts
- Numeric and additive
- Can meaningfully be aggregated across dimensions.
- Most useful facts. - Semi-additive
- Can be meaningfully aggregated across some dimensions.
- Can typically not be aggregated across time dimensions. - Non-additive
- Cannot be meaningfully aggregated across any dimensions.
- Stored in dimension tables
When do facts conform?
If their definitions are the same in different fact tables
What is the grain of a fact table?
The level of detail contained in a fact table.
Lowest level of detail that is possible
Atomic level.
Complete flexibility to roll up to any summary level.
Three fundamental grains
- Transaction fact table
- Track each transaction as it occurs at a point in time
- One record per transaction/line of transaction - Periodic fact table
- Captures cumulative performance over specific time intervals.
- One record per group of transactions made over a period of time.
- Complements detailed transaction facts. - Accumulating snapshot fact table
- Constantly updated over time
- One record for the entire lifetime of an event.
- Used to combine data across several business processes in a value chain.
Dimension Tables
- The nouns of the dimensional model.
- Describes the objects that participate in the business.
- Spot dimensions and/or their attributes in conversations.
- Provide entry points into the data.
- Has single surrogate primary key.
- Conformed dimension: single dimension that is shared by two or more business processes.
- Embedded hierarchy: some dimension attributes relate to each other in a hierarchical or 1:M fashion.
- Relational transaction systems designed using normalization.
- Each row has a set of attributes that provides context which is true the moment the fact table is recorded.
- Dimension attributes are typically textual fields.
What is denormalization?
Process of recombining the attributes into a single dimension.
Nothing is lost. It presents exactly the same information.
What are conformed dimensions?
Two dimensions are conformed if they contain one or more fields with the same names and contents.
Two types:
- Identical conformed dimensions.
- One dimension is a perfect subset of a more detailed granular dimension table.
Benefits of conformed dimensions
- Consistency
- Ensure that every fact table is filtered consistently and the resulting query answer sets are labeled consistently. - Integration
- Allows queries to drill across fact tables representing different processes (supports true cross-business process analysis) - Reduced development time to market
- Do not need to recreate the wheel over and over again.
What is a Bus Matrix?
It is a representation of conformed dimensions’ involvement/participation in multiple business processes.
Each row represents a business process and defines at least one fact table and its associated dimensions.
What do you call it when one business process has more than one DM
Business process dimensional model
What does it mean to “Drill Across”?
Analysis involving data from more than one business process.
What are Surrogate Keys?
- New set of keys, separate from keys in source system.
- Also known as meaningless keys, substitute keys, non-natural keys or artificial keys.
- Unique value, usually an integer, assigned to each row in a dimension.
- Becomes a primary key of dimension table used to join with associated foreign key in fact table.
Benefits of surrogate keys
- Helps protect DW from unexpected administrative changes in source system keys (called business keys or natural keys)
- Allow the DW to integrate the same data with different keys from multiple source systems.
- Enable you to add rows to dimensions that do not exist in the source system.
- Provide the means for tracking changes in dimension attributes over time.
- Integer surrogate keys can improve query and processing performance.