Data Modeling Flashcards

1
Q

Consolidated Fact Tables

A

Combined Fact Tables that have a similar grain.

ex. Sales Actual & Sales Forecast

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Star Schema

A

Normalized fact table surrounded by denormalized dimension tables

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

5 Way to handle Early Arriving Fact / Late Arriving Dimension

A
  1. Never process
  2. Send record to error log
  3. Park and retry later
  4. Insert dummy value
  5. Insert dimension and update fact later
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

SCD Type 0

A

Fixed, never changes
ex. Social Security Number, Hire Date

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Slightly Ragged / Variable Depth Hierarchies

A

Doesn’t have a fixed number of levels, but range should be shallow

ex. Geography hierarchies range from 3 to 6 levels

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Degenerate Dimensions

A

Dimensions that have no context except for it’s primary key

Placed in fact table, not it’s own dimension table

Ex. Invoice line item numbers

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Ragged / Variable Depth Hierarchies

A

Unknown depth

Use bridge table for every possible path to join to proper hierarchy

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

SCD Type 5

A
  • The type 5 technique builds on the type 4 mini-dimension by embedding a “current profile” mini-dimension key in the base dimension that’s overwritten as a type 1 attribute
  • This approach, called type 5 because 4 + 1 equals 5, allows the currently-assigned mini-dimension attribute values to be accessed along with the base dimension’s others without linking through a fact table.

Customer Key (PK)
Customer ID (NK)
Customer Name

Currrent Profile Key (FK)

Current Profile Key (PK)
Current Age Band
Current Score
Current Income Level

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Accumulating Snapshot Fact

A

Stores what happened over period of time

Ex. Sales Order that updates columns (ship date, received date) that updates columns as it progresses

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Three Types of Fact Tables

A
  1. Transactions
  2. Periodic Snapshot
  3. Accumulating Snapshot
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

SCD Type 3

A

Add a previous column value to the dimension table

Slowly changing dimension type 3 changes add a new attribute in the dimension to preserve the old attribute value; the new value overwrites the main attribute as in a type 1 change. This kind of type 3 change is sometimes called an alternate reality. A business user can group and filter fact data by either the current value or alternate reality. Rarely used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Multiple Hierarchies in Dimensions

A

Multiple grouping levels

Ex. Dates: year, month, day
geography: Country. region, city
category: subcategory, line item

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Conceptual Data Model

A

What the system will contain
Typically created by business stakeholders
Entities, attributes and relationships

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Logical Data Model

A

How the system is implemented
Done by architects and analysts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Date Dimensions

A

Static dimensions that are generally loaded into the system to be used to create variations of the date

Can also be boolean is_holiday, is_weekend

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Outrigger Dimensions

A

Dimension table that is reference by another dimension table

Happens with SCD Type 4 (dimension -> mini dimension current)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Conformed Dimensions

A

Single dimension table used in multiple fact tables

ex. date dimension, product dimension, store dimension (address dim used for store location + where customer order shipped to)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Role-playing Dimensions

A

Dimensions that can mean different things

Ex. Date -> Ship Date, Order Date
Address -> Ship Address, Billing Address

19
Q

Factless Fact Table

A

Record of dimensional entities coming together at a point in time
Ex. student attending class on a given day. Nothing to measure

20
Q

Fixed Depth Positional Hierarchies

A

Many-to-one relationships
Ex. product -> category

21
Q

SCD Type 7

A

With type 7, the fact table contains dual foreign keys for a given dimension: a surrogate key linked to the dimension table where type 2 attributes are tracked, plus the dimension’s durable supernatural key linked to the current row in the type 2 dimension to present current attribute values.

Type 7 delivers the same functionality as type 6, but it’s accomplished via dual keys instead of physically overwriting the current attributes with type 6. Like the other hybrid approaches, the current dimension attributes should be distinctively labeled to minimize confusion.

22
Q

SCD Type 1

A

No history - overwrite value

23
Q

SCD Type 6

A

Type 6 builds on the type 2 technique by also embedding current attributes in the dimension so that fact rows can be filtered or grouped by either the type 2 value in effect when the measurement occurred or the attribute’s current value.

i.e., add new columns in the same dimension table that track the ‘current values’

Product Key (PK)
Description
Historic Department name
Current Department Name
Effective Date
Expiration Date
Current Row Indicator (True/False)

24
Q

SCD Type 2

A

Row versioning / history

Add 3 new columns - effective_date, end_date, is_current

Not suitable for rapid changing dimensions. Use Type 4

25
Q

SCD Type 4

A

Add mini-dimension

The type 4 technique is used when a group of dimension attributes are split off into a separate mini-dimension. This approach is useful when dimension attribute values are relatively volatile. Frequently-used attributes in multi-million row dimension tables are also mini-dimension design candidates, even if they don’t change frequently. A surrogate key is assigned to each unique profile or combination of attribute values in the mini-dimension. The surrogate keys of both the base dimension and mini-dimension profile are captured as foreign keys in the fact table.

Date Key (FK)
Customer Key (FK) - Goes to main customer dim table with scd’s (name, address, date of birth)
Customer Profile Key (FK) - Goes to mini-dimension dim with rapidly changing dims (age band, score, income level)

26
Q

How to deal with Nulls in Fact Tables

A

Nulls behave gracefully for aggregate functions

AVOID nulls in foreign keys to prevent referential integrity issues. Set an ‘Unknown’ value

27
Q

Junk Dimension

A

Miscellaneous, low-cardinality flags and indicators

Instead of making separate dimension tables, create a single junk dimension combing them together. Will be the the permutations of all possible values that actually occur in source data

28
Q

What is a Fact Table

A

Table that can be summed, averaged or manipulated

29
Q

Durable Key

A

Surrogate key that does not change value in SCDs

30
Q

Periodic Snapshot Fact

A

Summary of events at specific points in space/time/

Usually semi-additive or non-additive

Ex: customer bank account balance on given day

31
Q

Dimension Surrogate Key

A

Incrementing integers used as primary keys

32
Q

Steps to Create Star Schema

A
  1. Identify Business Process
  2. Identify the Grain
  3. Identity Dimensions
  4. Identify Facts
33
Q

Type of Fact Table Measurements

A
  1. Additive: Sum across any dimensions
  2. Semi-additive: Sum across any EXCEPT time (ex. headcount)
  3. Non-additive: Cannot sum (ex. % off, unit price of item)
34
Q

Natural/Business Key

A

Identifier used by the business (ex. employee number). Generally not used as primary keys as these could change (update employee numbers to new system)

35
Q

Three Levels of Data Modeling

A
  1. Conceptual: Scope
  2. Logical: Table/columns
  3. Physical: Physical structure
36
Q

Physical Data Model

A

How to implement. Done by DBA and developers

37
Q

Galaxy Schema

A

AKA Fact Constellation Schema

Multiple fact tables connected with shared (conformed) dimensions

38
Q

Conformed Fact

A

Fact used in more than one fact table

Used across multiple marts

39
Q

Snowflake Schema

A

Can have normalized dimension tables

40
Q

Transaction Fact Table

A

Facts of events at specific points in space and time

41
Q

Shrunken Rollup Dimension

A

Used to develop aggregate fact tables

Ex. Ticket sold per venue. Main dim is by venue. Want to also see by city.

Shrunken dimensions are conformed dimensions that are a subset of rows and /or columns of a base dimension. Shrunken rollup dimensions are required when constructing aggregate fact tables. They are also necessary for business processes that naturally capture data at a higher level of granularity, such as a forecast by month and brand (instead of the more atomic date and product associated with sales data). Another case of conformed dimension subsetting occurs when two dimensions are at the same level of detail, but one represents only a subset of rows.

42
Q

Centipede Fact Tables

A

Multiple foreign keys in a fact table for many-to-one hierarchy dimensions

Some designers create separate normalized dimensions for each level of a many-to- one hierarchy, such as a date dimension, month dimension, quarter dimension, and year dimension, and then include all these foreign keys in a fact table. This results in a centipede fact table with dozens of hierarchically related dimensions. Centipede fact tables should be avoided.

Centipede fact tables also result when designers embed numerous foreign keys to individual low-cardinality dimension tables rather than creating a junk dimension.

43
Q

Dimension Tables

A

Wide, low-cardinality, denormalized tables