Class 2 - Chapter 3 Flashcards

(50 cards)

1
Q

Characteristics of Data Warehouse

A

Subject-oriented
Focused on business related subjects and organizational activity
Ex: Customers, employees, products, suppliers
Integrated
Data from different sources are stored in a consistent format
Requires addressing naming conflicts, unit discrepancies
Time-variant
Maintains historical data
Each source may contain data at different time points –day, week, month etc. Used to analyze trends.
Nonvolatile
Users cannot change data after it is entered. It is periodically updated.
Structured for query and analysis

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

3 different data Warehouse Architectures

A

independant Data Mart- each data mart has own
ETL process. not centralized storage location. stored in different datamarts, users have access to all data

Dependenat Data Mart and Operational Data Store
ETL process where data is parititoned, same format, standardization because loaded from same systems.

Logical Dat Mart and Real-Time Data Warehouse. real time update of the data. you get data from sources and save in data warehouse, different sections for different data mart. It’s not actual data but a view. more of a dashboard thing.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

OLAP

A

multidimensional view of the data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is aggregation?

A

is like the SQL “Group BY’ summing up a 2 dimension view. so if you had 3 dimensions, you aggregate 1 dimension to make 2 dimension.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

slice

A

fix one attribute of one dimension. filter male. fix one and becomes 2 dimensional view.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

dice

A

take subset of the data. you have cube that you slice and create a subcube.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

drill down

A

data has heirarchies, you have say date dimension. you can drill down to week.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

rollup

A

you aggregate

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

foru step dimensional modeling process.

Step 1?

A

select business process. talk to users. sequence of tasks to create outputs from input. production process, procurement process.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

foru step dimensional modeling process.

Step 12?

A

Declare

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Business process?

A

Business Process
What is a business process?
Sequence of tasks or activities that take a set of inputs and convert them into desired output.
Procurement, Fulfillment, Production, etc.
Common characteristics
Action verbs
Supported by IT systems
Input/Output/Metrics/KPI
Examples: Retail -> sales, Healthcare -> treatment
. Interview the actors. operations person, etc.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How to select Business Process?

A

QUICK WINS
Activities that give the best return based on the effort. They are valuable and fundamental to your success.
MAJOR PROJECTS
Activities what provide long term returns but may be more complex to execute.
FILL-INS
Unimportant activities including everyday tasks.
TIME WASTERS
These activities are time-consuming and require resources that are better spent on other things

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Step 2: Declare the Grain

A

Grain: Level of detail associated with fact table measurements
What level of granularity should be used?
“How do you describe each row in a fact table?”

Note: Include lowest level (atomic) of detail as it can be constrained and rolled up in every possible way.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Step 2: Declare the Grain (Contd.)

A

Examples
One row per scan of individual product on customer sales transaction
One row per product under promotion per store each day.
One row per bank account each month.
One row per individual boarding pass scanned at an airport gate
Insurance?
Transportation? Ride Share Service?
Education?

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

fact table

A

has all the measures, the grain.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Step 3: Identify Dimensions

A

Context of measurements
Who, what, why, where, when, how
Data types – fact vs dimensions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

Natural Key

A

– System generated operational identifier.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

Step 4: Identify Facts

A

What is the process measuring?
Key performance metrics
Typical facts are numeric additive figures
Business requirements vs source data
Examples? Quantity sold, current balance, unit price.

Note: facts in a design must be true to the grain defines in Step 2.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

Fully Additive facts:

A

Measures can be summed across any dimensions Examples?

Sales, Cost

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
20
Q

Semi Additive Facts

A

Semi-Additive: Measures can be summed across some dimensions, but not all. Examples?
Balance amount across time dimension

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
21
Q

grain

A

level of detail you want to capture, should be same across all rows in fact table.

22
Q

Non Additive Facts

A

Non-Additive: Measures cannot be summed across any dimension. Example?
Ratios, unit price, salary,

23
Q

Dimensional table

A

doesnt usually have facts, more like product description, location, when, where purchased. connected to fact table using pk/fk. fact table has fk of other tables.

24
Q

Wall mart example:

A
What are the data captured by Walmart retail stores?
Point of Sale data: Sales Transactions
Products Sold
Promotions
Customer
Goods delivery data:
Vendor
Products
Time of delivery
Product return data:
Products returned
Damage/exchange
25
Retail: Step 1 Select Business Process
What are the business requirements? Logistics of ordering, stocking, and selling products while maximizing profit. Start with most critical process as well as the most feasible. Process: POS Retail Sales Transactions Process Which products are selling? in which store? on which days? under which promotion? These will be fore dimension tables
26
Retail: Step 2 Declare Grain
“What level of data detail should be made available in a dimensional model?” Use most detailed atomic information provides maximum analytical flexibility because it can be rolled up in every single way possible. In many situations users are not given access to atomic data which reduces its capability. Most granular data in retail -> POS transactions at product level.
27
Retail: Step 3 Identify Dimensions. | Typically what are the dimensions for a retail store?
``` Typically what are the dimensions for a retail store? Product Store Date/Time Payment Method Promotion Cashier ```
28
Retail: Step 4 Identify Facts
``` “Facts must be true to grain” Quantity Regular unit price Discount unit price Net unit price Extended discount price Extended Sales dollar amount Extended cost dollar amount Extended gross profit ```
29
what is retail dimension table
dimensions: date, payment method, product, promotion, cashier . in star in center is Retail Sales Facts
30
Modeling Time Dimension
Is date dimension needed? Almost all businesses capture time series of performance metrics. It is frequently used for partitioning data. Unlike other dimensions you can build the date dimension in advance. There are many date attributes not supported by the SQL date function such as week numbers, fiscal periods, seasons, holidays, and weekends.
31
Is date dimension needed?
Yes, mostly. Date dimension that can be created before you create other tables.Its standardized on the different databases. You need to standardize date format. Traverse over date dimensin. can capture time series of perfrmance metrics. you can partition the data.
32
what is Date Dimension
``` What should date dimension capture? Day, Month, Year Weekday, Weekend Day of Week Day of Month Calendar vs Fiscal ``` Date vs. Time of Day dimensions Date -> 20yrs -> 7300 approx. Time of Day -> 20 x 365 x 24 x 60 = 10,512,000
33
what is Product Dimension
Product dimension contains every single SKU (Stock Keeping Unit) Hierarchy: Product
34
Numeric Values as Attributes or Facts | Should Standard List Price be in Product dimension or in the fact table?
Standard List Price change infrequently. Standard List Price is non-additive, however, if multiplied by quantity, it gives sales amount which is additive. Note: If the numeric value is used primarily for calculation purpose, it likely belongs to the fact table. If the stable numeric value is used predominantly for filtering and grouping, it should be treated as a dimensional attribute.
35
what is Store Dimension
Describes every store Primary geographic dimension Geographic Hierarchy Internal District Hierarchy
36
what is Promotion Dimension
Most interesting dimension, often called causal dimension Promotion conditions: price reduction, end aisle display, news paper ad, coupons Whether a promotion is effective? Need to capture each combination of promotion Modeled as separate dimensions or combined Null values in fact or dimension tables
37
what is causal dimension
Most interesting dimension, often called causal dimension | Promotion conditions: price reduction, end aisle display, news paper ad, coupons
38
what is Schema Extensibility
Example: Retailer adds frequent shopper program – how would this be supported? Add new dimension table, surrogate keys: Prior to Frequent shopper Program, Frequent shopper Not Identified ``` Modeling at most granular level allows extensibility Extension schemas by adding New dimensions New dimension attributes New measured facts ```
39
What are Factless Fact Table
Some business processes track events without any real measures. If the event happens, we get an entry in the system; if not, there is no row. Example: Employee hiring and firing, student attending a class. The fact tables that track these events typically do not have any actual fact measurements, so they are called factless fact tables. We usually add a column called EventCount containing number 1 to count the number of events by summing the EventCount fact.
40
Do you need a Promotion Dimension
Yes, because in product dimension you would have | a lot of rows with NULL values. How do you connect with Fact Table.
41
DD
attribute that doesnt have it's own dimension. POST Transaction# . generally system generated values. can be used as serigate key.
42
Factless Fact Tables – Promotion Coverage
What products were on promotion but did not sell? Add a Promotion Coverage fact table to include product, date, store and promotion attributes for products which are under promotion.
43
Factless Fact Tables – Promotion Coverage
The Promotion Coverage fact table is a factless fact table. Each row of this table shows the product on promotion regardless of whether the product sold. The grain is different from fact table. not just what was sold but was on promotion. Question: Find the products which were on promotion but not sold. Step1: Query the Promotion Coverage table to identify the products under promotion on a given day. Step 2: Determine what products sold from POS sales fact table on that day. Step 3: Find the difference between above two lists of products.
44
Fact and Dimension Table Keys
Use surrogate keys as primary keys Integer keys, sequential Natural key: operational system identifier -> store number, time stamp
45
What are advantages of surrogate keys
``` Advantages of surrogate keys Buffer from operational changes Integrate multiple source systems Improve performance Handle null or unknown conditions Dimension attribute change tracking ```
46
snowflake schema
hierarchy. you are normalizing the tables. Advantages of Normalization of DimTables Saves space by reducing redundancy Need to update only once. Disadvantages of normalization of DimTables Model becomes complex Numerous joins make query slow
47
slicing query example:
-Slicing: This query generates a two dimensional slice by fixing (filtering) one of the dimension value. SELECT CustomerID, ProductID, StoreID, Quantity FROM FactPurchase_t WHERE CustomerID=1;
48
dicing query example:
--Dicing: This query selects a subset over all dimensions. SELECT CustomerID, ProductID, StoreID, Quantity FROM FactPurchase_t WHERE CustomerID=1 and ProductID=11 and StoreID=101;
49
all types rollup. cube
[dbo].[DimCustomer_t]
50
grouping sets
instead of union, to merge into reports -The GROUPING SETS operator generates a result set equivalent to that generated by a UNION ALL of multiple simple GROUP BY clauses. SELECT CustomerID, ProductID, StoreID, SUM(Quantity) AS TotalCount FROM FactPurchase_t GROUP BY GROUPING SETS(CustomerID, ProductID, StoreID);