Class 2 - Chapter 3 Flashcards
(50 cards)
Characteristics of Data Warehouse
Subject-oriented
Focused on business related subjects and organizational activity
Ex: Customers, employees, products, suppliers
Integrated
Data from different sources are stored in a consistent format
Requires addressing naming conflicts, unit discrepancies
Time-variant
Maintains historical data
Each source may contain data at different time points –day, week, month etc. Used to analyze trends.
Nonvolatile
Users cannot change data after it is entered. It is periodically updated.
Structured for query and analysis
3 different data Warehouse Architectures
independant Data Mart- each data mart has own
ETL process. not centralized storage location. stored in different datamarts, users have access to all data
Dependenat Data Mart and Operational Data Store
ETL process where data is parititoned, same format, standardization because loaded from same systems.
Logical Dat Mart and Real-Time Data Warehouse. real time update of the data. you get data from sources and save in data warehouse, different sections for different data mart. It’s not actual data but a view. more of a dashboard thing.
OLAP
multidimensional view of the data
What is aggregation?
is like the SQL “Group BY’ summing up a 2 dimension view. so if you had 3 dimensions, you aggregate 1 dimension to make 2 dimension.
slice
fix one attribute of one dimension. filter male. fix one and becomes 2 dimensional view.
dice
take subset of the data. you have cube that you slice and create a subcube.
drill down
data has heirarchies, you have say date dimension. you can drill down to week.
rollup
you aggregate
foru step dimensional modeling process.
Step 1?
select business process. talk to users. sequence of tasks to create outputs from input. production process, procurement process.
foru step dimensional modeling process.
Step 12?
Declare
Business process?
Business Process
What is a business process?
Sequence of tasks or activities that take a set of inputs and convert them into desired output.
Procurement, Fulfillment, Production, etc.
Common characteristics
Action verbs
Supported by IT systems
Input/Output/Metrics/KPI
Examples: Retail -> sales, Healthcare -> treatment
. Interview the actors. operations person, etc.
How to select Business Process?
QUICK WINS
Activities that give the best return based on the effort. They are valuable and fundamental to your success.
MAJOR PROJECTS
Activities what provide long term returns but may be more complex to execute.
FILL-INS
Unimportant activities including everyday tasks.
TIME WASTERS
These activities are time-consuming and require resources that are better spent on other things
Step 2: Declare the Grain
Grain: Level of detail associated with fact table measurements
What level of granularity should be used?
“How do you describe each row in a fact table?”
Note: Include lowest level (atomic) of detail as it can be constrained and rolled up in every possible way.
Step 2: Declare the Grain (Contd.)
Examples
One row per scan of individual product on customer sales transaction
One row per product under promotion per store each day.
One row per bank account each month.
One row per individual boarding pass scanned at an airport gate
Insurance?
Transportation? Ride Share Service?
Education?
fact table
has all the measures, the grain.
Step 3: Identify Dimensions
Context of measurements
Who, what, why, where, when, how
Data types – fact vs dimensions
Natural Key
– System generated operational identifier.
Step 4: Identify Facts
What is the process measuring?
Key performance metrics
Typical facts are numeric additive figures
Business requirements vs source data
Examples? Quantity sold, current balance, unit price.
Note: facts in a design must be true to the grain defines in Step 2.
Fully Additive facts:
Measures can be summed across any dimensions Examples?
Sales, Cost
Semi Additive Facts
Semi-Additive: Measures can be summed across some dimensions, but not all. Examples?
Balance amount across time dimension
grain
level of detail you want to capture, should be same across all rows in fact table.
Non Additive Facts
Non-Additive: Measures cannot be summed across any dimension. Example?
Ratios, unit price, salary,
Dimensional table
doesnt usually have facts, more like product description, location, when, where purchased. connected to fact table using pk/fk. fact table has fk of other tables.
Wall mart example:
What are the data captured by Walmart retail stores? Point of Sale data: Sales Transactions Products Sold Promotions Customer Goods delivery data: Vendor Products Time of delivery Product return data: Products returned Damage/exchange