Data Warehouse (2) Flashcards

Part 2

1
Q

Explain the Data Warehouse characteristics

A
  • supports Decision Support data storage
  • includes data extraction and integration filter
  • conform to uniform structures & formats to avoid data conflicts
  • supports decision making
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is Multidimensional Data Analysis?

A
  • processing of data to view as part of Multidimensional Structure
  • data related in different ways
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are the OLAP client/server components?

A

OLAP GUI

  • in end user’s computer
  • client side
  • main contact between system and end user
  • provides interface allowing end user query DW contents

Analytical Processing Logic (APL)

  • client side (speed) server side (better admin)
  • performs complex transformations required for business data analysis e.g aggregation, MDA

Data Processing Logic (DPL)

  • server level
  • presents data analysis requests to proper data objects in DW
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is MOLAP

A

Multidimensional Online Analytical Processing

  • has OLAP functionality using MDBMS
  • functionality: stores and analyzed Multidimensional data
  • has special techniques to store data in matrix-like arrays of n-dimensions
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Checklist when selecting either using ROLAP or MOLAP

A
  • implementation price
  • supported hard/soft
  • hard/soft and DBMS compatibility
  • performance
  • availability, extent, type of admin tools
  • support DB schema
  • handle current & projected DB size
  • DB architecture
  • Resources
  • Scalability
  • flexibility
  • cost of ownership
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Explain Multidimensional cubes and purpose of slice and dice

A

Multidimensional analysis focus on specific slices of the cube. E.g product manager, product dimension.

Intersection of slices create smaller cubes allows dicing of multidimensional cube. Therefore, examining cubes allow precise analysis of variable components

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Common performance improvement for Star Schema

A

Normalized Dimensional Tables

  • semantic simplicity
  • ease end user navigation through dimensions
  • e.g a dimension can be simplified in 3NF using data filtering operations

Creating & Maintaining Multiple Fact Tables

  • speeds query operations
  • save processor cycles at run time = speeds analysis
  • e.g region, city, state is pre-computed at data loading phase at run-time

Demoralising Fact Tables

  • improves data access performance = save data storage
  • stores in one record what usually takes many records in different tables
  • e.g computing total sales in all region required access to Region Sales aggregates for summarizing. May have 300,000 records which means summarizing 300,000. Denormalized tables is better for aggregation.
YEAR_TOTAL
YEAR_ID
MONTH_1
MONTH_2
MONTH_3

Table Partitioning & Replication

  • required when DSS is implemented in dispersed geographic areas
  • Partitioning: splits a table into subsets of rows and columns which is placed in or near the client computer to improve data
  • Replication: makes a copy of a table and places it in a different location for same reason
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is Data Mining?

A
  • analyzes large sets of data
  • uncovers hidden trends, patterns and relationships
  • forms computer models to simulate and explain the results and use the models to support decision making
  • can also be the basis to create advanced predictive data models
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the 4 phases of Data Mining?

A

DATA PREPARATION

  • data sets used by data mining operation are identified
  • data is cleansed from any impurities

DATA ANALYSIS and CLASSIFICATION
- studies data to identify common data characteristics or patterns
- data mining tool applies specific algorithms to find:
• groupings, classifications, sequences
• dependencies, links, relationships
• patterns, trends, deviations

KNOWLEDGE ACQUISITION
- uses results of previous phase
- data mining tool selects the appropriate modelling or knowledge acquisition algorithms
- typical algorithms used are based on:
  • neural networks
  • decision trees 
  • rules induction 
  • genetic algorithms 
  • regression trees
  • memory - based reasoning
PROGNOSIS PHASE
- data mining findings used to predict future behaviour and forecast business outcomes
- e.g giving % of chances
- findings presented in:
  • decision tree
  • neural net
  • forecasting model
How well did you know this?
1
Not at all
2
3
4
5
Perfectly