05 Conceptual (and logical) multidimensional modeling Flashcards

1
Q

List the main steps involved in schema design process for DWH Environments:

A
  • User requirements and OLTP Schemas are used as inputs into a Conceptual data model
    • (M E/R, YAM2, ADAPT).
  • A Logical data model follows
    • ROLAP, MOLAP, HOLAP, Star Schema, Snowflake Schema
  • Finally, a Physical data model is created.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are the steps to design a Dimensional Model?

A

4 steps are need to design a Dimensional Model:

  1. Choose the business process
    • Usually done by business department, business processes are the fundamental building block of the dimensional DWH, each process equals at least one fact table.
  2. Declare the grain
    • Declare level of detail in the fact table, what does a fact table measurement represents? (a purchase, a transaction, a rental..)
  3. Identify the dimensions
    • Determine applicable dimensions, dimensions usually come from grain, originally planned 1-dimensions may be two or three at the end.
    • Reference + Ratio -> Fact
  4. Identify the facts
    • Identify facts and measures from the business process.
    • Quantity and amount as fundamental facts in transaction-oriented processes.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is the purpose/mission of a conceptual schema?

A
  • Multidimensional data modeling can play a key role in the design of the dimensional model.
  • Assure the flexibility and re-usability of the schema after the user requirements have changed.
  • Not assume any facts that are the results of further design steps (DB technology used).
  • Semantic relativism: model can accomodate not only one, but many different concepts.
  • Enabling the discussion between business and IT about the requirements of information analysis (example: identification and elimination of weaknesses).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Mention key characteristics of OLAP Databases:

A
  • Dimension with (complex) hierarchies.
    • Different levels correspond to different data granularities
    • Multiple hierarchies on a single dimension
    • Alternative hierarchy paths
  • Multiple cubes (cube architecture).
    • Superior cubes
    • Analysis of different facts with different set of dimensions
  • Summarizability constraints
  • Calculated measures, Ratio systems
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why the ER model is not appropiate for modeling Multidimensional Databases? (MDB)

A
  • Distinct from ER, dimensional modeling divides databases into 2 types of objects: fact and dimension tables.
  • Dimension tables are de-normalized.
  • THe inherent separation of qualifying and quantifying data cannot be expressed in an E/R Model.
  • Semantincs of the complex dimension structures are to specific for E/R models.
  • ER are closely aligned with a physical implementation that reflects a relational DB approach.
  • Therefore: an enhanced ERM (Multidimensional ERM ME/R is proposed)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Mention the graphical elements used to enhance the traditional ER language into a Multidimensiona ERM:

A
  • Idea: slightly enhance the ER language to ensure the flexibility and the simplicity of the ER notification, but allow the definition of hierarchies.
  • Elements added:
    • A fact relationship set
    • A dimension level set
    • A classification relationship set
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the goals of the Object-Oriented Multidimensional Modeling?

(With YAM2)

A
  • Paradigm of object orientation is the fundament for software engineering.
  • The long-term use of UML as modeling and design language has been proven as highly useful.
  • As UML is widespread in science and practice, it can be assumed, that there is less adoption effort.
  • UML-based multidimensional models can be homogeneously integrated into software engineering.
  • UML schema can be enhanced easily by applying specialization techniques.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are some critics and evaluations of YAM2?

A
  • YAM2 modelling captures all aspects but its criticized as too cumbersome and complex.
  • The language is a complete extension of UML for multidimensional modeling.
    • UML analytical methods can be used for analyzing purposes.
    • Avoid definitions from scratch.
    • All elements are specializations of UML constructs.
  • YAM2 fulfills almost all requirements for multi-dimensional models
    • Explicit separation of structure and contents
    • Measure sets, Multi-star schemas, many-to-many relationships
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Define ADAPT and describe its 8 basic modeling objects:

A
  • ADAPT = Application Design for Analytical Processing Technologies.
  • ADAPT enables the modeler to build manifold dimensional and hierarchical structures.
  • Basic modeling objects:
    • Hypercube (cube)
      • N-dimensional arrary, in an OLAP DB a hypercube is the basic unit of storage for business data
      • Context as a portion of a hypercube that provides a context for analysis.
    • Dimension
      • Axis or index of an hypercube
    • Hiearchy
      • Set of parent/child member combinations that define aggregation.
    • Level
      • Collection of members used to define hierarchical precedence (calendar types).
    • Model
      • Algebraic process to calculate derived data. Document the source and target data of a calculation.
    • Member
      • ​An individual dimension value (used as “examples” in models).
    • Attribute
      • Information about a dimension member. Applied to dimensions/levels (as “additional information”)
    • Scope
      • A (sub-set) collection of dimension members. Difference between calculated and enumerated scopes (examples: “current month”, “year-to-date”).
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the (5) main ADAPT Connecting Elements:

A

ADAPT Connecting elements:

  • Loose precedence (there are objects not associated to the superior group)
  • Strict precedence (every object is associated to exactly one superior object)
  • Self precedence
  • Used by
  • Connector
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q
# **Define the 2 Core ADAPT Objects** and describe its graphical representation:
**Hypercube and Dimension.**
A
  • Hypercube:
    • N-dimensional array
    • In an OLAP DB, a hypercube is the basic unit of storage for business data.
  • Dimension
    • Axis or index of an hypercube
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q
# Define the 2 **ADAPT Hierarchy Objects** and describe its graphical representation:
**Hierarchy and Level.**
A
  • Hierarchy
    • Set of parent/child member combinations that define aggregation.
  • Level
    • Collection of members used to define hierarchical precedence.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q
# Define the 2 **ADAPT Dimension Objects** and describe its graphical representation:
**Member and Attribute.**
A
  • Member
    • An individual dimension value (“examples” in the model).
  • Attribute
    • Information about a dimension member (“additional information”).
    • Can apply to both dimension and level.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Define the ADAPT Scope object and describe its graphical representation:

A
  • Scope
    • ​A collection of dimension members
    • Difference between calculated and enumerated scopes
    • Other examples: current month, year-to-date.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q
# Define the 2 **ADAPT Dimension Objects** and describe its graphical representation:
**Model and Context.**
A
  • Model
    • An algebraic process to calculate derived data.
    • Document the source and target data of a calculation.
  • Context
    • A portion of a hypercube that provides a context for analysis.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Mention the 3 strategies used to balance Response time and Resource use when doing a physical design:

A

When doing the physical design, one has to find out how the derived data is computed.

Strategies:

  • Compute all derived data before user issue any queries.
  • Use DB Server to compute all data at query time
  • First user request pays the price of computing
17
Q

What are the (3) main approaches used to implement multidimensional models:

A
  • Relational OLAP (ROLAP)
    • Servers store data in relational DBs and support extensions to SQL to efficiently implement the multidimensional data model.
  • Multidimensional OLAP (MOLAP)
    • Servers directly store multidimensional data in special data structures (e.g: arrays) and implement the OLAP operations over those data structures.
  • Hybrid OLAP (HOLAP)
    • Servers combine both technologies, benefiting from the storage capacity of ROLAP and the processing capabilities of MOLAP.
18
Q

Mention and describe the (3) types of schemas used in DWH Schemata:

A
  • Star schema
    • One central fact table
    • Dimension tables are not normalized
  • Snowflake schema
    • Several normalized tables per dimension
  • Hybrid approach
    • Duplication of attributes in dimension tables
    • Attributes of higher levels can be duplicated into lower leves (explicit redundancy)
    • Avoids query performance losses
    • Problem: large dimension tables become even larger (maintenance problems)
19
Q

Explain the Hybrid Approach in DWH Schemata:

A

Hybrid approach

  • Duplication of attributes in dimension tables
  • Attributes of higher levels can be duplicated into lower leves (explicit redundancy)
  • ​Avoids query performance losses
  • Problem: large dimension tables become even larger (maintenance problems)
20
Q

What are the key/typical characteristics of DWH Applications:

A

Typical characteristics in DWH Applications:

  • Less data volume in dimension tables than in fact tables.
  • Only rare changes in dimension tables (danger of update anomalies).
  • Typical queries contain selection predicates on fact table using one or more dimension tables.
  • Stronger focus on query performance than on avoiding redundancy and data storage.
21
Q

Mention some Pros and Cons of the Snowflake Schema:

A
  • Cons:
    • Performance is affected, since more joins need to be performed, when executing queries along hierarchy paths.
    • Benefit of normalization is quite insignificant.
    • More complicated structure than the star schema.
  • Pros:
    • Better storage and querying with sparse dimensions.
    • Reflects the way users think about data.
22
Q

How is represented the extended star schema in SAP Business Warehouse?

A
  • SAP BW automatically creates SID-Tables for dimension objects
    • easier consideration of multilingual attributes
    • easier consideration of slowly changing dimensions
    • multiple use of object master data
    • use of auto-generated keys
23
Q

For which reasons is the step of conceptual modeling often omitted in practice projects?

A
  • Time pressure
  • Trend for agile DWH development, rapid prototyping
  • Business users are not able to formulate requirements
  • Physical design already determined (vendor decision)
  • No established tools available on the market that perform a model-based DB implementation