Article - Multidimensional Database Technology Flashcards

1
Q

When is a multidimensional data model better suitable than a relational data model?

A
  • When the objective is to analyze data rather than to perform online transactions.
  • Key factor in the interactive analysis of large amounts of data for decision-making purposes.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How is data categorized in multidimensional models?

A
  • As facts with associated numerical measures
    Fact: a purchase | Measure: purchase amount/price
  • As textual dimensions that characterize the facts
    Fact: A puchase | Textual dimensions: Purchase time & location
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What three application areas are there for multidimensional data models?

*Means: In which processes are they used?

A
  1. Data warehouses
    Large repositories that integrate data from several sources in an enterprise for analysis
  2. Online analytical processing (OLAP)
    Finding overall trends by aggregating data search queries
  3. Data mining applications
    To discover knowledge by searching for unknown patterns and relationships in multidimensional databases
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is a pivot table?

A

A two-dimensional spreadsheet with associated subtotals and totals that supports viewing more complex data by nesting several dimensions on the x- or y-axis and displaying data on multiple pages.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Why are spreadsheets not adequate for managing and storing multidimensional data?

A

They tie data storage to tightly to the presentation - they do not seperate the structural information from the desired views of the information.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why are relational databases and SQL databases not adequate for managing and storing multidimensional data?

A
  • The use of aggregation functions can be difficult

- It is harder to combine all the dimensions you are interested in

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is a multidimensional database/data warehouse built up?

A

A collection of related cubes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Why can cubes easily manage the addition of new dimension values?

A

Because dimensions in a cube are first-class, built-in concepts with associated domains

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

How many dimensions can a cube have?

A

Unlimited, but current tools start having performance problems with more than 10 dimensions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

When do cubes become more sparse?

A

As there are more dimensions added to the cube and when the level of detail (granularity) becomes finer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

How many dimensions can you view of a cube?

A

Generally, only two or three. But it can show up to 4 low-cardinality dimensions by nesting one dimension within another on the axis.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is an important use of dimensions in a multidimensional database?

A

To use dimensions to provide as much context as possible for facts.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Why are data redundancies sometimes okay in multidimensional databases?

A

It is okay if it increases the data’s information value.

-> Since the data is derived from other sources, and not born into the multidimensional database, redundancy problems related to updates can be managed more readily

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Where is in general redundancy in a multidimensional database?

A

In the dimensions, not the facts.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the relation of hierarchy and dimension?

A

Dimensions are used for selecting and aggregating data at the desired level of detail. These levels of details are part of the hierarhy (time -> year, month, day).

-> Sometimes define multiple hierarchies for a dimension (fiscal year and calender year)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

How do you avoid duplicate definitions when defining multiple hierarchies for dimension?

A

The metadata of a cube defines the dimension hierarchy.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
17
Q

How can you avoid adding dimensions to a cube if you want to add extra information?

A

Add a property that you initially wanted as a dimension as a level under an existing dimension:

Dimension: product –> you add the level of package size instead of making package-size an dimension.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
18
Q

What kind of ordering is available in a multi-dimensional database?

A
  • Typically no ordering
  • Only ordering is by hierarchy
  • Ordering can happen by cumalative information, such as total sales to date.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
19
Q

What are facts?

A

The representation of the subject for a specific combination of dimensions

20
Q

How is the granularity of a fact determined?

A

By the levels from which its combination of dimension values is drawn.

21
Q

Which three types of facts are commonly included in data warehouses?

A
  1. Events
    sales, clicks on a page, movement of goods
  2. Snapshots
    An entity at a given point in time.
    Warehouse inventory levels at a time
  3. Cumulative snapshots
    Information about an activity up to a certain moment.
    Total sales up to and including the current month

–> Because these 3 are complementary, a database often has them all.

22
Q

Of which two components does a measure consist?

A
  1. A facts numerical property (sales price, profit)

2. A formula that can combine several measure values into one (aggregation function sum)

23
Q

What is a measure?

A

A representation of the properties of the fact that the user wants to optimize.

-> Takes on different values for various dimension combinations.

24
Q

Which three classes of measures behave differently in computations?

A
  1. Additive
  2. Semiadditive
  3. Non-additive
25
Q

How do additive measures behave in computations?

A

They can be meaningfully combined along any dimension, as it does not cause an overlap among the real-world phenomena.
(to add total sales for product, location and time)

26
Q

How do semiadditive measures behave in computations?

A

They can not be combined along one or more dimensions.

  • > You can sum up inventory across products and warehouses.
  • > you can not sum up inventory levels across time, cause you will count inventory several times then.
27
Q

How do nonadditive measures behave in computates?

A

They can not be combined along any dimension, usually because the chosen formula prevents combining lower-level averages into higher-level averages.

28
Q

Which measure(s) can occur for any kind of fact?

A

Additive and nonadditive facts

29
Q

Which measure(s) can occur only for snapshot or cumalative snapshot facts?

A

Semi-additive facts

30
Q

Which 5 types of querying are there in a multidimensional database?

A
  1. Slice-and-dice
  2. Drill-down and roll-up
  3. Drill-across
  4. Ranking
  5. Rotating
31
Q

What is the slice and dice query?

A

Making selections to reduce a cube.

-> slice the cube to only consider those sells concering bread

32
Q

What is the drill-down and roll-up query?

A

Inverse operations that use dimension hierarcies and measures to perform aggregations.

Select country instead of cities. If there are two cities in a country, the numbers of these cities are added up to get to the country number.

33
Q

What is the drill-across query?

A

Combining cubes that share one or more dimensions (joining)

34
Q

What is the ranking query?

A

Returning the cells that appear at the top or bottom of a specified order.

35
Q

What is the rotating query?

A

Rotating to see the data grouped by another dimension

36
Q

Which two basic forms are there for multidimensional database implementation?

A
  • MOLAP systems
    They include provisions for handling sparse arrays and apply advanced indexing and hashing to locate the data when performing queries
  • ROLAP systems
    Using relational database technology for storing data, employing specialized index structures to achieve good query performance
37
Q

What are the benefits of a MOLAP implementation?

A
  • More space-efficient storage;

- Faster query response times;

38
Q

What are the benefits of a ROLAP implementation?

A
  • Better scalability in the number of facts they can store.
  • More flexible with cube redefinitions
  • Provide better support for frequent updates
39
Q

What does a ROLAP implementation typically need to employ?

A

A star or a snowflake model.

40
Q

Which two assumptions do traditional multidimensional data models and implemenation techniques have?

A
  1. All facts map directly to the lowest level dimension values and only to one value in each dimension
  2. Dimension hierarchies are balanced trees
41
Q

Why is complex multidimensional data problematic?

A

It is not summarizable.

42
Q

When is a dimension hierarchy considered strict?

A

If no dimension value has more than one direct parent

43
Q

When is a dimension hierarchy considered onto?

A

If the hierarchy is balanced

44
Q

When is a dimension hierarchy considered covering?

A

If no containment paths skips a level.

45
Q

When is a dimension hierarchy irregular?

A

If a lower level is not connected to the middle level for all data points (Country - State - City) –> Denmark does not have states.

Or if a lower level is connected to two middle levels.

(see figure 5)

46
Q

How can you solve irregular dimensions?

A

By normalizing the hierarchies –> adding in dummy variables.

47
Q

What are trends in the multidimensional database world?

A
  1. Data that must be analyzed is getting increasingly distributed. –> need for more integrated data warehouses
  2. It is being applied to data that current technology can not adequately analyze yet, such as data from sensors of moving objects (GPS)
  3. Multidimensional database technology will increasingly be applied where analysis results are fed directly into other systems, thereby eliminating humans from the loop.