Data Integration and Modelling Flashcards
(11 cards)
What are the components of a Conceptual Model in a Data Warehouse?
Facts, Measures and Dimensions
Define Facts, Measures and Dimensions
Facts: Focus of interest for decision making.
E.g. Sales, Shipments
Measures: Numeric attributes that describe facts from different points of view.
E.g. Each sale is measured by its revenue.
Dimensions: Discrete attributes that determine the granularity adopted to represent facts.
E.g. Product, Store, Date
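As a rough illustration of how the three terms fit together, the sketch below treats each sale as a fact, its revenue as the measure, and product, store and date as dimensions. The sample rows and the helper `total_revenue_by` are hypothetical and only serve the example.

```python
from collections import defaultdict

# Hypothetical sales facts: each row is one fact (a sale).
# The dimensions (product, store, date) locate the fact;
# revenue is the measure that quantifies it.
sales = [
    {"product": "Pen", "store": "Leeds", "date": "2024-01-05", "revenue": 2.5},
    {"product": "Pen", "store": "Lyon", "date": "2024-01-06", "revenue": 2.5},
    {"product": "Book", "store": "Leeds", "date": "2024-01-05", "revenue": 10.0},
    {"product": "Book", "store": "Lyon", "date": "2024-01-07", "revenue": 4.0},
]

def total_revenue_by(dimension, facts):
    """Sum the revenue measure for each value of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[fact[dimension]] += fact["revenue"]
    return dict(totals)

revenue_by_product = total_revenue_by("product", sales)
revenue_by_store = total_revenue_by("store", sales)
```

Choosing a different dimension changes the point of view on the same facts without changing the measure.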
What is the purpose of a Multidimensional E/R Model?
Its purpose is to create an intuitive representation of the multidimensional data that is optimized for high-performance access.
How are dimension hierarchies organised?
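The classification schema describes how the classification levels of a dimension depend on one another by means of functional dependencies: each value at a finer level determines exactly one value at the next coarser level.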
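A functional dependency between two levels can be checked mechanically. The sketch below assumes a hypothetical store dimension with the levels store, city and country; the data and the helper `holds` are illustrative, not taken from the cards.

```python
# Hypothetical store dimension with three classification levels.
stores = [
    {"store": "S1", "city": "Leeds", "country": "UK"},
    {"store": "S2", "city": "Leeds", "country": "UK"},
    {"store": "S3", "city": "Lyon", "country": "France"},
]

def holds(rows, lower, upper):
    """Return True if every value of `lower` maps to exactly one value of `upper`."""
    seen = {}
    for row in rows:
        # Remember the first parent seen for this value; later rows must agree.
        parent = seen.setdefault(row[lower], row[upper])
        if parent != row[upper]:
            return False
    return True

# The hierarchy store -> city -> country is valid if both dependencies hold.
hierarchy_ok = holds(stores, "store", "city") and holds(stores, "city", "country")
```

The dependency only runs from finer to coarser levels: `holds(stores, "city", "store")` is false here, because Leeds contains two stores.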
What are the advantages of using a Star Schema?
Improved query performance for often-used data
Fewer tables and simple structure
Efficient query processing with regard to dimensions.
What is a data cube?
Cubes consist of data cells with one or more dimensions.
The coordinates of a cell are the classification nodes.
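One simple way to picture this is a dictionary whose keys are the cell coordinates. The sketch below assumes three hypothetical dimensions (product, store, month) and a revenue value per cell; the helper `cells_at` is illustrative only.

```python
# Order of the dimensions within each coordinate tuple (an assumption of this sketch).
DIMENSIONS = ("product", "store", "month")

# Each key is the coordinate of one cell: one classification node per dimension.
cube = {
    ("Pen", "Leeds", "2024-01"): 5.0,
    ("Pen", "Lyon", "2024-01"): 3.0,
    ("Book", "Leeds", "2024-02"): 14.0,
}

def cells_at(cube, dimension, node):
    """Return all cells whose coordinate on `dimension` equals `node`."""
    i = DIMENSIONS.index(dimension)
    return {coords: value for coords, value in cube.items() if coords[i] == node}

leeds_cells = cells_at(cube, "store", "Leeds")
leeds_total = sum(leeds_cells.values())
```

A real cube engine would store and index cells very differently; the dictionary only shows how coordinates address cells.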
What are some of the properties of dimension tables?
- Contain many descriptive columns: dimension tables are wide.
- Generally don’t have too many rows.
- Contents are relatively static: almost like a lookup table.
What are some of the uses of dimension tables?
Some of the uses of dimension tables include:
1. Filters are based on dimension attributes.
2. Grouping columns are dimension attributes.
3. Fact tables are referenced through dimensions.
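The three uses above can be seen in a single query. The sketch below uses Python's built-in `sqlite3` module with hypothetical tables `dim_product`, `dim_store` and `fact_sales`; the table names, columns and rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, country TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id INTEGER REFERENCES dim_store(store_id),
    revenue REAL
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Book', 'Media');
INSERT INTO dim_store VALUES (1, 'Leeds', 'UK'), (2, 'Lyon', 'France');
INSERT INTO fact_sales VALUES (1, 1, 2.5), (1, 1, 2.5), (2, 1, 10.0), (2, 2, 4.0);
""")

rows = conn.execute("""
SELECT p.category, SUM(f.revenue)                  -- grouping column is a dimension attribute
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id  -- facts are reached through dimensions
JOIN dim_store s ON s.store_id = f.store_id
WHERE s.country = 'UK'                             -- filter on a dimension attribute
GROUP BY p.category
ORDER BY p.category
""").fetchall()
```

The fact table contributes only keys and the measure; everything a user filters or groups by comes from the dimension tables.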
What are the advantages and disadvantages of using a snowflake schema?
The main advantage of using a snowflake schema is that queries can run faster, because normalising the dimension tables reduces their size.
Disadvantages:
- Fact tables are responsible for 90% of the storage requirements, so normalising the dimensions usually leads to insignificant improvements.
- Normalisation of the dimension tables can reduce the performance of the DW because it leads to a large number of tables that must be joined.
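The trade-off can be sketched by modelling the same product dimension both ways. The example below uses Python's built-in `sqlite3` module; the tables `star_product`, `snow_product` and `snow_category` and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star style: one denormalised dimension; the category text repeats per product.
CREATE TABLE star_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
-- Snowflake style: the category is normalised into its own table.
CREATE TABLE snow_category (category_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE snow_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES snow_category(category_id)
);
CREATE TABLE fact_sales (product_id INTEGER, revenue REAL);

INSERT INTO star_product VALUES
    (1, 'Pen', 'Stationery'), (2, 'Pencil', 'Stationery'), (3, 'Book', 'Media');
INSERT INTO snow_category VALUES (1, 'Stationery'), (2, 'Media');
INSERT INTO snow_product VALUES (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Book', 2);
INSERT INTO fact_sales VALUES (1, 2.5), (2, 5.0), (3, 10.0);
""")

# Star: one join from the fact table to the dimension.
star = conn.execute("""
SELECT p.category, SUM(f.revenue)
FROM fact_sales f
JOIN star_product p ON p.product_id = f.product_id
GROUP BY p.category ORDER BY p.category
""").fetchall()

# Snowflake: the same question needs an extra join to reach the category.
snow = conn.execute("""
SELECT c.category, SUM(f.revenue)
FROM fact_sales f
JOIN snow_product p ON p.product_id = f.product_id
JOIN snow_category c ON c.category_id = p.category_id
GROUP BY c.category ORDER BY c.category
""").fetchall()
```

Both queries return the same totals. The snowflake version stores each category name once but pays for it with an additional table and join, while the star version repeats the category text in every product row.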
What are the advantages and disadvantages of using a star schema?
The main advantages of using a star schema are:
- Improved query performance for often-used data
- Fewer tables and simple structure
- Efficient query processing with regard to dimensions.
The main disadvantage is:
- High overhead of redundant data.
What is a star schema and what are the advantages/disadvantages of using this model?