Data Warehouse Design - Chapter 1.3 Flashcards

1
Q

What are 5 architecture properties that are essential for a data warehouse system?

A
  1. Separation
  2. Scalability
  3. Extensibility
  4. Security
  5. Administerability
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is meant by seperation?

A

Analytical and transactional processing should be kept apart as much as possible

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is meant by scalability?

A

Hardware and software architectures should be easy to upgrade as the data volume and number of users requirements increase

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is meant by extensibility?

A

It should be able to host new appliations and technologies without redesiging the whole system

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is meant by security?

A

Monitoring access is essential because of the strategic data stored in data warehouses

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is meant by administerability?

A

Management of the system should not be overly difficult

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What are the characteristics of a single-layer data warehouse?

A
  • There is only one physical layer of data, which is the source layer of operational data
  • The data warehouse is virtual, created by middleware.
  • It does not meet the requirement ‘seperation’
  • Rarely used in real life
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

When can a single layer data warehouse be successful?

A

If analysis needs are particularly restricted and the data volume to analyze is huge.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the stages in the two-layer architecture?

A
  1. Source layer
  2. Data staging
  3. Data warehouse layer
  4. Analysis

–> So it has four stages, but two physical layers, namely the source layer and the data warehouse layer

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the source layer?

A

Data that is originally stored to corporate relational databases or legacy databases

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is meant by data staging?

A

Here data that is stored in the physical layer is extracted, cleansed and integrated into one common scheme.
For this, ETL tools are used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What are ETL Tools?

A

Extraction, Transformation and Loading tools - to merge, extract, transform, cleanse, alidate, filter and load source data into a data warehouse.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is the data warehouse layer?

A

The place where the information is stored to one logically centralized repository.
This data warehouse can sometimes automatically be accesed or sometimes data marts are used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a data mart?

A

A subset or an aggregation of the data stored to a primary data warehouse. It includes a set of information pieces relevant to a specific business area, deparment or group of users

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What happes in the analysis stage?

A

Integrated data is efficiently and flexibly accessed to issue reports, analyze information and simulate business scenarios

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are three reasons data marts are useful?

A
  1. They can be used as building blocks while incrementally developing data warehouses;
  2. They mark out the information required by a specific group of users to solve queries;
  3. They can deliver better performance because they are smaller than primary data warehouses
17
Q

Which two types of data marts are there?

A
  1. Dependent data marts - populated from a primary data warehouse
  2. Independent data marts - populated by the sources
    (streamlines the design, but prone to inconsistencies between the different marts)
18
Q

How can you avoid data inconsistencies between independent data marts?

A

By creating a primary data warehouse that is populated by the individual marts (instead of the other way around).

19
Q

What are five advantages of the two-layer architecture?

A
  1. Good quality information is always available, even if access to the sources is unavailable.
  2. Analysis queries do not affect the management of transactions
  3. Data warehouses are logically structured according to the multi dimensional model
  4. OLAP system have historical and summarized data as well - so you avoid a mismatch in terms of time and level of detail
  5. Data warehouses can be designed specifically to optimize performance
20
Q

What is the key characteristic of the three-layer architecture?

A

It has all the stages from the two-layer architecture, but in between the data staging and the data warehouse there is another layer called ‘reconciled data’.

-> So the data warehouse is populated by the reconciled data, not by the operational sources

21
Q

What are the advantages of the added ‘reconciled data’ layer?

A
  1. It creates a common reference data model for a whole enterprise.
  2. Sometimes the reconciled layer can be used to better accomplish operational tasks (instead of using the operational sources)
22
Q

Which main architectual principles are used for data warehouse systems?

A
  1. One layer system
  2. Two layer system
  3. Three layer system
23
Q

Which main five types of system are distinguished in scientific literature that also include the beforementioned layers?

A
  1. Independent data marts architecture
  2. Bus architecture
  3. Hub-and-Spoke architecture
  4. Centralized architecture
  5. Federated architecture
24
Q

What is the independent data mart architecture?

A

Different data marts are separately designed and build in a non-integrated fashion.

  • > used when there is no desire for an enterprise-wide warehousing project or when divisions are only loosely related
  • > Not the best data integration and cross-reporting
25
Q

What is the bus architecture?

A

Similar to the independent data mart architecture, but the data marts are all designed according the same conformed dimensions to ensure logical integration and an enterprise-wide view of information.

26
Q

What is the hub-and-spoke architecture?

A

The data marts are feeded by the ‘reconciled data’ layer. Users access the data marts, but sometimes also the reconciled layer.

  • > Attention to scalability, extensibility & an enterprise-wide view.
  • > Most used in medium to large companies.
27
Q

What is the centralized architecture?

A

It is an implementation of the hub-and-spoke architecture, in which the reconciled layer and the data marts are collapsed into a single physical repository.

28
Q

What is the federated architecture?

A

There are multiple data marts that are supplied by the operational data sources. These data marts are then either physically or virtually integrated with the others iin the logical-phyiscal integration phase.

-> Used in dynamix context where preexisting data warehouses / data marts need to be integrated. (think with a merger)

29
Q

Which 4 factors are particularly influential when choosing one of these architectures?

A
  • The amount of interdependent information exchanged between organizational units
  • Restrictions on (human) capital resources could lead to a ‘quick’ architecture
  • Enterprise strategy
  • The need for integration of pre-existing data warehouses can lead to a federal system.