Data Warehouse Design - Chapter 1.3 Flashcards
What are 5 architecture properties that are essential for a data warehouse system?
- Separation
- Scalability
- Extensibility
- Security
- Administerability
What is meant by seperation?
Analytical and transactional processing should be kept apart as much as possible
What is meant by scalability?
Hardware and software architectures should be easy to upgrade as the data volume and number of users requirements increase
What is meant by extensibility?
It should be able to host new appliations and technologies without redesiging the whole system
What is meant by security?
Monitoring access is essential because of the strategic data stored in data warehouses
What is meant by administerability?
Management of the system should not be overly difficult
What are the characteristics of a single-layer data warehouse?
- There is only one physical layer of data, which is the source layer of operational data
- The data warehouse is virtual, created by middleware.
- It does not meet the requirement ‘seperation’
- Rarely used in real life
When can a single layer data warehouse be successful?
If analysis needs are particularly restricted and the data volume to analyze is huge.
What are the stages in the two-layer architecture?
- Source layer
- Data staging
- Data warehouse layer
- Analysis
–> So it has four stages, but two physical layers, namely the source layer and the data warehouse layer
What is the source layer?
Data that is originally stored to corporate relational databases or legacy databases
What is meant by data staging?
Here data that is stored in the physical layer is extracted, cleansed and integrated into one common scheme.
For this, ETL tools are used.
What are ETL Tools?
Extraction, Transformation and Loading tools - to merge, extract, transform, cleanse, alidate, filter and load source data into a data warehouse.
What is the data warehouse layer?
The place where the information is stored to one logically centralized repository.
This data warehouse can sometimes automatically be accesed or sometimes data marts are used.
What is a data mart?
A subset or an aggregation of the data stored to a primary data warehouse. It includes a set of information pieces relevant to a specific business area, deparment or group of users
What happes in the analysis stage?
Integrated data is efficiently and flexibly accessed to issue reports, analyze information and simulate business scenarios
What are three reasons data marts are useful?
- They can be used as building blocks while incrementally developing data warehouses;
- They mark out the information required by a specific group of users to solve queries;
- They can deliver better performance because they are smaller than primary data warehouses
Which two types of data marts are there?
- Dependent data marts - populated from a primary data warehouse
- Independent data marts - populated by the sources
(streamlines the design, but prone to inconsistencies between the different marts)
How can you avoid data inconsistencies between independent data marts?
By creating a primary data warehouse that is populated by the individual marts (instead of the other way around).
What are five advantages of the two-layer architecture?
- Good quality information is always available, even if access to the sources is unavailable.
- Analysis queries do not affect the management of transactions
- Data warehouses are logically structured according to the multi dimensional model
- OLAP system have historical and summarized data as well - so you avoid a mismatch in terms of time and level of detail
- Data warehouses can be designed specifically to optimize performance
What is the key characteristic of the three-layer architecture?
It has all the stages from the two-layer architecture, but in between the data staging and the data warehouse there is another layer called ‘reconciled data’.
-> So the data warehouse is populated by the reconciled data, not by the operational sources
What are the advantages of the added ‘reconciled data’ layer?
- It creates a common reference data model for a whole enterprise.
- Sometimes the reconciled layer can be used to better accomplish operational tasks (instead of using the operational sources)
Which main architectual principles are used for data warehouse systems?
- One layer system
- Two layer system
- Three layer system
Which main five types of system are distinguished in scientific literature that also include the beforementioned layers?
- Independent data marts architecture
- Bus architecture
- Hub-and-Spoke architecture
- Centralized architecture
- Federated architecture
What is the independent data mart architecture?
Different data marts are separately designed and build in a non-integrated fashion.
- > used when there is no desire for an enterprise-wide warehousing project or when divisions are only loosely related
- > Not the best data integration and cross-reporting