Data Warehousing Flashcards
(37 cards)
Data warehouse
A collection of data that helps analysts to make decisions
True or false: Data warehouses store operational databases
False! Data warehouses store historical data.
Features of a data warehouse
Subject-oriented - provides info around a subject rather than operations
Integrated - data from many different sources
Time-variant - data is identified with a particular period (e.g. last 12 months)
Non-volatile – data is not erased when new data is added
Information processing
Processing data via queries or statistical analysis
Analytical processing
Processing data via Online Analytical Processing (OLAP) tools
Data mining
Finding hidden patterns and associations in data
Features of an enterprise data warehouse
o An EDW
o An operational data store
o Data marts
Operational data store
A hybrid data warehouse containing integrated information
Data extraction
Gathering data from a variety of sources
Data cleaning
Finding and correcting errors
Data transformation
Converting data to warehouse format
Data loading
Sorting/summarising/consolidating/checking integrity and building patterns accordingly
Refreshing
Updating data sources to the warehouse
Dimensional modelling
Different individual models (e.g. separate models for sales and inventory)
True or false: Dimensional modelling leads to fewer tables
True! Data is grouped together and includes redundancies.
Fact table
Measurement of business (e.g. sales)
True or false: Fact tables are usually small
False! Fact tables usually range from 1 to 5 TB.
Transaction fact table
Information related to events (e.g. product sales)
Snapshot fact table
Information from specific moments in time (e.g. year-end accounts)
Accumulating snapshot
A running tally (e.g. year to year sales figures)
Dimension table
Describes dimensions of the business (e.g. product, customer)
True or false: Dimension tables are larger than fact tables
False! They are usually smaller than fact tables.
Types of dimension table
o Time o Geography (address/location) o Product o Customer o Range (range of values for time, price, etc.)
Star schema
Dimension tables connected to a central fact table