Data Warehousing Flashcards

(37 cards)

1
Q

Data warehouse

A

A collection of data that helps analysts to make decisions

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

True or false: Data warehouses store operational databases

A

False! Data warehouses store historical data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Features of a data warehouse

A

Subject-oriented - provides info around a subject rather than operations
Integrated - data from many different sources
Time-variant - data is identified with a particular period (e.g. last 12 months)
Non-volatile – data is not erased when new data is added

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Information processing

A

Processing data via queries or statistical analysis

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

Analytical processing

A

Processing data via Online Analytical Processing (OLAP) tools

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Data mining

A

Finding hidden patterns and associations in data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Features of an enterprise data warehouse

A

o An EDW
o An operational data store
o Data marts

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Operational data store

A

A hybrid data warehouse containing integrated information

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Data extraction

A

Gathering data from a variety of sources

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Data cleaning

A

Finding and correcting errors

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Data transformation

A

Converting data to warehouse format

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Data loading

A

Sorting/summarising/consolidating/checking integrity and building patterns accordingly

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

Refreshing

A

Updating data sources to the warehouse

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Dimensional modelling

A

Different individual models (e.g. separate models for sales and inventory)

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

True or false: Dimensional modelling leads to fewer tables

A

True! Data is grouped together and includes redundancies.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Fact table

A

Measurement of business (e.g. sales)

17
Q

True or false: Fact tables are usually small

A

False! Fact tables usually range from 1 to 5 TB.

18
Q

Transaction fact table

A

Information related to events (e.g. product sales)

19
Q

Snapshot fact table

A

Information from specific moments in time (e.g. year-end accounts)

20
Q

Accumulating snapshot

A

A running tally (e.g. year to year sales figures)

21
Q

Dimension table

A

Describes dimensions of the business (e.g. product, customer)

22
Q

True or false: Dimension tables are larger than fact tables

A

False! They are usually smaller than fact tables.

23
Q

Types of dimension table

A
o	Time
o	Geography (address/location)
o	Product
o	Customer
o	Range (range of values for time, price, etc.)
24
Q

Star schema

A

Dimension tables connected to a central fact table

25
True or false: A star schema is normalised
Both! The fact table is normalised but the dimension tables aren't, resulting in faster read queries and analysis.
26
How does a star schema work?
Fact data is pulled from dimension tables, duplicated and stored in the fact table
27
Advantages of star schema
o Simpler queries – dimension tables are treated as one large table o Easier business insights o Better query performance o No bottlenecks from normalised schemas o Faster queries o Read only o Can be used to build OLAP cubes
28
Disadvantages of star schema
o Less accurate data due to normalisation o Less capable of handling complex queries o No many-to-many relationships
29
Snowflake schema
Like a star schema, but the dimension tables connect to other dimension tables
30
True or false: A snowflake schema is normalised
True! Its purpose is to normalise data in a star schema.
31
Advantages of snowflake schema
o No integrity issues o Faster write queries o Uses less space than star schema
32
Why are read queries slower in a snowflake schema?
The DBMS needs to search through multiple tables
33
Roll-up
Removes dimensions in a data cube by: o Climbing up a hierarchy for a dimension o Dimension reduction (e.g. orders grouped by city rather than country)
34
Drill-down
Adds dimensions to a data cube to create more detailed data
35
Slice
Selects a dimension and creates a sub-cube
36
Dice
Slicing multiple dimensions
37
Pivot/rotation
Swapping data axes to provide an alternative presentation of data