Data Warehouse Design - Chapter 1.4 Flashcards

1
Q

What happens in the data staging layer?

A

The ETL processes that extract, integrate and clean data from operational sources to feed the data warehouse layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

When does ETL happen?

A

When a data warehouse is populated for the first time. After that, it occurs every time the data warehouse is regularly updated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How is the ETL stage often defined in the three-layer architecture and why?

A

Reconciliation.

  • As the ETL processes feed the reconciled data layer, which in turn feeds the data warehouse. This is considered the most complex stage.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Which four stages are present in ETL?

A
  1. Extraction (capture)
  2. Cleansing (cleaning/scrubbing)
  3. Transformation
  4. Loading

*Note: Cleansing and transformation stage are often blurred together.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the main difference between the cleansing and the transformation stage?

A

Cleansing focusses on rectifying data values, whereas transformation is focussed on managing data formats.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Which three types of data extraction are possible?

A
  1. Static extraction
    - Making a snapshot of operational data, done when you populate the data warehouse for the first time
  2. Incremental extraction
    - Used to update data warehouses. Based on a log maintained by the operational DBMS
  3. Source driven extraction
    - You rewrite your operational applications to give notifications of all the data changes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What happens in the cleansing stage of ETL?

A

Remove inconsistencies and mistakes that make the data dirty to improve data quality.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are the most frequent mistakes with data that require cleansing?

A
  1. Duplicate data
  2. Inconsistent values that are logically associated
  3. Missing data
  4. Unexpected use of fields
  5. Imposible or wrong values
  6. Incosistent values for a single entity because different practices were used
  7. Inconsistent values for one individual entity because of typing mistakes
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are the main data cleansing features in ETL and what do they do?

A
  1. Rectification (Rectify mistakes)
  2. Homogenization (Make them similar)

-> Using specific dictionaries to rectify typing mistakes and to recognize synonyms.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is rule-based cleansing?

A

A technique to achieve rectification and homogenization. It enforces domain-specific rules and defines appropiate associations between values.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What happens in the transformation phase?

A

Data is converted from its operational source format into a specific data warehouse format.

-> In a three-layer architecture, this results into the reconciled layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Which data aspects are rectified in the transformation phase?

A
  1. Loose texts that may hide valuable information
    (BigDeal LtD might hide that it is a limted partnership company)
  2. Different formats can be used for individual data
    (Data saved as a string or as three integers)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the three main transformation processes in the transformation phase?

A
  1. Conversion and normalization that operate on both storage formats and units of measure to make data uniform.
  2. Matching that associates equivalent fields in different sources
  3. Selection that reduces the number of source fields and records.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Which transformation technique is used when populating a data warehouse after the transformation phase?

A

Denormalization.

so it requires aggregation to sum up data from the data warehouse as end-user

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Which two ways of loading (into the data warehouse) exist?

A
  1. Refresh
    Replacing al the older data, rewrite the database completely. Normally done at the beginning
  2. Update
    Only add the changes in data. Normally done to update.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly