Data Management - What I missed from his slides Flashcards

1
Q

What is business intelligence?

A
  • Data driven decision making

- Transforming data into meaningful information/knowledge to support business decision making

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is knowledge?

A

Processed data or information that is applicabale to a business decision problem

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are descriptive analytics, predictive analytics and prescriptive analytics?

A
  1. Using data to understand past and present
  2. Predict future behaviour based on past performance
  3. Make decisions or recommendations to achieve the best performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How is business intelligence a product?

A

It provides information and knowledge that enables decision making

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How is business intelligence a process?

A

You have to extract information and knowledge from data. This requires a series of actions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is a database management system (DBMS)?

A

The software that controls the data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is the difference between a super key and a candidate key?

A

A superkey is any combination of attributes that identifies each row.

A candidate key is a superkey without any unnecessary attributes

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is cardinality?

A

The way tables are related. Their relationship.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What is the difference in terms of time between a data warehouse and an operational system?

A

Operational: Current value data

Data warehouse: More historical perspective.
In a data warehouse, every key structure contains an element of time (whether you know it or not).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are 4 conceptual characteristics of a data warehouse?

A
  1. Subject-oriented
  2. Integrated
  3. time-variant
  4. nonvolatile (no updates)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is an independent data mart architecture?

A

Different data marts are seperately designed and built in a nonintegrated fashion. They are loaded with data from the source layer.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Wat is a bus architecture?

A

Similar to an independent data mart system, except that the data marts have conformed dimensions.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What is a hub-and-bespoke architecture?

A

You have datamarts that are loaded with data from the reconciled layer.

One of the most used architectures. Much attention to scalability and extensibility.

Reconciled layer: already integrated, consistent and correct data.

+ creates a common reference model
- leads to more redundancy

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is a centralized architecture?

A

It is an implementation of the hub and bespoke, in which the reconciled layer and the data marts are collapsed into a single physical repository.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is a federated architecture?

A

You have datamarts filled by the operational data sources. BUT the data marts are integrated (virtually or physically) after that.

Happens often when there is a merger or acquisition and some datawarehouse needs to continue to exist.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is the difference between the grouped name ‘Data Mart Approach’ (bottom-up) and ‘Enterprise data warehouse approach’ (top-down?

A
  • Data mart approach means you have a collection of data marts. Uses dimensional modelling.
  • Enterprise warehouse means that (even though you might have data marts) they are integrated. Uses Entity-relational modeling.
17
Q

What is a data lake?

A

Similar to a datawarehouse, except that the data is stored in its natural/raw form untill it is needed.

18
Q

What two types of loads are their in the ETL-loading stage?

A
  1. Integral load
    All records are taken and processed by comparing them with the records in the datawarehouse
  2. Delta/incremental load
    Only the changed records are taken and processed systematically in the datawarehouse.
19
Q

What are the different generations of ETL tools?

A
  1. Generation 0 - do it yourself (SQL procedures)
  2. Generation 1 - code generators
  3. Generation 2 - ETL engines
20
Q

What is the difference between ‘Query & reporting’ , ‘OLAP’ and ‘Data Mining Models’?

A

Query and reporting gives static reports through SQL.

OLAP gives interactive analysis of multidimensional data

Data mining discovers new valuable knowledge
(which customer is at risk of leaving?)

21
Q

What is a lattice of cubes?

A

If you roll up the base cube over all its associated dimensions and dimension hierarchies.

Bottom most cube = base cub Cb

Topmost cube = apex = Ct = contains only one cell

22
Q

How do you calculate how many cubes there are in a multidimensional database if there are no hierarchies?

A

You have n dimensions.

2^n

23
Q

How do you calculate how many cubes there are in a multidimensional database if there are hierarchies?

A

You have n dimensions and each dimension has L levels.

(L + 1)^n

24
Q

What is the purpose of a star schema?

A

To limit the number of joins in a multi-dimensional model to increase query performance.

25
Q

What is given in a star schema?

A

You have the fact table in the middle. This table has a list of all the dimension keys. These dimension keys are linked to other tables that are the dimension table. These have lists of their layers.

The fact table also include the measure keys.

26
Q

What is a snowflake schema?

A

Similar to a star schema, except that data is more normalized. This means that some dimension tables are linked to a secondary dimension table further specifying a specific layer in the dimension.

In the first dimension table that means that there is an extra ‘dimension key’ to link to the secondary table.

27
Q

How do you calculate density in a cube?

A

Number of actual rows / number of potential rows.

If the Fact sales has dimension Time, Product and Store and you set the dimension on Time(day), products and Stores.

There are 4 days, 5 products and 3 stores.

Potential rows: 4 * 5 * 3

Actual rows = given = 35

Densitity = 35 / 65 = 53.33%

28
Q

How can you deal with slow changes in dimension attributes?

A
  1. Overwrite old values
  2. Add version number field as part of the key
  3. Add new columns to the dimension table.