Data Management - Important Facts Flashcards

1
Q

What is a data model?

A

A simple representation of a complex real-world object or event.

Or a simple representation of a data structre and its relationships.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What is functional dependence?

A

If attribute A determines the value of all the rows for attribute B.

Emphasis on all.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What is full functional dependence?

A

If the attribute (B) is functionally dependent on a composite key (A) but not on any subset of that composite key, the attribute (B) is fully functionally dependent on (A).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What is entity integrity?

A

That a primary key attribute can not have a row with a null value.

Each row is uniquely identified by the primary key.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is referential integrity?

A

All values of a foreign key refer to existing values in another table.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is data normalization?

A

A process for evaluating and correcting table structures to minimize data redunandcies, thereby reducing the likelihood of data anomalies.

-> Dividing everything into seperate tables as much as possibile to minimize redundancies.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

How does a DSS look like from an architectual viewpoint?

A
  1. Model based management system that is connected to
  2. A knowledge engine
  3. A graphical user interface.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What are mandatory requirements for a datawarehouse process?

To turn operational data into information that can be used to support decision-making?

A
  1. Accesibility to users that are not familiar with data
  2. Intergration of data on the basis of a standard model
  3. Query flexibility to maximize the obtained advantages from existing information
  4. Information conciseness to allow for target-oriented and effective analyses
  5. Multidimensional representation to give users a manageable view of information
  6. Correctness and completeness of integrated data
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What are three features of a data warehouse?

A
  1. It is subject-oriented
  2. It is integrated and consistent
  3. It shows its evolution over time and it is not volatile
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What are the three main differences between an operational database and a datawarehouse?

A
  1. Operational data covers a short period of time, whereas a datawarehouse covers a period of time.
  2. Data in operational databases can be added, removed or modified. This does not happen in a data warehouse.
  3. Operational queries focus on a small number of tables and their relations. Data warehouse queries focus on huge amounts of data and queries.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What are 5 architecture properties that are essential for a data warehouse system?

A
  1. Separation
  2. Scalability
  3. Extensibility
  4. Security
  5. Administerability
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

Which main five types of system are distinguished in scientific literature that also include the beforementioned layers?

A
  1. Independent data marts architecture
  2. Bus architecture
  3. Hub-and-Spoke architecture
  4. Centralized architecture
  5. Federated architecture
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are five advantages of the two-layer architecture?

A
  1. Good quality information is always available, even if access to the sources is unavailable.
  2. Analysis queries do not affect the management of transactions
  3. Data warehouses are logically structured according to the multi dimensional model
  4. OLAP system have historical and summarized data as well - so you avoid a mismatch in terms of time and level of detail
  5. Data warehouses can be designed specifically to optimize performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the advantages of the added ‘reconciled data’ layer?

A
  1. It creates a common reference data model for a whole enterprise.
  2. Sometimes the reconciled layer can be used to better accomplish operational tasks (instead of using the operational sources)
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

When does ETL happen?

A

When a data warehouse is populated for the first time. After that, it occurs every time the data warehouse is regularly updated.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are the main data cleansing features in ETL and what do they do?

A
  1. Rectification (Rectify mistakes)

2. Homogenization (Make them similar)

17
Q

Which three types of data extraction are possible?

A
  1. Static extraction
    - Making a snapshot of operational data, done when you populate the data warehouse for the first time
  2. Incremental extraction
    - Used to update data warehouses. Based on a log maintained by the operational DBMS
  3. Source driven extraction
    - You rewrite your operational applications to give notifications of all the data changes
18
Q

What are the three main transformation processes in the transformation phase?

A
  1. Conversion and normalization that operate on both storage formats and units of measure to make data uniform.
  2. Matching that associates equivalent fields in different sources
  3. Selection that reduces the number of source fields and records.
19
Q

Which two ways of loading (into the data warehouse) exist?

A
  1. Refresh
    Replacing al the older data, rewrite the database completely. Normally done at the beginning
  2. Update
    Only add the changes in data. Normally done to update.
20
Q

How could you represent the sales cube with the relational model schema?

A

You start with the fact followed by the used dimensions into brackets and underlined, and then the measures:

SALES(store, product, date, quantity, receipts)

First three are underlined.

21
Q

How could you represent the dependency of the sales cube?

A

The primary key, built up from the dimensions determine the measures.

store, product, date -> quantity, receipts

22
Q

What is the navigation path in OLAP?

A

An analysis process for facts acording to different viewpoints and at different levels of detail.

This path is turned into a sequence of (sequential) queries, each referencing their previous queries.

23
Q

What is the main problem with ROLAP implementations?

A

Performance hits caused by join operations between large tables.

24
Q

What is an aggregate navigator?

A

Component in the multidimensional engine that selects a view from among all the alternatives to solve a specific query at the minimum access cost.

25
Q

What are the benefits of HOLAP?

A

Combines both ROLAP as MOLAP. Takes advantage from standardization and the ability to handle large data from ROLAP, but the query speed from MOLAP.

Largest amount of data stored in a relational datbase, and a multidimensional model for the most frequently used data.

26
Q

What are 7 properties that characterize quality?

A
  1. Accuracy
  2. Freshness
  3. Completeness
  4. Consistency
  5. Availability
  6. Traceability
  7. Clearness
27
Q

Which 5 types of querying are there in a multidimensional database?

A
  1. Slice-and-dice
  2. Drill-down and roll-up
  3. Drill-across
  4. Ranking
  5. Rotating
28
Q

When is a dimension hierarchy considered onto?

A

If the hierarchy is balanced

29
Q

When is a dimension hierarchy irregular?

A

If a lower level is not connected to the middle level for all data points

30
Q

When is a dimension hierarchy considered covering?

A

If no containment paths skips a level.

31
Q

When is a dimension hierarchy considered strict?

A

If no dimension value has more than one direct parent

32
Q

What is a conformed dimension?

A

In data warehousing, a conformed dimension is a dimension that has the same meaning to every fact with which it relates.

Conformed dimensions allow facts and measures to be categorized and described in the same way across multiple facts and/or data marts, ensuring consistent reporting across the enterprise.