test Flashcards

(24 cards)

1
Q

What is data mining?

A

The non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.

2
Q

How does data mining relate to Knowledge Discovery in Databases (KDD)?

A

Data mining is one (central) step within the broader KDD process.

3
Q

Name four common data-mining tasks.

A

Classification, clustering, association-rule mining, and regression (prediction).

4
Q

Which two broad categories do data-mining tasks fall into?

A

Predictive mining and descriptive mining.

5
Q

During KDD, what is the purpose of the pattern-evaluation step?

A

To identify the truly interesting, actionable patterns discovered by the mining algorithms.

6
Q

List the main stages of the KDD pipeline in order.

A

Data cleaning → data integration → data selection → data transformation → data mining → pattern evaluation → knowledge presentation.

7
Q

How does data mining differ from Online Analytical Processing (OLAP)?

A

OLAP supports fast, interactive exploration of known aggregates, whereas data mining digs deeper to discover previously unknown patterns automatically.

8
Q

What is DMQL and why is it useful?

A

The Data Mining Query Language – a SQL-like language that lets users specify, in a high-level way, which patterns to mine and how to post-process them.

9
Q

Why is the data-transformation step necessary before mining?

A

Because many algorithms expect data in specific formats or scales (e.g., normalised numbers, encoded categories).
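
As an illustration, here is a minimal sketch of two such transformations on toy data. The helper names `min_max_scale` and `one_hot` and the sample columns are assumptions made for this example, not part of any particular library.

```python
def min_max_scale(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted distinct values."""
    levels = sorted(set(categories))
    return [[1 if c == level else 0 for level in levels] for c in categories]


ages = [20, 30, 40]
colours = ["red", "blue", "red"]

scaled_ages = min_max_scale(ages)    # [0.0, 0.5, 1.0]
encoded_colours = one_hot(colours)   # columns are ["blue", "red"]
```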

10
Q

Define data cleaning in the context of KDD.

A

The process of detecting, correcting or removing errors, inconsistencies and noise from the raw data.
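
A small sketch of what this can look like in practice, assuming toy customer rows. The valid age range (0 to 120) and the rule that a repeated `id` marks a duplicate are assumptions chosen for the example.

```python
raw_rows = [
    {"id": 1, "age": 34, "city": " Paris "},
    {"id": 2, "age": None, "city": "Berlin"},  # missing value
    {"id": 3, "age": -5, "city": "Rome"},      # impossible value (noise)
    {"id": 1, "age": 34, "city": "Paris"},     # duplicate record
]


def clean(rows):
    """Trim text, drop rows with missing or out-of-range ages, remove duplicates."""
    seen_ids = set()
    cleaned = []
    for row in rows:
        age = row["age"]
        if age is None or not 0 <= age <= 120:
            continue  # remove rows that cannot be corrected here
        if row["id"] in seen_ids:
            continue  # remove duplicate records
        seen_ids.add(row["id"])
        cleaned.append({**row, "city": row["city"].strip()})  # correct whitespace
    return cleaned


clean_rows = clean(raw_rows)
```

Dropping rows is only one option; a real pipeline might instead impute missing values or flag suspicious ones for review.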

11
Q

What is the role of data integration in KDD?

A

To merge data from multiple heterogeneous sources into a coherent data set.
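
A hedged sketch of the idea, assuming two hypothetical sources (a CRM export and a billing system) that name the customer key differently and store amounts in different units. All field names are invented for the example.

```python
# Source A: CRM export keyed by "cust_id"
crm = [{"cust_id": 1, "name": "Ada"}, {"cust_id": 2, "name": "Grace"}]
# Source B: billing system keyed by "customer", amounts stored in cents
billing = [{"customer": 1, "total_cents": 1250}, {"customer": 3, "total_cents": 900}]


def integrate(crm_rows, billing_rows):
    """Join the two sources on customer id and unify field names and units."""
    totals = {b["customer"]: b["total_cents"] / 100 for b in billing_rows}
    return [
        {
            "customer_id": c["cust_id"],
            "name": c["name"],
            "total": totals.get(c["cust_id"]),  # None when billing has no match
        }
        for c in crm_rows
    ]


customers = integrate(crm, billing)
```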

12
Q

Why are measures of interestingness important?

A

They rank or filter mined patterns so that only the most relevant, novel or useful results are presented.
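
For association rules, three commonly used objective measures are support, confidence and lift. The sketch below computes them for the rule bread → milk over a made-up list of baskets; the data and function names are assumptions for illustration.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline support."""
    return confidence(antecedent, consequent) / support(consequent)


rule_support = support({"bread", "milk"})          # 2 of 4 baskets
rule_confidence = confidence({"bread"}, {"milk"})  # 2 of the 3 bread baskets
rule_lift = lift({"bread"}, {"milk"})
```

A lift below 1, as here, suggests that bread buyers are slightly less likely than average to buy milk, so a filter on lift would discard this rule.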

13
Q

What are the four coupling schemes for integrating a mining system with a database?

A

No coupling, loose coupling, semi-tight coupling, and tight coupling.

14
Q

What does the acronym OLAM stand for?

A

Online Analytical Mining.

15
Q

Give one advantage of multidimensional data mining over traditional flat-table mining.

A

It can exploit pre-computed cube aggregates to speed up pattern discovery.

16
Q

What is a fact table in a data-warehouse star schema?

A

A central table containing numeric measures of business processes, referenced by keys to surrounding dimension tables.

17
Q

How does a dimension table differ from a fact table?

A

Dimension tables store descriptive attributes that define the perspectives (e.g., time, product, geography) used to analyse facts.
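
To make the fact/dimension split concrete, here is a minimal sketch with one hypothetical dimension table and one hypothetical fact table held as plain Python structures. The table and column names are assumptions for the example.

```python
from collections import defaultdict

# Dimension table: descriptive attributes keyed by product_id
product_dim = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}

# Fact table: numeric measures plus a foreign key into the dimension
sales_fact = [
    {"product_id": 1, "units": 2, "revenue": 2000},
    {"product_id": 2, "units": 5, "revenue": 2500},
    {"product_id": 3, "units": 1, "revenue": 300},
]


def revenue_by_category(facts, products):
    """Join each fact to its dimension row, then sum revenue per category."""
    totals = defaultdict(int)
    for fact in facts:
        category = products[fact["product_id"]]["category"]
        totals[category] += fact["revenue"]
    return dict(totals)


totals = revenue_by_category(sales_fact, product_dim)
```

The fact rows carry only keys and measures; the category used for grouping lives in the dimension table.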

18
Q

What is metadata in a data-mining context?

A

Data that describes other data – such as schema definitions, data provenance, quality indicators and transformation histories.

19
Q

Describe a schema hierarchy in OLAP.

A

A concept hierarchy defined as a total or partial order among the attributes of a schema (e.g., city→state→country), so that data can be viewed at different levels of abstraction.
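
The sketch below shows how such a hierarchy supports roll-up, assuming a toy city→state→country mapping and invented sales figures. The `roll_up` helper is hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy: each city maps to its state, each state to its country
city_to_state = {"San Jose": "California", "Austin": "Texas", "Dallas": "Texas"}
state_to_country = {"California": "USA", "Texas": "USA"}

sales_by_city = {"San Jose": 10, "Austin": 4, "Dallas": 6}


def roll_up(measures, parent_of):
    """Aggregate a measure one level up the hierarchy."""
    rolled = Counter()
    for member, value in measures.items():
        rolled[parent_of[member]] += value
    return dict(rolled)


sales_by_state = roll_up(sales_by_city, city_to_state)
sales_by_country = roll_up(sales_by_state, state_to_country)
```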

20
Q

What are the three strategies for materialising a data cube?

A

No materialisation, partial materialisation, and full materialisation.

21
Q

Contrast OLTP and OLAP systems in one sentence.

A

OLTP captures current, transaction-level data for routine operations, whereas OLAP stores integrated, historical data optimised for analysis.

22
Q

Why is visualisation considered part of the KDD process?

A

It transforms mined patterns into human-readable forms (charts, graphs, dashboards).

23
Q

What is data reduction and when is it applied?

A

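A pre-processing technique that shrinks the data volume while keeping a representation that yields the same (or nearly the same) analytical results; it is applied before mining when the full data set is too large to analyse efficiently.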
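
Sampling is one of several reduction techniques (others include aggregation and dimensionality reduction). A minimal sketch of simple random sampling without replacement follows; the `sample_rows` helper, the 10% fraction and the fixed seed are assumptions for the example.

```python
import random


def sample_rows(rows, fraction, seed=0):
    """Return a simple random sample without replacement."""
    k = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    return rng.sample(rows, k)


rows = list(range(1000))
reduced = sample_rows(rows, 0.1)  # keeps 100 of the 1000 rows
```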

24
Q

Give one ethical consideration before performing data mining.

A

Whether the mining activity violates privacy or could lead to discriminatory outcomes.