test Flashcards

Question 1

Q

What is data mining?

Answer

A

The non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.

Question 2

Q

How does data mining relate to Knowledge Discovery in Databases (KDD)?

Answer

A

Data mining is one (central) step within the broader KDD process.

Question 3

Q

Name four common data-mining tasks.

Answer

A

Classification, clustering, association-rule mining, and regression (prediction).

Question 4

Q

Which two broad categories do data-mining tasks fall into?

Answer

A

Predictive mining and descriptive mining.

Question 5

Q

During KDD, what is the purpose of the pattern-evaluation step?

Answer

A

To identify, using interestingness measures, which of the patterns produced by the mining algorithms are truly interesting and actionable.

Question 6

Q

List the main stages of the KDD pipeline in order.

Answer

A

Data cleaning → data integration → data selection → data transformation → data mining → pattern evaluation → knowledge presentation.
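
The stages above can be sketched as a chain of small functions. This is a toy illustration only: the records, field names and function names are hypothetical, and each stage is reduced to a single simple operation.

```python
# Toy walk-through of the KDD stages on in-memory records (all data hypothetical).
from collections import Counter

source_a = [{"id": 1, "city": "Oslo", "spend": "120"},
            {"id": 2, "city": None, "spend": "80"}]
source_b = [{"id": 3, "city": "Oslo", "spend": "200"},
            {"id": 4, "city": "Bergen", "spend": "95"}]

def clean(rows):
    # Data cleaning: drop records that contain missing values.
    return [r for r in rows if all(v is not None for v in r.values())]

def integrate(*sources):
    # Data integration: combine several sources into one list.
    return [row for src in sources for row in src]

def select(rows, fields):
    # Data selection: keep only the task-relevant attributes.
    return [{f: r[f] for f in fields} for r in rows]

def transform(rows):
    # Data transformation: cast spend from text to a number.
    return [{**r, "spend": float(r["spend"])} for r in rows]

def mine(rows):
    # Data mining: a trivial "pattern", total spend per city.
    totals = Counter()
    for r in rows:
        totals[r["city"]] += r["spend"]
    return dict(totals)

def evaluate(patterns, min_total):
    # Pattern evaluation: keep only patterns above a threshold.
    return {k: v for k, v in patterns.items() if v >= min_total}

def present(patterns):
    # Knowledge presentation: render the surviving patterns as readable lines.
    return [f"{city}: {total:.0f}" for city, total in sorted(patterns.items())]

cleaned = [clean(source_a), clean(source_b)]
rows = transform(select(integrate(*cleaned), ["city", "spend"]))
report = present(evaluate(mine(rows), min_total=100))
print(report)
```

In practice the stages are often iterated rather than run once, but the ordering of responsibilities is the same.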

Question 7

Q

How does data mining differ from Online Analytical Processing (OLAP)?

Answer

A

OLAP supports fast, interactive exploration of known aggregates, whereas data mining digs deeper to discover previously unknown patterns automatically.

Question 8

Q

What is DMQL and why is it useful?

Answer

A

The Data Mining Query Language – a SQL-like language that lets users specify, in a high-level way, which patterns to mine and how to post-process them.

Question 9

Q

Why is the data-transformation step necessary before mining?

Answer

A

Because many algorithms expect data in specific formats or scales (e.g., normalised numbers, encoded categories).
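
As a minimal sketch of the two examples named in the answer, the helpers below (hypothetical names, plain Python) rescale numbers to the range 0 to 1 and turn categories into one-hot vectors.

```python
def min_max(values):
    # Min-max normalisation: map the smallest value to 0.0 and the largest to 1.0.
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(labels):
    # One-hot encoding: one column per distinct category, in sorted order.
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]

ages = [20, 30, 40]
colours = ["red", "blue", "red"]

scaled = min_max(ages)      # [0.0, 0.5, 1.0]
encoded = one_hot(colours)  # columns are (blue, red)
print(scaled, encoded)
```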

Question 10

Q

Define data cleaning in the context of KDD.

Answer

A

The process of detecting, correcting or removing errors, inconsistencies and noise from the raw data.
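
A small sketch of what cleaning can look like on a single text field, assuming made-up customer names: missing values are removed, inconsistent spacing and casing are corrected, and duplicates are dropped.

```python
raw_names = [" Alice ", "alice", "BOB", None, "bob ", ""]

def clean_names(values):
    seen, result = set(), []
    for value in values:
        if value is None:
            continue  # remove missing entries
        name = value.strip().lower()  # correct inconsistent spacing and case
        if name and name not in seen:  # skip empty strings and duplicates
            seen.add(name)
            result.append(name)
    return result

cleaned_names = clean_names(raw_names)
print(cleaned_names)
```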

Question 11

Q

What is the role of data integration in KDD?

Answer

A

To merge data from multiple heterogeneous sources into a coherent data set.
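
One way to picture integration is two sources that describe the same customers under different key names. The sketch below uses hypothetical `crm` and `billing` records and a hypothetical `merge_sources` helper; it keeps every CRM customer and attaches a billing total where one exists.

```python
crm = [{"customer_id": 1, "name": "Ada"},
       {"customer_id": 2, "name": "Bo"}]
billing = [{"cust": 1, "total": 50.0},
           {"cust": 3, "total": 20.0}]

def merge_sources(crm_rows, billing_rows):
    # Reconcile the differing key names (customer_id vs cust) via a lookup table.
    totals = {row["cust"]: row["total"] for row in billing_rows}
    return [
        {"customer_id": r["customer_id"],
         "name": r["name"],
         "total": totals.get(r["customer_id"])}  # None when no billing record exists
        for r in crm_rows
    ]

merged = merge_sources(crm, billing)
print(merged)
```

Billing rows with no matching CRM customer (here `cust` 3) are silently dropped by this left-join style; whether that is acceptable depends on the analysis.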

Question 12

Q

Why are measures of interestingness important?

Answer

A

They rank or filter mined patterns so that only the most relevant, novel or useful results are presented.
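
For association rules, three widely used interestingness measures are support, confidence and lift. The sketch below computes them for one rule over a handful of made-up transactions; the data and helper names are illustrative assumptions.

```python
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of transactions that contain every item in the itemset.
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    # Estimated P(consequent | antecedent).
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    # Confidence relative to the consequent's baseline frequency.
    return confidence(antecedent, consequent) / support(consequent)

rule_support = support({"bread", "milk"})
rule_confidence = confidence({"bread"}, {"milk"})
rule_lift = lift({"bread"}, {"milk"})
print(rule_support, rule_confidence, rule_lift)
```

A lift below 1, as here, suggests the two items co-occur less often than independence would predict, so the rule bread → milk would likely be filtered out despite its moderate confidence.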

Question 13

Q

What are the three coupling schemes for integrating a mining system with a database?

Answer

A

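No coupling, loose coupling, and tight coupling. Some texts list a fourth scheme, semi-tight coupling, which sits between loose and tight coupling rather than being another name for loose coupling.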

Question 14

Q

What does the acronym OLAM stand for?

Answer

A

Online Analytical Mining.

Question 15

Q

Give one advantage of multidimensional data mining over traditional flat-table mining.

Answer

A

It can exploit pre-computed cube aggregates to speed up pattern discovery.

Question 16

Q

What is a fact table in a data-warehouse star schema?

Answer

A

A central table containing numeric measures of business processes, referenced by keys to surrounding dimension tables.

Question 17

Q

How does a dimension table differ from a fact table?

Answer

A

Dimension tables store descriptive attributes that define the perspectives (e.g., time, product, geography) used to analyse facts.
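
To make the fact/dimension split concrete, here is a minimal in-memory sketch with one hypothetical fact table and one hypothetical product dimension. The fact rows hold keys and numeric measures; the dimension rows hold the descriptive attributes used to group them.

```python
from collections import defaultdict

# Dimension table: descriptive attributes, indexed by surrogate key.
dim_product = {
    1: {"name": "Pen", "category": "Stationery"},
    2: {"name": "Notebook", "category": "Stationery"},
    3: {"name": "Mug", "category": "Kitchen"},
}

# Fact table: foreign keys plus numeric measures.
fact_sales = [
    {"product_key": 1, "units": 10, "revenue": 15.0},
    {"product_key": 2, "units": 4, "revenue": 20.0},
    {"product_key": 3, "units": 2, "revenue": 18.0},
    {"product_key": 1, "units": 6, "revenue": 9.0},
]

def revenue_by(attribute):
    # Join each fact to its dimension row, then sum revenue per attribute value.
    totals = defaultdict(float)
    for fact in fact_sales:
        dimension_row = dim_product[fact["product_key"]]
        totals[dimension_row[attribute]] += fact["revenue"]
    return dict(totals)

by_category = revenue_by("category")
print(by_category)
```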

Question 18

Q

What is metadata in a data-mining context?

Answer

A

Data that describes other data – such as schema definitions, data provenance, quality indicators and transformation histories.

Question 19

Q

Describe a schema hierarchy in OLAP.

Answer

A

A concept hierarchy defined as a total or partial order among attributes of the schema (e.g., city→state→country), which organises attribute values into levels of abstraction.
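
A hierarchy such as city→state→country is what makes an OLAP roll-up possible. The sketch below uses hypothetical sales figures and mapping tables, and applies the same hypothetical `roll_up` helper once per level.

```python
from collections import defaultdict

sales_by_city = {"Toronto": 10, "Vancouver": 5, "Chicago": 7}

# One mapping per step of the hierarchy (illustrative data).
city_to_state = {"Toronto": "Ontario",
                 "Vancouver": "British Columbia",
                 "Chicago": "Illinois"}
state_to_country = {"Ontario": "Canada",
                    "British Columbia": "Canada",
                    "Illinois": "USA"}

def roll_up(totals, parent_of):
    # Aggregate each value into its parent at the next level of abstraction.
    rolled = defaultdict(int)
    for member, amount in totals.items():
        rolled[parent_of[member]] += amount
    return dict(rolled)

sales_by_state = roll_up(sales_by_city, city_to_state)
sales_by_country = roll_up(sales_by_state, state_to_country)
print(sales_by_country)
```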

Question 20

Q

What are the three strategies for materialising a data cube?

Answer

A

No materialisation, partial materialisation, and full materialisation.
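
As a rough sketch of the full-materialisation end of that spectrum, the code below pre-computes every cuboid (every subset of the dimensions) for a tiny hypothetical data set. With n dimensions there are 2^n cuboids, which is why partial materialisation is often preferred; computing nothing in advance is the opposite extreme.

```python
from collections import defaultdict
from itertools import combinations

rows = [
    {"city": "Oslo", "item": "pen", "units": 3},
    {"city": "Oslo", "item": "mug", "units": 2},
    {"city": "Bergen", "item": "pen", "units": 5},
]

def full_cube(records, dims, measure):
    # Pre-compute the sum of `measure` for every subset of `dims`.
    cube = {}
    for size in range(len(dims) + 1):
        for group in combinations(dims, size):
            cells = defaultdict(float)
            for record in records:
                key = tuple(record[d] for d in group)
                cells[key] += record[measure]
            cube[group] = dict(cells)
    return cube

cube = full_cube(rows, ("city", "item"), "units")
print(len(cube))            # number of cuboids
print(cube[()])             # apex cuboid: the grand total
print(cube[("city",)])      # units per city
```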

Question 21

Q

Contrast OLTP and OLAP systems in one sentence.

Answer

A

OLTP captures current, transaction-level data for routine operations, whereas OLAP stores integrated, historical data optimised for analysis.

Question 22

Q

Why is visualisation considered part of the KDD process?

Answer

A

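Because mined patterns are only useful once people can interpret them: visualisation presents them in human-readable forms (charts, graphs, dashboards) during the knowledge-presentation step.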

Question 23

Q

What is data reduction and when is it applied?

Answer

A

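A pre-processing technique that produces a much smaller representation of the data (for example by sampling, aggregation or dimensionality reduction) while preserving nearly the same analytical results. It is applied before mining, when the full data volume would make analysis too slow or impractical.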
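
Simple random sampling without replacement is one common reduction technique. The sketch below is illustrative only: the population and the `simple_random_sample` helper are assumptions, and a fixed seed is used so the result is repeatable.

```python
import random

def simple_random_sample(rows, n, seed=0):
    # Draw n distinct rows; a private Random instance keeps the result reproducible.
    rng = random.Random(seed)
    return rng.sample(rows, n)

population = list(range(1000))
sample = simple_random_sample(population, 50)
print(len(sample), len(population))
```

Whether a sample of a given size is representative enough depends on the data and the mining task, so the sample size here is arbitrary.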

Question 24

Q

Give one ethical consideration before performing data mining.

Answer

A

Whether the mining activity violates privacy or could lead to discriminatory outcomes.

(24 cards)