M5 Flashcards

(30 cards)

1
Q

Online Analytical Processing (OLAP) with data warehouses tells us what is happening and how, while data mining tells us what is likely to ___

A

happen

2
Q

Data mining is ___ Discovery in (commercial) Databases

A

Knowledge

3
Q

Data mining is a ____ rather than a product

A

process

4
Q

This is the fastest-growing segment of the business intelligence market

A

Data Mining

5
Q

This is what data mining in biology and medicine is called

A

Bioinformatics

6
Q

What are the two broad groups of data mining?

A

Directed and Undirected Data Mining

7
Q

In this type of data mining, we know what we are looking for: we aim to find the value of a pre-identified target variable in terms of a collection of input variables, e.g., classifying insurance claims

A

Directed Data Mining

8
Q

This type of data mining finds patterns in data and leaves it to the user to determine the significance of these patterns

A

Undirected Data Mining

9
Q

Classification, Estimation, and ___ are under Directed Data Mining

A

Prediction

10
Q

Affinity grouping & association rules, ___, and Description & Visualization are under Undirected Data Mining

A

Clustering

11
Q

This data mining approach is particularly suitable for solving classification problems

A

Decision Trees

12
Q

Each leaf node in a decision tree is labelled with a ___ label

A

class

13
Q

In a decision tree, the class label of a leaf is decided by the class of the records that ended up in that ___ during training

A

leaf

14
Q

In a decision tree, each edge originating from an internal node is labelled with a ___ predicate involving that node’s splitting attribute

A

splitting

15
Q

In a decision tree, the ___ forces any record to take a unique path from the root to exactly one leaf node

A

splitting predicate
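
Worked note for cards 12 to 15: a minimal sketch, assuming a hypothetical insurance-claims tree. The attribute names, thresholds, and class labels are invented for illustration. It shows how splitting predicates on the edges send each record down exactly one path to a labelled leaf.

```python
# Hypothetical toy tree: attribute names, thresholds, and class labels
# are illustrative assumptions, not taken from the course material.
tree = {
    "attribute": "amount",
    "edges": [
        (lambda v: v < 1000, {"label": "approve"}),
        (lambda v: v >= 1000, {
            "attribute": "prior_claims",
            "edges": [
                (lambda v: v <= 2, {"label": "review"}),
                (lambda v: v > 2, {"label": "investigate"}),
            ],
        }),
    ],
}


def classify(node, record):
    """Follow the single edge whose predicate holds until a leaf is reached."""
    while "label" not in node:
        value = record[node["attribute"]]
        matches = [child for predicate, child in node["edges"] if predicate(value)]
        # The predicates at a node are mutually exclusive and exhaustive,
        # so exactly one edge matches and the path is unique.
        assert len(matches) == 1
        node = matches[0]
    return node["label"]


print(classify(tree, {"amount": 500, "prior_claims": 0}))
print(classify(tree, {"amount": 5000, "prior_claims": 3}))
```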

16
Q

To prevent overfitting, a test data set is used to ___ a decision tree once it has been built using the training data set.

17
Q

In decision trees, this refers to an iterative process of splitting the training data into partitions (regions of record space)

A

Recursive partitioning
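
Worked note for card 17: a rough sketch of recursive partitioning, assuming numeric attributes, threshold splits, and the Gini index as the diversity measure. These are illustrative choices, and the cards do not say which measure the course uses. The function names and the `min_gain` parameter are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini diversity of a list of class labels (0 means the partition is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted diversity, attribute index, threshold), or None if no split exists."""
    best = None
    for attr in range(len(rows[0])):
        for threshold in sorted({row[attr] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[attr] < threshold]
            right = [l for row, l in zip(rows, labels) if row[attr] >= threshold]
            if not left or not right:
                continue  # a split must put records on both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, attr, threshold)
    return best


def build(rows, labels, min_gain=0.01):
    """Split the records repeatedly until no split reduces diversity enough."""
    split = best_split(rows, labels)
    if split is None or gini(labels) - split[0] < min_gain:
        # Leaf: label it with the majority class of the records that ended up here.
        return {"label": Counter(labels).most_common(1)[0][0]}
    _, attr, threshold = split
    left = [i for i, row in enumerate(rows) if row[attr] < threshold]
    right = [i for i, row in enumerate(rows) if row[attr] >= threshold]
    return {
        "attribute": attr,
        "threshold": threshold,
        "left": build([rows[i] for i in left], [labels[i] for i in left], min_gain),
        "right": build([rows[i] for i in right], [labels[i] for i in right], min_gain),
    }


example = build([(1,), (2,), (8,), (9,)], ["a", "a", "b", "b"])
print(example)
```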

18
Q

The most important task in building a decision tree is to decide which of the ___ gives the best split

19
Q

In decision trees, a node becomes a ___ node when no split can be found that significantly decreases the diversity

20
Q

In decision trees, pruning is done by removing leaves and branches (edges leading to leaves) that fail to ___

21
Q

This aims to discover structure in a complex data set as a whole in order to carve it up into simpler groups

A

Automatic Cluster Detection

22
Q

Type of clustering that is available in a wide variety of commercial data mining tools. It divides the data set into a predetermined number, k, of clusters

A

K-Means Clustering

23
Q

In K-Means Clustering, in the first step, k data points are selected to be the seeds. Each seed is an ___ cluster with only one element

24
Q

The ___ of a cluster of records is calculated by taking the average of each field over all the records in that cluster

25
Q

___ distance is the measure most commonly used by data mining software.

A

Euclidean

26
Q

In the k-means method, the original choice of the value of k determines the number of ___ that will be found

A

clusters

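
Worked note for the k-means cards: a minimal sketch under stated assumptions. The first k records are used as seeds (tools may pick seeds differently), distance is Euclidean, and each cluster's center is the field-wise average of its records. Function names and the `iterations` parameter are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two records with numeric fields."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def centroid(records):
    """Average each field over all records in the cluster."""
    return tuple(sum(field) / len(records) for field in zip(*records))


def k_means(records, k, iterations=10):
    # Assumption: the first k records serve as the seeds, so each seed
    # starts as a cluster with a single element.
    centroids = [tuple(r) for r in records[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for r in records:
            # Assign each record to the nearest current centroid.
            nearest = min(range(k), key=lambda i: euclidean(r, centroids[i]))
            clusters[nearest].append(r)
        # Recompute centroids; keep the old one if a cluster is empty.
        centroids = [centroid(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters


centroids, clusters = k_means([(0, 0), (10, 10), (0, 2), (10, 12)], k=2)
print(centroids)
```

The value of k passed in fixes how many clusters come back, whatever the data looks like.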
27
Q

___ are claimed to be often more effective than k-means for complex-shaped clusters

A

Self-organizing maps (SOMs)

28
Q

Four ways of utilizing data mining expertise in business:
1. By purchasing ready-made ___ from outside vendors
2. By purchasing software that embodies data mining expertise designed for a particular application
3. By hiring outside consultants to perform data mining for special projects
4. By developing own data mining skills within the business organization

A

scores

29
Q

___ automates the process of creating candidate models and selecting the ones that perform best

A

Model building software

30
Q

Outside expertise for data mining is likely to be available in three possible places:
1. From a data mining software vendor
2. Data mining centers
3. ___ companies

A

Consulting