Data mining Flashcards

1
Q

name the qualitative (categorical) attribute types

A

nominal
ordinal

2
Q

name the quantitative (numeric) attribute types

A

interval
ratio

3
Q

Data preprocessing

A

aggregation
sampling
dimensionality reduction
feature subset selection
feature creation
discretization and binarization
attribute transformation

4
Q

aggregation

A

combining two or more attributes (or objects) into a single attribute (or object)

5
Q

types of sampling

A

simple random sampling
sampling with replacement
sampling without replacement
stratified sampling
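
A minimal sketch of these sampling schemes using only Python's standard library. The `records` data, the `label` key, and the helper `stratified_sample` are made up for illustration, and the sketch assumes the same fraction is drawn from every stratum, which is one common variant rather than the only one.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, fraction, rng):
    """Draw the same fraction from each stratum (group sharing a key value)."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    sample = []
    for group in strata.values():
        # Keep at least one record per stratum so small groups are not lost.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data set: 90 records of class "a", 10 of class "b".
records = [{"id": i, "label": "a" if i < 90 else "b"} for i in range(100)]

# Without replacement: each record can be picked at most once.
without_repl = rng.sample(records, 10)

# With replacement: the same record may be picked more than once.
with_repl = rng.choices(records, k=10)

# Stratified: sample 20% of each class separately.
strat = stratified_sample(records, lambda r: r["label"], 0.2, rng)
```

With a plain random sample of 10, the rare class "b" can easily be missed altogether; the stratified draw always keeps it represented.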

6
Q

dimensionality reduction

A

principal component analysis (PCA)
singular value decomposition (SVD)

7
Q

feature subset selection

A

brute-force approach
embedded approach
filter approach
wrapper approach

8
Q

attribute transformation

A

standardization
normalization
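
A small sketch of the two transformations, assuming "standardization" means the z-score and "normalization" means min-max rescaling to [0, 1] (the card does not say which definitions it intends). The `ages` values are made up, and both helpers assume the attribute is not constant, since a constant attribute would cause a division by zero.

```python
import statistics

def standardize(values):
    """z-score: subtract the mean, then divide by the standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]

def min_max_normalize(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

ages = [20, 30, 40, 50, 60]  # made-up attribute values

z = standardize(ages)             # mean 0, standard deviation 1
scaled = min_max_normalize(ages)  # values spread over [0, 1]
```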

9
Q

advantage of MIN (single link)

A

can handle non-elliptical shapes

10
Q

limitation of MIN (single link)

A

sensitive to noise and outliers

11
Q

advantage of MAX (complete link), group average, and Ward’s method

A

less susceptible to noise and outliers

12
Q

limitation of group average and Ward’s method

A

biased towards globular clusters

13
Q

limitations of MAX (complete link)

A

tends to break large clusters
biased towards globular clusters
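
To make the linkage names concrete, here is a hedged sketch of how MIN, MAX, and group average measure the proximity of two clusters, assuming Euclidean distance between points. The function names and the two toy clusters are illustrative only, and Ward's method is left out because it is defined through the increase in squared error rather than through pairwise distances.

```python
from itertools import product
from math import dist

def pairwise(c1, c2):
    """All point-to-point Euclidean distances between two clusters."""
    return [dist(p, q) for p, q in product(c1, c2)]

def min_link(c1, c2):
    """MIN (single link): distance between the closest pair of points."""
    return min(pairwise(c1, c2))

def max_link(c1, c2):
    """MAX (complete link): distance between the farthest pair of points."""
    return max(pairwise(c1, c2))

def group_average(c1, c2):
    """Group average: mean distance over all cross-cluster pairs."""
    d = pairwise(c1, c2)
    return sum(d) / len(d)

# Two toy clusters on a line (made-up points).
a = [(0, 0), (1, 0)]
b = [(3, 0), (5, 0)]

print(min_link(a, b), max_link(a, b), group_average(a, b))
```

For these points the cross-cluster distances are 3, 5, 2, and 4, so MIN is 2, MAX is 5, and the group average is 3.5. A single stray point between the clusters would shrink MIN sharply but move the average much less, which is the noise sensitivity the cards describe.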

14
Q

4 advantages of decision trees

A

inexpensive to construct
extremely fast for classifying unknown records
easy to interpret
accuracy is comparable to other classification techniques

15
Q

4 disadvantages of decision trees

A

do not generalize well to certain Boolean functions
the induction algorithm is greedy, so the resulting tree may not be optimal
not expressive enough for modeling continuous variables
tree replication: the same subtree can appear in several branches
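
One way to see how the first two points interact is XOR: no single attribute improves purity, so a greedy split chooser has nothing to go on. The sketch below assumes Gini impurity as the split criterion (the card does not name one), and the helper names are illustrative.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gain(rows, labels, attr):
    """Reduction in Gini impurity from splitting on one boolean attribute."""
    n = len(rows)
    gain = gini(labels)
    for value in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[attr] == value]
        if subset:
            gain -= len(subset) / n * gini(subset)
    return gain

# All four combinations of two boolean attributes.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_labels = [a ^ b for a, b in rows]
and_labels = [a & b for a, b in rows]

# Gain of splitting on attribute 0 and on attribute 1, per target function.
xor_gains = [split_gain(rows, xor_labels, i) for i in range(2)]
and_gains = [split_gain(rows, and_labels, i) for i in range(2)]
```

Both XOR gains are zero, while both AND gains are positive, so a greedy learner that picks the best single-attribute split gets useful guidance for AND but none for XOR.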

16
Q

4 disadvantages of MAX (complete link)

A

tendency to break large clusters
biased towards globular clusters
once a merge decision is made, it cannot be undone
no objective function is directly minimized

17
Q

classification techniques

A

decision tree
rule-based
memory-based
neural networks
support vector machines