K-Means Flashcards

1
Q

Radius: ____________ from any point of the cluster to its centroid

A

square root of the average squared distance

2
Q

Diameter: _______________ between all pairs of points in the cluster

A

square root of the average squared distance

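The two definitions above can be checked numerically. A minimal NumPy sketch (not part of the deck; the toy points are an illustrative assumption):

import numpy as np

# Toy cluster of 2-D points (illustrative data only).
points = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [2.0, 2.0]])
centroid = points.mean(axis=0)

# Radius: square root of the average squared distance
# from each point of the cluster to its centroid.
radius = np.sqrt(np.mean(np.sum((points - centroid) ** 2, axis=1)))

# Diameter: square root of the average squared distance
# between all (distinct) pairs of points in the cluster.
diffs = points[:, None, :] - points[None, :, :]
sq_dists = np.sum(diffs ** 2, axis=-1)
n = len(points)
diameter = np.sqrt(sq_dists.sum() / (n * (n - 1)))

print(radius, diameter)
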
3
Q

What is the elbow method?

A

plots the value of the cost function produced by different values of k.

4
Q

The value of k at which the improvement in distortion ___________ the most is called the elbow

A

declines

5
Q

Cost Function: For each k, calculate the ______________

A

total within-cluster sum of squares (WSS).

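A short scikit-learn sketch of the elbow method described in the last three cards: compute the total within-cluster sum of squares (exposed as KMeans.inertia_) for a range of k and plot it. The synthetic data and the range of k are illustrative assumptions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))          # synthetic data, illustration only

ks = range(1, 11)
wss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)            # total within-cluster sum of squares

plt.plot(list(ks), wss, marker="o")
plt.xlabel("k")
plt.ylabel("total within-cluster sum of squares (WSS)")
plt.title("Elbow method")
plt.show()
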
6
Q

Support:

A

Freq (X,Y) / N

7
Q

Confidence:

A

Freq (X,Y) / Freq (X)

8
Q

Lift:

A

Support(X, Y) / (Support(X) * Support(Y))

9
Q

Conviction:

A

(1 - supp(Y)) / (1 - conf(X -> Y))

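The four rule metrics above follow directly from their formulas. A hedged Python sketch on a made-up transaction list (the items and the rule X -> Y are illustrative assumptions):

# Toy transactions; X = {bread}, Y = {butter} (illustrative only).
transactions = [
    {"bread", "butter"},
    {"bread"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "milk"},
]
N = len(transactions)

def freq(itemset):
    return sum(1 for t in transactions if itemset <= t)

X, Y = {"bread"}, {"butter"}

support_xy = freq(X | Y) / N                      # Freq(X, Y) / N
support_x = freq(X) / N
support_y = freq(Y) / N
confidence = freq(X | Y) / freq(X)                # Freq(X, Y) / Freq(X)
lift = support_xy / (support_x * support_y)       # Support(X, Y) / (Support(X) * Support(Y))
conviction = (1 - support_y) / (1 - confidence)   # (1 - supp(Y)) / (1 - conf(X -> Y))

print(support_xy, confidence, lift, conviction)
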
10
Q

If an itemset is frequent, then all of its ______ must also be frequent

A

subsets

11
Q

If an itemset is not frequent, then all of its _______ cannot be frequent

A

supersets

12
Q

The ______ of an itemset never exceeds the _________ of its subsets

A

support

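A tiny check of the monotonicity property from the last three cards, using a naive support function on made-up transactions (both are illustrative assumptions):

# Illustrative transactions and a naive support function.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# The support of an itemset never exceeds the support of its subsets:
assert support({"a", "b"}) <= support({"a"})
assert support({"a", "b"}) <= support({"b"})
assert support({"a", "b", "c"}) <= support({"a", "b"})
print(support({"a"}), support({"a", "b"}), support({"a", "b", "c"}))
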
13
Q

Mining Association Rules

A
  1. Generate all itemsets whose support >= minsup
  2. Generate high-confidence rules from each frequent itemset
14
Q

An association rule r is strong if

A

Support(r) ≥ min_sup
Confidence(r) ≥ min_conf

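A brute-force Python sketch of the two-step procedure and the strong-rule test above; the transactions and the min_sup / min_conf thresholds are illustrative assumptions, and a real miner would prune candidates with the Apriori property instead of enumerating everything:

from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
items = sorted(set().union(*transactions))
min_sup, min_conf = 0.4, 0.6

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 1: generate all itemsets whose support >= min_sup (brute force).
frequent = [frozenset(c) for r in range(1, len(items) + 1)
            for c in combinations(items, r) if support(set(c)) >= min_sup]

# Step 2: generate high-confidence rules X -> Y from each frequent itemset.
strong_rules = []
for itemset in frequent:
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = support(itemset) / support(antecedent)
            # A rule is strong if support >= min_sup and confidence >= min_conf.
            if support(itemset) >= min_sup and conf >= min_conf:
                strong_rules.append((set(antecedent), set(consequent), conf))

print(strong_rules)
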
15
Q

Classification Accuracy

A

the number of correct predictions made as a ratio of all predictions made.

16
Q

Log Loss

A

a performance metric for evaluating predicted probabilities of membership in a given class.

17
Q

Area Under ROC Curve

A

a performance metric for binary classification problems.

18
Q

Sensitivity

A

the true positive rate, also called the recall: the number of instances from the positive (first) class that were predicted correctly.

19
Q

Specificity

A

the true negative rate: the number of instances from the negative (second) class that were predicted correctly.

20
Q

Gini Coefficient

A

2*AUC – 1
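
The metrics from cards 15-20 can be computed with scikit-learn on a toy binary problem; the labels, probabilities, and the 0.5 decision threshold below are illustrative assumptions:

import numpy as np
from sklearn.metrics import accuracy_score, log_loss, roc_auc_score, confusion_matrix

# Toy binary labels and predicted probabilities (illustrative only).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1])
y_pred = (y_prob >= 0.5).astype(int)

accuracy = accuracy_score(y_true, y_pred)      # correct predictions / all predictions
ll = log_loss(y_true, y_prob)                  # penalises confident wrong probabilities
auc = roc_auc_score(y_true, y_prob)            # area under the ROC curve
gini = 2 * auc - 1                             # Gini coefficient

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                   # true positive rate (recall)
specificity = tn / (tn + fp)                   # true negative rate

print(accuracy, ll, auc, gini, sensitivity, specificity)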

21
Q

The ROC curve

A

the plot of sensitivity against (1 - specificity)

22
Q

AUC

A

the ratio of the area under the ROC curve to the total area
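
A brief sketch of the ROC plot described above (sensitivity against 1 - specificity), with the AUC shown in the legend; the labels and scores are illustrative assumptions:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])        # illustrative labels
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1])

fpr, tpr, _ = roc_curve(y_true, y_prob)             # fpr = 1 - specificity, tpr = sensitivity
plt.plot(fpr, tpr, label=f"AUC = {roc_auc_score(y_true, y_prob):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--")            # chance line
plt.xlabel("1 - specificity (false positive rate)")
plt.ylabel("sensitivity (true positive rate)")
plt.legend()
plt.show()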

23
Q

Lift charts

A

a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model.

24
Q

Calculate the points of the lift curve by

A

determining the ratio between the result predicted by our model and the result using no model.
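
A hedged sketch of computing lift-curve points as described in the last two cards: at each depth into the score-ranked population, take the ratio between the positives found by the model and the positives a no-model (random) selection would find. The labels and scores are illustrative assumptions:

import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])   # illustrative labels
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1, 0.85, 0.35])

order = np.argsort(-y_prob)                 # rank instances by model score, best first
sorted_true = y_true[order]
baseline_rate = y_true.mean()               # positive rate expected with no model

depths = np.arange(1, len(y_true) + 1)
# Lift at each depth: rate of positives captured by the model so far
# divided by the rate a random (no-model) selection would capture.
lift = np.cumsum(sorted_true) / depths / baseline_rate
print(list(zip(depths, lift.round(2))))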