KNN Flashcards

1
Q

k-NN process overview

A
2
Q

Goal

A

Given a set of labeled items, automatically assign a label to a new, unlabeled item.

3
Q

Idea

A

Find the items most similar to the new one (similarity is defined in terms of their attributes), look at their labels, and give the unlabeled item the label that wins the majority vote. Ties are broken randomly.
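
The idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `knn_predict`, the toy data, and the use of Euclidean distance as the similarity measure are all assumptions made for the example.

```python
import math
import random
from collections import Counter

def knn_predict(train, new_item, k, rng=random):
    """Label new_item by majority vote among its k nearest labeled items.

    train: list of (attributes, label) pairs; attributes are numeric tuples.
    Euclidean distance is assumed here; it is only one possible choice.
    """
    # Sort labeled items by distance to the new item and keep the k closest.
    neighbors = sorted(train, key=lambda pair: math.dist(pair[0], new_item))[:k]
    votes = Counter(label for _, label in neighbors)
    top = max(votes.values())
    # Several labels may share the top count; break such ties randomly.
    winners = [label for label, count in votes.items() if count == top]
    return rng.choice(winners)

# Hypothetical toy data: two well-separated groups.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.5, 4.5), "b"),
]
print(knn_predict(train, (1.1, 1.0), k=3))
```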

4
Q

To automate k-NN, what two decisions need to be made?

A
  • How should similarity be defined?
  • How many neighbors should vote? (What is k?)
5
Q

Euclidean distance

A
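
Euclidean distance is the straight-line distance between two points: the square root of the sum of squared attribute differences. A minimal sketch, assuming two numeric vectors of equal length (the function name is illustrative):

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared differences between x and y."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance((0, 0), (3, 4)))  # 5.0
```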
6
Q

Cosine similarity

A
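
Cosine similarity is the cosine of the angle between two vectors: their dot product divided by the product of their lengths. It compares direction and ignores magnitude. A minimal sketch, assuming numeric vectors of equal length; raising an error for a zero vector is a choice made for this example.

```python
import math

def cosine_similarity(x, y):
    """Dot product of x and y divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

print(cosine_similarity((1, 0), (0, 1)))  # 0.0, the vectors are perpendicular
```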
7
Q

Jaccard distance

A
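
Jaccard distance compares two sets: one minus the size of their intersection divided by the size of their union. A minimal sketch; treating two empty sets as distance 0 is a convention assumed for this example.

```python
def jaccard_distance(a, b):
    """1 - |A intersect B| / |A union B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumed convention: two empty sets are identical
    return 1.0 - len(a & b) / len(union)

print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 0.5
```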
8
Q

Hamming distance

A
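
Hamming distance counts the positions at which two equal-length sequences differ. A minimal sketch (the function name is illustrative):

```python
def hamming_distance(x, y):
    """Number of positions where the two sequences disagree."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(x, y))

print(hamming_distance("karolin", "kathrin"))  # 3
```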
9
Q

Manhattan distance

A
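
Manhattan distance is the sum of the absolute attribute differences, like walking along a grid of city blocks. A minimal sketch, assuming numeric vectors of equal length:

```python
def manhattan_distance(x, y):
    """Sum of absolute differences between corresponding attributes."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(x, y))

print(manhattan_distance((0, 0), (3, 4)))  # 7
```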
10
Q

Regarding distance metrics: what if the attributes are a mixture of data types?

A

Define your own custom-designed metric.
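
One possible custom metric, shown only as an illustration: numeric attributes contribute their absolute difference divided by that attribute's range, categorical attributes contribute 0 if equal and 1 otherwise, and the contributions are averaged. This is similar in spirit to Gower's distance. The function name `mixed_distance` and the `numeric_ranges` parameter are hypothetical.

```python
def mixed_distance(x, y, numeric_ranges):
    """Average per-attribute dissimilarity for mixed numeric/categorical data.

    numeric_ranges: dict mapping attribute index -> (max - min) of that
    numeric attribute. Any index not listed is treated as categorical.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in numeric_ranges:
            span = numeric_ranges[i]
            # Range-normalized difference keeps each attribute within [0, 1].
            total += abs(a - b) / span if span else 0.0
        else:
            # Categorical attribute: simple match / mismatch.
            total += 0.0 if a == b else 1.0
    return total / len(x)

# Hypothetical items: (age, color, smoker, sex); age spans 40 years in the data.
p = (30, "red", "yes", "m")
q = (40, "blue", "yes", "m")
print(mixed_distance(p, q, numeric_ranges={0: 40}))
```

How much weight each attribute type should get depends on the data, so a metric like this should be checked against the evaluation results rather than taken as given.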

11
Q

synonymous terms

A
12
Q

Evaluation metrics

A
  • Accuracy
  • Precision
  • Recall
  • F-score
13
Q

Evaluation Metric : Accuracy

A

Number of correct labels / total number of labels
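
As a small worked example (the labels are made up for illustration):

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

y_true = ["spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "spam"]
print(accuracy(y_true, y_pred))  # 0.75, three of four labels are correct
```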

14
Q

Evaluation Metric : Precision

A

Number of true positives / (number of true positives + number of false positives)

15
Q

Evaluation Metric : Recall

A

Number of true positives / (number of true positives + number of false negatives)

16
Q

Evaluation Metric : F-score

A

Harmonic mean of precision and recall

(2 × precision × recall) / (precision + recall)
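
Precision, recall, and F-score can all be computed from the same three counts. A minimal sketch, assuming a binary problem with one label designated as positive; returning 0.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall_f1(true_labels, predicted_labels, positive=1):
    """Return (precision, recall, F-score) for the given positive label."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0]  # 2 true positives, 2 false positives, 0 false negatives
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision is 0.5 and recall is 1.0, so the F-score of about 0.67 sits closer to the lower of the two, which is the point of using a harmonic mean.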

17
Q

Evaluation Metric : Misclassification rate

A

1 - accuracy

18
Q

Choosing k

A
  • You need to understand the data well to make a good initial guess.
  • Then try a few different values of k and see how the evaluation changes. Pick the k that optimizes the chosen evaluation metric.
  • In binary classification, pick an odd k so that the vote cannot tie.
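
The "try a few values of k" step can be sketched as follows. This is only an illustration under stated assumptions: the data are made-up one-dimensional points, accuracy is the chosen metric, and leave-one-out evaluation is one of several reasonable ways to score each k.

```python
from collections import Counter

def predict(train, x, k):
    """Majority label among the k training points closest to x (1-D data)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    # With two classes and odd k the vote cannot tie, so no tie-breaking here.
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k):
    """Hold out each point in turn, predict it from the rest, and score."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += predict(rest, x, k) == label
    return hits / len(data)

# Hypothetical data: two groups, plus one deliberately noisy "b" at 2.4.
data = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (4.0, "a"),
        (6.5, "b"), (7.5, "b"), (8.5, "b"), (9.5, "b"),
        (2.4, "b")]

candidates = [1, 3, 5]  # odd values, since there are two classes
scores = {k: leave_one_out_accuracy(data, k) for k in candidates}
best_k = max(candidates, key=lambda k: scores[k])  # first k with the top score
print(scores, best_k)
```

With k = 1 the noisy point misleads its neighbors, while larger k outvotes it. On real data the evaluation should use held-out items that played no part in choosing k.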
19
Q

Modeling assumptions in K-NN

A
20
Q

Scaling

A

Standardize the data so that every variable has a mean of zero and a standard deviation of one. Otherwise, variables measured on larger scales dominate the distance calculation.

In R, this can be achieved using the scale() function.
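
For readers without R at hand, the same standardization can be sketched in Python. This is an illustration, not a port of scale(): it handles a single column, uses the sample standard deviation (n - 1 denominator), and assumes the column is not constant. The function name and data are made up.

```python
import statistics

def standardize(column):
    """Center a numeric column at 0 and rescale it to standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample standard deviation (n - 1)
    return [(value - mean) / sd for value in column]

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)
print(scaled)
```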