Nearest neighbor Flashcards

1
Q

What are eager and lazy learners?

A

Eager learners are models that build their model as soon as the training data becomes available. Once the model is learned, it can be used to make predictions immediately.

Lazy learners defer the learning step until a prediction actually has to be made.

2
Q

Name an example of a lazy learner

A

Rote classifier.

The rote classifier memorizes the entire training set. It classifies a test instance only if that instance matches a training instance perfectly on every attribute.
The obvious drawback is that many test instances will not be classified at all, because no exact match exists in the training set.

We can make this more flexible by finding the set of training instances that are relatively similar to the test instance’s attributes. This is known as the “nearest neighbor classifier”.

3
Q

What is the intuition behind nearest neighbor classifier?

A

“If it walks like a duck and quacks like a duck, it is probably a duck.”

4
Q

Elaborate on the nearest neighbor classifier

A

The nearest neighbor classifier represents each example as a data point in a D-dimensional space, where D is the number of attributes.

It then uses a proximity measure to compute the distance between a test instance (or prediction instance) and every training instance.

The k nearest neighbors are the k training instances closest to the test instance.

The parameter k controls how many neighbors take part in the decision. If k = 1, we simply assign the label of the single closest training instance.
If k > 1, we assign the majority label among the k nearest neighbors. A minimal sketch follows below.
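
As a rough illustration (not from the source flashcards), here is a minimal Python sketch of majority-vote KNN; the function name knn_predict and the toy data are my own:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Classify x_test by majority vote among its k nearest training points."""
    # Euclidean distance from the test instance to every training instance.
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training instances.
    nearest = np.argsort(distances)[:k]
    # Majority vote over their class labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Toy data: two classes in a 2-dimensional attribute space.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
y_train = np.array(["duck", "duck", "goose", "goose"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> duck
```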

5
Q

What happens if “k” is too small, say 1?

A

We are likely to overfit. With such a tiny neighborhood, the prediction is driven by individual training points, so we capture a lot of noise.

6
Q

What happens if “k” is too large?

A

The neighborhood may pull in points from regions belonging to other classes, blurring genuine class boundaries and giving poor (underfitted) results.

7
Q

KNN is very sensitive to the choice of k. How can we reduce this sensitivity?

A

We can weight each neighbor’s vote by its distance to the test instance:

w_i = 1 / d(x', x_i)^2

so that closer neighbors count more. A sketch follows below.
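
A minimal sketch of that weighting scheme (function name is my own; the eps guard against division by zero for exact matches is an added assumption):

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x_test, k=5, eps=1e-12):
    """Distance-weighted KNN vote: each neighbor contributes w_i = 1 / d(x', x_i)^2."""
    distances = np.linalg.norm(X_train - x_test, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = {}
    for i in nearest:
        # eps avoids division by zero when the test instance matches exactly.
        weight = 1.0 / (distances[i] ** 2 + eps)
        votes[y_train[i]] = votes.get(y_train[i], 0.0) + weight
    # The label with the largest accumulated weight wins.
    return max(votes, key=votes.get)
```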

8
Q

What is especially important to have in place when using KNN?

A

A proximity (distance) measure, since every prediction rests on comparing instances.

9
Q

Elaborate on performance with regard to time

A

KNN classifiers are generally quite expensive at prediction time, as we have to compute the distance to every training instance for each test instance.

In comparison, decision trees spend the bulk of their time in the training phase. Once built, they are extremely efficient: classifying an instance is a single root-to-leaf traversal.

10
Q

Why is KNN more prone to noise than decision trees?

A

KNN builds local models, while decision trees build global models. The global nature of decision trees smooths out noise, especially with early stopping.

When we use KNN with a relatively low value of k, we are looking at potentially very small regions, which can be dominated by noise.

11
Q

Name an advantage of KNN in comparison to decision trees

A

KNN can partition the space more flexibly. Recall that decision trees are generally restricted to rectilinear decision boundaries, whereas KNN can produce circular or far more complex boundaries.

So the most important advantage of KNN is that it can capture complex shapes, such as spirals, that decision trees cannot represent well.

KNN also has an edge over trees when it comes to representing interactions among attributes.

12
Q

Lastly, what is important to consider when using KNN?

A

KNN is a proximity-based procedure. Therefore, we have to be careful about attribute scales and should normalize the data before computing distances, as sketched below.
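
As an illustration of why scaling matters, here is a small min-max normalization sketch (the attribute choice and numbers are made up):

```python
import numpy as np

def min_max_normalize(X):
    """Rescale every attribute to [0, 1] so no single attribute dominates distances."""
    mins, maxs = X.min(axis=0), X.max(axis=0)
    span = np.where(maxs > mins, maxs - mins, 1.0)  # guard against constant columns
    return (X - mins) / span

# Without normalization, income (tens of thousands) would dwarf age (tens)
# in any Euclidean distance computation.
X = np.array([[25.0, 40000.0],
              [40.0, 85000.0],
              [60.0, 52000.0]])
print(min_max_normalize(X))
```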

13
Q

What is “precision”?

A

Precision = TruePositives / (TruePositives+FalsePositives)

So, precision answers: “out of all the instances you predicted to be positive, how many are actually positive?” Note that precision says nothing about the positives the model missed; it only judges the quality of the positive predictions the model actually made.

Precision is more informative than accuracy in some cases. Accuracy struggles on unbalanced data sets; precision holds up better on such sets.

Example:

The model predicted that 10 instances were positive. From the test set we know that only 7 of them are actually positive. That gives a precision of 7/10 = 0.7.

14
Q

What is “recall”?

A

Recall is defined as the number of true positives divided by the sum of true positives and false negatives:
Recall = TruePositives / (TruePositives + FalseNegatives)

Recall thus tells us how many of the actually positive instances we managed to identify.

15
Q

What is F-measure?

A

The F-measure is the harmonic mean of precision and recall:

F-Measure = 2 × Precision × Recall / (Precision + Recall)
16
Q

Elaborate on accuracy

A

Accuracy measures the number of correct predictions relative to the total number of predictions.

It works well on balanced data sets. The problem is that on unbalanced sets a model can score high accuracy simply by always predicting the majority class, which makes the measure misleading.

Accuracy is most appropriate when both classes are of roughly equal importance.

17
Q

Elaborate on precision

A

Precision is also known as the “positive predictive value”, because it captures the quality of our positive predictions. Since it is defined as TruePositives / (TruePositives + FalsePositives), it answers: “of the instances our model labeled positive, how many are actually positive?”

It is therefore an indicator of how often the model raises false alarms. In systems where false positives are absolutely unacceptable, precision is a great measure.

Imagine a spam-detection system: we would not want to classify important emails as spam. This is a case where false positives are damaging.

18
Q

Elaborate on “recall”

A

Recall is defined as TruePositives / (TruePositives + FalseNegatives).

In other words: “how well do we capture the actually positive instances?” Recall does not care about false positives. This means recall is the measure to use when the most important thing is to find at least all the true positives, accepting the tradeoff of guessing wrong more often.

So recall is the right choice when the cost of false negatives is very high. Cancer screening fits here: we would rather have a sensitive system that flags every tumor than one that misses some.

For this reason, recall is also known as sensitivity.

19
Q

Elaborate on F-score

A

The F-measure combines precision and recall into a single number:

F-Measure = 2 × Precision × Recall / (Precision + Recall)

This is their harmonic mean, which gives equal weight to precision and recall.

It is great for unbalanced data sets where both classes matter equally to us; a computation sketch follows below.
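
A small sketch computing these metrics from raw confusion-matrix counts; the tp/fp numbers echo the earlier precision example, while fn and tn are made-up values just to complete the matrix:

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F-measure, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f_measure, accuracy

# 10 predicted positive, of which 7 actually are (precision = 0.7);
# fn = 2 and tn = 88 are assumed purely for illustration.
print(classification_metrics(tp=7, fp=3, fn=2, tn=88))
```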

20
Q

When do we use precision?

A

We use precision when we don’t want false positives, i.e. when a false alarm is costly.

21
Q

When do we use recall?

A

We use recall when we don’t want false negatives, i.e. when missing a true positive is costly.

22
Q

When do we use F-measure?

A

When we want a balance between precision and recall, and on unbalanced data sets whose labels are of equal importance.

23
Q

When do we use KNN and when do we use decision trees?

A

For categorical data with many nominal (non-binary) values, decision trees are likely better at capturing the underlying function. Certain categorical attributes are difficult to encode as distances, which we need to be aware of.

Binary attributes usually encode fine for KNN, as we essentially get two well-separated clusters, and the usual KNN logic of “if it looks like a duck … it is probably a duck” applies. We don’t necessarily need a very precise distance measure to be accurate enough. A one-hot encoding sketch follows below.
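
A small sketch of one-hot encoding a nominal attribute so that Euclidean distances become meaningful for KNN (function name and toy values are my own):

```python
import numpy as np

def one_hot(values):
    """One-hot encode a nominal attribute: equal categories end up at
    distance 0, different categories at distance sqrt(2)."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, value in enumerate(values):
        encoded[row, index[value]] = 1.0
    return encoded

print(one_hot(["red", "green", "red", "blue"]))
```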

24
Q

Elaborate on characteristics of KNN

A

1) Instance-based learning: it does not build an explicit model, but requires a proximity function that can measure similarity.

2) Can be quite expensive, especially if we want to classify many instances, because all the work happens at prediction time. Eager learners, on the other hand, spend the bulk of their time building the model; once built, they offer a significant speed advantage. For instance, classifying with a decision tree is a single root-to-leaf pass.

3) KNN makes use of local information. Decision trees, on the other hand, take ALL of the data into account during induction.

4) KNN can produce decision boundaries of arbitrary shape. This can be an advantage over decision trees in cases where the pattern is complex.

5) Difficulty in handling missing values.

6) Can handle the effect of interacting attributes. This was a drawback of decision trees, which only create rectilinear splits.

7) Irrelevant attributes can significantly hurt performance, because all attributes take part in the proximity calculations.

8) KNN is very sensitive to the scale of attribute values, because differing scales distort the proximity measures.

25
Q

Elaborate on cosine similarity

A

Cosine similarity is used for data such as documents. Why does it work well for documents? Because shared zero entries do not contribute to the similarity: two documents are not judged similar merely because they both lack most of the vocabulary.

The measure itself is calculated as cos(θ) = (x · y) / (‖x‖ ‖y‖), i.e. the dot product of the two vectors divided by the product of their lengths. A sketch follows below.
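
A minimal sketch of the formula on toy term-count vectors (the "documents" are invented):

```python
import numpy as np

def cosine_similarity(x, y):
    """cos(theta) = (x . y) / (|x| |y|)."""
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Term-count vectors for two short "documents". The words neither document
# uses (the shared zero entries) contribute nothing to the similarity.
doc_a = np.array([3.0, 0.0, 1.0, 0.0, 0.0, 2.0])
doc_b = np.array([1.0, 0.0, 0.0, 0.0, 0.0, 1.0])
print(cosine_similarity(doc_a, doc_b))  # roughly 0.94
```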

26
Q

What is the bottleneck of KNN and how can we improve it?

A

The bottleneck is the huge number of distance computations: a naive implementation costs O(nd) per query, with n training instances and d attributes. If we index the training set with a k-d tree, a query takes O(log n) on average, with O(n) in the worst case. A sketch follows below.
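
As one possible speed-up, here is a sketch using SciPy's KDTree (assuming SciPy is available; the data is random filler):

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))   # 10,000 training points in 3 dimensions

tree = KDTree(X_train)              # built once, up front

# Average-case O(log n) per query instead of scanning all n points.
x_test = rng.random(3)
distances, indices = tree.query(x_test, k=5)
print(indices, distances)
```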

27
Q

Elaborate on the shape of the decision boundaries when using KNN

A

They are extremely flexible and can take on essentially any shape.

28
Q

Is KNN a local or a global classifier?

A

Local. A local method uses only a subset of the data (a local region) to make each decision. In that sense, decision trees are also local: each prediction depends only on the leaf region the instance falls into.

Deep neural nets are global. Linear regression is global.
