Chapter 22 k-Nearest Neighbors Flashcards

1
Q

DOES THE KNN MODEL LEARN FROM THE DATASET? P108

A

KNN has no model other than the stored training dataset, so there is no learning required: the entire training dataset is the model.

2
Q

HOW DOES KNN PREDICT? P108

A

Predictions are made for a new data point by searching through the entire training set for the k most similar instances (the neighbors) and summarizing the output variable for those k instances: the mean output value for regression, or the mode (most common) class value for classification.
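
A minimal sketch of that search-and-summarize procedure, assuming a classification task and Euclidean distance (the names knn_predict and euclidean are illustrative, not from the source):

import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points with numeric features.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # "Training" was just storing train_X/train_y; all work happens here.
    # Rank every stored instance by its distance to the query point...
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda xy: euclidean(xy[0], query))[:k]
    # ...and summarize the output variable of the k nearest
    # (majority vote, i.e. the mode, for classification).
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Usage: two features, two classes.
X = [(1.0, 1.1), (1.2, 0.9), (5.0, 5.2), (4.8, 5.1)]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, (1.1, 1.0), k=3))  # -> "a"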

3
Q

HOW DOES KNN DETERMINE THE MOST SIMILAR DATA POINTS TO A NEW DATA POINT? P108

A

A distance measure is used to compute the similarity between the new data point and every instance in the training set; the k instances with the smallest distances are the neighbors. For real-valued inputs, the most popular distance measure is Euclidean distance.
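
Euclidean distance is the square root of the sum of the squared differences across all input attributes. A minimal sketch (the helper name euclidean is illustrative):

import math

def euclidean(a, b):
    # sqrt of the sum of squared per-attribute differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0.0, 0.0), (3.0, 4.0)))  # -> 5.0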

4
Q

WHEN IS IT BETTER TO USE EUCLIDEAN DISTANCE? HOW ABOUT MANHATTAN? P109

A

Euclidean is a good distance measure to use if the input variables are similar in type (e.g. all measured widths and heights).
Manhattan distance is a good measure to use if the input variables are not similar in type (such as age, gender, height, etc.).
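
Manhattan distance sums the absolute differences instead of squaring them. A minimal sketch for comparison with the Euclidean example above (the function name is illustrative):

def manhattan(a, b):
    # Sum of absolute per-attribute differences; no single attribute's
    # squared difference can dominate, which suits mixed-type inputs.
    return sum(abs(x - y) for x, y in zip(a, b))

print(manhattan((0.0, 0.0), (3.0, 4.0)))  # -> 7.0 (vs. 5.0 Euclidean)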

5
Q

WHAT CAN BE DONE IF THE DATASET IS VERY LARGE AND IT WOULD BE COMPUTATIONALLY EXPENSIVE TO USE KNN? P109

A

For very large training sets, KNN can be made stochastic by taking a sample from the training dataset from which to calculate the k-most similar instances.
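
A hedged sketch of that idea, reusing the knn_predict helper from card 2 (sample_size is an illustrative tuning knob, not a value from the source):

import random

def stochastic_knn(train_X, train_y, query, k=3, sample_size=1000):
    # Instead of scanning the full training set, search a random
    # subsample for the k most similar instances.
    indices = random.sample(range(len(train_X)),
                            min(sample_size, len(train_X)))
    sample_X = [train_X[i] for i in indices]
    sample_y = [train_y[i] for i in indices]
    return knn_predict(sample_X, sample_y, query, k)  # search as in card 2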

6
Q

WHAT ARE THE OTHER NAMES OF KNN? P109

A

- Instance-Based Learning: The raw training instances are used to make predictions. As such, KNN is often referred to as instance-based learning or case-based learning (where each training instance is a case from the problem domain).
- Lazy Learning: No learning of the model is required and all of the work happens at the time a prediction is requested. As such, KNN is often referred to as a lazy learning algorithm.
- Nonparametric: KNN makes no assumptions about the functional form of the problem being solved. As such, KNN is referred to as a nonparametric machine learning algorithm.

7
Q

WHAT IS THE CURSE OF DIMENSIONALITY? P110

A

KNN works well with a small number of input variables (p), but struggles when the number of inputs is very large. Each input variable adds a dimension to the input space, and the volume of that space grows exponentially with the number of dimensions, so a fixed amount of training data becomes increasingly sparse and distances between points become less meaningful. This breakdown of distance-based similarity in high dimensions is the curse of dimensionality.
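
A small demonstration of that sparsity effect (an illustration, not from the source; exact numbers vary run to run). As the dimension p grows, the nearest neighbor of a random query point is barely closer than an average point, so "nearest" carries little information:

import math
import random

random.seed(0)

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for p in (2, 10, 100, 1000):
    # 500 uniform random points in the unit hypercube, plus one query.
    points = [[random.random() for _ in range(p)] for _ in range(500)]
    query = [random.random() for _ in range(p)]
    dists = [euclidean(point, query) for point in points]
    # Ratio near 0: a clear nearest neighbor exists.
    # Ratio near 1: all points are roughly equidistant from the query.
    ratio = min(dists) / (sum(dists) / len(dists))
    print(f"p={p:5d}  min/mean distance = {ratio:.2f}")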

8
Q

HOW CAN WE PREPARE DATA FOR KNN? P110

A

- Rescale Data: KNN performs much better if all of the data has the same scale. Normalizing your data to the range between 0 and 1 is a good idea. It may also be a good idea to standardize your data if it has a Gaussian distribution (a rescaling sketch follows this list).
- Address Missing Data: Missing data will mean that the distance between samples cannot be calculated. These samples could be excluded or the missing values could be imputed.
- Lower Dimensionality: KNN is suited to lower-dimensional data. You can try it on high-dimensional data (hundreds or thousands of input variables), but be aware that it may not perform as well as other techniques. KNN can benefit from feature selection that reduces the dimensionality of the input feature space.
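
A minimal sketch of min-max normalization to [0, 1], applied column by column (the function name rescale_01 is illustrative):

def rescale_01(rows):
    # Min-max normalize each column to [0, 1] so that no single
    # attribute dominates the distance calculation.
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        [(value - lo) / (hi - lo) if hi > lo else 0.0
         for value, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

# Usage: age in years next to income in dollars; unscaled, income
# would swamp age in any distance measure.
print(rescale_01([[25, 40000], [35, 60000], [45, 80000]]))
# -> [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]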
