L17 - KNN and Weighted-KNN Flashcards

1
Q

What type of model is KNN?

A

A supervised, non-parametric learning model, most commonly used for classification.

2
Q

What core assumption does the model work on?

A

Data points that are close together in feature space are likely to belong to the same class.

3
Q

What is the majority vote concept of KNN?

A

The new data point is classified based on the majority class of the surrounding data points.
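The majority-vote idea can be sketched in a few lines of Python (a minimal illustration; the point coordinates, labels, and function name are invented for the example):

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Distance from the query to every training point (Euclidean).
    dists = [(math.dist(p, query), label)
             for p, label in zip(train_points, train_labels)]
    # Take the k closest points and count their class labels.
    k_nearest = sorted(dists)[:k]
    votes = Counter(label for _, label in k_nearest)
    return votes.most_common(1)[0][0]

# Toy data: class "A" clusters near the origin, class "B" near (5, 5).
points = [(0, 0), (1, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (0.5, 0.5), k=3))  # -> A
```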

4
Q

Which distance metric is used to determine similarity?

A

The Euclidean distance metric (the most common choice; other metrics, such as Manhattan distance, can also be used).
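Euclidean distance is the straight-line distance between two points, generalised to any number of features (a small sketch; the function name is invented):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(euclidean((0, 0), (3, 4)))  # -> 5.0 (the classic 3-4-5 triangle)
```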

5
Q

What is K? What type of parameter is it?

A

A hyper-parameter of the algorithm, representing the number of nearest neighbours assessed in the majority vote.

6
Q

What are the 2 main issues that can skew the classification performance?

A

Outliers - If a class has outliers close to a cluster of a different class, they can cause incorrect classification of new data.
Class imbalance - If one class heavily outnumbers another, its points can dominate the vote and cause incorrect classification.

7
Q

What is the solution to Outliers and Class Imbalance?

A

Weighting data points by the inverse of their distance to the new data point.

8
Q

Why is the selection of K so important?

A

K determines the number of neighbours to assess.

It must be chosen carefully: too small a K makes the model sensitive to noise and outliers (overfitting), while too large a K over-smooths the class boundaries (under-fitting).

9
Q

Why should K be an odd number?

A

To avoid classification ties: in binary classification, an odd K means the vote can never be split evenly between the two classes.

10
Q

If there are a high number of outliers, should a high or low K be chosen? Give reason…

A

High K.

To compensate for the outliers by having a wider spread of data points to assess.

This will ensure nearby class clusters can be assessed, which will outweigh the outlier points.

11
Q

What are the 2 main methods for choosing K? Explain each…

A

Incremental: Start with K = 1 and increment by 1. At each increment, run a classification test on held-out test data to measure performance with that K, and keep the best-performing value.

Square Root: Set K to the square root of the total number of data points, i.e. K ≈ √N.
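The square-root rule can be sketched as below, combined with the odd-K rule from the previous card (the function name and rounding choice are illustrative, not a fixed convention):

```python
import math

def square_root_k(n_points):
    """Square-root rule of thumb: K ~ sqrt(N), nudged to an odd value."""
    k = round(math.sqrt(n_points))
    return k if k % 2 == 1 else k + 1  # keep K odd to avoid classification ties

print(square_root_k(100))  # -> 11 (sqrt(100) = 10, bumped to the next odd number)
```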

12
Q

What is the purpose of applying Weighted KNN? How does it work?

A

Compensates for outliers and class imbalance by assigning a higher weight to nearer points.

All K points are assessed, and points of the same class have their weights summed. The new data point is assigned to the class with the greatest total weight.
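The weighted vote can be sketched as follows (a minimal illustration with invented data: two "B" outliers sit near the "A" cluster, so a plain majority vote with K = 5 would pick "B", but inverse-distance weighting lets the very close "A" points win):

```python
import math
from collections import defaultdict

def weighted_knn_classify(train_points, train_labels, query, k, eps=1e-9):
    """Weighted KNN: each of the k nearest points votes with weight 1 / distance."""
    k_nearest = sorted(
        (math.dist(p, query), label)
        for p, label in zip(train_points, train_labels)
    )[:k]
    weights = defaultdict(float)
    for d, label in k_nearest:
        weights[label] += 1.0 / (d + eps)  # eps guards against zero distance
    # Assign the query to the class with the greatest summed weight.
    return max(weights, key=weights.get)

# Class "A" sits near the origin; two "B" outliers lie close by, while the
# rest of class "B" is far away near (5, 5).
points = [(0, 0), (-0.3, 0), (1.0, 0), (1.1, 0), (5, 5), (5.2, 5)]
labels = ["A", "A", "B", "B", "B", "B"]
print(weighted_knn_classify(points, labels, (0.1, 0), k=5))  # -> A
```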

13
Q

What are some things to consider when using KNN or Weighted-KNN?

A

Training Data Size - Performance degrades on large data sets: KNN stores every training point and computes distances at prediction time, so complexity grows with the size of the training data.

Normalisation - All features should be normalised to lie between 0 and 1, so that no single feature dominates the distance metric.

Dimensionality - Both work better in lower dimensions, so the feature space should be reduced, e.g. through feature selection.
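The normalisation step can be sketched with simple min-max scaling (an illustrative example; the feature values and function name are invented):

```python
def min_max_normalise(column):
    """Rescale one feature column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]

# Without scaling, income (tens of thousands) would swamp age in any
# Euclidean distance; after normalisation both contribute comparably.
ages = [18, 35, 52]
incomes = [20000, 45000, 90000]
print(min_max_normalise(ages))  # -> [0.0, 0.5, 1.0]
print(min_max_normalise(incomes))
```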

14
Q

What are some advantages and disadvantages of KNN and WKNN?

A

Advantages:
- Simple to implement
- Adaptable
- Few hyper-parameters

Disadvantages:
- Computationally expensive on large data
- Doesn’t perform well on high dimensional data
- Prone to overfitting, particularly when K is small

15
Q

What are the 2 hyper-parameters of KNN?

A

K
Distance Metric
