Lecture 4 - k-Nearest Neighbours, Multiclass Classification Types Flashcards

(22 cards)

1
Q

What is k-Nearest Neighbours?

A

K-Nearest Neighbours (K-NN) is a non-parametric, instance-based machine learning algorithm used for both classification and regression. It makes predictions based on the similarity (distance) between input points and training data.
- Basic idea: similar instances will be closer to each other in the feature space.
- Based on measures of similarity: the distance between the instances in the feature space is defined by a distance metric.

2
Q

How does k-NN work?

A

For classification:
- Let’s say k=3:
1. A new data point is introduced.
2. The algorithm finds the 3 training samples closest to the new point.
If the 3 neighbours are: [Class A, Class B, Class A], the predicted class is Class A (majority vote).
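
Below is a minimal sketch of this majority-vote procedure in Python (the toy data, feature values, and k=3 are invented for illustration):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # Distance from the new point to every training point (Euclidean)
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Indices of the k closest training samples
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbours' class labels
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: two features, classes "A" and "B"
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> "A"
```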

3
Q

How is distance measured in k-NN?

A

Euclidean distance (most common)
Manhattan distance
Minkowski distance
REFER TO NOTES FOR FORMULAS
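
For reference, the standard forms of these metrics for two points x and y with n features are given below (the lecture notes may use slightly different notation):

```latex
% Euclidean distance (Minkowski with p = 2)
\[ d(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \]

% Manhattan distance (Minkowski with p = 1)
\[ d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i| \]

% Minkowski distance (general form, p >= 1)
\[ d(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} \]
```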

4
Q

What are the key properties of k-NN?

A

Memory-intensive, but simple and intuitive.
Expensive at test/prediction time, since distances to all training points must be computed.
Works better when the density of the feature space is high and similar for each class.
Requires a meaningful distance metric.
Noise and outliers may have a negative effect.

5
Q

What are the trade-offs in k-NN?

A

Small values of k: risk of overfitting (predictions become sensitive to noise and outliers near the query point).
Higher values of k: risk of underfitting (predictions are smoothed over neighbours that may belong to other classes).

6
Q

Why is k-NN sensitive to outliers, and how do you fix it?

A

It is sensitive to outliers because it makes predictions based on the nearest points, and the closest point could be an outlier, mislabelled, or noisy.
Three ways to reduce the impact on the model:
○ Increase the value of k, allowing the classifier to average over multiple nearby points
○ Use distance-weighted voting to give closer neighbours more influence by weighting their votes according to distance
○ Use attribute-weighted k-NN

7
Q

What is distance-weighted k-NN?

A

Distance-Weighted K-NN is a variation of the standard K-Nearest Neighbours algorithm where closer neighbours are given more influence in making predictions than those further away.
Rather than taking a simple majority vote (classification) or simple average (regression) among the k nearest points, DW-KNN weights each neighbour’s contribution based on its distance to the query point — the closer, the greater the weight.
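
A minimal sketch of distance-weighted voting in Python, assuming the common 1/distance weighting (other schemes such as 1/d² or Gaussian kernels are also used; eps just avoids division by zero):

```python
import numpy as np
from collections import defaultdict

def dw_knn_predict(X_train, y_train, x_new, k=3, eps=1e-9):
    dists = np.linalg.norm(X_train - x_new, axis=1)
    nearest = np.argsort(dists)[:k]
    # Each neighbour votes with weight 1/distance, so closer points count more
    scores = defaultdict(float)
    for i in nearest:
        scores[y_train[i]] += 1.0 / (dists[i] + eps)
    # Predicted class = class with the largest total weighted vote
    return max(scores, key=scores.get)
```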

8
Q

Why use distance-weighted k-NN?

A
  • In regular K-NN:
    ○ All k neighbours are treated equally, even if some are much farther from the query point.
  • In DW-KNN:
    ○ We acknowledge that closer neighbours are usually more relevant.
    ○ This reduces misclassification or poor predictions caused by distant points that are technically among the k nearest, but not truly similar.
9
Q

What is attribute-weighted k-NN?

A

Attribute-Weighted K-NN (also called Feature-Weighted K-NN) is a variation of the traditional K-NN algorithm where different features (attributes) are assigned different importance (weights) when calculating the distance between data points.
- Instead of treating all features equally, AW-KNN allows you to emphasize more relevant features and suppress noisy or irrelevant ones, making distance calculations more meaningful and reducing the impact of outliers and noise.
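
A minimal sketch of a weighted Euclidean distance; the weight vector here is assumed to be given (e.g. from feature-relevance scores or domain knowledge):

```python
import numpy as np

def weighted_euclidean(x, y, w):
    # Larger w[i] makes feature i contribute more to the distance;
    # w[i] = 0 removes the feature entirely
    return np.sqrt(np.sum(w * (x - y) ** 2))

# Hypothetical weights: feature 0 is considered twice as informative as feature 1
w = np.array([2.0, 1.0])
print(weighted_euclidean(np.array([1.0, 3.0]), np.array([2.0, 5.0]), w))
```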

10
Q

Why use attribute-weighted k-NN?

A
  • Standard K-NN assumes all features contribute equally to similarity — which is often not true.
  • Some features may be more informative than others (e.g., “symptom severity” might be more important than “patient ID”).
  • Noisy or irrelevant features can distort distances and lead to incorrect neighbours being selected.
11
Q

What is Feature Normalisation?

A

Feature normalization (also known as feature scaling) is a data preprocessing step where features are rescaled to a common range or distribution. This ensures that no single feature dominates the others in algorithms that rely on distance calculations.
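
A minimal sketch of the two most common rescalings (min-max to [0, 1] and z-score standardisation), applied column-wise with NumPy on invented data:

```python
import numpy as np

# e.g. column 0 = height in cm, column 1 = a score already in [0, 1]
X = np.array([[150.0, 0.2],
              [160.0, 0.8],
              [170.0, 0.5]])

# Min-max scaling: each feature rescaled to the range [0, 1]
X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Z-score standardisation: each feature has mean 0 and standard deviation 1
X_zscore = (X - X.mean(axis=0)) / X.std(axis=0)
```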

12
Q

Why is feature normalisation important in k-NN?

A

K-Nearest Neighbours (K-NN) calculates distance (e.g., Euclidean) between data points. If one feature has a much larger scale than the others, it can dominate the distance computation, leading the model to give undue importance to that feature — even if it’s not relevant.

13
Q

What are the Benefits of using Feature Normalisation in k-NN?

A

Ensures fair feature contribution
Improves accuracy
Speeds up convergence (for some variants)

14
Q

What are the two types of Multiclass Classification?

A

One-vs-All (OvA)
One-vs-One (OvO)

15
Q

Why do we use OvO or OvA?

A

In multiclass classification (more than two classes), many machine learning models — like Logistic Regression, SVM, or K-NN — are inherently binary classifiers. So we use strategies to extend them to handle multiple classes: OvA and OvO.

16
Q

What is One-vs-All (OvA)?

A

The idea is to break a multiclass classification problem down into several binary problems: one class versus the rest of the classes. Each classifier either answers a yes/no question or produces a score. The input is run through every classifier, and the class whose classifier gives the highest score (or answers "yes") is the prediction.

How It Works:
* For each class, train a separate classifier to distinguish that class vs. all the others.
○ Classifier 1: Class A vs (Class B and Class C)
○ Classifier 2: Class B vs (Class A and Class C)
○ Classifier 3: Class C vs (Class A and Class B)
Each classifier answers:
“Is this input my class or not?”
Prediction Phase:
* Each binary classifier outputs either:
○ A class label (Yes or No)
○ Or a score (like a probability or margin) indicating confidence
* The final predicted class is the one with:
○ The highest confidence score
○ Or the “Yes” vote if only one classifier returns a positive result


REFER TO SLIDE FOR EXAMPLE
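
As an illustration, a sketch of OvA using scikit-learn's OneVsRestClassifier (assuming scikit-learn is available; LogisticRegression is just one possible base binary classifier):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)           # 3 classes
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ova.fit(X, y)                               # trains one binary classifier per class
print(len(ova.estimators_))                 # -> 3
print(ova.predict(X[:1]))                   # class with the highest score wins
```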

17
Q

What are the pros and cons of OvA?

A
  • Pros:
    ○ Simple and easy to implement
    ○ Efficient if K is not too large
  • Cons:
    ○ Can be biased if class imbalance exists
    ○ May produce conflicting predictions when classes overlap
18
Q

What is One-vs-One (OvO)?

A

For every possible pair of classes, we train a model. Each model only compares two classes at a time and ignores the others.
When you give the system a new input, every model makes a decision between the two classes it was trained on. Each time a class is picked, it gets a vote. After all the models vote, the class with the most votes is chosen as the final answer.
There are n(n-1)/2 binary classifiers for n classes

Let’s say the test input goes through each pairwise classifier (classifier → prediction):
* A vs B → A
* A vs C → C
* B vs C → C

🔢 Vote count:
* A: 1 vote (from A vs B)
* B: 0 votes
* C: 2 votes (from A vs C and B vs C)
✅ Final prediction: Class C

REFER TO SLIDES FOR EXAMPLE
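
The same idea sketched with scikit-learn's OneVsOneClassifier, which trains the n(n-1)/2 pairwise models and aggregates their votes (the base classifier is again just an example):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)           # 3 classes -> 3*(3-1)/2 = 3 pairwise models
ovo = OneVsOneClassifier(LinearSVC())
ovo.fit(X, y)
print(len(ovo.estimators_))                 # -> 3
print(ovo.predict(X[:1]))                   # class with the most pairwise votes wins
```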

19
Q

What are the pros and cons of OvO?

A
  • Pros:
    ○ More focused classifiers (since only two classes are involved at a time)
    ○ Can be more accurate when classes are hard to separate globally
  • Cons:
    ○ Computationally expensive with large K
    ○ Requires aggregating multiple decisions (majority voting)
20
Q

What is Multilabel Classification?

A

Multilabel classification is a machine learning task where each instance can be assigned multiple labels simultaneously — rather than just one (as in standard classification).
- Unlike multiclass classification (where you pick one class out of many),
- Multilabel classification allows more than one class to be true at the same time.
=== EXAMPLE
Let’s say you’re tagging images.
Labels:
* A = “Dog”
* B = “Outdoor”
* C = “Person”
You show it a picture of a person walking a dog outside.
The model might predict:
* Dog → ✅ True
* Outdoor → ✅ True
* Person → ✅ True

So the final prediction is: [Dog, Outdoor, Person]

So multiple labels are true at once for a single instance.
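
A sketch of how such multilabel targets are usually represented, as a binary indicator matrix with one column per label (the tag lists are invented):

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Each instance can carry several tags at once
y = [["Dog", "Outdoor", "Person"], ["Dog"], ["Outdoor", "Person"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y)
print(mlb.classes_)   # ['Dog' 'Outdoor' 'Person']
print(Y)              # [[1 1 1]
                      #  [1 0 0]
                      #  [0 1 1]]
# A multilabel classifier trained on Y predicts a whole row of 0/1 values per image.
```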

21
Q

What is the evaluation metric used in Multilabel Classification?

A

F1-score per label, or macro-/micro-averaged F1, precision, and recall.
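
A sketch of computing these metrics with scikit-learn on binary indicator outputs (y_true and y_pred here are made-up toy arrays):

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Rows = instances, columns = labels (e.g. Dog, Outdoor, Person)
y_true = np.array([[1, 1, 1], [1, 0, 0], [0, 1, 1]])
y_pred = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1]])

print(f1_score(y_true, y_pred, average=None))      # F1 per label
print(f1_score(y_true, y_pred, average="macro"))   # unweighted mean over labels
print(f1_score(y_true, y_pred, average="micro"))   # pooled over all label decisions
print(precision_score(y_true, y_pred, average="micro"))
print(recall_score(y_true, y_pred, average="micro"))
```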

22
Q

What is Multioutput-Multiclass Classification?

A

Each input can have multiple labels (outputs), and each label can come from more than two possible classes (i.e., not just “yes/no”, but many options per label).
=== EXAMPLE
You have an image of a fruit, and you want to classify TWO labels for the same image:
Type of fruit (Multiclass):
- Possible values: apple, banana, grape, orange, pear – 5 classes
Colour of fruit (Multiclass):
- Possible values: orange, red, green, yellow, purple, brown – 6 classes

Say the input is a red apple: the model classifies it and outputs "apple" for the fruit label and "red" for the colour label.
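
A sketch of the fruit/colour example using scikit-learn's MultiOutputClassifier, which fits one multiclass model per output; the feature values are invented placeholders:

```python
import numpy as np
from sklearn.multioutput import MultiOutputClassifier
from sklearn.neighbors import KNeighborsClassifier

# Invented features (e.g. size, roundness); two outputs per instance:
# column 0 = fruit type, column 1 = colour
X = np.array([[7.0, 0.90], [7.2, 0.88], [12.0, 0.30], [11.5, 0.32]])
Y = np.array([["apple", "red"], ["apple", "green"],
              ["banana", "yellow"], ["banana", "yellow"]])

model = MultiOutputClassifier(KNeighborsClassifier(n_neighbors=1))
model.fit(X, Y)
print(model.predict([[6.9, 0.90]]))   # e.g. [['apple' 'red']] -- one class per output
```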