KNN Flashcards

1
Q

What is the role of the training set in the K-Nearest Neighbors (KNN) algorithm?

A. To estimate parameters for a model
B. To train a neural network
C. To serve as a reference for classifying new observations
D. To identify the principal components

A

Answer: C
Explanation: KNN is a lazy learner and uses the entire training set as a reference when classifying new observations. There is no training phase in the traditional sense.
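
A minimal sketch of this (assuming scikit-learn, which these cards do not name): fit() essentially just memorizes the training data, and the distance comparisons happen only when predict() is called.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumed library: scikit-learn. fit() stores the training set; no parameters are estimated.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)        # "training" = memorizing X_train and y_train
print(knn.predict(X_test[:5]))   # distances to the stored training set are computed here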

2
Q

What does a low value of k (e.g., 1 or 3) typically capture?

A. Global trends in the data
B. Local structure and noise
C. Principal components
D. Cross-validation error

A

Answer: B
Explanation: Low values of k capture local patterns but may also overfit due to sensitivity to noise.
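
As an illustrative sketch (synthetic data, scikit-learn assumed), a very small k tends to fit the training data almost perfectly while generalizing worse than a larger k:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, noisy data purely for illustration.
X, y = make_classification(n_samples=500, n_features=5, flip_y=0.1, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    # k=1 typically scores ~1.0 on the training set (memorizing noise) but lower on validation data.
    print(k, knn.score(X_tr, y_tr), knn.score(X_val, y_val))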

3
Q

Which of the following is a primary advantage of using the KNN algorithm?

A. Requires a small training set
B. Makes strong assumptions about the data
C. Captures complex interactions without a model
D. Offers real-time prediction even in high dimensions

A

Answer: C
Explanation: KNN can capture complex patterns in data without assuming a specific model form.

4
Q

Which distance metric is most commonly used in KNN?

A. Manhattan distance
B. Cosine similarity
C. Euclidean distance
D. Jaccard index

A

Answer: C
Explanation: Euclidean distance is computationally cheap and is the most commonly used metric in KNN.
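
For two observations a and b, the Euclidean distance is the square root of the summed squared differences across predictors. A small sketch (NumPy is an assumption here):

import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

dist = np.sqrt(np.sum((a - b) ** 2))   # square root of the summed squared differences
print(dist)                            # 5.0
print(np.linalg.norm(a - b))           # same result via the built-in vector norm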

5
Q

What is the key limitation of KNN in high-dimensional datasets?

A. Overfitting due to too much data
B. Decreasing accuracy as data increases
C. Curse of dimensionality
D. Requires data normalization

A

Answer: C
Explanation: As the number of predictors increases, all points become distant from each other, making KNN less effective. This is known as the curse of dimensionality.
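
A rough simulation (assumed setup: uniformly distributed random points) shows the effect; the distance to the nearest neighbor keeps growing as the number of predictors p increases.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
for p in (2, 10, 50, 100):
    X = rng.random((n, p))                          # n points in p dimensions
    d = np.sqrt(((X[1:] - X[0]) ** 2).sum(axis=1))  # distances from the first point to the rest
    print(p, round(d.min(), 3))                     # nearest-neighbor distance rises with p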

6
Q

When k=n (i.e., all data points are used), KNN reduces to which of the following?

A. Linear Regression
B. Naïve Bayes
C. Naïve classification rule
D. Decision Tree

A

Answer: C
Explanation: When k=n, all points are considered neighbors, so the predicted class is simply the majority class in the training set.
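
A quick sketch of this reduction (scikit-learn assumed, tiny made-up data): with k equal to the training-set size, every new point gets the training majority class.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 0, 1, 1])                      # majority class is 0

knn = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
print(knn.predict([[100.0]]))                      # -> [0], whatever the new value is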

7
Q

In KNN, how is a class predicted for a new observation in a classification problem?

A. By summing the distances of k neighbors
B. By averaging response values
C. By majority voting of k nearest neighbors
D. By applying a decision tree to the k neighbors

A

Answer: C
Explanation: The most common class among the k nearest neighbors is used to classify the new observation.
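
A from-scratch sketch of this rule (NumPy assumed; knn_predict is a hypothetical helper, not from any library): find the k closest training rows, then take a majority vote over their labels.

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distance to each row
    nearest = np.argsort(distances)[:k]                        # indices of the k closest rows
    votes = Counter(y_train[nearest])                          # tally the neighbors' labels
    return votes.most_common(1)[0][0]                          # most frequent class wins

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5]), k=3))  # -> "A"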

8
Q

In a numerical prediction task using KNN, how is the prediction made?

A. Using logistic regression
B. Averaging response values of the neighbors
C. Predicting the mode of neighbors
D. Choosing the response of the single nearest neighbor

A

Answer: B
Explanation: In regression, KNN returns the average (possibly weighted) of the response values of the k nearest neighbors.
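
A small sketch (scikit-learn assumed): the regression prediction is the plain mean of the k nearest responses; a distance-weighted mean is available via weights="distance".

import numpy as np
from sklearn.neighbors import KNeighborsRegressor

X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0]])
y = np.array([1.0, 2.0, 3.0, 10.0, 11.0])

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
print(knn.predict([[2.5]]))   # -> [2.0], the mean of the responses at x = 1, 2, 3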

9
Q

What method is recommended to address the curse of dimensionality in KNN?

A. Add more predictors
B. Use neural networks
C. Use PCA to reduce dimensions
D. Increase k to a very large number

A

Answer: C
Explanation: PCA (Principal Component Analysis) is commonly used to reduce dimensionality and improve KNN performance.
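
One common way to wire this up (scikit-learn assumed) is a pipeline that standardizes, projects onto a few principal components, then applies KNN:

from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Scale first: both PCA and distance computations are sensitive to feature scale.
model = make_pipeline(StandardScaler(),
                      PCA(n_components=5),
                      KNeighborsClassifier(n_neighbors=5))
print(cross_val_score(model, X, y, cv=5).mean())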

10
Q

Why is KNN referred to as a “lazy learner”?

A. It memorizes only the final model
B. It does not generalize well to unseen data
C. It does not build a model; it stores training data for on-the-fly computation
D. It stops learning after one epoch

A

Answer: C
Explanation: KNN makes predictions at runtime by comparing new records to stored training data without building a model.

11
Q

What happens to the error rate if the value of k is too low?

A. It increases due to underfitting
B. It decreases and stabilizes
C. It increases due to overfitting and noise sensitivity
D. It remains unchanged

A

Answer: C
Explanation: Low values of k make the model sensitive to local noise, which increases the risk of overfitting.

12
Q

How is the best value of k typically selected?

A. By minimizing the training error
B. By maximizing the distance to the nearest neighbor
C. By trial and error
D. By minimizing the classification error on validation data

A

Answer: D
Explanation: The optimal value of k is usually chosen based on the lowest validation error.
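
A minimal sketch of that selection (scikit-learn assumed): score a grid of candidate k values on a held-out validation set and keep the k with the lowest validation error, i.e. the highest accuracy.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

scores = {k: KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_val, y_val)
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)          # k with the lowest validation error
print(best_k, scores[best_k])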

13
Q

Which of the following is true about KNN?

A. It is a model-based method
B. It makes parametric assumptions about data
C. It is a non-parametric, data-driven approach
D. It requires prior distributional knowledge

A

Answer: C
Explanation: KNN is non-parametric and does not assume any functional form or distribution of data.

14
Q

What is the impact of increasing the number of predictors (p) in a dataset used for KNN?

A. Improves accuracy due to more information
B. Decreases computation time
C. Increases the expected distance to the nearest neighbor
D. Reduces the need for normalization

A

Answer: C
Explanation: As the number of predictors increases, all points tend to become far apart, a key symptom of the “curse of dimensionality.”

15
Q

In multi-class classification using KNN, how is the final class assigned?

A. Weighted average of neighbor responses
B. Assign based on the closest class
C. Class with the highest frequency among k neighbors
D. Randomly among the classes in the neighbor group

A

Answer: C
Explanation: The new instance is classified into the most common class among the k nearest neighbors.

16
Q

Why is KNN not ideal for real-time predictions in large datasets?

A. It requires training time that is too long
B. It stores too little information
C. It must compute distance to all records at prediction time
D. It builds complex decision trees

A

Answer: C
Explanation: KNN computes distances to all records during prediction, making it slow and computationally expensive for real-time use.

17
Q

Which scenario is an appropriate use case for KNN?

A. Classifying email as spam or not spam
B. Predicting stock prices in high-frequency trading
C. Recommending movies using collaborative filtering
D. Analyzing structured time series data

A

Answer: A
Explanation: KNN is well-suited for binary classification problems like spam detection, especially when data is structured and features are meaningful.

18
Q

Which of the following can be used to improve KNN performance?

A. Increase the number of classes
B. Avoid normalizing features
C. Use dimensionality reduction methods like PCA
D. Always choose k=1

A

Answer: C
Explanation: Dimensionality reduction helps alleviate the curse of dimensionality and improves KNN efficiency and accuracy.

19
Q

Which KNN feature makes it capable of capturing complex patterns?

A. It performs PCA internally
B. It uses advanced hyperparameter tuning
C. It directly compares input features without modeling
D. It uses a weighted decision tree

A

Answer: C
Explanation: KNN relies on the similarity of instances, which inherently captures complex feature interactions without building a model.

20
Q

What does the term “majority decision rule” refer to in KNN?

A. Choosing the mode of the outcome variable in training data
B. Using the most frequent class among the neighbors to classify a record
C. Voting between models
D. Assigning random class labels based on frequency

A

Answer: B
Explanation: In classification, KNN assigns the class label that occurs most frequently among the k nearest neighbors.