KNN Flashcards
What is the role of the training set in the K-Nearest Neighbors (KNN) algorithm?
A. To estimate parameters for a model
B. To train a neural network
C. To serve as a reference for classifying new observations
D. To identify the principal components
Answer: C
Explanation: KNN is a lazy learner and uses the entire training set as a reference when classifying new observations. There is no training in the traditional sense.
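A minimal sketch of this idea, assuming a small toy dataset and plain NumPy (the arrays X_train, y_train, and the query point are illustrative): the "model" is nothing more than the stored training set, and classification happens by comparing a new point against it.

```python
import numpy as np
from collections import Counter

# The "training" step is just storing the data; no parameters are estimated.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["A", "A", "B", "B"])

def knn_predict(x_new, k=3):
    # Distance from the new point to every stored training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Labels of the k closest training points, then a majority vote.
    nearest = y_train[np.argsort(dists)[:k]]
    return Counter(nearest).most_common(1)[0][0]

print(knn_predict(np.array([1.2, 1.9])))  # expected to print "A"
```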
What does a low value of k (e.g., 1 or 3) typically capture?
A. Global trends in the data
B. Local structure and noise
C. Principal components
D. Cross-validation error
Answer: B
Explanation: Low values of k capture local patterns but may also overfit due to sensitivity to noise.
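One hedged way to see this, assuming scikit-learn and a noisy synthetic dataset (make_classification with flip_y adds label noise; the exact sizes are arbitrary): k=1 typically scores perfectly on the training data but worse on held-out data than a larger k.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, knn.score(X_tr, y_tr), knn.score(X_val, y_val))
# k=1 fits the training data (and its noise) exactly; k=15 usually validates better.
```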
Which of the following is a primary advantage of using the KNN algorithm?
A. Requires a small training set
B. Makes strong assumptions about the data
C. Captures complex interactions without a model
D. Offers real-time prediction even in high dimensions
Answer: C
Explanation: KNN can capture complex patterns in data without assuming a specific model form.
Which distance metric is most commonly used in KNN?
A. Manhattan distance
B. Cosine similarity
C. Euclidean distance
D. Jaccard index
Answer: C
Explanation: Euclidean distance is computationally cheap and is the most commonly used metric in KNN.
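A quick illustration of Euclidean distance, assuming two NumPy feature vectors with arbitrary example values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the sum of squared coordinate differences.
dist = np.sqrt(np.sum((a - b) ** 2))
print(dist)                   # 5.0
print(np.linalg.norm(a - b))  # same result via NumPy's norm helper
```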
What is the key limitation of KNN in high-dimensional datasets?
A. Overfitting due to too much data
B. Decreasing accuracy as data increases
C. Curse of dimensionality
D. Requires data normalization
Answer: C
Explanation: As the number of predictors increases, all points become distant from each other, making KNN less effective. This is known as the curse of dimensionality.
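A small simulation sketch of this effect, assuming uniformly distributed synthetic points (the sample size of 1,000 and the chosen dimensions are arbitrary): the distance to the nearest neighbor grows as predictors are added.

```python
import numpy as np

rng = np.random.default_rng(0)
for p in (2, 10, 50, 100):
    X = rng.uniform(size=(1000, p))   # 1,000 random points in p dimensions
    x = rng.uniform(size=p)           # one query point
    dists = np.linalg.norm(X - x, axis=1)
    print(p, round(dists.min(), 2))   # nearest-neighbor distance rises with p
```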
When k=n (i.e., all data points are used), KNN reduces to which of the following?
A. Linear Regression
B. Naïve Bayes
C. Naïve classification rule
D. Decision Tree
Answer: C
Explanation: When k=n, all points are considered neighbors, so the predicted class is simply the majority class in the training set.
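A sketch of this edge case, assuming scikit-learn and a tiny imbalanced toy set (the arrays are illustrative): with n_neighbors equal to the number of training rows, every prediction is the overall majority class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 0, 1, 1])              # majority class is 0

knn = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
print(knn.predict([[0.1], [3.9]]))         # both predicted as 0, the majority class
```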
In KNN, how is a class predicted for a new observation in a classification problem?
A. By summing the distances of k neighbors
B. By averaging response values
C. By majority voting of k nearest neighbors
D. By applying a decision tree to the k neighbors
Answer: C
Explanation: The most common class among the k nearest neighbors is used to classify the new observation.
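A hedged illustration of the voting step using scikit-learn's kneighbors method (the make_blobs dataset and the query point are just assumed examples): the prediction matches the most common label among the returned neighbors.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=60, centers=3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

x_new = np.array([[0.0, 2.0]])
_, idx = knn.kneighbors(x_new)        # indices of the 5 nearest neighbors
votes = Counter(y[idx[0]])            # tally the neighbors' class labels
print(votes, knn.predict(x_new))      # prediction = most common neighbor class
```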
In a numerical prediction task using KNN, how is the prediction made?
A. Using logistic regression
B. Averaging response values of the neighbors
C. Predicting the mode of neighbors
D. Choosing the response of the single nearest neighbor
Answer: B
Explanation: In regression, KNN returns the average (possibly weighted) of the response values of the k nearest neighbors.
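A minimal regression sketch, assuming scikit-learn and a small synthetic dataset: with the default uniform weighting, the KNN prediction equals the plain average of the k nearest response values (weights='distance' would give a weighted average instead).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=50)

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
x_new = np.array([[5.0]])

_, idx = knn.kneighbors(x_new)
print(y[idx[0]].mean())       # manual average of the 3 nearest responses
print(knn.predict(x_new)[0])  # matches the regressor's prediction
```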
What method is recommended to address the curse of dimensionality in KNN?
A. Add more predictors
B. Use neural networks
C. Use PCA to reduce dimensions
D. Increase k to a very large number
Answer: C
Explanation: PCA (Principal Component Analysis) is commonly used to reduce dimensionality and improve KNN performance.
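One common way to wire this up, assuming scikit-learn (the digits dataset and the choice of 10 components are illustrative): scale the features, project onto principal components, then run KNN in the reduced space.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale, reduce the 64 pixel features to 10 components, then classify with KNN.
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```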
Why is KNN referred to as a “lazy learner”?
A. It memorizes only the final model
B. It does not generalize well to unseen data
C. It does not build a model; it stores training data for on-the-fly computation
D. It stops learning after one epoch
Answer: C
Explanation: KNN makes predictions at runtime by comparing new records to stored training data without building a model.
What happens to the error rate if the value of k is too low?
A. It increases due to underfitting
B. It decreases and stabilizes
C. It increases due to overfitting and noise sensitivity
D. It remains unchanged
Answer: C
Explanation: Low values of k make the model sensitive to local noise, which increases the risk of overfitting.
How is the best value of k typically selected?
A. By minimizing the training error
B. By maximizing the distance to the nearest neighbor
C. By trial and error
D. By minimizing the classification error on validation data
Answer: D
Explanation: The optimal value of k is usually chosen based on the lowest validation error.
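A simple sketch of that selection loop, assuming scikit-learn and a held-out validation split (the candidate k values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

errors = {}
for k in range(1, 21, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    errors[k] = 1 - knn.score(X_val, y_val)  # validation classification error

best_k = min(errors, key=errors.get)         # k with the lowest validation error
print(best_k, errors[best_k])
```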
Which of the following is true about KNN?
A. It is a model-based method
B. It makes parametric assumptions about data
C. It is a non-parametric, data-driven approach
D. It requires prior distributional knowledge
Answer: C
Explanation: KNN is non-parametric and does not assume any functional form or distribution of data.
What is the impact of increasing the number of predictors (p) in a dataset used for KNN?
A. Improves accuracy due to more information
B. Decreases computation time
C. Increases the expected distance to the nearest neighbor
D. Reduces the need for normalization
Answer: C
Explanation: As the number of predictors increases, all points tend to become far apart, a key symptom of the “curse of dimensionality.”
In multi-class classification using KNN, how is the final class assigned?
A. Weighted average of neighbor responses
B. Assign based on the closest class
C. Class with the highest frequency among k neighbors
D. Randomly among the classes in the neighbor group
Answer: C
Explanation: The new instance is classified into the most common class among the k nearest neighbors.
Why is KNN not ideal for real-time predictions in large datasets?
A. It requires training time that is too long
B. It stores too little information
C. It must compute distance to all records at prediction time
D. It builds complex decision trees
Answer: C
Explanation: KNN computes distances to all records during prediction, making it slow and computationally expensive for real-time use.
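A rough timing sketch of this point (the dataset size is illustrative, not a benchmark; brute-force search is forced so the cost is visible): fitting just stores the data and is nearly instant, while each predict call pays the cost of computing distances to every stored record.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 30))
y = rng.integers(0, 2, size=200_000)

knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

t0 = time.perf_counter(); knn.fit(X, y); t1 = time.perf_counter()
knn.predict(X[:100]); t2 = time.perf_counter()

print(f"fit:     {t1 - t0:.3f} s")  # essentially just stores the data
print(f"predict: {t2 - t1:.3f} s")  # distance computations dominate
```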
Which scenario is an appropriate use case for KNN?
A. Classifying email as spam or not spam
B. Predicting stock prices in high-frequency trading
C. Recommending movies using collaborative filtering
D. Analyzing structured time series data
Answer: A
Explanation: KNN is well-suited for binary classification problems like spam detection, especially when data is structured and features are meaningful.
Which of the following can be used to improve KNN performance?
A. Increase the number of classes
B. Avoid normalizing features
C. Use dimensionality reduction methods like PCA
D. Always choose k=1
Answer: C
Explanation: Dimensionality reduction helps alleviate the curse of dimensionality and improves KNN efficiency and accuracy.
Which KNN feature makes it capable of capturing complex patterns?
A. It performs PCA internally
B. It uses advanced hyperparameter tuning
C. It directly compares input features without modeling
D. It uses a weighted decision tree
Answer: C
Explanation: KNN relies on the similarity of instances, which inherently captures complex feature interactions without building a model.
What does the term “majority decision rule” refer to in KNN?
A. Choosing the mode of the outcome variable in training data
B. Using the most frequent class among the neighbors to classify a record
C. Voting between models
D. Assigning random class labels based on frequency
Answer: B
Explanation: In classification, KNN assigns the class label that occurs most frequently among the k nearest neighbors.