KNN Flashcards
What is the role of the training set in the K-Nearest Neighbors (KNN) algorithm?
A. To estimate parameters for a model
B. To train a neural network
C. To serve as a reference for classifying new observations
D. To identify the principal components
Answer: C
Explanation: KNN is a lazy learner and uses the entire training set as a reference when classifying new observations. There is no training in the traditional sense.
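A minimal sketch of this idea, assuming a small toy dataset and plain NumPy (the arrays X_train, y_train, and the query point are illustrative): the "model" is nothing more than the stored training set, and classification happens by comparing a new point against it.

```python
import numpy as np
from collections import Counter

# The "training" step is just storing the data; no parameters are estimated.
X_train = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]])
y_train = np.array(["A", "A", "B", "B"])

def knn_predict(x_new, k=3):
    # Distance from the new point to every stored training point.
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # Labels of the k closest training points, then a majority vote.
    nearest = y_train[np.argsort(dists)[:k]]
    return Counter(nearest).most_common(1)[0][0]

print(knn_predict(np.array([1.2, 1.9])))  # expected to print "A"
```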
What does a low value of k (e.g., 1 or 3) typically capture?
A. Global trends in the data
B. Local structure and noise
C. Principal components
D. Cross-validation error
Answer: B
Explanation: Low values of k capture local patterns but may also overfit due to sensitivity to noise.
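One hedged way to see this, assuming scikit-learn and a noisy synthetic dataset (make_classification with flip_y adds label noise; the exact sizes are arbitrary): k=1 typically scores perfectly on the training data but worse on held-out data than a larger k.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for k in (1, 15):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(k, knn.score(X_tr, y_tr), knn.score(X_val, y_val))
# k=1 fits the training data (and its noise) exactly; k=15 usually validates better.
```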
Which of the following is a primary advantage of using the KNN algorithm?
A. Requires a small training set
B. Makes strong assumptions about the data
C. Captures complex interactions without a model
D. Offers real-time prediction even in high dimensions
Answer: C
Explanation: KNN can capture complex patterns in data without assuming a specific model form.
Which distance metric is most commonly used in KNN?
A. Manhattan distance
B. Cosine similarity
C. Euclidean distance
D. Jaccard index
Answer: C
Explanation: Euclidean distance is computationally cheap and is the most commonly used metric in KNN.
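A quick illustration of Euclidean distance, assuming two NumPy feature vectors with arbitrary example values:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 6.0, 3.0])

# Euclidean distance: square root of the sum of squared coordinate differences.
dist = np.sqrt(np.sum((a - b) ** 2))
print(dist)                   # 5.0
print(np.linalg.norm(a - b))  # same result via NumPy's norm helper
```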
What is the key limitation of KNN in high-dimensional datasets?
A. Overfitting due to too much data
B. Decreasing accuracy as data increases
C. Curse of dimensionality
D. Requires data normalization
Answer: C
Explanation: As the number of predictors increases, all points become distant from each other, making KNN less effective. This is known as the curse of dimensionality.
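A small simulation sketch of this effect, assuming uniformly distributed synthetic points (the sample size of 1,000 and the chosen dimensions are arbitrary): the distance to the nearest neighbor grows as predictors are added.

```python
import numpy as np

rng = np.random.default_rng(0)
for p in (2, 10, 50, 100):
    X = rng.uniform(size=(1000, p))   # 1,000 random points in p dimensions
    x = rng.uniform(size=p)           # one query point
    dists = np.linalg.norm(X - x, axis=1)
    print(p, round(dists.min(), 2))   # nearest-neighbor distance rises with p
```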
When k=n (i.e., all data points are used), KNN reduces to which of the following?
A. Linear Regression
B. Naïve Bayes
C. Naïve classification rule
D. Decision Tree
Answer: C
Explanation: When k=n, all points are considered neighbors, so the predicted class is simply the majority class in the training set.
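A sketch of this edge case, assuming scikit-learn and a tiny imbalanced toy set (the arrays are illustrative): with n_neighbors equal to the number of training rows, every prediction is the overall majority class.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 0, 1, 1])              # majority class is 0

knn = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)
print(knn.predict([[0.1], [3.9]]))         # both predicted as 0, the majority class
```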
In KNN, how is a class predicted for a new observation in a classification problem?
A. By summing the distances of k neighbors
B. By averaging response values
C. By majority voting of k nearest neighbors
D. By applying a decision tree to the k neighbors
Answer: C
Explanation: The most common class among the k nearest neighbors is used to classify the new observation.
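A hedged illustration of the voting step using scikit-learn's kneighbors method (the make_blobs dataset and the query point are just assumed examples): the prediction matches the most common label among the returned neighbors.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=60, centers=3, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

x_new = np.array([[0.0, 2.0]])
_, idx = knn.kneighbors(x_new)        # indices of the 5 nearest neighbors
votes = Counter(y[idx[0]])            # tally the neighbors' class labels
print(votes, knn.predict(x_new))      # prediction = most common neighbor class
```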
In a numerical prediction task using KNN, how is the prediction made?
A. Using logistic regression
B. Averaging response values of the neighbors
C. Predicting the mode of neighbors
D. Choosing the response of the single nearest neighbor
Answer: B
Explanation: In regression, KNN returns the average (possibly weighted) of the response values of the k nearest neighbors.
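A minimal regression sketch, assuming scikit-learn and a small synthetic dataset: with the default uniform weighting, the KNN prediction equals the plain average of the k nearest response values (weights='distance' would give a weighted average instead).

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + rng.normal(scale=0.5, size=50)

knn = KNeighborsRegressor(n_neighbors=3).fit(X, y)
x_new = np.array([[5.0]])

_, idx = knn.kneighbors(x_new)
print(y[idx[0]].mean())       # manual average of the 3 nearest responses
print(knn.predict(x_new)[0])  # matches the regressor's prediction
```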
What method is recommended to address the curse of dimensionality in KNN?
A. Add more predictors
B. Use neural networks
C. Use PCA to reduce dimensions
D. Increase k to a very large number
Answer: C
Explanation: PCA (Principal Component Analysis) is commonly used to reduce dimensionality and improve KNN performance.
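One common way to wire this up, assuming scikit-learn (the digits dataset and the choice of 10 components are illustrative): scale the features, project onto principal components, then run KNN in the reduced space.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scale, reduce the 64 pixel features to 10 components, then classify with KNN.
model = make_pipeline(StandardScaler(), PCA(n_components=10),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X_tr, y_tr)
print(model.score(X_te, y_te))
```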
Why is KNN referred to as a “lazy learner”?
A. It memorizes only the final model
B. It does not generalize well to unseen data
C. It does not build a model; it stores training data for on-the-fly computation
D. It stops learning after one epoch
Answer: C
Explanation: KNN makes predictions at runtime by comparing new records to stored training data without building a model.
What happens to the error rate if the value of k is too low?
A. It increases due to underfitting
B. It decreases and stabilizes
C. It increases due to overfitting and noise sensitivity
D. It remains unchanged
Answer: C
Explanation: Low values of k make the model sensitive to local noise, which increases the risk of overfitting.
How is the best value of k typically selected?
A. By minimizing the training error
B. By maximizing the distance to the nearest neighbor
C. By trial and error
D. By minimizing the classification error on validation data
Answer: D
Explanation: The optimal value of k is usually chosen based on the lowest validation error.
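A simple sketch of that selection loop, assuming scikit-learn and a held-out validation split (the candidate k values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=1)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=1)

errors = {}
for k in range(1, 21, 2):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    errors[k] = 1 - knn.score(X_val, y_val)  # validation classification error

best_k = min(errors, key=errors.get)         # k with the lowest validation error
print(best_k, errors[best_k])
```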
Which of the following is true about KNN?
A. It is a model-based method
B. It makes parametric assumptions about data
C. It is a non-parametric, data-driven approach
D. It requires prior distributional knowledge
Answer: C
Explanation: KNN is non-parametric and does not assume any functional form or distribution of data.
What is the impact of increasing the number of predictors (p) in a dataset used for KNN?
A. Improves accuracy due to more information
B. Decreases computation time
C. Increases the expected distance to the nearest neighbor
D. Reduces the need for normalization
Answer: C
Explanation: As the number of predictors increases, all points tend to become far apart, a key symptom of the “curse of dimensionality.”
In multi-class classification using KNN, how is the final class assigned?
A. Weighted average of neighbor responses
B. Assign based on the closest class
C. Class with the highest frequency among k neighbors
D. Randomly among the classes in the neighbor group
Answer: C
Explanation: The new instance is classified into the most common class among the k nearest neighbors.
Why is KNN not ideal for real-time predictions in large datasets?
A. It requires training time that is too long
B. It stores too little information
C. It must compute distance to all records at prediction time
D. It builds complex decision trees
Answer: C
Explanation: KNN computes distances to all records during prediction, making it slow and computationally expensive for real-time use.
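A rough timing sketch of this point (the dataset size is illustrative, not a benchmark; brute-force search is forced so the cost is visible): fitting just stores the data and is nearly instant, while each predict call pays the cost of computing distances to every stored record.

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200_000, 30))
y = rng.integers(0, 2, size=200_000)

knn = KNeighborsClassifier(n_neighbors=5, algorithm="brute")

t0 = time.perf_counter(); knn.fit(X, y); t1 = time.perf_counter()
knn.predict(X[:100]); t2 = time.perf_counter()

print(f"fit:     {t1 - t0:.3f} s")  # essentially just stores the data
print(f"predict: {t2 - t1:.3f} s")  # distance computations dominate
```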
Which scenario is an appropriate use case for KNN?
A. Classifying email as spam or not spam
B. Predicting stock prices in high-frequency trading
C. Recommending movies using collaborative filtering
D. Analyzing structured time series data
Answer: A
Explanation: KNN is well-suited for binary classification problems like spam detection, especially when data is structured and features are meaningful.
Which of the following can be used to improve KNN performance?
A. Increase the number of classes
B. Avoid normalizing features
C. Use dimensionality reduction methods like PCA
D. Always choose k=1
Answer: C
Explanation: Dimensionality reduction helps alleviate the curse of dimensionality and improves KNN efficiency and accuracy.
Which KNN feature makes it capable of capturing complex patterns?
A. It performs PCA internally
B. It uses advanced hyperparameter tuning
C. It directly compares input features without modeling
D. It uses a weighted decision tree
Answer: C
Explanation: KNN relies on the similarity of instances, which inherently captures complex feature interactions without building a model.
What does the term “majority decision rule” refer to in KNN?
A. Choosing the mode of the outcome variable in training data
B. Using the most frequent class among the neighbors to classify a record
C. Voting between models
D. Assigning random class labels based on frequency
Answer: B
Explanation: In classification, KNN assigns the class label that occurs most frequently among the k nearest neighbors.