Naïve Bayes 2 Flashcards
What is the key assumption made by the Naïve Bayes algorithm?
A. All variables are numerical
B. Predictor variables are independent
C. Predictor variables are correlated
D. The data must be normally distributed
Answer: B. Predictor variables are independent
Explanation: Naïve Bayes assumes that the predictors are conditionally independent given the class label. This is known as the “naïve” assumption.
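A minimal sketch of what this assumption buys computationally, using made-up conditional probabilities (the values and feature names below are hypothetical, not from the deck):

```python
# Hypothetical conditionals P(x_i | Play) for one record's feature values.
p_given_play = {"Sunny": 0.4, "Cold": 0.3, "HighHumidity": 0.35, "Windy": 0.3}

# Under conditional independence, the joint likelihood factors:
# P(X | Play) = P(x1 | Play) * P(x2 | Play) * ... * P(xn | Play)
p_x_given_play = 1.0
for value, prob in p_given_play.items():
    p_x_given_play *= prob

print(p_x_given_play)  # 0.4 * 0.3 * 0.35 * 0.3 = 0.0126
```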
Which of the following is NOT a strength of Naïve Bayes?
A. It works well with high-dimensional data
B. It handles missing data naturally
C. It’s fast and easy to implement
D. It requires large amounts of training data
Answer: D. It requires large amounts of training data
Explanation: Naïve Bayes is effective even with small datasets and is easy to train, but its independence assumption can be limiting.
What should be done to numerical variables before using Naïve Bayes?
A. Normalize them
B. Drop them
C. Bin and convert to categorical
D. Standardize them
Answer: C. Bin and convert to categorical
Explanation: Naïve Bayes requires categorical inputs, so numeric data must be binned into categories.
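One common way to do this binning, sketched with pandas (the column name and bin edges are hypothetical):

```python
import pandas as pd

# Hypothetical numeric temperatures; pd.cut assigns each value to a
# labeled bin so the column can be used as a categorical predictor.
df = pd.DataFrame({"temperature": [31, 48, 60, 72, 85, 95]})
df["temp_bin"] = pd.cut(
    df["temperature"],
    bins=[-float("inf"), 50, 75, float("inf")],
    labels=["Cold", "Mild", "Hot"],
)
print(df)
```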
In the given classification example, what is the final classification result for the day described as Sunny, Cold, High Humidity, and Windy?
A. Play
B. No Play
Answer: B. No Play
Explanation: In the image from the deck, the computed score for No Play (0.9) is greater than the score for Play (0.2), so the record is classified as No Play.
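A hedged sketch of the comparison described above. The conditional probabilities and priors below are hypothetical stand-ins for the values shown in the image:

```python
# Hypothetical conditionals for X = (Sunny, Cold, High Humidity, Windy).
p_x_given_play = 0.4 * 0.3 * 0.35 * 0.3      # P(X | Play)
p_x_given_no_play = 0.6 * 0.5 * 0.7 * 0.6    # P(X | No Play)
p_play, p_no_play = 0.64, 0.36               # hypothetical priors

score_play = p_x_given_play * p_play
score_no_play = p_x_given_no_play * p_no_play
print("No Play" if score_no_play > score_play else "Play")  # "No Play"
```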
Which of the following problems can occur in Naïve Bayes if a predictor value never occurs with a certain class in the training data?
A. Overfitting
B. Bias-variance tradeoff
C. Zero conditional probability
D. Missing data issue
Answer: C. Zero conditional probability
Explanation: If a predictor value never appears with a class in training data, the probability becomes 0, which breaks the classification calculation.
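A quick sketch of why one unseen value zeroes out the entire score (the values are hypothetical):

```python
# If the first predictor value never co-occurred with this class in
# training, its estimated conditional probability is 0, and the
# product of conditionals collapses to 0 no matter what else is true.
conditionals = [0.0, 0.5, 0.7, 0.6]

score = 1.0
for p in conditionals:
    score *= p
print(score)  # 0.0 -- this class can never win, regardless of other evidence
```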
What is the purpose of using a probability cutoff in Naïve Bayes classification?
A. To improve the independence assumption
B. To reduce the number of predictors
C. To handle continuous variables
D. To define the threshold for class assignment
Answer: D. To define the threshold for class assignment
Explanation: A cutoff (e.g., 0.5) allows classification decisions to be based on whether predicted probabilities exceed a certain threshold.
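Sketch of applying a cutoff to posterior estimates; the 0.5 threshold mirrors the explanation above, and the probabilities are hypothetical:

```python
# Hypothetical posterior P(Play | X) for a few records.
posteriors = [0.81, 0.47, 0.52, 0.10]
cutoff = 0.5  # threshold for assigning the "Play" class

labels = ["Play" if p >= cutoff else "No Play" for p in posteriors]
print(labels)  # ['Play', 'No Play', 'Play', 'No Play']
```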
What type of learning does Naïve Bayes fall under?
A. Reinforcement learning
B. Unsupervised learning
C. Supervised learning
D. Semi-supervised learning
Answer: C. Supervised learning
Explanation: It uses labeled training data to learn how to classify new instances.
Which real-world application was mentioned as an example of Naïve Bayes usage?
A. Stock price prediction
B. Spell check programs
C. Video recommendation systems
D. Customer segmentation
Answer: B. Spell check programs
Explanation: Naïve Bayes can classify misspelled words into the most probable correct word class.
Why is Naïve Bayes considered “naïve”?
A. It is not effective on real data
B. It assumes perfect predictions
C. It assumes independence between features
D. It is an outdated method
Answer: C. It assumes independence between features
Explanation: The independence assumption is often violated in practice, but it simplifies computation significantly.
Which of the following best describes how Naïve Bayes classifies a new record?
A. It finds exact matches in the training data.
B. It uses the average of the nearest neighbors.
C. It calculates the likelihood of each class given the predictors.
D. It builds a decision tree from training data.
Answer: C. It calculates the likelihood of each class given the predictors.
Explanation: Naïve Bayes uses Bayes’ Theorem to calculate P(Class∣X), then selects the class with the highest posterior probability.
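A compact from-scratch sketch of this procedure on a toy categorical dataset (all data and names are hypothetical): estimate priors and conditionals from counts, then score each class for a new record.

```python
from collections import Counter, defaultdict

# Toy training data: (outlook, wind) -> class label. All hypothetical.
X = [("Sunny", "Windy"), ("Sunny", "Calm"), ("Rain", "Windy"),
     ("Rain", "Calm"), ("Sunny", "Windy"), ("Rain", "Windy")]
y = ["No Play", "Play", "No Play", "Play", "No Play", "Play"]

priors = {c: n / len(y) for c, n in Counter(y).items()}

# Conditional counts for P(feature value | class), one table per feature.
cond = defaultdict(Counter)
for xs, c in zip(X, y):
    for i, v in enumerate(xs):
        cond[(i, c)][v] += 1

def posterior_scores(record):
    scores = {}
    for c, prior in priors.items():
        class_total = sum(1 for label in y if label == c)
        score = prior
        for i, v in enumerate(record):
            score *= cond[(i, c)][v] / class_total  # P(v | c)
        scores[c] = score
    return scores

scores = posterior_scores(("Sunny", "Windy"))
print(max(scores, key=scores.get))  # class with the highest posterior score
```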
In Naïve Bayes, how is the joint probability P(X∣Class) typically computed?
A. As the product of individual conditional probabilities
B. Using a regression function
C. Through k-nearest neighbors
D. By summing class frequencies
Answer: A. As the product of individual conditional probabilities
Explanation: Under the independence assumption, P(X∣Class) is computed as P(x1∣Class) × P(x2∣Class) × ⋯ × P(xn∣Class), the product of each predictor’s conditional probability.
When applying Naïve Bayes, what is one way to handle the zero-probability problem?
A. Increase the cutoff threshold
B. Use Laplace smoothing
C. Use fewer predictor variables
D. Convert all data to numerical
Answer: B. Use Laplace smoothing
Explanation: Laplace smoothing adds a small constant to frequency counts to prevent any probability from being zero.
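A minimal sketch of Laplace (add-one) smoothing for a single conditional probability estimate; the counts are hypothetical:

```python
def smoothed_conditional(count, class_total, n_values, alpha=1.0):
    """Laplace-smoothed estimate of P(value | class).

    count       -- times this value co-occurred with the class
    class_total -- total training records in the class
    n_values    -- number of distinct values the predictor can take
    alpha       -- smoothing constant (1.0 = classic add-one smoothing)
    """
    return (count + alpha) / (class_total + alpha * n_values)

# An unseen value no longer yields a hard zero:
print(smoothed_conditional(count=0, class_total=9, n_values=3))  # ~0.0833
```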
What type of variables are required to apply Naïve Bayes directly?
A. Only numerical
B. Only binary
C. Categorical
D. Ordinal
Answer: C. Categorical
Explanation: Naïve Bayes needs predictors to be categorical; numerical variables must be binned beforehand.
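If a library is preferred, scikit-learn's CategoricalNB accepts integer-encoded categorical predictors; a small sketch with hypothetical data, using OrdinalEncoder for the string-to-code step:

```python
from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical categorical training data.
X = [["Sunny", "High"], ["Rain", "Normal"],
     ["Sunny", "Normal"], ["Rain", "High"]]
y = ["No Play", "Play", "Play", "No Play"]

enc = OrdinalEncoder()
X_enc = enc.fit_transform(X)  # strings -> integer category codes

clf = CategoricalNB()  # applies Laplace smoothing (alpha=1.0) by default
clf.fit(X_enc, y)
print(clf.predict(enc.transform([["Sunny", "High"]])))  # ['No Play']
```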
Suppose a predictor value never occurred with a class label in training. What impact does this have?
A. Makes that record unclassifiable
B. Reduces model accuracy slightly
C. Makes the posterior probability for that class zero
D. Causes overfitting
Answer: C. Makes the posterior probability for that class zero
Explanation: If any conditional probability is zero, the whole product becomes zero, eliminating the class from consideration.
Which of the following assumptions makes Naïve Bayes “naïve”?
A. Data is normally distributed
B. Predictors are correlated
C. Predictors are conditionally independent given the class
D. The training data is large and diverse
Answer: C. Predictors are conditionally independent given the class
Explanation: This simplifying assumption makes calculations tractable but is rarely true in practice.
What does the denominator P(X) in Bayes’ Theorem represent?
A. The prior probability of the class
B. The likelihood of the class
C. The marginal probability of the observed predictor values
D. The sum of all prior probabilities
Answer: C. The marginal probability of the observed predictor values
Explanation: P(X) ensures that the posterior probabilities across all classes sum to 1.
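Sketch of how dividing by P(X) normalizes the class scores so they sum to 1; the joint scores below are hypothetical:

```python
# Unnormalized joint scores P(X | class) * P(class) for two classes.
joint = {"Play": 0.008064, "No Play": 0.04536}

p_x = sum(joint.values())  # marginal P(X), the Bayes denominator
posterior = {c: s / p_x for c, s in joint.items()}

print(posterior)                # normalized probability for each class
print(sum(posterior.values()))  # 1.0 (up to float rounding)
```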
In a spam detection application, which of the following would be a “class”?
A. Frequency of the word “free”
B. Total number of emails
C. Label: Spam or Not Spam
D. Subject line length
Answer: C. Label: Spam or Not Spam
Explanation: The “class” is what the model is trying to predict — in this case, whether an email is spam.
What does it mean if P(Class∣X) is high?
A. The model is confident that X belongs to that class
B. X occurs frequently in the dataset
C. The class is rare in the data
D. The cutoff threshold is too high
Answer: A. The model is confident that X belongs to that class
Explanation: A higher posterior probability indicates higher confidence in the class prediction for record X.