Classification Flashcards
Week 7
classification probabilities
P(class|features): we’re asking how likely it is that an observation belongs to a particular class, given its features.
Approaches to Classification
Generative classifiers:
Generative classifiers try to understand how data is generated, modeling both the features and the classes together.
They obtain P(class|predictors) indirectly, by first estimating the class priors P(class) and the class-conditional distributions P(predictors|class).
They rely on statistical theory, most notably Bayes’ theorem.
Discriminative classifiers:
Discriminative classifiers focus on predicting the class directly based on the observed features, without necessarily understanding the underlying data generation process.
Estimate P(class|predictors) directly.
Also referred to as conditional classifiers.
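To make the contrast concrete, here is a minimal one-dimensional generative classifier (all numbers are made up for illustration): it models P(x|class) for each class as a Gaussian and inverts with Bayes’ rule, whereas a discriminative classifier such as logistic regression would fit P(class|x) directly from the data.

```python
import math

# Hypothetical 1-D training data: one measured feature per observation, per class.
class_a = [1.0, 1.2, 0.8, 1.1, 0.9]
class_b = [3.0, 2.8, 3.2, 3.1, 2.9]

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit(data):
    # Estimate the class-conditional distribution P(x | class) by mean and variance.
    m = sum(data) / len(data)
    v = sum((d - m) ** 2 for d in data) / (len(data) - 1)
    return m, v

# Generative route: model each P(x | class), then invert with Bayes' rule.
ma, va = fit(class_a)
mb, vb = fit(class_b)
prior_a = prior_b = 0.5            # equal class sizes, so equal priors

x = 1.5                            # a new observation to classify
like_a = gaussian_pdf(x, ma, va) * prior_a
like_b = gaussian_pdf(x, mb, vb) * prior_b
p_a = like_a / (like_a + like_b)   # posterior P(class A | x)
print(f"P(A | x={x}) = {p_a:.3f}")
```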
prior probability for a class
The “prior probability for a class” is the probability of that class occurring before considering any evidence or features. It represents our initial belief about how likely each class is, before we observe any data.
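A common way to estimate priors is simply from class frequencies in the training sample. A small sketch with made-up labels:

```python
from collections import Counter

# Hypothetical labels observed in a training sample
labels = ["spam", "ham", "ham", "spam", "ham", "ham", "ham", "spam"]

counts = Counter(labels)
priors = {c: n / len(labels) for c, n in counts.items()}
print(priors)  # {'spam': 0.375, 'ham': 0.625}
```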
Bayes’ Theorem
We use Bayes’ Theorem to determine the probability of a class given a set of features (a feature vector).
Posterior Probability for a Class:
P(j∣x) represents the probability of class j given a feature vector x. This is what we want to find: the updated probability of each class after we’ve observed the features.
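Written out, using π_j for the prior of class j and f_j for its class-conditional density (the fj that appears in the LDA section below), Bayes’ Theorem gives:

```latex
P(j \mid \mathbf{x}) \;=\; \frac{\pi_j \, f_j(\mathbf{x})}{\sum_{k} \pi_k \, f_k(\mathbf{x})}
```

The denominator just normalises the posteriors so that they sum to one over the classes.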
misclassification rate
The performance of a classifier is usually measured by its misclassification rate. The misclassification rate is the proportion of observations assigned to the wrong class.
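As a quick illustration with made-up labels, the misclassification rate is just the fraction of disagreements between predictions and true labels:

```python
# Misclassification rate = fraction of predictions that disagree with the true labels
y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]

errors = sum(t != p for t, p in zip(y_true, y_pred))
rate = errors / len(y_true)
print(rate)  # 0.2 (1 wrong out of 5)
```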
Linear Discriminant Analysis
- Linear Discriminant Analysis is often abbreviated to LDA.
- LDA is applicable when all the features are quantitative.
- We assume that each class-conditional density fj is a (multivariate) normal distribution.
- In addition, we assume that the covariance matrix is the same for every class.
- The classes are therefore differentiated only by the locations of their means.
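The assumptions above can be sketched from scratch with numpy: simulate two classes that share one covariance matrix but have different means, estimate a pooled covariance, and classify with the linear discriminant scores. All the numbers here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two classes sharing a covariance matrix, differing only in mean (the LDA assumptions)
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
L = np.linalg.cholesky(cov)
X0 = rng.standard_normal((100, 2)) @ L.T            # class 0: mean (0, 0)
X1 = rng.standard_normal((100, 2)) @ L.T + np.array([3.0, 3.0])  # class 1: mean (3, 3)
X, y = np.r_[X0, X1], np.r_[np.zeros(100), np.ones(100)].astype(int)

# Pooled (shared) covariance estimate, since LDA assumes one Sigma for all classes
m0, m1 = X0.mean(0), X1.mean(0)
pooled = ((X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)) / (len(X) - 2)
Sinv = np.linalg.inv(pooled)

# Linear discriminant score for class j: x^T S^-1 m_j - 0.5 m_j^T S^-1 m_j + log(prior_j)
def score(x, m, prior=0.5):
    return x @ Sinv @ m - 0.5 * m @ Sinv @ m + np.log(prior)

pred = (score(X, m1) > score(X, m0)).astype(int)
print("training accuracy:", (pred == y).mean())
```

Because the score is linear in x, the decision boundary between any two classes is a straight line (or hyperplane), which is where the “linear” in LDA comes from.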
Kernel Discriminant Analysis (KDA):
KDA extends LDA by allowing more complex, nonlinear decision boundaries between classes. It “lifts” the data into a higher-dimensional space using a kernel function, where the classes may be more easily separable, and then applies LDA in that space.
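A minimal numpy sketch of this “lift, then separate linearly” idea, using an RBF kernel feature map and a two-class Fisher/LDA direction in the lifted space. The two-circles data, kernel parameter, and regularisation value are all made-up choices for illustration, not a full KDA implementation.

```python
import numpy as np

# Toy "two circles" data: linearly inseparable in the original 2-D space
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 60)
X = np.r_[np.c_[np.cos(theta[:30]), np.sin(theta[:30])],        # class 0: radius 1
          3 * np.c_[np.cos(theta[30:]), np.sin(theta[30:])]]    # class 1: radius 3
y = np.r_[np.zeros(30), np.ones(30)].astype(int)

def rbf_features(A, B, gamma=0.5):
    """Lift each row of A into kernel-similarity features against the rows of B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fisher_fit(Z, y, reg=1e-3):
    """Two-class Fisher/LDA direction in the lifted space (reg keeps Sw invertible)."""
    m0, m1 = Z[y == 0].mean(0), Z[y == 1].mean(0)
    Sw = np.zeros((Z.shape[1], Z.shape[1]))
    for c in (0, 1):
        D = Z[y == c] - Z[y == c].mean(0)
        Sw += D.T @ D
    w = np.linalg.solve(Sw + reg * np.eye(Z.shape[1]), m1 - m0)
    b = -0.5 * w @ (m0 + m1)   # threshold at the midpoint of the projected means
    return w, b

Z = rbf_features(X, X)          # the "lift" into kernel feature space
w, b = fisher_fit(Z, y)
pred = (Z @ w + b > 0).astype(int)
print("training accuracy:", (pred == y).mean())
```

Plain LDA on the raw coordinates would fail here, since both class means sit near the origin; after the kernel lift, a linear rule separates the circles.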