Lecture 1 Flashcards
Definitions and road map in classification and Bayesian Decision Theory (33 cards)
what is the definition of a classifier?
it is a function or an algorithm that maps every possible input (from a legal set of inputs) to a finite set of decisions
what are discriminating features?
they are features that should help the system to discriminate between inputs from different classes
what is Classification error (total error)?
the ratio of the number of incorrectly classified test objects to the total number of test objects
for example, if there are 50 test objects, and 17 of them were classified incorrectly, the classification error is 17/50
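A minimal sketch of this computation in Python (the labels below are made-up illustration data):

```python
# Classification error = (# misclassified) / (# test objects).
true_labels = [0, 1, 1, 0, 1]
predicted   = [0, 1, 0, 0, 0]

errors = sum(t != p for t, p in zip(true_labels, predicted))
print(errors / len(true_labels))  # 2 mistakes out of 5 -> 0.4
```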
True / False ?
It is better to use one discriminating feature, because it is simpler
False!!
usually a single feature is poor on its own, and it results in a relatively high total error
what is overfitting the data?
using complicated decision boundaries that are too “tuned” to the particular training data
lesson learned: avoid using overly complicated boundaries
example of overfitting: using a boundary that matches EVERY datum of the training data, even if it is an outlier
True / False?
an overfitted classifier doesn’t generalize well
True
True / False?
If decision boundary 1 performs better on the training data than decision boundary 2, it will likely perform better on new data too
False!
Overfitting is a counterexample: an overfitted decision boundary performs ideally on the training data but usually performs worse on test data than a simpler decision boundary
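A minimal sketch of this effect, assuming scikit-learn is available; the dataset and the depths compared are arbitrary choices for illustration:

```python
# A very deep decision tree can fit the training data perfectly
# yet generalize worse than a shallower (simpler) one.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (3, None):  # None = grow until every training point fits
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
# Typically: the unrestricted tree scores 1.0 on the training set
# but lower on the test set than the depth-3 tree.
```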
True / False ?
When a lot is known about the problem, designing a classifier is easier; when little is known, it is harder
True
True / False?
training data is needed when the probability distribution of the categories is known
False; if the probability distribution is fully known, the optimal classifier can be designed directly, without training data
What do we need to do when the shape of the probability distribution is known?
we need to estimate the parameters of the probability distribution from the training data
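A minimal sketch of this case, assuming the shape is known to be Gaussian, so only the parameters (mean, variance) are estimated from training data, here by maximum likelihood; the samples are made up:

```python
import numpy as np

samples = np.array([4.9, 5.3, 5.1, 4.7, 5.0])  # training data for one class

mu_hat = samples.mean()   # ML estimate of the mean
var_hat = samples.var()   # ML estimate of the variance (the 1/N version)
print(mu_hat, var_hat)
```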
What should we do when no probability distribution is known (neither shape nor parameters), but the shape of the discriminant functions is known?
We need to estimate the parameters of the discriminant function
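A minimal sketch of this case: the discriminant is assumed to be linear, g(x) = w·x + b, and only w and b are learned from (made-up) labeled data, here with a simple perceptron update rule:

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])  # made-up labels

w, b = np.zeros(2), 0.0
for _ in range(10):                 # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:  # misclassified -> update parameters
            w += yi * xi
            b += yi

print(np.sign(X @ w + b))  # reproduces y on this separable toy set
```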
Which is considered “easier”?
1. The shape of the probability distribution is known
2. The shape of the discriminant function is known
1
What to do when neither probability distribution nor discriminant function is known (all we have is labeled data)?
Estimate the probability distribution from the labeled data
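A minimal sketch of this case: with no assumed shape, the class-conditional densities can be estimated nonparametrically, e.g. with histograms; all samples are made up:

```python
import numpy as np

x_class0 = np.array([1.1, 0.9, 1.3, 0.8])  # labeled samples of class 0
x_class1 = np.array([2.9, 3.2, 3.0, 3.1])  # labeled samples of class 1

bins = np.linspace(0.0, 4.0, 9)
p0, _ = np.histogram(x_class0, bins=bins, density=True)
p1, _ = np.histogram(x_class1, bins=bins, density=True)
print(p0)  # crude density estimate per bin for class 0
print(p1)
```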
What to do when NOTHING is known (even the data is not labeled!)?
- Estimate the probability distribution from the unlabeled data
- Cluster the data
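A minimal sketch of the clustering step, assuming scikit-learn; the unlabeled samples are made up:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[0.1, 0.2], [0.0, 0.1], [3.0, 3.1], [3.2, 2.9]])  # unlabeled

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # e.g. [0 0 1 1]: two discovered clusters
```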
For which type of problem can we design an optimal classifier?
The one where the probability distribution of the categories is known (this setting is called Bayesian Decision Theory)
But it is rare in the real world
How are prior probabilities determined, and what do they reflect?
They reflect our prior knowledge, and they are determined accordingly
Which event do we choose when using an ML classifier?
The event that maximizes the likelihood function, i.e., the class under which the observed data is most likely
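A minimal sketch of the ML decision: pick the class c maximizing the likelihood p(x | c); the Gaussian class models are made up:

```python
from math import exp, pi, sqrt

def gaussian_pdf(x, mu, var):
    return exp(-(x - mu) ** 2 / (2.0 * var)) / sqrt(2.0 * pi * var)

x = 5.4                                              # observed feature
class_models = {"c1": (5.0, 0.5), "c2": (7.0, 1.0)}  # (mean, variance)

ml_class = max(class_models, key=lambda c: gaussian_pdf(x, *class_models[c]))
print(ml_class)  # "c1": p(x | c1) > p(x | c2) here
```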
Is the likelihood function:
1. A density
2. A probability distribution
3. None of the above
3; viewed as a function of the class (for fixed data), its values need not sum or integrate to 1
True or False?
Posterior is proportional to likelihood * prior
True
according to Bayes’ rule; the normalizing factor (the evidence) is dropped because it doesn’t depend on the class
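A minimal sketch of Bayes’ rule with made-up numbers; the evidence only normalizes, so the MAP decision can ignore it:

```python
priors      = {"c1": 0.7, "c2": 0.3}
likelihoods = {"c1": 0.2, "c2": 0.6}   # p(x | c) for some observed x

unnormalized = {c: likelihoods[c] * priors[c] for c in priors}
evidence = sum(unnormalized.values())  # p(x), the normalizing factor
posteriors = {c: v / evidence for c, v in unnormalized.items()}
print(posteriors)                      # {'c1': 0.4375, 'c2': 0.5625}

map_class = max(posteriors, key=posteriors.get)
print(map_class)                       # "c2": the MAP decision
```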
True or False?
MAP classifier minimizes the probability of error
True; for every x, choosing the class with the largest posterior minimizes the probability of error
What is loss function?
the loss function lambda(alpha_i, c_j) describes the loss incurred when taking action alpha_i when the true class is c_j
True or False?
Action is just another word for class
False
For example, in some cases we may want to refuse to make a decision, so the number of actions is larger than the number of classes
Why do we want a loss function?
Usually some mistakes are more costly than others
For example: classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor
What is conditional risk?
It is the expected loss associated with taking action alpha_i, given the observation x:
R(alpha_i | x) = sum over classes j of lambda(alpha_i, c_j) * P(c_j | x)
i.e., for each class, (the loss for taking action alpha_i when the true class is c_j) * (the posterior probability that the class of x is c_j)
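A minimal sketch tying the pieces together, with made-up losses and posteriors; the "reject" action also illustrates why actions can outnumber classes:

```python
# Rows = (action, true class); the asymmetric losses encode that
# missing a cancer is the costliest mistake.
loss = {
    ("decide_benign", "benign"): 0.0, ("decide_benign", "cancer"): 10.0,
    ("decide_cancer", "benign"): 1.0, ("decide_cancer", "cancer"): 0.0,
    ("reject",        "benign"): 0.5, ("reject",        "cancer"): 0.5,
}
posteriors = {"benign": 0.8, "cancer": 0.2}  # P(c | x) for some observed x

def conditional_risk(action):
    # R(action | x) = sum_j lambda(action, c_j) * P(c_j | x)
    return sum(loss[(action, c)] * p for c, p in posteriors.items())

actions = ["decide_benign", "decide_cancer", "reject"]
for a in actions:
    print(a, conditional_risk(a))      # 2.0, 0.8, 0.5
print(min(actions, key=conditional_risk))  # "reject" has the lowest risk
```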