Lecture 1 Flashcards

Definitions and road map in classification and Bayesian Decision Theory (33 cards)

1
Q

What is the definition of a classifier?

A

It is a function or an algorithm that maps every possible input (from a legal set of inputs) to a finite set of decisions
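For intuition, here is a minimal sketch of a classifier as a function in Python; the fruit labels and the 300-gram threshold are made-up illustrative values, not from the lecture:

    # A toy classifier: it maps any legal input (a weight in grams)
    # to one of a finite set of decisions ("apple" or "grapefruit").
    # The 300 g threshold is an arbitrary illustrative choice.
    def classify_fruit(weight_grams: float) -> str:
        return "apple" if weight_grams < 300 else "grapefruit"

    print(classify_fruit(150.0))  # -> apple
    print(classify_fruit(450.0))  # -> grapefruit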

2
Q

What are discriminating features?

A

they are features that should help the system to discriminate between inputs from different classes

3
Q

What is classification error (total error)?

A

The ratio of the number of incorrectly classified test objects to the total number of test objects.

For example, if there are 50 test objects and 17 of them were classified incorrectly, the classification error is 17/50.
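As a quick sketch, the same ratio computed in Python (the label lists below are made up purely to illustrate the formula):

    # Classification error = (number classified incorrectly) / (number of test objects)
    def classification_error(true_labels, predicted_labels):
        wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
        return wrong / len(true_labels)

    # 5 test objects, 2 of them classified incorrectly -> error = 2/5 = 0.4
    print(classification_error([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))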

4
Q

True / False ?

It is better to use one discriminating feature, because it is simpler

A

False!!
Usually a single feature is poor on its own, and it results in a relatively high total error

5
Q

What is overfitting the data?

A

Using complicated boundaries that are too “tuned” to the particular training data.

Lesson learned: avoid using overly complicated boundaries.

Example of overfitting: using a boundary that matches EVERY datum of the training data, even if it is an outlier.

6
Q

True / False?

Overfitted decision boundaries don’t generalize well

A

True

7
Q

True / False?

If decision boundary 1 performs better on the training data than decision boundary 2, it will likely perform better on new data too

A

False!

Overfitting is a counterexample: an overfitted decision boundary performs ideally on the training data but usually performs worse on test data than a simpler decision boundary

8
Q

True / False ?

When a lot is known about the problem, it is easier;
when only a little is known, it is harder

A

True

9
Q

True / False?

Training data is needed when the probability distribution of the categories is known

A

False

If the probability distribution of the categories is fully known, the optimal classifier can be designed without any training data

10
Q

What do we need to do when the shape of the probability distribution is known (but its parameters are not)?

A

We need to estimate the parameters of the probability distribution from the training data

11
Q

What should we do when the probability distribution is not known (neither its shape nor its parameters), but the shape of the discriminant functions is known?

A

Estimate the parameters of the discriminant functions from the training data

12
Q

Which is considered “easier”?
1. The shape of the probability distribution is known
2. The shape of the discriminant function is known

A

1

13
Q

What to do when neither the probability distribution nor the discriminant function is known (all we have is labeled data)?

A

Estimate the probability distribution from the labeled data

14
Q

What to do when NOTHING is known (even the data is not labeled!)?

A
  1. Estimate the probability distribution from the unlabeled data
  2. Cluster the data
15
Q

For which type of problem can we design an optimal classifier?

A

Problems where the probability distribution of the categories is known (this setting is covered by Bayesian Decision Theory)

But it is rare in the real world

16
Q

How are prior probabilities determined, and what do they reflect?

A

They reflect our prior knowledge, and they are determined accordingly

17
Q

Which event do we choose when using an ML (maximum likelihood) classifier?

A

The event that maximizes the likelihood function, i.e., the event under which the observed data is most likely

18
Q

Is the likelihood function:
1. A density
2. A probability distribution
3. None of the above

A

3. None of the above: viewed as a function of the class (with the observation x fixed), the likelihood values need not sum or integrate to 1

19
Q

True or False?

Posterior is proportional to likelihood * prior

A

True

According to Bayes’ rule; the normalizing factor can be dropped because it doesn’t depend on the class
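A small numeric sketch of this proportionality; the two classes and all the probability values below are invented for illustration:

    # posterior(c | x) is proportional to likelihood(x | c) * prior(c)
    priors      = {"cat": 0.7, "dog": 0.3}    # P(c)
    likelihoods = {"cat": 0.10, "dog": 0.40}  # p(x | c) for one observed x

    unnormalized = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(unnormalized.values())     # p(x): the same for every class
    posteriors = {c: v / evidence for c, v in unnormalized.items()}

    # The argmax over classes is the same with or without dividing by p(x).
    print(posteriors)  # {'cat': 0.368..., 'dog': 0.631...}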

20
Q

True or False?

MAP classifier minimizes the probability of error

A

True (at every x it chooses the class with the largest posterior, which minimizes the probability of error)

21
Q

What is a loss function?

A

The loss function lambda(alpha_i, c_j) describes the loss incurred by taking action alpha_i when the true class is c_j

22
Q

True or False?

Action is just another word for class

A

False

For example, in some cases we may want to refuse to make a decision, so the number of actions is larger than the number of classes

23
Q

Why do we want to have a loss function?

A

Usually some mistakes are more costly than others

For example: classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor
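For concreteness, here is a small sketch of how such an asymmetric loss could be written down; the action/class names and the costs 0, 1 and 100 are arbitrary illustrative values, not numbers from the lecture:

    # lambda(action, true class): missing a cancer is penalized far more
    # heavily than raising a false alarm on a benign tumor.
    loss = {
        ("decide_benign", "benign"): 0,
        ("decide_benign", "cancer"): 100,  # very costly mistake
        ("decide_cancer", "benign"): 1,    # mild mistake
        ("decide_cancer", "cancer"): 0,
    }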

24
Q

What is conditional risk?

A

It is the expected loss associated with taking an action.

It is a function of an action alpha and the data x:

R(alpha | x) = sum over all classes c_j of lambda(alpha, c_j) * P(c_j | x)

In words: for each class c_j, multiply (the loss of taking action alpha when the true class of x is c_j) by (the posterior probability that the class of x is c_j), and sum over the classes.
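A minimal sketch of that sum in Python, reusing the hypothetical tumor loss table from card 23; the posterior values are made up:

    # Conditional risk: R(alpha | x) = sum_j lambda(alpha, c_j) * P(c_j | x)
    def conditional_risk(action, loss, posteriors):
        return sum(loss[(action, c)] * p for c, p in posteriors.items())

    loss = {("decide_benign", "benign"): 0, ("decide_benign", "cancer"): 100,
            ("decide_cancer", "benign"): 1, ("decide_cancer", "cancer"): 0}
    posteriors = {"benign": 0.9, "cancer": 0.1}  # P(c | x) for some observation x

    print(conditional_risk("decide_benign", loss, posteriors))  # 100 * 0.1 = 10.0
    print(conditional_risk("decide_cancer", loss, posteriors))  # 1 * 0.9 = 0.9

Note that even though "benign" has the higher posterior here, the risk-minimizing action is to decide "cancer", because of the asymmetric loss.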

25
26
Q

What is a zero-one loss function?

A

lambda(alpha_i, c_j) = 0 if i = j, and 1 otherwise

In other words: the loss is 0 if we chose the right class, and 1 otherwise.
27
Q

True / False?

MAP classifier is the Bayes decision rule under the zero-one loss function

A

True
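A short justification, using the conditional-risk formula from card 24 (a standard argument, not a quote from the slides): under the zero-one loss, R(alpha_i | x) = sum over j != i of P(c_j | x) = 1 - P(c_i | x), so minimizing the conditional risk is the same as maximizing the posterior P(c_i | x), which is exactly the MAP rule. This is also why the MAP classifier minimizes the probability of error (card 20).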
28
Q

What is a decision rule?

A

A decision rule is a function alpha(x) which, for every x, specifies an action out of {alpha_1, alpha_2, ..., alpha_k}
29
Q

Write the equation of the average risk for alpha(x)

A

R(alpha) = integral of R(alpha(x) | x) p(x) dx

Note: to make this as small as possible (our goal), we should minimize R(alpha(x) | x) at every x.
30
Q

How can we make the average risk as small as possible?

A

From the equation of the average risk, R(alpha) = integral of R(alpha(x) | x) p(x) dx, we can see that to minimize it we should minimize the conditional risk R(alpha(x) | x) at every x.
31
Q

Continue the sentence: at observation x, choose class c_i iff ...

A

g_i(x) > g_j(x) for all j != i
32
Q

What are the 3 decision rules?

A

1. ML decision rule: g_i(x) = P(x | c_i)
2. MAP decision rule: g_i(x) = P(c_i | x)
3. Bayes decision rule: g_i(x) = -R(alpha_i | x)
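A compact sketch that puts the three rules side by side in Python; the two classes, the priors, the likelihood values, and the action names a1/a2 are all invented for illustration (a1 means "decide c1", a2 means "decide c2"):

    # Hypothetical two-class example; every number here is illustrative only.
    priors      = {"c1": 0.8, "c2": 0.2}    # P(c_i)
    likelihoods = {"c1": 0.05, "c2": 0.15}  # p(x | c_i) for one observed x
    loss = {("a1", "c1"): 0, ("a1", "c2"): 1,   # zero-one loss
            ("a2", "c1"): 1, ("a2", "c2"): 0}   # a_i = "decide c_i"

    # ML rule: g_i(x) = p(x | c_i)
    ml_choice = max(likelihoods, key=likelihoods.get)

    # MAP rule: g_i(x) = P(c_i | x), proportional to p(x | c_i) * P(c_i)
    unnorm = {c: likelihoods[c] * priors[c] for c in priors}
    map_choice = max(unnorm, key=unnorm.get)

    # Bayes rule: g_i(x) = -R(alpha_i | x), with the conditional risk from card 24
    posteriors = {c: v / sum(unnorm.values()) for c, v in unnorm.items()}
    risks = {a: sum(loss[(a, c)] * posteriors[c] for c in posteriors)
             for a in ("a1", "a2")}
    bayes_choice = min(risks, key=risks.get)

    print(ml_choice, map_choice, bayes_choice)  # c2 c1 a1

Here ML and MAP disagree because the prior strongly favors c1, and under the zero-one loss the Bayes rule picks the same class as MAP.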
33