Lecture 1 Flashcards

Definitions and road map in classification and Bayesian Decision Theory (33 cards)

1
Q

What is the definition of a classifier?

A

It is a function or an algorithm that maps every possible input (from a legal set of inputs) to a finite set of decisions
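For intuition, here is a minimal sketch of a classifier as a function in Python; the fruit labels and the 300-gram threshold are made-up illustrative values, not from the lecture:

    # A toy classifier: it maps any legal input (a weight in grams)
    # to one of a finite set of decisions ("apple" or "grapefruit").
    # The 300 g threshold is an arbitrary illustrative choice.
    def classify_fruit(weight_grams: float) -> str:
        return "apple" if weight_grams < 300 else "grapefruit"

    print(classify_fruit(150.0))  # -> apple
    print(classify_fruit(450.0))  # -> grapefruit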

2
Q

What are discriminating features?

A

they are features that should help the system to discriminate between inputs from different classes

3
Q

What is classification error (total error)?

A

The ratio of the number of incorrectly classified test objects to the total number of test objects.

For example, if there are 50 test objects and 17 of them were classified incorrectly, the classification error is 17/50.
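As a quick sketch, the same ratio computed in Python (the label lists below are made up purely to illustrate the formula):

    # Classification error = (number classified incorrectly) / (number of test objects)
    def classification_error(true_labels, predicted_labels):
        wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
        return wrong / len(true_labels)

    # 5 test objects, 2 of them classified incorrectly -> error = 2/5 = 0.4
    print(classification_error([0, 1, 1, 0, 1], [0, 1, 0, 1, 1]))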

4
Q

True / False ?

It is better to use one discriminating feature, because it is simpler

A

False!!
Usually a single feature is poor on its own, and it results in a relatively high total error

5
Q

What is overfitting the data?

A

Using complicated boundaries that are too “tuned” to the particular training data.

Lesson learned: avoid using overly complicated boundaries.

Example of overfitting: using a boundary that matches EVERY datum of the training data, even if it is an outlier.

6
Q

True / False?

Overfitted decision boundaries don’t generalize well

A

True

7
Q

True / False?

If decision boundary 1 performs better on the training data than decision boundary 2, it will likely perform better on new data too

A

False!

Overfitting is a counterexample: an overfitted decision boundary performs ideally on the training data but usually performs worse on test data than a simpler decision boundary

8
Q

True / False ?

When a lot is known about the problem, it is easier;
when only a little is known, it is harder

A

True

9
Q

True / False?

Training data is needed when the probability distribution of the categories is known

A

False

If the probability distribution of the categories is fully known, the optimal classifier can be designed without any training data

10
Q

What do we need to do when the shape of the probability distribution is known (but its parameters are not)?

A

We need to estimate the parameters of the probability distribution from the training data

11
Q

What should we do when the probability distribution is not known (neither its shape nor its parameters), but the shape of the discriminant functions is known?

A

Estimate the parameters of the discriminant functions from the training data

12
Q

Which is considered “easier”?
1. The shape of the probability distribution is known
2. The shape of the discriminant function is known

A

1

13
Q

What to do when neither the probability distribution nor the discriminant function is known (all we have is labeled data)?

A

Estimate the probability distribution from the labeled data

14
Q

What to do when NOTHING is known (even the data is not labeled!)?

A
  1. Estimate the probability distribution from the unlabeled data
  2. Cluster the data
15
Q

For which type of problem can we design an optimal classifier?

A

Problems where the probability distribution of the categories is known (this setting is covered by Bayesian Decision Theory)

But it is rare in the real world

16
Q

How are prior probabilities determined, and what do they reflect?

A

They reflect our prior knowledge, and they are determined accordingly

17
Q

Which event do we choose when using an ML (maximum likelihood) classifier?

A

The event that maximizes the likelihood function, i.e., the event under which the observed data is most likely

18
Q

Is the likelihood function:
1. A density
2. A probability distribution
3. None of the above

A

3. None of the above: viewed as a function of the class (with the observation x fixed), the likelihood values need not sum or integrate to 1

19
Q

True or False?

Posterior is proportional to likelihood * prior

A

True

According to Bayes’ rule; the normalizing factor can be dropped because it doesn’t depend on the class
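A small numeric sketch of this proportionality; the two classes and all the probability values below are invented for illustration:

    # posterior(c | x) is proportional to likelihood(x | c) * prior(c)
    priors      = {"cat": 0.7, "dog": 0.3}    # P(c)
    likelihoods = {"cat": 0.10, "dog": 0.40}  # p(x | c) for one observed x

    unnormalized = {c: likelihoods[c] * priors[c] for c in priors}
    evidence = sum(unnormalized.values())     # p(x): the same for every class
    posteriors = {c: v / evidence for c, v in unnormalized.items()}

    # The argmax over classes is the same with or without dividing by p(x).
    print(posteriors)  # {'cat': 0.368..., 'dog': 0.631...}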

20
Q

True or False?

MAP classifier minimizes the probability of error

A

True (at every x it chooses the class with the largest posterior, which minimizes the probability of error)

21
Q

What is a loss function?

A

The loss function lambda(alpha_i, c_j) describes the loss incurred by taking action alpha_i when the true class is c_j

22
Q

True or False?

Action is just another word for class

A

False

For example, in some cases we may want to refuse to make a decision, so the number of actions is larger than the number of classes

23
Q

Why do we want to have a loss function?

A

Usually some mistakes are more costly than others

For example: classifying a benign tumor as cancer is not as bad as classifying cancer as a benign tumor
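For concreteness, here is a small sketch of how such an asymmetric loss could be written down; the action/class names and the costs 0, 1 and 100 are arbitrary illustrative values, not numbers from the lecture:

    # lambda(action, true class): missing a cancer is penalized far more
    # heavily than raising a false alarm on a benign tumor.
    loss = {
        ("decide_benign", "benign"): 0,
        ("decide_benign", "cancer"): 100,  # very costly mistake
        ("decide_cancer", "benign"): 1,    # mild mistake
        ("decide_cancer", "cancer"): 0,
    }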

24
Q

What is conditional risk?

A

It is the expected loss associated with taking an action.

It is a function of an action alpha and the data x:

R(alpha | x) = sum over all classes c_j of lambda(alpha, c_j) * P(c_j | x)

In words: for each class c_j, multiply (the loss of taking action alpha when the true class of x is c_j) by (the posterior probability that the class of x is c_j), and sum over the classes.
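A minimal sketch of that sum in Python, reusing the hypothetical tumor loss table from card 23; the posterior values are made up:

    # Conditional risk: R(alpha | x) = sum_j lambda(alpha, c_j) * P(c_j | x)
    def conditional_risk(action, loss, posteriors):
        return sum(loss[(action, c)] * p for c, p in posteriors.items())

    loss = {("decide_benign", "benign"): 0, ("decide_benign", "cancer"): 100,
            ("decide_cancer", "benign"): 1, ("decide_cancer", "cancer"): 0}
    posteriors = {"benign": 0.9, "cancer": 0.1}  # P(c | x) for some observation x

    print(conditional_risk("decide_benign", loss, posteriors))  # 100 * 0.1 = 10.0
    print(conditional_risk("decide_cancer", loss, posteriors))  # 1 * 0.9 = 0.9

Note that even though "benign" has the higher posterior here, the risk-minimizing action is to decide "cancer", because of the asymmetric loss.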

25
26
Q

What is a zero-one loss function?

A

lambda(alpha_i, c_j) = 0 if i = j, and 1 otherwise

In other words: the loss is 0 if we chose the right class, and 1 otherwise.
27
Q

True / False?

MAP classifier is the Bayes decision rule under the zero-one loss function

A

True
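A short justification, using the conditional-risk formula from card 24 (a standard argument, not a quote from the slides): under the zero-one loss, R(alpha_i | x) = sum over j != i of P(c_j | x) = 1 - P(c_i | x), so minimizing the conditional risk is the same as maximizing the posterior P(c_i | x), which is exactly the MAP rule. This is also why the MAP classifier minimizes the probability of error (card 20).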
28
Q

What is a decision rule?

A

A decision rule is a function alpha(x) which, for every x, specifies an action out of {alpha_1, alpha_2, ..., alpha_k}
29
Q

Write the equation of the average risk for alpha(x)

A

R(alpha) = integral of R(alpha(x) | x) p(x) dx

Note: to make this as small as possible (our goal), we should minimize R(alpha(x) | x) at every x.
30
Q

How can we make the average risk as small as possible?

A

From the equation of the average risk, R(alpha) = integral of R(alpha(x) | x) p(x) dx, we can see that to minimize it we should minimize the conditional risk R(alpha(x) | x) at every x.
31
Q

Continue the sentence: at observation x, choose class c_i iff ...

A

g_i(x) > g_j(x) for all j != i
32
Q

What are the 3 decision rules?

A

1. ML decision rule: g_i(x) = P(x | c_i)
2. MAP decision rule: g_i(x) = P(c_i | x)
3. Bayes decision rule: g_i(x) = -R(alpha_i | x)
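A compact sketch that puts the three rules side by side in Python; the two classes, the priors, the likelihood values, and the action names a1/a2 are all invented for illustration (a1 means "decide c1", a2 means "decide c2"):

    # Hypothetical two-class example; every number here is illustrative only.
    priors      = {"c1": 0.8, "c2": 0.2}    # P(c_i)
    likelihoods = {"c1": 0.05, "c2": 0.15}  # p(x | c_i) for one observed x
    loss = {("a1", "c1"): 0, ("a1", "c2"): 1,   # zero-one loss
            ("a2", "c1"): 1, ("a2", "c2"): 0}   # a_i = "decide c_i"

    # ML rule: g_i(x) = p(x | c_i)
    ml_choice = max(likelihoods, key=likelihoods.get)

    # MAP rule: g_i(x) = P(c_i | x), proportional to p(x | c_i) * P(c_i)
    unnorm = {c: likelihoods[c] * priors[c] for c in priors}
    map_choice = max(unnorm, key=unnorm.get)

    # Bayes rule: g_i(x) = -R(alpha_i | x), with the conditional risk from card 24
    posteriors = {c: v / sum(unnorm.values()) for c, v in unnorm.items()}
    risks = {a: sum(loss[(a, c)] * posteriors[c] for c in posteriors)
             for a in ("a1", "a2")}
    bayes_choice = min(risks, key=risks.get)

    print(ml_choice, map_choice, bayes_choice)  # c2 c1 a1

Here ML and MAP disagree because the prior strongly favors c1, and under the zero-one loss the Bayes rule picks the same class as MAP.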
33