Data Mining - Lecture Naive Bayes Flashcards

1
Q

What is classification?

A

Training a model on labelled data so that the model can classify new records into pre-defined classes.

2
Q

What is clustering?

A

Training a model to group similar records into clusters. There are no pre-defined classes.

3
Q

What is argmax P in Naive Bayes?

A

It means that you compute the probability of each class given the observed set of independent variables, for every class there is.

You then pick the class with the highest probability; hence argmax.
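In symbols, a minimal sketch of the decision rule (assuming feature values x1, …, xn):

```latex
\hat{y} = \operatorname*{argmax}_{y} \; P(y \mid x_1, \dots, x_n)
```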

4
Q

What is the naive conditional independence assumption?

A

Assume that all features are independent of each other given the class label y.

This means you can compute each individual probability (Sunny given Yes, etc.) and multiply them all together.
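Stated as a formula (a sketch, using the same x1, …, xn notation as above):

```latex
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\quad\Rightarrow\quad
\hat{y} = \operatorname*{argmax}_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```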

5
Q

How do you calculate P(Play Tennis = yes)?

A

You count the number of outcomes that are yes and divide that by the total number of outcomes.

This extends to the other classes, of course.
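A minimal Python sketch, assuming the classic 14-day Play Tennis example with 9 yes and 5 no outcomes (those counts come from that textbook table, not from this deck):

```python
# Prior P(Play Tennis = yes): yes outcomes divided by all outcomes.
# Assumes the classic 14-day Play Tennis table (9 yes, 5 no).
outcomes = ["yes"] * 9 + ["no"] * 5

p_yes = outcomes.count("yes") / len(outcomes)
print(p_yes)  # 9/14 ≈ 0.64
```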

6
Q

How do you calculate P(Outlook = sunny | Play Tennis = yes)?

A

You count how many days were sunny on which tennis was played.

You divide that by the total number of days on which tennis was played.

So you compute the conditional probability P(sunny and yes) / P(yes).
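As a sketch in Python, with a made-up handful of (outlook, play) rows standing in for the real table:

```python
# Likelihood P(Outlook = sunny | Play = yes):
# days that are sunny AND yes, divided by all yes days.
days = [("sunny", "yes"), ("sunny", "no"), ("overcast", "yes"),
        ("rain", "yes"), ("rain", "no"), ("sunny", "yes")]

yes_days = [outlook for outlook, play in days if play == "yes"]
p_sunny_given_yes = yes_days.count("sunny") / len(yes_days)
print(p_sunny_given_yes)  # 2 sunny out of 4 yes days = 0.5
```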

7
Q

How do you calculate the probability that a record belongs to a class, given 3 variables?

A

P(class) * P(x1 | class) * P(x2 | class) * P(x3 | class)
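A sketch of the full scoring step in Python; the probability tables are invented numbers, only the structure matters:

```python
# Score each class as P(class) * P(x1|class) * P(x2|class) * P(x3|class),
# then pick the class with the highest score (the argmax step).
prior = {"yes": 9 / 14, "no": 5 / 14}
likelihood = {  # invented values for illustration
    "yes": {"x1": 2 / 9, "x2": 3 / 9, "x3": 6 / 9},
    "no":  {"x1": 3 / 5, "x2": 4 / 5, "x3": 2 / 5},
}

scores = {
    c: prior[c] * likelihood[c]["x1"] * likelihood[c]["x2"] * likelihood[c]["x3"]
    for c in prior
}
prediction = max(scores, key=scores.get)
print(scores, prediction)
```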

8
Q

Why do we sometimes need to use Laplace Smoothing?

A

If a combination of an x variable and an outcome class never occurred in the training data, its estimated probability is 0. Since all factors are multiplied together, this zeroes out the whole formula.
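A tiny sketch of the problem, with invented counts:

```python
# "overcast" never occurred with class "no" in this invented training set,
# so its likelihood is 0 and the whole product collapses to 0.
p_overcast_given_no = 0 / 5
score_no = (5 / 14) * p_overcast_given_no * (4 / 5)
print(score_no)  # 0.0, no matter how strong the other factors are
```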

9
Q

How do we use Laplace Smoothing?

A

For every probability, you add 1 to the numerator and the number of possible values to the denominator: the number of distinct values of the attribute for a likelihood, or the number of classes for the prior.
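As a formula, a minimal sketch of add-one smoothing for a likelihood, where k is the number of distinct values the attribute can take:

```latex
P(x_i = v \mid y) = \frac{\mathrm{count}(x_i = v,\, y) + 1}{\mathrm{count}(y) + k}
```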

10
Q

What are the advantages of Naive Bayes?

A
  • Easy to implement
  • Handles categorical variables well
  • Computationally efficient
  • Good classification performance, especially with many predictors

11
Q

What are the disadvantages of Naive Bayes?

A
  • Requires a large dataset to perform well
  • Assumes attributes are independent, which they often are not

12
Q

Student number?

A

2064381
