Lecture 3 - Machine Learning Flashcards

1
Q

What is supervised learning?

A
  • The machine learns from LABELLED data
  • Uses regression and classification
  • Maps labelled input to known output
  • Examples: linear regression, logistic regression, KNN
2
Q

What is unsupervised learning?

A

- The machine is trained on unlabelled data, without guidance
- Uses association and clustering
- Discovers patterns in the data to produce an output
- Examples: k-means, fuzzy c-means, etc.

3
Q

What is reinforcement learning?

A
  • An agent interacts with its environment by producing actions, and discovers errors and rewards
  • Uses a reward-based approach
  • No pre-defined data
  • Follows a trial-and-error method, e.g. Q-learning
4
Q

What is the sum rule in probability?

A

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
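As a quick numeric sanity check of the sum rule (the die example below is invented, not from the lecture):

```python
# Sum rule check with a fair six-sided die.
# A = "roll is even" {2,4,6}, B = "roll >= 4" {4,5,6}
A = {2, 4, 6}
B = {4, 5, 6}
omega = 6  # number of equally likely outcomes

p_A = len(A) / omega        # 0.5
p_B = len(B) / omega        # 0.5
p_and = len(A & B) / omega  # P(A ∩ B) = 2/6

p_or = p_A + p_B - p_and    # sum rule
# equals len(A | B) / omega = 4/6 up to floating point
print(p_or)
```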

5
Q

What is posterior/conditional probability P(A|B)?

A

Probability of event A given that you know B is true (B = some evidence)
 ex: P(rain today | cloudy) = 0.8
 i.e. your belief about A given that you know B

P(A|B) = P(A ∩ B) / P(B)
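A tiny numeric illustration of the definition (the day counts are hypothetical, chosen to reproduce the card's 0.8):

```python
# P(A|B) = P(A ∩ B) / P(B), with made-up counts.
# Out of 100 days: 40 cloudy, 32 both cloudy and rainy.
p_cloudy = 40 / 100
p_rain_and_cloudy = 32 / 100

p_rain_given_cloudy = p_rain_and_cloudy / p_cloudy
print(p_rain_given_cloudy)  # ≈ 0.8, matching the card's example
```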

6
Q

What is Bayes Theorem?

A

P(A|B) = (P(B|A)*P(A))/P(B)
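A one-line sketch of applying the theorem (all three input probabilities below are invented for illustration):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B), with made-up numbers.
p_A = 0.1          # prior P(A)
p_B_given_A = 0.9  # likelihood P(B|A)
p_B = 0.2          # evidence P(B)

p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)  # 0.45
```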

7
Q

Bayes Reasoning Formula

A

H* = argmax_i P(Hi|E) = argmax_i P(Hi) · P(E|Hi) / P(E)
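The argmax can be taken over the numerator only, since P(E) is the same for every hypothesis. A minimal sketch, with invented priors and likelihoods:

```python
# Pick the hypothesis Hi that maximizes P(Hi) * P(E|Hi).
# P(E) is constant across hypotheses, so it can be dropped.
priors = {"SPAM": 0.4, "HAM": 0.6}          # P(Hi), invented
likelihood = {"SPAM": 0.05, "HAM": 0.001}   # P(E|Hi), invented

best = max(priors, key=lambda h: priors[h] * likelihood[h])
print(best)  # SPAM: 0.4*0.05 = 0.02 beats 0.6*0.001 = 0.0006
```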

8
Q

What are smoothed probabilities?

A

What if we have a P(wi|cj) = 0?
 • ex: the word "dumbo" never appeared in the class SPAM
 • then P("dumbo" | SPAM) = 0
 • so if a text contains the word "dumbo", the class SPAM is completely ruled out!
 • to solve this: we assume that every word always appears at least once (or a smaller value, like 0.5)
 • ex: add-1 smoothing:

P(wi | cj) = (frequency of wi in cj + 1) / (number of words in cj + size of vocabulary)
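A minimal sketch of add-1 smoothing on a toy corpus (the words and counts are invented; only the formula is from the card):

```python
from collections import Counter

# Toy SPAM class: 4 word tokens; vocabulary of 4 word types.
spam_words = ["win", "money", "now", "money"]
vocabulary = {"win", "money", "now", "dumbo"}  # words seen in any class

counts = Counter(spam_words)
total = len(spam_words)
V = len(vocabulary)

def p_word_given_spam(w):
    # (frequency of w in SPAM + 1) / (words in SPAM + vocabulary size)
    return (counts[w] + 1) / (total + V)

# "dumbo" never appears in SPAM, yet gets a non-zero probability.
print(p_word_given_spam("dumbo"))  # (0 + 1) / (4 + 4) = 0.125
```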

9
Q

Instead of multiplying the probabilities, we can use logs to…?

A

 • if we really do the product of probabilities:

   argmax_cj P(cj) ∏ P(wi|cj)

 • we soon have numerical underflow
   ex: 0.01 x 0.02 x 0.05 x …
 • so instead, we add the logs of the probabilities:

   argmax_cj log(P(cj)) + Σ log(P(wi|cj))

   ex: log(0.01) + log(0.02) + log(0.05) + …
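The underflow is easy to demonstrate (the probability values below are arbitrary):

```python
import math

# Multiplying many small probabilities underflows to 0.0 in floating
# point; summing their logs stays well within range.
probs = [0.01] * 200

product = 1.0
for p in probs:
    product *= p
print(product)      # 0.0 — 1e-400 is below the smallest double

log_score = sum(math.log(p) for p in probs)
print(log_score)    # ≈ -921.03, still fine for argmax comparisons
```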

10
Q

Benefits and problems with Naive Bayes

A

• Makes a strong assumption of conditional independence that is often incorrect
  ex: the word "ambulance" is not conditionally independent of the word "accident" given the class SPORTS
• BUT:
  • surprisingly very effective on real-world tasks
  • basis of many spam filters
  • fast, simple, easy to apply
  • gives confidence in its class predictions (i.e. the scores)
  • often used as a baseline algorithm before trying other methods

11
Q

How to split machine learning data ?

A

Split the data set into 3 sub-sets:
1. Actual training set (~80% of the non-test data)
2. Validation set (~20% of the non-test data)
3. Test set (~20% of the full data)

Training + validation together make up the remaining ~80% of the full data.
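The split above can be sketched in a few lines; the dataset, seed, and exact percentages below are placeholders:

```python
import random

# Shuffle, hold out 20% as the test set, then split the remaining 80%
# into training and validation (80/20 within that remainder).
random.seed(0)
data = list(range(100))  # placeholder examples
random.shuffle(data)

n_test = int(0.2 * len(data))
test = data[:n_test]
rest = data[n_test:]            # the other ~80%

n_val = int(0.2 * len(rest))
validation = rest[:n_val]
training = rest[n_val:]

print(len(training), len(validation), len(test))  # 64 16 20
```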

12
Q

What are the steps to training a model?

A
  1. Collect a large set of examples (all with correct classifications)
  2. Divide the collection into training, validation and test sets
  Loop:
  3. Apply the learning algorithm to the training set to learn the parameters
  4. Measure performance on the validation set and adjust the hyper-parameters to improve it
  5. Finally, measure performance on the test set
13
Q

Parameters vs hyper-parameters

A

Parameters: basic values learned by the ML model, e.g.:
* for NB: prior & conditional probabilities
* for DTs: features to split on
* for ANNs: weights

Hyper-parameters: parameters used to set up the ML model, e.g.:
* for NB: value of delta for smoothing
* for DTs: pruning level
* for ANNs: number of hidden layers, number of nodes per layer…

14
Q

What are the metrics in machine learning?

A

Accuracy
 • % of instances of the test set the algorithm correctly classifies
 • used when all classes are equally important and represented

Recall, Precision & F-measure
 • used when one class is more important than the others

15
Q

Why can Accuracy be misleading?

A

Problem:
 • when one class C is more important than the others
 • e.g. when the data set is unbalanced: if 99% of instances are not-C, a classifier that never predicts C still reaches 99% accuracy

16
Q

What is the equation for precision?

A

TP/(TP+FP)

17
Q

What is the equation for recall?

A

TP/(TP+FN)

18
Q

What is the equation for accuracy?

A

(TP+TN)/(TP+TN+FP+FN)
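The three formulas can be checked together on a hypothetical confusion matrix (counts invented for illustration):

```python
# Metrics from a hypothetical confusion matrix for class C.
TP, FP, FN, TN = 40, 10, 20, 30

precision = TP / (TP + FP)                   # 40/50 = 0.8
recall = TP / (TP + FN)                      # 40/60 ≈ 0.667
accuracy = (TP + TN) / (TP + TN + FP + FN)   # 70/100 = 0.7

print(precision, recall, accuracy)
```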

19
Q

What is recall?

A

Of all actual instances of C, how many % were found (correctly classified as C)?

20
Q

What is precision?

A

Of the detected instances of C, how many % were correct?
