Week 5 intro to machine learning Flashcards

(21 cards)

1
Q

What is machine learning?

A

A set of methods to detect patterns in data and use those patterns to predict future data

2
Q

supervised learning

A

A program is trained on a given set of labelled examples. It learns how to reach an accurate conclusion when given new data.

(x1, y1), (x2, y2), …, (xn, yn)

eg:
(1, True), (2, False), (3, True), (5, True), (12, False), (27, True)

The algorithm comes up with a pattern (in this case, it decides whether the number is odd).
When given new data it predicts the label (it has learned to reach an accurate conclusion).
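
A minimal Python sketch of the odd-number example, for intuition only. It assumes we hand the learner one hypothetical feature (the remainder after dividing by 2); the names `feature`, `train` and `predict` are made up for this sketch and are not from the course.

```python
from collections import Counter, defaultdict

# Labelled training pairs (x, y) from the card above.
training = [(1, True), (2, False), (3, True), (5, True), (12, False), (27, True)]

def feature(x):
    # Assumption: the only feature the learner sees is x mod 2.
    return x % 2

def train(pairs):
    # For each feature value, remember the most common label seen in training.
    votes = defaultdict(Counter)
    for x, y in pairs:
        votes[feature(x)][y] += 1
    return {f: counts.most_common(1)[0][0] for f, counts in votes.items()}

def predict(model, x):
    # Look up the label learned for this input's feature value.
    return model[feature(x)]

model = train(training)
print(predict(model, 8))   # new, unseen even number
print(predict(model, 41))  # new, unseen odd number
```

The learner is never told the rule "odd means True"; it only sees labelled pairs and picks the majority label per feature value.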

3
Q

Unsupervised learning

A

The program is given unlabelled data, and the algorithm uses patterns and relationships to group related data.

ie we may pass in a bunch of images of either dogs or cats, and the algorithm groups those that are related (for example, dog images in group 1, cat images in group 2).

4
Q

reinforcement learning

A

The program learns from the consequences of its actions. It selects actions by exploiting what went well previously while still being able to explore new choices.
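
One common way to balance "exploit what went well" against "try something new" is an epsilon-greedy rule. The sketch below is an illustration under stated assumptions, not the course's method: each action pays a fixed, made-up reward, and `epsilon_greedy`, `epsilon` and `seed` are names chosen for this example.

```python
import random

def epsilon_greedy(rewards, steps=1000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # average reward seen so far per action
    counts = [0] * len(rewards)       # how often each action was chosen
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any action at random.
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action that has looked best so far.
            action = max(range(len(rewards)), key=lambda a: estimates[a])
        reward = rewards[action]  # assumption: rewards are fixed, with no noise
        counts[action] += 1
        # Update the running average for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda a: estimates[a])
print(best, counts)
```

With `epsilon=0.1` the program exploits about 90% of the time and explores the rest, which is how it eventually discovers the best-paying action.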

5
Q

classification

A

A type of supervised learning

Organise data into classes; when given new data, the model predicts the class

class - a possible category a data point can belong to (same thing as a label)

ie for the muffin vs chihuahua problem, the classes are:
["muffin", "chihuahua"]

returns a class when given new data
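
A small Python sketch of a classifier returning a class. It uses 1-nearest-neighbour, which is just one possible technique, and the two numeric features (say roundness and furriness) and their values are invented for illustration.

```python
import math

# Hypothetical labelled examples: ((roundness, furriness), class).
training = [
    ((0.9, 0.1), "muffin"),
    ((0.8, 0.2), "muffin"),
    ((0.3, 0.9), "chihuahua"),
    ((0.4, 0.8), "chihuahua"),
]

def classify(point):
    # 1-nearest neighbour: return the class of the closest training example.
    nearest = min(training, key=lambda pair: math.dist(pair[0], point))
    return nearest[1]

print(classify((0.85, 0.15)))
print(classify((0.35, 0.95)))
```

The output is always one of the known classes, never a number in between.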

6
Q

Regression

A

A type of supervised learning
Fit functions to data and determine values of new data points

my definition for understanding:
. finds the relationship between input features and a numeric value
. we provide the model with lots of examples of input features (ie size of house, distance from major city, no. of rooms…) and a numeric value (price)

The model learns a function, so when you pass in a new set of input features it predicts the corresponding numeric output
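
A minimal sketch of regression with a single input feature, assuming a straight-line model fitted by ordinary least squares. The house sizes and prices are made-up numbers, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical training examples: size in square metres, price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted = slope * 100 + intercept  # predicted price for an unseen 100 m^2 house
print(predicted)
```

Unlike classification, the prediction is a number on a continuous scale rather than one of a fixed set of classes.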

7
Q

Clustering

A

A type of unsupervised learning

Separate data into groups; when given new data, we determine which group it goes in

NO LABELS AKA UNSUPERVISED LEARNING
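
A sketch of clustering on one-dimensional data using k-means, which is one possible clustering method among several. The points and the starting centres are made up, and `k_means_1d` and `assign` are hypothetical helper names.

```python
def k_means_1d(points, centres, iterations=10):
    for _ in range(iterations):
        # Step 1: put each point in the group of its nearest centre.
        groups = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: abs(p - centres[i]))
            groups[nearest].append(p)
        # Step 2: move each centre to the mean of its group (unchanged if empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return centres

def assign(point, centres):
    # A new point goes in the group with the nearest centre.
    return min(range(len(centres)), key=lambda i: abs(point - centres[i]))

points = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]  # no labels anywhere
centres = k_means_1d(points, centres=[0.0, 5.0])
print(centres)
print(assign(8.0, centres), assign(3.0, centres))
```

The groups come out as index 0 and index 1; the algorithm never attaches a meaning to them, because there are no labels.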

8
Q

Dimensionality reduction

A

Transform high-dimensional data (lots of features) into lower-dimensional data while preserving desired properties
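
A sketch of the idea going from 2 features down to 1, assuming the property we want to preserve is the spread (variance) of the data. It projects onto the direction of greatest variance, which is the idea behind principal component analysis; the points are made up and `project_to_1d` is a hypothetical name.

```python
import math

def project_to_1d(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    # Entries of the 2x2 covariance matrix.
    sxx = sum(x * x for x, _ in centred) / n
    syy = sum(y * y for _, y in centred) / n
    sxy = sum(x * y for x, y in centred) / n
    # Angle of the direction along which the data varies the most.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Each 2-D point becomes a single coordinate along that direction.
    return [x * ux + y * uy for x, y in centred]

points = [(0, 0), (1, 2), (2, 4), (3, 6)]  # two features that move together
coords = project_to_1d(points)
print(coords)
```

Because these points lie on a straight line, the single new coordinate keeps the distances between them, so nothing is lost by dropping a dimension. Real data usually loses some information.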

9
Q

What is a training set IN SUPERVISED LEARNING?

A

A set of pairs of data and their labels that we give to the program for learning

10
Q

What is a test set in supervised learning?

A

An UNTOUCHED (unseen) portion of the data. Once we have trained our model, we use it to predict the labels for this portion, then compare the actual vs predicted labels.
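
A sketch of how labelled data might be divided into a training set and a test set. The 80/20 ratio, the fixed shuffle seed and the name `train_test_split` are assumptions made for this example.

```python
import random

def train_test_split(pairs, test_fraction=0.2, seed=0):
    # Shuffle a copy so the original order is left untouched.
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    # The first n_test pairs are held back; the model never trains on them.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled data: (number, is_odd).
data = [(x, x % 2 == 1) for x in range(1, 21)]
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```

The model is trained only on `train_set`; `test_set` stays untouched until it is time to compare predicted and actual labels.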

11
Q

Precision

A

TP / (TP + FP) (ie of those that WE SAID have the disease, how many actually do)

12
Q

Sensitivity

A

TP / (TP + FN) (ie of those that ACTUALLY HAVE the disease, how many did WE SAY have it)

13
Q

F1 score (harmonic mean between sensitivity and precision)

A

2 / ((1 / sensitivity) + (1 / precision))
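
A worked sketch of precision, sensitivity and F1 computed from counts of true positives (TP), false positives (FP) and false negatives (FN). The counts are made up, and the functions assume none of the denominators are zero.

```python
def precision(tp, fp):
    # Of everything we flagged as positive, the fraction that really is positive.
    return tp / (tp + fp)

def sensitivity(tp, fn):
    # Of everything that really is positive, the fraction we flagged.
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of sensitivity and precision.
    p = precision(tp, fp)
    s = sensitivity(tp, fn)
    return 2 / ((1 / s) + (1 / p))

# Hypothetical results: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4
print(precision(tp, fp))    # 8 / 10
print(sensitivity(tp, fn))  # 8 / 12
print(f1_score(tp, fp, fn)) # 16 / 22
```

The harmonic mean sits closer to the lower of the two values, so a model cannot get a high F1 by doing well on only one of them.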

14
Q

balanced test set

A

Each class (in the set of classes) has equal representation

15
Q

Mean absolute error

A

Average of the absolute differences between predicted and actual values

notation: MAE = (1 / n) * sum over i of |predicted_i - actual_i|

Bad thing: it doesn't treat outliers harshly

ie (2 + 3 + 4) / 3 = 9 / 3 = 3 is the same as (0 + 0 + 9) / 3 = 3. In the second case the first two points are bang on (predicted = actual value), but for the last one the difference (9) is huge: we have a massive outlier. MAE still scores this the same as the first example.
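
The two cases above can be checked with a short Python sketch. The actual and predicted values are made up so that the absolute differences come out as 2, 3, 4 and 0, 0, 9.

```python
def mean_absolute_error(actual, predicted):
    # Average of |actual - predicted| over all data points.
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [10, 10, 10]
spread_out = [12, 13, 14]   # differences of 2, 3 and 4
one_outlier = [10, 10, 19]  # differences of 0, 0 and 9

print(mean_absolute_error(actual, spread_out))
print(mean_absolute_error(actual, one_outlier))
```

Both calls give 3.0, so MAE cannot tell the single large miss apart from three moderate ones.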

16
Q

Mean squared error

A

Average of the squared differences between predicted and actual values

Penalises outliers, which is good, but now:

disadvantage - if the original values are in something like cm, the mean squared error is in cm^2 - WE NOW HAVE A DIFFERENT UNIT OF MEASURE

solution: root mean squared error

17
Q

root mean squared error

A

Square root of the mean squared error

No longer have a different unit of measure (the error is back in the original units, eg cm)
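
A sketch comparing mean squared error and root mean squared error on the same two made-up cases used for mean absolute error (differences of 2, 3, 4 versus 0, 0, 9).

```python
import math

def mean_squared_error(actual, predicted):
    # Average of the squared differences; units are squared too (eg cm^2).
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    # The square root brings the error back to the original units (eg cm).
    return math.sqrt(mean_squared_error(actual, predicted))

actual = [10, 10, 10]
spread_out = [12, 13, 14]   # differences of 2, 3 and 4
one_outlier = [10, 10, 19]  # differences of 0, 0 and 9

print(mean_squared_error(actual, spread_out))   # (4 + 9 + 16) / 3
print(mean_squared_error(actual, one_outlier))  # (0 + 0 + 81) / 3
print(root_mean_squared_error(actual, spread_out))
print(root_mean_squared_error(actual, one_outlier))
```

Unlike MAE, both measures score the case with the single outlier as clearly worse.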

18
Q

Generalisation error

A

We want to minimise the error on unseen data (the generalisation error). However, we only deal with samples (we obviously don't have access to unseen data to calculate the generalisation error)

We therefore use the empirical error, calculated using the available samples (the training set)

We hope by minimising empirical error we are minimising generalisation error

19
Q

underfitting

A

The model fails to capture the complexity of the training data

eg: model is linear when the true pattern is quadratic or degree 4
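
A sketch of the linear-versus-quadratic example, assuming made-up data that follows y = x^2 exactly and a straight line fitted by least squares. `fit_line` is a hypothetical helper written for this example.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Training data whose true pattern is quadratic.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
# Mean squared error of the fitted line on the training data itself.
line_mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(slope, intercept, line_mse)
```

The best possible line here is flat, and its error is large even on the data it was trained on. That is the signature of underfitting: the model is too simple for the pattern.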

20
Q

overfitting

A

The model fits the training data too closely - IT FITS TOO MUCH OF THE TRAINING DATA (including its noise) AND FAILS TO GENERALISE TO UNSEEN DATA
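
A sketch of overfitting with made-up numbers. It assumes the true pattern is y = 2x, that the training samples carry some noise, and compares a model that memorises the training set with a simpler hypothetical model. All names and values are invented for illustration.

```python
# Noisy training samples of the assumed true pattern y = 2x.
train = [(1, 2.5), (2, 3.5), (3, 6.5), (4, 7.5)]
# Unseen data that follows the true pattern.
test = [(1.4, 2.8), (2.6, 5.2), (3.4, 6.8)]

def memoriser(x):
    # Overfitted model: returns the stored y of the nearest training x,
    # so it reproduces every training point exactly, noise included.
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def simple_line(x):
    # Simpler hypothetical model that ignores the noise.
    return 2 * x

def mse(model, pairs):
    return sum((model(x) - y) ** 2 for x, y in pairs) / len(pairs)

print(mse(memoriser, train), mse(memoriser, test))
print(mse(simple_line, train), mse(simple_line, test))
```

The memoriser has zero error on the training set but a larger error on the unseen data, while the simpler model has some training error and generalises better.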