Week 5 intro to machine learning Flashcards
What is machine learning?
A set of methods to detect patterns in data and use those patterns to predict future data
supervised learning
Program trained on a given set of examples with labels. It learns how to reach an accurate conclusion when given new data
(x1, y1), (x2, y2), …, (xn, yn)
eg:
(1, True), (2, False), (3, True), (5, True), (12, False), (27, True)
algorithm comes up with a pattern (in this case, whether the number is odd) -
when given new data it predicts the label (learns to make an accurate conclusion)
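A rough sketch of this in Python (my own example, not from the lecture) - the feature x % 2 is supplied so the odd/even rule is actually learnable by a simple model:

from sklearn.tree import DecisionTreeClassifier

# labelled training pairs (x, y) from the example above
xs = [1, 2, 3, 5, 12, 27]
ys = [True, False, True, True, False, True]

# feature: x mod 2, so the odd/even pattern can be learned
X = [[x % 2] for x in xs]

model = DecisionTreeClassifier().fit(X, ys)

# given new data, predict the label
print(model.predict([[7 % 2], [10 % 2]]))  # expect [ True False]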
Unsupervised learning
Program given unlabelled data and the algorithm uses patterns and relationships to group related data
ie we may pass in a bunch of images of either dogs or cats and the algorithm groups those that are related (dog images in group 1, cat images in group 2), for example
reinforcement learning
Program learns from the consequences of its actions and selects actions by exploiting what went well previously while still exploring new choices
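A rough sketch of the exploit-vs-explore idea (an epsilon-greedy bandit, my own example with invented reward rates, not from the lecture):

import random

# hypothetical actions with hidden success rates (the agent doesn't know these)
true_rates = {"a": 0.2, "b": 0.8}
totals = {a: 0.0 for a in true_rates}
counts = {a: 0 for a in true_rates}
epsilon = 0.1  # fraction of the time we explore

for step in range(1000):
    if random.random() < epsilon:
        action = random.choice(list(true_rates))  # explore: try something new
    else:
        # exploit: pick the action with the best average reward so far
        action = max(counts, key=lambda a: totals[a] / counts[a] if counts[a] else float("inf"))
    reward = 1.0 if random.random() < true_rates[action] else 0.0  # consequence of the action
    totals[action] += reward
    counts[action] += 1

print(counts)  # "b" should end up chosen far more often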
classification
A type of supervised learning
organise data into classes and when given new data predict the class
class - possible category a data point can belong to (same thing as a label)
ie for the muffin and chihuahua problem the classes are:
["muffin", "chihuahua"]
returns a class when given new data
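A minimal sketch in Python (my own example - the two numeric features per image are invented just to show the shape of the workflow, not a real image pipeline):

from sklearn.neighbors import KNeighborsClassifier

# made-up feature vectors for a few images, with their classes
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
y = ["muffin", "muffin", "chihuahua", "chihuahua"]

clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)

# returns a class when given new data
print(clf.predict([[0.85, 0.15]]))  # e.g. ['muffin']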
Regression
A type of supervised learning
Fit functions to data and determine values for new data points
my definition for understanding:
. finds relationship between input features and a numeric value
. We provide the model with lots of examples of input features (ie size of house, distance from major city, no of rooms…) and a numeric value (price)
the model learns a function so when you pass in a new set of input features it predicts the corresponding numeric output
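A minimal sketch of the house-price idea in Python (numbers are invented, purely illustrative):

from sklearn.linear_model import LinearRegression

# made-up examples: [size_m2, distance_km, rooms] -> price
X = [[50, 10, 2], [80, 5, 3], [120, 2, 4], [200, 1, 6]]
y = [150_000, 250_000, 400_000, 650_000]

model = LinearRegression().fit(X, y)

# predict a numeric value for a new set of input features
print(model.predict([[100, 3, 3]]))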
Clustering
A type of unsupervised learning
separate data into groups and when given new data we determine which group it goes in
NO LABELS AKA UNSUPERVISED LEARNING
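A rough sketch using k-means (my own example, assuming two obvious groups in the data):

from sklearn.cluster import KMeans

# unlabelled data points (no labels anywhere - this is unsupervised)
X = [[1, 1], [1.2, 0.9], [0.8, 1.1], [8, 8], [8.2, 7.9], [7.8, 8.1]]

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                # which group each training point was put in
print(km.predict([[0.9, 1.0]]))  # which group a new point goes in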
Dimensionality reduction
transform high dimensional (lots of features) data to lower dimensional data while preserving desired properties
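A minimal sketch with PCA (my own example on random data, just to show the shape change):

from sklearn.decomposition import PCA
import numpy as np

# 100 points with 10 features (high dimensional), reduced to 2 features
X = np.random.rand(100, 10)
X_low = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_low.shape)  # (100, 10) -> (100, 2)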
What is a training set IN SUPERVISED LEARNING
A set of pairs of data and their labels that we give to the program for learning
What is a test set in supervised learning
UNTOUCHED (unseen) portion of data that we use, once we have trained our model, to predict labels. We then compare the actual vs predicted labels
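A rough sketch of the train/test workflow in Python (dataset and labels are invented just to show the steps):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# made-up dataset: label is whether the number is at least 50
X = [[i] for i in range(100)]
y = [i >= 50 for i in range(100)]

# keep an untouched (unseen) portion of the data aside as the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)
predicted = model.predict(X_test)

# compare actual vs predicted labels on the unseen data
print(accuracy_score(y_test, predicted))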
Precision
TP / (TP + FP) (ie of those WE SAID have the disease, how many actually do)
Sensitivity
TP / (TP + FN) (ie of those who ACTUALLY HAVE the disease, how many did we correctly identify)
F1 score (harmonic mean between sensitivity and precision)
2 / ((1 / sensitivity) + (1 / precision))
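Quick worked check in Python (the confusion-matrix counts are invented):

# hypothetical counts for a disease test
TP, FP, FN = 40, 10, 5

precision   = TP / (TP + FP)                    # of those we said have the disease
sensitivity = TP / (TP + FN)                    # of those who actually have it
f1 = 2 / ((1 / sensitivity) + (1 / precision))  # harmonic mean of the two

print(round(precision, 3), round(sensitivity, 3), round(f1, 3))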
balanced test set
each class (in set of classes ) has equal representation
Mean absolute error
average of absolute differences between predicted and actual value
MAE = (1/n) * sum of |actual_i - predicted_i|
bad thing is (it doesn't treat outliers harshly)
ie (2 + 3 + 4) / 3 = 9 / 3 = 3 is the same as (0 + 0 + 9) / 3 = 3; in the second case the first two points are bang on (predicted = actual) but the last one has a huge error of 9 (a massive outlier), yet MAE scores it the same as the first example
Mean squared error
average of the squared differences between predicted and actual values
Penalises outliers, which is good, but now:
disadvantage - if the original values are in something like cm, the mean squared error is in cm^2 - WE NOW HAVE A DIFFERENT UNIT OF MEASURE
solution root mean squared error
root mean squared error
root of mean squared error
No longer have different measure
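Quick sketch comparing the three on the outlier example from the MAE card (my own numbers):

import numpy as np

actual    = np.array([10.0, 10.0, 10.0])
predicted = np.array([10.0, 10.0, 19.0])  # two perfect predictions, one outlier error of 9

errors = np.abs(actual - predicted)
mae  = errors.mean()         # 3.0 - same as errors of 2, 3, 4 would give
mse  = (errors ** 2).mean()  # 27.0 - the outlier is penalised, but units are squared
rmse = np.sqrt(mse)          # ~5.2 - back in the original units

print(mae, mse, rmse)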
Generalisation error
We want to minimise the error on unseen data (the generalisation error). However we only deal with samples (we obviously don't have access to unseen data to calculate the generalisation error)
We therefore use the empirical error (calculated using the available samples, ie the training set)
We hope that by minimising the empirical error we are also minimising the generalisation error
underfitting
model fails to capture the complexity of the training data
eg: model is linear when the true pattern is quadratic or degree 4
overfitting
model fits the training data too closely - IT FITS NOISE IN THE TRAINING DATA AND FAILS TO GENERALISE TO UNSEEN DATA
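A rough sketch of both ideas at once (my own example - the true pattern is quadratic, so degree 1 underfits and a very high degree overfits the noise):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 20)
y = x**2 + rng.normal(0, 0.5, size=x.shape)  # true pattern is quadratic, plus noise

for degree in (1, 2, 15):
    coeffs = np.polyfit(x, y, degree)
    fit = np.polyval(coeffs, x)
    train_error = np.mean((y - fit) ** 2)
    print(degree, round(train_error, 3))
# degree 1 underfits (large training error), degree 2 is about right,
# degree 15 drives training error near zero by fitting the noise (overfitting)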