Lecture 3 Flashcards

1
Q

Machine Learning Basics

A
  • Identify some features (by hand or automatically)
  • Feed the system with training data
    o The system figures out which features are useful
  • Supervised learning: we have labels
    o E.g. images from healthy/unhealthy patients
    o E.g. multi-class problems, where we try to find linear boundaries between classes
  • Unsupervised learning: we have no labels
    o Aims to find structure in the data (e.g. K-means; see the sketch after this list)
  • Semi-supervised learning: the data is partially unlabelled
    o Labelled data is very expensive
    o Unlabelled data is easy to obtain
    o How can we improve decision rules by means of unlabelled data?
  • Classification: discrete (non-continuous) target variable
    o E.g. is the person in the image male or female?
    o Aims to label data
  • Regression: continuous target variable
    o E.g. how old is the person in the image?
    o Aims to learn a function
  • Discriminative model: generates no new examples, only predictions
  • Generative model: generates new examples
    o E.g. ChatGPT
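
A minimal K-means sketch in plain NumPy, as referenced above (the function, its defaults and the convergence test are illustrative assumptions, not the lecture's exact recipe):

  import numpy as np

  def kmeans(X, k, n_iters=100, seed=0):
      """Cluster the rows of X into k groups; returns (centroids, labels)."""
      rng = np.random.default_rng(seed)
      # Initialise the centroids as k random samples from the data.
      centroids = X[rng.choice(len(X), size=k, replace=False)]
      for _ in range(n_iters):
          # Assignment step: each point joins its nearest centroid.
          dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
          labels = dists.argmin(axis=1)
          # Update step: each centroid moves to the mean of its points
          # (assumes no cluster ends up empty).
          new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
          if np.allclose(new_centroids, centroids):
              break  # converged: the structure found in the data has stabilised
          centroids = new_centroids
      return centroids, labels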
2
Q

Parametric Models

A
  • The result of the supervised learning procedure is a function that predicts the label y from a given input x
  • The model typically has some parameters W
    o The number of parameters defines the model's capacity
    o Balance between too few (under-fitting) and too many (over-fitting)
    o Compromise between:
      ▪ Best fit to the training data
      ▪ Best generalisation to future unseen data
    o The parameters need to be initialised, but must not all get the same value -> all parameters would be updated identically after calculating the gradient
      ▪ Best to use random initialisation from a normal distribution
      ▪ The variance of a neuron's output increases with its number of inputs -> we can normalise for this (see the sketch after this list)
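
A minimal initialisation sketch (dividing by sqrt(n_inputs) is one common way to normalise the output variance; the layer sizes here are made up for illustration):

  import numpy as np

  rng = np.random.default_rng(0)
  n_inputs, n_outputs = 4, 2  # hypothetical layer sizes

  # Naive: unit-variance weights -> the variance of a neuron's output
  # grows with the number of inputs feeding into it.
  W_naive = rng.normal(0.0, 1.0, size=(n_outputs, n_inputs))

  # Normalised: scale by 1/sqrt(n_inputs) so the output variance stays
  # roughly constant regardless of the number of inputs.
  W = rng.normal(0.0, 1.0, size=(n_outputs, n_inputs)) / np.sqrt(n_inputs)
  b = np.zeros((n_outputs, 1))  # biases can all start at zero; the random
                                # weights already break the symmetry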
3
Q

Training Supervised Models

A
  • Typically given N training samples
    o Each sample is a feature vector x with a label y
    o The goal is to learn a model (function) f(x, W) = y
      ▪ It needs to be good at predicting the right label for unseen data
  • Loss function: used to steer the optimisation of the parameters
    o Should be differentiable
    o Often cross-entropy is used
    o Input: predicted label & true label
  • Optimiser: uses the loss function and updates the parameters to reduce the total loss
    o I.e. we want to find a global minimum of the loss function, e.g. using gradient descent
      ▪ Batch: use the entire training set to calculate gradient steps
      ▪ Stochastic: use single samples to calculate gradient steps
      ▪ Mini-batch: use subsets of the training data with more than one sample to calculate gradient steps. One pass over all n mini-batches is called an epoch
  • Testing: present a set of test data that is unseen by the model and see how many labels are predicted correctly
  • Forward pass: feeding data into the model and making a prediction
  • Backward pass: using the prediction to optimise the parameters (see the training-loop sketch after this list)
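
A minimal mini-batch gradient-descent loop for a logistic-regression model with a cross-entropy loss (a sketch on assumed synthetic data; the model choice, batch size and learning rate are illustrative, not from the lecture):

  import numpy as np

  rng = np.random.default_rng(0)

  # Synthetic binary-classification data (made up for illustration).
  N, D = 200, 4
  X = rng.normal(size=(N, D))
  y = (X @ rng.normal(size=D) > 0).astype(float)  # labels in {0, 1}

  def forward(X, w):
      """Forward pass: predict label probabilities."""
      return 1.0 / (1.0 + np.exp(-X @ w))

  def cross_entropy(p, y):
      """Differentiable loss; its inputs are the predicted and true labels."""
      eps = 1e-12  # avoid log(0)
      return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

  w = rng.normal(size=D) * 0.01
  lr, batch_size, n_epochs = 0.1, 32, 20

  for epoch in range(n_epochs):
      perm = rng.permutation(N)              # reshuffle every epoch
      for start in range(0, N, batch_size):  # one pass over all mini-batches = one epoch
          idx = perm[start:start + batch_size]
          p = forward(X[idx], w)                     # forward pass
          grad = X[idx].T @ (p - y[idx]) / len(idx)  # backward pass: gradient of the loss
          w -= lr * grad                             # gradient step to reduce the loss
      print(epoch, cross_entropy(forward(X, w), y))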
4
Q

Data in Supervised Learning

A
  • Data is represented as a point cloud in a high-dimensional vector space
  • The labelled dataset is split into training data and test data
    o The test set is used to report the performance of the system
    o The same image must not occur in both sets
    o Specific to medical imaging: the same patient may not occur in both sets (even if the images are different)
  • Problem: we tune parameters on the test set
    o Thus the test set is not independent of the system's development
  • Solution: we make a new split and add a validation set
    o Evaluate on the validation set during development
    o Only after finishing training use the test set (see the split sketch after this list)
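
A minimal sketch of a patient-level train/validation/test split (the helper name, the per-sample patient_ids array and the 70/15/15 ratios are assumptions for illustration):

  import numpy as np

  def split_by_patient(patient_ids, frac_val=0.15, frac_test=0.15, seed=0):
      """Split sample indices so that no patient occurs in more than one set."""
      rng = np.random.default_rng(seed)
      patients = rng.permutation(np.unique(patient_ids))
      n_test = int(len(patients) * frac_test)
      n_val = int(len(patients) * frac_val)
      test_p = set(patients[:n_test])
      val_p = set(patients[n_test:n_test + n_val])
      sets = {"train": [], "val": [], "test": []}
      for i, pid in enumerate(patient_ids):
          key = "test" if pid in test_p else "val" if pid in val_p else "train"
          sets[key].append(i)  # every image of a patient lands in the same set
      return sets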
5
Q

K-fold Cross-Validation

A
  • Divide the labelled dataset into K subsets
  • Use one subset as the validation set
  • Use one subset as the test set
  • Use the remaining subsets as training data
  • Repeat K times for all possible combinations
  • Also here: a patient may not occur in different sets
  • We end up with K trained systems (as each is trained on a different training set)
    o Can combine them into an ensemble system
    o Or use cross-validation to choose parameters and then retrain on the full dataset (see the sketch after this list)
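
A minimal K-fold loop matching the scheme above (a sketch; taking the next fold as the validation set is an assumption, and K >= 3 is needed so that a training set remains):

  import numpy as np

  def k_fold_splits(n_samples, k=5, seed=0):
      """Yield (train_idx, val_idx, test_idx) for each of the K folds."""
      rng = np.random.default_rng(seed)
      folds = np.array_split(rng.permutation(n_samples), k)
      for i in range(k):
          test_idx = folds[i]            # one subset as test set
          val_idx = folds[(i + 1) % k]   # one subset as validation set
          train_idx = np.concatenate(    # the remaining subsets as training data
              [folds[j] for j in range(k) if j not in (i, (i + 1) % k)])
          yield train_idx, val_idx, test_idx

For patient data, the folds would be built over unique patient IDs rather than over sample indices, as in the previous card.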
6
Q

Neural Networks

A
  • Consist of an input layer, a number of hidden layers and an output layer
  • In each neuron, each input from the previous layer has its own weight
    o A neuron also has an additional bias term
  • Typically we use matrix notation (example: 4 inputs, 2 hidden neurons)
    o The input layer becomes x (4x1)
    o The hidden layer becomes W (2x4)
    o The biases become b (2x1)
    o The output becomes Wx + b
  • Before the output is passed to the next layer, we apply a nonlinear activation function
    o It must be non-linear so that we can develop complex representations that are not possible with linear regression models
  • Output layer: one neuron for each possible label in an n-class classification problem
    o Special activation function matched to the question
      ▪ E.g. multi-class classification uses one-hot vectors as output
  • Number of parameters: each neuron has a weight for each of its inputs plus one bias term
    o More hidden neurons means more capacity, which can lead to over-fitting (see the forward-pass sketch after this list)
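
A minimal forward pass matching the shapes above (a sketch; the ReLU hidden activation, the softmax output and the 3-class output layer are illustrative assumptions):

  import numpy as np

  rng = np.random.default_rng(0)

  x = rng.normal(size=(4, 1))     # input layer: x (4x1)
  W = rng.normal(size=(2, 4))     # hidden layer: W (2x4), one weight per input per neuron
  b = np.zeros((2, 1))            # biases: b (2x1), one extra term per neuron

  h = np.maximum(0.0, W @ x + b)  # Wx + b, then a nonlinear activation (ReLU)

  # Output layer for a 3-class problem: one neuron per possible label,
  # with a softmax activation so the outputs can be read as class probabilities.
  W2 = rng.normal(size=(3, 2))
  b2 = np.zeros((3, 1))
  z = W2 @ h + b2
  probs = np.exp(z - z.max()) / np.exp(z - z.max()).sum()

  # Number of parameters: a weight for each input plus one bias per neuron.
  n_params = W.size + b.size + W2.size + b2.size  # 8 + 2 + 6 + 3 = 19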
7
Q

Backpropagation

A
  • Recursively apply the chain rule to compute gradients of expressions
  • [Figure: computational graph with the forward pass drawn in blue and the backward pass in purple]
    o In the forward pass we always perform an operation on two variables at a time
    o In the backward pass the gradients flow back through the same graph
  • Makes use of a learning rate -> how fast we update the parameter values
    o Usually pick an initial value and implement a strategy to decrease it over the number of epochs (e.g. linear decrease; see the sketch after this list)
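
A minimal chain-rule sketch on a tiny expression, plus a linearly decaying learning rate (the expression f = (x + y) * z and the decay schedule are made-up illustrations of the two ideas above):

  # Tiny computational graph: f = (x + y) * z.
  x, y, z = -2.0, 5.0, -4.0

  # Forward pass: always one operation on two variables at a time.
  q = x + y   # q = 3
  f = q * z   # f = -12

  # Backward pass: recursively apply the chain rule, from f back to the inputs.
  df_dq = z            # d(q*z)/dq
  df_dz = q            # d(q*z)/dz
  df_dx = df_dq * 1.0  # chain rule through q = x + y
  df_dy = df_dq * 1.0

  # Learning rate with a linear decrease over the epochs.
  lr0, n_epochs = 0.1, 50
  for epoch in range(n_epochs):
      lr = lr0 * (1 - epoch / n_epochs)
      # a parameter update would then be: w -= lr * gradient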