ML Flashcards
(99 cards)
Unsupervised Learning and how is the model trained
Only input data is provided, and the model learns to extract patterns from it
Supervised Learning and how is the model trained
Each input is paired with an output (target), and the model is trained to minimise the error between its predictions and the targets
Regression
Finding the relationship between input variables and a continuous output variable, e.g. linear regression, where the model is linear
Classification
Predicting a discrete category label for each input, e.g. dog or bagel
Underfitting
A model that is too simple to capture the structure of the data, so it fits even the training data poorly
Overfitting
A model that fits minor variations or noise in the training data, so it generalises poorly to new data
Model Selection and how does it work
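Choosing the most suitable model among the candidates
Split the dataset into training and validation (and test) sets, train each candidate on the training set, and pick the one that performs best on the validation set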
Training dataset
Used to train/optimise the model
Validation dataset
Used to compare candidate models during model selection, on data not seen in training
Test dataset
Used after model selection to estimate how well the final model generalises to unseen data
Cross-validation
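Split data into S groups; train on S-1 groups, so (S-1)/S of the data is used for training, and validate on the remaining group. Repeat so each group is held out once, then average the S scores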
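A minimal sketch of the S-group split, in plain Python. The function name s_fold_splits and the choice to assign samples to groups in index order (without shuffling) are assumptions made for illustration, not part of the card.

```python
def s_fold_splits(n_samples, s):
    """Yield (train_indices, validation_indices) once for each of the S groups."""
    indices = list(range(n_samples))
    # Spread samples as evenly as possible: the first (n_samples % s) groups get one extra.
    fold_sizes = [n_samples // s + (1 if i < n_samples % s else 0) for i in range(s)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


# With 10 samples and S = 5, each round trains on 8 samples and validates on 2.
for train, validation in s_fold_splits(n_samples=10, s=5):
    print(len(train), "training samples, validation indices:", validation)
```

In practice the data is usually shuffled before splitting; that step is left out here to keep the sketch short.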
No free lunch theorem
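No single model works best for every problem: assumptions that work well in one domain may work poorly in another, so models must be compared on the problem at hand (often quoted alongside Box's remark that all models are wrong, but some are useful)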
Model parameters
Values learned from training data
Parametric model
Number of parameters stays the same as the quantity of data increases
Non-parametric model
Number of parameters changes (typically grows) as the quantity of data increases
Likelihood function
Probability of the observed data given the model parameters, viewed as a function of the parameters
Maximum likelihood estimation
Method for estimating the parameters of a probabilistic model by choosing the values that maximise the likelihood of the observed data
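As a small worked case, assume the data are modelled by a one-dimensional Gaussian. The likelihood is then maximised by the sample mean and by the variance computed with a divisor of n. The function names and the toy data below are illustrative assumptions.

```python
import math


def gaussian_mle(samples):
    """Return the maximum likelihood estimates (mean, variance) of a 1-D Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    # The ML variance divides by n (not n - 1).
    variance = sum((x - mean) ** 2 for x in samples) / n
    return mean, variance


def gaussian_log_likelihood(samples, mean, variance):
    """Log of the probability density of the samples given the parameters."""
    return sum(
        -0.5 * math.log(2 * math.pi * variance) - (x - mean) ** 2 / (2 * variance)
        for x in samples
    )


data = [2.0, 4.0, 4.0, 6.0]  # toy data, made up for this sketch
mu_hat, var_hat = gaussian_mle(data)
print(mu_hat, var_hat)
# Any other parameter setting gives a log-likelihood that is no higher:
print(gaussian_log_likelihood(data, mu_hat, var_hat))
print(gaussian_log_likelihood(data, 3.5, var_hat))
```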
Linear regression model formula
y = w^T x + e, equivalently p(y|x,w) = N(y | w^T x, sigma^2)
What is the distribution of e in the linear regression model, and what is the bias parameter in the formula and what is it for?
e has a Gaussian distribution with mean 0: e ~ N(0, sigma^2), where sigma is the standard deviation
The bias is included in the vector w by adding a dummy input variable that always has value 1; it gives extra flexibility to fit the data, since the fit no longer has to pass through the origin
Linear regression to a feature vector
y = w^T ϕ(x) + e, equivalently p(y|ϕ(x), w) = N(y | w^T ϕ(x), sigma^2)
Least-squares problem for linear regression formula
Minimise ||y - X w||^2 over w; the solution is w = (X^T X)^-1 X^T y
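A minimal sketch of the closed-form solution (X^T X)^-1 X^T y for the simplest case: one input variable plus the dummy variable fixed at 1, so each row of X is [1, x_i] and X^T X is a 2x2 matrix that can be inverted by hand. The function name and the toy data are assumptions for illustration.

```python
def fit_least_squares_1d(xs, ys):
    """Return [w0, w1] from (X^T X)^-1 X^T y, where row i of X is [1, xs[i]]."""
    n = float(len(xs))
    # X^T X = [[a, b], [b, d]]
    a, b, d = n, sum(xs), sum(x * x for x in xs)
    # X^T y = [p, q]
    p, q = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X is singular (all x values are identical)")
    # The inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]].
    w0 = (d * p - b * q) / det  # bias: weight on the dummy variable
    w1 = (a * q - b * p) / det  # slope: weight on x
    return [w0, w1]


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # noise-free toy data generated from y = 1 + 2x
print(fit_least_squares_1d(xs, ys))
```

For more than a couple of inputs, a linear algebra library would normally solve the system rather than inverting X^T X explicitly.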
Two key points about least-squares solutions
The solution has a closed form
The solution is also the maximum likelihood solution under the Gaussian noise assumption
What does the linear discriminant compute (formula), and which class is x assigned to according to y?
y = w^T x
y >= 0 means x is assigned to C1
Otherwise x is assigned to C2
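The decision rule can be sketched in a few lines of plain Python. The weight values here are hypothetical and chosen only to show both outcomes; in practice w would be learnt from data.

```python
def linear_discriminant(w, x):
    """Compute y = w^T x for weight vector w and input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x))


def classify(w, x):
    """Assign x to C1 when y >= 0, otherwise to C2."""
    return "C1" if linear_discriminant(w, x) >= 0 else "C2"


w = [1.0, -2.0]  # hypothetical weights, not learnt from data
print(classify(w, [3.0, 1.0]))  # y = 3 - 2 = 1, so C1
print(classify(w, [1.0, 1.0]))  # y = 1 - 2 = -1, so C2
print(classify(w, [2.0, 1.0]))  # y = 0 lies on the boundary, which goes to C1
```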
What assumptions are made for parameters to be learnt by applying MLE?
- Data for each class have a Gaussian distribution
- These two Gaussian distributions have the same covariance matrix