Q Flashcards
(34 cards)
How do you derive a0 and a1 for a linear regression model that is least-squares optimal?
1. Identify the sum of squared errors equation for the model
2. Take partial derivatives with respect to a0 and a1
3. Set them equal to zero and solve the simultaneous equations
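The three steps above can be sketched in code for a single predictor. This is a minimal illustration, assuming the model y = a0 + a1*x; the function name `fit_line` and the sample data are hypothetical.

```python
def fit_line(x, y):
    """Return (a0, a1) minimising J = sum((y_i - a0 - a1*x_i)**2)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Setting dJ/da0 = 0 and dJ/da1 = 0 gives two simultaneous equations
    # whose solution is the closed form below.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = sxy / sxx
    a0 = y_mean - a1 * x_mean
    return a0, a1


# Hypothetical noise-free data generated from y = 1 + 2x
a0, a1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```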
What does the R^2 statistic indicate?
It measures the proportion of the variation in the data that the model explains. If it is close to 1, the model captures the variation in the data well
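As a rough illustration, R^2 can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function name and the data below are hypothetical.

```python
def r_squared(y, y_pred):
    """R^2 = 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
good = r_squared(y, [1.1, 1.9, 3.2, 3.8])   # predictions close to the data
poor = r_squared(y, [2.5, 2.5, 2.5, 2.5])   # always predicting the mean
```

Predicting the mean for every point explains none of the variation, so `poor` is 0, while `good` is close to 1.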
What is meant by regularization?
Regularization is the application of constraints to the magnitude of the estimated model parameters in order to simplify the model
What is regularization's relevance to modelling?
It reduces the model variance and helps with predictor selection
It also reduces numerical issues when constructing the regressor matrices
What does LASSO stand for?
Least Absolute Shrinkage and Selection Operator
What does Lasso do to the cost function J?
It introduces a term into the cost function J that is linear in the absolute values of the model coefficients, weighted by the tuning parameter lambda
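A minimal sketch of such a cost function, assuming J is the plain sum of squared errors and that no intercept is used (scaling conventions for J differ between texts). The function name and data are hypothetical.

```python
def lasso_cost(a, X, y, lam):
    """J(a) + lam * sum(|a_j|), with J the sum of squared errors."""
    sse = 0.0
    for row, yi in zip(X, y):
        pred = sum(aj * xj for aj, xj in zip(a, row))
        sse += (yi - pred) ** 2
    penalty = lam * sum(abs(aj) for aj in a)  # the added Lasso term
    return sse + penalty


X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, -1.0]
# These coefficients fit the data exactly, so only the penalty remains.
J = lasso_cost([2.0, -1.0], X, y, lam=0.5)
```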
What are the parameters in regularization governed by?
the set of parameters is identified by the point at which contours of J and the additional parameter-dependent contours are tangential
What are the steps for applying Lasso through cross-validation?
1 - Divide the n observations into K equal groups
2 - Specify a range of Lasso weights lambda (lambda 1… lambda m) and set k=0
3 - Let k=k+1 and lambda=lambda k; train K different models with Lasso, each on all of the data groups except one, which is held out for testing
4 - Find the average performance (CVMSE) over the K held-out groups
5 - If k<m return to 3, otherwise find k* such that CVMSE is minimum
6 - Let lambda=lambda k* and apply Lasso to all n observations to determine the final model parameter estimates.
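The procedure above can be sketched for the simplest possible case: one predictor, no intercept, where the Lasso fit has a closed form (soft thresholding). All names and the data are hypothetical, and the cost scaling 0.5*SSE + lambda*|a| is an assumption.

```python
def soft_threshold(z, t):
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_1d(x, y, lam):
    """Minimise 0.5*sum((y - a*x)**2) + lam*|a| for a single coefficient a."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return soft_threshold(sxy, lam) / sxx


def cv_lasso(x, y, lambdas, K):
    n = len(x)
    folds = [list(range(i, n, K)) for i in range(K)]      # step 1
    cvmse = []
    for lam in lambdas:                                   # steps 3 to 5
        fold_mse = []
        for held_out in folds:
            train = [i for i in range(n) if i not in held_out]
            a = fit_lasso_1d([x[i] for i in train], [y[i] for i in train], lam)
            mse = sum((y[i] - a * x[i]) ** 2 for i in held_out) / len(held_out)
            fold_mse.append(mse)
        cvmse.append(sum(fold_mse) / K)                   # step 4
    best = min(range(len(lambdas)), key=lambda k: cvmse[k])
    a_final = fit_lasso_1d(x, y, lambdas[best])           # step 6
    return lambdas[best], a_final, cvmse


# Hypothetical noise-free data from y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
best_lam, a_final, cvmse = cv_lasso(x, y, [0.0, 1.0, 10.0], K=3)
```

Because the data here contain no noise, the smallest lambda gives the lowest CVMSE; with noisy data and many predictors a non-zero lambda is often selected.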
What is the advantage of Lasso over L2 regularization?
- The optimal set of parameters after regularization is the tangent point of the J cost contour with the constraint contours
- With the Lasso constraint, this point can have some parameters exactly null -> induces sparse solutions
- Some predictors can be excluded, reducing model size.
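The sparsity effect can be illustrated in the simplest setting, where each coefficient is shrunk independently from its unregularized value z. This is a sketch under that assumption; the function names and values are hypothetical.

```python
def lasso_shrink(z, lam):
    """argmin over a of 0.5*(z - a)**2 + lam*|a| (soft thresholding)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """argmin over a of 0.5*(z - a)**2 + 0.5*lam*a**2 (L2 penalty)."""
    return z / (1.0 + lam)


z = [3.0, 0.4, -0.2]                           # unregularized coefficients
lasso = [lasso_shrink(zi, 0.5) for zi in z]    # small ones become exactly 0
ridge = [ridge_shrink(zi, 0.5) for zi in z]    # all shrink, none reach 0
```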
What is the optimal set of parameters after regularization?
the tangent point of J cost contour with constraint contours
What is meant by Odds?
The ratio of the probability that an event occurs to the probability that it does not occur, p / (1 - p)
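A short numerical illustration of the definition (function names are hypothetical):

```python
def odds(p):
    """Odds of an event with probability p."""
    return p / (1.0 - p)


def prob_from_odds(o):
    """Inverse mapping: probability from odds."""
    return o / (1.0 + o)


even = odds(0.5)      # equally likely to occur or not
likely = odds(0.75)   # three times as likely to occur as not
```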
What is the definition of logistic model?
A model that expresses the posterior probability y of class membership as a logistic sigmoid function of a linear combination of the predictors.
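A minimal sketch for one predictor, assuming the linear part is a0 + a1*x. The coefficient values are hypothetical, not fitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, a0, a1):
    """Posterior probability of class membership for input x."""
    return sigmoid(a0 + a1 * x)


p = predict_proba(4.0, a0=-1.0, a1=0.5)
# The log of the odds recovers the linear part a0 + a1*x.
log_odds = math.log(p / (1.0 - p))
```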
What is a support vector machine and how does it work?
- An SVM is an extension of the maximal margin classifier
- The classifier maximises the margin between the separating hyperplane and the support vectors
- An SVM expands the feature space through non-linear features and returns a classification of the original data set.
How does an SVM work?
- It works by defining a separating hyperplane in the data space through a small number of data points that are closest to the hyperplane (support vectors), and maximising the margin to them
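To make the terms concrete, the sketch below computes the distance of each point to a given hyperplane w.x + b = 0 and picks out the closest points. It does not train an SVM; the hyperplane, points and function name are hypothetical.

```python
import math


def margins(points, labels, w, b):
    """Signed distance of each point to the hyperplane w.x + b = 0.

    Positive values mean the point lies on the side matching its label.
    """
    norm = math.sqrt(sum(wj * wj for wj in w))
    return [
        yi * (sum(wj * xj for wj, xj in zip(w, p)) + b) / norm
        for p, yi in zip(points, labels)
    ]


points = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-4.0, 2.0)]
labels = [1, 1, -1, -1]
w, b = (1.0, 0.0), 0.0            # assumed hyperplane x1 = 0

m = margins(points, labels, w, b)
margin = min(m)                   # the quantity an SVM seeks to maximise
support = [p for p, mi in zip(points, m) if abs(mi - margin) < 1e-12]
```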
What is the main disadvantage of a support vector classifier?
- It is a hard classifier: it separates classes without returning any information on the level of confidence for each data point.
What is meant by ARX?
Auto Regressive model with Exogenous inputs
What are ARX and ARMAX models?
Linear models used in time series analysis and in the modelling of dynamical systems from samples of their present and past inputs/outputs.
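As an illustration, a first-order ARX model can be simulated as below. The sign convention y[t] = a*y[t-1] + b*u[t-1] and the omission of the noise term are assumptions; the function name and values are hypothetical.

```python
def simulate_arx(a, b, u, y0=0.0):
    """Noise-free first-order ARX: y[t] = a*y[t-1] + b*u[t-1]."""
    y = [y0]
    for t in range(1, len(u)):
        # The output depends on the past output (auto-regressive part)
        # and the past exogenous input.
        y.append(a * y[t - 1] + b * u[t - 1])
    return y


# Response to a constant (step) input
y = simulate_arx(a=0.5, b=1.0, u=[1.0, 1.0, 1.0, 1.0])
```

An ARMAX model would add a moving average of past noise terms to the right-hand side.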
What is meant by ARMAX?
Auto Regressive Moving Average model with Exogenous inputs.
What is a confusion matrix?
A table that compares the true and predicted class labels of a classifier model by counting true positives, false positives, true negatives and false negatives
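A minimal sketch for a binary classifier with labels 0 and 1. The function name and label vectors are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Count the four outcomes of a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    return {
        "TP": sum(1 for t, p in pairs if t == 1 and p == 1),
        "FN": sum(1 for t, p in pairs if t == 1 and p == 0),
        "FP": sum(1 for t, p in pairs if t == 0 and p == 1),
        "TN": sum(1 for t, p in pairs if t == 0 and p == 0),
    }


cm = confusion_matrix([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
sensitivity = cm["TP"] / (cm["TP"] + cm["FN"])   # true positive rate
specificity = cm["TN"] / (cm["TN"] + cm["FP"])   # true negative rate
```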
What is meant by ROC and how is it constructed?
- Receiver Operating Characteristic
- A diagram formed by plotting the sensitivity (TPR) against 1-specificity (FPR) as the classifier threshold value changes from T=0 to T=1.
- Each point on the curve defines the values of TPR and FPR obtained for a certain threshold of the classifier
- Can be used to assess classifier performance over all thresholds
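The construction can be sketched as follows, assuming the classifier outputs a score in [0, 1] and predicts the positive class when the score is at least T. The function name, scores and threshold grid are hypothetical.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) point per threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = []
    for T in thresholds:
        tp = sum(1 for t, s in zip(y_true, scores) if s >= T and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= T and t == 0)
        points.append((fp / neg, tp / pos))
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores, [0.0, 0.3, 0.5, 1.0])
```

At T=0 everything is classified as positive, giving the point (1, 1); at T=1 nothing is, giving (0, 0).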
What is the AUC and what is it used for?
Area Under the Curve (the ROC curve)
It quantifies the performance of a classifier
A perfect classifier has an AUC of 1
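One simple way to estimate the area is the trapezoidal rule over a list of (FPR, TPR) points. This is a sketch; the function name and the example curves are hypothetical.

```python
def auc(points):
    """Trapezoidal area under a curve given as (FPR, TPR) points."""
    pts = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


perfect = auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
diagonal = auc([(0.0, 0.0), (1.0, 1.0)])   # no better than guessing
stepped = auc([(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)])
```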
What elements of the data should be considered before building a black box model?
- Model will represent the data using basis functions
- Provides the best representation, but the data itself must be of good quality
- Density of the data shapes the relative weights of the basis function components
- Low density areas will be poorly represented and prone to overfitting
How does cross-validation work, in simple terms?
Rotate training and testing samples through the data
test on one part of the data and train on the rest
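The rotation can be sketched as a generator of train/test index lists. The interleaved assignment of observations to groups is an assumption, and the function name is hypothetical.

```python
def k_fold_splits(n, K):
    """Yield (train_indices, test_indices) for each of the K rotations."""
    folds = [list(range(i, n, K)) for i in range(K)]
    for test_idx in folds:
        train_idx = [i for i in range(n) if i not in test_idx]
        yield train_idx, test_idx


splits = list(k_fold_splits(6, 3))
# Every observation is used for testing exactly once.
tested = sorted(i for _, test_idx in splits for i in test_idx)
```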
What are the two most important factors to employ to prevent bias in AI models?
Pre-processing of data
Explainability of method