Q Flashcards by Bethan J O'Malley

How do you derive a0 and a1 for a linear regression model that is least-squares optimal?

1. Identify the sum of squared errors equation for the model
2. Take partial derivatives with respect to a0 and a1
3. Set both derivatives equal to zero and solve the resulting simultaneous equations
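
A minimal Python sketch of where those three steps end up, assuming the model is z = a0 + a1*x (the names x, z and fit_line are illustrative, not from the cards). Solving the two simultaneous equations gives the slope as a covariance divided by a variance:

```python
def fit_line(x, z):
    """Return (a0, a1) minimising sum((z_i - a0 - a1*x_i)**2)."""
    n = len(x)
    x_mean = sum(x) / n
    z_mean = sum(z) / n
    # Solution of the two normal equations: slope = S_xz / S_xx
    s_xz = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z))
    s_xx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = s_xz / s_xx
    # The fitted line always passes through the point of means
    a0 = z_mean - a1 * x_mean
    return a0, a1


# Data generated from z = 1 + 2x, so the fit should recover a0 = 1, a1 = 2
a0, a1 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```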

What does the R^2 statistic indicate?

It is the proportion of the variation in the data that the model explains. If it is close to 1, the model captures the variation in the data well
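
As a rough illustration, R^2 can be computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. The function and variable names below are assumptions made for this sketch:

```python
def r_squared(z, z_hat):
    """R^2 = 1 - SS_res / SS_tot for observations z and predictions z_hat."""
    z_mean = sum(z) / len(z)
    ss_res = sum((zi - hi) ** 2 for zi, hi in zip(z, z_hat))  # unexplained variation
    ss_tot = sum((zi - z_mean) ** 2 for zi in z)              # total variation
    return 1.0 - ss_res / ss_tot


perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])    # predictions match exactly
mean_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])  # always predicts the mean
```

A model that only ever predicts the mean explains none of the variation, so its R^2 is 0.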

What is meant by regularization?

Regularization is the application of constraints to the magnitude of the estimated model parameters in order to simplify the model

What is regularization's relevance to modelling?

It reduces the model variance and helps with predictor selection
It also reduces numerical issues when constructing the regressor matrices

What does LASSO stand for?

Least Absolute Shrinkage and Selection Operator

What does Lasso do to the cost function J?

Introduces a term into the cost function J that is linear in the absolute values of the model coefficients, weighted by the tuning parameter lambda
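
Written out, and assuming J is the usual sum of squared errors (the symbols z_i, x_i and a_j are assumptions borrowed from the least squares cards), the Lasso cost would look like this:

```latex
J_{\text{Lasso}}(a) = \sum_{i=1}^{n} \left( z_i - x_i^{T} a \right)^2 + \lambda \sum_{j} \lvert a_j \rvert
```

Setting lambda to zero recovers the ordinary least squares cost; larger values penalise large coefficients more heavily.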

What are the parameters in regularization governed by?

The set of parameters is identified by the point at which the contours of J and the contours of the additional parameter-dependent term are tangential

What are the steps for applying Lasso through cross-validation?

1 - Divide the n observations into K equal groups
2 - Specify a range of Lasso weights lambda (lambda 1 … lambda m)
3 - Let k = k + 1 and lambda = lambda k; using Lasso, train K different models, each on all of the data groups except one
4 - Find the average performance (CVMSE) of the K models on their held-out groups
5 - If k < m, return to step 3; otherwise find k* such that the CVMSE is minimum
6 - Let lambda = lambda k* and apply Lasso to all n observations to determine the final model parameter estimates
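
The loop above can be sketched in plain Python. To keep it short, this sketch assumes a single predictor with no intercept, so the Lasso fit has a closed form (soft-thresholding), and it assumes n is divisible by K. All function names are hypothetical:

```python
def soft_threshold(v, t):
    """Shrink v towards zero by t, returning exactly 0.0 when |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_1d(x, z, lam):
    """Minimise sum((z_i - a*x_i)**2) + lam*|a| for one coefficient a."""
    s_xz = sum(xi * zi for xi, zi in zip(x, z))
    s_xx = sum(xi * xi for xi in x)
    return soft_threshold(s_xz, lam / 2.0) / s_xx


def cv_lasso(x, z, lambdas, K):
    n = len(x)
    folds = [list(range(k, n, K)) for k in range(K)]  # step 1: K equal groups
    cvmse = []
    for lam in lambdas:                               # steps 3 and 5: loop over lambdas
        fold_mse = []
        for held_out in folds:                        # train on all groups except one
            held = set(held_out)
            train = [i for i in range(n) if i not in held]
            a = lasso_1d([x[i] for i in train], [z[i] for i in train], lam)
            err = sum((z[i] - a * x[i]) ** 2 for i in held_out) / len(held_out)
            fold_mse.append(err)
        cvmse.append(sum(fold_mse) / K)               # step 4: average performance
    k_star = min(range(len(lambdas)), key=lambda k: cvmse[k])
    best_lam = lambdas[k_star]
    return best_lam, lasso_1d(x, z, best_lam), cvmse  # step 6: refit on all n


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
z = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]  # noise-free data, z = 2x
best_lam, a_final, cvmse = cv_lasso(x, z, [0.0, 1.0, 10.0], K=3)
```

With noise-free data the unpenalised fit is already exact, so the smallest lambda wins here; on noisy data with many predictors a non-zero lambda is more likely to give the minimum CVMSE.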

What is the advantage of Lasso over L2 regularization?

- The optimal set of parameters after regularization is the tangent point of the J cost contour with the constraint contours
- The Lasso constraint contours have corners on the coordinate axes, so this point often has some coefficients exactly equal to zero -> induces sparse solutions
- Some predictors can therefore be excluded, reducing model size
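
A small numerical illustration of the difference, assuming a single coefficient with no intercept so that both penalised fits have closed forms. The names s_xz and s_xx (the sums of x*z and x*x) and both function names are assumptions for this sketch:

```python
def ridge_1d(s_xz, s_xx, lam):
    """Minimiser of sum((z - a*x)**2) + lam*a**2 (L2 penalty)."""
    return s_xz / (s_xx + lam)


def lasso_1d(s_xz, s_xx, lam):
    """Minimiser of sum((z - a*x)**2) + lam*|a| (L1 penalty)."""
    shrunk = max(abs(s_xz) - lam / 2.0, 0.0)
    return (shrunk if s_xz >= 0 else -shrunk) / s_xx


s_xz, s_xx = 1.0, 4.0  # a weakly informative predictor
ridge_coeffs = [ridge_1d(s_xz, s_xx, lam) for lam in (0.0, 2.0, 8.0)]
lasso_coeffs = [lasso_1d(s_xz, s_xx, lam) for lam in (0.0, 2.0, 8.0)]
```

The L2 coefficient shrinks but stays non-zero for every lambda, while the Lasso coefficient becomes exactly zero once lambda is large enough, which is what allows the predictor to be dropped.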

What is the optimal set of parameters after regularization?

The tangent point of the J cost contour with the constraint contours

What is meant by Odds?

The ratio of the probability that an event occurs to the probability that it does not occur
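
As a one-line sketch in Python (the function name is an assumption):

```python
def odds(p):
    """Odds of an event with probability p (requires p < 1)."""
    return p / (1.0 - p)


even = odds(0.5)    # equally likely to occur or not
likely = odds(0.8)  # roughly four times as likely to occur as not
```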

What is the definition of logistic model?

A model that associates the posterior probability y of class membership with a logistic sigmoid function of the inputs
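
A minimal sketch, assuming the sigmoid is applied to a linear combination of the inputs. The weights w and bias b below are hypothetical values, not fitted ones:

```python
import math


def sigmoid(t):
    """Logistic sigmoid: maps any real t into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-t))


def posterior(x, w, b):
    """Posterior probability of class membership for input vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


w, b = [2.0, -1.0], 0.5          # hypothetical parameters
y = posterior([1.0, 0.5], w, b)  # linear part: 2*1 - 1*0.5 + 0.5 = 2.0
log_odds = math.log(y / (1.0 - y))
```

Taking the log of the odds of y recovers the linear part, which is why the logistic model is linear in the log-odds.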

What is a support vector machine and how does it work?

- SVM is an extension of the maximal margin classifier
- The classifier maximises the margin between the separating hyperplane and the support vectors
- SVM expands the feature space through non-linear features and returns a classification of the original data set

How does an SVM work?

Works by defining a separating hyperplane in the data space through a small number of data points that are closest to the hyperplane (the support vectors), and maximising the margin between the hyperplane and those points
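
The geometry can be illustrated without training anything. The sketch below takes a hand-picked hyperplane w.x + b = 0 (an assumption, not the output of an SVM solver), measures each point's distance to it, and reports the closest points, which play the role of support vectors:

```python
import math


def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    value = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(value) / math.sqrt(sum(wi * wi for wi in w))


w, b = (1.0, 1.0), -3.0  # hypothetical hyperplane x1 + x2 = 3
points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)]

distances = [distance_to_hyperplane(p, w, b) for p in points]
margin = min(distances)  # distance from the hyperplane to its nearest points
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]
```

An actual SVM searches over w and b for the hyperplane that makes this margin as large as possible; only the nearest points influence the result.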

What is the main disadvantage of a support vector classifier?

Hard classifier: it separates the classes without returning any information on the level of confidence for each data point

What is meant by ARX?

Auto-Regressive model with eXogenous inputs

What are ARX and ARMAX models?

Linear models used in the analysis of time series data and the modelling of dynamical systems through a sample of their present and past inputs and outputs
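
A minimal sketch of a first-order ARX model, y[t] = a*y[t-1] + b*u[t-1], with the noise term left out for clarity. The coefficients and the function name are assumptions chosen for illustration:

```python
def simulate_arx(a, b, u, y0=0.0):
    """Simulate y[t] = a*y[t-1] + b*u[t-1] for an input sequence u."""
    y = [y0]
    for t in range(1, len(u)):
        # Auto-regressive part uses the past output, exogenous part the past input
        y.append(a * y[t - 1] + b * u[t - 1])
    return y


y = simulate_arx(a=0.5, b=1.0, u=[1.0, 1.0, 1.0, 1.0])  # response to a step input
```

An ARMAX model would add a moving average of past noise samples to the same recursion.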

What is meant by ARMAX?

Auto-Regressive Moving-Average model with eXogenous inputs

What is a confusion matrix?

A table that compares the true and predicted values of a classifier model, showing how often each combination of true and predicted class occurs
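
For a binary classifier the table has four cells. A minimal sketch, with made-up labels and a hypothetical function name:

```python
def confusion_matrix(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["TP"] += 1  # positive, predicted positive
        elif t == 0 and p == 1:
            counts["FP"] += 1  # negative, predicted positive
        elif t == 0 and p == 0:
            counts["TN"] += 1  # negative, predicted negative
        else:
            counts["FN"] += 1  # positive, predicted negative
    return counts


cm = confusion_matrix([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
```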

What is meant by ROC and how is it constructed?

Receiver Operating Characteristic
A diagram formed by plotting the sensitivity (TPR) against 1 - specificity (FPR) as the classifier threshold value changes from T=0 to T=1
Each point on the curve gives the TPR and FPR obtained for a certain threshold of the classifier
- Can assess accuracy over all thresholds
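
A sketch of the construction, assuming the classifier outputs a score between 0 and 1 and predicts the positive class when the score is at least T. The data and names are made up for illustration:

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) point per threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = []
    for T in thresholds:
        pred = [1 if s >= T else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
        # TPR is the sensitivity, FPR is 1 - specificity
        points.append((fp / n_neg, tp / n_pos))
    return points


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(y_true, scores, [0.0, 0.25, 0.5, 0.75, 1.0])
```

At T=0 everything is classed as positive, giving the top-right corner (1, 1); at T=1 nothing is, giving the origin.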

What is the AUC and what is it used for?

Area Under the (ROC) Curve
Quantifies the performance of a classifier in a single number
A perfect classifier has an AUC of 1
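
One way to compute it without drawing the curve uses the fact that the ROC AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one (ties count as half). A sketch with a hypothetical function name and made-up scores:

```python
def auc(y_true, scores):
    """ROC AUC computed by comparing every positive/negative pair of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


imperfect = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # one pair is mis-ordered
perfect = auc([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])     # every positive outranks every negative
```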

What elements of the data should be considered before building a black box model?

The model will represent the data using basis functions
It provides the best representation of the data it is given, so the data itself must be of good quality
The density of the data shapes the relative weights of the basis function components
Low-density areas will be poorly represented and prone to overfitting

Put simply, how does cross-validation work?

Rotate the training and testing samples through the data
Test on one part of the data and train on the rest, repeating until every part has been used for testing
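
The rotation can be sketched as a generator of index splits. The interleaved assignment of observations to groups is an assumption; any partition into K groups would do:

```python
def k_fold_splits(n, K):
    """Yield (train_indices, test_indices) for each of the K rotations."""
    folds = [list(range(k, n, K)) for k in range(K)]  # K interleaved groups
    for test_idx in folds:
        held = set(test_idx)
        train_idx = [i for i in range(n) if i not in held]
        yield train_idx, test_idx


splits = list(k_fold_splits(6, 3))
```

Each observation appears in exactly one test group, so every data point is used for testing once and for training K - 1 times.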

What are the two most important factors to employ to prevent bias in AI models?

Pre-processing of data
Explainability of method

What does PCA application do to a data set?

PCA projects the feature set onto a new frame of axes chosen so that the variability along each successive axis is maximised

What is the purpose of PCA?

Provides a measure of importance of each input and allows the reduction of the dimensionality of a large feature space

What is the geometrical relationship between principal components?

Each PC is orthogonal to all of the others

What do the eigenvectors of a data set's covariance matrix represent?

Principal components

What do the eigenvalues of a data set's covariance matrix represent?

The variance contained along the corresponding eigenvector direction

How do you identify the first principal component of a data set?

The eigenvector associated with the largest eigenvalue is the first PC
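
For two features the eigen-decomposition of the covariance matrix has a closed form, which keeps a sketch dependency-free. The function name and data are assumptions; in practice a linear algebra library would be used:

```python
import math


def pca_2d(points):
    """Return ((var1, pc1), (var2, pc2)) for 2-D data, largest variance first."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Eigenvalues of a symmetric 2x2 matrix
    mid = (sxx + syy) / 2.0
    half_gap = math.hypot((sxx - syy) / 2.0, sxy)
    lam1, lam2 = mid + half_gap, mid - half_gap
    # Eigenvector for the largest eigenvalue
    if sxy != 0.0:
        v = (sxy, lam1 - sxx)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    pc1 = (v[0] / norm, v[1] / norm)
    pc2 = (-pc1[1], pc1[0])  # the second PC is orthogonal to the first
    return (lam1, pc1), (lam2, pc2)


# Points lying on the line x2 = x1: all the variance is along the diagonal
(var1, pc1), (var2, pc2) = pca_2d([(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 3.0)])
```

Here the first PC points along the diagonal and the second eigenvalue is zero, so the data could be reduced to one dimension without losing any variance.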

How would you incorporate non-linearity into the decision boundary of a two-input logistic classifier?

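Rewrite the basis functions to include non-linear terms in the two inputs (for example squares and cross-products), and change the number of weights required accordingly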
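
As an illustration, a quadratic basis for two inputs has six terms and so needs six weights instead of three. The weights below are hypothetical values chosen so that the decision boundary is the unit circle; they are not fitted to any data:

```python
import math


def quadratic_basis(x1, x2):
    """Basis functions [1, x1, x2, x1^2, x1*x2, x2^2] for two inputs."""
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]


def predict_proba(x1, x2, weights):
    """Logistic classifier applied to the quadratic basis."""
    t = sum(w * phi for w, phi in zip(weights, quadratic_basis(x1, x2)))
    return 1.0 / (1.0 + math.exp(-t))


# Hypothetical weights: t = x1^2 + x2^2 - 1, so t = 0 on the unit circle
weights = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
inside = predict_proba(0.0, 0.0, weights)
on_boundary = predict_proba(1.0, 0.0, weights)
outside = predict_proba(2.0, 0.0, weights)
```

The model is still linear in its weights; only the features it is applied to have changed.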

What is meant by inner product?

The dot product: the sum of the element-wise products of two vectors
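
In Python this is a one-liner (the function name is an assumption):

```python
def inner(u, v):
    """Inner (dot) product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


result = inner([1, 2, 3], [4, 5, 6])  # 1*4 + 2*5 + 3*6
```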

How do you derive the least squares solution?

Write the cost as J = (z - Xa)^T (z - Xa) and expand it like a standard quadratic: J = z^T z - 2 a^T X^T z + a^T X^T X a
Take the gradient with respect to the m-vector a: nabla_a J = 0 - 2 X^T z + 2 X^T X a
Equate to zero and rearrange: a = (X^T X)^-1 X^T z
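
A dependency-free sketch of that final formula. Rather than forming the inverse explicitly, it solves the normal equations (X^T X) a = X^T z by Gaussian elimination; the helper names are assumptions, and a linear algebra library would normally do this step:

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def least_squares(X, z):
    """Return a satisfying the normal equations (X^T X) a = X^T z."""
    Xt = [list(col) for col in zip(*X)]  # transpose of X
    XtX = [[sum(p * q for p, q in zip(r1, r2)) for r2 in Xt] for r1 in Xt]
    Xtz = [sum(xi * zi for xi, zi in zip(row, z)) for row in Xt]
    return solve(XtX, Xtz)


# Design matrix with a column of ones (for a0) and a column of inputs (for a1)
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
z = [1.0, 3.0, 5.0, 7.0]  # generated from z = 1 + 2x
a = least_squares(X, z)
```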

You introduce quadratic non-linearity to a model, what are the new column dimensions of the design matrix?

(d+n)! / (d! n!), where n=2 and d=3, which gives 5! / (3! 2!) = 10 columns
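
The count can be checked by listing the polynomial terms directly. This sketch assumes d is the number of inputs and n is the highest polynomial order (2 for quadratic); the formula is symmetric in d and n, so the count is the same either way:

```python
from itertools import combinations_with_replacement
from math import comb


def n_columns(d, n):
    """(d+n)! / (d! n!) written as a binomial coefficient."""
    return comb(d + n, n)


def monomials(d, n):
    """All products of at most n inputs chosen (with repetition) from d inputs."""
    terms = []
    for degree in range(n + 1):
        # Each tuple lists the input indices multiplied together; () is the constant term
        terms.extend(combinations_with_replacement(range(d), degree))
    return terms


terms = monomials(3, 2)  # 1 constant + 3 linear + 6 quadratic terms
```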

Q Flashcards

(34 cards)