Q Flashcards

1
Q

How do you derive a0 and a1 for a linear regression model that is least-squares optimal?

A

1. Write the sum of squared errors J for the model
2. Take the partial derivatives of J with respect to a0 and a1
3. Set both equal to zero and solve the simultaneous equations
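Carrying the three steps through for the model z ≈ a0 + a1 x gives the closed form a1 = Σ(x_i − x̄)(z_i − z̄) / Σ(x_i − x̄)² and a0 = z̄ − a1 x̄. The sketch below evaluates it directly. The names `fit_line`, `x` and `z` are illustrative assumptions, not taken from the cards.

```python
def fit_line(x, z):
    """Least-squares a0, a1 for the model z ≈ a0 + a1 * x."""
    n = len(x)
    x_mean = sum(x) / n
    z_mean = sum(z) / n
    # Solving dJ/da0 = 0 and dJ/da1 = 0 simultaneously gives these sums
    sxz = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    a1 = sxz / sxx
    a0 = z_mean - a1 * x_mean
    return a0, a1

a0, a1 = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a0, a1)  # the points lie exactly on z = 1 + 2x
```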

2
Q

What does the R^2 statistic indicate?

A

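R^2 is the proportion of the variance in the output that the model explains. If it is close to 1, the model captures the variation in the data well; close to 0, it explains little more than the mean does.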
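A plain-Python sketch of the usual definition, R² = 1 − SS_res / SS_tot (the function name is an assumption):

```python
def r_squared(z, z_hat):
    """Coefficient of determination: fraction of output variance explained."""
    z_mean = sum(z) / len(z)
    ss_res = sum((zi - zh) ** 2 for zi, zh in zip(z, z_hat))  # unexplained variation
    ss_tot = sum((zi - z_mean) ** 2 for zi in z)              # total variation
    return 1.0 - ss_res / ss_tot

z = [1.0, 3.0, 5.0, 7.0]
perfect = r_squared(z, z)                          # perfect fit
mean_only = r_squared(z, [4.0, 4.0, 4.0, 4.0])     # always predicting the mean
print(perfect, mean_only)
```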

3
Q

What is meant by regularization?

A

Regularization is the application of constraints to the magnitude of the estimated model parameters in order to simplify the model.

4
Q

What is regularization's relevance to modelling?

A

It reduces model variance and helps with predictor selection.
It also reduces numerical issues when working with the regressor matrices (e.g. an ill-conditioned X^T X).

5
Q

What does LASSO stand for?

A

Least Absolute Shrinkage and Selection Operator

6
Q

What does Lasso do to the cost function J?

A

It introduces into the cost function J an additional term that is linear in the absolute values of the model coefficients, weighted by the tuning parameter lambda.
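Written out, and assuming the notation of the least squares card (outputs z_i, regressor rows x_i, parameter vector a with m entries), the Lasso cost is shown below with the L2 (ridge) penalty alongside for comparison:

```latex
J_{\text{Lasso}}(a) = \sum_{i=1}^{n} \left( z_i - x_i^{T} a \right)^2 + \lambda \sum_{j=1}^{m} |a_j|
\qquad
J_{\text{L2}}(a) = \sum_{i=1}^{n} \left( z_i - x_i^{T} a \right)^2 + \lambda \sum_{j=1}^{m} a_j^2
```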

7
Q

What are the parameters in regularization governed by?

A

The set of parameters is identified by the point at which the contours of J and the additional parameter-dependent (constraint) contours are tangential.

8
Q

What are the steps for applying Lasso through cross-validation?

A

1 - Divide the n observations into K equal groups
2 - Specify a range of Lasso weights lambda (lambda 1 … lambda m) and set k = 0
3 - Let k = k+1 and lambda = lambda k; train K different Lasso models, each using all of the data groups except one, and test each on its held-out group
4 - Find the average performance over the K held-out groups (the CVMSE)
5 - If k < m, return to 3; otherwise find k* such that the CVMSE is minimum
6 - Let lambda = lambda k* and apply Lasso to all n observations to determine the final model parameter estimates.
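A minimal NumPy sketch of the procedure, under several assumptions: the Lasso fit uses cyclic coordinate descent (one standard solver, swapped in here for illustration) with a fixed number of sweeps, the cost is ||z − Xa||² + λ||a||₁ with no intercept, the groups are formed by a random permutation, and all names and data are hypothetical.

```python
import numpy as np

def soft_threshold(rho, t):
    """Shrink rho towards zero by t, returning exactly 0 if |rho| <= t."""
    return np.sign(rho) * max(abs(rho) - t, 0.0)

def lasso_cd(X, z, lam, n_iter=200):
    """Minimise ||z - X a||^2 + lam * ||a||_1 by cyclic coordinate descent."""
    m = X.shape[1]
    a = np.zeros(m)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):                      # fixed sweep count, no convergence test
        for j in range(m):
            r_j = z - X @ a + X[:, j] * a[j]     # residual with predictor j removed
            rho = X[:, j] @ r_j
            a[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return a

def lasso_cv(X, z, lambdas, K=5, seed=0):
    """K-fold cross-validation over a grid of Lasso weights."""
    n = len(z)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)          # step 1: K groups
    cvmse = []
    for lam in lambdas:                                    # steps 2, 3, 5: loop over lambda_k
        errs = []
        for g in range(K):                                 # train on all groups except group g
            test = folds[g]
            train = np.concatenate([folds[i] for i in range(K) if i != g])
            a = lasso_cd(X[train], z[train], lam)
            errs.append(np.mean((z[test] - X[test] @ a) ** 2))
        cvmse.append(float(np.mean(errs)))                 # step 4: average performance
    best = int(np.argmin(cvmse))                           # k* minimises the CVMSE
    a_final = lasso_cd(X, z, lambdas[best])                # step 6: refit on all n observations
    return lambdas[best], a_final, cvmse

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
a_true = np.array([3.0, 0.0, -2.0, 0.0, 0.0])   # hypothetical sparse "true" parameters
z = X @ a_true + 0.1 * rng.normal(size=60)
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
lam_best, a_hat, cvmse = lasso_cv(X, z, lambdas)
print(lam_best, np.round(a_hat, 2))
```

In practice a convergence tolerance would replace the fixed sweep count, and the predictors are usually standardised before fitting.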

9
Q

What is the advantage of Lasso over L2 regularization?

A

- The optimal set of parameters after regularization is the tangent point of the J cost contour with the constraint contours
- The Lasso (L1) constraint region has corners on the axes, so this tangent point often falls where some parameters are exactly zero -> induces sparse solutions
- Some predictors can therefore be excluded, reducing model size.
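A sketch of why this happens, assuming an orthonormal design (X^T X = I) so that both problems have closed forms: L2 scales every least-squares coefficient by 1/(1 + λ), while Lasso soft-thresholds each one at λ/2, setting the small ones exactly to zero. The coefficient values are made up.

```python
import numpy as np

def ridge_orthonormal(a_ols, lam):
    # ||z - Xa||^2 + lam * ||a||^2 with X^T X = I: uniform shrinkage, never exactly zero
    return a_ols / (1.0 + lam)

def lasso_orthonormal(a_ols, lam):
    # ||z - Xa||^2 + lam * ||a||_1 with X^T X = I: soft thresholding at lam / 2
    return np.sign(a_ols) * np.maximum(np.abs(a_ols) - lam / 2.0, 0.0)

a_ols = np.array([3.0, 0.4, -0.2, -2.0])     # hypothetical least-squares estimates
r = ridge_orthonormal(a_ols, 1.0)
l = lasso_orthonormal(a_ols, 1.0)
print(r)   # every coefficient shrunk, none removed
print(l)   # the two small coefficients become exactly zero
```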

10
Q

What is the optimal set of parameters after regularization?

A

The tangent point of the J cost contour with the constraint contours

11
Q

What is meant by Odds?

A

The ratio of the probability that an event occurs to the probability that it does not occur: odds = p / (1 − p).

12
Q

What is the definition of a logistic model?

A

A model in which the posterior probability y of class membership is given by a logistic sigmoid function of a linear combination of the inputs.
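A small sketch linking this card to the odds card: the logistic model applies the sigmoid to a linear score s, and taking the log of the odds of the output recovers s, so the log-odds are linear in the inputs. The weights and inputs are hypothetical.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def odds(p):
    return p / (1.0 - p)

# Hypothetical weights and inputs for a two-input logistic model
w0, w1, w2 = -1.0, 2.0, 0.5
x1, x2 = 0.8, 1.2
s = w0 + w1 * x1 + w2 * x2      # linear score
y = sigmoid(s)                  # posterior probability of class membership
print(y, math.log(odds(y)))    # the log-odds equal the linear score s
```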

13
Q

What is a support vector machine and how does it work?

A

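- SVM is an extension of the maximal margin classifier
- The classifier maximises the margin between the classes, which is defined by the support vectors
- SVM expands the feature space through non-linear features (typically via kernels), finds a linear boundary there, and returns the corresponding classification of the original data set.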

14
Q

How does an SVM work?

A
  • Works by defining a separating hyperplane in the data space through the small number of data points closest to it (the support vectors), and maximising the margin between the hyperplane and those points
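A minimal sketch of a soft-margin linear SVM (no kernel), trained by plain subgradient descent on the hinge loss. This is one simple training method chosen for illustration, not necessarily the one the course uses; the function name, step size, C and toy data are all assumptions. The points with the smallest margin y(w·x + b) are the support vectors that fix the hyperplane.

```python
import numpy as np

def train_linear_svm(X, y, C=1.0, lr=0.01, n_iter=2000):
    """Soft-margin linear SVM: minimise 0.5*||w||^2 + C * sum of hinge losses."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        margins = y * (X @ w + b)
        viol = margins < 1                          # inside the margin or misclassified
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w                            # subgradient step
        b -= lr * grad_b
    return w, b

X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
w, b = train_linear_svm(X, y)
margins = y * (X @ w + b)      # smallest margins belong to the support vectors
pred = np.sign(X @ w + b)      # hard labels only, no confidence level
print(w, b, margins)
```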
15
Q

What is the main disadvantage of a support vector classifier?

A
  • Hard classifier: it separates the classes without returning any information on the level of confidence for each data point.
16
Q

What is meant by ARX?

A

Auto Regressive model with Exogenous inputs

17
Q

What are ARX and ARMAX models?

A

Linear models used in time series analysis and in modelling dynamical systems, built from a sample of the system's present and past inputs and outputs.
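A minimal ARX(1,1) sketch, assuming the sign convention y(t) = a1 y(t−1) + b1 u(t−1) + e(t) (many texts write the a-coefficients with the opposite sign) and hypothetical parameter values: simulate the system, stack past outputs and inputs into a regressor matrix, and estimate the parameters by least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200
u = rng.normal(size=N)                # exogenous input
a1_true, b1_true = 0.7, 0.5           # hypothetical system parameters
y = np.zeros(N)
for t in range(1, N):
    y[t] = a1_true * y[t - 1] + b1_true * u[t - 1] + 0.01 * rng.normal()

# Regressor matrix: each row holds the past output and past input
Phi = np.column_stack([y[:-1], u[:-1]])
target = y[1:]
theta, *_ = np.linalg.lstsq(Phi, target, rcond=None)
print(theta)   # estimates of [a1, b1]
```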

18
Q

What is meant by ARMAX?

A

Auto Regressive Moving Average model with Exogenous inputs.
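In the common polynomial notation (assumed here, with q^{-1} the backward-shift operator and e(t) white noise), ARMAX adds a moving-average polynomial C on the noise term; setting C(q^{-1}) = 1 recovers ARX:

```latex
A(q^{-1})\, y(t) = B(q^{-1})\, u(t) + C(q^{-1})\, e(t)
\\
A(q^{-1}) = 1 + a_1 q^{-1} + \dots + a_{n_a} q^{-n_a}, \quad
B(q^{-1}) = b_1 q^{-1} + \dots + b_{n_b} q^{-n_b}, \quad
C(q^{-1}) = 1 + c_1 q^{-1} + \dots + c_{n_c} q^{-n_c}
```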

19
Q

What is a confusion matrix?

A

A table comparing the true and predicted classes of a classifier model (counts of true/false positives and negatives)
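A sketch for the binary case, assuming labels 0/1 and the [[TN, FP], [FN, TP]] layout (rows are the true class, columns the predicted class; layouts differ between sources):

```python
def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix as [[TN, FP], [FN, TP]]."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]

cm = confusion_matrix([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
(tn, fp), (fn, tp) = cm
tpr = tp / (tp + fn)   # sensitivity
fpr = fp / (fp + tn)   # 1 - specificity
print(cm, tpr, fpr)
```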

20
Q

What is meant by ROC and how is it constructed?

A
  • Receiver Operating Characteristic
  • A diagram formed by plotting the sensitivity (TPR) against 1 - specificity (FPR) as the classifier threshold value changes from T=0 to T=1.
  • Each point on the curve gives the TPR and FPR obtained for a certain threshold of the classifier
    - Can assess accuracy over all thresholds
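A sketch of the construction, assuming binary labels and classifier scores in [0, 1]; each threshold T yields one (FPR, TPR) point. The names and data are illustrative.

```python
def roc_points(y_true, scores, thresholds):
    """Return one (FPR, TPR) point per threshold T, predicting 1 when score >= T."""
    P = sum(y_true)               # number of actual positives
    N = len(y_true) - P           # number of actual negatives
    points = []
    for T in thresholds:
        pred = [1 if s >= T else 0 for s in scores]
        tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
        points.append((fp / N, tp / P))   # (1 - specificity, sensitivity)
    return points

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
pts = roc_points(y_true, scores, [i / 4 for i in range(5)])   # T = 0, 0.25, ..., 1
print(pts)
```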
21
Q

What is the AUC and what is it used for?

A

Area Under the (ROC) Curve
Quantifies the performance of a classifier across all thresholds
A perfect classifier has an AUC of 1; random guessing gives 0.5
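A sketch of computing the AUC from ROC points with the trapezoidal rule (one common approach; the function name is an assumption):

```python
def auc(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    pts = sorted(points)          # sort by FPR, then TPR
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

perfect = auc([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)])   # jumps straight to TPR = 1
random_guess = auc([(0.0, 0.0), (1.0, 1.0)])          # diagonal line
print(perfect, random_guess)
```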

22
Q

What elements of the data should be considered before building a black box model?

A
  • The model will represent the data using basis functions
  • It provides the best representation of the data it is given, so the data itself must be of good quality
  • The density of the data shapes the relative weights of the basis function components
  • Low-density areas will be poorly represented and prone to overfitting
23
Q

How does cross-validation work, in simple terms?

A

Rotate the training and testing samples through the data:
test on one part of the data and train on the rest, until every part has been used for testing once
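A minimal sketch of the rotation, assuming contiguous folds without shuffling (in practice the data are often shuffled first); the function name is hypothetical.

```python
def k_fold_splits(n, K):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    indices = list(range(n))
    fold_sizes = [n // K + (1 if i < n % K else 0) for i in range(K)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]   # everything else
        yield train, test
        start += size

splits = list(k_fold_splits(10, 3))
for train, test in splits:
    print(test)
```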

24
Q

What are the two most important factors to employ to prevent bias in AI models?

A

Pre-processing of data
Explainability of method

25
Q

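What does applying PCA do to a data set?

A

PCA projects the feature set onto a new set of orthogonal axes (a new frame) chosen so that each successive axis captures the maximum remaining variability, and the data can be reconstructed from this frame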

26
Q

What is the purpose of PCA?

A

It provides a measure of the importance of each input and allows the dimensionality of a large feature space to be reduced

27
Q

What is the geometrical relationship between principal components?

A

The principal components are mutually orthogonal

28
Q

What do the eigenvectors of a data set's covariance matrix represent?

A

The principal components (their directions)

29
Q

What do the eigenvalues of a data set's covariance matrix represent?

A

The variance of the data along the corresponding eigenvector direction

30
Q

How do you identify the first principal component of a data set?

A

The eigenvector with the largest eigenvalue is the first PC
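A NumPy sketch tying the PCA cards together on made-up, strongly correlated data: centre the data, eigendecompose the covariance matrix, order the eigenvectors by eigenvalue, then project and reconstruct from the first PC. The function name and data are assumptions.

```python
import numpy as np

def pca(X):
    """Return eigenvalues (descending) and eigenvectors (as columns) of the covariance of X."""
    Xc = X - X.mean(axis=0)                 # centre each feature
    C = np.cov(Xc, rowvar=False)            # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # symmetric matrix: eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # largest variance first
    return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
t = rng.normal(size=500)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=500)])   # hypothetical correlated features
vals, vecs = pca(X)
pc1 = vecs[:, 0]                                       # first principal component
scores = (X - X.mean(axis=0)) @ vecs                   # data projected onto the PCs
X_rec = scores[:, :1] @ vecs[:, :1].T + X.mean(axis=0) # reconstruction from PC1 only
print(vals / vals.sum())                               # fraction of variance per PC
```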

31
Q

How would you incorporate non-linearity into the decision boundary of a two-input logistic classifier?

A

Rewrite the basis functions to include non-linear terms of the inputs (e.g. x1^2, x1 x2, x2^2) and increase the number of weights accordingly.
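For example, a quadratic basis for two inputs, [1, x1, x2, x1², x1x2, x2²], needs six weights instead of three. The weights below are hand-picked assumptions that make the decision boundary (probability 0.5) the unit circle, which a linear basis cannot represent.

```python
import math

def quadratic_features(x1, x2):
    # Hypothetical quadratic basis: 6 weights instead of 3 for the linear model
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

def logistic_predict(w, x1, x2):
    s = sum(wi * phi for wi, phi in zip(w, quadratic_features(x1, x2)))
    return 1.0 / (1.0 + math.exp(-s))

# Hand-picked weights: s = x1^2 + x2^2 - 1, so the boundary s = 0 is the unit circle
w = [-1.0, 0.0, 0.0, 1.0, 0.0, 1.0]
inside = logistic_predict(w, 0.0, 0.0)    # inside the circle: probability below 0.5
outside = logistic_predict(w, 2.0, 0.0)   # outside the circle: probability above 0.5
print(inside, outside)
```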

32
Q

What is meant by inner product?

A

The dot product: the sum of the element-wise products of two vectors, a · b = Σ a_i b_i

33
Q

How do you derive the least squares solution?

A

Write the squared error as $(z - Xa)^T (z - Xa)$ and expand it like a standard quadratic:
$J(a) = z^T z - 2a^T X^T z + a^T X^T X a$
Take the gradient with respect to the m-vector $a$:
$\nabla_a J = 0 - 2X^T z + 2X^T X a$
Equate to zero and rearrange:
$a = (X^T X)^{-1} X^T z$
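A NumPy sketch of the final result on made-up data: solving the normal equations X^T X a = X^T z directly (rather than forming the inverse) and cross-checking with `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])  # design matrix with intercept column
a_true = np.array([1.0, -2.0, 0.5])                           # hypothetical parameters
z = X @ a_true + 0.05 * rng.normal(size=50)

# Normal equations (X^T X) a = X^T z; solving is preferred to an explicit inverse
a_hat = np.linalg.solve(X.T @ X, X.T @ z)
a_lstsq, *_ = np.linalg.lstsq(X, z, rcond=None)   # same solution via an SVD-based solver
grad = -2 * X.T @ z + 2 * X.T @ X @ a_hat         # gradient of J at the solution
print(a_hat)
```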

34
Q

You introduce quadratic non-linearity to a model. How many columns does the new design matrix have?

A

(d+n)! / (d! n!), where d is the number of inputs and n is the polynomial degree; with d = 3 and n = 2 this gives 10 columns
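A sketch that checks the count by enumerating every monomial of degree at most n in d inputs (including the constant column); the function name is hypothetical.

```python
from itertools import combinations_with_replacement
from math import comb

def poly_columns(d, n):
    """All monomials of degree <= n in d inputs, as tuples of input indices."""
    cols = []
    for degree in range(n + 1):
        cols.extend(combinations_with_replacement(range(d), degree))
    return cols

d, n = 3, 2
cols = poly_columns(d, n)
print(len(cols), comb(d + n, n))   # (d+n)! / (d! n!)
```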