Quant Flashcards

1
Q

What is linear regression?

A

Finding the linear relationship between a dependent variable and one or more independent variables, usually for predictive analysis

2
Q

What are SSE, SSR and SST?

A

When fitting a line of best fit, you need to measure how far the line sits from the data points and from the mean. These 3 variables quantify that

SSR is the pRedicted (regression) deviation: the difference between the line of best fit and the mean of the data set

SSE is the ERROR deviation: the difference between the data point and the line of best fit

SST is the sum of SSE and SSR: the total deviation from the mean to the data point

Remember these are all SQUARED, then summed across every data point

3
Q

What is the formula for r squared

A

R squared = SSR / SST

It shows the proportion of the total variation in Y that the model explains, i.e. how well the model fits
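A minimal sketch of how SSR, SSE, SST and R squared fit together, assuming a simple one-variable least-squares fit. The data points and the helper names (`fit_line`, `sums_of_squares`) are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one x variable: returns (b0, b1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def sums_of_squares(xs, ys):
    """Return (SSR, SSE, SST) for the fitted line."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    preds = [b0 + b1 * x for x in xs]
    ssr = sum((p - mean_y) ** 2 for p in preds)          # line vs mean
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))   # data vs line
    sst = sum((y - mean_y) ** 2 for y in ys)             # data vs mean
    return ssr, sse, sst


# Illustrative data only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
ssr, sse, sst = sums_of_squares(xs, ys)
r_squared = ssr / sst
print(ssr, sse, sst, r_squared)
```

With these numbers SSR is about 3.6, SSE about 2.4 and SST 6, so R squared is about 0.6.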

4
Q

Does a high or low R squared mean the relationship is stronger?

A

A high R squared means a STRONG relationship

5
Q

R squared: what are the highest and lowest values it could be?

A

It is between 0 and 1

6
Q

What are the degrees of freedom?

A

Degrees of freedom are the number of observations you have (n) minus the number of independent variables in the model (k) minus 1.

You want the degrees of freedom high to have a good model

DF = n - k - 1

7
Q

As degrees of freedom increase, R squared ________? And why?

A

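As degrees of freedom increase, R squared tends to decrease.

Think if you only had 2 data points: the line would pass through both, so the r^2 (relationship) would be 1. Putting in more data points would tend to DECREASE r^2. Going the other way, adding variables uses up degrees of freedom and can only push r^2 up, never down.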

8
Q

Formula for Y / relationship between x and Y

A

Y = β0 + β1 x + error

9
Q

In y = β0 + β1 * x + e, which is the independent and which is the dependent variable?

A

Y is dependent, x is independent

10
Q

If the confidence level rises (from 90% to 99%), does the probability of rejecting the null hypothesis go up or down? Why?

A

The probability will go down. The confidence interval gets wider (so we can be more confident it contains the right number), so the hypothesised value is more likely to fall inside it.

11
Q

What is a t statistic? What is the formula?

A

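A t test checks whether a hypothesised number could plausibly be the true value of a statistic, based on the estimate we have from the sample, its standard error, and a critical t score.

t stat = (estimated value - hypothesised value) / standard error. Reject the null if the t stat is beyond the critical t score (in either direction for a two-tailed test).

Equivalently, build the interval: estimated value + and - the critical t score * standard error, and check whether the hypothesised value falls inside it.

The critical t score comes from the t table, using the degrees of freedom (n - 2 for a simple regression).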
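A small sketch of the t test arithmetic, assuming a slope estimate of 0.6 with a standard error of 0.2 and a null value of 0 (all made-up numbers). The critical t of 2.306 is the usual t-table value for 8 degrees of freedom at the 5% two-tailed level; it is typed in by hand here because Python's standard library has no t distribution.

```python
def t_statistic(estimate, hypothesised, standard_error):
    """t stat = (estimate - hypothesised value) / standard error."""
    return (estimate - hypothesised) / standard_error


def confidence_interval(estimate, critical_t, standard_error):
    """Estimate plus and minus critical t * standard error."""
    margin = critical_t * standard_error
    return estimate - margin, estimate + margin


# Illustrative inputs only
estimate, hypothesised, standard_error = 0.6, 0.0, 0.2
critical_t = 2.306  # assumed: t-table value, df = 8, 5% two-tailed

t_stat = t_statistic(estimate, hypothesised, standard_error)
low, high = confidence_interval(estimate, critical_t, standard_error)
reject_null = abs(t_stat) > critical_t

print(t_stat, (low, high), reject_null)
```

Both routes agree: the t stat (about 3) is beyond the critical t, and the hypothesised value of 0 sits outside the interval, so the null is rejected.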

12
Q

What is the standard error?

A

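Standard error of a sample mean: SD / square root of n

OR, for a regression (the standard error of the estimate):

Epsilon is the residual, Y - β0 - β1 x (the formula for Y in reverse).

(Sum of epsilon squared / (n - 2)) ^ 0.5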

13
Q

How to find SSR

A

Sum of (line of best fit - mean) squared, across all data points

14
Q

How to find SSE

A

Sum of (actual value - line of best fit) squared, across all data points

15
Q

What is an F test?

A

It compares 2 variances to check if they are statistically different. In regression, it compares the explained variation (MSR) with the unexplained variation (MSE)

16
Q

Confidence interval formula

A

Mean + and - (t or z score * standard error)

17
Q

What are the z scores for 90%, 95% and 99% confidence?

A
90%: 1.645
95%: 1.96
99%: 2.576
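The two-tailed z scores can be checked rather than memorised. A short sketch using `statistics.NormalDist` from Python's standard library; the helper name `z_critical` is just for illustration.

```python
from statistics import NormalDist


def z_critical(confidence):
    """Two-tailed critical z score for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # Half of alpha sits in each tail of the standard normal distribution
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```

Rounded to three decimals this gives 1.645, 1.96 and 2.576.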
18
Q

Coefficient of determination is

A

r^2

19
Q

What is correlation squared?

A

r^2

20
Q

In the formula Y = β0 + β1 x + error, what is β0?

A

β0 is the y intercept

21
Q

Confidence interval explanation and formula

A

Mean + and - (t or z stat * standard error).

Check if the OTHER mean (the hypothesised mean you are testing against) is within those boundaries. If it is outside them, reject the null

22
Q

What is the p value?

A

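The probability of getting a result at least as extreme as the one observed if the null were true. Think of it as the pathetic value: we want it low (below the significance level) to reject the null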

23
Q

What are some key assumptions of simple linear regression?

A

The relationship between x and y is linear
x is uncorrelated with the error terms
The expected value of the error term is 0
The error terms have a constant variance

24
Q

Formula for standard deviation with Standard error

A

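SD = standard error * square root of n (rearranging SE = SD / square root of n). From raw data, SD = square root of (sum of squared deviations from the mean / (n - 1))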

25
Q

Is variance the same as SST?

A

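Not quite. SST is the total sum of squared deviations of Y from its mean; the sample variance of Y is SST / (n - 1)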

26
Q

Formula for DOF

A

Total DOF = k + (n - k - 1) = n - 1, i.e. regression DOF (k) plus error DOF (n - k - 1)

27
Q

MSR (mean squared regression) and MSE (mean squared Error) formulas

A
MSR = SSR / k
MSE = SSE / (n - k - 1)
28
Q

What is MSR / MSE

A

F stat
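A quick sketch of the MSR, MSE and F stat arithmetic. The inputs (SSR = 3.6, SSE = 2.4, n = 5 observations, k = 1 independent variable) are illustrative numbers only, and `f_statistic` is a hypothetical helper name.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with n observations and k independent variables."""
    msr = ssr / k                # explained variation per regression degree of freedom
    mse = sse / (n - k - 1)      # unexplained variation per error degree of freedom
    return msr, mse, msr / mse


msr, mse, f_stat = f_statistic(ssr=3.6, sse=2.4, n=5, k=1)
print(msr, mse, f_stat)
```

Here MSR is 3.6, MSE is about 0.8 and the F stat is about 4.5, which would then be compared with the critical F value for 1 and 3 degrees of freedom.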

29
Q

Formula for standard error in regression

A

Square root of (SSE / (n - k - 1)), i.e. the square root of MSE

30
Q

Correlation formula, then R squared formula

A

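Cor(x, y) = Cov(x, y) / (sigma x * sigma y), where sigma is the standard deviation

R^2 = cor^2 (for a simple regression with one independent variable)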

31
Q

F stat formula, what it means, and how to interpret it

A

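The F stat tests whether there is any relationship at all between the y and x variables (the null is that all slope coefficients are zero)

It is MSR/MSE

Compare it with the critical F value (k and n - k - 1 degrees of freedom). An F stat above the critical value means that there is a relationship; just being over 1 is not enough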

32
Q

Calculate MSR and MSE

A
MSR = SSR / k
MSE = SSE / (n - k - 1)

MSR / MSE = F

33
Q

What does adjusted r^2 do

A

It adjusts the r^2 for the number of independent variables, so that adding a variable (and using up a degree of freedom) does NOT automatically increase the r^2
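A short sketch of the standard adjusted R squared formula, 1 - ((n - 1) / (n - k - 1)) * (1 - R^2). The two models compared (R squared of 0.60 with 2 variables versus 0.61 with 3 variables, both on 30 observations) are made-up numbers.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - ((n - 1) / (n - k - 1)) * (1 - r_squared)


# Illustrative comparison: a third variable lifts R^2 only slightly
two_vars = adjusted_r_squared(0.60, n=30, k=2)
three_vars = adjusted_r_squared(0.61, n=30, k=3)
print(two_vars, three_vars)
```

R squared rose from 0.60 to 0.61, but adjusted R squared fell (roughly 0.570 to 0.565), suggesting the extra variable did not earn its place.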

34
Q

Downfall of R^2?

A

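It never decreases when you add independent variables, even useless ones, so it can overstate how good the model is. (R^2 itself stays between 0 and 1; it is adjusted R^2 that is not bound by 0 and can go negative)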

35
Q

What is a dummy variable? And how to incorporate into formula?

A

A way of introducing a QUALITATIVE variable. The category you care about gets a value of 1, and every alternative a value of 0. If it is months of the year, you use 11 dummies (one fewer than the number of categories, with the left-out month as the baseline). The Jan dummy has a value of 1 for results collected in Jan and 0 for every other month
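A minimal sketch of month dummies, assuming December is the month left out as the baseline (any month could be chosen). `MONTHS` and `month_dummies` are illustrative names.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def month_dummies(month, baseline="Dec"):
    """Return 11 dummy values (0 or 1) for an observation collected in `month`."""
    return {m: int(m == month) for m in MONTHS if m != baseline}


jan = month_dummies("Jan")   # Jan dummy = 1, the other ten = 0
dec = month_dummies("Dec")   # baseline month: all eleven dummies = 0
print(jan)
print(dec)
```

Each dummy then gets its own coefficient in the regression, measured relative to the baseline month.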

36
Q

What is heteroskedasticity?

A

It is unequal variances: the variance of the error term is not constant across observations. In the conditional form, the error variance is related to the value of the independent variable. You don't want that

37
Q

What are the assumptions of multiple regression?

A
There is a linear relationship
The independent variables are NOT random
The expected value of the error = 0
Variance of the error is constant
Errors are not correlated
Error is normally distributed
38
Q

How does heteroskedasticity affect the standard error, and what does this mean?

A

It typically makes the standard error too low (understated), which inflates the t stat, meaning that it is EASIER to reject a null, so you reject too often (Type I errors).

39
Q

How to test for heteroskedasticity?

A

Breusch-Pagan test

40
Q

What is serial correlation?

A

That the error terms are correlated with each other across time periods, so one period's error helps predict the next. So if a stock goes up by more than expected one day, it is more likely to do the same the next day (positive serial correlation). That breaks the assumption that errors are not correlated

41
Q

What will serial correlation do to the t stat?

A

Positive serial correlation understates the standard error, which increases the t stat, meaning you will reject the null too often

42
Q

What is multicollinearity?

A

Multicollinearity means that two or more independent variables are closely correlated with each other

43
Q

What will multicollinearity do to the t stat and standard error?

A

Increase the standard error and reduce the t stat

44
Q

How do you resolve multicollinearity?

A

Remove one of the correlated variables

45
Q

The null hypothesis is the….

A

hypothesis you are trying to reject, i.e. the one you suspect is not true

46
Q

What is autoregression?

A

A variable's own past value (e.g. yesterday's) explains its value today

47
Q

Formula for an autoregression equation

A

x_t = b0 + b1 * x_(t-1) + error_t

48
Q

How do you detect if error terms are correlated?

A

Durbin Watson test. You can't rely on the model as it stands if the error terms are correlated

49
Q

I have x1, how do I get x2 using autoregression?

A

x2 = b0 + b1 * x1

x1 plays the role of x_(t-1) when forecasting x2

50
Q

Autoregressive correlation. How do you test for this, and what does the test mean?

A

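Normal t test for this one. t stat = residual autocorrelation / standard error, where the standard error is 1 / square root of the number of observations. Compare against the critical t value.

If the null (no autocorrelation) is NOT REJECTED, the model is all okay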

51
Q

You do a t test on the serial correlation on some time series data and find out that the null is rejected, meaning that the t stat is outside the t value, what does this mean?

A

A rejected null means reject that model: the residuals are autocorrelated, so the model is not good as it stands

52
Q

Mean reverting level, what is the formula for this?

A

b0 / (1 - b1)

This is what the data points should revert to
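A small sketch of an AR(1) forecast chain drifting toward the mean reverting level b0 / (1 - b1). The coefficients (b0 = 2, b1 = 0.5) and the starting value of 10 are invented for illustration, and the error term is left out because its expected value is 0.

```python
def ar1_forecast(b0, b1, x_prev, steps):
    """Chain AR(1) forecasts: each forecast feeds the next one."""
    forecasts = []
    for _ in range(steps):
        x_prev = b0 + b1 * x_prev
        forecasts.append(x_prev)
    return forecasts


def mean_reverting_level(b0, b1):
    """b0 / (1 - b1); undefined when b1 = 1 (a unit root)."""
    if b1 == 1:
        raise ValueError("b1 = 1: no mean reverting level")
    return b0 / (1 - b1)


b0, b1 = 2.0, 0.5  # illustrative coefficients
level = mean_reverting_level(b0, b1)
path = ar1_forecast(b0, b1, x_prev=10.0, steps=20)
print(level, path[:4], path[-1])
```

The forecasts run 7.0, 5.5, 4.75 and so on, closing in on the level of 4.0.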

53
Q

How do you work out which autoregression line you should use? e.g. data from 2 years ago or 3 years ago.

A

You use Root Mean Squared Error (RMSE), calculated on out of sample forecasts. Pretty much the square root of the mean squared forecast error of each model: the one with the smallest RMSE is the one you use
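A minimal sketch of an RMSE comparison between two candidate models. The actual values and both sets of forecasts are invented, and `rmse` is an illustrative helper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual values and forecasts."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Illustrative out of sample data only
actual = [10, 12, 11, 13]
model_a = [11, 11, 12, 12]   # always off by 1
model_b = [10, 14, 11, 15]   # spot on twice, off by 2 twice

rmse_a = rmse(actual, model_a)
rmse_b = rmse(actual, model_b)
best = "model_a" if rmse_a < rmse_b else "model_b"
print(rmse_a, rmse_b, best)
```

Model A has the lower RMSE (1.0 against roughly 1.41), so it would be preferred on this measure.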

54
Q

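What is the mean reverting level of a random walk, and why?

A

There is none! The mean reverting level is b0 / (1 - b1)

In a random walk b1 is always 1, so the denominator is 1 - 1 = 0 and the level is undefined (you can't divide by 0)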

55
Q

Formula for a random walk and what it means

A

x_t = x_(t-1) + random error term.

The best guess of the next value is simply the most recent value: x_(t-1) plus a random error with an expected value of 0

56
Q

Multicollinearity, heteroskedasticity and serial correlation: how is each one's standard error affected?

A

Mnemonic: multi = multiplied standard error, so multicollinearity has a higher standard error. The other ones don't have multi, meaning their standard errors are typically too low (understated)

57
Q

How can a model be misspecified?

A

Types:

  1. Time-series: using a lagged dependent variable as a regressor when the errors are serially correlated, or forecasting the past
  2. Functional: omitting a variable, or pooling data improperly
58
Q

You use the Durbin Watson test to test for what?

A

Autocorrelation

59
Q

When testing for Autocorrelation in Linear and Log Linear models, what do you use? And do you use something different for AR models?

A

Yes. Durbin watson for Linear and Log Linear.

T test for AR models

60
Q

Important: what does covariance stationary mean? What are the conditions?

A

Constant and finite expected value
Constant and finite variance, constant and finite covariance with its own lagged values
Has a mean reverting level
No unit root problem

61
Q

Important, how do you make data covariance stationary

A

By first differencing the data. You take the difference between a period and the period prior; that is now the new data point
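A short sketch of first differencing, using a simulated random walk (seeded so it is repeatable; the series itself is made up). Differencing the walk hands back the random error terms that built it, which is why the differenced series no longer trends.

```python
import random


def first_difference(series):
    """New series: each value minus the value in the period before."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


random.seed(0)
errors = [random.gauss(0, 1) for _ in range(200)]

# Random walk: x_t = x_(t-1) + error_t, starting from 0
walk = [0.0]
for e in errors:
    walk.append(walk[-1] + e)

diffs = first_difference(walk)
print(len(walk), len(diffs))
```

Note the differenced series is one observation shorter than the original.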

62
Q

What is first differencing data?

A

Taking the change between consecutive periods, x_t - x_(t-1), as the new series, in order to make the data covariance stationary

63
Q

What does the Durbin Watson test test for? And what is the magic number?

A

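Autocorrelation (serial correlation). The magic number is 2: a Durbin Watson stat of about 2 = NO serial correlation. Well below 2 points to positive serial correlation, well above 2 (up to 4) to negative serial correlation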
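A minimal sketch of the Durbin Watson statistic, computed as the sum of squared changes in the residuals divided by the sum of squared residuals. A value near 2 suggests no serial correlation, well below 2 positive, well above 2 negative. Both residual series below are invented to show the two extremes.

```python
def durbin_watson(residuals):
    """DW = sum of (e_t - e_(t-1))^2 / sum of e_t^2."""
    numerator = sum((curr - prev) ** 2
                    for prev, curr in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Illustrative residuals only
trending = [1.0, 0.9, 0.8, 0.7, -0.7, -0.8, -0.9, -1.0]     # errors persist: positive serial correlation
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # errors flip sign: negative serial correlation

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(dw_trending, dw_alternating)
```

The persistent residuals give a statistic well under 2, while the sign-flipping residuals give 3.5.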

64
Q

What is the difference between an AR1 model and AR 2 model

A

AR1 only has 1 lagged variable, AR2 has 2.

65
Q

In an autoregression model, if b1 >= 1, what happens?

A

The data is NOT covariance stationary because there is NO mean reverting level. You can not use the data.

66
Q

Which is better for data sets and why: long term or short term data?

A

Short term (yes, short term). Why? Long term data may span structural changes in the underlying economy or data environment, so the old relationships may no longer hold. Not good to model off

67
Q

Is a random walk covariance stationary?

A

NO.

68
Q

How to make a random walk covariance stationary

A

First difference the data: new value = x_t - x_(t-1), i.e. this period minus the period before.

69
Q

What is first differencing

A

Making data covariance stationary by taking the difference between 2 consecutive data points

70
Q

In an AR model, how do you check for autocorrelation, and how do you interpret the result?

A

T test. If the autocorrelation t stat is BELOW/within the critical t, autocorrelation is NOT present, so the model is good. If the residuals are correlated, use the next AR model (AR2, AR3 etc) until the serial correlation goes away

71
Q

What is the Dickey Fuller test?

A

The unit root test (if a unit root is present we are NOT in the clear). Basically checking that b1 is not 1 (b1 = 1 means no mean reversion)

72
Q

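How do you conduct a Dickey Fuller test? And what does it test for?

A

Subtract x_(t-1) from both sides of the AR(1) equation, giving x_t - x_(t-1) = b0 + (b1 - 1) * x_(t-1) + error, then test whether (b1 - 1) = 0. It tests if the series has a unit root; having NO unit root is needed for an AR model to be covariance stationary.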

73
Q

What would adding a second lag do to an AR model? Adjust for what?

A

Seasonality (when the added lag is a seasonal lag, e.g. the 4th lag for quarterly data)

74
Q

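If a model has ARCH(1), what does that mean?

A

The variance of the error in one period depends on the squared error of the previous period, so the variance can be predicted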

75
Q

What is machine learning?

A

Finding patterns then applying those patterns.

76
Q

What is a feature and what is a target in machine learning?

A

A target is the y variable, the dependent variable, while the feature is the x variable

77
Q

What are training, validation and test samples

A

Training samples help an algorithm learn a pattern or relationship
Validation samples TUNE the model
Test samples test the model on out of sample data

78
Q

Which is new data, in or out of sample data

A

Out

79
Q

What is unsupervised learning?

A

Unsupervised learning is when a machine learning algorithm learns the relationships between variables when the data is not labelled. The algorithm finds the patterns and relationships itself

80
Q

What is supervised learning?

A

Supervised learning is when the algorithm learns from labelled data: the analyst supplies the label (target) for each observation in the dataset

81
Q

What is a hyperparameter?

A

A setting the analyst enters before training; it is something that constrains the learning process of the model

82
Q

What is overfitting? What are the detriments? Is it for supervised or unsupervised models?

A

Having too many features to describe a target, so the model fits the noise in the training data. The detriment: the model can NOT explain out of sample data well.
Supervised only

83
Q

Bias error and variance error, what are they?

A

Bias error means you have inputs that do not explain the changes in Y. This means the model is underfitted.

Variance error is when the model is overfitted. The model is great at explaining in sample data, but bad at out of sample

84
Q

How to reduce variance error?

A

Holdout samples and k-fold cross validation
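A minimal sketch of how k-fold cross validation carves up a dataset: the data is split into k folds, and each fold takes one turn as the validation sample while the rest is used for training. `k_fold_indices` is an illustrative helper; in practice the rows would usually be shuffled first, which is skipped here to keep the output easy to follow.

```python
def k_fold_indices(n_samples, k):
    """Return k (training_indices, validation_indices) pairs."""
    # Spread any remainder over the first few folds
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(indices[start:start + size])
        start += size

    splits = []
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        splits.append((training, validation))
    return splits


splits = k_fold_indices(n_samples=10, k=5)
for training, validation in splits:
    print(training, validation)
```

Every observation is used for validation exactly once, and the model's error is then averaged across the k rounds.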

85
Q

Name the types of supervised models (5)

A
Penalised regression (penalty for including more variables)
Support vector machine - classification model
K nearest neighbour - classification model - finding similarities in inputs
CART - classification and regression tree - binary splits
Ensemble/random forest - complex but low variance model
86
Q

Types of UNsupervised models

A

Principal components - only showing the most relevant features
Clustering - K-means clustering - putting observations into K clusters
Hierarchical clustering - building clusters step by step, merging or dividing them as they appear

87
Q

Neural network ML, what is it?

A

Super complex and very effective. Good for nonlinear relationships

88
Q

When dealing with a random walk, if the intercept and the coefficients do not significantly differ from zero, what should you do?

A

Assume that they equal zero, so y = error

89
Q

Do random walks have unit roots

A

Yes

90
Q

Convertible bond ratio

A

Market conversion price = Convertible bond price/Conversion ratio