Quant Flashcards

(90 cards)

1
Q

What is linear regression?

A

Modelling the linear relationship between a dependent variable and one or more independent variables, usually so the dependent variable can be predicted

2
Q

What are SSE, SSR and SST?

A

A line of best fit never passes through every data point, so we need to measure how far the data sit from the line and from the mean. These 3 variables quantify that

SSR is the pRedicted (regression) deviation: the difference between the line of best fit and the mean of the data set

SSE is the ERROR deviation: the difference between the data point and the line of best fit

SST is the sum of SSE and SSR: the total deviation from the mean to the data point

Remember these are all SQUARED, then summed over every data point

3
Q

What is the formula for R squared?

A

R squared = SSR / SST

It shows the share of the total variation in Y that the model explains, i.e. how well the model fits
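
Worked example (a minimal sketch on made-up data, so the numbers themselves mean nothing): fit a one-variable line of best fit by least squares, then compute SSR, SSE, SST and R squared. The variable names are illustrative only.

```python
# Hypothetical sample data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Ordinary least squares slope and intercept for a single x variable.
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
b0 = mean_y - b1 * mean_x
fitted = [b0 + b1 * xi for xi in x]

ssr = sum((fi - mean_y) ** 2 for fi in fitted)          # line of best fit vs mean
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # data point vs line of best fit
sst = sum((yi - mean_y) ** 2 for yi in y)               # data point vs mean

r_squared = ssr / sst
print(round(ssr, 4), round(sse, 4), round(sst, 4), round(r_squared, 4))
```

SST should equal SSR + SSE up to rounding, which is a quick check that the fit was done correctly.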

4
Q

Does a high or a low R squared mean the relationship is stronger?

A

High R squared means a STRONG relationship: more of the variation in Y is explained by the model

5
Q

R squared: what are the highest and lowest values it can take?

A

It is between 0 and 1

6
Q

What are the degrees of freedom?

A

Degrees of freedom are the number of observations you have (n) minus the number of independent variables in the model (k) minus 1.

You want the degrees of freedom high to have a good model

DF = n-k-1

7
Q

As Degrees of freedom increases, R squared ________? and why

A

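As Degrees of freedom increases, R squared tends to decrease.

Think if you only had 2 data points: a line fits them perfectly, so the r^2 (relationship) would be 1. Putting in more data points would generally DECREASE r^2. Adding more variables does the opposite: it uses up degrees of freedom and can only keep r^2 the same or push it up.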

8
Q

Formula for Y / relationship between x and Y

A

Y = β0 + β1 x + error

9
Q

In y = beta 0 + beta 1 * x + e, which is the independent variable and which is the dependent variable?

A

Y is dependent, x is independent

10
Q

If the confidence level rises (from 90% to 99%), does the probability of rejecting the null hypothesis go up or down? Why?

A

The probability will go down. The confidence interval gets wider (so we are more confident it contains the true value), and a wider interval is more likely to contain the hypothesised value.

11
Q

What is a t statistic? What is the formula?

A

A t statistic test checks whether a hypothesised number could plausibly be the true value of a statistic, given the estimate we have and its standard error.

t stat = (estimate - hypothesised value) / standard error. Equivalently, take the estimate + and - the critical t score * standard error and see whether the hypothesised value falls inside that range.

The critical t score comes from the t table. For simple linear regression, look it up using n - 2 degrees of freedom.

12
Q

What is the standard error?

A

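For a sample mean: SD / square root n

OR, for a simple regression (the standard error of the estimate):

Epsilon is the residual (Y - β0 - β1 x), which is the formula for Y in reverse.

(Sum of Epsilon squared / (n-2)) ^.5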
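
Both versions as a minimal sketch. The sample and the residuals are made up for illustration; in practice the residuals would come from a fitted regression.

```python
import math
import statistics

# Hypothetical sample, for illustration only.
sample = [4.0, 6.0, 5.0, 7.0, 8.0]
n = len(sample)

# Standard error of the sample mean: SD / sqrt(n).
sd = statistics.stdev(sample)  # sample SD (divides by n - 1)
se_mean = sd / math.sqrt(n)

# Standard error of the estimate for a simple regression:
# sqrt(sum of squared residuals / (n - 2)).
residuals = [0.3, -0.5, 0.1, 0.4, -0.3]  # hypothetical residuals
see = math.sqrt(sum(e ** 2 for e in residuals) / (len(residuals) - 2))

print(round(se_mean, 4), round(see, 4))
```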

13
Q

How to find SSR

A

For each data point take (line of best fit - mean of Y), square it, and add them all up

14
Q

How to find SSE

A

For each data point take (actual value - line of best fit), square it, and add them all up

15
Q

What is an f test?

A

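It compares 2 variances to check if they're statistically different. In regression it compares the explained variance (MSR) with the unexplained variance (MSE) to check whether the independent variables, taken together, explain Y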

16
Q

Confidence interval formula

A

Confidence interval = mean ± (t or z score * standard error)

17
Q

What are the two-tailed z scores for 90%, 95% and 99% confidence?

A
90%: 1.645
95%: 1.96
99%: 2.576
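
To check the z scores without a table, here is a small sketch using Python's standard library. `z_critical` is a helper name made up for this card, not a library function.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical z: half of the leftover probability sits in each tail."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 99%: 2.576
```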
18
Q

Coefficient of determination is

A

r^2

19
Q

What is correlation squared?

A

r^2

20
Q

In the formula Y = β0 + β1 x + error, what is β0?

A

β0 is the y intercept

21
Q

Confidence interval explanation and formula

A

Mean ± (t or z stat * standard error).

Check if the OTHER value (the hypothesised or population mean you are testing against) is within those boundaries. If it falls outside, reject the null

22
Q

What is the p value?

A

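The smallest significance level at which the null can be rejected. We want it low to reject the null (mnemonic: the "pathetic" value, the smaller the better)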

23
Q

What are some key assumptions to simple linear regression

A

The relationship between x and y is linear
x is uncorrelated with the error terms
The expected value of the error term is 0
The error term has a constant variance

24
Q

Formula for standard deviation with Standard error

A

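SD = square root of (sum of squared deviations from the mean / (n-1)). If you are given the standard error of the mean instead, SD = standard error * square root of n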

25
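Is variance the same as SST?
Not quite, but they are closely related. SST is the total squared deviation of Y from its mean; the sample variance of Y is SST / (n-1)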
26
Formula for total DOF
Total DOF = n-1 = k + (n-k-1), i.e. regression DOF (k) plus error DOF (n-k-1)
27
MSR (mean squared regression) and MSE (mean squared Error) formulas
```
MSR = SSR / k
MSE = SSE / (n-k-1)
```
28
What is MSR / MSE
F stat
29
Formula for standard error in regression
Square root of (SSE / (n-k-1)), which is the square root of MSE
30
Correlation formula, then R squared formula
Cor = Cov(x, y) / (sigma x * sigma y), where sigma is the standard deviation. In simple linear regression, R^2 = Cor^2
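A quick sketch of the correlation calculation on made-up paired data (sample covariance and sample standard deviations, both dividing by n-1). The data and variable names are illustrative only.

```python
import math

# Hypothetical paired observations, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sample covariance and sample standard deviations (divide by n - 1).
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))

cor = cov / (sd_x * sd_y)
r_squared = cor ** 2  # valid shortcut for a one-variable regression
print(round(cor, 4), round(r_squared, 4))
```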
31
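F stat formula, what it means, and how to interpret it
F stat tests whether there is any relationship at all between the y and x variables (null: all slope coefficients equal zero). It is MSR/MSE. If it is above the critical F value (one-tailed, with k and n-k-1 degrees of freedom), reject the null: there is a relationship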
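A minimal sketch of the F stat arithmetic. `f_statistic` is a made-up helper and the ANOVA figures are hypothetical; the critical F value still has to come from an F table.

```python
def f_statistic(ssr, sse, n, k):
    """Return (MSR, MSE, F) for a regression with n observations and k slopes."""
    msr = ssr / k            # mean squared regression
    mse = sse / (n - k - 1)  # mean squared error
    return msr, mse, msr / mse

# Hypothetical ANOVA figures: SSR = 80, SSE = 20, n = 25, k = 4.
msr, mse, f = f_statistic(80.0, 20.0, 25, 4)
print(msr, mse, f)  # 20.0 1.0 20.0
```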
32
Calculate MSR and MSE
``` MSR = SSR / k MSE = SSE / (n-k-1) ``` MSR/MSE = F
33
What does adjusted r^2 do
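It penalises the r^2 for every independent variable added (each one uses up a degree of freedom), so adding a variable does NOT automatically increase it. Adjusted r^2 = 1 - [(n-1) / (n-k-1)] * (1 - r^2)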
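A small sketch of the standard adjusted r^2 formula, 1 - [(n-1) / (n-k-1)] * (1 - r^2). The r^2 values are made up to show a case where an extra variable nudges r^2 up but adjusted r^2 down.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared for n observations and k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Hypothetical: a 4th variable lifts R squared from 0.800 to 0.805 (n = 30).
before = adjusted_r2(0.800, n=30, k=3)
after = adjusted_r2(0.805, n=30, k=4)
print(round(before, 4), round(after, 4))  # 0.7769 0.7738
```

Here the extra variable is not worth the lost degree of freedom, because adjusted r^2 falls.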
34
Downfall of R^2?
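It never goes down when you add another independent variable, even a useless one, so it rewards overfitting. (Adjusted R^2 fixes this, but unlike R^2 it is not bound by 0 and 1: it can go negative)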
35
What is a dummy variable? And how to incorporate into formula?
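A way of introducing a QUALITATIVE variable. You give the category you care about a value of 1, and every alternative a value of 0. If it is months of the year, and you want to flag results collected in Jan, the Jan dummy has a value of 1 for January observations and 0 otherwise. With n categories you use n-1 dummies (11 for months), and the omitted month becomes the base case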
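A minimal sketch of the n-1 rule using quarters instead of months to keep it short. `quarter_dummies` and the choice of Q4 as the omitted base category are assumptions for illustration.

```python
QUARTERS = ["Q1", "Q2", "Q3", "Q4"]

def quarter_dummies(quarter, base="Q4"):
    """Return 0/1 dummies for every quarter except the omitted base category."""
    if quarter not in QUARTERS:
        raise ValueError(f"unknown quarter: {quarter}")
    return {q: int(q == quarter) for q in QUARTERS if q != base}

print(quarter_dummies("Q1"))  # {'Q1': 1, 'Q2': 0, 'Q3': 0}
print(quarter_dummies("Q4"))  # {'Q1': 0, 'Q2': 0, 'Q3': 0}  (base case: all zeros)
```

Each dummy then gets its own coefficient in the regression, measured relative to the base case.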
36
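What is heteroskedasticity?
It is unequal variances: the variance of the error term is not constant across observations. In the conditional form, there is a relationship between the size of the error variance and the value of the independent variables. You don't want that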
37
What are the assumptions of multiple regression
```
There is a linear relationship
The independent variables are NOT random
Expected value of the error = 0
Variance of the error is constant
Errors are not correlated with each other
Error is normally distributed
```
38
How does heteroskedasticity affect the standard error, and what does this mean?
It typically makes the estimated standard error too low. That inflates the t stat, meaning that it is EASIER to reject a null that should not be rejected (more Type I errors).
39
How to test for heteroskedasticity?
Breusch-Pagan test
40
What is serial correlation?
That the error terms are correlated with each other across time periods, so one period's error helps predict the next. So if a stock goes up one day, it is more likely to go up the next day (positive serial correlation). That breaks the assumption that errors are uncorrelated
41
What will serial correlation do to the t stat
Positive serial correlation understates the standard error, which increases the t stat, meaning you will reject the null too often (Type I errors)
42
What is multicollinearity?
Multicollinearity means that two or more independent variables are closely correlated with each other
43
What will multicollinearity do to the t stat and standard error?
increase standard error and reduce t stat
44
How do you resolve multicollinearity?
Remove one of the correlated variables
45
The null hypothesis is the....
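hypothesis we are trying to show is not true, i.e. the one we test and hope to reject (for example, that a coefficient equals zero)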
46
What is autoregression?
A variable regressed on its own past values: the variable yesterday explains the variable today
47
Formula for an autoregression equation
x(t) = b0 + b1*x(t-1) + E, where x(t-1) is the value one period earlier
48
How do you detect if error terms are correlated?
Durbin-Watson test. You can't trust the regression's standard errors and t stats if the error terms are correlated
49
I have x1, how do i get x2 using autoregression
x2 = b0 + b1*x1. Here x1 plays the role of the lagged value x(t-1) for x2
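A minimal sketch of chaining autoregression forecasts: the forecast for x2 becomes the input for x3, and so on. `ar1_forecast` and the coefficients are hypothetical.

```python
def ar1_forecast(x_prev, b0, b1, steps=1):
    """Chain AR(1) forecasts: each forecast feeds the next one."""
    forecasts = []
    for _ in range(steps):
        x_prev = b0 + b1 * x_prev
        forecasts.append(x_prev)
    return forecasts

# Hypothetical coefficients b0 = 1.0, b1 = 0.5, starting from x1 = 10.
print(ar1_forecast(10.0, b0=1.0, b1=0.5, steps=2))  # [6.0, 4.0] -> x2, x3
```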
50
Autoregressive correlation. How do you test for this, and what does the test mean?
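Normal t test for this one. t stat = residual autocorrelation / standard error, where the standard error is 1 / square root of the number of observations. Compare against the critical t value. If the null (autocorrelation = 0) is NOT REJECTED, the model is okay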
51
You do a t test on the serial correlation on some time series data and find out that the null is rejected, meaning that the t stat is outside the t value, what does this mean?
A rejected null means the residuals are autocorrelated, so the model as it stands is not good. Add another lag and test again
52
Mean reverting level, what is the formula for this?
b0 / (1-b1). This is the level the data points should revert to
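A short sketch showing why b0 / (1-b1) is the level the series reverts to: keep applying the AR(1) equation (with no error term) and the forecasts settle there. The helper name and coefficients are assumptions.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level of an AR(1) model; undefined when b1 = 1."""
    if b1 == 1:
        raise ValueError("b1 = 1 is a unit root: no mean-reverting level")
    return b0 / (1 - b1)

b0, b1 = 1.0, 0.5               # hypothetical coefficients
level = mean_reverting_level(b0, b1)

x = 10.0                        # start well above the level
for _ in range(50):             # repeated forecasts, ignoring the error term
    x = b0 + b1 * x

print(level, round(x, 6))       # 2.0 2.0
```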
53
How do you work out which autoregression line you should use? e.g. data from 2 years ago or 3 years ago.
You use Root Mean Squared Error (RMSE): the square root of the mean squared forecast error of each model, ideally measured on out of sample data. The model with the smallest RMSE is the one to use
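A minimal sketch of the RMSE comparison. The actual values and the two sets of forecasts are made up; `rmse` is an illustrative helper.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between actual values and forecasts."""
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

actual = [2.0, 3.0, 4.0, 5.0]
model_a = [2.5, 2.5, 4.5, 4.5]  # hypothetical forecasts, always off by 0.5
model_b = [2.0, 3.0, 4.0, 7.0]  # hypothetical forecasts, one big miss

scores = {"A": rmse(actual, model_a), "B": rmse(actual, model_b)}
best = min(scores, key=scores.get)
print(scores, best)  # {'A': 0.5, 'B': 1.0} A
```

Squaring punishes the single large miss in model B, which is why A wins here.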
54
What is the mean reversion from a random walk and why
There is none! The formula is b0 / (1-b1), and in a random walk b1 is always 1, so the denominator is 0 and the level is undefined
55
Formula for a random walk and what it means
x(t) = x(t-1) + random error term. The best guess of the next value is simply the most recent value, because the only thing added to it is a random error that cannot be predicted
56
Multicollinearity, heteroskedasticity and serial correlation: how is each one's standard error affected?
Multicollinearity = "multi" = multiple increase in standard error, so multicollinearity has a higher standard error. The other ones don't have "multi", meaning they typically have a lower (understated) standard error
57
How can a model be misspecified?
Types: 1. Time-series: serial correlation combined with a lagged dependent variable, or forecasting the past (using data that was not available at the time) 2. Functional: omitting a variable or pooling data improperly
58
You use the Durbin Watson test to test for what?
Autocorrelation
59
When testing for Autocorrelation in Linear and Log Linear models, what do you use? And do you use something different for AR models?
Durbin-Watson for linear and log-linear models. And yes, AR models are different: use a t test on the residual autocorrelations
60
Important: what does covariance stationary mean? What are the assumptions?
Constant and finite expected value. Constant and finite variance. Constant and finite covariance with its own past values. In practice: it has a mean reverting level and no unit root problem
61
Important, how do you make data covariance stationary
By first differencing the data. You take the difference between a period and the period prior, and that difference is now the new data point
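First differencing as a minimal sketch (the series is made up). Note the differenced series is one observation shorter than the original.

```python
def first_difference(series):
    """Return the period-on-period changes: x(t) - x(t-1)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]

prices = [100, 103, 101, 106]    # hypothetical levels
print(first_difference(prices))  # [3, -2, 5]
```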
62
What is first differencing data
Replacing each data point with its change from the prior period, x(t) - x(t-1). It is used to make data covariance stationary
63
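What does the Durbin Watson test test for? And what is the magic number?
Autocorrelation. The magic number is 2: a statistic close to 2 = NO serial correlation. Well below 2 points to positive serial correlation, well above 2 (towards 4) points to negative serial correlation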
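A minimal sketch of the Durbin Watson statistic itself: the sum of squared changes in the residuals divided by the sum of squared residuals. A value near 2 suggests no serial correlation, well below 2 suggests positive and well above 2 suggests negative serial correlation. The residual series are made up, and a real test still needs the critical values from a Durbin Watson table.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a list of regression residuals."""
    changes = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    return changes / sum(e ** 2 for e in residuals)

persistent = [1, 1, 1, -1, -1, -1]   # errors stick around: positive correlation
alternating = [1, -1, 1, -1, 1, -1]  # errors flip sign: negative correlation

print(round(durbin_watson(persistent), 2))   # 0.67
print(round(durbin_watson(alternating), 2))  # 3.33
```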
64
What is the difference between an AR1 model and AR 2 model
AR1 only has 1 lagged variable, AR2 has 2.
65
In an autoregression model, if b1 >= 1, what happens?
The data is NOT covariance stationary because there is NO mean reverting level. You cannot use the data as it is (first difference it first).
66
Which is better for data sets and why. Long term or short term data
Short term (yes, short term). Why? Long term data may span structural changes in the underlying economy or data environment, so the relationship being modelled is not stable across the whole period. Not good to model off
67
Is a random walk covariance stationary?
NO.
68
How to make a random walk covariance stationary
First difference the data: new data point = x(t) - x(t-1), i.e. this period minus the period before it.
69
What is first differencing
Making data covariance stationary by taking the difference between 2 consecutive data points
70
In an AR model, how do you check for autocorrelation, and how do you interpret the result?
T test on the residual autocorrelations. If the autocorrelation t score is BELOW/within the critical t, autocorrelation is NOT present, so the model is good. If the residuals are correlated, use the next AR model (AR2, AR3 etc) until the serial correlation goes away
71
What is the Dickey Fuller test?
The unit root test. If a unit root is present (b1 = 1) there is no mean reversion and the data is not covariance stationary. We are only in the clear if the test rejects the unit root
72
How do you conduct a Dickey Fuller test? And what does it test for?
Subtract x(t-1) from both sides of the AR(1) equation: x(t) - x(t-1) = b0 + (b1 - 1)*x(t-1) + E. Then test whether (b1 - 1) = 0. It tests if the series has a unit root, and an AR model needs NO unit root to be covariance stationary.
73
What would adding a second lag do to an AR model? Adjust for what?
Seasonality (the added lag is a seasonal lag, e.g. the same quarter one year earlier)
74
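If a model has ARCH(1), what does that mean?
The variance of the error in one period depends on the squared error of the previous period, so the variance can be predicted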
75
What is machine learning?
Finding patterns then applying those patterns.
76
What is a feature and what is a target in machine learning?
A target is the y variable, the dependent variable, while the feature is the x variable
77
What are training, validation and test samples
Training samples help an algorithm learn a pattern or relationship
Validation samples TUNE the model
Test samples test the model on out of sample data
78
Which is new data, in or out of sample data
Out
79
What is unsupervised learning?
Unsupervised learning is when a machine learning algorithm learns the relationships between variables when the data is not labelled. It finds the patterns and relationships itself
80
What is supervised learning?
Supervised learning is when the algorithm learns from a labelled dataset: the analyst supplies the target (label) for each observation
81
What is a hyperparameter?
It is a setting the analyst enters before training. It is something that constrains or controls the learning process of the model
82
What is overfitting? What are the detriments? Is it for supervised or unsupervised models?
Having too many features to describe a target, so the model fits the noise in the training data. The model can NOT explain out of sample data well. Supervised only
83
Bias error and variance error, what are they?
Bias error means you have inputs that do not explain the changes in Y. This means the model is underfitted. Variance error is when the model is overfitted. The model is great at explaining in sample data, but bad at out of sample
84
How to reduce variance error?
Holdout samples and K-fold cross-validation
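A minimal sketch of how K-fold cross-validation splits the data: the sample is cut into k folds, and each fold takes one turn as the validation set while the rest is used for training. `k_fold_indices` is a made-up helper that does no shuffling; real libraries usually offer that as an option.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, validation
        start += size

for train, validation in k_fold_indices(10, 3):
    print(validation, "validates;", len(train), "samples train")
```

The model is fitted k times and the k validation errors are averaged, so every observation is used for both training and validation.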
85
Name the types of supervised models (5)
```
Penalised regression (penalty for including more variables)
Support vector machine - classification model
K nearest neighbour - classification model - finding similarities in inputs
CART - classification and regression tree - binary tree model
Ensemble/random forest - complex but low variance model
```
86
Types of UNsupervised models
Principal components - keeping only the most relevant combinations of features
K-means clustering - putting observations into K clusters
Hierarchical clustering - merging or dividing clusters step by step
87
Neural network ML, what is it?
A highly flexible model built from layers of connected nodes. Super complex and very effective, good for nonlinear relationships
88
When dealing with a random walk, if the intercept and the coefficients do not significantly differ from zero, what should you do?
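Assume that they equal zero. In the first differenced model y = b0 + b1*y(t-1) + error that leaves y = error, i.e. the change each period is pure noise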
89
Do random walks have unit roots
Yes
90
Convertible bond ratio
Market conversion price = Convertible bond price/Conversion ratio