Quantitative Methods Flashcards

1
Q

Formula for Multiple Regression

A

$Y_i = b_0 + b_1X_{1i} + b_2X_{2i} + \dots + b_kX_{ki} + \varepsilon_i, \quad i = 1, 2, \dots, n$
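A minimal sketch of estimating $b_0, \dots, b_k$ by ordinary least squares, solving the normal equations $(X'X)b = X'y$ in plain Python so no libraries are assumed. The function name `ols` and the sample data are illustrative; the data are generated exactly from $Y = 1 + 2X_1 + 3X_2$ so the recovered coefficients are easy to check.

```python
def ols(X, y):
    """Estimate [b0, b1, ..., bk] by solving the normal equations (X'X) b = X'y.

    X is a list of observations, each a list of k independent-variable values;
    a column of 1s is prepended so that b0 is the intercept.
    """
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])  # k + 1 coefficients
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Data generated exactly from Y = 1 + 2*X1 + 3*X2 (no error term)
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [9, 8, 19, 18, 26]
print(ols(X, y))  # approximately [1.0, 2.0, 3.0]
```

In practice a library routine (for example a QR-based least-squares solver) is numerically safer than forming $X'X$ directly; the normal equations are used here only because they mirror the textbook derivation.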
2
Q

Coefficient of Determination

A

$R^2$
Measure of Goodness of Fit
$R^2 = \dfrac{SSR}{SST}$ = Sum of Squares Regression / Sum of Squares Total

3
Q

Adjusted $R^2$

A

Adjusts $R^2$ for degrees of freedom: $\bar{R}^2 = 1 - \left(\dfrac{n-1}{n-k-1}\right)(1 - R^2)$
Does not automatically increase when variables are added

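A small sketch computing $R^2$ as SSR/SST and then adjusting it for degrees of freedom. The fitted values come from an actual OLS fit of $y$ on $x = [1, 2, 3, 4]$ (intercept 0.5, slope 0.8), because SSR/SST equals $1 - SSE/SST$ only for an OLS fit that includes an intercept. Function names are illustrative.

```python
def r_squared(y, y_hat):
    """R^2 = SSR / SST, with SSR = sum((y_hat - y_bar)^2) and SST = sum((y - y_bar)^2)."""
    y_bar = sum(y) / len(y)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((fi - y_bar) ** 2 for fi in y_hat)
    return ssr / sst

def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2) for k independent variables."""
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# OLS fit of y on x = [1, 2, 3, 4] gives y_hat = 0.5 + 0.8x
y = [1, 3, 2, 4]
y_hat = [1.3, 2.1, 2.9, 3.7]
r2 = r_squared(y, y_hat)
print(round(r2, 4), round(adjusted_r_squared(r2, n=4, k=1), 4))  # 0.64 0.46
```

The penalty factor $(n-1)/(n-k-1)$ grows with $k$, which is why adjusted $R^2$ can fall when a weak variable is added.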
4
Q

Akaike’s Information Criterion (AIC)

A

Measure of model parsimony; a lower AIC indicates a better (more parsimonious) model

Preferred when the model's purpose is prediction

5
Q

Schwarz’s Bayesian Information Criterion (BIC or SBC)

A

Allows us to choose the best model among a set of models; lower is better

Preferred when goodness of fit is the goal

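A sketch of comparing two models with AIC and BIC, assuming the SSE-based forms for a linear regression with $k$ independent variables, $AIC = n\ln(SSE/n) + 2(k+1)$ and $BIC = n\ln(SSE/n) + \ln(n)(k+1)$; other texts use likelihood-based versions that differ by constants but rank models the same way. The SSE values are hypothetical.

```python
import math

def aic(sse, n, k):
    """AIC = n*ln(SSE/n) + 2(k + 1); lower is better."""
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(sse, n, k):
    """BIC = n*ln(SSE/n) + ln(n)*(k + 1); its ln(n) penalty exceeds AIC's 2 once n >= 8."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical comparison: adding three variables lowers SSE only slightly
n = 50
models = {"small": (100.0, 2), "large": (95.0, 5)}
for name, (sse, k) in models.items():
    print(name, round(aic(sse, n, k), 2), round(bic(sse, n, k), 2))
```

Because BIC charges $\ln(n)$ per coefficient instead of 2, it tends to favor smaller models than AIC in all but tiny samples.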
6
Q

Unrestricted Model

A

Full model with all independent variables

7
Q

Restricted Model

A

Also called nested models: the unrestricted model with one or more independent variables excluded

8
Q

F-Distributed Test Statistic When Comparing Restricted & Unrestricted Models

A

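$F = \dfrac{(SSE_R - SSE_U)/q}{SSE_U/(n-k-1)}$
q is the number of restrictions; k is the number of independent variables in the unrestricted model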

9
Q

General Linear F-test

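A

Tests whether all slope coefficients are jointly zero ($H_0: b_1 = b_2 = \dots = b_k = 0$)
$F = \dfrac{SSR/k}{SSE/(n-k-1)} = \dfrac{MSR}{MSE}$, with $k$ and $n-k-1$ degrees of freedom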
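A sketch of the two F-statistics, treating the general linear F-test as the test that all slope coefficients are jointly zero. The SSE and SSR values are hypothetical. The final line checks that the overall test is just the nested test whose restricted model contains only the intercept (so $SSE_R = SST = SSR + SSE$ and $q = k$).

```python
def nested_f(sse_r, sse_u, q, n, k):
    """F = [(SSE_R - SSE_U) / q] / [SSE_U / (n - k - 1)], with q and n - k - 1 df.

    k is the number of independent variables in the unrestricted model.
    """
    return ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))

def overall_f(ssr, sse, n, k):
    """F = MSR / MSE = (SSR / k) / [SSE / (n - k - 1)], testing b1 = ... = bk = 0."""
    return (ssr / k) / (sse / (n - k - 1))

# Hypothetical values: dropping 2 of 5 variables raises SSE from 100 to 120 (n = 50)
print(round(nested_f(sse_r=120.0, sse_u=100.0, q=2, n=50, k=5), 2))  # 4.4

# Overall test = nested test against an intercept-only model (SSE_R = SSR + SSE, q = k)
ssr, sse = 60.0, 100.0
assert abs(overall_f(ssr, sse, 50, 5) - nested_f(ssr + sse, sse, 5, 50, 5)) < 1e-12
```

Each statistic is then compared with the critical F-value for its degrees of freedom; a large F rejects the restrictions.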
10
Q

Heteroskedasticity

A

The variance of the residuals differs across observations

Arises from Omitted Variables, Incorrect Functional Form, Extreme Values

Use Breusch-Pagan (BP) Test

11
Q

Unconditional Heteroskedasticity

A

Error variance is not correlated with Independent Variables

Not a problem for statistical inference

12
Q

Conditional Heteroskedasticity

A

Error variance is correlated with the independent variables

Standard errors are underestimated, so t-statistics are inflated

Use Breusch-Pagan (BP) Test

13
Q

Breusch-Pagan (BP) Test

A

Used to test for Heteroskedasticity;
1. Run the regression
2. Run another regression with the squared residuals from step 1 as the dependent variable (same independent variables)
3. Test the null hypothesis of no conditional heteroskedasticity with the chi-squared statistic $BP = nR^2$ from step 2 ($k$ degrees of freedom)

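A minimal sketch of the three BP steps, restricted to a single independent variable ($k = 1$) so the statistic has 1 degree of freedom and its p-value can be computed with `math.erfc` (a chi-square(1) variable is a squared standard normal). The data are synthetic: the error term alternates in sign and grows with $x$, so heteroskedasticity is built in.

```python
import math

def simple_ols(x, y):
    """Intercept and slope of y = b0 + b1*x by least squares."""
    n = len(x)
    xb, yb = sum(x) / n, sum(y) / n
    b1 = (sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
          / sum((xi - xb) ** 2 for xi in x))
    return yb - b1 * xb, b1

def r2_simple(x, y):
    """R^2 of the simple regression of y on x."""
    b0, b1 = simple_ols(x, y)
    yb = sum(y) / len(y)
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - yb) ** 2 for yi in y)
    return 1 - sse / sst

def breusch_pagan(x, y):
    """BP = n * R^2 from regressing squared residuals on x; chi-square with 1 df here."""
    b0, b1 = simple_ols(x, y)                                 # step 1
    e2 = [(yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)]
    stat = len(x) * r2_simple(x, e2)                          # step 2
    p_value = math.erfc(math.sqrt(stat / 2))                 # step 3: P(chi2_1 > stat)
    return stat, p_value

# Errors whose size grows with x: conditional heteroskedasticity by construction
x = list(range(1, 21))
y = [2 + 0.5 * xi + (-1) ** xi * xi for xi in x]
stat, p = breusch_pagan(x, y)
print(round(stat, 2), round(p, 4))  # large statistic, small p-value
```

With several regressors the auxiliary regression uses all $k$ of them and the statistic is compared with a chi-square with $k$ degrees of freedom.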
14
Q

Robust Standard Errors

A

Computed to correct for the effects of Heteroskedasticity

15
Q

Serial Correlation

A

Regression Errors are correlated across observations

Typically seen in Time-Series Regressions

Use Durbin-Watson (DW) Test or Breusch-Godfrey (BG) Test

16
Q

Breusch-Godfrey (BG) Test

A

Used to Test for Serial Correlation;
1. Run the initial regression
2. Regress the fitted residuals from step 1 (as the dependent variable) on the initial regressors plus one or more lagged residuals
3. Test the null hypothesis of no serial correlation (lagged-residual coefficients jointly zero) using a chi-square test

17
Q

Correcting for Serial Correlation

A

Serial-correlation consistent standard errors

Computed by Software Packages

18
Q

Multicollinearity

A

Two or more independent variables are highly correlated with each other

Use variance inflation factor (VIF) to quantify multicollinearity issues

19
Q

Variance Inflation Factor (VIF) Formula

A

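$VIF_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ comes from regressing $X_j$ on the other independent variables
Used to test for Multicollinearity

VIF > 5 prompts investigation
VIF > 10 indicates serious multicollinearity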

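A sketch of the VIF for the simplest case of exactly two independent variables, where $R_j^2$ from regressing one on the other equals their squared correlation (so both share one VIF). With more variables, $R_j^2$ would come from a full auxiliary regression; the function name and data are illustrative.

```python
def vif_two_regressors(x1, x2):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing x1 on x2.

    With exactly two independent variables that R^2 equals their squared
    correlation, so both variables have the same VIF.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r2 = cov ** 2 / (v1 * v2)
    return 1 / (1 - r2)

print(round(vif_two_regressors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 2.5 (r^2 = 0.6)
```

A VIF of 1 means no correlation with the other regressors; values climb without bound as $R_j^2$ approaches 1.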
20
Q

Correcting Multicollinearity

A
  • Excluding 1 or more variables
  • Using a different proxy for one of the variables
  • Increasing sample size

No easy way to fix

21
Q

High Leverage Point

A

Extreme Value of an Independent Variable

22
Q

Outlier

A

Extreme Value of Dependent Variable

23
Q

Leverage

A

Measures the distance between the ith observation of an independent variable and that variable's mean across all n observations

Rule of Thumb: Leverage above $3\left(\dfrac{k+1}{n}\right)$ is potentially influential

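A sketch of leverage for a single independent variable, where it has the closed form $h_{ii} = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$; with $k > 1$ variables it is the diagonal of the hat matrix $X(X'X)^{-1}X'$. The rule-of-thumb cutoff $3(k+1)/n$ is taken from the card; function names and data are illustrative.

```python
def leverage(x):
    """Leverage h_ii = 1/n + (x_i - mean)^2 / sum_j (x_j - mean)^2 (one regressor)."""
    n = len(x)
    m = sum(x) / n
    sxx = sum((xi - m) ** 2 for xi in x)
    return [1 / n + (xi - m) ** 2 / sxx for xi in x]

def high_leverage_indices(x, k=1):
    """Indices whose leverage exceeds the 3(k + 1)/n rule of thumb."""
    cutoff = 3 * (k + 1) / len(x)
    return [i for i, h in enumerate(leverage(x)) if h > cutoff]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(high_leverage_indices(x))  # [9]: the observation x = 30
```

Leverages always sum to $k + 1$, so the cutoff flags observations carrying roughly three times their average share.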
24
Q

Studentized Residual

A

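Way of testing for outliers: the residual for observation i from a model estimated without observation i, divided by its standard deviation; compare with the critical t-value with $n-k-2$ degrees of freedom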

25
Cook's Distance
Metric for identifying influential data points; measures how much the estimated values of the regression change when an observation is deleted
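A sketch of Cook's distance for a one-regressor OLS fit, using the standard identity $D_i = \frac{e_i^2}{(k+1)\,MSE} \cdot \frac{h_{ii}}{(1-h_{ii})^2}$, which gives the effect of deleting each observation without refitting the model $n$ times. The data are hypothetical: nine points on $y = 2x$ plus one far-out $x$ whose $y$ breaks the pattern.

```python
def cooks_distance(x, y):
    """D_i = e_i^2 / [(k + 1) * MSE] * h_ii / (1 - h_ii)^2 for a one-regressor OLS fit."""
    n, k = len(x), 1
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    b0 = my - b1 * mx
    e = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # residuals
    mse = sum(ei ** 2 for ei in e) / (n - k - 1)
    h = [1 / n + (xi - mx) ** 2 / sxx for xi in x]       # leverage
    return [ei ** 2 / ((k + 1) * mse) * hi / (1 - hi) ** 2 for ei, hi in zip(e, h)]

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [2, 4, 6, 8, 10, 12, 14, 16, 18, 10]
d = cooks_distance(x, y)
print(d.index(max(d)))  # 9: the last observation is the most influential
```

The formula shows why influence needs both ingredients: a large residual and high leverage.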
26
Logistic Transformation
Transforms the probability of a qualitative (binary) dependent variable into log odds, $\ln\left(\dfrac{P}{1-P}\right)$, which can be modeled as a linear function of the independent variables
27
Logistic Regression
$\ln\left(\dfrac{P}{1-P}\right) = b_0 + b_1X_1 + b_2X_2 + \dots + b_kX_k + \varepsilon$
28
Likelihood Ratio (LR) Test
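Method to assess the fit of logistic regression models by comparing restricted and unrestricted models: $LR = -2(\text{log-likelihood}_{restricted} - \text{log-likelihood}_{unrestricted})$, chi-square distributed | Log-likelihood values are negative; higher values (closer to 0) indicate a better fit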
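A sketch of the logistic transformation, its inverse, the Bernoulli log-likelihood, and the LR statistic. The "unrestricted" probabilities are hypothetical stand-ins for a fitted model (a real LR test uses maximum-likelihood fits); the restricted model is intercept-only, whose fitted probability is the sample mean of $y$.

```python
import math

def logit(p):
    """Log odds ln[p / (1 - p)]: maps a probability in (0, 1) onto the real line."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Probability implied by the linear predictor z = b0 + b1*X1 + ... + bk*Xk."""
    return 1 / (1 + math.exp(-z))

def log_likelihood(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p (always <= 0)."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi) for yi, pi in zip(y, p))

def likelihood_ratio(ll_restricted, ll_unrestricted):
    """LR = -2 (LL_restricted - LL_unrestricted); compare with chi-square, q df."""
    return -2 * (ll_restricted - ll_unrestricted)

y = [1, 0, 1, 1]
ll_u = log_likelihood(y, [0.9, 0.2, 0.8, 0.7])  # hypothetical fitted probabilities
ll_r = log_likelihood(y, [0.75] * 4)            # intercept-only model: p = mean(y)
print(round(ll_u, 3), round(ll_r, 3), round(likelihood_ratio(ll_r, ll_u), 3))
```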
29
Linear Trend Model Formula
$y_t = b_0 + b_1t + \varepsilon_t, \quad t = 1, 2, \dots, T$
Time series with a linear trend
30
Log-Linear Model Formula
$\ln y_t = b_0 + b_1t + \varepsilon_t, \quad t = 1, 2, \dots, T$
Commonly used with time series that have exponential growth
31
Autoregressive (AR) Model Formula
$x_t = b_0 + b_1x_{t-1} + b_2x_{t-2} + \dots + b_px_{t-p} + \varepsilon_t$
Time series model regressed on its own past values
32
Covariance Stationary
1. The expected value of the time series must be constant and finite in all periods
2. The variance of the time series must be constant and finite in all periods
3. The covariance of the time series with itself, for a fixed number of periods in the past or future, must be constant and finite in all periods
33
Mean Reversion Level for AR(1) Model
$x^* = \dfrac{b_0}{1 - b_1}$
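A sketch of the mean-reverting level for an AR(1) model with hypothetical coefficients $b_0 = 1.2$, $b_1 = 0.4$, plus chain-rule forecasts that drift toward it. It assumes $|b_1| < 1$ (covariance stationarity); at $b_1 = 1$ (a unit root) the level is undefined, which the function reports as an error.

```python
def mean_reverting_level(b0, b1):
    """Mean-reverting level of x_t = b0 + b1*x_{t-1} + e_t: b0 / (1 - b1)."""
    if b1 == 1:
        raise ValueError("b1 = 1 is a unit root: no finite mean-reverting level")
    return b0 / (1 - b1)

def forecast_ar1(b0, b1, x_now, steps):
    """Chain-rule forecasts: each step plugs the previous forecast into the model."""
    path, x = [], x_now
    for _ in range(steps):
        x = b0 + b1 * x
        path.append(x)
    return path

print(mean_reverting_level(1.2, 0.4))   # 1.2 / 0.6
print(forecast_ar1(1.2, 0.4, 10.0, 5))  # falls toward the mean-reverting level
```

Above the level the model predicts declines and below it predicts increases, which is what "mean reversion" means here.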
34
Random Walk
The value of a time series in one period equals its value in the previous period plus a random error term: $x_t = x_{t-1} + \varepsilon_t$. Use the Dickey-Fuller test.
35
Dickey-Fuller Test
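Used to test for a unit root; if there is a unit root, the time series is a random walk. | Estimate $x_t - x_{t-1} = b_0 + g_1x_{t-1} + \varepsilon_t$, where $g_1 = b_1 - 1$, and test $H_0: g_1 = 0$ (unit root)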
36
n-period Moving Average Formula
$\dfrac{x_t + x_{t-1} + \dots + x_{t-(n-1)}}{n}$
Used to smooth out period-to-period fluctuations in time series
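A sketch of the n-period moving average using a sliding window sum, so each new average costs one addition and one subtraction instead of re-summing n values. The function name is illustrative; output starts at the first period with a full window.

```python
def moving_average(x, n):
    """n-period moving average (x_t + x_{t-1} + ... + x_{t-n+1}) / n for each t >= n - 1."""
    if n < 1 or n > len(x):
        raise ValueError("n must be between 1 and len(x)")
    window = sum(x[:n])
    out = [window / n]
    for t in range(n, len(x)):
        window += x[t] - x[t - n]  # slide: add the newest value, drop the oldest
        out.append(window / n)
    return out

print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```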
37
Moving Average Time-Series (MA(1)) Model
$x_t = \varepsilon_t + \theta\varepsilon_{t-1}$
38
AR(1) Model Adjusted for Quarterly Seasonality
$x_t = b_0 + b_1x_{t-1} + b_2x_{t-4} + \varepsilon_t$
39
Autoregressive Moving Average Models (ARMA)
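Combines autoregressive and moving average time series models: $x_t = b_0 + b_1x_{t-1} + \dots + b_px_{t-p} + \varepsilon_t + \theta_1\varepsilon_{t-1} + \dots + \theta_q\varepsilon_{t-q}$. Can be very unstable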
40
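Autoregressive Conditional Heteroskedasticity (ARCH(1)) Model
Way of testing whether an AR model has conditional heteroskedasticity: regress the squared residuals on their first lag, $\hat{\varepsilon}_t^2 = a_0 + a_1\hat{\varepsilon}_{t-1}^2 + u_t$; a significant $a_1$ means the error variance in one period depends on the error variance in the previous period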
41
Cointegration
A long-term financial or economic relationship exists between two time series, so they do not diverge in the long run
42
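Test of Cointegration between Two Time Series That Have a Unit Root
Regress one time series on the other and test the residuals for a unit root with the (Engle-Granger) Dickey-Fuller test; if the residuals have no unit root, the two series are cointegrated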
43
Supervised Learning
Infers patterns between inputs (features) and Outputs (targets); uses labeled data
44
Unsupervised Learning
Seeks to identify structure in unlabeled data; used in:
1. Dimension reduction (reducing the number of features)
2. Clustering
45
Guide to ML Algorithms
46
Overfitting
Does not generalize well to new data
47
Bias Error
Degree to which the model fits the training data; high bias produces underfitting and high in-sample error
48
Variance Error
How much the model's results change in response to new data; Causes overfitting and out-of-sample errors
49
Base Error
Error due to randomness in the data
50
Cross-Validation
Method of reducing overfitting
51
K-Fold Cross Validation
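Data are shuffled randomly and split into k equal subsamples; $k-1$ subsamples are used for training and one for validation, repeated k times so each subsample serves as the validation sample once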
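A sketch of generating k-fold train/validation index splits with the standard library. Dealing shuffled indices round-robin into folds and fixing a seed are illustrative choices; `k_fold_splits` is a hypothetical helper, not a library API.

```python
import random

def k_fold_splits(n, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Indices 0..n-1 are shuffled, dealt into k near-equal folds, and each fold
    serves as the validation sample exactly once.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

for train, val in k_fold_splits(10, 5):
    print(len(train), sorted(val))  # 8 training indices, 2 validation indices
```

Averaging the validation error over the k folds gives the estimate of out-of-sample error.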
52
LASSO (Least Absolute Shrinkage and Selection Operator)
A type of penalized regression whose penalty term grows as more features are added: it minimizes $SSE + \lambda\sum_{k=1}^{K}|\hat{b}_k|$
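A sketch of the LASSO objective only, evaluated for hypothetical fitted values, slopes, and $\lambda$; actually estimating the coefficients requires an optimizer (for example coordinate descent), which is not shown. The intercept is assumed to be unpenalized.

```python
def lasso_objective(y, y_hat, slopes, lam):
    """SSE + lambda * sum(|b_k|) over the slope coefficients (intercept not penalized).

    Each nonzero slope adds to the penalty, so a richer model lowers the
    objective only if it cuts SSE by more than the extra penalty it adds.
    """
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    return sse + lam * sum(abs(b) for b in slopes)

y = [3.0, 5.0, 7.0]
print(lasso_objective(y, [3.1, 4.9, 7.2], slopes=[2.0], lam=0.5))           # one feature
print(lasso_objective(y, [3.0, 5.0, 7.0], slopes=[1.5, 0.5, 0.8], lam=0.5))  # three features
```

Here the three-feature model fits perfectly yet scores worse, which is the mechanism that pushes LASSO toward fewer features.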
53
Hyperparameter
Parameter selected by the researcher before learning begins
54
Support Vector Machine
Linear classifier that finds the boundary (hyperplane) that optimally separates the data into two sets by maximizing the margin between them
55
k-Nearest Neighbor (KNN)
Supervised learning technique used mostly for classification and sometimes for regression; classifies a new observation based on its k nearest neighbors in the training data
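A sketch of KNN classification by majority vote, assuming Euclidean distance and $k = 3$ (k is a hyperparameter; other distance metrics are common). The training points and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["low", "low", "low", "high", "high", "high"]
print(knn_classify(train_X, train_y, (0.5, 0.5)))  # low
print(knn_classify(train_X, train_y, (5.5, 5.5)))  # high
```

Because distances drive the vote, features are usually normalized or standardized first.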
56
Classification and Regression Tree (CART)
Supervised learning used in both classification and regression. Commonly applied to binary classification or regression
57
Ensemble Learning
Combining the predictions from a collection of models
58
Bootstrap Aggregation (Bagging)
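Technique where the original training data are resampled with replacement to create n new datasets; the same algorithm is trained on each, and their predictions are combined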
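A sketch of bagging with the standard library: draw bootstrap samples (with replacement, each the size of the original data), fit the same model to each, and average the predictions. The "model" here is a deliberately trivial sample mean, standing in for a real learner such as a decision tree.

```python
import random

def bootstrap_samples(data, n_bags, seed=0):
    """Draw n_bags datasets, each len(data) long, by sampling data with replacement."""
    rng = random.Random(seed)
    return [rng.choices(data, k=len(data)) for _ in range(n_bags)]

def bagged_prediction(data, n_bags, fit, seed=0):
    """Train `fit` on each bootstrap sample and average the resulting predictions."""
    preds = [fit(sample) for sample in bootstrap_samples(data, n_bags, seed)]
    return sum(preds) / len(preds)

# Toy "model": predict the mean of the training sample
data = [3.0, 5.0, 4.0, 8.0, 6.0]
print(bagged_prediction(data, n_bags=100, fit=lambda s: sum(s) / len(s)))
```

For classification, the averaging step becomes a majority vote across the bagged models.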
59
Random Forest Classifier
Large number of decision trees trained via a bagging method
60
Principal Component Analysis (PCA)
Transform many highly correlated features of data into a smaller number of uncorrelated composite variables
61
Eigenvectors
Mutually uncorrelated composite variables that are linear combinations of the original features | Each represents a direction
62
Eigenvalue
Represents the proportion of total variance in the data explained by each eigenvector
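A sketch of PCA for two features, where the eigenvalues and eigenvectors of the $2 \times 2$ sample covariance matrix have closed forms; each eigenvalue divided by their sum is the share of total variance its eigenvector (principal component) explains. Real applications with many features would use a linear-algebra library; the data points are illustrative.

```python
import math

def pca_2d(points):
    """Principal components of 2-D data from its 2x2 sample covariance matrix.

    Returns [(eigenvalue, unit eigenvector, share of total variance), ...],
    ordered so the first principal component comes first.
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / (n - 1)            # var(x)
    d = sum((p[1] - my) ** 2 for p in points) / (n - 1)            # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)   # cov(x, y)
    half_trace = (a + d) / 2
    disc = math.sqrt(((a - d) / 2) ** 2 + b ** 2)
    lams = (half_trace + disc, half_trace - disc)  # eigenvalues, largest first
    if b == 0:  # already uncorrelated: the axes are the eigenvectors
        vecs = [(1.0, 0.0), (0.0, 1.0)] if a >= d else [(0.0, 1.0), (1.0, 0.0)]
    else:       # (b, lam - a) solves (C - lam*I) v = 0
        vecs = [(b, lam - a) for lam in lams]
    total = a + d  # total variance = trace = sum of eigenvalues
    return [(lam, (v[0] / math.hypot(*v), v[1] / math.hypot(*v)), lam / total)
            for lam, v in zip(lams, vecs)]

for lam, vec, share in pca_2d([(2, 1), (3, 4), (5, 5), (7, 9)]):
    print(round(lam, 3), tuple(round(c, 3) for c in vec), round(share, 3))
```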
63
*k*-Means Clustering
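A form of unsupervised learning that partitions observations into k non-overlapping clusters, assigning each observation to the cluster with the nearest centroid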
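A sketch of k-means using Lloyd's algorithm. Initializing the centroids with the first k points is a simplifying assumption (random or k-means++ initialization is more common), and the two well-separated groups of points are made up so the result is easy to verify.

```python
import math

def k_means(points, k, max_iter=100):
    """Lloyd's algorithm: assign each point to its nearest centroid, move each
    centroid to the mean of its cluster, and repeat until the centroids stop moving."""
    centroids = [tuple(p) for p in points[:k]]  # simple initialization: first k points
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = k_means(points, k=2)
print(clusters)
```

The number of clusters k is a hyperparameter chosen before the algorithm runs.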
64
Hierarchical Clustering
A form of unsupervised learning that builds a hierarchy of clusters, either bottom-up (agglomerative) or top-down (divisive)
65
Choosing an ML Algorithm Flowchart
66
ML Model Building Steps
1. Conceptualization of the modeling task
2. Data collection
3. Data preparation and wrangling
4. Data exploration
5. Model training
67
Text ML Model Building Steps
1. Text problem formulation
2. Data (text) curation
3. Text preparation and wrangling
4. Text exploration
68
Trimming
When extreme values and outliers are removed from the dataset | Also called truncation
69
Winsorization
When extreme values or outliers are replaced by the maximum (minimum) values that are not outliers
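A sketch contrasting trimming and winsorization on the same data. The outlier cutoffs (2 and 50) are hypothetical; in practice they are often percentiles, such as the 1st and 99th.

```python
def trim(x, lower, upper):
    """Trimming: drop observations outside [lower, upper]."""
    return [v for v in x if lower <= v <= upper]

def winsorize(x, lower, upper):
    """Winsorization: replace values below `lower` (above `upper`) with the
    smallest (largest) value that lies inside [lower, upper]."""
    kept = trim(x, lower, upper)
    lo, hi = min(kept), max(kept)
    return [lo if v < lower else hi if v > upper else v for v in x]

x = [1, 5, 6, 7, 8, 100]
print(trim(x, 2, 50))       # [5, 6, 7, 8]
print(winsorize(x, 2, 50))  # [5, 5, 6, 7, 8, 8]
```

Trimming shrinks the sample; winsorization keeps every observation but caps the extremes.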
70
Normalization Formula
$X_{i,\text{normalized}} = \dfrac{X_i - X_{min}}{X_{max} - X_{min}}$
Process of rescaling numeric variables to the range [0, 1]
71
Standardization Formula
$X_{i,\text{standardized}} = \dfrac{X_i - \mu}{\sigma}$
Process of both centering and scaling the variables. Works best when the data are normally distributed
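A sketch of both rescaling formulas using the standard library. It uses the population standard deviation (`statistics.pstdev`) to match $\sigma$ in the formula; a sample standard deviation would change the scale slightly.

```python
import statistics

def normalize(x):
    """(X_i - X_min) / (X_max - X_min): rescales values into [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def standardize(x):
    """(X_i - mu) / sigma: mean 0, standard deviation 1 (population sigma here)."""
    mu, sigma = statistics.mean(x), statistics.pstdev(x)
    return [(v - mu) / sigma for v in x]

x = [2.0, 4.0, 6.0]
print(normalize(x))    # [0.0, 0.5, 1.0]
print(standardize(x))  # approximately [-1.2247, 0.0, 1.2247]
```

Normalization depends on the minimum and maximum, so a single outlier can compress all other values; standardization is less sensitive to that.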
72
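Confusion Matrix
Grid of predicted vs. actual classes for a classification model: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN)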
73
Precision Formula
$P = \dfrac{TP}{TP + FP}$
74
Recall Formula
$R = \dfrac{TP}{TP + FN}$
75
Accuracy Formula
$\text{Accuracy} = \dfrac{TP + TN}{TP + FP + TN + FN}$
76
F1 Score Formula
$F1 = \dfrac{2 \times P \times R}{P + R}$
77
Root Mean Squared Error
$RMSE = \sqrt{\dfrac{\sum_{i=1}^{n}(\text{Predicted}_i - \text{Actual}_i)^2}{n}}$
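A sketch of the confusion-matrix counts, the classification metrics built from them, and RMSE for a continuous target. Labels are assumed to be 0/1 with 1 as the positive class; the example predictions are made up so that every metric works out to 0.75.

```python
import math

def confusion_counts(actual, predicted):
    """Counts (TP, FP, FN, TN) for binary labels where 1 = positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy, and F1 (harmonic mean of precision and recall)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1

def rmse(actual, predicted):
    """Square root of the mean squared prediction error for a continuous target."""
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 1, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(actual, predicted)
print(counts)                           # (3, 1, 1, 3)
print(classification_metrics(*counts))  # (0.75, 0.75, 0.75, 0.75)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```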