Quantitative Methods Flashcards

1
Q

Formula for Multiple Regression

A
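
In standard notation, a multiple regression of Y on k independent variables is:

Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i

where b_0 is the intercept, b_1 through b_k are the slope coefficients, and \varepsilon_i is the error term.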
2
Q

Coefficient of Determination

A

R2
Measure of Goodness of Fit
Sum of Squares Regression / Sum of Squares Total

3
Q

Adjusted R2

A

Adjusts R2 by the degrees of freedom;
Does not automatically increase when variables are added
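
One standard way to express it, with n observations and k independent variables:

\bar{R}^2 = 1 - \left( \frac{n - 1}{n - k - 1} \right) (1 - R^2)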

4
Q

Akaike’s Information Criterion (AIC)

A

Measure of model parsimony, i.e., a lower value indicates a better-fitting model

Preferred criterion when the model is used for prediction

5
Q

Schwarz’s Bayesian Information Criteria (BIC or SBC)

A

Allows us to choose the best model among a set of models

Preferred when goodness of fit is desired

6
Q

Unrestricted Model

A

Full model with all independent variables

7
Q

Restricted Model

A

Also called nested models; they take the unrestricted model and exclude one or more of its variables

8
Q

F-Distributed Test Statistic When Comparing Restricted & Unrestricted Models

A

q is the number of restrictions
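
A common form of the test statistic, with q restrictions and k independent variables in the unrestricted model:

F = \frac{(SSE_{restricted} - SSE_{unrestricted}) / q}{SSE_{unrestricted} / (n - k - 1)}

which is compared against an F critical value with q and n - k - 1 degrees of freedom.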

9
Q

General Linear F-test

A
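
Expressed in general form (equivalent to the statistic on the previous card), where df_R and df_F are the error degrees of freedom of the restricted and full models:

F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}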
10
Q

Heteroskedasticity

A

The variance of the residuals differs across observations

Arises from omitted variables, incorrect functional form, or extreme values

Use the Breusch-Pagan (BP) test

11
Q

Unconditional Heteroskedasticity

A

Error variance is not correlated with Independent Variables

Not a problem for statistical inference

12
Q

Conditional Heteroskedasticity

A

Error variance is correlated with the independent variables

Inflates t-statistics because standard errors are underestimated

Use the Breusch-Pagan (BP) test

13
Q

Breusch-Pagan (BP) Test

A

Used to test for heteroskedasticity (see the sketch below):
1. Run the regression
2. Run a second regression with the squared residuals from step 1 as the dependent variable and the original independent variables as the regressors
3. Use a chi-square statistic (n x R² of the second regression, with k degrees of freedom) to test the null hypothesis that there is no conditional heteroskedasticity
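
A minimal sketch of the three steps above in Python, assuming statsmodels is available; the variable names y and X are placeholders rather than anything from the deck:

```python
# Breusch-Pagan procedure: regress squared residuals on the original regressors
# and compare n * R^2 of that auxiliary regression to a chi-square distribution.
import statsmodels.api as sm
from scipy import stats

def breusch_pagan(y, X):
    """y: dependent variable, X: (n x k) array of regressors without a constant."""
    X_const = sm.add_constant(X)

    # Step 1: original regression, keep the residuals.
    resid = sm.OLS(y, X_const).fit().resid

    # Step 2: auxiliary regression of squared residuals on the same regressors.
    aux = sm.OLS(resid ** 2, X_const).fit()

    # Step 3: BP statistic = n * R^2, chi-square with k degrees of freedom
    # under the null of no conditional heteroskedasticity.
    n, k = X.shape
    bp_stat = n * aux.rsquared
    p_value = 1 - stats.chi2.cdf(bp_stat, df=k)
    return bp_stat, p_value
```

statsmodels also ships a ready-made het_breuschpagan function in statsmodels.stats.diagnostic that performs the same test.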

14
Q

Robust Standard Errors

A

Computed to correct for the effects of Heteroskedasticity

15
Q

Serial Correlation

A

Regression Errors are correlated across observations

Typically seen in Time-Series Regressions

Use the Durbin-Watson (DW) test or the Breusch-Godfrey (BG) test

16
Q

Breusch-Godfrey (BG) Test

A

Used to test for serial correlation:
1. Run the initial regression
2. Regress the fitted residuals from step 1 on the initial regressors plus one or more lagged residuals
3. Test the hypothesis using a chi-square test

17
Q

Correcting for Serial Correlation

A

Serial-correlation consistent standard errors

Computed by Software Packages

18
Q

Multicollinearity

A

Independent variables are correlated with one another

Use variance inflation factor (VIF) to quantify multicollinearity issues

19
Q

Variance Inflation Factor (VIF) Formula

A

Used to test for Multicollinearity

VIF > 5 warrants investigation
VIF > 10 indicates serious multicollinearity issues
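
For independent variable X_j, the usual formula is:

VIF_j = \frac{1}{1 - R_j^2}

where R_j^2 comes from regressing X_j on the remaining independent variables.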

20
Q

Correcting Multicollinearity

A
  • Excluding 1 or more variables
  • Using a different proxy for one of the variables
  • Increasing sample size

No easy way to fix

21
Q

High Leverage Point

A

Extreme value of an independent variable

22
Q

Outlier

A

Extreme Value of Dependent Variable

23
Q

Leverage

A

Measures how far an observation's independent-variable value lies from the mean of that independent variable

Rule of thumb: leverage above 3[(k + 1)/n] is potentially influential

24
Q

Studentized Residual

A

The residual divided by its estimated standard error; used to identify outliers

25
Q

Cook’s Distance

A

Metric for identifying influential data points; measures how the estimated values of the regression change when an observation is deleted

26
Q

Logistic Transformation

A

Transforms a qualitative (categorical) dependent variable so that it has a linear relationship with the independent variables
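
The transformation is the log odds (logit), which is modeled as linear in the regressors:

\ln\left( \frac{p}{1 - p} \right) = b_0 + b_1 X_1 + \dots + b_k X_k

where p is the probability that the event of interest occurs.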

27
Q

Logistic Regression

A
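
Equivalently, the event probability is the logistic (sigmoid) function of the linear combination of regressors, with coefficients estimated by maximum likelihood:

P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + \dots + b_k X_k)}}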
28
Q

Likelihood Ratio (LR) Test

A

Method for assessing the fit of logistic regression models

Higher log-likelihood values (those closer to 0) indicate better fit
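
The test statistic is based on the log-likelihoods of the restricted and unrestricted models:

LR = -2 (\ln L_{restricted} - \ln L_{unrestricted})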

29
Q

Linear Trend Model Formula

A

Time Series with Linear Trend
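
In standard notation, with t as the time index:

y_t = b_0 + b_1 t + \varepsilon_t, \quad t = 1, 2, \dots, T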

30
Q

Log-Linear Model Formula

A

Commonly used with time series that have exponential growth
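
In standard notation:

\ln y_t = b_0 + b_1 t + \varepsilon_t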

31
Q

Autoregressive (AR) Model Formula

A

Time series model regressed on its own past values
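
In standard notation, an AR(p) model is:

x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-2} + \dots + b_p x_{t-p} + \varepsilon_t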

32
Q

Covariance Stationary

A
  1. The expected value of the Time Series must be constant and finite in all periods
  2. The variance in the time series must be constant and finite in all periods
  3. The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods
33
Q

Mean Reversion level for AR (1) Model

A
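
For an AR(1) model x_t = b_0 + b_1 x_{t-1} + \varepsilon_t, the mean-reverting level is:

\frac{b_0}{1 - b_1}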
34
Q

Random Walk

A

The value of a time series in one period is the same as the one in the previous time period, with an error term added.

Use the Dickey-Fuller test

35
Q

Dickey-Fuller Test

A

Used to test for a Unit root; If there is a unit root, then the time series is a random walk.

Test for g=0
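
The test subtracts x_{t-1} from both sides of the AR(1) model and tests the coefficient on the lagged level:

x_t - x_{t-1} = b_0 + g x_{t-1} + \varepsilon_t, \quad g = b_1 - 1

H_0: g = 0 (unit root) versus H_a: g < 0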

36
Q

n-period Moving Average Formula

A

Used to smooth out period to period fluctuations in time series models
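
In standard notation:

\frac{x_t + x_{t-1} + \dots + x_{t-(n-1)}}{n}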

37
Q

Moving Average Time-Series (MA1) Model

A
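
In standard notation, with \theta as the moving-average parameter and \varepsilon_t a mean-zero, uncorrelated error:

x_t = \varepsilon_t + \theta \varepsilon_{t-1}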
38
Q

AR1 Model Adjusted for Quarterly Seasonality

A
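
With quarterly data, a seasonal lag of four periods is added to the AR(1) model:

x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-4} + \varepsilon_t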
39
Q

Autoregressive Moving Average Models (ARMA)

A

Combines Autoregressive and Moving Average Time Series

Can be very unstable

40
Q

Autoregressive Conditional Heteroskedasticity (ARCH1) Model

A

Way of testing whether an AR model's errors exhibit heteroskedasticity, i.e., whether the error variance depends on the previous period's squared error
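
The test regresses the squared residuals from the AR model on their own first lag:

\hat{\varepsilon}_t^2 = a_0 + a_1 \hat{\varepsilon}_{t-1}^2 + u_t

If a_1 is statistically different from zero, the errors exhibit ARCH(1) effects.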

41
Q

Cointegration

A

A long-term financial or economic relationship exists between two time series such that they do not diverge in the long run

42
Q

Test of Cointegration between two time series that have a unit root

A
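
The usual approach is the Engle-Granger test: regress one series on the other, then apply a Dickey-Fuller test (with Engle-Granger critical values) to the residuals of that regression. If the residuals have no unit root, the two series are cointegrated.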
43
Q

Supervised Learning

A

Infers patterns between inputs (features) and Outputs (targets); uses labeled data

44
Q

Unsupervised Learning

A

Seeks to identify structure in unlabeled data; used in
1. Dimension reduction (reducing the number of features)
2. Clustering

45
Q

Guide to ML Algorithms

A
46
Q

Overfitting

A

Does not generalize well to new data

47
Q

Bias Error

A

Degree to which the model fits the training data; produces underfitting and in-sample errors

48
Q

Variance Error

A

How much the model’s results change in response to new data; Causes overfitting and out-of-sample errors

49
Q

Base Error

A

Due to Randomness of Data

50
Q

Cross-Validation

A

Method of reducing overfitting

51
Q

K-Fold Cross Validation

A

Used to randomize the data into training and validation samples
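
A minimal sketch of k-fold cross-validation using scikit-learn; the dummy dataset, the linear model, and k = 5 are illustrative assumptions rather than anything from the deck:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Dummy data: 100 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Shuffle the data and rotate through 5 train/validation splits.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_mse = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

print("Average validation MSE across folds:", np.mean(fold_mse))
```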

52
Q

LASSO (Least Absolute Shrinkage and Selection Operator)

A

A type of penalized regression in which a penalty is applied as features are added to the regression

53
Q

Hyperparameter

A

Parameter selected by the researcher before learning begins

54
Q

Support Vector Machine

A

Optimally separates the data into two sets

55
Q

k-Nearest Neighbor (KNN)

A

Supervised learning technique used mostly for classification and sometimes for regression

56
Q

Classification and Regression Tree (CART)

A

Supervised learning technique used for both classification and regression; commonly applied to binary classification or regression

57
Q

Ensemble Learning

A

Combining the predictions from a collection of models

58
Q

Bootstrap Aggregation (Bagging)

A

Technique in which the original dataset is used to create n new training datasets by random sampling with replacement

59
Q

Random Forest Classifier

A

Large number of decision trees trained via a bagging method

60
Q

Principal Component Analysis (PCA)

A

Transform many highly correlated features of data into a smaller number of uncorrelated composite variables

61
Q

Eigenvectors

A

Mutually uncorrelated composite variables that are linear combinations of the original features

Each eigenvector represents a direction

62
Q

Eigenvalue

A

Represents the proportion of total variance explained by each eigenvector

63
Q

k-Means Clustering

A

A form of unsupervised learning that partitions observations into k non-overlapping clusters, where k is specified in advance (a hyperparameter)

64
Q

Hierarchical Clustering

A

A form of unsupervised learning that builds a hierarchy of clusters without requiring the number of clusters to be specified in advance

65
Q

Choosing an ML Algorithm Flowchart

A
66
Q

ML Model Building Steps

A
  1. Conceptualization of the Modeling Task
  2. Data Collection
  3. Data Preparation and Wrangling
  4. Data Exploration
  5. Model Training
67
Q

Text ML Model Building Steps

A
  1. Text Problem Formulation
  2. Data (Text) Curation
  3. Text Preparation and Wrangling
  4. Text Exploration
68
Q

Trimming

A

When extreme values and outliers are removed from the dataset

Also called truncation

69
Q

Winsorization

A

When extreme values or outliers are replaced by the maximum (minimum) values that are not outliers

70
Q

Normalization Formula

A

Process of rescaling numeric variables in the range of [0,1]
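
In standard notation:

X_{normalized} = \frac{X_i - X_{min}}{X_{max} - X_{min}}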

71
Q

Standardization Formula

A

Process of both centering and scaling the variables

Data must be normally distributed to be effective
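
In standard notation, with \mu and \sigma as the variable's mean and standard deviation:

X_{standardized} = \frac{X_i - \mu}{\sigma}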

72
Q

Confusion Matrix

A
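
The standard 2x2 layout compares predicted and actual classes:

                      Actual positive        Actual negative
Predicted positive    True positive (TP)     False positive (FP)
Predicted negative    False negative (FN)    True negative (TN)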
73
Q

Precision Formula

A
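
Ratio of correctly predicted positive classes to all predicted positives:

Precision (P) = \frac{TP}{TP + FP}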
74
Q

Recall Formula

A
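
Ratio of correctly predicted positives to all actual positives:

Recall (R) = \frac{TP}{TP + FN}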
75
Q

Accuracy Formula

A
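
Percentage of correctly predicted classes out of all predictions:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}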
76
Q

F1 Score Formula

A
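
Harmonic mean of precision and recall:

F1 = \frac{2 \times P \times R}{P + R}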
77
Q

Root Mean Squared Error

A
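
Square root of the average squared difference between predicted and actual values; used for models with continuous targets:

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 }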