Quantitative Methods Flashcards

1
Q

Formula for Multiple Regression

A
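
In standard notation, a multiple regression of Y on k independent variables is:

Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki} + \varepsilon_i

where b_0 is the intercept, b_1 through b_k are the slope coefficients, and \varepsilon_i is the error term.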
2
Q

Coefficient of Determination

A

R2
Measure of Goodness of Fit
Sum of Squares Regression / Sum of Squares Total

3
Q

Adjusted R2

A

Adjusts R2 by the degrees of freedom;
Does not automatically increase when variables are added
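
One standard way to express it, with n observations and k independent variables:

\bar{R}^2 = 1 - \left( \frac{n - 1}{n - k - 1} \right) (1 - R^2)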

4
Q

Akaike’s Information Criterion (AIC)

A

Measure of model parsimony, i.e., a lower value indicates a better-fitting model

Preferred criterion when the model is used for prediction

5
Q

Schwarz’s Bayesian Information Criteria (BIC or SBC)

A

Allows us to choose the best model among a set of models

Preferred when goodness of fit is desired

6
Q

Unrestricted Model

A

Full model with all independent variables

7
Q

Restricted Model

A

Also called nested models; they take the unrestricted model and exclude one or more of its variables

8
Q

F-Distributed Test Statistic When Comparing Restricted & Unrestricted Models

A

q is the number of restrictions
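
A common form of the test statistic, with q restrictions and k independent variables in the unrestricted model:

F = \frac{(SSE_{restricted} - SSE_{unrestricted}) / q}{SSE_{unrestricted} / (n - k - 1)}

which is compared against an F critical value with q and n - k - 1 degrees of freedom.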

9
Q

General Linear F-test

A
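
Expressed in general form (equivalent to the statistic on the previous card), where df_R and df_F are the error degrees of freedom of the restricted and full models:

F = \frac{(SSE_R - SSE_F) / (df_R - df_F)}{SSE_F / df_F}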
10
Q

Heteroskedasticity

A

The variance of the residuals differs across observations

Arises from omitted variables, incorrect functional form, or extreme values

Use the Breusch-Pagan (BP) test

11
Q

Unconditional Heteroskedasticity

A

Error variance is not correlated with Independent Variables

Not a problem for statistical inference

12
Q

Conditional Heteroskedasticity

A

Error variance is correlated with the independent variables

Inflates t-statistics because standard errors are underestimated

Use the Breusch-Pagan (BP) test

13
Q

Breusch-Pagan (BP) Test

A

Used to test for heteroskedasticity (see the sketch below):
1. Run the regression
2. Run a second regression with the squared residuals from step 1 as the dependent variable and the original independent variables as the regressors
3. Use a chi-square statistic (n x R² of the second regression, with k degrees of freedom) to test the null hypothesis that there is no conditional heteroskedasticity
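
A minimal sketch of the three steps above in Python, assuming statsmodels is available; the variable names y and X are placeholders rather than anything from the deck:

```python
# Breusch-Pagan procedure: regress squared residuals on the original regressors
# and compare n * R^2 of that auxiliary regression to a chi-square distribution.
import statsmodels.api as sm
from scipy import stats

def breusch_pagan(y, X):
    """y: dependent variable, X: (n x k) array of regressors without a constant."""
    X_const = sm.add_constant(X)

    # Step 1: original regression, keep the residuals.
    resid = sm.OLS(y, X_const).fit().resid

    # Step 2: auxiliary regression of squared residuals on the same regressors.
    aux = sm.OLS(resid ** 2, X_const).fit()

    # Step 3: BP statistic = n * R^2, chi-square with k degrees of freedom
    # under the null of no conditional heteroskedasticity.
    n, k = X.shape
    bp_stat = n * aux.rsquared
    p_value = 1 - stats.chi2.cdf(bp_stat, df=k)
    return bp_stat, p_value
```

statsmodels also ships a ready-made het_breuschpagan function in statsmodels.stats.diagnostic that performs the same test.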

14
Q

Robust Standard Errors

A

Computed to correct for the effects of Heteroskedasticity

15
Q

Serial Correlation

A

Regression Errors are correlated across observations

Typically seen in Time-Series Regressions

Use the Durbin-Watson (DW) test or the Breusch-Godfrey (BG) test

16
Q

Breusch-Godfrey (BG) Test

A

Used to test for serial correlation:
1. Run the initial regression
2. Regress the fitted residuals from step 1 on the initial regressors plus one or more lagged residuals
3. Test the hypothesis using a chi-square test

17
Q

Correcting for Serial Correlation

A

Serial-correlation consistent standard errors

Computed by Software Packages

18
Q

Multicollinearity

A

Independent variables are correlated with one another

Use variance inflation factor (VIF) to quantify multicollinearity issues

19
Q

Variance Inflation Factor (VIF) Formula

A

Used to test for Multicollinearity

VIF > 5 warrants investigation
VIF > 10 indicates serious multicollinearity issues
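
For independent variable X_j, the usual formula is:

VIF_j = \frac{1}{1 - R_j^2}

where R_j^2 comes from regressing X_j on the remaining independent variables.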

20
Q

Correcting Multicollinearity

A
  • Excluding 1 or more variables
  • Using a different proxy for one of the variables
  • Increasing sample size

No easy way to fix

21
Q

High Leverage Point

A

Extreme value of an independent variable

22
Q

Outlier

A

Extreme Value of Dependent Variable

23
Q

Leverage

A

Measures how far an observation's independent-variable value lies from the mean of that independent variable

Rule of thumb: leverage above 3[(k + 1)/n] is potentially influential

24
Q

Studentized Residual

A

The residual divided by its estimated standard error; used to identify outliers

25
Q

Cook’s Distance

A

Metric for identifying influential data points; measures how the estimated values of the regression change when an observation is deleted

26
Q

Logistic Transformation

A

Transforms a qualitative (categorical) dependent variable so that it has a linear relationship with the independent variables
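
The transformation is the log odds (logit), which is modeled as linear in the regressors:

\ln\left( \frac{p}{1 - p} \right) = b_0 + b_1 X_1 + \dots + b_k X_k

where p is the probability that the event of interest occurs.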

27
Q

Logistic Regression

A
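
Equivalently, the event probability is the logistic (sigmoid) function of the linear combination of regressors, with coefficients estimated by maximum likelihood:

P(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + \dots + b_k X_k)}}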
28
Q

Likelihood Ratio (LR) Test

A

Method for assessing the fit of logistic regression models

Higher log-likelihood values (those closer to 0) indicate better fit
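
The test statistic is based on the log-likelihoods of the restricted and unrestricted models:

LR = -2 (\ln L_{restricted} - \ln L_{unrestricted})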

29
Q

Linear Trend Model Formula

A

Time Series with Linear Trend
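
In standard notation, with t as the time index:

y_t = b_0 + b_1 t + \varepsilon_t, \quad t = 1, 2, \dots, T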

30
Q

Log-Linear Model Formula

A

Commonly used with time series that have exponential growth
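
In standard notation:

\ln y_t = b_0 + b_1 t + \varepsilon_t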

31
Q

Autoregressive (AR) Model Formula

A

Time series model regressed on its own past values
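
In standard notation, an AR(p) model is:

x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-2} + \dots + b_p x_{t-p} + \varepsilon_t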

32
Q

Covariance Stationary

A
  1. The expected value of the Time Series must be constant and finite in all periods
  2. The variance in the time series must be constant and finite in all periods
  3. The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods
33
Q

Mean Reversion level for AR (1) Model

A
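
For an AR(1) model x_t = b_0 + b_1 x_{t-1} + \varepsilon_t, the mean-reverting level is:

\frac{b_0}{1 - b_1}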
34
Q

Random Walk

A

The value of a time series in one period is the same as the one in the previous time period, with an error term added.

Use the Dickey-Fuller test

35
Q

Dickey-Fuller Test

A

Used to test for a Unit root; If there is a unit root, then the time series is a random walk.

Test for g=0
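
The test subtracts x_{t-1} from both sides of the AR(1) model and tests the coefficient on the lagged level:

x_t - x_{t-1} = b_0 + g x_{t-1} + \varepsilon_t, \quad g = b_1 - 1

H_0: g = 0 (unit root) versus H_a: g < 0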

36
Q

n-period Moving Average Formula

A

Used to smooth out period to period fluctuations in time series models
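
In standard notation:

\frac{x_t + x_{t-1} + \dots + x_{t-(n-1)}}{n}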

37
Q

Moving Average Time-Series (MA1) Model

A
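
In standard notation, with \theta as the moving-average parameter and \varepsilon_t a mean-zero, uncorrelated error:

x_t = \varepsilon_t + \theta \varepsilon_{t-1}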
38
Q

AR1 Model Adjusted for Quarterly Seasonality

A
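
With quarterly data, a seasonal lag of four periods is added to the AR(1) model:

x_t = b_0 + b_1 x_{t-1} + b_2 x_{t-4} + \varepsilon_t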
39
Q

Autoregressive Moving Average Models (ARMA)

A

Combines Autoregressive and Moving Average Time Series

Can be very unstable

40
Q

Autoregressive Conditional Heteroskedasticity (ARCH1) Model

A

Way of testing whether an AR model's errors exhibit heteroskedasticity, i.e., whether the error variance depends on the previous period's squared error
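
The test regresses the squared residuals from the AR model on their own first lag:

\hat{\varepsilon}_t^2 = a_0 + a_1 \hat{\varepsilon}_{t-1}^2 + u_t

If a_1 is statistically different from zero, the errors exhibit ARCH(1) effects.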

41
Q

Cointegration

A

A long-term financial or economic relationship exists between two time series such that they do not diverge in the long run

42
Q

Test of Cointegration between two time series that have a unit root

A
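
The usual approach is the Engle-Granger test: regress one series on the other, then apply a Dickey-Fuller test (with Engle-Granger critical values) to the residuals of that regression. If the residuals have no unit root, the two series are cointegrated.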
43
Q

Supervised Learning

A

Infers patterns between inputs (features) and Outputs (targets); uses labeled data

44
Q

Unsupervised Learning

A

Seeks to identify structure in unlabeled data; used in
1. Dimension reduction (reducing the number of features)
2. Clustering

45
Q

Guide to ML Algorithms

A
46
Q

Overfitting

A

Does not generalize well to new data

47
Q

Bias Error

A

Degree to which the model fits the training data; produces underfitting and in-sample errors

48
Q

Variance Error

A

How much the model’s results change in response to new data; Causes overfitting and out-of-sample errors

49
Q

Base Error

A

Due to Randomness of Data

50
Q

Cross-Validation

A

Method of reducing overfitting

51
Q

K-Fold Cross Validation

A

Used to randomize the data into training and validation samples
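
A minimal sketch of k-fold cross-validation using scikit-learn; the dummy dataset, the linear model, and k = 5 are illustrative assumptions rather than anything from the deck:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Dummy data: 100 observations, 3 features.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Shuffle the data and rotate through 5 train/validation splits.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
fold_mse = []
for train_idx, val_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[val_idx], model.predict(X[val_idx])))

print("Average validation MSE across folds:", np.mean(fold_mse))
```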

52
Q

LASSO (Least Absolute Shrinkage and Selection Operator)

A

A type of penalized regression in which a penalty is applied as features are added to the regression

53
Q

Hyperparameter

A

Parameter selected by the researcher before learning begins

54
Q

Support Vector Machine

A

Optimally separates the data into two sets

55
Q

k-Nearest Neighbor (KNN)

A

Supervised learning technique used mostly for classification and sometimes for regression

56
Q

Classification and Regression Tree (CART)

A

Supervised learning technique used for both classification and regression; commonly applied to binary classification or regression

57
Q

Ensemble Learning

A

Combining the predictions from a collection of models

58
Q

Bootstrap Aggregation (Bagging)

A

Technique in which the original dataset is used to create n new training datasets by random sampling with replacement

59
Q

Random Forest Classifier

A

Large number of decision trees trained via a bagging method

60
Q

Principal Component Analysis (PCA)

A

Transform many highly correlated features of data into a smaller number of uncorrelated composite variables

61
Q

Eigenvectors

A

Mutually uncorrelated composite variables that are linear combinations of the original features

Each eigenvector represents a direction

62
Q

Eigenvalue

A

Represents the proportion of total variance explained by each eigenvector

63
Q

k-Means Clustering

A

A form of unsupervised learning that partitions observations into k non-overlapping clusters, where k is specified in advance (a hyperparameter)

64
Q

Hierarchical Clustering

A

A form of unsupervised learning that builds a hierarchy of clusters without requiring the number of clusters to be specified in advance

65
Q

Choosing an ML Algorithm Flowchart

A
66
Q

ML Model Building Steps

A
  1. Conceptualization of the Modeling Task
  2. Data Collection
  3. Data Preparation and Wrangling
  4. Data Exploration
  5. Model Training
67
Q

Text ML Model Building Steps

A
  1. Text Problem Formulation
  2. Data (Text) Curation
  3. Text Preparation and Wrangling
  4. Text Exploration
68
Q

Trimming

A

When extreme values and outliers are removed from the dataset

Also called truncation

69
Q

Winsorization

A

When extreme values or outliers are replaced by the maximum (minimum) values that are not outliers

70
Q

Normalization Formula

A

Process of rescaling numeric variables in the range of [0,1]
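
In standard notation:

X_{normalized} = \frac{X_i - X_{min}}{X_{max} - X_{min}}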

71
Q

Standardization Formula

A

Process of both centering and scaling the variables

Data must be normally distributed to be effective
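
In standard notation, with \mu and \sigma as the variable's mean and standard deviation:

X_{standardized} = \frac{X_i - \mu}{\sigma}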

72
Q

Confusion Matrix

A
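
The standard 2x2 layout compares predicted and actual classes:

                      Actual positive        Actual negative
Predicted positive    True positive (TP)     False positive (FP)
Predicted negative    False negative (FN)    True negative (TN)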
73
Q

Precision Formula

A
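
Ratio of correctly predicted positive classes to all predicted positives:

Precision (P) = \frac{TP}{TP + FP}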
74
Q

Recall Formula

A
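
Ratio of correctly predicted positives to all actual positives:

Recall (R) = \frac{TP}{TP + FN}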
75
Q

Accuracy Formula

A
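
Percentage of correctly predicted classes out of all predictions:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}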
76
Q

F1 Score Formula

A
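
Harmonic mean of precision and recall:

F1 = \frac{2 \times P \times R}{P + R}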
77
Q

Root Mean Squared Error

A
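
Square root of the average squared difference between predicted and actual values; used for models with continuous targets:

RMSE = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2 }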