Study Guide for Final Flashcards
(17 cards)
What is the intuition behind how model parameters are estimated in regression?
The goal is to find the line or curve that best fits the data, where "best fit" means minimizing the (squared) vertical distance between each point and the line.
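A minimal NumPy sketch of this idea; the data and variable names are made up, but the fit chooses the intercept and slope that minimize the sum of squared vertical distances:

```python
import numpy as np

# Illustrative data: X (predictor) and Y (outcome)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)  # true intercept 2.0, slope 0.5

# Ordinary least squares: the line that minimizes the sum of
# squared distances between each point and the line
slope, intercept = np.polyfit(x, y, deg=1)
print(f"intercept ≈ {intercept:.2f}, slope ≈ {slope:.2f}")
```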
How do you fit models with continuous, categorical, and interaction variables?
Continuous variables: Standard regression shows the rate of change (e.g., a 1-unit increase in X changes Y by β).
Categorical variables: Use dummy variables (0/1) to represent categories.
Interaction terms: Include these when the effect of one variable depends on another. For example:
Gender × Education.
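A hedged sketch of all three variable types in one model, assuming the statsmodels library is available; the wage/education/gender names and the data are illustrative, not from the course:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with one continuous and one categorical variable
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "education": rng.uniform(8, 20, n),
    "gender": rng.choice(["F", "M"], n),
})
df["wage"] = 10 + 1.5 * df["education"] + 3 * (df["gender"] == "M") + rng.normal(0, 2, n)

# C(gender) creates 0/1 dummy variables; '*' adds both main effects
# plus the Gender × Education interaction term
model = smf.ols("wage ~ education * C(gender)", data=df).fit()
print(model.params)
```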
What does the output of multiple regression tell you?
Four things: Coefficients, Standard Errors, P-values, and R².
Coefficients (β): Show how much Y changes for a 1-unit increase in X, holding other variables constant.
Standard Errors: Indicate uncertainty in the estimate.
P-values: Tell you if the effect is statistically significant (p < 0.05).
R²: The proportion of variance in Y accounted for by the model.
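To see all four quantities at once, here is a small illustrative fit on simulated data, assuming the statsmodels library:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))                  # two predictors
y = 1.0 + 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 1, 150)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)    # coefficients (β): effect of a 1-unit increase, others held constant
print(fit.bse)       # standard errors: uncertainty in each estimate
print(fit.pvalues)   # p-values: significance (compare to 0.05)
print(fit.rsquared)  # R²: share of variance in Y explained
```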
Why is prediction important, and how do you make predictions?
Predictions help forecast outcomes, test models, and inform decision-making. You predict by plugging new X values into the regression equation:
Ŷ = β₀ + β₁X₁ + β₂X₂ + …
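A tiny sketch of plugging new X values into the fitted equation; the β values here are hypothetical, and the matrix product computes Ŷ = β₀ + β₁X₁ + β₂X₂ for each row:

```python
import numpy as np

# Hypothetical fitted coefficients: β0 (intercept), β1, β2
beta = np.array([2.0, 0.5, -1.2])

# New observations: a column of 1s for the intercept, then X1, X2
X_new = np.array([[1.0, 3.0, 0.0],
                  [1.0, 5.0, 1.0]])

y_hat = X_new @ beta
print(y_hat)  # predicted outcomes for the two new rows
```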
How can you tell if predictions are good or bad?
- Use model fit metrics:
  - R²: Proportion of variance explained.
  - RMSE (Root Mean Squared Error): The square root of the average squared error (penalizes large errors).
  - MAE (Mean Absolute Error): Average absolute deviation.
- Compare predictions with observed values using:
  - Cross-validation techniques.
  - Residual analysis (errors between observed and predicted).
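A sketch computing these metrics by hand with NumPy; the observed and predicted values are made up for illustration:

```python
import numpy as np

y_obs  = np.array([3.0, 5.0, 7.0, 9.0])  # observed values (illustrative)
y_pred = np.array([2.5, 5.5, 6.0, 9.5])  # model predictions

residuals = y_obs - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))  # penalizes large errors
mae  = np.mean(np.abs(residuals))        # average absolute deviation
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                 # proportion of variance explained
print(f"RMSE={rmse:.2f}, MAE={mae:.2f}, R²={r2:.2f}")
```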
How do prediction and causal inference differ?
Prediction: Focuses on correlation to forecast outcomes.
Causal inference: Focuses on identifying whether X causes changes in Y.
How is causality defined in social science?
A cause is something that produces a change in an outcome. It's often defined through counterfactuals: "What would happen to Y if X did not occur?"
Why is causal inference challenging?
The fundamental problem of causality is that you can't observe both the treated and untreated states for the same individual. This makes it hard to isolate causal effects without counterfactual comparisons.
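One way to see the problem is a small potential-outcomes simulation (all numbers synthetic): each unit has both a treated and an untreated outcome, but only one of the two is ever observed:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
y0 = rng.normal(10, 1, n)        # potential outcome if untreated
y1 = y0 + 2.0                    # potential outcome if treated (true effect = 2)
treated = rng.integers(0, 2, n)  # which state we actually get to see

# We only observe ONE potential outcome per unit;
# the other one is the missing counterfactual
y_observed = np.where(treated == 1, y1, y0)
print(treated, y_observed)
```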
Why and how do we use simulations for causal inference?
Simulations help test causal patterns by:
- Creating synthetic data.
- Introducing confounders or treatments.
- Observing how different causal relationships affect outcomes.
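A minimal sketch of such a simulation: a confounder Z drives both treatment X and outcome Y, so the naive comparison overstates the true effect (all values synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)                           # confounder
x = (z + rng.normal(size=n) > 0).astype(float)   # treatment depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)       # true treatment effect = 2

# The naive treated-vs-untreated comparison is biased upward,
# because z pushes both x and y in the same direction
naive = y[x == 1].mean() - y[x == 0].mean()
print(f"naive estimate ≈ {naive:.2f} (true effect is 2.0)")
```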
Why are experiments effective for causal inference?
Experiments use random assignment to eliminate confounding. This isolates the causal effect of X on Y. Observational studies lack this control.
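A companion sketch to the confounding simulation above: the same data-generating process, but with X randomly assigned, so the simple difference in means recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
z = rng.normal(size=n)                   # the confounder still exists...
x = rng.integers(0, 2, n).astype(float)  # ...but treatment is randomly assigned
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

# Randomization makes x independent of z, so the simple
# difference in means recovers the true effect of 2
diff = y[x == 1].mean() - y[x == 0].mean()
print(f"experimental estimate ≈ {diff:.2f}")
```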
What are DAGs, and how do you use them?
Def: Directed Acyclic Graphs (DAGs) visually represent causal relationships.
Arrows: Show the direction of causality.
Use: DAGs identify forks, pipes, and colliders, which tell you which variables to control for (and which to leave alone).
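A small sketch of building and checking a DAG, assuming the networkx library; the node names are illustrative:

```python
import networkx as nx

# A toy DAG: Z is a fork (common cause of X and Y),
# M is a pipe (X → M → Y)
dag = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")])

assert nx.is_directed_acyclic_graph(dag)  # "acyclic": no feedback loops
print(list(dag.predecessors("Y")))        # the direct causes of Y
```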
What are fork, pipe, collider?
Fork: A variable that affects both X and Y (a common cause, i.e., a confounder).
Pipe: A variable through which X affects Y (a mediator).
Collider: A variable that is caused by both X and Y. Conditioning on it (e.g., controlling for it) introduces bias.
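A short simulation of collider bias (synthetic data): X and Y are truly independent, yet selecting on a collider makes them look correlated:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
x = rng.normal(size=n)
y = rng.normal(size=n)                 # x and y are truly independent
collider = x + y + rng.normal(size=n)  # caused by BOTH x and y

# Unconditional correlation is ~0, as it should be
print(f"corr(x, y)            ≈ {np.corrcoef(x, y)[0, 1]:.2f}")

# Conditioning on the collider (here: keeping only high values)
# induces a spurious negative correlation between x and y
keep = collider > 1
print(f"corr(x, y | collider) ≈ {np.corrcoef(x[keep], y[keep])[0, 1]:.2f}")
```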
Why are control variables used in causal inference?
Controls adjust for confounding effects to isolate the causal effect of X on Y. Practically, they are included in regression models or matching methods.
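A sketch of the regression approach with plain NumPy (all numbers simulated): including the confounder Z as a control moves the coefficient on X back to its true value:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)                      # confounder
x = z + rng.normal(size=n)                  # treatment influenced by z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # true effect of x is 2

ones = np.ones(n)
# Without the control: the x coefficient absorbs z's effect (biased)
b_naive, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
# With z included as a control: the x coefficient recovers ≈ 2
b_ctrl, *_ = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)
print(f"x coefficient without control ≈ {b_naive[1]:.2f}, with control ≈ {b_ctrl[1]:.2f}")
```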
Why did the causal revolution happen, and what are natural experiments?
The causal revolution emphasizes causal effects, not just correlations. Natural experiments occur when randomness or exogenous factors (e.g., policies) mimic an experiment, allowing causal inference.
Why is there uncertainty in results, and how do you quantify it?
- Uncertainty arises from sampling variability and measurement error.
- Quantify uncertainty using confidence intervals, standard errors, and hypothesis tests.
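A minimal sketch of quantifying uncertainty from a single sample: the standard error of the mean and an approximate 95% confidence interval (illustrative data; 1.96 is the usual normal critical value):

```python
import numpy as np

rng = np.random.default_rng(8)
sample = rng.normal(loc=50, scale=10, size=200)  # one sample from a population

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)        # approximate 95% confidence interval
print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```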
How can simulations illustrate uncertainty?
Simulate many samples to observe variability in results. This provides insights into how confident we are about estimates and predictions.
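For example, a small NumPy simulation of repeated sampling (synthetic population): the spread of the sample means shows how much estimates vary from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(9)
# Draw many samples from the same population and record each sample mean
means = [rng.normal(loc=50, scale=10, size=100).mean() for _ in range(5_000)]

# The spread of these means IS the sampling variability:
# its standard deviation approximates the standard error
print(f"std of sample means ≈ {np.std(means):.2f} (theory: 10/√100 = 1.0)")
```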
How does the law of large numbers help with uncertainty?
The law of large numbers states that as the sample size increases, the sample mean converges to the true population mean, reducing uncertainty.
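A quick illustration with simulated draws: the running sample mean settles toward the true mean of 50 as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(10)
draws = rng.normal(loc=50, scale=10, size=100_000)

# Running mean after n draws: it converges toward the true mean of 50
for n in (10, 100, 1_000, 100_000):
    print(f"n={n:>7}: sample mean = {draws[:n].mean():.3f}")
```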