Study Guide for Final Flashcards
(17 cards)
What is the intuition behind how model parameters are estimated in regression?
The goal is to find the line or curve that best fits the data, where "best fit" means minimizing the (squared) vertical distance between each point and the line.
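A minimal NumPy sketch of this idea; the data and variable names are made up, but the fit chooses the intercept and slope that minimize the sum of squared vertical distances:

```python
import numpy as np

# Illustrative data: X (predictor) and Y (outcome)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 100)  # true intercept 2.0, slope 0.5

# Ordinary least squares: the line that minimizes the sum of
# squared distances between each point and the line
slope, intercept = np.polyfit(x, y, deg=1)
print(f"intercept ≈ {intercept:.2f}, slope ≈ {slope:.2f}")
```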
How do you fit models with continuous, categorical, and interaction variables?
Continuous variables: Standard regression shows the rate of change (e.g., a 1-unit increase in X changes Y by β).
Categorical variables: Use dummy variables (0/1) to represent categories.
Interaction terms: Include these when the effect of one variable depends on another. For example:
Gender × Education.
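A hedged sketch of all three variable types in one model, assuming the statsmodels library is available; the wage/education/gender names and the data are illustrative, not from the course:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data with one continuous and one categorical variable
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "education": rng.uniform(8, 20, n),
    "gender": rng.choice(["F", "M"], n),
})
df["wage"] = 10 + 1.5 * df["education"] + 3 * (df["gender"] == "M") + rng.normal(0, 2, n)

# C(gender) creates 0/1 dummy variables; '*' adds both main effects
# plus the Gender × Education interaction term
model = smf.ols("wage ~ education * C(gender)", data=df).fit()
print(model.params)
```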
What does the output of multiple regression tell you?
Four things: Coefficients, Standard Errors, P-values, and R².
Coefficients (β): Show how much Y changes for a 1-unit increase in X, holding other variables constant.
Standard Errors: Indicate uncertainty in the estimate.
P-values: Tell you if the effect is statistically significant (p < 0.05).
R²: The proportion of variance in Y accounted for by the model.
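To see all four quantities at once, here is a small illustrative fit on simulated data, assuming the statsmodels library:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 2))                  # two predictors
y = 1.0 + 0.8 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 1, 150)

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.params)    # coefficients (β): effect of a 1-unit increase, others held constant
print(fit.bse)       # standard errors: uncertainty in each estimate
print(fit.pvalues)   # p-values: significance (compare to 0.05)
print(fit.rsquared)  # R²: share of variance in Y explained
```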
Why is prediction important, and how do you make predictions?
Predictions help forecast outcomes, test models, and inform decision-making. You predict by plugging new X values into the regression equation:
Ŷ = β₀ + β₁X₁ + β₂X₂ + …
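A tiny sketch of plugging new X values into the fitted equation; the β values here are hypothetical, and the matrix product computes Ŷ = β₀ + β₁X₁ + β₂X₂ for each row:

```python
import numpy as np

# Hypothetical fitted coefficients: β0 (intercept), β1, β2
beta = np.array([2.0, 0.5, -1.2])

# New observations: a column of 1s for the intercept, then X1, X2
X_new = np.array([[1.0, 3.0, 0.0],
                  [1.0, 5.0, 1.0]])

y_hat = X_new @ beta
print(y_hat)  # predicted outcomes for the two new rows
```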
How can you tell if predictions are good or bad?
- Use model fit metrics:
  - R²: Proportion of variance explained.
  - RMSE (Root Mean Squared Error): The square root of the average squared error (penalizes large errors).
  - MAE (Mean Absolute Error): Average absolute deviation.
- Compare predictions with observed values using:
  - Cross-validation techniques.
  - Residual analysis (errors between observed and predicted).
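A sketch computing these metrics by hand with NumPy; the observed and predicted values are made up for illustration:

```python
import numpy as np

y_obs  = np.array([3.0, 5.0, 7.0, 9.0])  # observed values (illustrative)
y_pred = np.array([2.5, 5.5, 6.0, 9.5])  # model predictions

residuals = y_obs - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))  # penalizes large errors
mae  = np.mean(np.abs(residuals))        # average absolute deviation
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2 = 1 - ss_res / ss_tot                 # proportion of variance explained
print(f"RMSE={rmse:.2f}, MAE={mae:.2f}, R²={r2:.2f}")
```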
How do prediction and causal inference differ?
Prediction: Focuses on correlation to forecast outcomes.
Causal inference: Focuses on identifying whether X causes changes in Y.
How is causality defined in social science?
A cause is something that produces a change in an outcome. It's often defined through counterfactuals: "What would happen to Y if X did not occur?"
Why is causal inference challenging?
The fundamental problem of causality is that you can't observe both the treated and untreated states for the same individual. This makes it hard to isolate causal effects without counterfactual comparisons.
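One way to see the problem is a small potential-outcomes simulation (all numbers synthetic): each unit has both a treated and an untreated outcome, but only one of the two is ever observed:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
y0 = rng.normal(10, 1, n)        # potential outcome if untreated
y1 = y0 + 2.0                    # potential outcome if treated (true effect = 2)
treated = rng.integers(0, 2, n)  # which state we actually get to see

# We only observe ONE potential outcome per unit;
# the other one is the missing counterfactual
y_observed = np.where(treated == 1, y1, y0)
print(treated, y_observed)
```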
Why and how do we use simulations for causal inference?
Simulations help test causal patterns by:
- Creating synthetic data.
- Introducing confounders or treatments.
- Observing how different causal relationships affect outcomes.
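A minimal sketch of such a simulation: a confounder Z drives both treatment X and outcome Y, so the naive comparison overstates the true effect (all values synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
z = rng.normal(size=n)                           # confounder
x = (z + rng.normal(size=n) > 0).astype(float)   # treatment depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)       # true treatment effect = 2

# The naive treated-vs-untreated comparison is biased upward,
# because z pushes both x and y in the same direction
naive = y[x == 1].mean() - y[x == 0].mean()
print(f"naive estimate ≈ {naive:.2f} (true effect is 2.0)")
```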
Why are experiments effective for causal inference?
Experiments use random assignment to eliminate confounding. This isolates the causal effect of X on Y. Observational studies lack this control.
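A companion sketch to the confounding simulation above: the same data-generating process, but with X randomly assigned, so the simple difference in means recovers the true effect:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 10_000
z = rng.normal(size=n)                   # the confounder still exists...
x = rng.integers(0, 2, n).astype(float)  # ...but treatment is randomly assigned
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

# Randomization makes x independent of z, so the simple
# difference in means recovers the true effect of 2
diff = y[x == 1].mean() - y[x == 0].mean()
print(f"experimental estimate ≈ {diff:.2f}")
```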
What are DAGs, and how do you use them?
Def: Directed Acyclic Graphs (DAGs) visually represent causal relationships.
Arrows: Show the direction of causality.
Use: DAGs identify forks, pipes, and colliders, which tell you which variables to control for (and which to leave alone).
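A small sketch of building and checking a DAG, assuming the networkx library; the node names are illustrative:

```python
import networkx as nx

# A toy DAG: Z is a fork (common cause of X and Y),
# M is a pipe (X → M → Y)
dag = nx.DiGraph([("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")])

assert nx.is_directed_acyclic_graph(dag)  # "acyclic": no feedback loops
print(list(dag.predecessors("Y")))        # the direct causes of Y
```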
What are fork, pipe, collider?
Fork: A variable that affects both X and Y (a common cause, i.e., a confounder).
Pipe: A variable through which X affects Y (a mediator).
Collider: A variable that is caused by both X and Y. Conditioning on it (e.g., controlling for it) introduces bias.
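A short simulation of collider bias (synthetic data): X and Y are truly independent, yet selecting on a collider makes them look correlated:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
x = rng.normal(size=n)
y = rng.normal(size=n)                 # x and y are truly independent
collider = x + y + rng.normal(size=n)  # caused by BOTH x and y

# Unconditional correlation is ~0, as it should be
print(f"corr(x, y)            ≈ {np.corrcoef(x, y)[0, 1]:.2f}")

# Conditioning on the collider (here: keeping only high values)
# induces a spurious negative correlation between x and y
keep = collider > 1
print(f"corr(x, y | collider) ≈ {np.corrcoef(x[keep], y[keep])[0, 1]:.2f}")
```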
Why are control variables used in causal inference?
Controls adjust for confounding effects to isolate the causal effect of X on Y. Practically, they are included in regression models or matching methods.
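A sketch of the regression approach with plain NumPy (all numbers simulated): including the confounder Z as a control moves the coefficient on X back to its true value:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
z = rng.normal(size=n)                      # confounder
x = z + rng.normal(size=n)                  # treatment influenced by z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # true effect of x is 2

ones = np.ones(n)
# Without the control: the x coefficient absorbs z's effect (biased)
b_naive, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y, rcond=None)
# With z included as a control: the x coefficient recovers ≈ 2
b_ctrl, *_ = np.linalg.lstsq(np.column_stack([ones, x, z]), y, rcond=None)
print(f"x coefficient without control ≈ {b_naive[1]:.2f}, with control ≈ {b_ctrl[1]:.2f}")
```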
Why did the causal revolution happen, and what are natural experiments?
The causal revolution emphasizes causal effects, not just correlations. Natural experiments occur when randomness or exogenous factors (e.g., policies) mimic an experiment, allowing causal inference.
Why is there uncertainty in results, and how do you quantify it?
- Uncertainty arises from sampling variability and measurement error.
- Quantify uncertainty using confidence intervals, standard errors, and hypothesis tests.
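A minimal sketch of quantifying uncertainty from a single sample: the standard error of the mean and an approximate 95% confidence interval (illustrative data; 1.96 is the usual normal critical value):

```python
import numpy as np

rng = np.random.default_rng(8)
sample = rng.normal(loc=50, scale=10, size=200)  # one sample from a population

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean
ci = (mean - 1.96 * se, mean + 1.96 * se)        # approximate 95% confidence interval
print(f"mean = {mean:.2f}, SE = {se:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```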
How can simulations illustrate uncertainty?
Simulate many samples to observe variability in results. This provides insights into how confident we are about estimates and predictions.
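For example, a small NumPy simulation of repeated sampling (synthetic population): the spread of the sample means shows how much estimates vary from sample to sample:

```python
import numpy as np

rng = np.random.default_rng(9)
# Draw many samples from the same population and record each sample mean
means = [rng.normal(loc=50, scale=10, size=100).mean() for _ in range(5_000)]

# The spread of these means IS the sampling variability:
# its standard deviation approximates the standard error
print(f"std of sample means ≈ {np.std(means):.2f} (theory: 10/√100 = 1.0)")
```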
How does the law of large numbers help with uncertainty?
The law of large numbers states that as the sample size increases, the sample mean converges to the true population mean, reducing uncertainty.
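A quick illustration with simulated draws: the running sample mean settles toward the true mean of 50 as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(10)
draws = rng.normal(loc=50, scale=10, size=100_000)

# Running mean after n draws: it converges toward the true mean of 50
for n in (10, 100, 1_000, 100_000):
    print(f"n={n:>7}: sample mean = {draws[:n].mean():.3f}")
```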