7. Multivariable regression Flashcards
(15 cards)
def multivariable Ordinary Least Squares (OLS) regression
statistical method used to understand complex phenomena by analyzing the effects of multiple independent variables (factors) on a single dependent variable (outcome).
-> bivariate linear regression uses only one IV (one factor); multivariable regression uses several IVs and still a single DV (the outcome, Y)
what multivariable regression does
multivariable OLS regression lets us include more than one independent variable (X₁, X₂, X₃…) to explain variation in a dependent variable (Y).
It helps you:
*Test multiple hypotheses at once
*Control for other factors, i.e., isolate the effect of one variable while holding others constant
*Compare the strength of each factor’s effect
diff bivariate and multivariable regression
Bivariate regression = 1 X and 1 Y
→ e.g., how terrorist attacks (X) affect public opinion on immigration (Y)
Multivariable regression = multiple Xs and 1 Y
→ e.g., terrorist attacks (X₁), GDP per capita (X₂), and unemployment (X₃) all explaining immigration salience (Y)
what is R²
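the share of the variation in Y that the model explains, read the same way as in bivariate regression
ex: R² = 0.233 → about 23.3% of the variation in immigration concern is explained by the three variables in the model.
Generally: adding relevant variables raises R² noticeably. Adding irrelevant ones barely moves it (R² never goes down when a variable is added), so a higher R² alone does not mean a better model; adjusted R² corrects for the number of variables.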
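A minimal sketch of how R² and adjusted R² are computed, to make the card above concrete. The observed and predicted values and the sample size n = 100 are hypothetical, chosen only for illustration; only R² = 0.233 comes from the card.

```python
def r_squared(y, y_hat):
    """Share of the variation in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, k):
    """Penalize R² for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# HYPOTHETICAL observed values and model predictions (not from the flashcards).
y = [10.0, 12.0, 15.0, 9.0, 14.0]
y_hat = [11.0, 12.5, 14.0, 9.5, 13.0]
r2 = r_squared(y, y_hat)
print(round(r2, 3))

# Same R² = 0.233 with 3 versus 10 predictors (n = 100 is an assumption).
adj_few = adjusted_r_squared(0.233, n=100, k=3)
adj_many = adjusted_r_squared(0.233, n=100, k=10)
print(adj_few > adj_many)
```

With R² held fixed, the adjusted value is lower for the model that uses more predictors, which is why it is the safer number when comparing models of different sizes.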
example
*Research Question: “What explains public support for stricter immigration policies in European countries?”
*Hypotheses: We want to test whether the following factors influence support for stricter immigration:
-Number of terrorist attacks (H₁: More attacks → more support)
-GDP per capita (H₂: Richer countries → less support)
-Unemployment rate (H₃: More unemployment → more support)
-Education level (H₄: More education → less support)
*Our Dataset (simplified)
Country - Support for stricter immigration (%) - Terror Attacks - GDP per Capita (×1000€) - Unemployment (%) - % with Higher Education
France 68 14 38 9.0 32
Germany 45 3 45 4.5 37
Hungary 76 1 16 6.2 25
Sweden 42 2 51 3.8 41
Italy 63 5 30 10.2 28
*Regression Equation
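We estimate the following linear equation:
Support = β₀ + β₁(Terror Attacks) + β₂(GDP per Capita) + β₃(Unemployment Rate) + β₄(Higher Education %) + ε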
*Regression Output (results):
Variable - Coefficient (β: the estimated effect of that variable) - p-value
-Terror Attacks +1.80 0.008
-GDP per Capita −0.75 0.012
-Unemployment Rate +1.25 0.004
-Higher Education % −0.65 0.001
R² 0.78 -
*Interpretation of Each Coefficient
- Intercept (β₀, the constant a in the linear equation, = 22.0)
If a country had 0 terror attacks, a GDP per capita of 0, 0% unemployment, and 0% higher education, predicted support would be 22%.
(This is just a mathematical baseline, not substantively meaningful.)
- Terror Attacks (β₁ = +1.80)
For each additional terror attack, support for stricter immigration increases by 1.8 percentage points, holding all other variables constant.
✅ Statistically significant (p = 0.008 < 0.01)
➡️ If France went from 14 to 15 attacks, predicted support would rise by 1.8 points (from 68% to 69.8% if it started at the observed 68%), assuming other variables stay the same.
- GDP per Capita (β₂ = −0.75)
For every €1,000 increase in GDP per capita, support decreases by 0.75 percentage points, holding all other variables constant.
✅ Significant (p = 0.012)
➡️ Richer countries are slightly less supportive of stricter immigration (maybe due to economic confidence or cosmopolitanism).
- Unemployment Rate (β₃ = +1.25)
For each 1-percentage-point increase in unemployment, support increases by 1.25 percentage points, holding all other variables constant.
✅ Significant (p = 0.004)
➡️ In countries with more job insecurity, people may feel more threatened by immigration.
- Higher Education (β₄ = −0.65)
For every 1-percentage-point increase in the share with higher education, support drops by 0.65 percentage points, holding all other variables constant.
✅ Very significant (p = 0.001)
➡️ Education likely increases tolerance or reduces fear-based attitudes.
🔢 R² = 0.78
This means the model explains 78% of the variation in immigration support across countries.
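A small sketch of how the coefficients above combine into a prediction, and why "one more attack" always means +1.8 points whatever the other values are. The coefficients and France's row are copied from the cards; the function name `predict_support` is made up for this example. The coefficients look illustrative rather than estimated from the five rows shown, so the predicted level for France need not match the observed 68%; the point here is the difference between the two predictions.

```python
# Coefficients copied from the regression output card.
COEFS = {
    "intercept": 22.0,
    "terror_attacks": 1.80,      # per additional attack
    "gdp_per_capita": -0.75,     # per 1,000 euros
    "unemployment": 1.25,        # per percentage point
    "higher_education": -0.65,   # per percentage point
}


def predict_support(terror_attacks, gdp_per_capita, unemployment, higher_education):
    """Predicted support (%) = intercept + sum of (coefficient * value)."""
    return (
        COEFS["intercept"]
        + COEFS["terror_attacks"] * terror_attacks
        + COEFS["gdp_per_capita"] * gdp_per_capita
        + COEFS["unemployment"] * unemployment
        + COEFS["higher_education"] * higher_education
    )


# France's row from the dataset, then the same row with one more attack.
base = predict_support(14, 38, 9.0, 32)
plus_one = predict_support(15, 38, 9.0, 32)

# The gap between the two predictions is the terror-attack coefficient.
print(round(plus_one - base, 2))
```

Changing any other input (GDP, unemployment, education) shifts both predictions by the same amount, so the gap stays at 1.8. That is what "holding all other variables constant" means in practice.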
what is a logit model
a type of regression (also called logistic regression) that estimates the probability that a particular binary outcome happens.
-> Use when you want to understand how the Xs affect the probability of an event happening.
-> you’re modeling the log-odds that Y = 1 (i.e., that the event happens)
CONDITION: Y is dichotomous, meaning it only takes two values:
1 = presence of the outcome (e.g. war, alliance, protest)
0 = absence of the outcome
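To make "modeling the log-odds" concrete, here is a minimal sketch of the link between log-odds and probability that a logit model relies on. The function names are made up for this example.

```python
import math


def log_odds_to_probability(log_odds):
    """Logistic function: maps any real number to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-log_odds))


def probability_to_log_odds(p):
    """Logit function: the inverse mapping, log(p / (1 - p))."""
    return math.log(p / (1 - p))


# Negative log-odds give probabilities below 0.5, positive ones above 0.5.
for z in (-2, 0, 2):
    print(z, round(log_odds_to_probability(z), 3))
```

Because the mapping is S-shaped, the same change in log-odds moves the probability a lot near 0.5 and very little near 0 or 1.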
when to use a logit model
Use logistic regression when:
-You are predicting events that either occur or don’t occur (common in IR: wars, treaties, votes, interventions).
-Your outcome variable is binary/dummy (yes/no, win/lose, support/oppose).
-You want to estimate the effect of your Xs (independent variables) on the likelihood of the outcome happening.
example logit model
*Research Question: What increases the probability that people consider immigration as a salient issue?
*Binary coding of the dependent variable (Y): Original variable = % of people in a country who see immigration as a top issue.
Mean = 11.8%.
*We recode:
-If above 11.8% → 1 (“immigration is salient”)
-If 11.8% or lower → 0 (“immigration is not salient”)
⏩ 156 cases coded as 1, 257 cases as 0.
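The recoding rule above can be sketched as follows. The threshold 11.8 comes from the card; the four example shares are hypothetical and not taken from the real dataset.

```python
THRESHOLD = 11.8  # mean of the original variable, from the card


def recode_salient(share_pct, threshold=THRESHOLD):
    """Return 1 if the share is strictly above the threshold, else 0."""
    return 1 if share_pct > threshold else 0


# HYPOTHETICAL shares (%) of people naming immigration as a top issue.
shares = [4.2, 11.8, 12.0, 30.5]
coded = [recode_salient(s) for s in shares]
print(coded)  # [0, 0, 1, 1]
```

Note that a value exactly at the mean is coded 0, matching the "11.8% or lower" rule.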
*Independent variables (Xs):
-GTD_total: number of terrorist attacks
-GDP per capita
-Unemployment rate
*Regression output:
-Coefficient for GTD_total = +0.003091
Significance: p < 0.01
*Interpretation:
-A higher number of terrorist attacks increases the probability that people consider immigration a salient issue.
-Even when controlling for GDP per capita and unemployment, the effect remains positive and significant.
*But note:
-Unlike OLS, logit coefficients are not directly interpretable in percentage terms. They reflect log-odds, which are harder to interpret without transforming them into predicted probabilities.
-Instead of focusing on raw coefficients, we often compute:
.Predicted probabilities
.Marginal effects (e.g., “a one-unit increase in X increases the probability of Y = 1 by z percentage points”)
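A rough sketch of turning the raw logit coefficient into something readable. Only the coefficient (0.003091) and the case counts (156 and 257) come from the cards. The intercept and the other coefficients are not given, so the baseline used here is an assumption: the sample share of 1s stands in for the starting probability, and the "100 extra attacks" scenario is hypothetical.

```python
import math

B_GTD = 0.003091  # logit coefficient for GTD_total, from the output card


def probability(log_odds):
    """Convert log-odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))


# Odds ratio: multiplicative change in the odds of Y = 1 per extra attack.
odds_ratio = math.exp(B_GTD)

# ASSUMPTION: use the sample share of 1s (156 of 413 cases) as the baseline probability.
p_base = 156 / (156 + 257)
base_log_odds = math.log(p_base / (1 - p_base))

# HYPOTHETICAL scenario: 100 additional attacks, everything else unchanged.
p_after = probability(base_log_odds + B_GTD * 100)

print(round(odds_ratio, 4), round(p_base, 3), round(p_after, 3))
```

The change in probability depends on the starting point, which is why predicted probabilities are reported for specific scenarios rather than read off the coefficient.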
diff multivariable OLS and logit model
- Dep variable
-Multivariable OLS: A continuous variable (e.g., % turnout, income, GDP)
-Logit (Logistic): A binary/dummy variable: 0 = no, 1 = yes (e.g., war/no war, vote yes/no)
- Purpose (what it estimates)
-Multivariable OLS: The average change in Y when an X changes by 1 unit, holding the other Xs constant
-Logit: The probability that Y = 1, based on the values of the Xs
- Typical question
-OLS asks: “How much does voter turnout (%) increase when education increases?”
-Logit asks: “How likely is a country to go to war (yes/no) when military spending increases?”
what is an interaction model
examines whether the effect of one independent variable (X₁) on the dependent variable (Y) depends on the level of another variable (X₂)
why do we use it?: Sometimes the relationship between X and Y is not the same for everyone or in all situations. An interaction model lets us test conditional hypotheses like “X only matters if Z is present (or high/low).”
what are interaction terms
variables in a regression model that represent the combined effect of two (or more) independent variables (X) on the dependent variable (Y).
-used when you think: “The effect of X₁ on Y depends on the value of X₂.” So you multiply the two variables together:
-Interaction term = X₁ × X₂
And include that as a new variable in the regression model.
what do interaction terms change
Without the interaction term, β₁ and β₂ would be main effects that apply universally, regardless of the level of the other variable. But with the interaction, their interpretation becomes conditional: β₁ is the effect of X₁ only when X₂ = 0, and β₂ is the effect of X₂ only when X₁ = 0.
what are constitutive terms of the interaction
When you add an interaction term (e.g., X₁ × X₂) to your regression, you must also include the individual variables (X₁ and X₂) separately in the model.
-> important because the interaction term by itself only tells you how the effect of one variable depends on the other.
But to understand the full model and interpret the results correctly, you need to include the main effects as well.
ex: Does the effect of terrorist attacks (X₁) on migration concern (Y) depend on GDP per capita (X₂)?
1. So you create an interaction term: X₁ × X₂
2. Your regression model must include all of these terms: Y = β₀ + β₁X₁ + β₂X₂ + β₃(X₁ × X₂) + ε
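As a sketch of step 1, the interaction term is just a new column holding the product of the two variables. The rows below reuse terror attacks and GDP per capita from the dataset card earlier in this set; the variable names are made up for this example.

```python
# (country, terror attacks X1, GDP per capita X2 in thousands of euros)
rows = [
    ("France", 14, 38),
    ("Germany", 3, 45),
    ("Hungary", 1, 16),
    ("Sweden", 2, 51),
    ("Italy", 5, 30),
]

# Keep X1 and X2 (the constitutive terms) and add their product as a third column.
design = [(country, x1, x2, x1 * x2) for country, x1, x2 in rows]

for row in design:
    print(row)
```

Each row now carries X₁, X₂, and X₁ × X₂, which are the three predictors that β₁, β₂, and β₃ multiply in the equation above.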
what do coefficients tell you in an interaction model?
You can have a linear regression with interactions
You can have a logistic regression with interactions
SO:
*for linear regression with interactions: coefficients are conditional:
-> the effect of one variable depends on the value of another variable.
So, a coefficient only tells you the effect of a variable when the other interacting variable equals zero.
ex:
-β₁ (X): the effect of X when Z = 0
-β₂ (Z): the effect of Z when X = 0
-β₃ (X × Z): how the effect of X changes depending on Z
*for logistic regression with interactions: coefficients are conditional in the same way, but they tell you how the log-odds of the outcome (Y = 1) change when an IV increases by 1 unit, not how the probability itself changes.
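For the linear case, the conditional reading of the coefficients can be sketched as "effect of X = β₁ + β₃ × Z". The coefficient values below are hypothetical, chosen only so the pattern is easy to see.

```python
def effect_of_x(b1, b3, z):
    """Effect of a one-unit increase in X on Y at a given Z,
    in the model Y = b0 + b1*X + b2*Z + b3*(X*Z)."""
    return b1 + b3 * z


# HYPOTHETICAL coefficients, for illustration only.
b1, b3 = 2.0, -0.04

# The effect of X shrinks as Z grows because b3 is negative.
for z in (0, 25, 50):
    print(z, round(effect_of_x(b1, b3, z), 2))
```

At Z = 0 the effect equals β₁ exactly, which is why β₁ alone should not be read as "the" effect of X when Z = 0 is rare or impossible in the data.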
when you do a multivar OLS, what do you look at in Stata
-> the coefficient of each IV. ex: IV = nb of terrorist attacks, DV = immigration salience, coefficient = .1771. This means that, holding all other variables constant, each additional terrorist attack increases immigration salience by 0.1771 units on average.
(what’s important is the sign first, then the p-value)
-> then check the p-value to see whether the coefficient is statistically significant
-> you can also check R²