2. hypotheses and comparisons Flashcards
(23 cards)
why do we need comparisons and hypotheses in political research?
In political science—especially when we study international relations (IR)—we’re often asking “why” questions:
Why do some countries go to war while others don’t?
Why do trade agreements succeed in some regions but fail in others?
To answer these questions scientifically, we need to build and test theories.
def and utility (in pol sc) of a theory
A theory is a structured way to explain how and why things happen.
In political science, theories help us:
Identify important factors (e.g. power, interests, institutions)
Understand causal relationships (e.g. Does more democracy reduce the likelihood of war?)
Propose explanations that we can then test with real-world data
➡️ So, theories give us hypotheses (specific, testable claims) that we can compare using evidence.
def comparisons
how we spot patterns (e.g. comparing democratic vs authoritarian regimes).
def hypothesis and how to make it useful
specific prediction derived from a theory (e.g. If a state is democratic, it is less likely to go to war with another democracy)
->To make hypotheses useful, we must:
Define our concepts clearly (what do we mean by “democracy”?)
Measure them reliably (how do we measure “likelihood of war”?)
what is science according to Popper
true scientist is not someone who has all the answers, but someone who keeps questioning, testing, and trying to disprove ideas.
🧪 Falsifiability:
A scientific statement must be falsifiable. That means:
It must be possible to imagine evidence that would prove it wrong.
Examples:
✔️ “Countries with more trade are less likely to go to war” – testable ✅
❌ “Peace is caused by good vibes” – not testable ❌
explain “all models are wrong, but some are useful” from George Box (+ his def of a model)
no model can capture every detail of reality—especially in the complex world of international politics. But that’s okay!
A model is:
-A simplified, abstract representation of reality.
-A tool to help us focus on the essential parts of a theory.
-A way to communicate ideas clearly and test them empirically.
Even though models simplify, they can still be very useful, especially when they:
-Help us understand political behavior
-Let us predict outcomes
-Clarify causal relationships
def causal explanation
a logical story that tells us:
How and why something happens
What causes what
We use variables to express these ideas:
Type of Variable | Role in the Theory | Example
Independent Variable (IV) | The cause | Level of democracy in a country
Dependent Variable (DV) | The effect | Likelihood of war with neighbors
🧩 A good causal theory tells us why a change in the IV causes a change in the DV.
What Makes a Good Explanation (or Theory)?
- causal mechanism (A good explanation identifies how the cause leads to the effect)
- hypotheses (how you test theories)
- making comparisons
explain "a good explanation is causal" as a component of a good explanation
A good theory doesn’t just describe what’s happening—it explains why it happens. It connects:
A dependent variable (DV) – the outcome or effect you’re trying to explain
To one or more independent variables (IVs) – the causes or influences
🔄 “X causes Y because…” – that’s the heart of causal explanation.
EX:
Research question: Why do some people support increasing the Social Security budget while others don’t?
Here, the dependent variable is: opinion about the Social Security budget.
->✖️ Poor Explanation (Tautology):
People support it because they think we should spend more on it.
This isn’t helpful—it’s circular and non-causal.
🤏 Slightly Better:
Democrats and Republicans have different opinions on Social Security.
Okay, this points toward a causal factor (party affiliation), but it’s too vague. How and why does partisanship shape those views?
✅ Much Better Explanation:
Party identification shapes people’s views because partisanship forms early, often through parental influence. Later, citizens look to their party’s leaders for cues. Since Democratic leaders tend to support social programs, Democrats are more likely to support Social Security spending.
This is a causal process, and it’s testable. It shows:
A clear link between IV and DV
A mechanism (how partisanship influences views)
A plausible, research-based story
explain hypotheses as a component of a good explanation
A hypothesis is a testable statement about the relationship between IV and DV.
Good hypothesis format:
In a comparison of [units of analysis], those with [a value on IV] will be more likely to have [a value on DV] than those with [a different value on IV].
🔁 For every research hypothesis, there’s a null hypothesis:
It says there is no relationship between the IV and DV.
You test whether you can reject this null hypothesis with evidence.
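A minimal sketch of the template above, filled in with the Social Security example from the earlier card. The function names and the wording of the filled-in values are illustrative assumptions, not part of the course material.

```python
def research_hypothesis(units, iv_value, dv_value, other_iv_value):
    """Fill the comparison template from the card with concrete terms."""
    return (
        f"In a comparison of {units}, those with {iv_value} will be more "
        f"likely to have {dv_value} than those with {other_iv_value}."
    )


def null_hypothesis(iv_name, dv_name):
    """State the matching null hypothesis: no relationship between IV and DV."""
    return f"There is no relationship between {iv_name} and {dv_name}."


# Hypothetical wording for the party ID / Social Security example.
h1 = research_hypothesis(
    units="individuals",
    iv_value="a Democratic party identification",
    dv_value="a favorable opinion of higher Social Security spending",
    other_iv_value="a Republican party identification",
)
h0 = null_hypothesis("party identification", "opinion on Social Security spending")

print(h1)
print(h0)
```

Writing both statements side by side makes it easier to see what evidence would let you reject the null.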
explain making comparisons as a component of a good explanation
Hypotheses suggest comparisons. They imply a research design.
Testing depends on variable types:
IV Type | DV Type | Method
Categorical | Categorical | Cross-tabulation
Categorical | Interval | Mean comparison
*categorical= nominal or ordinal
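The table above can be read as a small lookup rule. The sketch below is one way to express it; the function name and the level labels ("nominal", "ordinal", "interval") are assumptions made for illustration, and combinations the card does not cover return None.

```python
CATEGORICAL = {"nominal", "ordinal"}  # per the card: categorical = nominal or ordinal


def choose_method(iv_level, dv_level):
    """Pick the comparison method from the measurement levels of IV and DV."""
    if iv_level in CATEGORICAL and dv_level in CATEGORICAL:
        return "cross-tabulation"
    if iv_level in CATEGORICAL and dv_level == "interval":
        return "mean comparison"
    return None  # combination not covered by the table on this card


print(choose_method("nominal", "ordinal"))
print(choose_method("ordinal", "interval"))
```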
def categorical variable
(also called Qualitative Variables):
These variables represent categories or groups.
They can be divided into distinct categories that don’t have a meaningful order or ranking (nominal variables), or they can have a specific order (ordinal variables).
def numerical variable
(also called Quantitative Variables):
These variables represent measurable quantities and can be expressed numerically.
Discrete variables: Countable values, typically integers. Examples: number of children, number of cars.
Continuous variables: Can take any value within a range. Examples: height, weight, temperature.
what are the 3 rules for cross-tabs
- Rule 1:
◦ The independent variable (IV) defines the columns.
◦ The dependent variable (DV) defines the rows.
◦ Each cell contains the raw number of cases.
◦ Each column is totaled at the bottom.
- Rule 2:
◦ Always calculate percentages by the categories of the IV (i.e. down each column).
◦ This helps you compare people within each IV category (e.g. within Democrats or within Republicans).
- Rule 3:
◦ Compare percentages across IV categories for a given value of the DV.
◦ Example: Compare the percentage of Democrats vs. Republicans who favor increased spending.
✅ When to Use:
* Both IV and DV are nominal or ordinal (e.g. party ID, opinion categories, gender, education level).
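The three rules can be sketched in a few lines of Python. The survey responses below are invented for illustration (they are not real data), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical survey responses: (party ID = IV, spending opinion = DV).
responses = (
    [("Democrat", "Increase")] * 60
    + [("Democrat", "Keep or cut")] * 40
    + [("Republican", "Increase")] * 30
    + [("Republican", "Keep or cut")] * 70
)

# Rule 1: IV defines the columns, DV the rows; cells hold raw counts,
# and each column is totaled.
counts = Counter(responses)
iv_values = sorted({iv for iv, _ in responses})
dv_values = sorted({dv for _, dv in responses})
col_totals = {iv: sum(counts[(iv, dv)] for dv in dv_values) for iv in iv_values}

# Rule 2: percentages are computed within each IV category (down each column).
col_pct = {
    (iv, dv): 100 * counts[(iv, dv)] / col_totals[iv]
    for iv in iv_values
    for dv in dv_values
}

# Print the table with the DV in rows and the IV in columns.
print(f"{'':<14}" + "".join(f"{iv:>14}" for iv in iv_values))
for dv in dv_values:
    cells = "".join(f"{col_pct[(iv, dv)]:>13.1f}%" for iv in iv_values)
    print(f"{dv:<14}{cells}")
print(f"{'Total (n)':<14}" + "".join(f"{col_totals[iv]:>14}" for iv in iv_values))

# Rule 3: compare across IV categories for one value of the DV.
gap = col_pct[("Democrat", "Increase")] - col_pct[("Republican", "Increase")]
print(f"Democrats minus Republicans favoring an increase: {gap:.1f} points")
```

In this made-up table, 60% of Democrats and 30% of Republicans favor an increase, so the comparison in Rule 3 is a 30-point difference.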
when do we use mean comparison table
when the DV is interval-level and the IV is nominal or ordinal.
📐 Example Hypothesis:
“In a comparison of countries, those having higher per capita GDP will ratify more international environmental treaties than countries with lower GDP.”
📋 What It Does:
* It shows the average (mean) value of the DV for each group defined by the IV.
* This allows us to compare group means and test whether the IV is associated with differences in the DV.
✅ When to Use:
* DV is interval (e.g. number of treaties ratified, average income, voter turnout rate).
* IV is nominal or ordinal (e.g. region, democracy vs. authoritarian, income group).
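A minimal sketch of a mean comparison table for the treaty hypothesis, assuming GDP has been grouped into three ordered categories. The countries and treaty counts are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases: (GDP group = IV, treaties ratified = DV).
countries = [
    ("Low GDP", 4), ("Low GDP", 6), ("Low GDP", 5),
    ("Middle GDP", 8), ("Middle GDP", 10), ("Middle GDP", 9),
    ("High GDP", 12), ("High GDP", 15), ("High GDP", 12),
]

# Collect the DV values for each category of the IV.
groups = defaultdict(list)
for gdp_group, treaties in countries:
    groups[gdp_group].append(treaties)

# One row per IV category: mean of the DV and number of cases.
mean_table = {group: (mean(values), len(values)) for group, values in groups.items()}

for group, (avg, n) in mean_table.items():
    print(f"{group:<12} mean = {avg:5.1f}   n = {n}")
```

If the hypothesis holds, the group means should rise from the low-GDP row to the high-GDP row, as they do in this made-up data (5, 9, 13).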
as a summary, describe those concepts:
1 Hypothesis
2 Null Hypothesis
3 IV (Independent Variable)
4 DV (Dependent Variable)
5 Causal Mechanism
6 Intervening Variable
7 Cross-tabulation
8 Mean comparison
1 A testable claim about the relationship between IV and DV
2 Asserts that there is no relationship between IV and DV
3 The presumed cause
4 The presumed effect
5 Explains how and why the IV affects the DV
6 A variable that lies in between the IV and DV, helping to explain the process
7 Use when both IV and DV are categorical
8 Use when DV is interval-level and IV is categorical
what are bar charts
Purpose: Best used when your independent variable (IV) is nominal—that is, it consists of distinct categories with no inherent order (e.g., gender, religion, party ID).
Reminder: here the vertical axis does not necessarily show percentages of cases in each category of the IV. Instead, it often shows:
-The mean value of the dependent variable (DV) within each category of the IV, or
-Another summary statistic relevant to the DV.
what are line charts
Usefulness: Ideal for interval or ordinal independent variables, where values follow a meaningful order (e.g., age, income brackets).
Advantage: Higher data-ink ratio (from Edward Tufte’s design principles), meaning they:
-Communicate patterns in data more clearly
-Use less visual clutter (e.g., no need for full bars when a line does the job)
Application:
-Traditionally paired with mean comparisons
-Can depict changes over time or gradients in responses across ordered categories
line and bar charts when to use which?
Chart Type | IV Type | Typical Use Case
Bar Chart | Nominal | Cross-tabulations, group comparisons
Line Chart | Ordinal / Interval | Trend analysis, mean differences across ordered categories
pos and neg relationship btwn variables
ONLY FOR ordinal, interval, and ratio-level variables, not nominal ones, because nominal categories don’t increase or decrease. For nominal variables, we talk about differences in proportions or probabilities instead.
Positive (Direct): As the IV increases, the DV also increases (e.g., education level ↑ → political interest ↑).
Negative (Inverse): As the IV increases, the DV decreases (e.g., income ↑ → likelihood to vote for a particular candidate ↓).
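One way to check the direction of a relationship is the sign of the covariance between IV and DV. The sketch below assumes both variables are coded numerically in their natural order; the data and the function name are hypothetical.

```python
from statistics import mean


def direction(iv, dv):
    """Return 'positive', 'negative', or 'none' from the sign of the covariance."""
    iv_bar, dv_bar = mean(iv), mean(dv)
    # Sum of cross-products of deviations; only its sign matters here.
    cov = sum((x - iv_bar) * (y - dv_bar) for x, y in zip(iv, dv))
    if cov > 0:
        return "positive"
    if cov < 0:
        return "negative"
    return "none"


# Hypothetical data: years of education (IV) and a political interest score (DV).
education = [8, 10, 12, 14, 16]
interest = [2, 3, 3, 4, 5]

# Hypothetical data: income in thousands (IV) and support for a candidate (DV).
income = [20, 40, 60, 80, 100]
support = [5, 4, 4, 2, 1]

print(direction(education, interest))
print(direction(income, support))
```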
linear and non-linear relationship btwn variables
*pos and neg are often linear relationships, where:
-The change in DV is consistent across values of the IV.
-Only meaningful when the IV is interval (equal unit spacing matters).
*Non-linear Relationships: Curvilinear
These don’t follow a straight line and suggest that the effect of the IV on the DV depends on the value or range of the IV.
Common shapes:
-U-shaped: Very low and very high values of the IV correspond to high DV values (e.g., political activism is high among both low- and high-income earners).
-Inverted U: Middle values of IV show higher DV than extremes.
-Example (U-shaped): A V- or U-shape may show that moderate-income earners are less supportive of a policy than both low- and high-income groups.
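A rough sketch for spotting these shapes from a mean comparison: list the DV means in order of the IV categories and look at where the direction changes. The function name and the example means are assumptions; the check only looks at direction changes and does not test whether the change is constant (linear).

```python
def describe_shape(dv_means):
    """Classify DV means ordered from the lowest to the highest IV category."""
    steps = [b - a for a, b in zip(dv_means, dv_means[1:])]
    if all(s > 0 for s in steps):
        return "positive"
    if all(s < 0 for s in steps):
        return "negative"
    # Ignore flat steps, then count how often the direction flips.
    rising = [s > 0 for s in steps if s != 0]
    flips = sum(1 for a, b in zip(rising, rising[1:]) if a != b)
    if flips == 1:
        # Falling then rising is a U; rising then falling is an inverted U.
        return "inverted U" if rising[0] else "U-shaped"
    return "no clear pattern"


# Hypothetical mean activism scores across five ordered income groups.
print(describe_shape([7, 4, 3, 5, 8]))
print(describe_shape([2, 5, 6, 4, 1]))
```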
can the median be equal to Q1 or Q3
Yes, but it’s unusual.
It implies that at least 25% of the data fall at a single point. This often happens in heavily skewed or clustered distributions.
For example: a median = Q1 means that every value between the 25th and 50th percentiles is identical, so the second quarter of the data is packed at one point while the rest can vary more.
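A small illustration with invented, heavily clustered data. It uses `statistics.quantiles` with the "inclusive" method; other quartile conventions can give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical clustered data: most cases share the value 2.
data = [1, 2, 2, 2, 2, 2, 5, 8, 9]

# n=4 returns the three quartile cut points: Q1, median, Q3.
q1, median, q3 = quantiles(data, n=4, method="inclusive")
share_at_median = data.count(median) / len(data)

print(q1, median, q3)  # prints: 2.0 2.0 5.0
print(f"{share_at_median:.0%} of cases equal the median")
```

Here Q1 and the median coincide because more than a quarter of the cases sit on the same value.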
Should the median line in a box plot split the box into two equal parts?
Ideally, yes—if the distribution is symmetric.
But in practice, the position of the median line reflects the skewness of the data:
If the median is closer to Q1, it’s right-skewed
If the median is closer to Q3, it’s left-skewed
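The rule of thumb above can be sketched by comparing the two halves of the box. The function name and the three datasets are hypothetical, and quartiles are computed with the "inclusive" method of `statistics.quantiles`.

```python
from statistics import quantiles


def box_skew(data):
    """Read skew from where the median sits inside the box (Q1 to Q3)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    lower_half = median - q1  # distance from Q1 up to the median
    upper_half = q3 - median  # distance from the median up to Q3
    if lower_half < upper_half:
        return "right-skewed"  # median closer to Q1
    if lower_half > upper_half:
        return "left-skewed"  # median closer to Q3
    return "roughly symmetric"


# Hypothetical datasets.
print(box_skew([1, 2, 2, 3, 3, 4, 8, 15, 30]))
print(box_skew([1, 16, 23, 27, 28, 28, 29, 29, 30]))
print(box_skew([1, 2, 3, 4, 5, 6, 7, 8, 9]))
```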