4. Foundations of Statistical Inference Flashcards
(22 cards)
def statistical inference (= drawing conclusions from data)
the process of using data from a sample to make estimates or decisions about a population (the full group you care about).
Imagine you want to know what all voters in a country think about a policy. Instead of asking all 50 million people, you survey just 1,500. Statistical inference helps you draw conclusions from those 1,500 responses, and estimate what the whole population thinks.
def population; sample; population parameter; sample statistics
-pop= the full set of people or things you’re interested in (e.g., all voters).
-sample= smaller group drawn from the population (e.g., 1,500 surveyed voters).
-pop parameter= a true but unknown value (e.g., % of all voters who support a policy).
- sample statistics= an estimate based on the sample (e.g., % of surveyed voters who support it).
def random sampling
A random sample means every member of the population had an equal chance of being selected.
This minimizes bias and makes your estimates more accurate.
what is the Central Limit Theorem (CLT)
When you take repeated random samples from a population and calculate their means, the distribution of those means (the sampling distribution of the mean) tends toward a normal distribution as the sample size increases — regardless of the shape of the population distribution.
2 cases:
*If the population is already normally distributed:
Then even small samples (e.g., less than 30) will produce sample means that are normally distributed.
*If the population is not normally distributed: Then you usually need a sample size of 30 or more for the CLT to apply and for the sample means to approximate a normal distribution.
It allows us to use the normal distribution for inference even when the population is not normally distributed.
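The CLT can be seen in a minimal simulation (population and sample sizes here are arbitrary choices, not from the card): even when the population is clearly skewed, the means of repeated samples of size 30 cluster around the population mean.

```python
import random
import statistics

random.seed(42)

# A skewed population: exponential distribution (clearly NOT normal).
population = [random.expovariate(1.0) for _ in range(100_000)]

# Draw many random samples of size 30 and record each sample mean.
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(2_000)
]

# By the CLT, the sample means cluster near the population mean,
# even though the individual values are skewed.
print(round(statistics.mean(population), 2))
print(round(statistics.mean(sample_means), 2))
```

Plotting `sample_means` as a histogram would show the roughly bell-shaped curve the card describes.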
what does the sampling distribution of the mean mean?
When you take repeated random samples from a population and calculate their means, it is the distribution of those means.
def random sampling error and standard error
*Random sampling error: the natural difference between the sample statistic (an estimate of the population parameter) and the actual population parameter.
*Standard error (SE): the typical size of that error across many samples — the standard deviation of the sampling distribution.
-Smaller SE → more precise estimate.
-SE gets smaller as the sample size increases.
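The two SE properties above follow from the formula SE = σ/√n; a quick sketch (σ = 15 is a hypothetical value):

```python
import math

sigma = 15.0  # hypothetical population standard deviation

# Standard error of the mean: SE = sigma / sqrt(n).
for n in (25, 100, 400):
    se = sigma / math.sqrt(n)
    print(f"n = {n:3d}  ->  SE = {se:.2f}")
# n =  25  ->  SE = 3.00
# n = 100  ->  SE = 1.50
# n = 400  ->  SE = 0.75
```

Note the pattern: quadrupling the sample size only halves the SE.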
what is the CI
Confidence interval: about precision and certainty (95% → see seminar CI (2))
gives a range in which we expect the true population parameter to lie.
For example, if 60% of a sample supports a policy, the 95% CI might be [57%, 63%].
This means we are 95% confident the true percentage of supporters in the whole population is between 57% and 63%.
another ex: A 95% confidence interval means that in the long run, 95% of such intervals will contain the true parameter
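The [57%, 63%] interval from the example can be reproduced with the standard formula for a proportion's CI (n = 1,000 is an assumed sample size, not stated in the card):

```python
import math

p_hat = 0.60   # sample proportion supporting the policy
n = 1000       # hypothetical sample size
z = 1.96       # critical value for a 95% confidence level

# SE of a proportion, then margin = z * SE.
se = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * se
print(f"95% CI: [{p_hat - margin:.1%}, {p_hat + margin:.1%}]")
# roughly [57%, 63%]
```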
def sampling distribution
the theoretical distribution of a statistic (like a mean) across all possible random samples from a population.
It shows us how much variation we can expect just by chance, and it helps define the standard error
-> EX of sampling distribution (here with the mean): farm with 1,000 eggs. You’re trying to estimate the average weight of all your eggs.
Step 1: Take a sample
You randomly pick 5 eggs, weigh them, and calculate the average:
Sample 1: 58g, 62g, 59g, 61g, 60g → Sample mean = 60g
Then you do it again and again…
Step 2: Collect all the sample means
You now have 1,000 sample means: 60g, 59g, 61g, 58.5g, 60.2g, 59.8g, …
You plot those 1,000 means on a graph. It forms a sampling distribution of the mean — a new curve showing how sample means vary.
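The egg example can be simulated directly (the weights are generated, so the exact numbers differ from the card, but the shape of the result is the same):

```python
import random
import statistics

random.seed(7)

# Hypothetical farm: 1,000 eggs weighing about 60 g on average.
eggs = [random.gauss(60, 2) for _ in range(1000)]

# Step 1 & 2: repeatedly sample 5 eggs and record each sample mean.
sample_means = [statistics.mean(random.sample(eggs, 5)) for _ in range(1000)]

# The means center on the true average, and their spread (the
# empirical SE, roughly sigma / sqrt(5)) is much narrower than
# the spread of individual egg weights.
print(round(statistics.mean(sample_means), 1))
print(round(statistics.stdev(sample_means), 2))
```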
def distribution
describes how often different values of a variable occur. It can be:
Discrete (e.g., number of children: 0, 1, 2, …)
Continuous (e.g., height, weight, test scores)
Each distribution has:
A shape (e.g., bell-shaped, skewed)
A center (mean, median, or mode)
A spread (variance or standard deviation)
Graph: Distributions are often visualized with histograms or smooth curves (like the bell curve).
difference btwn normal and population distribution
ex: Let’s say you’re studying public support for joining NATO in a region:
The population distribution is the real distribution of support across all people in that region. Maybe it’s skewed if one age group dominates.
You take a sample and assume that the sampling distribution of the mean (according to the Central Limit Theorem) is normally distributed, which lets you calculate a confidence interval.
what is a normal distribution
specific, continuous distribution:
Bell-shaped and symmetric around the mean
Defined by two parameters:
-Mean (μ): the center
-Standard deviation (σ): how spread out the data are
🔢 The 68-95-99.7 Rule (Empirical Rule):
-About 68% of the values lie within + or - 1σ of the mean
-95% within ±2σ
-99.7% within ±3σ
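The 68-95-99.7 rule can be checked empirically by simulating a standard normal distribution:

```python
import random

random.seed(0)
# 100,000 draws from a normal distribution with mean 0, SD 1.
values = [random.gauss(0, 1) for _ in range(100_000)]

shares = {}
for k in (1, 2, 3):
    shares[k] = sum(-k <= v <= k for v in values) / len(values)
    print(f"within ±{k} SD: {shares[k]:.1%}")
# Close to 68%, 95%, and 99.7% respectively.
```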
🔁 Mode, Median, Mean
In a perfect normal distribution: Mean = Median = Mode
ex: If you’re studying public support for sanctions in different EU countries and survey 1,000 people in each country, the distribution of responses (like support level from 1 to 10) might form a normal distribution. You could then say: “In most countries, support centers around 6, with few people giving very low (1–2) or very high (9–10) scores.”
what is standardization
to compare across different scales (ex: Math test scores out of 100 with Reaction times in seconds and Heights in centimeters), we use standardization:
A Z-score is a way to standardize a value — it tells you:
❝How far is this value from the average, measured in standard deviations?❞
Formula:
Z = (X − μ) / σ
Where:
X = your value (e.g., a score)
μ (mu) = the mean (average)
σ (sigma) = the standard deviation (how spread out the data are)
ex: The average test score is 70; The standard deviation is 10; A student scored 85
Then the Z-score is:
𝑍= (85−70)/10= 1.5
🔍 This means the student scored 1.5 standard deviations above the average.
The standard normal distribution has:
Mean = 0; SD = 1
This lets you apply probabilities and percentiles universally using the Z-table.
-> EX: 2 tests, one math and one history: Your Math Z-score = 2 → you scored 2 standard deviations above the mean
Your History Z-score = 1 → only 1 standard deviation above the mean, so the Math result is the more exceptional one even though the raw scales differ.
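The Z-score formula is simple enough to sketch directly, using the test-score numbers from the card:

```python
def z_score(x, mu, sigma):
    """Standardize a value: distance from the mean in SD units."""
    return (x - mu) / sigma

# Example from the card: mean 70, SD 10, a student scored 85.
print(z_score(85, 70, 10))  # 1.5

# Comparing across scales: two Z-scores are directly comparable
# even when the raw tests use different scales.
math_z = z_score(90, 70, 10)     # hypothetical raw score -> Z = 2.0
history_z = z_score(80, 70, 10)  # hypothetical raw score -> Z = 1.0
print(math_z > history_z)  # True
```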
margin of error def
the range above and below your sample estimate where you expect the true population value to lie.
-> ME = Critical Value × SE; with:
*Critical Value depends on the confidence level (e.g., 95%, 99%) and comes from a probability distribution (z-distribution for large samples, t-distribution for small ones).
*SE is the standard error, which measures how much your sample statistic (e.g., the sample mean) would vary from sample to sample.
we have to trade off cost and precision while designing samples
Larger samples give more precise estimates (smaller SE → smaller ME).
But larger samples are more expensive or time-consuming to collect.
Hence, there’s a trade-off between the cost of collecting more data and the precision of the estimate.
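The trade-off is visible in the ME formula (p = 0.5 is the worst case, chosen here to maximize the SE):

```python
import math

p, q = 0.5, 0.5   # worst-case proportion (maximizes the SE)
z = 1.96          # critical value for a 95% confidence level

# Margin of error = critical value x standard error.
for n in (100, 400, 1600, 6400):
    me = z * math.sqrt(p * q / n)
    print(f"n = {n:5d}  ->  ME = ±{me:.1%}")
# Halving the margin of error requires quadrupling the sample size,
# which is why precision gets expensive quickly.
```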
what is the student’s t-distribution?
When the sample size is small (typically under 30), we use the Student’s t-distribution instead of the normal distribution:
*The t-distribution is wider (more uncertainty) and depends on degrees of freedom (sample size - 1).
*The smaller the sample, the fatter the tails of the distribution, leading to larger critical values → wider confidence intervals.
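A sketch of how the fatter tails widen the interval, using well-known two-tailed 95% critical values from a standard t-table (hardcoded constants, not computed; s = 10 is a hypothetical sample SD):

```python
import math

# Two-tailed 95% critical values from a standard t-table (df = n - 1).
t_crit = {5: 2.571, 10: 2.228, 29: 2.045}
z_crit = 1.960  # normal-distribution critical value, for comparison

s = 10.0  # hypothetical sample standard deviation
for df in sorted(t_crit):
    n = df + 1
    margin_t = t_crit[df] * s / math.sqrt(n)
    margin_z = z_crit * s / math.sqrt(n)
    print(f"n = {n:2d}: t-margin ±{margin_t:.2f}  vs  z-margin ±{margin_z:.2f}")
# The smaller the sample, the larger the t critical value,
# and the wider the confidence interval.
```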
what are population size; population mean; x̄ (“x bar”); and pop standard deviation
-pop size: represented by N; assumed to be very large (because most of the time unknown)
-pop mean: represented by μ (Greek letter mu); often unknown
- x̄ (“x bar”): the sample mean
-pop standard deviation: represented by σ (Greek letter sigma); measures variation in a population characteristic
what is s; n; P; q
-s: sample standard deviation
-n: sample size
-p: proportion of the sample with a certain value
-q: 1 − p (the proportion of the sample without that value)
standard error of a sample proportion
How much p might vary from the real population proportion
Formula: SE = √[pq / n]
🔹 Example: If p = 0.2, q = 0.8, and n = 1000:
SE = √[(0.2)(0.8)/1000] = √[0.16/1000] ≈ 0.0126
This SE is then used in CI: p ± 1.96 × SE → 0.2 ± 0.025 → [0.175, 0.225]
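The numbers in this example check out in a few lines of Python:

```python
import math

p, n = 0.2, 1000
q = 1 - p  # 0.8

# Standard error of a sample proportion: SE = sqrt(pq / n).
se = math.sqrt(p * q / n)
print(round(se, 4))  # 0.0126

# 95% CI: p ± 1.96 × SE
margin = 1.96 * se
print(f"[{p - margin:.3f}, {p + margin:.3f}]")  # [0.175, 0.225]
```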
what is the difference btwn dispersion and distribution and deviation
- Distribution: The overall pattern of how data values are spread across the range of values.
*Focuses on: The shape, center, and spread of the data.
*Examples of distribution types:
-Normal distribution (bell-shaped)
-Uniform distribution (evenly spread)
-Bernoulli distribution (success-failure)
- Dispersion: The degree to which data values vary from each other and from the average (mean/median).
*Focuses on: How spread out the values are within the distribution.
*Common measures of dispersion:
-Range (max - min)
-Variance
-Standard deviation
-Interquartile range (IQR)
- Deviation: the difference between an individual data point and the mean (average) of the dataset.
what are the 2 types of distribution?
*uniform distribution (random sampling; mean is the center + sd)
*Bernoulli distribution (e.g., a coin flip: 50%-50% heads or tails)
what is population shape?
the distribution of values in the entire population. It can be:
Normal: bell-shaped and symmetric
Skewed: values are stretched more to one side (right or left)
Uniform: all values are equally likely
Bimodal or Multimodal: has two or more peaks
ex: normal: women’s height in a country (almost everyone around the mean, and few much taller or shorter)
Knowing the shape is useful for choosing appropriate statistical methods.
random sampling ≠ random assignment
- Random Sampling — WHO gets into the study
🔹 Definition:
A method of selecting individuals at random from a larger population so that each person has an equal chance of being included in the sample.
🔹 Purpose:
To ensure the sample represents the population → increases external validity (generalizability).
🔹 Example (IR context):
You want to study public opinion on international aid. You randomly select 1,000 citizens from across 50 countries to participate in a survey.
🎲 2. Random Assignment — WHO gets what treatment
🔹 Definition:
Once you have your sample, you randomly assign participants to different groups (e.g., treatment vs. control).
🔹 Purpose:
To ensure that differences between groups are due to the treatment, not pre-existing differences → increases internal validity (causal inference).
🔹 Example (IR context):
From your sample of 1,000, you randomly assign half to receive a news article about aid success, and the other half to receive a neutral article. Then you compare how opinions differ between groups.
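The two steps map onto two different operations in code (the population is just hypothetical citizen IDs):

```python
import random

random.seed(1)

# Hypothetical population of 10,000 citizens, identified by ID.
population = list(range(10_000))

# Random SAMPLING: WHO gets into the study (external validity).
sample = random.sample(population, 1000)

# Random ASSIGNMENT: WHO gets which treatment, within the sample
# (internal validity). Shuffle, then split in half.
random.shuffle(sample)
treatment, control = sample[:500], sample[500:]

print(len(treatment), len(control))  # 500 500
```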