stats midterm Flashcards

1
Q

Statistics:

A
  • The practice or science of collecting and analyzing numerical data in large quantities to interpret, summarize, and present it in a meaningful way.
  • A numerical fact or datum: a piece of data that provides information on a particular subject, often used in reference to quantitative research or studies.
2
Q

Data

A

-Information, especially facts or numbers, collected to be examined and considered and used to help decision-making
-Information in an electronic form that can be stored and used
by a computer

3
Q

Data literacy

A

the combination of skills and mindsets that allows individuals to find insights and meaning within their data to enable effective, data-informed decision-making

4
Q

Data literacy imparts the skills and mindset to find

A

meaning
within data

5
Q

Politics

A

-The activities of the government, members of law-making organizations, or people who try to influence the way a country is governed
-The relationships within a group or organization that allow
particular people to have power over others

6
Q

Political science

A

uses data to figure out the correct answer to important questions like these

7
Q

Two styles of research

A

-Qualitative
-Quantitative

8
Q

Qualitative research

A

based on information that cannot be easily measured, such as people’s feelings, rather than on information that can be shown in numbers

9
Q

Quantitative research

A

related to information that can be shown in numbers and amounts

10
Q

Topic

A

a matter dealt with in a text, discourse, or conversation; a
subject

11
Q

Theory

A

a plausible general principle or body of principles offered to
explain phenomena

12
Q

A causal theory differs from a theory in that it

A

explicitly states the
relationship between two variables

13
Q

Variable

A

a characteristic, number, or quantity that can be measured
or counted and can take on different values

14
Q

Invariance

A

The property of remaining unchanged regardless of changes in the conditions of measurement

15
Q

A hypothesis is even more - than a causal theory

A

specific

16
Q

A hypothesis - the variables

A

operationalizes

17
Q

Operationalization

A

precisely defining the variables and how they are measured

18
Q

Pre-registration

A

makes your hypothesis and plan for
hypothesis testing public

19
Q

Once you have - your research plan, you can test your hypotheses

A

pre-registered

20
Q

Hypothesis testing

A

the use of statistics on data to test a hypothesis

21
Q

Methodology

A

the use of statistics

22
Q

Empirical analysis

A

the use of statistics on observational
data – not experimental data

23
Q

Empirical testing

A

the use of statistics on observational data to test a hypothesis

24
Q

Hypothesis testing uses statistics to test:

A
  1. whether an association exists between the two variables,
  2. the strength of any association between the two variables, and
  3. the probability that the association between the two variables is
    due to random chance
25
Normative arguments include words like
“should” or “ought to.”
26
parsimonious
27
Time dimension
the points in time at which your data changes
28
Time-series data
a sequence of data points collected or recorded at successive points in time, typically at equally spaced intervals, that represents how a particular variable or set of variables changes over time
29
Hierarchical dimension
the level at which your data changes
30
Multi-level data
data that is structured in multiple nested levels, where observations are grouped within higher-level units
31
Spatial dimension
geographic locations in which your data changes
32
Cross-sectional data
data collected at a single point in time from multiple units, such as states or countries, to analyze variations across those units
33
Moderator (Z)
a variable that influences the strength or direction of the relationship between an independent and a dependent variable in a study.
34
Mediator (Z)
a variable that explains the process or mechanism through which an independent variable affects a dependent variable, acting as an intermediary in the relationship
35
Formal theory
a framework that uses mathematical models and logical structures to rigorously analyze and predict the behavior of complex systems or phenomena
36
Rational choice theory
individuals make decisions by systematically evaluating the costs and benefits to maximize their personal utility or advantage
37
Utility
the sum of all benefits of an action minus the sum of all costs from that action
38
Utility maximizer
an individual who seeks to make choices that yield the highest possible level of benefit based on their preferences and available options
39
Expected utility
the overall anticipated satisfaction or benefit (utility) derived from a particular choice, found by weighting the utility of each possible outcome by its probability
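The two definitions above (utility as benefits minus costs, and expected utility) can be sketched in a few lines. This is a minimal illustration, not a definitive model: the function names, the turnout scenario, and every number are hypothetical, and the probability-weighted sum is one common way to formalize expected utility.

```python
# Hypothetical example: all names and numbers are illustrative assumptions.

def utility(benefits, costs):
    """Utility as defined on the card: sum of benefits minus sum of costs."""
    return sum(benefits) - sum(costs)


def expected_utility(outcomes):
    """Probability-weighted sum of outcome utilities.

    `outcomes` is a list of (probability, utility) pairs whose
    probabilities are assumed to sum to 1.
    """
    return sum(p * u for p, u in outcomes)


# A voter weighs two benefits against one cost of turning out to vote.
vote_utility = utility(benefits=[10, 5], costs=[3])

# A risky choice: 25% chance of utility 40, 75% chance of utility 0.
risky = expected_utility([(0.25, 40), (0.75, 0)])

print(vote_utility, risky)
```

A utility maximizer would compare values like these across the available options and pick the largest.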
40
Game theory
a branch of formal modeling that focuses on analyzing strategic interactions between rational decision-makers, where the outcome for each participant depends not only on their own choices but also on the choices of others
41
The prisoner's dilemma
a classic game theory scenario where two individuals, who cannot communicate, face a choice between cooperating with each other or betraying one another
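A small sketch can make the dilemma concrete. The payoff numbers below are hypothetical (years in prison written as negative utility); they are not from the card, only a commonly used illustration in which betraying is each player's best response no matter what the other does.

```python
# Hypothetical payoffs: keys are (my_action, other_action),
# values are (my_payoff, other_payoff). Higher is better.
PAYOFFS = {
    ("cooperate", "cooperate"): (-1, -1),
    ("cooperate", "betray"): (-3, 0),
    ("betray", "cooperate"): (0, -3),
    ("betray", "betray"): (-2, -2),
}
ACTIONS = ("cooperate", "betray")


def best_response(other_action):
    """Return the action that maximizes my payoff given the other's action."""
    return max(ACTIONS, key=lambda mine: PAYOFFS[(mine, other_action)][0])


for other in ACTIONS:
    print(f"If the other player chooses {other}, "
          f"my best response is {best_response(other)}")
```

Under these assumed payoffs both players betray and each gets -2, even though mutual cooperation would give each -1.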
42
Social choice theory
a domain within formal modeling that examines how individual preferences can be aggregated to make collective decisions
43
Intransitive Preferences
a preference structure that violates the transitivity condition. For example, an individual might prefer option A over option B, option B over option C, but still prefer option C over option A (A > B, B > C, but C > A).
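The transitivity condition can be checked mechanically. The sketch below is only an illustration: the function name and the way preferences are encoded (as a set of ordered pairs) are assumptions, not part of the course material.

```python
from itertools import permutations


def is_transitive(prefers):
    """Return True if the strict preference relation is transitive.

    `prefers` is a set of (x, y) pairs meaning "x is preferred to y".
    """
    options = {option for pair in prefers for option in pair}
    for a, b, c in permutations(options, 3):
        # Transitivity requires: a > b and b > c implies a > c.
        if (a, b) in prefers and (b, c) in prefers and (a, c) not in prefers:
            return False
    return True


# The cycle from the card: A > B, B > C, but C > A.
cyclic = {("A", "B"), ("B", "C"), ("C", "A")}
# A consistent ordering: A > B > C.
ordered = {("A", "B"), ("B", "C"), ("A", "C")}

print(is_transitive(cyclic), is_transitive(ordered))
```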
44
Spatial models
a specialized form of formal modeling that incorporates spatial or geographic dimensions into the analysis of strategic interactions.
45
Spatial models of voting
a formal modeling approach used to analyze how voters' preferences and spatial positioning influence electoral outcomes
46
Preference mapping
voters and candidates are positioned on a spatial map (often a one-dimensional or two-dimensional continuum) based on their ideological or policy preferences
47
Vote maximization
candidates choose positions or policies to maximize their votes, typically moving towards the median voter or the center of voter preferences to appeal to the largest segment of the electorate
48
Equilibrium analysis
The model identifies equilibrium points, where candidates' positions stabilize because any deviation would result in fewer votes. The best-known equilibrium result is the median voter theorem, under which candidates converge to the preferences of the median voter
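The median voter logic can be illustrated with a toy one-dimensional election. Everything here is a simplifying assumption: the voter positions are made up, each voter simply supports the closer candidate, and ties are counted for neither side.

```python
from statistics import median


def votes_for(position, rival, voters):
    """Count voters strictly closer to `position` than to `rival`."""
    return sum(abs(v - position) < abs(v - rival) for v in voters)


# Hypothetical voter ideal points on a 0-10 policy scale.
voters = [1, 2, 4, 7, 9]
m = median(voters)

# A candidate at the median is compared with a rival at every
# integer position from 0 to 10.
never_loses = all(
    votes_for(m, rival, voters) >= votes_for(rival, m, voters)
    for rival in range(0, 11)
)

print(m, never_loses)
```

In this sketch no rival position beats the median position, which is why vote-maximizing candidates have an incentive to move toward it.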
49
Causal relationship
a connection between two variables where one variable directly influences or determines the outcome of the other
50
Confounder
a variable that influences both the independent and dependent variables, potentially leading to a misleading or spurious association between them.
51
Spurious relationship
a false or misleading association between two variables that is actually caused by a third, confounding variable, rather than a direct causal link between the two
52
Control variable
a variable or condition that is held constant or regulated in an experiment or study to isolate the effect of the independent variable on the dependent variable, ensuring that the results are not influenced by extraneous factors
53
Deterministic relationship
a connection between two variables where one variable's value is precisely determined by the value of the other, with no randomness or uncertainty involved
54
Probabilistic relationship
a connection between two variables where changes in one variable are associated with changes in the likelihood or probability of different outcomes in the other variable, but the relationship is not perfectly predictable
55
Observational data
information collected from real-world observations or measurements without conducting experiments
56
Experimental data
information collected from experiments where variables are systematically manipulated to observe their effects on other variables, allowing for causal inferences
57
Randomized controlled trials (RCTs)
experimental studies where participants are randomly assigned to either a treatment group or a control group to evaluate the effectiveness of an intervention while minimizing biases
58
Treatment group
a group of participants in a study that receives the treatment or intervention being tested, allowing researchers to assess its effects compared to a control group
59
Random assignment
the process of randomly allocating participants to control and treatment groups in a study to ensure that each group is comparable and to eliminate selection bias
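Random assignment can be sketched as a shuffle followed by a split. The function name, the participant IDs, and the fixed seed are all hypothetical choices made for illustration; real studies often use more elaborate schemes such as blocking.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two equal-sized groups."""
    rng = random.Random(seed)  # a seed makes the example repeatable
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Twenty hypothetical participant IDs.
treatment, control = randomly_assign(range(1, 21), seed=42)

print("treatment:", sorted(treatment))
print("control:  ", sorted(control))
```

Because chance alone decides who lands in which group, neither the researcher nor the participants can select into treatment.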
60
Selection bias
when the sample of participants in a study is not representative of the population being studied, leading to distorted or unrepresentative results
61
Randomized controlled trials are considered the gold standard for causal research because they can cross the
four causal hurdles.
62
Experiments can exhibit low levels of
external validity
63
External validity
the degree to which one can be confident that the results of an analysis apply to the broader population
64
Natural experiments
experiments that leverage naturally occurring random variations or events to investigate causal effects, without direct manipulation of the independent variable by the researcher
65
Natural experiments exhibit high levels of
internal validity
66
Controlled experiments
studies that compare the effects of an intervention or treatment between pre-selected groups that are not randomly assigned, aiming to assess causal relationships while controlling for confounding variables
67
Quasi-experiments
research designs that aim to evaluate interventions or treatments without full randomization, often using pre-existing groups or natural conditions to infer causal relationships
68
Observational research
research designs in which the researcher does not have control over values of the independent variable because the independent variable occurs naturally
69
Survey item
a specific question or statement in a survey designed to gather data on a particular aspect of a respondent's attitudes, opinions, or behaviors
70
Open-ended items
items that allow respondents to provide their answers in their own words
71
Ranking item
item that asks respondents to rank a list of choices according to their preferences or importance
72
Likert scale
response options that allow respondents to rate their level of agreement or disagreement with a series of statements on an interval scale, typically ranging from "strongly disagree" to "strongly agree"
73
Binary response option
a type of response with only two choices
74
Multi-item scales
multiple questions or items that measure a single underlying construct
75
Scale validation
the process of assessing whether a multi-item scale accurately and reliably captures the construct it is intended to measure, ensuring that it reflects the intended attributes and performs consistently across different contexts and populations
76
Demographic items
data collected about respondents' characteristics, such as age, gender, education level, income, and ethnicity
77
Population
the entire group of individuals or units from which a sample is drawn and to whom the survey findings are intended to generalize
78
Sample
a subset of individuals or units selected from a larger population for the purpose of conducting a survey or study to draw conclusions about the entire population
79
Sample
to select and examine a subset of a population or data set to draw conclusions or make inferences about the larger population
80
Sample size (N)
the number of individual units or observations selected from a population for a study, used to ensure the results are statistically reliable and representative of the larger group
81
Statistical power
the probability that a statistical test will correctly reject a false null hypothesis, thereby detecting an effect or relationship if one truly exists
82
Representative sample
a subset of a population that accurately reflects the characteristics and diversity of the larger group, allowing the results to be generalized to the entire population
83
Probability sample
when each member of the population has a known, non-zero chance of being selected for the sample, allowing for statistical inference and generalization to the population
84
Non-probability sample
when members of the sample are not selected at random, making it difficult to determine the likelihood of any member being chosen and limiting the ability to generalize the findings
85
Convenience samples
a type of non-probability sample where participants are selected based on their easy availability and proximity to the researcher, rather than through random sampling, which can lead to biases and limited generalizability
86
Quantitative research
a method of inquiry that focuses on collecting and analyzing numerical data to identify patterns, test hypotheses, and make generalizations about a population
87
Conceptual clarity
forming a precise definition for and clear understanding of the concepts being studied
88
Concept
a broad, abstract idea or general notion that provides a foundational understanding
89
Construct
a specific, measurable version of a concept used in research to operationalize and test theoretical ideas
90
Face validity
the extent to which a measurement tool appears to measure what it is supposed to measure, based on casual inspection
91
Construct validity
the extent to which a variable or measurement is related to other measures that theory suggests should be related
92
Content validity
the extent to which a variable or measurement accurately represents all of the elements that define the concept it is intended to measure
93
Reliability
the consistency and stability of a measurement tool across repeated applications
94
Survivorship bias
when only the entities that have "survived" a particular process are considered, leading to a skewed understanding or conclusion.
95
Qualitative research
a method of inquiry that focuses on understanding and interpreting the meanings, experiences, and perspectives of individuals or groups through non-numerical data, such as interviews, observations, and texts
96
Categorical variables
represent categories or groups and do not have a numeric value
97
Nominal variables
categorical variables with no inherent order or ranking among the categories.
98
Ordinal variables
categorical variables that have a meaningful order or ranking, but the intervals between the categories are not necessarily equal.
99
Numerical variables
represent quantities and can be measured on a numeric scale
100
Continuous variables
can take any value within a range and can be subdivided into finer increments with equal unit distances
101
Discrete variables
can only take specific, distinct values, often counts or integers
102
Rank statistics
a class of statistics used to describe the variation of continuous variables based on their ranking from lowest to highest values
103
Quartile
a statistical term that divides a dataset into four equal parts, with each quartile containing 25% of the data
104
Box-whisker plot
a graphical representation of data that displays the median, quartiles, and potential outliers, using a box to show the interquartile range and "whiskers" to indicate the range of the data
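The numbers a box-whisker plot draws can be computed directly. This is a sketch on made-up data; note that several quartile conventions exist, and the default method of Python's `statistics.quantiles` is only one of them, so other software may give slightly different cut points.

```python
from statistics import quantiles

# Hypothetical data: the integers 1 through 11.
data = list(range(1, 12))

# n=4 asks for the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: the length of the "box"

print("min:", min(data), "Q1:", q1, "median:", q2, "Q3:", q3, "max:", max(data))
print("IQR:", iqr)
```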
105
Moments
numerical measures derived from the data values themselves and their positions relative to the mean or origin
106
The zero-sum property of the mean
if you subtract the mean of a dataset from each data point, the sum of these deviations will always be zero
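A quick check of this property on a small made-up dataset (the numbers are arbitrary):

```python
from statistics import mean

data = [2, 4, 4, 5, 10]  # hypothetical values
m = mean(data)

# Deviation of each data point from the mean.
deviations = [x - m for x in data]
total = sum(deviations)

print("mean:", m)
print("deviations:", deviations)
print("sum of deviations:", total)
```

The positive and negative deviations cancel exactly, which is why variance squares the deviations before averaging them.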
107
The mean of a variable is often called its
expected value because it is the value you would most expect the variable to take.
108
Variance (second moment)
a measure of the dispersion of a variable around its mean
109
Standard deviation
another measure of the dispersion of a variable around its mean.
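Variance and standard deviation can be worked through by hand on a small made-up dataset. This sketch divides by N (the population formulas); sample versions divide by N - 1 instead, and which one a course uses is an assumption to check.

```python
import math
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
m = mean(data)

# Variance: the average squared deviation from the mean.
squared_devs = [(x - m) ** 2 for x in data]
variance = sum(squared_devs) / len(data)

# Standard deviation: the square root of the variance,
# which puts dispersion back in the variable's original units.
std_dev = math.sqrt(variance)

print(variance, std_dev)
# The standard library's population versions should agree.
print(pvariance(data), pstdev(data))
```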
110
Kernel density plot
a visual depiction of the distribution of a single variable based on a smoothed calculation of the density of cases across the range of values
111
Skewness (third moment)
a measure that indicates the symmetry of the variable’s distribution around the mean
112
Kurtosis (fourth moment)
a measure that indicates the steepness of the distribution of a variable (how sharply peaked it is and how heavy its tails are)
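The third and fourth moments can be sketched as standardized moments: the average of the k-th power of deviations from the mean, divided by the standard deviation raised to the k-th power. The helper name and datasets below are hypothetical, and this uses the simple population formulas; statistical packages often apply small-sample corrections.

```python
def standardized_moment(data, k):
    """k-th central moment divided by (population std dev) ** k."""
    n = len(data)
    m = sum(data) / n
    variance = sum((x - m) ** 2 for x in data) / n
    central = sum((x - m) ** k for x in data) / n
    return central / variance ** (k / 2)


symmetric = [1, 2, 3, 4, 5]       # hypothetical symmetric data
right_skewed = [1, 1, 1, 2, 10]   # hypothetical data with a long right tail

skew_symmetric = standardized_moment(symmetric, 3)
skew_right = standardized_moment(right_skewed, 3)
kurt_symmetric = standardized_moment(symmetric, 4)

print(skew_symmetric, skew_right, kurt_symmetric)
```

Symmetric data give a skewness of zero, while a long right tail gives a positive value.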
113
Even when we go all out to get information about every U.S. citizen in the Census, we still have
lots of nonrespondents.
114
Convenience sample
a sample such that each member of the underlying population does NOT necessarily have an equal probability of being selected.
115
Statistical inference
the process of using what we know about a sample to make probabilistic statements about the broader population.
116
Parameters
parameters are numerical values that describe certain characteristics or features of a sample or an entire population, such as the mean, variance, or proportion.
117
Central limit theorem
a fundamental result from statistics indicating that if one were to collect an infinite number of random samples and plot the resulting sample means, those sample means would be distributed normally around the true population mean
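A simulation can approximate this idea with a large but finite number of samples. The sketch below is illustrative only: the population (an exponential distribution with mean 1, which is clearly not bell-shaped), the sample size, the number of samples, and the seed are all arbitrary assumptions.

```python
import random
from statistics import stdev

rng = random.Random(0)   # fixed seed so the run is repeatable
SAMPLE_SIZE = 50         # observations per sample
NUM_SAMPLES = 2000       # stands in for "an infinite number" of samples

# Draw many random samples from a skewed population and keep each mean.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(NUM_SAMPLES)
]

mean_of_means = sum(sample_means) / NUM_SAMPLES
sd_of_means = stdev(sample_means)

print("mean of sample means:", round(mean_of_means, 3))
print("std dev of sample means:", round(sd_of_means, 3))
```

The sample means cluster around the true population mean of 1, and their spread is close to the population standard deviation divided by the square root of the sample size.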
118
Distribution
a mathematical function that describes the probabilities of different outcomes in a random variable or set of data
119
Data generating process
the underlying mechanism or model that describes how data is produced and collected
120
Independent outcomes
an outcome whose occurrence is not influenced by the outcome of another event.
121
Normal distribution
a bell-shaped statistical distribution that can be entirely characterized by its mean and standard deviation.
122
standard deviation numbers
  • One standard deviation in each direction captures 68.3% of the area under the curve.
  • Two standard deviations in each direction capture 95.4% of the area under the curve.
  • Three standard deviations in each direction capture 99.7% of the area under the curve.
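These areas can be checked against the standard normal distribution using Python's `statistics.NormalDist`. The helper name `area_within` is a hypothetical label for this sketch.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area:.2%}")
```

Because any normal distribution is fully described by its mean and standard deviation, the same percentages hold for every normal variable.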
123
Standard error (of the mean)
the standard deviation of the sampling distribution of the mean. It measures the variability or dispersion of sample means around the population mean
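In practice the standard error of the mean is usually estimated from a single sample as the sample standard deviation divided by the square root of the sample size. A minimal sketch on made-up data:

```python
import math
from statistics import stdev

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

s = stdev(sample)                   # sample standard deviation (N - 1)
se = s / math.sqrt(len(sample))     # estimated standard error of the mean

print("sample std dev:", round(s, 3))
print("standard error:", round(se, 3))
```

Larger samples shrink the standard error, so sample means from big samples sit closer to the population mean.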
124
Confidence intervals
a probabilistic statement about the likely value of a population characteristic based on the observations in a sample.
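A 95% confidence interval for a mean can be sketched as the sample mean plus or minus a critical value times the standard error. The summary statistics below are invented, and the sketch assumes the sample is large enough for the normal approximation; small samples would normally use a t distribution instead.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics from a sample.
sample_mean = 52.0
sample_sd = 10.0
n = 100

se = sample_sd / math.sqrt(n)        # standard error of the mean
z = NormalDist().inv_cdf(0.975)      # about 1.96 for a 95% interval

lower = sample_mean - z * se
upper = sample_mean + z * se

print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```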
125
hypothesis
a testable statement predicting a relationship or effect between variables, often framed as an expectation of what will happen
126
null hypothesis
a specific type of hypothesis that assumes no effect or no difference between variables and serves as a baseline to test against
127
Counterfactual
an alternative scenario or condition that contrasts with the proposed effect or relationship in the hypothesis, effectively serving as the null hypothesis which assumes no effect or difference
128
Critical value
a predetermined threshold derived from a particular statistical distribution used to conduct a statistical test
129
Significance level
the probability of rejecting the null hypothesis when it is actually true, representing the threshold for statistical significance.
130
Test statistic
a value calculated by:
  • identifying the sample statistic (e.g., the mean),
  • determining its standard error (e.g., the standard error of the mean), and
  • using a specific formula to assess how far the sample result deviates from the null hypothesis
131
p-value
the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed, assuming the null hypothesis is true
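The test statistic, p-value, and significance level fit together as in the sketch below. It uses a two-sided z-test on invented summary statistics, which is only one possible test; the variable names and numbers are assumptions, and the normal approximation presumes a reasonably large sample.

```python
import math
from statistics import NormalDist

# Hypothetical summary statistics and null hypothesis.
sample_mean = 52.0
null_mean = 50.0      # value of the mean under the null hypothesis
sample_sd = 10.0
n = 100
alpha = 0.05          # significance level

# Test statistic: how many standard errors the sample mean is from the null.
se = sample_sd / math.sqrt(n)
z = (sample_mean - null_mean) / se

# Two-sided p-value: probability of a result at least this extreme
# in either direction, assuming the null hypothesis is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < alpha
print(f"z = {z:.2f}, p = {p_value:.4f}, reject null: {reject_null}")
```

Rejecting the null here says only that a difference this large would be unlikely under the null hypothesis.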
132
In the social sciences, the standard p-value threshold is
p = 0.05.
133
Statistical significance
an indication that an observed effect or relationship in the data is unlikely to have occurred by random chance alone. (assuming the null hypothesis is true and the study is repeated an infinite number of times by drawing random samples from the same population, less than 5% of these results will be more extreme than the current result.)
134
When a result is statistically significant, that does not mean that
the alternative hypothesis is proven to be true. It just means you can reject the null hypothesis
135
Chi-squared test of tabular association
a statistical test that evaluates whether observed categorical data align with the expected frequencies based on a specific hypothesis
136
Contingency table
a matrix that displays the frequency distribution of two categorical variables, showing how their values intersect
137
Degrees of freedom
the number of independent values or quantities that can vary in a statistical calculation, typically indicating the number of values that are free to vary after certain constraints are applied
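The chi-squared statistic and its degrees of freedom can be computed by hand from a contingency table. The 2x2 table below is invented. For a table with r rows and c columns the degrees of freedom are (r - 1)(c - 1), and 3.841 is the standard critical value for one degree of freedom at the 0.05 level.

```python
# Hypothetical 2x2 contingency table of observed counts
# (rows: two groups, columns: two response categories).
observed = [
    [30, 20],
    [20, 30],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if the two variables were unrelated.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi-squared: sum over cells of (observed - expected)^2 / expected.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841

print("chi-squared:", chi2, "df:", df)
print("reject null:", chi2 > CRITICAL_VALUE_DF1_ALPHA_05)
```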
138
The shape of the Chi-square distribution depends on the
degrees of freedom
139