Final Exam Flashcards

Terms (82 cards)

1
Q

Character variable

A

(also called a qualitative or categorical variable)
Refers to a characteristic that cannot be quantified.
Categorical variables can be either nominal or ordinal.
A variable that contains text.

2
Q

External validity

A

generalizability of a study/research

3
Q

Counterfactual outcome

A

Potential outcome under whichever condition (treatment/control) was not received in reality

4
Q

Selection bias

A

results from improper selection of a cohort that does not closely represent the greater population for which the study aims to be applicable

5
Q

Ecological fallacy

A

incorrectly using aggregate data to make inferences about individuals

6
Q

Mean

A

The “average” number; found by adding all data points and dividing by the number of data points.

7
Q

Median

A

The middle number; found by ordering all data points and picking out the one in the middle (or, if there are two middle numbers, taking the mean of those two numbers).

8
Q

Mode

A

The most frequent number — that is, the
number that occurs the highest number of times.
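
A minimal Python sketch of the mean, median, and mode as defined in cards 6 to 8. The exam scores are made-up values used only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical data: five exam scores (made up for illustration).
scores = [70, 85, 85, 90, 100]

avg = mean(scores)     # add all data points, divide by how many there are
mid = median(scores)   # middle value once the data are ordered
most = mode(scores)    # value that occurs the highest number of times

# With an even number of data points there are two middle numbers,
# so the median is the mean of those two (here 85 and 90).
even_mid = median([70, 85, 90, 100])

print(avg, mid, most, even_mid)
```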

9
Q

Standard deviation

A

Measures the typical distance of the observations from the mean; computed as the square root of the average squared deviation from the mean.
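
A short Python sketch of how the standard deviation is usually computed, assuming the population version of the formula (dividing by the number of observations). The data values are made up. It also shows that the literal average absolute distance from the mean is a related but different number.

```python
from math import sqrt
from statistics import pstdev

# Hypothetical observations (made up for illustration).
values = [2, 4, 4, 4, 5, 5, 7, 9]

m = sum(values) / len(values)                    # mean of the observations
squared_devs = [(v - m) ** 2 for v in values]    # squared distance to the mean
sd = sqrt(sum(squared_devs) / len(values))       # root of the average squared distance

# Literal "average distance to the mean" (mean absolute deviation), for comparison.
mean_abs_dev = sum(abs(v - m) for v in values) / len(values)

print(sd, pstdev(values), mean_abs_dev)
```

Both numbers describe spread around the mean; the standard deviation gives more weight to observations that are far from it.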

10
Q

Randomized Experiment

A

is a type of study design in which treatment assignment is randomized

11
Q

Randomized Treatment

A

Random treatment assignment makes the treatment and control groups, on average, identical to each other in all observed and unobserved pre-treatment characteristics

12
Q

Factual Outcome

A

potential outcome under the condition received in reality

13
Q

Sample

A

Subset of individuals chosen for study

14
Q

Representative Sample

A

accurately reflects the characteristics of the population from which it is drawn

15
Q

Random Sampling

A

makes the sample and the target population, on average, identical to each other in all observed and unobserved characteristics

16
Q

Histogram

A

The histogram of a variable is the visual representation of its distribution through bins of different heights.

17
Q

Sampling frame

A

Complete list of individuals in a population that can be sampled

18
Q

Unit nonresponse

A

Occurs when someone who has been selected to be part of a survey sample refuses to participate

19
Q

Item nonresponse

A

Occurs when a survey respondent refuses to answer a certain question.

20
Q

Misreporting

A

Occurs when respondents provide inaccurate or false information

21
Q

Average Treatment Effect

A

is defined as the average of the individual causal effects of X on Y across a group of individuals. It is the average change in Y caused by a change in X for a group of individuals.

22
Q

Outlier

A

Extreme value of a variable; a data point that differs significantly from the other observations

23
Q

Pre-treatment Characteristics

A

Characteristics of the individuals in a study before the treatment is administered; by definition, these characteristics cannot be affected by the treatment.

24
Q

Standard Deviation

A

typical distance of each data point from the mean (the square root of the average squared deviation)

25
Experimental Data
Data collected from a randomized experiment
26
Observational Data
Data collected about naturally occurring events.
27
Correlation Coefficient
is a statistic that summarizes the relationship between two variables, X and Y, with a number denoted as cor(X,Y) in mathematical notation. It summarizes the direction and strength of the linear association between the two variables
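
A small Python sketch of the correlation coefficient, written out by hand so each step is visible. The function name `cor` mirrors the notation cor(X,Y) on the card; the data are made up for illustration.

```python
from math import sqrt

def cor(x, y):
    """Correlation coefficient between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # How the two variables move together around their means.
    co_movement = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # How much each variable varies on its own.
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return co_movement / sqrt(spread_x * spread_y)

# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises with x in a perfect line
y_down = [10, 8, 6, 4, 2]  # falls with x in a perfect line

positive = cor(x, y_up)    # sign gives direction, magnitude gives strength
negative = cor(x, y_down)
print(positive, negative)
```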
28
Confounding Variable
confounding variable is a variable that affects both (i) the likelihood to receive the treatment X and (ii) the outcome Y
29
Frequentist interpretation of probability
The probability of an event is the proportion of its occurrence among infinitely many identical trials. Example: the probability of heads when flipping a coin
30
Bayesian interpretation of probability
Probabilities represent one’s subjective beliefs about the relative likelihood of events. Example: the probability of rain in the afternoon
31
Bernoulli Distribution
probability distribution of a binary variable
32
Normal Distribution
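Bell-shaped, symmetric probability distribution, characterized by its mean and variance, that serves as a good approximation for many non-binary variables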
33
Sampling Variability
Refers to the fact that the value of a statistic varies from one sample to another because each sample contains a different set of observations drawn from the target population
34
The Law of Large Numbers
As the sample size increases, the sample mean of X approximates the population mean of X
35
The Central Limit Theorem
As the sample size increases, the standardized sample mean of X can be approximated by the standard normal distribution
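
The law of large numbers and the central limit theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the variable X is a fair coin flip (1 = heads, 0 = tails), so its population mean and standard deviation are both 0.5, and the sample sizes and seed are arbitrary choices.

```python
import random
from math import sqrt

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n simulated fair coin flips (1 = heads, 0 = tails)."""
    return sum(rng.random() < 0.5 for _ in range(n)) / n

# Law of large numbers: larger samples give means closer to 0.5.
for size in (10, 1_000, 100_000):
    print(size, sample_mean(size))
big_mean = sample_mean(100_000)

# Central limit theorem: standardize many sample means and check that
# roughly 95% of them fall within 1.96 of zero, as they would under
# the standard normal distribution.
n = 100
pop_mean, pop_sd = 0.5, 0.5
z_values = [(sample_mean(n) - pop_mean) / (pop_sd / sqrt(n)) for _ in range(2_000)]
share_within = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
print(share_within)
```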
36
Hypothesis Testing
Methodology based on proof by contradiction: We start by assuming the contrary of what we would like to prove and show how this assumption leads to a logical contradiction
37
Null Hypothesis
what you are trying to disprove
38
Alternative Hypothesis
what you are trying to provide evidence for
39
P-Value
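The P stands for probability. The p-value is the probability of observing a test statistic at least as extreme as the one actually observed, assuming the null hypothesis is true. A small p-value means the observed difference between groups would be unlikely to arise by chance alone.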
40
Statistical Significance
the result is statistically significant at the 5% level when it is distinguishable from zero using 5% as the rejection threshold
41
Scientific Significance
a result is scientifically significant when it is large enough to be consequential
42
Z-statistic
test statistic whose distribution under the null hypothesis is the standard normal distribution
43
Significance Level
determines the rejection threshold of the test and characterizes the probability of false rejection of the null hypothesis.
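
A worked Python sketch tying together the null hypothesis, z-statistic, p-value, and significance level. The scenario is hypothetical: 60 heads in 100 coin flips, with the null hypothesis that the coin is fair. It assumes a two-sided test and the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data (made up for illustration).
n = 100
heads = 60
p_null = 0.5            # null hypothesis: the coin is fair
significance_level = 0.05

p_hat = heads / n
se_null = sqrt(p_null * (1 - p_null) / n)   # standard error if the null is true
z = (p_hat - p_null) / se_null              # z-statistic

# Two-sided p-value: probability, under the null, of a z-statistic
# at least as far from zero as the one observed.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

reject_null = p_value < significance_level
print(z, p_value, reject_null)
```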
44
Confidence Interval
provides the range of values that is likely to include the true value of the parameter
45
Margin of error
defined as half the width of the estimator's confidence interval
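
A brief Python sketch of a 95% confidence interval and its margin of error for a sample proportion. The sample values are hypothetical, and the interval relies on the normal approximation.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical survey result (made up for illustration).
n = 100
p_hat = 0.6                                   # sample proportion

se = sqrt(p_hat * (1 - p_hat) / n)            # standard error of the estimate
critical = NormalDist().inv_cdf(0.975)        # about 1.96 for a 95% interval
margin = critical * se                        # margin of error

lower, upper = p_hat - margin, p_hat + margin
print(lower, upper, margin)
```

The margin of error is half the width of the interval, so `(upper - lower) / 2` gives back `margin`.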
46
Estimation error
The estimation error is the difference between the estimate and the true value of the parameter.
47
Test statistic
A function of observed data that can be used to test the null hypothesis
48
Unbiased estimator
estimator for which the average estimation error over multiple samples is zero; estimator that provides, on average, accurate results
49
Standard Normal Distribution
The standard normal distribution is the normal distribution with mean 0 and variance 1.
50
Mutually exclusive events
events that do not share any outcomes
51
Sample space
Omega; the set of all possible outcomes produced by a trial; considered an event in itself
52
Trial (in terms of probability)
action or set of actions that produces outcomes of interest
53
Outcome (in terms of probability)
The result of a trial
54
Event (in terms of probability)
A set of outcomes; an event is said to occur if any one of the possible outcomes included in the event is realized
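
The probability vocabulary in cards 50 to 54 maps neatly onto Python sets. A sketch, assuming the trial is one roll of a fair six-sided die, so all outcomes are equally likely:

```python
# Trial: roll one six-sided die. Sample space: every possible outcome.
omega = {1, 2, 3, 4, 5, 6}

# Events are sets of outcomes.
even = {2, 4, 6}
odd = {1, 3, 5}
at_least_five = {5, 6}

# An event occurs if the realized outcome is one of its members.
outcome = 4
even_occurred = outcome in even

# Mutually exclusive events share no outcomes.
exclusive = even.isdisjoint(odd)                 # no shared outcomes
overlapping = not even.isdisjoint(at_least_five) # both contain 6

# With equally likely outcomes (an assumption of this example),
# P(event) = number of outcomes in the event / number in the sample space.
prob_even = len(even) / len(omega)
print(even_occurred, exclusive, overlapping, prob_even)
```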
55
Post-treatment variable
Variables affected by the treatment: X ----> post-treatment variable
56
Coefficient of determination (R-Squared)
ranges from 0 to 1 and measures the proportion of the variation of the outcome variable explained by the model.
57
Multivariate linear regression model
This is a statistical model used to predict the value of one dependent variable based on two or more independent variables.
58
Slope of a linear regression
β (the Greek letter beta) is the slope coefficient: the change in the predicted outcome associated with a one-unit increase in the predictor
59
Y-intercept of a linear regression
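The intercept is the predicted value of the outcome Y when the predictor X equals 0, that is, where the fitted line crosses the y-axis. (Y is the outcome for observation i.)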
60
Predictor variable
variable that we use as the basis for our predictions; predictors are also known as independent variables
61
Outcome variable
variable that we are trying to predict based on the values of the predictor(s); outcome variables are also known as dependent variables
62
Bivariate linear regression model
predicts the value of one dependent variable based on only one independent variable
63
Prediction error (error term) (residual)
measures how far our prediction is from the observed value; it is the difference between the observed outcome and the predicted outcome.
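
A compact Python sketch that fits a bivariate linear regression by least squares and then computes the prediction errors and R-squared from cards 56 to 63. The data are made up, and least squares is assumed as the fitting method.

```python
# Hypothetical data (made up for illustration).
x = [1, 2, 3, 4, 5]   # predictor
y = [2, 4, 5, 4, 5]   # outcome

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Least-squares slope and intercept.
slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
         / sum((a - mean_x) ** 2 for a in x))
intercept = mean_y - slope * mean_x

# Predicted outcomes and prediction errors (observed minus predicted).
predicted = [intercept + slope * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]

# R-squared: share of the variation in y explained by the model.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((b - mean_y) ** 2 for b in y)
r_squared = 1 - ss_res / ss_tot

print(slope, intercept, r_squared)
```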
64
Scatter plot
is the graphical representation of the relationship between two variables, where one variable is plotted along the x-axis, and the other is plotted along the y-axis.
65
Frequency table of a variable
shows the values the variable takes and the number of times each value appears in the variable.
66
Two-way proportion table
shows the proportion of observations that take each combination of values of two specified variables.
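
A short Python sketch of a frequency table, a two-way frequency table, and a two-way proportion table, built with `collections.Counter`. The observations and the variable names (group, voted) are hypothetical.

```python
from collections import Counter

# Hypothetical observations: (group, voted) for eight individuals.
observations = [
    ("treated", "yes"), ("treated", "yes"), ("treated", "yes"), ("treated", "no"),
    ("control", "yes"), ("control", "no"), ("control", "no"), ("control", "no"),
]
n = len(observations)

# Frequency table of one variable: how many times each value appears.
group_counts = Counter(group for group, _ in observations)

# Two-way frequency table: count of each combination of the two variables.
two_way_counts = Counter(observations)

# Two-way proportion table: the same counts divided by the number of observations.
two_way_props = {pair: count / n for pair, count in two_way_counts.items()}

print(group_counts)
print(two_way_counts)
print(two_way_props)
```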
67
Density histogram
histogram that uses densities instead of frequencies as the height of the bins, where densities are defined as the proportion of the observations in the bin divided by the width of the bin.
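
A small Python sketch of the density calculation behind a density histogram. The data and bin edges are made up, and each bin is assumed to include its lower edge but not its upper edge.

```python
# Hypothetical data and bins (made up for illustration).
values = [1, 2, 2, 3, 3, 3, 4, 6, 7, 9]
bins = [(0, 2), (2, 4), (4, 10)]   # (lower edge, upper edge), unequal widths

n = len(values)
densities = []
for lo, hi in bins:
    count = sum(lo <= v < hi for v in values)   # observations in this bin
    proportion = count / n                      # share of all observations
    densities.append(proportion / (hi - lo))    # density = proportion / bin width

# The areas of the bins (density x width) add up to 1.
total_area = sum(d * (hi - lo) for d, (lo, hi) in zip(densities, bins))
print(densities, total_area)
```

Using densities instead of counts keeps bins of different widths comparable, because a wide bin is not drawn taller simply for covering more of the axis.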
68
Difference-in-means estimator
is defined as the average outcome for the treatment group minus the average outcome for the control group
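
A minimal Python sketch of the difference-in-means estimator. The outcome values are hypothetical. When treatment is randomly assigned, this difference is commonly used as an estimate of the average treatment effect.

```python
from statistics import mean

# Hypothetical outcomes (made up for illustration).
treated_outcomes = [7, 9, 8]   # observations that received the treatment
control_outcomes = [5, 6, 7]   # observations that did not

# Average outcome for the treatment group minus average for the control group.
diff_in_means = mean(treated_outcomes) - mean(control_outcomes)
print(diff_in_means)
```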
69
Numeric non-binary variable
A non-binary variable can take more than two values, such as distance={1.452, 2.345, 0.298} and dice_roll={2, 4, 6}.
70
Causal relationship
refers to the cause-and-effect connection between two variables in which a change in one variable systematically produces a change in the other
71
Critical value
cut-off point of the test statistic used to determine whether to reject the null hypothesis
72
Interquartile range (IQR)
Q3-Q1
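
A short Python sketch of the interquartile range using `statistics.quantiles`. The data are made up. Quartile conventions differ slightly across textbooks and software; the `"inclusive"` method used here is one of them, chosen as an assumption for this example.

```python
from statistics import quantiles

# Hypothetical data (made up for illustration).
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Cut points that split the ordered data into four equal parts:
# 1st quartile (25th percentile), median, 3rd quartile (75th percentile).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

iqr = q3 - q1   # interquartile range: Q3 - Q1
print(q1, q2, q3, iqr)
```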
73
Treatment group
observations that received the treatment
74
Control group
observations that did not receive the treatment.
75
Treatment variable
variable whose change may produce a change in the outcome variable
76
Independent variable
X; the treatment or predictor variable
77
Numeric binary variable
A binary variable can take only two values; we define binary variables as taking only 1s and 0s
78
Internal validity
The internal validity of a study refers to the extent to which its causal conclusions are valid for the sample of observations in the study.
79
Two-way frequency table
shows the number of observations that take each combination of values of two specified variables.
80
1st/3rd quartile
25th percentile / 75th percentile
81
Dependent variable
Y; the outcome or effect variable
82
Bivariate linear regression model
This is a simpler model that predicts the value of one dependent variable based on only one independent variable.