Midterm Flashcards
Correlation
The correlation between two features of the world is the extent to which they tend to occur together.
Positively correlated:
When higher (lower) values of one variable tend to occur with higher (lower) values of another variable, we say that the two variables are positively correlated.
Negatively correlated:
When higher (lower) values of one variable tend to occur with lower (higher) values of another variable, we say that the two variables are negatively correlated.
Uncorrelated:
When there is no correlation between two variables, meaning that higher (lower) values of one variable do not systematically coincide with higher or lower values of the other variable, we say that they are uncorrelated.
Line of best fit:
A line that minimizes how far data points are from the line on average, according to some measure of distance from data to the line.
Mean (μ)
The average value of a variable.
Deviation from the mean:
The distance between an observation’s value for some variable and the mean of that variable.
Variance (σ^2)
A measure of how variable a variable is. It is the average of the squared deviations from the mean.
Standard deviation (σ)
Another measure of how variable a variable is. The standard deviation is the square root of the variance. It has the advantage of being measured on the same scale as the variable itself and roughly corresponds to how far the typical observation is from the mean (though, like the variance, it puts more weight on observations far from the mean).
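The three cards above can be illustrated with a short numeric sketch. The sample below is made up, and the code uses the population formulas (dividing by n) implied by "the average of the squared deviations":

```python
# Hypothetical sample; all numbers are made up for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(data)

mean = sum(data) / n                             # mu: the average value
deviations = [x - mean for x in data]            # deviations from the mean
variance = sum(d ** 2 for d in deviations) / n   # sigma^2: average squared deviation
std_dev = variance ** 0.5                        # sigma: square root of the variance

print(mean, variance, std_dev)  # 5.0 4.0 2.0
```

Note that the standard deviation (2.0) is on the same scale as the data, while the variance (4.0) is in squared units.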
Covariance (cov)
A measure of the correlation between two variables. It is calculated as the average of the product of the deviations from the mean.
Covariance is a measure of how much two variables change together. It indicates whether an increase in one variable corresponds to an increase or decrease in another
However, it is not standardized and depends on the units of the variables involved, making it hard to interpret the strength of the relationship.
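A sketch of the covariance formula on hypothetical paired data (population version, dividing by n):

```python
# Hypothetical paired observations; y happens to rise exactly with x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n
# Average of the product of the deviations from the mean.
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

print(cov)  # 4.0 -- positive: higher x values occur with higher y values
```

Rescaling y (say, measuring in cents instead of dollars) would change this number, which is exactly the units problem the card describes.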
Correlation coefficient (r)
Another measure of the correlation between two variables. It is calculated as the covariance divided by the product of the standard deviations. The correlation coefficient takes a value between −1 and 1, with −1 reflecting perfect negative linear dependence, 0 reflecting no correlation, and 1 reflecting perfect positive linear dependence.
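A sketch of the formula on the card above, dividing the covariance by the product of the standard deviations (made-up data, population formulas):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]  # y is a perfect linear function of x
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
std_x = (sum((a - mean_x) ** 2 for a in x) / n) ** 0.5
std_y = (sum((b - mean_y) ** 2 for b in y) / n) ** 0.5

r = cov / (std_x * std_y)
print(r)  # approximately 1.0: perfect positive linear dependence
```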
The square of the correlation coefficient (r^2)
It takes values between 0 and 1 and is often interpreted as the proportion of the variation in one variable explained by the other variable. But we have to pay careful attention to what we mean by “explained.” Importantly, it doesn’t mean that variation in one variable causes variation in the other.
It provides an indication of how well the regression equation fits the data. A higher r^2 indicates a better fit, suggesting that the independent variable is successful in explaining or predicting variations in the dependent variable.
Strength of relationship: The closer r^2 is to 1, the stronger the linear relationship between the two variables. If r^2 is close to 0, it suggests a weak or no linear relationship.
Interpretation: For example, if r^2 is 0.75, it means that 75% of the variance in the dependent variable is explained by the independent variable.
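A sketch of r^2 on slightly noisy hypothetical data, using the identity r^2 = cov^2 / (var_x * var_y):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]  # roughly, but not exactly, linear in x
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n
var_y = sum((b - mean_y) ** 2 for b in y) / n

r_squared = cov ** 2 / (var_x * var_y)  # equivalently, r ** 2
# Close to 1: almost all variation in y is "explained" by x in the
# descriptive sense above -- this says nothing about causation.
print(r_squared)
```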
sum of squared errors
The sum of the squared distances from each data point to a given line of best fit. This gives us one way of measuring how well the line fits/describes/explains the data.
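The card above can be sketched for a candidate line y = a + b*x on made-up data:

```python
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.5, 6.0, 8.5]

def sse(a, b):
    # Sum of the squared vertical distance from each point to the line.
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

print(sse(0.0, 2.0))  # 0.5   -- this line fits the data well
print(sse(0.0, 0.0))  # 132.5 -- a flat line fits much worse
```

Comparing the two values shows how SSE ranks candidate lines: the line with the smaller sum fits better.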
OLS regression line:
The line that best fits the data, where “best fits” means that it minimizes the sum of squared errors.
Slope of the regression line or regression coefficient:
The slope of the regression line describes how the value of one variable changes, on average, when the other variable changes. It is calculated as the covariance of the two variables divided by the variance of the independent variable, and is sometimes also called the regression coefficient.
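The slope formula on the card above can be computed directly (hypothetical data chosen so the answer is exact):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]  # constructed to satisfy y = 1 + 2x exactly
n = len(x)

mean_x, mean_y = sum(x) / n, sum(y) / n
cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
var_x = sum((a - mean_x) ** 2 for a in x) / n

slope = cov / var_x                  # the regression coefficient
intercept = mean_y - slope * mean_x  # standard OLS intercept formula

print(slope, intercept)  # 2.0 1.0
```

The recovered line y = 1 + 2x is exactly the line the data were built from, so its sum of squared errors is zero.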
Causal Effect
Informally, the change in some feature of the world that would result from a change to some other feature of the world. Formally, the difference in the potential outcomes for some unit under two different treatment statuses.
Counterfactual comparison
A comparison of things in two different worlds or states of affairs, at least one of which does not actually exist.
Treatment
Terminology we use to describe any intervention in the world. We usually use this terminology when we are thinking about the causal effect of the treatment, so we want to know what happens with and without the treatment. Importantly, although it sounds like medical terminology, treatment as we use it can refer to anything that happens in the world that might have an effect on something else.
Potential outcomes framework
A framework for representing counterfactuals. Each unit i has a potential outcome Y_i(1) under treatment and a potential outcome Y_i(0) without treatment; the causal effect of the treatment for unit i is the difference Y_i(1) − Y_i(0).
Potential Outcome
The potential outcome for some unit under some treatment status is the outcome that unit would experience under that (possibly counterfactual) treatment status.
Fundamental problem of causal inference:
This refers to the fact that, since we only observe any given unit in one treatment status at any one time, we can never directly observe the causal effect of a treatment.
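A small sketch of the fundamental problem, with made-up potential outcomes: each unit has both a treated outcome y1 and an untreated outcome y0, but only one of them is ever observed.

```python
# Hypothetical units: in real data we would never see both y1 and y0.
units = [
    {"name": "A", "treated": True,  "y1": 3.0, "y0": 1.0},
    {"name": "B", "treated": False, "y1": 2.0, "y0": 2.0},
    {"name": "C", "treated": True,  "y1": 5.0, "y0": 2.0},
]

# What we can see: one potential outcome per unit, depending on treatment status.
observed = [u["y1"] if u["treated"] else u["y0"] for u in units]
# What we cannot see in real data: the unit-level causal effect y1 - y0,
# knowable here only because we invented both potential outcomes.
effects = [u["y1"] - u["y0"] for u in units]

print(observed)  # [3.0, 2.0, 5.0]
print(effects)   # [2.0, 0.0, 3.0]
```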
Heterogeneous treatment effects
When the effect of a treatment is not the same for every unit of observation (as in the case of flu shots and virtually every other interesting example of a causal relationship), we say that the treatment effects are heterogeneous. Sometimes we’re still interested in the average effect even though we know the treatment effects are heterogeneous, and sometimes we want to explicitly study the nature of the heterogeneity. (In contrast, when discussing the unlikely possibility that treatment effects are the same for every unit, we would refer to homogeneous treatment effects.)
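A sketch of heterogeneous treatment effects and the average effect they imply, again with made-up potential outcomes:

```python
# Potential outcomes for four hypothetical units.
y1 = [3.0, 2.0, 5.0, 4.0]  # outcomes under treatment
y0 = [1.0, 2.0, 2.0, 1.0]  # outcomes without treatment

effects = [a - b for a, b in zip(y1, y0)]  # unit-level effects differ
ate = sum(effects) / len(effects)          # average treatment effect

print(effects)  # [2.0, 0.0, 3.0, 3.0] -- heterogeneous across units
print(ate)      # 2.0 -- a single summary that masks the heterogeneity
```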
Selecting on the dependent variable
Examining only instances when the phenomenon of interest occurred, rather than comparing cases where it occurred to cases where it did not occur.
Dependent variable:
The variable associated with the outcome we are trying to describe, predict, or explain.