Midterm Flashcards
(86 cards)
What is the Sample Average Treatment Effect (SATE)?
How do you find it?
SATE = mean outcome of the treatment group - mean outcome of the control group (this difference in means is how it is estimated in practice)
formula:
SATE = (1/n) * sum of (Yi(1) - Yi(0))
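A minimal Python sketch of the SATE formula. It assumes, purely for illustration, that both potential outcomes Yi(1) and Yi(0) are known for every unit, which is never the case with real data. The function names and numbers are hypothetical. The second function is the difference in means, computed from units that are each observed in only one group.

```python
def sate(y1, y0):
    """Average of the unit-level effects Yi(1) - Yi(0)."""
    if len(y1) != len(y0) or not y1:
        raise ValueError("need two non-empty lists of equal length")
    return sum(a - b for a, b in zip(y1, y0)) / len(y1)


def difference_in_means(treated, control):
    """Mean outcome of the treated group minus mean outcome of the control group."""
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical potential outcomes for 4 units (made-up numbers)
y1 = [5, 7, 6, 8]  # outcome if treated
y0 = [3, 6, 4, 5]  # outcome if not treated
print(sate(y1, y0))                         # 2.0
print(difference_in_means([5, 7], [4, 5]))  # 1.5
```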
How do you find the mean?
What is a drawback of using the mean?
Add together all of the numbers and divide the sum by how many numbers there are.
Drawback: it can be influenced by outliers, which pull the average too high or too low.
How do you find the median?
What is the benefit of using the median?
Sort the numbers from smallest to largest first.
If you have an odd amount of numbers, locate the exact middle number.
If you have an even amount of numbers, locate the two middle numbers, add them together, and divide the sum by 2.
Benefit: more robust against the impact of outliers
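A short Python sketch comparing the mean and the median on made-up numbers, to show how one outlier moves the mean but not the median. The function names are illustrative only.

```python
def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted list (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 100]  # same list, last value replaced by an outlier

print(mean(data), median(data))                  # 4.0 4
print(mean(with_outlier), median(with_outlier))  # 22.8 4
```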
How do you find the range?
Subtract the minimum number from the maximum number.
How do you find the interquartile range?
Subtract Q1 from Q3
How do you determine if a number is an outlier?
You must find the lower and upper limits for non-outlier values in the dataset. Lower limit: Q1 - 1.5 * IQR. Upper limit: Q3 + 1.5 * IQR. If the number in question is below the lower limit or above the upper limit, it is an outlier.
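A minimal Python sketch of the 1.5 * IQR rule, assuming Q1 and Q3 have already been found. The function names and the example quartiles are hypothetical.

```python
def outlier_limits(q1, q3):
    """Return (lower limit, upper limit) for non-outlier values."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(x, q1, q3):
    """True if x falls below the lower limit or above the upper limit."""
    low, high = outlier_limits(q1, q3)
    return x < low or x > high


# Made-up quartiles: Q1 = 4, Q3 = 10, so IQR = 6
print(outlier_limits(4, 10))  # (-5.0, 19.0)
print(is_outlier(20, 4, 10))  # True
print(is_outlier(19, 4, 10))  # False (exactly on the limit is not outside it)
```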
How do you find the three Quartiles?
Sort the list, then find the median of the entire list; this median is Q2. The median separates the list into two halves. The median of the lower half is Q1. The median of the upper half is Q3.
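A Python sketch of the median-of-halves method described above. One assumption not stated on the card: when the list has an odd number of values, the overall median is left out of both halves. Other conventions exist and can give slightly different quartiles. Function names are illustrative.

```python
def median(values):
    """Middle value of the sorted list (average of the two middle values if even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, skip the middle value so it is in neither half
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), median(ordered), median(upper)


q1, q2, q3 = quartiles([1, 3, 5, 7, 9, 11, 13])
print(q1, q2, q3)  # 3 7 11
print(q3 - q1)     # interquartile range: 8
```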
How do you find Standard Deviation? What is the formula?
Formula:
SD = sqrt( (1/(n-1)) * sum of (xi - mean of X)^2 )
Steps:
1.) find the mean of X
2.) Subtract the mean from each x value
3.) Square each result from step 2
4.) Add together all the squares
5.) Divide the sum of the squares by the total number of observations minus 1
6.) Square root the result of step 5
The result of step 6 is the standard deviation
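The six steps above can be sketched in Python as follows. The function name and the sample numbers are made up for illustration.

```python
import math


def sample_sd(xs):
    """Sample standard deviation, dividing by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(xs) / n                      # step 1: mean of X
    deviations = [x - mean_x for x in xs]     # step 2: subtract the mean
    squares = [d ** 2 for d in deviations]    # step 3: square each result
    total = sum(squares)                      # step 4: add the squares
    variance = total / (n - 1)                # step 5: divide by n - 1
    return math.sqrt(variance)                # step 6: square root


print(sample_sd([2, 4, 6]))  # 2.0
```

The value from step 5 is the variance, so squaring the returned SD gives it back. Python's built-in `statistics.stdev` also divides by n - 1 and can be used to check hand calculations.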
How do you find the variance?
Variance = SD^2
Square the standard deviation
What is the formula for the Correlation Coefficient (r)
<you will not compute this by hand!>
What is r telling you?
How is it written?
How is it described?
r = (1/(n-1)) * sum of ( ((Xi - mean of X) / SD of X) * ((Yi - mean of Y) / SD of Y) )
r tells you:
-The strength and direction of a relationship between variables.
-How similar the measurements of two or more variables are across a dataset.
- How close the variables move together
Written as a number between -1 and 1.
Described as high or low (strong or weak), positive or negative, or no correlation (0).
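Although the card says r is not computed by hand, a short Python sketch can show what the formula does. It assumes the sample SD (dividing by n - 1), matching the SD card; the function name and data are hypothetical. It will fail with a division by zero if either variable has no variation.

```python
import math


def correlation(xs, ys):
    """Correlation coefficient r from standardized X and Y values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Multiply each pair of standardized values, add them up, divide by n - 1
    products = [((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
                for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


xs = [1, 2, 3, 4]
print(round(correlation(xs, [2, 4, 6, 8]), 6))  # 1.0  (perfect positive)
print(round(correlation(xs, [8, 6, 4, 2]), 6))  # -1.0 (perfect negative)
```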
How do you find mean prediction error?
1.) For each data point, subtract the predicted value from the actual value.
2.) Add all of those values together.
3.) Divide the sum by the total number of values.
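The three steps above, sketched in Python with made-up numbers. The function name is illustrative.

```python
def mean_prediction_error(actual, predicted):
    """Average of (actual - predicted) across all data points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]  # step 1
    return sum(errors) / len(errors)                     # steps 2 and 3


print(mean_prediction_error([10, 12], [8, 11]))      # 1.5
print(mean_prediction_error([3, 5, 7], [2, 6, 7]))   # 0.0
```

In the second call the errors are 1, -1, and 0, so they cancel out: a mean prediction error of 0 does not mean every prediction was exact.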
How do you find the root mean square error (RMSE)?
RMSE = sqrt (RSS/n)
1.) Find the value of RSS (subtract predicted y from actual y, square the results, add the squares)
2.) Divide RSS by the total number of values
3.) Square root the result
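A minimal Python sketch of the RMSE steps, using hypothetical values chosen so the arithmetic is easy to follow.

```python
import math


def rmse(actual, predicted):
    """Root mean square error: sqrt(RSS / n)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # step 1
    return math.sqrt(rss / len(actual))                         # steps 2 and 3


# Errors are 2, -2, 2, -2, so RSS = 16 and RSS / n = 4
print(rmse([3, 5, 7, 9], [1, 7, 5, 11]))  # 2.0
```

Because each error is squared before adding, positive and negative errors do not cancel here the way they do in the mean prediction error.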
What is the equation for a linear regression model?
Y= α +βX + ε
What do the variables in the linear regression model mean?
Y= α +βX + ε
Y: dependent variable, what you are trying to predict
α: alpha, the y-intercept; the value of Y when X = 0
β: beta, the slope; the change in Y for a one-unit increase in X
X: independent variable, the predictor
ε: error term; the part of Y that α + βX does not account for
How do you find residuals (the estimated error term)?
Actual y - predicted y
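A small Python sketch of predicted values and residuals for a linear model. It assumes α and β are already known; the values α = 1 and β = 2, the data, and the function names are made up for illustration, and no fitting is done here.

```python
def predict(alpha, beta, x):
    """Predicted y from the line: alpha + beta * x."""
    return alpha + beta * x


def residuals(alpha, beta, xs, ys):
    """Actual y minus predicted y for each data point."""
    return [y - predict(alpha, beta, x) for x, y in zip(xs, ys)]


xs = [0, 1, 2]
ys = [1, 4, 4]
print([predict(1, 2, x) for x in xs])  # [1, 3, 5]
print(residuals(1, 2, xs, ys))         # [0, 1, -1]
```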
How do you find the residual sum of squares (RSS)?
(What is the formula)
RSS = sum of (Yi - Ŷi)^2
1.) Subtract the predicted value of y from the actual value of y for each data point.
2.) square each result
3.) add together all of the squares
The result of step 3 is the RSS
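The RSS steps above, sketched in Python. The function name and the numbers are hypothetical.

```python
def rss(actual, predicted):
    """Residual sum of squares: sum of (actual - predicted)^2."""
    if len(actual) != len(predicted):
        raise ValueError("lists must have the same length")
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))


# Residuals are 0, 1, -1, so the squares are 0, 1, 1
print(rss([1, 4, 4], [1, 3, 5]))  # 2
```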
How do you find the total sum of squares (TSS)?
(What is the formula)
TSS = sum of (Yi-Ȳ)^2
1.) subtract the mean of y from each y value in the data set.
2.) square the results of each subtraction in step 1
3.) add together all of the squares
The result of step 3 is the TSS
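The TSS steps above, sketched in Python with made-up y values. The function name is illustrative.

```python
def tss(ys):
    """Total sum of squares: sum of (y - mean of y)^2."""
    if not ys:
        raise ValueError("need at least one value")
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


# Mean of y is 3, so the squared deviations are 4, 1, 1
print(tss([1, 4, 4]))  # 6.0
```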
What is R2 and how do you find it?
what does it tell you?
What is the formula?
R2 is the proportion of variation in Y explained by the model.
Tells you how well a model fits the data
R2 = 1 - (RSS/TSS)
1.) find RSS
2.) find TSS
3.) Divide RSS by TSS
4.) Subtract the result of step 3 from 1
Result of step 4 is R2
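A minimal Python sketch that follows the four R2 steps. The data and function name are hypothetical; the guard against TSS = 0 is an added assumption for the case where every y value is identical.

```python
def r_squared(actual, predicted):
    """R2 = 1 - RSS / TSS."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    mean_y = sum(actual) / len(actual)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))  # step 1
    tss = sum((y - mean_y) ** 2 for y in actual)                        # step 2
    if tss == 0:
        raise ValueError("TSS is 0: all y values are the same")
    return 1 - rss / tss                                                # steps 3 and 4


# RSS = 2 and TSS = 8, so R2 = 1 - 0.25
print(r_squared([2, 4, 6], [2, 5, 5]))  # 0.75
print(r_squared([2, 4, 6], [2, 4, 6]))  # 1.0 (perfect fit)
```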
What is the counterfactual?
What is the factual?
Counterfactual = what would have been observed under the condition that did not actually occur (for example, what would have happened absent the treatment)
Factual = What was actually observed
What is the fundamental problem of causal inference?
The counterfactual can never be observed
- you must infer the counterfactual outcomes as accurately as possible, but will never actually know what would have happened.
What is the rule of causality?
(e.g., ice cream sales and suicide)
association does not equal causation!
How can you figure out counterfactuals?
What is the problem with this tactic?
Matching: find a similar unit that matches as closely as possible
Problem: you cannot match on everything, so any unmatched differences can act as confounders
What are confounders?
How do you minimize confounders?
Variables associated with both the treatment and the outcome. They affect the results and make it difficult to attribute changes to the treatment.
Can be observed or unobserved.
Minimize by using randomized controlled trials
What are Randomized Controlled Trials and how do they work to minimize confounders?
An RCT is when scientists randomly assign the treatment, which makes the treatment and control groups identical on average.
The groups are similar in terms of all characteristics, observed and unobserved. This allows scientists to attribute any differences in outcome to the treatment and rule out confounders.
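A rough Python simulation of why randomization helps, under made-up assumptions: age stands in for a confounder, the outcome depends on age, and the true treatment effect is set to 2.0. All names, numbers, and the outcome equation are hypothetical and only meant to illustrate the idea.

```python
import random


def simulate_rct(n=10000, true_effect=2.0, seed=0):
    """Randomly assign treatment; return (age gap between groups, estimated effect)."""
    rng = random.Random(seed)
    ages = [rng.uniform(20, 60) for _ in range(n)]  # hypothetical confounder

    # Randomize: shuffle the unit ids and treat the first half
    order = list(range(n))
    rng.shuffle(order)
    treated_ids = set(order[: n // 2])

    treated_ages, control_ages, treated_y, control_y = [], [], [], []
    for i, age in enumerate(ages):
        noise = rng.gauss(0, 1)
        if i in treated_ids:
            treated_ages.append(age)
            treated_y.append(0.1 * age + true_effect + noise)
        else:
            control_ages.append(age)
            control_y.append(0.1 * age + noise)

    def mean(values):
        return sum(values) / len(values)

    age_gap = mean(treated_ages) - mean(control_ages)
    estimate = mean(treated_y) - mean(control_y)
    return age_gap, estimate


age_gap, estimate = simulate_rct()
print(age_gap)   # close to 0: the groups have similar average age
print(estimate)  # close to the assumed true effect of 2.0
```

Because assignment ignores age, the two groups end up with nearly the same average age, so the difference in mean outcomes reflects the treatment rather than age.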