Causal inference Flashcards
(19 cards)
what's the difference between causal discovery and causal inference?
Causal discovery: what factors (X) cause a specific outcome (Y)? Identifies underlying causal structures from data
Causal inference: how much does treatment (X) affect outcome (Y)? Determines how much specific factors affect an outcome
what causal estimands are calculated in causal inference?
-Average treatment effect (ATE): average effect of a treatment across an entire population
-Conditional average treatment effect (CATE): average effect of a treatment for a specific subgroup within the population
-Average treatment effect on the treated (ATT): average effect of a treatment on those who were treated
-Individual treatment effect (ITE): the effect of a treatment on an individual
pros/cons/assumptions of RCTs
pros: gold standard due to random assignment
cons: can be expensive, time consuming, not feasible or ethically wrong
limitations: if the sample is a specific narrowly defined population, results might not translate to real-world applications
-possible bias: selection bias, performance bias (unequal adherence), detection bias (differential assessment methods)
assumptions: participants are randomly assigned, no confounders, participants represent general/target population
effect estimator: mean diff between groups (ATE)
test: t-test (continuous), chi-square test (binary); if p-value < 0.05, the result is statistically significant and we reject the null hypothesis
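the estimator and test above can be sketched in a few lines; the data below is simulated with a known effect of 2.0 (all numbers are illustrative, not from a real experiment):

```python
# Hypothetical RCT: simulate a treatment that raises the outcome by 2.0.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
control = rng.normal(loc=10.0, scale=3.0, size=500)
treated = rng.normal(loc=12.0, scale=3.0, size=500)  # true ATE = 2.0

ate = treated.mean() - control.mean()          # mean difference estimates the ATE
t_stat, p_value = stats.ttest_ind(treated, control)

print(f"ATE estimate: {ate:.2f}, p-value: {p_value:.4f}")
if p_value < 0.05:
    print("statistically significant: reject the null hypothesis")
```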
what non experimental methods can be used to test causal inference?
-propensity score matching: pair individuals who received the treatment with those who didn’t based on similar characteristics, reduces bias from confounders
-Instrumental variables: uses variables related to the treatment but not directly to the outcome to isolate the causal effect by providing a source of variation in the treatment that isn’t influenced by confounders
-double machine learning: uses machine learning to control for confounders by creating ‘clean’ versions of treatment and outcome variables, by subtracting estimated effects from confounders
-difference in differences: compares changes in outcomes over time between a treatment and control group; helps control for confounders that affect both groups similarly over time
-bayesian structural time series: uses Bayesian methods to model time series data. the effect of an intervention is inferred by comparing observed data with a counterfactual scenario estimating what would have happened without the intervention
-regression discontinuity design: uses a cutoff or threshold to assign treatment. compares individuals just above and below the cutoff to estimate the effect, assuming they are similar
what assumptions are required to test the validity of an A/B test design framework?
-participants are randomly assigned to either group
-no selection bias & group representative of population
-independent observations: the outcome of one unit doesn't affect the outcome of another
-no external changes during the test
what method to use when there is a pre-existing difference between groups? (experimental data)
controlled-experiment using pre-experiment data (CUPED).
Make predictions of the post-experiment data using pre-experiment data to estimate the baseline for an individual without treatment.
what does it mean when an experiment is underpowered? how to solve?
An experiment is underpowered when the treatment effect is too small relative to the metric's variance for a given sample size (i.e. the variance in the data is too large to detect an effect even if one actually exists)
CUPED (Controlled-experiment Using Pre-Experiment Data) removes the variance in a metric that can be accounted for by pre-experiment information - variance that pre-experiment data can explain is, by construction, unrelated to the effects of the experiment and can therefore be removed
reducing variance helps to increase power to detect small effects
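a minimal sketch of the CUPED adjustment on synthetic data, using the standard covariance-based choice of the adjustment coefficient theta (all numbers are made up):

```python
# CUPED sketch: subtract the variance that the pre-experiment metric explains.
import numpy as np

rng = np.random.default_rng(1)
pre = rng.normal(100, 20, size=2000)            # pre-experiment metric
post = pre + rng.normal(0, 5, size=2000)        # post metric, correlated with pre

theta = np.cov(pre, post)[0, 1] / np.var(pre)   # optimal adjustment coefficient
cuped = post - theta * (pre - pre.mean())       # variance explained by pre removed

print(f"variance before: {post.var():.1f}, after CUPED: {cuped.var():.1f}")
```

the adjusted metric has the same mean as the original, so treatment-vs-control comparisons are unchanged in expectation but far less noisy.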
what method to use when some people in the treatment group do not actually receive the treatment? (experimental data)
complier average causal effect (CACE) - average causal effect among compliers
-adjusts the intention to treat effect with the compliance rate in order to estimate the treatment effect for the subpopulation that is actually being treated
-estimated using an instrumental variable approach where the instrument is the group assignment and the endogenous variable is the actual treatment received (influenced only by the instrument)
-CACE = intent-to-treat effect (effect of assignment on outcome) / effect of assignment on treatment (proportion of compliers)
-assumption: treatment assignment affects outcome only through treatment being received
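the ratio above is just the Wald estimator; a toy calculation with made-up numbers:

```python
# Hypothetical example with one-sided noncompliance; all numbers are illustrative.
itt_effect = 0.6    # effect of assignment on the outcome (intention-to-treat)
compliance = 0.75   # share of the assigned group that actually took the treatment

cace = itt_effect / compliance   # Wald/IV estimator: 0.6 / 0.75 = 0.8
print(f"CACE: {cace:.2f}")
```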
when do you use propensity score matching? what is it?
propensity score is the conditional probability of someone being assigned to a treatment given a set of covariates.
-estimate propensity score using logistic regression
-units in the treatment group are then matched with units in the control group with similar propensity scores
-select control and treated units with similar characteristics. reduces selection bias by creating balanced treatment and control groups
assumptions:
-treatment is independent of outcome given covariates (ie all confounders are observed and included)
-positive probability of receiving treatment and control for every value of the covariates
limitations:
-can’t control for unobserved variables/confounders
-matching can result in discarding unmatched units, reducing sample size and statistical power
-quality of matching depends on the correct propensity score, poor model choice or omission of variables can lead to biased estimates
-doesn’t work well when treated and control groups are too different (lack of common support)
-use when: randomisation is not possible, data is not observed over time, and you have pre-intervention covariates with all confounders observed - i.e. data is rich in confounders and they can all be measured
-estimand: ATT
-tests: same as RCT (t-test / chi-square on the matched sample)
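a toy sketch of the matching step, assuming propensity scores have already been estimated (e.g. with logistic regression); the data and effect size are synthetic:

```python
# Nearest-neighbour propensity score matching sketch (with replacement).
import numpy as np

rng = np.random.default_rng(2)
n = 400
ps = rng.uniform(0.1, 0.9, size=n)      # assumed pre-estimated propensity scores
treat = rng.binomial(1, ps)             # treatment assignment depends on ps
outcome = 2.0 * treat + 5.0 * ps + rng.normal(0, 1, size=n)  # true effect = 2.0

t_idx = np.where(treat == 1)[0]
c_idx = np.where(treat == 0)[0]
# For each treated unit, pick the control with the closest propensity score.
matches = c_idx[np.abs(ps[c_idx][None, :] - ps[t_idx][:, None]).argmin(axis=1)]
att = (outcome[t_idx] - outcome[matches]).mean()
print(f"ATT estimate: {att:.2f}")
```

without matching, a naive mean difference would be biased upward here because high-propensity units also have higher outcomes through the `5.0 * ps` confounding term.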
what is an alternative to regular propensity score matching?
inverse probability of treatment weighting (IPTW): instead of matching individuals, each observation is weighted by the inverse of the probability of receiving the treatment it actually received - estimates ATE instead of ATT
-same assumptions as PSM
-limitation: if units have propensity scores near 1 or 0 the resulting weights can cause high variance and unreliable estimates
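an IPTW sketch on synthetic data; the true propensities are assumed known here for simplicity (in practice they would be estimated):

```python
# IPTW: weight each unit by the inverse probability of its received treatment.
import numpy as np

rng = np.random.default_rng(3)
n = 5000
ps = rng.uniform(0.2, 0.8, size=n)      # propensities bounded away from 0 and 1
treat = rng.binomial(1, ps)
outcome = 1.5 * treat + 4.0 * ps + rng.normal(0, 1, size=n)  # true ATE = 1.5

w = treat / ps + (1 - treat) / (1 - ps)  # inverse-probability weights
ate = (np.average(outcome[treat == 1], weights=w[treat == 1])
       - np.average(outcome[treat == 0], weights=w[treat == 0]))
print(f"IPTW ATE estimate: {ate:.2f}")
```

note the propensities are kept in [0.2, 0.8]: scores near 0 or 1 would produce huge weights, which is exactly the high-variance limitation above.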
what is the regression discontinuity method? when to use?
the idea is that units just above and just below the cutoff are likely similar in all respects except for the treatment, so use the jump to estimate the causal effect
-controls for selection bias around the cutoff
-when assignment to treatment/control occurs around a threshold or cutoff
-assumptions:
-the potential outcome is smooth around the cutoff
-no sudden jumps in the outcome at treatment unless caused by the treatment
-individuals can't manipulate the running variable to fall on either side of the cutoff
-units near the cutoff are similar and can be thought of as randomly assigned to treatment/control.
-limitations:
-estimates are only local to the cutoff point; can't generalise to other parts of the running variable
-need large sample near cutoff
-use when: data is not repeated over time, treatment assignment depends on a sharp cutoff
-local average treatment effect (only around the cutoff point)
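a sharp-RDD sketch: fit a line on each side of the cutoff within a bandwidth and take the jump at the cutoff (cutoff, bandwidth and effect size are all made up):

```python
# Sharp regression discontinuity via local linear fits around the cutoff.
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(0, 100, size=3000)               # running variable
cutoff, bandwidth, true_jump = 50.0, 10.0, 3.0
treat = (x >= cutoff).astype(float)
y = 0.05 * x + true_jump * treat + rng.normal(0, 1, size=3000)

left = (x >= cutoff - bandwidth) & (x < cutoff)
right = (x >= cutoff) & (x <= cutoff + bandwidth)
b_l = np.polyfit(x[left], y[left], 1)            # local linear fit below cutoff
b_r = np.polyfit(x[right], y[right], 1)          # local linear fit above cutoff
late = np.polyval(b_r, cutoff) - np.polyval(b_l, cutoff)
print(f"LATE at cutoff: {late:.2f}")
```

shrinking the bandwidth reduces bias from curvature but uses fewer points, which is why a large sample near the cutoff is needed.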
what is interrupted time series design? and when to use?
evaluate causal impact by analysing the trends before and after treatment. estimate whether there is a change at the time the event takes place. use when treatment occurs at a known specific time
-model the outcome variable over time and look for either a jump in the outcome or change in trajectory
assumptions:
-no other events occur at same time that could affect outcome
-the outcome variable follows a steady trend before treatment
-errors are not correlated, or they are accounted for
limitations:
-need sufficient pre and post data
-no control group, making it harder to rule out confounders that changed at the same time
-measures: effect of treatment on the treated
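a minimal segmented-regression sketch of the design: model a pre-treatment trend plus a level shift at a known intervention time (all values synthetic):

```python
# Interrupted time series: trend + post-intervention level shift.
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(100)
intervention = 60
post = (t >= intervention).astype(float)
y = 2.0 + 0.1 * t + 4.0 * post + rng.normal(0, 0.5, size=100)  # true jump = 4.0

# Design matrix: intercept, time trend, post-intervention level shift.
X = np.column_stack([np.ones_like(t, dtype=float), t, post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated level shift: {coef[2]:.2f}")
```

a change in trajectory (slope change) could be captured by adding a `post * (t - intervention)` column to the design matrix.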
what is the difference in differences approach? when to use?
estimates the causal effect of a treatment by comparing the before-and-after differences in outcomes between a treatment group and a control group. the difference of the differences isolates the effect of the treatment
assumptions:
-parallel trends: in the absence of treatment, the difference in outcomes between the treatment and control groups would have remained constant over time
-no confounders affecting only the treatment group at the time of the intervention
-individuals in each group are comparable over time
limitations:
-hard to verify parallel trends, but can check pre-treatment trends
-time-varying confounders can still affect results; try to adjust for these
use when: you have both control and treated group observed before and after intervention
measure: DiD estimator (ATT) (additional change in treatment group due to treatment)
-test: p-value on the DiD estimate
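the DiD estimator itself is just four group means; a toy calculation with illustrative numbers:

```python
# Difference-in-differences with four group means (numbers are made up).
treat_pre, treat_post = 10.0, 15.0   # treatment group before / after
ctrl_pre, ctrl_post = 8.0, 11.0      # control group before / after

did = (treat_post - treat_pre) - (ctrl_post - ctrl_pre)
print(f"DiD estimate: {did}")
```

here the treatment group improved by 5 and the control by 3, so under parallel trends the treatment caused an extra change of 2.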
how can you use instrumental variables to estimate causation? what scenarios to use this in?
use a variable/instrument that:
-affects the treatment
-affects the outcome only through the treatment (not directly)
-use the instrument to predict the treatment, then use the predicted treatment to estimate the outcome
-tries to isolate the causal effect
assumptions:
-instrument and causal variable are strongly correlated
-instrument affects the outcome only through the causal variable
-instrument is uncorrelated with other unobserved factors
limitations:
-if the instrument is weak (not strongly correlated with variable) then estimates are biased
-only estimates the local average treatment effect (only for compliers affected by the instrument)
-you need as many valid instruments as endogenous variables
use when: randomisation not possible, the treatment variable is correlated with unobserved confounders (you suspect endogeneity)
-you cant measure all confounders
-Local average treatment effect (treatment effect for compliers)
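a hand-rolled two-stage least squares sketch on synthetic data with an unobserved confounder; the instrument, coefficients and noise levels are all made up:

```python
# 2SLS sketch: instrument z shifts treatment x but affects y only through x.
import numpy as np

rng = np.random.default_rng(6)
n = 10000
z = rng.normal(size=n)                       # instrument
u = rng.normal(size=n)                       # unobserved confounder
x = 0.8 * z + u + rng.normal(size=n)         # treatment, correlated with u
y = 2.0 * x + 3.0 * u + rng.normal(size=n)   # outcome; true causal effect = 2.0

# Stage 1: regress treatment on the instrument, keep the predicted treatment.
a = np.cov(z, x)[0, 1] / np.var(z)
x_hat = a * z
# Stage 2: regress the outcome on the predicted treatment.
beta_iv = np.cov(x_hat, y)[0, 1] / np.var(x_hat)
naive = np.cov(x, y)[0, 1] / np.var(x)       # plain OLS, biased by the confounder
print(f"IV estimate: {beta_iv:.2f}, naive OLS: {naive:.2f}")
```

the naive regression is badly biased because x is correlated with the unobserved u, while the IV estimate recovers the causal coefficient.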
what is sensitivity analysis?
a last resort when there is no instrument and unmeasured confounders remain.
-idea is to quantify how strong an unmeasured confounder would need to be to invalidate conclusion
when given a case study question/hypothesis that asks you to test a hypothesis around a cause (e.g. we think this is causing this, how would you test?), what steps should you go about to answer? Assume A/B test is possible/the answer
- understand the question: ensure you understand what the parts of it are saying. ask follow up if needed - clarify definitions
-find out if the change is slow over a period of time or drastic (if drastic, it is likely not due to a feature)
-start with exploratory analysis, e.g. a graph of what is happening
-trying to sense check whether the hypothesis is correct - validate hypothesis
- think about how to check problem, by test or analysis
- share recommendations
whats the framework for designing an A/B test?
-use a feature and number of orders as the example
- Hypothesis formulation:
-null hypothesis and alternate hypothesis
-null: no change in number of orders with the feature
-alternate: some change
- Network effect:
-when running an experiment all experiment units should be independent of each other
-can users influence each other?
-identify if there is a network effect problem that needs to be considered
- Randomisation unit: chosen based on the network effect
- Power analysis: estimate the sample size
- Length of experiment
- A/A test and basic sanity check
- Experiment analysis:
-run the A/B test, with one group using the feature
-check if there is a statistically significant change in order numbers between groups during the experiment period
- Recommendation
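the power analysis step can be done back-of-envelope with the standard two-sample formula n ≈ 2(z_α/2 + z_β)²σ²/δ² per group; the baseline standard deviation and minimum detectable effect below are hypothetical:

```python
# Sample size per group for a two-sided two-sample test (alpha=0.05, power=0.8).
z_alpha, z_beta = 1.96, 0.84   # standard normal quantiles for 5% level / 80% power
sigma = 4.0                    # assumed std dev of orders per user (made up)
mde = 0.5                      # minimum detectable effect in orders (made up)

n_per_group = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / mde ** 2
print(f"sample size per group: {n_per_group:.0f}")
```

halving the minimum detectable effect quadruples the required sample size, which is why variance reduction (e.g. CUPED) matters for small effects.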
how to do A/B test if feature is already released?
Ablation - the reverse of an A/B test. cease to show the feature to a group and use them as the treatment group; test in the same way
how to do A/B test when results will take long time to show?
holdback test - give the feature to most people so you get gains from it sooner, but keep a small holdback group without the feature. compare the feature group and the holdback group over a longer period of time