interview questions Flashcards
(31 cards)
titration vs maintenance ASM CRF project study design
Index: ASM initiation, with a 6-month baseline period to describe patient characteristics and comorbidities
Exposure: time at risk during the titration and maintenance phases, as defined by physicians
Outcomes: treatment patterns, seizure outcomes, and HRU/costs abstracted from patient charts
The review focuses on the impact of ASM during titration vs maintenance phases.
What statistical methods were used for analyzing seizure outcomes?
Generalized estimating equations (GEEs)
GEEs account for correlation between repeated measurements on the same patients across the titration and maintenance phases.
generalized estimating equation (GEE)
To analyze clustered or correlated data, particularly in longitudinal studies
Need to specify:
-link function and distribution
-variance function
-correlation structure
GEEs allow for the specification of link functions and distributions.
Types of correlation structures used in GEEs.
Independence: no correlation
Exchangeable: constant correlation between all time points
AR (autoregressive): correlation decays with time lag
Unstructured: every pair of time points can have a different correlation
What is the purpose of MedDRA codes in clinical data?
To establish a hierarchy for coding adverse events (AEs) and grouping them into system organ classes (SOCs)
The hierarchy runs from Lowest Level Terms (LLTs) up to SOCs
Example: taking free-text reasons for discontinuing ASM titration from the CRF and grouping them into broader, higher-level categories to summarize
This helps summarize statistics from raw responses.
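The grouping step can be sketched as nested dictionary lookups. The real MedDRA dictionary is licensed, so the terms below are purely hypothetical placeholders:

```python
from collections import Counter

# Hypothetical LLT -> PT and PT -> SOC lookups (illustrative only;
# real mappings come from the licensed MedDRA dictionary)
llt_to_pt = {
    "felt dizzy": "Dizziness",
    "room spinning": "Dizziness",
    "rash on arms": "Rash",
}
pt_to_soc = {
    "Dizziness": "Nervous system disorders",
    "Rash": "Skin and subcutaneous tissue disorders",
}

# Free-text CRF responses rolled up to SOC-level counts for summary tables
free_text = ["felt dizzy", "rash on arms", "room spinning", "felt dizzy"]
socs = [pt_to_soc[llt_to_pt[t]] for t in free_text]
print(Counter(socs))
```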
Thesis project GLMs
- Gaussian link with robust standard errors to estimate RD
- Poisson log link with robust sandwich variance estimators to estimate RR
Reflection: I wouldn't use linear probability models again. The Gaussian model did what I intended by estimating risk differences, but because the outcome was binary, it risks predicting probabilities outside 0 and 1.
What is Targeted Maximum Likelihood Estimation (TMLE)?
A semiparametric, doubly robust causal inference method
-Machine learning for flexible estimation of nuisance parameters (like propensity scores and outcome regressions)
-Targeting step to improve bias-variance tradeoff and achieve efficient estimates
-Likelihood-based inference to provide valid confidence intervals and p-values
It addresses bias from model misspecification and enhances precision.
When is TMLE particularly useful?
- Estimating causal effects from observational data
- Using nonparametric or machine learning models
- Handling complex data structures
Thesis project takeaway: Because I had randomized trial data, these models were probably overkill. TMLE shines in causal inference with observational data because it uses propensity scores, but randomization made that unnecessary. Still, TMLE's flexibility and double robustness were helpful in validating that my GLM findings were not sensitive to a particular model form.
What is the purpose of CDISC standards?
To ensure quality and regulatory compliance of clinical data
Standards include SDTM, ADaM, and CDASH.
Define SDTM.
Study Data Tabulation Model: standardizes the structure and format of collected data, like demographics, adverse events, and lab results
What does ADaM stand for?
Analysis Data Model, used to create analysis-ready datasets
It ensures traceability from raw data to statistical analysis.
What is the difference between fixed effects and random effects?
Fixed effects are of direct interest and assumed the same across experiments; random effects represent variability across randomly sampled units.
Fixed effects estimate specific coefficients, while random effects estimate the variance of those coefficients.
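This distinction can be seen in a random-intercept model fit with statsmodels on simulated clustered data (group counts and effect sizes are made up):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n_groups, n_per = 30, 10
group = np.repeat(np.arange(n_groups), n_per)
x = rng.normal(size=n_groups * n_per)
group_intercept = rng.normal(0, 1.0, n_groups)[group]  # random effect per group
y = 2.0 + 0.5 * x + group_intercept + rng.normal(size=len(x))
df = pd.DataFrame({"y": y, "x": x, "group": group})

# Fixed effect: the specific coefficient on x;
# random effect: the variance of the group-level intercepts
fit = smf.mixedlm("y ~ x", data=df, groups="group").fit()
print(round(fit.params["x"], 2), round(fit.cov_re.iloc[0, 0], 2))
```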
What is a Type I error?
Rejecting the null hypothesis when it is actually true
This is controlled by the significance level, alpha.
What is survival analysis used for?
To analyze time to an event and handle censoring
Commonly used approaches include the Kaplan-Meier estimator and Cox proportional hazards (PH) regression.
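How censoring is handled is easiest to see by computing the Kaplan-Meier estimate by hand on a tiny made-up dataset:

```python
import numpy as np

def kaplan_meier(time, event):
    """Kaplan-Meier survival estimate with right censoring (event=0)."""
    order = np.argsort(time)
    time, event = np.asarray(time)[order], np.asarray(event)[order]
    surv, s = [], 1.0
    for t in np.unique(time[event == 1]):       # step only at event times
        at_risk = np.sum(time >= t)             # still under observation at t
        deaths = np.sum((time == t) & (event == 1))
        s *= 1 - deaths / at_risk               # product-limit update
        surv.append((t, s))
    return surv

# Toy data: event indicator 0 means the patient was censored at that time
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 0, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 3))
```

Censored patients leave the risk set without triggering a drop in the curve, which is exactly what a naive "fraction still alive" calculation gets wrong.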
What does intention-to-treat (ITT) analysis entail?
Including all randomized participants in their assigned groups, regardless of protocol adherence
This preserves the benefits of randomization.
How is the Proportional Reporting Ratio (PRR) calculated?
PRR = [A/(A+B)] / [C/(C+D)]
Where A = reports of the event of interest for the drug of interest, B = reports of other events for that drug, C = reports of the event for all other drugs, and D = reports of other events for all other drugs.
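The calculation is a one-liner; the counts below are hypothetical:

```python
def prr(a, b, c, d):
    """Proportional Reporting Ratio from a 2x2 table of spontaneous reports.

    a: target drug, target event   b: target drug, other events
    c: other drugs, target event   d: other drugs, other events
    """
    return (a / (a + b)) / (c / (c + d))

# Hypothetical counts: 20 of 200 reports for the drug mention the event,
# vs 100 of 10,000 reports for all other drugs: PRR = 0.10 / 0.01 = 10
print(prr(20, 180, 100, 9900))
```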
How do you handle missing data?
Depends on the data and the missingness mechanism.
If the data were missing at random, imputation methods could work, like multiple imputation, which preserves variability and reduces bias.
For regulatory submissions, I could see the argument for worst-case imputation, in which we assign the worst outcome for a more conservative estimate.
A complete case analysis could be done if the missingness were small enough.
However, for patient data, safety in particular, these methods might not be ideal because we do not want to discard actual patient information.
For longitudinal analyses, I often consider inverse probability of censoring weighting: model the probability of not being lost to follow-up, essentially a propensity for missingness, and use that probability to stratify or weight the results.
Different methods include imputation and complete case analysis.
What are the assumptions of linear regression?
- Linearity (residual vs fitted plots)
- Independence
- Homoskedasticity (residual vs fitted plots)
- Normality of residuals (Q-Q plots, Shapiro-Wilk test)
- No multicollinearity (variance inflation factors)
What is ‘immortal time bias’?
A period during which the outcome cannot occur is incorrectly classified as exposed time, leading to artificially lower event rates in the exposed group
Fix by aligning the risk window correctly, e.g., starting follow-up at the second prescription fill, or using time-dependent exposure definitions in a Cox model
This bias can lead to misleading conclusions in time-to-event analysis.
How can time at risk be defined in longitudinal safety studies?
Factors like pharmacokinetics, latency of adverse event, and treatment patterns
Accurate definition of time at risk is crucial for incidence estimation.
When is exact logistic regression preferred?
When the sample size is very small or events are extremely limited
It calculates conditional probabilities directly, based on enumeration of all possible datasets given fixed margins; when full enumeration is infeasible, MCMC (Markov chain Monte Carlo) sampling can approximate the exact distribution.
Exact logistic regression computes probabilities directly without relying on large-sample approximations.
What is MSM?
Marginal Structural Models
Accounts for time-varying covariates; exposure and covariate weights (IPTW, IPCW) are updated/calculated at each interval
Good for long latency/observation periods, rare outcomes, and time-varying covariates
CKD proposal example: because guideline medications vary and depend on many factors, patients often switch and have time-dependent covariates, so I would use MSMs
MSMs are used to adjust for time-varying covariates in longitudinal data.
When to use bayesian models
Bayesian models are valuable when prior information exists or when data are sparse and unstable. In pharmacovigilance, we might borrow strength across drugs in the same class or use prior safety data.
For example, in a Bayesian hierarchical model, we can model adverse event rates across several related drugs, assuming that the rates follow a common prior distribution.
This is useful in rare disease drug approvals or pediatric safety surveillance, where trials are small. The Bayesian approach also naturally accommodates uncertainty and produces credible intervals
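The borrowing-strength idea can be sketched with a conjugate Beta-Binomial update, an empirical-Bayes flavor of the hierarchical model (the AE counts and the prior strength of 50 are invented):

```python
import numpy as np
from scipy import stats

# Hypothetical AE counts across related drugs in the same class
events = np.array([3, 5, 2, 8])
exposed = np.array([100, 120, 90, 150])

# Use the pooled class rate to set a Beta prior, then do a conjugate
# Beta-Binomial update per drug; each posterior mean shrinks toward the pool
pooled = events.sum() / exposed.sum()
prior_n = 50                                  # assumed prior "pseudo-exposure"
alpha0, beta0 = pooled * prior_n, (1 - pooled) * prior_n

post_a = alpha0 + events
post_b = beta0 + exposed - events
post_mean = post_a / (post_a + post_b)
# 95% credible interval for the first drug's AE rate
lo, hi = stats.beta.ppf([0.025, 0.975], post_a[0], post_b[0])
print(np.round(post_mean, 3), round(float(lo), 4), round(float(hi), 4))
```

A full Bayesian hierarchical model would also put a prior on the between-drug variability instead of fixing the prior strength, typically via MCMC.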
Explaining findings to non statistical audience
Focus on translating numbers into meaningful insights: RR 1.6 → patients with exposure A had a 60% higher risk of the outcome than those with exposure B
also use visual aids, KM curves, risk difference plots