interview questions Flashcards

(31 cards)

1
Q

Titration vs. maintenance ASM CRF project: study design

A

Index: ASM initiation, with a 6-month baseline period to describe characteristics and comorbidities
Exposure: time at risk, with the titration and maintenance phases defined by physicians
Outcomes: treatment patterns, seizure outcomes, and HRU/costs abstracted from patient charts

The chart review focuses on the impact of ASMs during the titration vs. maintenance phases.

2
Q

What statistical methods were used for analyzing seizure outcomes?

A

Generalized estimating equations (GEEs)

GEEs account for correlation between repeated measures on the same patients across the titration and maintenance phases.

3
Q

generalized estimating equation (GEE)

A

To analyze clustered or correlated data, particularly in longitudinal studies

Need to specify:
-link function and distribution
-variance function
-correlation structure

GEEs allow for the specification of link functions and distributions.
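A minimal sketch of a GEE fit with Python's statsmodels; the toy data, variable names, and model form below are illustrative assumptions, not taken from the study.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Toy long-format data: two phases per patient (illustrative only).
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(n), 2),
    "phase": np.tile(["titration", "maintenance"], n),
    "seizure_count": rng.poisson(2, 2 * n),
})

# Poisson family (count outcome, log link) with an exchangeable working
# correlation to account for repeated measures within each patient.
model = smf.gee(
    "seizure_count ~ phase",
    groups="patient_id",
    data=df,
    family=sm.families.Poisson(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(result.summary())  # robust (sandwich) standard errors by default
```

Swapping `family` and `cov_struct` changes the distribution/link and working correlation without changing the rest of the call.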

4
Q

Types of correlation structures used in GEEs.

A

Independence: no correlation
Exchangeable: constant correlation between all time points
AR(1) (autoregressive): correlation decays with time lag
Unstructured: every pair of time points can have a different correlation
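A rough illustration, again with invented data and names, of how the four structures map onto statsmodels classes:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "patient_id": np.repeat(np.arange(n), 2),
    "visit": np.tile([0, 1], n),   # integer time index within patient
    "y": rng.poisson(2, 2 * n),
})

structures = [
    sm.cov_struct.Independence(),    # no within-patient correlation
    sm.cov_struct.Exchangeable(),    # one common correlation
    sm.cov_struct.Autoregressive(),  # correlation decays with lag
    sm.cov_struct.Unstructured(),    # free correlation per pair of visits
]
for cs in structures:
    fit = smf.gee("y ~ visit", groups="patient_id", data=df,
                  time=np.asarray(df["visit"]), family=sm.families.Poisson(),
                  cov_struct=cs).fit()
    print(type(cs).__name__, fit.params.round(2).to_dict())
```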

5
Q

What is the purpose of MedDRA codes in clinical data?

A

To establish a hierarchy for coding adverse events (AEs) and grouping them into system organ classes (SOCs)

The hierarchy runs from lowest level terms (LLTs) up through preferred terms to SOCs.

Example: taking free-text reasons for discontinuing titration of ASMs from the CRF and grouping them into broader, higher-level categories to summarize

This helps summarize statistics from raw free-text responses.

6
Q

Thesis project GLMs

A
  • Gaussian family (identity link) with robust standard errors to estimate the risk difference (RD)
  • Poisson family (log link) with robust sandwich variance estimators to estimate the risk ratio (RR)

Reflection: I wouldn't use linear probability models again. The Gaussian model did what I intended by estimating risk differences, but because the outcome was binary, predicted probabilities can fall outside 0 and 1 (a sketch of both fits follows below).
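A hedged sketch of the two specifications with statsmodels on invented data; column names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "outcome": rng.integers(0, 2, 500),   # binary outcome
    "treated": rng.integers(0, 2, 500),
    "age": rng.normal(50, 10, 500),
})

# Risk difference: Gaussian family, identity link, robust (HC1) standard errors.
rd = smf.glm("outcome ~ treated + age", data=df,
             family=sm.families.Gaussian()).fit(cov_type="HC1")
print("RD:", rd.params["treated"].round(3))

# Risk ratio: Poisson family, log link, robust sandwich variance;
# exponentiating the coefficient gives the RR.
rr = smf.glm("outcome ~ treated + age", data=df,
             family=sm.families.Poisson()).fit(cov_type="HC1")
print("RR:", np.exp(rr.params["treated"]).round(3))
```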

7
Q

What is Targeted Maximum Likelihood Estimation (TMLE)?

A

A semiparametric, doubly robust causal inference method

-Machine learning for flexible estimation of nuisance parameters (like propensity scores and outcome regressions)
-A targeting (fluctuation) step to reduce bias in the target parameter and achieve efficient estimates
-Influence-curve-based inference to provide valid confidence intervals and p-values

It addresses bias from model misspecification and enhances precision; a simplified sketch follows below.
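For intuition, a minimal hand-rolled TMLE for the average treatment effect with a binary outcome, using plain logistic models for the nuisance parameters (a real analysis would typically use Super Learner-style ensembles); the data and names are simulated assumptions.

```python
import numpy as np
from scipy.special import logit, expit
from sklearn.linear_model import LogisticRegression
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 1000
W = rng.normal(size=(n, 2))                    # baseline covariates
A = rng.binomial(1, expit(W[:, 0]))            # treatment
Y = rng.binomial(1, expit(0.5 * A + W[:, 1]))  # binary outcome

# 1) Outcome regression Q(A, W); any ML learner could go here.
q_model = LogisticRegression().fit(np.column_stack([A, W]), Y)
Q1 = q_model.predict_proba(np.column_stack([np.ones(n), W]))[:, 1]
Q0 = q_model.predict_proba(np.column_stack([np.zeros(n), W]))[:, 1]
QA = np.where(A == 1, Q1, Q0)

# 2) Propensity score g(W) = P(A = 1 | W).
g = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]

# 3) Targeting step: logistic regression of Y on the clever covariate H,
#    with logit(Q) as offset and no intercept.
H = A / g - (1 - A) / (1 - g)
eps = sm.GLM(Y, H.reshape(-1, 1), family=sm.families.Binomial(),
             offset=logit(QA)).fit().params[0]

# 4) Update the counterfactual predictions and average their difference.
Q1_star = expit(logit(Q1) + eps / g)
Q0_star = expit(logit(Q0) - eps / (1 - g))
print("TMLE ATE:", round(np.mean(Q1_star - Q0_star), 3))
```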

8
Q

When is TMLE particularly useful?

A
  • Estimating causal effects from observational data
  • Using nonparametric or machine learning models
  • Handling complex data structures

Thesis project takeaway: because I had randomized trial data, these models were probably overkill. TMLE is great for causal inference with observational data because it uses propensity scores, but since patients were randomized this was unnecessary. Overall, though, TMLE's flexibility and double robustness were helpful in validating that my GLM findings were not sensitive to a particular model form.

9
Q

What is the purpose of CDISC standards?

A

To standardize clinical data, ensuring quality and regulatory compliance

Standards include SDTM, ADaM, and CDASH.

10
Q

Define SDTM.

A

Study Data Tabulation Model: standardizes the structure and format of collected data like demographics, adverse events, and lab results

11
Q

What does ADaM stand for?

A

Analysis Data Model, used to create analysis-ready datasets

It ensures traceability from raw data to statistical analysis.

12
Q

What is the difference between fixed effects and random effects?

A

Fixed effects are of direct interest and assumed the same across experiments; random effects represent variability across randomly sampled units.

Fixed effects estimate specific coefficients, while random effects estimate the variance of those coefficients.
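A small sketch of the distinction in a linear mixed model with statsmodels; the data and names are invented.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
site = np.repeat(np.arange(20), 10)
dose = rng.uniform(0, 1, 200)
# True model: fixed slope for dose, random intercept per site.
y = 1 + 2 * dose + rng.normal(0, 0.5, 20)[site] + rng.normal(0, 1, 200)
df = pd.DataFrame({"y": y, "dose": dose, "site": site})

# "dose" is a fixed effect (a specific coefficient of direct interest);
# the "site" intercepts are random effects (we estimate their variance).
fit = smf.mixedlm("y ~ dose", data=df, groups="site").fit()
print(fit.summary())
```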

13
Q

What is a Type I error?

A

Rejecting the null hypothesis when it is actually true

This is controlled by the significance level, alpha.

14
Q

What is survival analysis used for?

A

To analyze time to an event and handle censoring

Commonly used models include the Kaplan-Meier estimator and Cox proportional hazards (PH) regression; a sketch follows below.
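A minimal sketch with the lifelines package on toy data; the column names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter

rng = np.random.default_rng(3)
df = pd.DataFrame({
    "time": rng.exponential(12, 200),   # follow-up time (e.g., months)
    "event": rng.integers(0, 2, 200),   # 1 = event observed, 0 = censored
    "age": rng.normal(60, 10, 200),
})

# Kaplan-Meier: nonparametric survival curve that handles censoring.
kmf = KaplanMeierFitter()
kmf.fit(durations=df["time"], event_observed=df["event"])
print(kmf.median_survival_time_)

# Cox proportional hazards: semiparametric regression on covariates.
cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")
cph.print_summary()
```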

15
Q

What does intention-to-treat (ITT) analysis entail?

A

Including all randomized participants in their assigned groups, regardless of protocol adherence

This preserves the benefits of randomization.

16
Q

How is the Proportional Reporting Ratio (PRR) calculated?

A

PRR = [A/(A+B)] / [C/(C+D)]

Where A = reports with the drug of interest and the event of interest, B = reports with the drug and all other events, C = reports with all other drugs and the event, D = reports with all other drugs and all other events.
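A small worked sketch with made-up counts, including the usual approximate 95% CI on the log scale:

```python
import math

# 2x2 disproportionality table counts (invented for illustration).
A, B, C, D = 20, 480, 40, 9460

prr = (A / (A + B)) / (C / (C + D))
# Standard error of log(PRR) for the approximate confidence interval.
se_log = math.sqrt(1 / A - 1 / (A + B) + 1 / C - 1 / (C + D))
lo, hi = (math.exp(math.log(prr) + s * 1.96 * se_log) for s in (-1, 1))
print(f"PRR = {prr:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```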

17
Q

How do you handle missing data?

A

Depends on the data and how the results will be used.

If the data were missing at random, imputation methods could work, like multiple imputation, which preserves variability and reduces bias.

For regulatory submissions, I could see the argument for worst-case imputation, in which the worst outcome is assumed to give a more conservative estimate.

A complete case analysis could be done if missingness were small enough.

However, for patient data, and safety data in particular, these methods might not be ideal because we do not want to lose actual patient information.

For longitudinal analyses, I often consider inverse probability of censoring weighting (IPCW): calculate the probability of not being lost to follow-up, essentially a propensity for missingness, and use that probability to weight or stratify results (see the sketch below).

Different methods therefore include imputation, weighting, and complete case analysis.
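A minimal IPCW sketch under invented variable names: model the probability of remaining uncensored and weight complete cases by its inverse.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
df = pd.DataFrame({
    "age": rng.normal(55, 12, 800),
    "severity": rng.integers(0, 3, 800),
})
# 1 = still under follow-up at the analysis time, 0 = lost to follow-up.
df["uncensored"] = rng.binomial(1, 0.8, 800)

# Propensity for remaining uncensored, given baseline covariates.
cens_model = LogisticRegression().fit(df[["age", "severity"]], df["uncensored"])
p_uncensored = cens_model.predict_proba(df[["age", "severity"]])[:, 1]

# Weight complete cases by 1 / P(uncensored); censored rows get weight 0.
df["ipcw"] = np.where(df["uncensored"] == 1, 1 / p_uncensored, 0.0)
# These weights would then be passed to a weighted outcome model.
```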

18
Q

What are the assumptions of linear regression?

A
  • Linearity (residual vs. fitted plots)
  • Independence of errors
  • Homoskedasticity (residual vs. fitted plots)
  • Normality of residuals (Q-Q plots, Shapiro-Wilk test)
  • No multicollinearity (variance inflation factors)

A diagnostic sketch follows below.
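A sketch of these diagnostics on a toy OLS fit (statsmodels, scipy, and matplotlib assumed available):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import shapiro
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(5)
df = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
df["y"] = 1 + 2 * df["x1"] - df["x2"] + rng.normal(size=200)

fit = smf.ols("y ~ x1 + x2", data=df).fit()

# Normality of residuals: Shapiro-Wilk test and a Q-Q plot.
print(shapiro(fit.resid))
sm.qqplot(fit.resid, line="45")

# Linearity / homoskedasticity: inspect residuals against fitted values.
resid_vs_fitted = pd.DataFrame({"fitted": fit.fittedvalues, "resid": fit.resid})

# Multicollinearity: variance inflation factor per predictor.
X = sm.add_constant(df[["x1", "x2"]])
for i, col in enumerate(X.columns):
    print(col, variance_inflation_factor(X.values, i))
```
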
19
Q

What is ‘immortal time bias’?

A

A period during which the outcome cannot occur is incorrectly classified as exposed time, leading to artificially lower event rates in the exposed group

Fix by aligning the risk window correctly, e.g., starting follow-up at the second prescription fill, or by using time-dependent exposure definitions in a Cox model

This bias can lead to misleading conclusions in time-to-event analysis.

20
Q

How can time at risk be defined in longitudinal safety studies?

A

Based on factors such as pharmacokinetics, the latency of the adverse event, and treatment patterns

Accurate definition of time at risk is crucial for incidence estimation.

21
Q

When is exact logistic regression preferred?

A

When the sample size is very small or events are extremely limited

It calculates conditional probabilities directly, based on enumeration of all possible datasets with the margins held fixed; when full enumeration is infeasible, MCMC (Markov chain Monte Carlo) sampling can approximate the exact distribution.

Exact logistic regression computes probabilities directly without relying on large-sample approximations.

22
Q

What is MSM?

A

Marginal Structural Models

Accounts for time-varying confounding: exposure and covariates are updated at each interval, with weights recalculated (IPTW for treatment, IPCW for censoring)

Good for long latency/observation periods, rare outcomes, and time-varying covariates

CKD proposal example: because guideline-directed medications are varied and depend on many factors, patients often switch and have time-dependent covariates, so I will use MSMs (see the sketch below)

MSMs are used to adjust for time-varying covariates in longitudinal data.
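A simplified sketch of stabilized IPTW and a weighted pooled logistic MSM on invented long-format data; a full analysis would also include IPCW and robust variances.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from sklearn.linear_model import LogisticRegression

# Toy long-format data: one row per patient-interval (illustrative names).
rng = np.random.default_rng(6)
long = pd.DataFrame({
    "id": np.repeat(np.arange(300), 3),
    "interval": np.tile([0, 1, 2], 300),
    "egfr": rng.normal(60, 15, 900),      # time-varying covariate
    "treated": rng.integers(0, 2, 900),   # time-varying exposure
    "event": rng.binomial(1, 0.05, 900),
})

# Stabilized IPTW: numerator conditions on interval only, denominator also
# on the time-varying covariate.
den = LogisticRegression().fit(long[["interval", "egfr"]], long["treated"])
num = LogisticRegression().fit(long[["interval"]], long["treated"])
p_den = den.predict_proba(long[["interval", "egfr"]])[:, 1]
p_num = num.predict_proba(long[["interval"]])[:, 1]
pd_obs = np.where(long["treated"] == 1, p_den, 1 - p_den)
pn_obs = np.where(long["treated"] == 1, p_num, 1 - p_num)

# Cumulative product of interval-level weights within each patient.
long["sw"] = pn_obs / pd_obs
long["sw"] = long.groupby("id")["sw"].cumprod()

# Weighted pooled logistic regression as the MSM; robust (sandwich/GEE)
# variances should be used for real inference.
msm = smf.glm("event ~ treated + interval", data=long,
              family=sm.families.Binomial(),
              freq_weights=np.asarray(long["sw"])).fit()
print(msm.params)
```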

23
Q

When to use bayesian models

A

Bayesian models are valuable when prior information exists or when data are sparse and unstable. In pharmacovigilance, we might borrow strength across drugs in the same class or use prior safety data.

For example, in a Bayesian hierarchical model, we can model adverse event rates across several related drugs, assuming that the rates follow a common prior distribution.

This is useful in rare disease drug approvals or pediatric safety surveillance, where trials are small. The Bayesian approach also naturally accommodates uncertainty and produces credible intervals; a hierarchical sketch follows below.
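A sketch of such a hierarchical model in PyMC (assumed available); the counts are invented for illustration.

```python
import numpy as np
import pymc as pm

events = np.array([3, 7, 2, 11])         # AE counts per drug in a class
exposed = np.array([120, 340, 95, 410])  # patients exposed per drug

with pm.Model() as model:
    # Class-level prior: the drugs' log-odds share a common distribution.
    mu = pm.Normal("mu", 0.0, 2.0)
    sigma = pm.HalfNormal("sigma", 1.0)
    # Drug-level log-odds, partially pooled toward the class mean.
    theta = pm.Normal("theta", mu, sigma, shape=len(events))
    pm.Binomial("y", n=exposed, p=pm.math.invlogit(theta), observed=events)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=7)
# Posterior draws of invlogit(theta) give credible intervals per drug.
```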

24
Q

Explaining findings to a non-statistical audience

A

Focus on translating numbers into meaningful insights: RR 1.6 → patients with exposure A had a 60% higher risk of the outcome than patients with exposure B

Also use visual aids: KM curves, risk difference plots

25
Q

Designing a PASS study

A

I would start by defining the research question, including the safety outcome of interest and the exposure drug(s). Then I'd define the cohort, using inclusion/exclusion criteria to ensure comparability and capture the time at risk. I'd carefully define time zero, often the drug initiation, and use a new-user design to avoid immortal time bias. Then I'd define the outcome and covariates, possibly using a lookback period for baseline covariates. To adjust for confounding, I'd consider propensity score methods, including matching or inverse probability weighting. If there are time-varying covariates affected by prior treatment, I'd use MSMs. Finally, I'd specify sensitivity analyses, assess robustness, and make sure the study is designed in accordance with FDA/EMA PASS guidance.
26
Q

Assumptions of causal inference

A

Consistency (SUTVA): treatment is well-defined, with no interference between units
Exchangeability (ignorability): no unmeasured confounding; groups are comparable even without randomization
Positivity: every covariate pattern has a nonzero probability of receiving each treatment level
No measurement error: data quality supports valid inference
Model specification: correct estimation procedure
Temporal precedence: cause precedes effect
27
Q

SAP in a longitudinal/observational study (vs. a clinical trial)

A

- Prespecification is less strict; often includes exploratory analyses
- Extensive confounding adjustment because of the lack of randomization (i.e., multivariable models, matching, weighting)
- Lots of missing data (i.e., multiple imputation, weighting methods)
- Rare/no interim analyses and stopping rules
- Use of causal inference methods
- Lots of sensitivity analyses for the assumptions used
28
Q

Competing risks in survival analysis

A

For cause-specific hazard estimation, I might use a standard Cox model and censor patients at the time of competing events; for absolute risk, a Fine-Gray subdistribution hazard model. The choice depends on whether the focus is on etiologic association (cause-specific) or absolute risk prediction (subdistribution). A cause-specific sketch follows below.
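A cause-specific sketch with lifelines on toy data: competing events are recoded as censoring (a Fine-Gray fit would need a different tool).

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "time": rng.exponential(10, 300),
    # 0 = censored, 1 = event of interest, 2 = competing event (e.g., death)
    "event_type": rng.integers(0, 3, 300),
    "age": rng.normal(65, 8, 300),
})

# Cause-specific hazard for event 1: competing events count as censoring.
df["event1"] = (df["event_type"] == 1).astype(int)
cph = CoxPHFitter()
cph.fit(df[["time", "event1", "age"]], duration_col="time", event_col="event1")
cph.print_summary()
```
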
29
Q

What is a Type II error?

A

Failing to reject the null hypothesis when it is actually false
30
Q

What is power?

A

1 − (Type II error rate): the probability of detecting a true effect
31
Q

How is a sample size calculation performed?

A

It balances Type I and Type II error risks and depends on the expected effect size, variability, alpha, and desired power, typically 80% or 90% (a sketch follows below).
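A short sketch with statsmodels' power module, solving for the per-group n of a two-sample t-test; the inputs are illustrative.

```python
from statsmodels.stats.power import tt_ind_solve_power

# Solve for n per group given effect size (Cohen's d), alpha, and power.
n_per_group = tt_ind_solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(round(n_per_group))  # about 64 per group for d = 0.5, alpha 0.05, 80% power
```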