Bayesian Flashcards

(15 cards)

1
Q

Why did you choose a Bayesian hierarchical framework for your models?

A
  • flexible modelling of complex, structured data
  • incorporation of uncertainty at multiple levels
  • incorporation of prior information, updating beliefs in light of the data
  • natural accommodation of missing values as unknown quantities estimated during MCMC
  • posterior predictive model checking, with uncertainty in parameter estimates quantified throughout (a minimal model sketch follows below)
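A minimal sketch of such a hierarchy in NIMBLE's BUGS-style syntax (the two-level structure and all names here are illustrative, not the thesis models):

```r
library(nimble)

# Hypothetical two-level hierarchy: replicate j nested in item i.
# Item means theta[i] vary around a population mean mu, so uncertainty
# is modelled at both the item and the observation level.
code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)             # population-level prior
  sigma_item ~ dunif(0, 10)          # between-item variation
  sigma_obs ~ dunif(0, 10)           # within-item variation
  for (i in 1:nItems) {
    theta[i] ~ dnorm(mu, sd = sigma_item)
    for (j in 1:nReps) {
      y[i, j] ~ dnorm(theta[i], sd = sigma_obs)
    }
  }
})
```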
2
Q

What are the benefits of using MCMC for inference?

A
  • accurate approximation of posterior distributions, especially when closed-form solutions are unavailable
  • uncertainty quantification through credible intervals (see the sketch below)
  • flexibility for complex models, including non-standard likelihoods and latent-variable structures
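Continuing the hypothetical sketch from card 1, posterior draws from `nimbleMCMC` give credible intervals directly (`yMat` is an assumed 5 x 4 data matrix):

```r
# One-call MCMC on the card-1 model; returns a matrix of posterior draws.
samples <- nimbleMCMC(code = code,
                      constants = list(nItems = 5, nReps = 4),
                      data = list(y = yMat),   # yMat: hypothetical 5 x 4 data
                      monitors = c("mu", "theta"),
                      niter = 10000, nburnin = 2000)

# 95% credible interval for mu, read straight off the posterior sample
quantile(samples[, "mu"], probs = c(0.025, 0.975))
```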
3
Q

What are the advantages of using Bayesian inference in the context of compositional data?

A
  • integration over uncertainty in parameter estimation and prediction
  • handling of structural zeros and missing values without imputation
  • use of count-based models (e.g. the Generalised Dirichlet Multinomial, GDM), working directly with the observed counts and incorporating prior knowledge
  • flexibility in modelling hierarchical structures in compositional datasets
4
Q

How does your hierarchical model handle structural zeros without imputation?

A
  • addressed using model-based approaches
  • no need for arbitrary imputation; zeros are treated as valid outcomes within the probabilistic framework
  • accounted for either by splitting the data on the presence/absence of zeros in the components, or by explicitly modelling zeros through distributions that allow for them (e.g. the GDM); a toy split is sketched below
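A toy version of the split-based idea, assuming a hurdle-style setup (all names illustrative): a Bernoulli layer models presence/absence, and only the nonzero values enter the continuous layer, so zeros never need imputing:

```r
library(nimble)

# Hypothetical hurdle-style split: pres[i] = 1 if component i is nonzero.
# Zeros are valid outcomes of the Bernoulli layer; the Beta layer sees
# only the nPos strictly positive proportions.
code <- nimbleCode({
  p ~ dbeta(1, 1)                      # probability of presence
  a ~ dgamma(1, 1)
  b ~ dgamma(1, 1)
  for (i in 1:N)     pres[i] ~ dbern(p)        # all observations
  for (k in 1:nPos)  yPos[k] ~ dbeta(a, b)     # nonzero subset only
})
```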
5
Q

How do Bayesian models help with missing data?

A
  • handled through the posterior distribution: missing values are treated as unknown quantities and drawn from their conditional distributions at each MCMC iteration (see the sketch below)
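In NIMBLE (as in BUGS/JAGS) this is automatic: entries coded `NA` in the data become unobserved nodes and are sampled at each iteration. A minimal illustration with made-up data:

```r
library(nimble)

yObs <- c(2.1, NA, 3.4, NA, 2.8)   # hypothetical data with two missing values

code <- nimbleCode({
  mu ~ dnorm(0, sd = 10)
  sigma ~ dunif(0, 10)
  for (i in 1:N) y[i] ~ dnorm(mu, sd = sigma)
})

# Monitoring "y" returns posterior draws for the NA entries as well,
# i.e. the missing values are estimated alongside the parameters.
samples <- nimbleMCMC(code, constants = list(N = 5),
                      data = list(y = yObs),
                      monitors = c("mu", "sigma", "y"),
                      niter = 5000, nburnin = 1000)
```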
6
Q

What sampling methods did you use and why?

A
  • MCMC (Markov Chain Monte Carlo)
  • necessary due to the non-conjugate and hierarchical structure of the models
  • allowed flexible posterior exploration, including custom per-node samplers (see the sketch below)
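One concrete form of that flexibility in NIMBLE is per-node sampler assignment; a hypothetical swap of the default sampler for a slice sampler on a slow-mixing parameter:

```r
# 'model' is an uncompiled nimbleModel; "sigma_item" is an assumed
# parameter name borrowed from the card-1 sketch.
conf <- configureMCMC(model)
conf$removeSamplers("sigma_item")
conf$addSampler(target = "sigma_item", type = "slice")
mcmc <- buildMCMC(conf)
```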
7
Q

How did you specify priors in your Bayesian models, and how sensitive are your results to them?

A
  • used mainly weakly informative priors
  • allowing the posterior to be data-driven, without over-reliance on the prior
  • for example, Dirichlet and Beta priors were used for compositional proportions, while Normal priors were set on spline coefficients and latent variables (see the sketch below)
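A sketch of those prior choices in NIMBLE syntax (names and hyperparameters illustrative; `alpha = rep(1, K)` would be passed in as constants):

```r
library(nimble)

# Weakly informative priors echoing the card: Dirichlet on a composition,
# Beta on a single proportion, Normals on spline coefficients.
code <- nimbleCode({
  w[1:K] ~ ddirch(alpha[1:K])        # flat Dirichlet when alpha = rep(1, K)
  phi ~ dbeta(1, 1)                  # uniform Beta prior on a proportion
  for (k in 1:nBasis) {
    beta[k] ~ dnorm(0, sd = 2)       # weakly informative on spline coefs
  }
})
```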
8
Q

Can you explain how latent variables were used in your models?

A
  • Forensic: latent clustering labels for class membership
  • Time series: hidden states in an HMM
  • Spatial: latent surfaces via spatial penalised regression splines
  • in each case the latent structure captured unobserved variation, enhancing flexibility and interpretability (a toy HMM fragment is sketched below)
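A toy HMM fragment in NIMBLE syntax showing a latent state sequence (dimensions and names illustrative; NIMBLE's dynamic indexing handles `P[z[t-1], 1:K]`):

```r
library(nimble)

# Hypothetical K-state HMM: latent states z[t] follow a Markov chain
# with transition matrix P; counts y[t] depend on the current state.
code <- nimbleCode({
  z[1] ~ dcat(delta[1:K])                # initial state distribution
  for (t in 2:nT) {
    z[t] ~ dcat(P[z[t - 1], 1:K])        # Markov transition
  }
  for (t in 1:nT) {
    y[t] ~ dpois(lambda[z[t]])           # state-dependent emission
  }
})
```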
9
Q

What is the role of NIMBLE in your modelling?

A

NIMBLE was chosen because:
  • it is a flexible and efficient package for fitting a wide range of statistical models, particularly computationally intensive ones with complex hierarchical structures
  • models are written in the BUGS language and compiled automatically into C++, giving fast execution and efficient MCMC sampling
  • it can handle custom distributions (e.g. the GDM)
  • it allows model-specific samplers to be written
  • it combines R-based model specification with compiled C++ speed (an illustrative workflow is sketched below)
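An illustrative end-to-end workflow (object names hypothetical; this is the standard build/compile/run pattern rather than the thesis scripts):

```r
library(nimble)

model  <- nimbleModel(code, constants = constants,
                      data = dataList, inits = initsList)
cModel <- compileNimble(model)                  # model compiled to C++
conf   <- configureMCMC(model, monitors = c("mu", "theta"))
mcmc   <- buildMCMC(conf)
cMcmc  <- compileNimble(mcmc, project = model)  # compiled MCMC algorithm
samples <- runMCMC(cMcmc, niter = 10000, nburnin = 2000)
```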

10
Q

How do you ensure convergence in your MCMC chains?

A
  • visual inspection of trace plots
  • Potential Scale Reduction Factor (PSRF)
  • running multiple chains with different initial values (see the sketch below)
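A sketch of those checks using the coda package (model objects as in the earlier cards; `samplesAsCodaMCMC = TRUE` makes nimbleMCMC return a coda `mcmc.list`):

```r
library(nimble)
library(coda)

# Three chains from dispersed starting points, returned as an mcmc.list
samples <- nimbleMCMC(code, constants = constants, data = dataList,
                      nchains = 3, niter = 10000, nburnin = 2000,
                      samplesAsCodaMCMC = TRUE)

traceplot(samples[, "mu"])   # visual check: chains should overlap and mix
gelman.diag(samples)         # PSRF: values near 1 indicate convergence
```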
11
Q

How did you handle model uncertainty?

A
  • model comparison using posterior predictive checks, cross-validation and predictive performance metrics (e.g. Brier score, Expected Calibration Error (ECE))
  • latent variables where necessary, accommodating uncertainty in class membership, HMM states or spatial variation
These approaches allowed both parameter and structural uncertainty in predictions to be quantified (a minimal Brier-score helper is sketched below).
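The Brier score itself is a one-liner; a minimal helper for binary outcomes, with made-up numbers for illustration:

```r
# Mean squared distance between predicted probabilities p and binary
# outcomes y; 0 is perfect, lower is better.
brier <- function(p, y) mean((p - y)^2)

p <- c(0.9, 0.2, 0.7, 0.1)   # hypothetical posterior predictive probabilities
y <- c(1, 0, 1, 0)           # observed outcomes
brier(p, y)
```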
12
Q

What is the PSRF and how did you use it?

A
  • used to assess convergence of MCMC chains
  • ensures multiple chains have converged to the same posterior distribution
  • compares between-chain and within-chain variance
  • a PSRF close to 1 indicates convergence (a hand-rolled version is sketched below)
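A hand-rolled version of the diagnostic, to make the between/within-chain comparison explicit (in practice `coda::gelman.diag` computes this):

```r
# PSRF for m chains stored as the columns of an n x m matrix 'chains'
psrf <- function(chains) {
  n <- nrow(chains); m <- ncol(chains)
  W <- mean(apply(chains, 2, var))         # average within-chain variance
  B <- n * var(colMeans(chains))           # between-chain variance
  varHat <- (n - 1) / n * W + B / n        # pooled variance estimate
  sqrt(varHat / W)                         # PSRF; close to 1 => converged
}
```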
13
Q

How does your thesis contribute to the field of Bayesian statistics?

A
  • developing GDM-based hierarchical models for multilevel compositional data
  • incorporating custom distributions and latent variables for non-standard data types
  • demonstrating Bayesian decision support for real-world applications (e.g. forensic science, public health, ecology)
14
Q

Why were custom distributions necessary in your implementation?

A
  • standard distributions (e.g. Multinomial, Dirichlet) were insufficient for the specific data structures
  • the GDM offers more flexibility but is not built into most probabilistic programming tools
  • custom likelihood functions were therefore implemented in NIMBLE to accommodate these structures within a Bayesian framework (see the registration sketch below)
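A sketch of the registration mechanism in NIMBLE, with a stand-in log-density in place of the full GDM pmf (the `dmydist` name and its formula are placeholders, not the thesis likelihood):

```r
library(nimble)

# Density written as a nimbleFunction; a matching 'rmydist' simulator
# is usually supplied as well so nodes can be initialised by simulation.
dmydist <- nimbleFunction(
  run = function(x = double(1), alpha = double(1),
                 log = integer(0, default = 0)) {
    returnType(double(0))
    logProb <- sum(x * log(alpha / sum(alpha)))   # placeholder log-pmf
    if (log) return(logProb) else return(exp(logProb))
  })

# Registration makes 'dmydist' usable on the right-hand side of '~'
registerDistributions(list(
  dmydist = list(
    BUGSdist = "dmydist(alpha)",
    types = c("value = double(1)", "alpha = double(1)"))))
```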
15
Q

What is the hierarchical aspect of each method?

A
  • Forensic: multiple measurements per fragment and multiple fragments per item - a hierarchical model with multiple levels
  • COVID-19: weekly counts per variant per country - an HMM with group-specific parameters (per country / variant)
  • Trees: proportions of tree species over a grid - splines with group-specific parameters (per tree type)