Comparative Flashcards
(6 cards)
How does your work compare to log-ratio based approaches?
- bypasses log-ratio transformations, instead modelling compositional data in its original form and taking into account both the relative and absolute values
- overcomes many limitations of log-ratio approaches, specifically when zero or missing values are present or the data have a count structure
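A minimal sketch of the zero problem, using the centred log-ratio (CLR) as one common log-ratio transform. The helper name `clr` and the example counts are illustrative assumptions, not taken from the work itself.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a single composition.

    Hypothetical helper for illustration; it is only well defined
    when every part is strictly positive.
    """
    x = np.asarray(x, dtype=float)
    # Silence the warnings so the non-finite output can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_x = np.log(x)
        return log_x - log_x.mean()

no_zeros = np.array([12, 5, 3])   # made-up counts, all parts positive
with_zero = np.array([12, 8, 0])  # same total, one part is zero

clr_ok = clr(no_zeros)     # finite values that sum to zero
clr_bad = clr(with_zero)   # log(0) = -inf makes every entry non-finite

print(clr_ok)
print(clr_bad)
```

A single zero is enough to break the whole transformed vector, which is why log-ratio pipelines usually need a zero-replacement step before analysis, whereas a count model can use the zero as it is.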
What are the key differences between GDM and Generalised-Dirichlet or Multinomial models?
- GDM extends both the GD and Multinomial distributions to provide a flexible distribution to model compositional counts
- GDM provides a flexible covariance structure which can model overdispersion in the counts
- GDM extends the GD by allowing x = 0 and x = N (total)
- Multinomial part explains some of the variability in the counts, while the GD component flexibly explains all other random variability - capturing different variance patterns
- GDM can also handle zeros (Multinomial yes, GD no) and missing values (Multinomial and GD no) in the compositions
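The two-layer structure described above can be sketched as a simulation: draw proportions from a GD distribution, then draw counts from a Multinomial with those proportions. This assumes the standard stick-breaking (independent Beta) construction of the GD; the function name `sample_gdm` and the parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_gdm(n_total, a, b, size, rng):
    """Draw `size` count vectors from a GDM (illustrative sketch).

    a, b : Beta shape parameters for the D-1 stick-breaking fractions.
    Returns an integer array of shape (size, D) whose rows sum to n_total.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # GD layer: independent Beta fractions of the remaining "stick".
    z = rng.beta(a, b, size=(size, a.size))
    remaining = np.cumprod(1.0 - z, axis=1)
    stick_left = np.hstack([np.ones((size, 1)), remaining[:, :-1]])
    p = np.hstack([z * stick_left, remaining[:, -1:]])
    p = p / p.sum(axis=1, keepdims=True)  # guard against rounding error
    # Multinomial layer: counts given the sampled proportions.
    return np.array([rng.multinomial(n_total, row) for row in p])

rng = np.random.default_rng(0)
n_total = 20
counts = sample_gdm(n_total, a=[2.0, 1.0, 0.5], b=[3.0, 2.0, 1.0],
                    size=1000, rng=rng)

print(counts[:5])
print("rows containing a zero:", int((counts == 0).any(axis=1).sum()))
```

Because the final layer is a Multinomial, a component can take any integer value from 0 up to the total, which a GD on proportions alone cannot represent.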
What trade-offs did you make between model complexity and interpretability?
- implemented Bayesian hierarchical models that are more complex computationally but provide greater flexibility and interpretability
- models allow latent structures (clusters, hidden states, spatial effects) to be specified
- by opting for a Bayesian framework, I was also able to quantify uncertainty throughout the model, which enhances both the interpretability and trustworthiness of the results
How does your model scale with higher dimensions (more components)?
- each framework can scale reasonably well, but the MCMC sampling becomes more computationally demanding as complexity increases
- may not be an issue depending on the problem and time constraints of the application
How would your models perform with compositional data from other domains?
- Although each framework was tested on only one example of compositional data, each approach is applicable to compositional data from other domains that share features similar to those presented.
Examples:
* Environmental - Classification of soil type from repeated measurement of compositional values
* Ecological - Tracking evolution of species over time
* Epidemiological - Examining disease prevalence of multiple diseases in an area
How does your method address overdispersion?
Overdispersion is addressed through the Generalised Dirichlet Multinomial, which extends the Multinomial with component-specific dispersion parameters, allowing flexible variance and correlation structures among components.
This avoids underestimating uncertainty, a common issue with the standard Multinomial model.
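As a rough illustration of the variance inflation, consider the simplest special case with two components, where the GDM reduces to a beta-binomial and the Multinomial to a binomial. The parameter values below are assumptions chosen for illustration, not estimates from the work.

```python
import numpy as np

rng = np.random.default_rng(1)
n_total, a, b, draws = 50, 2.0, 3.0, 20000  # illustrative values only
p_mean = a / (a + b)

# Binomial: the success probability is fixed at its mean.
binom_counts = rng.binomial(n_total, p_mean, size=draws)

# Beta-binomial: the probability itself varies between observations.
probs = rng.beta(a, b, size=draws)
bb_counts = rng.binomial(n_total, probs)

# Theoretical variances; the beta-binomial inflates the binomial
# variance by the factor (a + b + n_total) / (a + b + 1).
var_binom_theory = n_total * p_mean * (1 - p_mean)
var_bb_theory = var_binom_theory * (a + b + n_total) / (a + b + 1)

print("binomial variance:     ", binom_counts.var(), "theory", var_binom_theory)
print("beta-binomial variance:", bb_counts.var(), "theory", var_bb_theory)
```

Both models share the same mean count, so fitting the binomial to data generated this way would give intervals that are far too narrow.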