Chapter 5 - Trees Flashcards
(17 cards)
What is spatial compositional data?
- compositional data (parts of a whole, e.g. proportions or counts that sum to a total) observed across geographic locations
- both the relative structure of the composition and its spatial information are crucial
- example: proportions of each tree species across plots in a forest
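A small illustration of the data structure described above. The plot coordinates and counts are made up purely for illustration and are not the tree data from the chapter.

```python
import numpy as np

# Hypothetical data: 3 forest plots, each with (x, y) coordinates
# and counts of 4 tree species (values invented for illustration).
coords = np.array([[0.0, 0.0],
                   [1.0, 0.0],
                   [0.0, 1.0]])
counts = np.array([[12, 3, 0, 5],
                   [2, 9, 4, 5],
                   [0, 0, 7, 3]])

# Turn counts into a composition: each row is divided by its plot total,
# so every row of proportions sums to 1.
totals = counts.sum(axis=1, keepdims=True)
proportions = counts / totals

for xy, p in zip(coords, proportions):
    print(xy, np.round(p, 2))
```

Each row carries both pieces of information the card mentions: where the plot is (coords) and how the whole is split between species (proportions).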
What motivated your choice of the spatial tree species data?
- real-world application, representing species diversity across a geographic region
- compositional structure (proportions of multiple tree types per location)
How did you incorporate spatial structure into your model?
- spatial penalised regression splines were used to model the latent spatial variation across locations
- splines allow the model to borrow strength across neighbouring locations while remaining flexible
- spline terms (basis functions with latent coefficients) were included in the linear predictor of the GDM model, allowing the mean composition to vary smoothly over space
What are spatial penalised regression splines and why did you use them?
- spatial penalised regression splines add a penalty on the spline coefficients to prevent overfitting
- smoothness of the spatial surface is controlled by the penalty parameter (lambda)
- high lambda: smoother surface
- low lambda: more flexible, wigglier surface
- provides a computationally efficient and flexible way to model the spatial structure
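A minimal sketch of the lambda trade-off, shown in one dimension for readability. It assumes a truncated-linear basis with a ridge penalty on the knot coefficients, which is one common form of penalised regression spline and not necessarily the basis used in the chapter. The function names, knots and simulated data are all illustrative.

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix with columns [1, x, (x - k_1)_+, ..., (x - k_K)_+]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x] + [np.clip(x - k, 0.0, None) for k in knots]
    return np.column_stack(cols)

def fit_penalised_spline(x, y, knots, lam):
    """Penalised least squares: minimise ||y - B beta||^2 + lam * sum(u_k^2),
    where u_k are the knot coefficients (intercept and slope are unpenalised)."""
    B = truncated_linear_basis(x, knots)
    D = np.diag([0.0, 0.0] + [1.0] * len(knots))
    beta = np.linalg.solve(B.T @ B + lam * D, B.T @ y)
    return B @ beta

def roughness(fitted):
    """Sum of squared second differences of the fitted curve."""
    return float(np.sum(np.diff(fitted, n=2) ** 2))

# Simulated noisy data (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
knots = np.linspace(0.05, 0.95, 20)

wiggly = fit_penalised_spline(x, y, knots, lam=1e-4)  # low lambda
smooth = fit_penalised_spline(x, y, knots, lam=1e2)   # high lambda

print("roughness, low lambda :", roughness(wiggly))
print("roughness, high lambda:", roughness(smooth))
```

The spatial case replaces the 1D basis with a 2D one over the plot coordinates, but the role of lambda is the same: a larger value shrinks the knot coefficients and gives a smoother fit.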
How did you handle missing values in the spatial data?
- missing values in the spatial compositions were handled directly within the model, since the GDM can treat missing counts as unknowns and predict them
- the missing values were sampled alongside the model parameters during MCMC
How did you evaluate predictive accuracy across spatial locations?
- posterior predictive model checking was used to assess how well the model predicted the missing tree counts
What were the benefits of using GDM for spatial data?
- GDM can handle overdispersed counts
- accommodates zero counts naturally
- combined with spatial splines, it provided a flexible way of understanding complex spatial relationships
How are spatial penalised regression splines incorporated in your spatial framework?
- included as random effects in the linear predictor of the GDM model
- coefficients were inferred jointly with other parameters in a Bayesian framework
- splines captured smooth spatial variation
What were the key evaluation metrics used to assess model performance?
- MAE
- RMSE
- coverage and mean width of the Bayesian uncertainty intervals
- Xi - out-of-sample R^2 quantifying prediction error variance relative to the baseline
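The metrics above can be sketched as follows. The function names are illustrative, the intervals are assumed to be equal-tailed quantile intervals from posterior predictive draws, and Xi is assumed to take the usual form 1 - SSE(model) / SSE(baseline); the chapter's exact definitions may differ.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def coverage_and_width(y_true, draws, level=0.95):
    """Coverage and mean width of equal-tailed predictive intervals.
    draws has shape (n_draws, n_obs): posterior predictive samples."""
    alpha = 1.0 - level
    lower = np.quantile(draws, alpha / 2, axis=0)
    upper = np.quantile(draws, 1.0 - alpha / 2, axis=0)
    covered = (y_true >= lower) & (y_true <= upper)
    return float(np.mean(covered)), float(np.mean(upper - lower))

def out_of_sample_r2(y_true, y_pred, baseline_pred):
    """Assumed form of Xi: 1 - SSE(model) / SSE(baseline)."""
    sse_model = np.sum((y_true - y_pred) ** 2)
    sse_base = np.sum((y_true - baseline_pred) ** 2)
    return float(1.0 - sse_model / sse_base)

# Toy held-out counts and predictions (invented for illustration).
y_true = np.array([2.0, 4.0, 6.0, 8.0])
y_pred = np.array([3.0, 4.0, 5.0, 8.0])
baseline = np.full_like(y_true, y_true.mean())

# Fake predictive draws: each prediction plus an evenly spaced offset.
offsets = np.linspace(-2.0, 2.0, 101)[:, None]
draws = y_pred[None, :] + offsets

cov, width = coverage_and_width(y_true, draws)
print(mae(y_true, y_pred), rmse(y_true, y_pred))
print(cov, width, out_of_sample_r2(y_true, y_pred, baseline))
```

Under this assumed definition, Xi near 1 means the prediction error is small relative to the baseline, and Xi at or below 0 means the model does no better than the baseline.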
How did you ensure fair comparisons between your models?
- using the same tree species and randomly sampled spatial locations
- using the same basis function / model specification
How did your GDM models compare with GAMs in predictive accuracy?
- the GDM outperformed the GAM, predicting counts that were closer to the original counts
- this held across the different levels of missing components, for each tree species, and overall, on all three of MAE, RMSE and Xi
In what scenarios did your methods outperform traditional methods the most?
- The GDM’s better performance could be explained by its ability to predict missing values under compositional constraints.
- Specifically, within the GDM, if the model observes high counts of one tree species, it knows to predict low counts for the remaining species.
- In contrast, the GAM lacks this knowledge, leading to over-prediction of very high counts.
Therefore, the GDM is an effective tool for modelling spatial compositional data.
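A hedged sketch of the compositional constraint described above. It assumes GDM denotes the generalised Dirichlet-multinomial and simulates from it by stick-breaking: Beta fractions give the proportions, then counts are drawn from a multinomial with a fixed plot total. The function name and parameter values are illustrative, not the fitted model from the chapter.

```python
import numpy as np

def sample_gdm(n_total, a, b, size, rng):
    """Draw counts from a generalised Dirichlet-multinomial (assumed form).
    a, b: Beta parameters for the K-1 stick-breaking fractions."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    v = rng.beta(a, b, size=(size, a.size))   # stick-breaking fractions
    remaining = np.cumprod(1.0 - v, axis=1)   # stick left after each break
    p = np.empty((size, a.size + 1))
    p[:, 0] = v[:, 0]
    p[:, 1:-1] = v[:, 1:] * remaining[:, :-1]
    p[:, -1] = remaining[:, -1]               # last species takes the rest
    # Multinomial counts given the proportions: each row sums to n_total.
    return np.array([rng.multinomial(n_total, row) for row in p])

rng = np.random.default_rng(1)
n_total = 100  # hypothetical number of trees per plot
counts = sample_gdm(n_total, a=[2.0, 2.0, 2.0], b=[4.0, 3.0, 2.0],
                    size=5000, rng=rng)

# Every simulated plot respects the total, so a high count for one
# species leaves less room for the others.
row_sums = counts.sum(axis=1)
corr_12 = np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]
print("all plots sum to total:", bool(np.all(row_sums == n_total)))
print("correlation, species 1 vs 2:", round(float(corr_12), 2))
```

A model that predicts each species separately has no such sum constraint, so nothing prevents its per-species predictions from adding up to more than the plot total.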
What ecological interpretations can be made from your tree species spatial model?
- model revealed distinct areas in the grid that were dominated by particular species
- could help identify hotspots of species diversity and areas where specific species are likely to be under- or over-represented
How generalisable are your methods to other types of compositional data?
Spatial compositional data:
* Environmental data (e.g. crops grown in different fields)
* Epidemiological data (e.g. disease prevalence)
The framework is broadly useful for compositional data (counts or proportions) with spatial information.
What are the main limitations of your approaches?
- computational cost of running the model
- choice of spatial model (e.g. spatial penalised regression spline, GMRF, CAR)
How would your model handle spatio-temporal data?
The current spatial GDM model could be extended to spatio-temporal settings by:
* adding temporal spline terms
* adding dynamic latent processes or time-varying covariates to the linear predictor
This would allow the model to capture the temporal evolution of spatial composition patterns, such as species migration or seasonal effects.
Describe your model’s latent structure.
- spline-based spatial random effects that capture smooth spatial variation in the compositions