Chapter 3 - Forensic Flashcards

(24 cards)

1
Q

What is the Napier approach and how does it relate to your framework?

A
  • Napier approach - a Bayesian hierarchical model for compositional data with structural zeros
  • accounts for zeros by manually splitting the data into groups based on the presence and absence of elements before fitting the model
  • my work builds on this by automating the split of the compositional elements, making the approach more accessible for real-world use
  • this is done by clustering the data prior to fitting the model
  • I also propose an integrated clustering approach, which clusters the items within the model itself
2
Q

How does your integrated clustering model work and why is it beneficial?

A
  • combines class membership inference and parameter estimation within the MCMC
  • Unlike pre-clustering approaches, which fix cluster labels in advance, this method treats class labels as latent variables, updating them during sampling
  • accounts for classification uncertainty
  • minimises user decisions and manual input (a sketch of the label update follows)
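A minimal sketch (in Python, not taken from the thesis) of the kind of label update described above: given the current parameter draws, each item's class label is re-sampled from its conditional posterior, combining per-class likelihoods with prior class probabilities. Function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def update_class_labels(log_lik_by_class, log_prior_probs):
    """One Gibbs-style update of the latent class labels.

    log_lik_by_class : (n_items, n_classes) log-likelihood of each item's data
                       under each class, given the current parameter draws
    log_prior_probs  : (n_classes,) log prior class probabilities
    """
    log_post = log_lik_by_class + log_prior_probs      # unnormalised log posterior
    log_post -= log_post.max(axis=1, keepdims=True)    # stabilise before exponentiating
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Sampling (rather than fixing) the labels carries classification
    # uncertainty through the rest of the MCMC iteration.
    labels = np.array([rng.choice(post.shape[1], p=p) for p in post])
    return labels, post
```

Within each iteration, the remaining model parameters would then be updated conditional on these sampled labels.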
3
Q

What were the main features of the forensic glass data?

A
  • elemental proportions of glass fragments, containing a large proportion of zero values
  • each observation belongs to a known glass use type (e.g., window, container), with repeated measurements per item and per fragment
  • the goal is to classify fragments to their original glass use type
4
Q

How did you handle high zero counts in elements like Fe?

A
  • split the data into subsets/groups based on the presence and absence of compositional elements (see the sketch below)
  • this yields subsets in which certain components are entirely absent
  • reduces the impact that the large proportion of zeros would otherwise have on the analysis
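A minimal pandas sketch of this kind of split, assuming hypothetical element names and values: rows are grouped by their presence/absence pattern, so each subset shares the same set of absent components.

```python
import pandas as pd

# Hypothetical fragment-level data; element names and values are illustrative.
df = pd.DataFrame({
    "item": [1, 1, 2, 2, 3, 3],
    "Fe":   [0.000, 0.000, 0.010, 0.012, 0.000, 0.000],
    "K":    [0.020, 0.021, 0.000, 0.000, 0.015, 0.016],
    "Mg":   [0.030, 0.031, 0.028, 0.030, 0.000, 0.000],
})

elements = ["Fe", "K", "Mg"]

# Presence/absence pattern per row, e.g. "011" = Fe absent, K and Mg present.
pattern = (df[elements] > 0).astype(int).astype(str).apply("".join, axis=1)

# Each subset contains only rows sharing the same absent components, so the
# model fitted within a subset can simply drop those components.
subsets = {pat: grp for pat, grp in df.groupby(pattern)}
```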
5
Q

Why was oxygen chosen as the denominator for the compositional ratios?

A
  • requires a component that is always non-zero to serve as the denominator
  • since oxygen is always present, it was chosen as the denominator (illustrated below)
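A small illustration with made-up proportions of why an always-present component is needed: ratios taken relative to oxygen stay well defined even when other elements are zero.

```python
import pandas as pd

# Made-up elemental proportions; oxygen ("O") is assumed always non-zero.
df = pd.DataFrame({
    "O":  [0.46, 0.45, 0.47],
    "Si": [0.33, 0.34, 0.32],
    "Na": [0.09, 0.08, 0.10],
    "Fe": [0.00, 0.01, 0.00],   # zeros in a numerator are fine
})

# Element/oxygen ratios; dividing by a column containing zeros would not be valid.
ratios = df[["Si", "Na", "Fe"]].div(df["O"], axis=0)
```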
6
Q

What is the rationale for the pre-clustering vs. integrated clustering approaches?

A
  • pre-clustering: a first step towards automating the Napier approach
  • automates the split by clustering glass items prior to fitting the model (sketched below)
  • still requires subjective decisions
  • integrated clustering: treats cluster labels as latent variables and updates them within the model
  • uses the probability of an element being present or absent as a prior
  • potentially leads to better clustering, as it is not based solely on an indicator matrix
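A rough sketch of the pre-clustering step under stated assumptions: the card does not say which clustering algorithm was used, so k-means on a hypothetical item-level presence/absence summary is shown purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical item-level indicator matrix: the proportion of measurements on
# each item in which each element was detected (rows = items, cols = elements).
presence = np.array([
    [0.0, 1.0, 1.0],
    [0.1, 1.0, 0.9],
    [1.0, 0.0, 1.0],
    [0.9, 0.1, 1.0],
])

# Pre-clustering fixes the groups before model fitting, based only on this
# matrix; the integrated approach instead updates group labels inside the MCMC.
pre_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(presence)
```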
7
Q

What are the advantages of your integrated clustering method?

A
  • flexible and widely applicable
  • removes manual intervention
  • no need for any pre-treatment of the data
  • avoids bias from fixed labels
8
Q

How did you measure classification performance?

A
  • correct classification rates
  • Brier Score - assesses the accuracy of the prediction model (a sketch of its computation follows)
  • Expected Calibration Error (ECE) - assesses how well predicted probabilities align with observed outcomes
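For reference, a common multiclass form of the Brier score; the exact variant used in the thesis is not stated here, so treat this as an assumption.

```python
import numpy as np

def brier_score(probs, true_labels, n_classes):
    """Mean squared error between predicted class probabilities and the
    one-hot encoded true outcomes; lower is better."""
    probs = np.asarray(probs, dtype=float)          # (n_items, n_classes)
    onehot = np.eye(n_classes)[np.asarray(true_labels)]
    return np.mean(np.sum((probs - onehot) ** 2, axis=1))
```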
9
Q

What role did cross-validation play in your analysis?

A
  • Five-fold cross-validation was used to provide robust evaluation across all models by allowing each item to be a ‘test’ item once (unknown glass use type)

Note that ‘test’ data usually refers to data for which the value of the response is treated as unknown; in this model the response is the compositions. Here, however, the compositions can be seen by the model and the item’s glass use type is the unknown quantity to be predicted.
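A minimal sketch of how such folds could be built, assuming integer item identifiers and using scikit-learn; the exact fold construction in the thesis may differ.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical item identifiers; folds are built over whole items so that every
# measurement of a held-out item is treated as 'test' together.
item_ids = np.arange(1, 21)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(item_ids):
    train_items, test_items = item_ids[train_idx], item_ids[test_idx]
    # Fit the model with the glass use type of `test_items` treated as unknown,
    # while their compositions remain visible to the model (as noted above).
```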

10
Q

How did you deal with unknown class labels in MCMC chains?

A
  • unknown class labels were treated as latent variables within the model
  • their values were sampled during MCMC, informed by the data likelihood under each class and the prior class probabilities
11
Q

What are the implications of your findings for forensic science?

A
  • offers a robust, interpretable tool for forensic glass analysis
  • the tool allows recovered glass fragments to be classified, helping to assess whether they share a glass use type with glass from a crime scene
12
Q

Can you explain how you handled the hierarchical structure in the forensic glass application?

A
  • the hierarchical structure was handled by fitting a hierarchical model with both fixed and random effects (a toy illustration is sketched below)
  • observations from the same glass use type shared parameters, accounting for within-type correlation
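A toy illustration only, not the thesis model: a hierarchical normal model for a single transformed ratio, with a fixed effect per glass use type and a random effect per item, written with PyMC. The thesis may use different software, priors, and a multivariate likelihood; all names and values here are made up.

```python
import numpy as np
import pymc as pm

# Toy data: sqrt-transformed ratio y, glass use type index per observation,
# and item index per observation (items nested within types).
y = np.array([0.21, 0.22, 0.35, 0.34, 0.28, 0.29])
type_idx = np.array([0, 0, 1, 1, 1, 1])
item_idx = np.array([0, 0, 1, 1, 2, 2])
n_types, n_items = 2, 3

with pm.Model() as model:
    # Fixed effect: mean level for each glass use type.
    mu_type = pm.Normal("mu_type", mu=0.0, sigma=1.0, shape=n_types)
    # Random effect: item-level deviation, capturing within-type correlation
    # across repeated measurements of the same item.
    sigma_item = pm.HalfNormal("sigma_item", sigma=0.5)
    item_eff = pm.Normal("item_eff", mu=0.0, sigma=sigma_item, shape=n_items)
    # Residual measurement error.
    sigma_obs = pm.HalfNormal("sigma_obs", sigma=0.5)
    pm.Normal("y", mu=mu_type[type_idx] + item_eff[item_idx],
              sigma=sigma_obs, observed=y)
    idata = pm.sample(1000, tune=1000, chains=2, random_seed=0)
```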
13
Q

Why did you choose a square-root transformation in some models?

A
  • applied to stabilise variance and approximate normality for the transformed compositional ratios
  • unlike the log transformation, the square root is defined at zero, so no imputation is needed before transforming (illustrated below)
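A two-line illustration of the point about zeros, with made-up ratio values:

```python
import numpy as np

ratios = np.array([0.000, 0.004, 0.012, 0.000])   # element/O ratios with genuine zeros

sqrt_ratios = np.sqrt(ratios)   # defined at zero, so no imputation is needed
# np.log(ratios) would give -inf at the zeros, forcing them to be imputed first
```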
14
Q

What were the key challenges in modelling the forensic glass data?

A
  • High number of zero values (e.g., Fe)
  • Small sample sizes for some glass use types (e.g., headlamp)
  • Compositional nature of the data, complicating traditional modelling techniques
  • Hierarchical nature of the data - multiple measurements per fragment and multiple fragments per item
15
Q

How did your models perform compared to standard classification approaches on the glass data?

A
  • achieved higher classification accuracy
  • lower Brier Score
  • better calibration (ECE)
16
Q

What were the key evaluation metrics used to assess model performance?

A
  • Classification accuracy: quantifies the proportion of correctly classified items
  • Brier Score: assesses the accuracy of the prediction model via the mean squared error between predicted probabilities and true outcomes
  • Expected Calibration Error (ECE): assesses how well the predicted probabilities align with observed outcomes
17
Q

How did you ensure fair comparisons between your models?

A
  • all approaches were evaluated using identical cross-validation folds
  • the transformation of the compositional ratios was identical across approaches
18
Q

What are Brier scores and Expected Calibration Error, and what do they indicate?

A
  • Brier Score: assesses the accuracy of prediction models by quantifying the mean squared error of the predicted probabilities
  • lower scores indicate better predictive accuracy and confidence calibration
  • Expected Calibration Error (ECE): measures the difference between predicted probabilities and the observed frequencies of the corresponding outcomes
  • lower ECE indicates that predicted probabilities closely match real-world outcomes
19
Q

What does a low Brier Score indicate in your models?

A
  • indicates better predictive accuracy
  • indicates well-calibrated confidence in the predicted values
20
Q

How did you use the Expected Calibration Error (ECE)?

A
  • ECE was used to assess the reliability of probabilistic predictions
  • computed by binning predicted probabilities and comparing average prediction confidence to the observed frequency in each bin
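A minimal sketch of that binning calculation; the number of bins and the exact binning scheme used in the thesis are not stated here, so both are assumptions.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then average |accuracy - confidence|
    across bins, weighted by the share of predictions in each bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)      # 1 if the prediction was right
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece
```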
21
Q

In what scenarios did your methods outperform traditional methods the most?

A
  • overall correct classification of all glass use types
  • correct classification of car and building windows
  • lowest Brier score for car and building windows
  • lowest ECE
22
Q

How might your forensic glass model be used in real forensic investigations?

A
  • classification of evidence: assigning recovered glass fragments to potential glass use types with quantified uncertainty
  • evaluating evidential strength: offering posterior probabilities for each glass use type
  • enhancing computational efficiency: providing a more straightforward way to compute probabilistic classifications of glass use type, as done in the model
23
Q

How generalisable are your methods to other types of compositional data?

A
  • applicable to other compositional data applications which contain a large proportion of zeros and a hierarchical structure

Example: soil compositions with multiple measurements

24
Q

What are the main limitations of your approaches?

A
  • computational cost of running the integrated clustering approach
  • poorer classification of headlamp glass, which could be improved