Compositional Data Flashcards

(13 cards)

1
Q

What is compositional data?

A
  • compositional data consist of sets of non-negative vectors
  • take the form of proportions, percentages or counts
  • either measured directly as proportions that sum to one, or measured in absolute terms with totals that differ between observations
  • interest is in the size of components relative to the total and relative to each other
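
A minimal sketch of moving from absolute measurements to proportions (often called closure). The function name `closure` and the counts are hypothetical, chosen only for illustration.

```python
def closure(parts, total=1.0):
    """Rescale non-negative parts so that they sum to `total`."""
    if any(p < 0 for p in parts):
        raise ValueError("parts must be non-negative")
    s = sum(parts)
    if s == 0:
        raise ValueError("at least one part must be positive")
    return [total * p / s for p in parts]

# Two hypothetical observations measured in absolute terms with different totals.
site_a = [20, 30, 50]      # total 100
site_b = [200, 300, 500]   # total 1000

props_a = closure(site_a)
props_b = closure(site_b)
# Both observations have the same relative structure once closed,
# even though their totals differ by a factor of ten.
```

After closure the two observations are indistinguishable, which is exactly the absolute information that a purely relative analysis sets aside.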
2
Q

Why is compositional data challenging to analyse with traditional statistical methods?

A
  • standard statistical methods (e.g. linear regression, generalised linear models) often assume that the response variables are independent given the covariates and random effects; the parts of a composition are tied together through their shared total, so this assumption is not valid and the methods may yield biased or otherwise misleading results
  • compositional data analysis typically requires specialised methods that address the compositional structure of the data
3
Q

What is the simplex and how does it relate to the sample space of compositional data?

A
  • the simplex is the sample space of compositional data under the original (historic) definition
  • this definition assumes the component values are proportions that sum to one
  • constrained subset of the Euclidean space where all components are non-negative and sum to a constant
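
Written out, the description above corresponds to the following set. The symbols D (number of components) and κ (the constant total) are notation introduced here for illustration; κ = 1 gives proportions.

```latex
\mathcal{S}^{D} = \left\{ \mathbf{x} = (x_1, \dots, x_D) \in \mathbb{R}^{D} \;:\; x_i \ge 0 \ \text{for } i = 1, \dots, D, \quad \sum_{i=1}^{D} x_i = \kappa \right\}
```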
4
Q

Why are log-ratio transformations commonly used in compositional data analysis?

A
  • they map the compositional values to an unconstrained space (e.g. Euclidean space)
  • this allows standard statistical modelling frameworks to be applied to the transformed values
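
As an illustration, two widely used log-ratio transformations are the centred log-ratio (clr) and the additive log-ratio (alr). The sketch below assumes strictly positive parts; the choice of the last part as the alr reference and the example composition are arbitrary.

```python
import math

def clr(x):
    """Centred log-ratio: log of each part relative to the geometric mean."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [l - mean_log for l in logs]

def alr(x):
    """Additive log-ratio, using the last part as the reference."""
    return [math.log(v / x[-1]) for v in x[:-1]]

comp = [0.2, 0.3, 0.5]   # hypothetical three-part composition
z_clr = clr(comp)        # three real values that sum to zero
z_alr = alr(comp)        # two unconstrained real values
```

The transformed values can be negative or positive without bound, which is what lets standard models be fitted to them.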
5
Q

What are the limitations of using log-ratio transformations?

A
  • zeros: the log-ratio is undefined when a component is zero
  • missing values: a log-ratio needs complete component values to be correctly defined; with missing components the relative proportions cannot be computed properly, so the results may not be sensible
  • counts: transforming potentially discards information on how the total affects the variance and the values the counts can take; this is more problematic when the total count is small, because a small total reduces the number of unique values the counts can take
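
A small sketch of the zero and small-total points. The helper names are hypothetical; the count of possible compositions uses the standard stars-and-bars formula.

```python
import math

def log_ratio(a, b):
    """Log-ratio of two parts; returns None when a zero makes it undefined."""
    if a <= 0 or b <= 0:
        return None
    return math.log(a / b)

def n_possible_compositions(total, n_parts):
    """Number of distinct count vectors with `n_parts` parts summing to `total`."""
    return math.comb(total + n_parts - 1, n_parts - 1)

undefined = log_ratio(3, 0)                  # a zero part: no log-ratio exists
few = n_possible_compositions(3, 3)          # total count of 3 across 3 parts
many = n_possible_compositions(100, 3)       # total count of 100 across 3 parts
```

With a total of 3 only a handful of count vectors (and hence proportions) are possible, whereas a total of 100 allows thousands, so treating both as continuous proportions hides a real difference.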
6
Q

Can you explain the difference between structural and rounded zeros in compositional data?

A
  • structural zeros - zeros that are considered true zeros and represent a genuine absence of the component
  • rounded zeros - zeros that are not considered true zeros and arise from measurement error or from values falling below the limit of detection
7
Q

How is your work different from traditional compositional data analysis?

A
  • my work considers compositional data outside the traditional compositional data definition - allowing compositional data to be considered in its absolute values rather than forcing it to be defined on the simplex
  • it develops alternative approaches that do not require log-ratio transformations
  • these approaches capture both absolute and relative information, incorporate zeros and missing values, and naturally fit counts
8
Q

How does your thesis challenge the traditional definition of compositional data?

A
  • considers compositional data outside the traditional compositional data definition - allowing compositional data to be considered in its absolute values rather than forcing it to be defined on the simplex
  • recognises that the absolute values can carry important information, as they can influence the variance and overall dynamics of the data
9
Q

What are Aitchison’s principles?

A
  • scale invariance - a change of scale (multiplying every part by the same constant) does not change the ratios between parts, so it should not change the analysis
  • subcompositional coherence - relationships between parts remain valid even when analysing a subset of components
  • subcompositional dominance - the distance between two compositions measured on a subcomposition must not exceed the distance measured on the full composition
  • permutation invariance - order of the components does not impact the analysis
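
A minimal check of three of these principles using a single log-ratio between two parts. The composition and the helper names are hypothetical, and subcompositional dominance is not covered here because it needs a distance between two compositions.

```python
import math

def closure(parts):
    """Rescale parts so they sum to one."""
    s = sum(parts)
    return [p / s for p in parts]

def log_ratio(x, i, j):
    """Log-ratio between parts i and j of composition x."""
    return math.log(x[i] / x[j])

full = closure([10, 30, 60])              # hypothetical three-part composition
scaled = [1000 * p for p in full]         # same composition on a different scale
sub = closure(full[:2])                   # subcomposition of the first two parts
permuted = [full[2], full[0], full[1]]    # same parts in a different order

base = log_ratio(full, 0, 1)
same_after_scaling = math.isclose(base, log_ratio(scaled, 0, 1))
same_in_subcomposition = math.isclose(base, log_ratio(sub, 0, 1))
same_after_permutation = math.isclose(base, log_ratio(permuted, 1, 2))
```

In each case the ratio between the first two parts is unchanged, which is why ratio-based analyses satisfy these principles.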
10
Q

What is spurious correlation in the context of compositional data?

A
  • the problem on which early work on compositional data was based
  • arises due to the sum constraint
  • an increase in one component relative to the total reduces the share of the other components, which induces a negative correlation in the relative values
  • when the compositional data lie outside the simplex (i.e. absolute values are available), correlations computed on the relative values can give misleading information

Example: the number of Ford crashes is not negatively correlated with the number of VW crashes. However, when we consider the proportions, a negative correlation can emerge. An increase in the proportion of Ford crashes relative to the total automatically reduces the proportion of Volkswagen crashes, even if the actual number of Volkswagen crashes remains unchanged.
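
The example can be sketched with made-up numbers. All counts below are hypothetical: VW and "other" crashes are held fixed while Ford crashes rise, and `pearson` is a hand-written Pearson correlation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical yearly crash counts.
ford = [10, 20, 30, 40, 50]
vw = [50, 50, 50, 50, 50]      # the number of VW crashes never changes
other = [40, 40, 40, 40, 40]

totals = [f + v + o for f, v, o in zip(ford, vw, other)]
ford_share = [f / t for f, t in zip(ford, totals)]
vw_share = [v / t for v, t in zip(vw, totals)]

# The VW share falls every year purely because the total grows.
r_shares = pearson(ford_share, vw_share)
```

Here `r_shares` is negative even though nothing happened to VW crashes in absolute terms, so the negative correlation is a product of the shared total alone.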

11
Q

Why is modelling raw counts instead of proportions sometimes preferable?

A
  • Preserves absolute information: the totals are often informative about the overall dynamics and variance of the data
  • Avoids special zero-handling methods: zeros are modelled in their natural form
  • Allows overdispersion modelling: count models can capture variability greater than a simple count distribution would predict
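
One simple way to see overdispersion in raw counts is the dispersion index (variance divided by mean). Using the Poisson distribution as the benchmark (where the index is 1) is an assumption made here for illustration, and the counts are hypothetical.

```python
import statistics

def dispersion_index(counts):
    """Sample variance divided by the mean; values well above 1 suggest
    more variability than a Poisson model would predict."""
    mean = statistics.mean(counts)
    if mean == 0:
        raise ValueError("mean count is zero; index undefined")
    return statistics.variance(counts) / mean

# Hypothetical counts of one component across seven samples, zeros included.
counts = [0, 2, 15, 1, 30, 0, 4]
index = dispersion_index(counts)
```

The zeros enter the calculation directly, with no replacement or imputation step, and an index far above 1 would point towards a count model with an explicit overdispersion parameter.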
12
Q

Why is it important to preserve both absolute and relative information in compositions?

A

Preserving both aspects allows a richer understanding of the data:
  • Relative information: captures proportional relationships (e.g., dominance of one component).
  • Absolute information: reflects scale (e.g., total abundance), which can be crucial in classification, surveillance, or ecological contexts.
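
A short sketch of why the total matters even when the proportion is the same. It assumes a simple binomial model for one component, which is an illustrative choice and not something the cards specify; the counts are hypothetical.

```python
import math

def proportion_with_se(count, total):
    """Observed proportion and its binomial standard error."""
    p = count / total
    se = math.sqrt(p * (1 - p) / total)
    return p, se

small = proportion_with_se(2, 4)         # 2 out of 4
large = proportion_with_se(500, 1000)    # 500 out of 1000
# Same relative information (0.5), very different uncertainty.
```

Both samples give a proportion of one half, but the small total carries far more uncertainty, which is lost if only the proportions are kept.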

13
Q

How do ternary diagrams work and what are their limitations?

A
  • a way of visualising a three-part composition as a point inside a triangle
  • a point near a vertex has a high concentration of that component; a point near the centre has roughly equal proportions of all components
  • limitations: limited to 3 components, difficult to interpret densities or uncertainty
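
The placement of points can be sketched as a mapping from three parts to 2D coordinates. The vertex layout (first part bottom-left, second part bottom-right, third part top) is one common convention and is assumed here; the function name is hypothetical.

```python
import math

def ternary_xy(a, b, c):
    """Map a three-part composition to (x, y) inside an equilateral triangle
    with vertices (0, 0), (1, 0) and (0.5, sqrt(3)/2)."""
    total = a + b + c
    if total <= 0:
        raise ValueError("parts must have a positive total")
    a, b, c = a / total, b / total, c / total   # close to proportions
    x = b + 0.5 * c
    y = (math.sqrt(3) / 2) * c
    return x, y

vertex_c = ternary_xy(0, 0, 1)    # all of the third part: top vertex
centre = ternary_xy(1, 1, 1)      # equal parts: centre of the triangle
```

Because the parts are closed to proportions before plotting, two samples with the same proportions but different totals land on the same point.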