canonical clustering Flashcards
(18 cards)
What is the purpose of Canonical Correlation Analysis (CCA)?
To analyze relationships between two sets of variables by finding linear combinations that are maximally correlated.
Give a real-world example where CCA can be used.
To assess whether gene frequencies in populations relate to environmental variables or whether personality traits relate to communication styles.
How does CCA differ from simple correlation?
CCA looks at correlations between sets of variables, not just between individual variables.
What are canonical variates?
Linear combinations of variables from each set that are maximally correlated with each other.
What is the role of eigenvalues in CCA?
They represent the squared canonical correlations and indicate the strength of the relationship between the canonical variates.
What does a large eigenvalue (close to 1) indicate?
A strong relationship between the canonical variates from the two sets.
What does an eigenvalue close to 0 mean?
That the corresponding canonical variate pair has a weak or no relationship.
How do you interpret the coefficients in canonical variates?
Variables with large positive or negative coefficients are most associated with that canonical variate.
What is tested in hypothesis testing for CCA?
Whether the canonical correlations are significantly different from 0, i.e., whether a meaningful relationship exists between the two sets.
Why should categorical variables be excluded in CCA?
Because CCA is based on correlation, which is defined for continuous variables. Including categorical variables can distort the analysis.
How many canonical variate pairs can be formed in CCA?
The number of canonical variate pairs equals the smaller of the two set sizes: min(p, q), where p and q are the numbers of variables in each set.
Why should data be standardized before running CCA?
Because raw canonical coefficients depend on each variable's units and scale; standardizing makes the coefficients comparable across variables. The canonical correlations themselves do not change when variables are rescaled.
What does the matrix B⁻¹CᵗA⁻¹C represent in CCA?
It represents the relationships between the two sets of variables after removing shared variation within each set, used to compute canonical correlations.
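As a rough numerical illustration of this card, the sketch below builds the matrix from data with NumPy and takes its eigenvalues, which are the squared canonical correlations. It assumes A and B are the within-set correlation matrices of the first set (p variables) and second set (q variables) and C is the p × q between-set correlation matrix. The function name and the synthetic data are made up for illustration.

```python
import numpy as np

def squared_canonical_correlations(X, Y):
    """Eigenvalues of B^-1 C^T A^-1 C, sorted from largest to smallest."""
    p = X.shape[1]
    # Joint correlation matrix of all p + q variables (columns are variables).
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)
    A = R[:p, :p]   # within-set correlations of X (p x p), assumed definition
    B = R[p:, p:]   # within-set correlations of Y (q x q), assumed definition
    C = R[:p, p:]   # between-set correlations (p x q), assumed definition
    # solve(A, C) gives A^-1 C without forming the inverse explicitly.
    M = np.linalg.solve(B, C.T @ np.linalg.solve(A, C))
    eigvals = np.linalg.eigvals(M).real
    # Clip tiny numerical overshoots so the values stay in [0, 1].
    return np.clip(np.sort(eigvals)[::-1], 0.0, 1.0)

# Synthetic data (illustrative only): the first Y variable shares a signal with X.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
Y = np.column_stack([
    X[:, 0] + 0.5 * rng.normal(size=n),
    rng.normal(size=n),
])

lam = squared_canonical_correlations(X, Y)
print(lam)           # squared canonical correlations, one per canonical pair
print(np.sqrt(lam))  # canonical correlations
```

Here q = 2 is the smaller set, so the q × q matrix yields min(p, q) = 2 eigenvalues. If q were larger than p, the extra q − p eigenvalues would be zero.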
What does it mean for canonical pairs to be orthogonal?
Each pair of canonical variates (U₁, V₁), (U₂, V₂), etc., is uncorrelated with every other pair, so each pair captures a separate pattern of association.
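One way to see this numerically is to compute the canonical variates and inspect their correlation matrix. The sketch below uses a QR-plus-SVD route (a swapped-in technique that gives the same canonical correlations as the eigenvalue approach) and assumes both data matrices have full column rank. The function name and the synthetic data are hypothetical.

```python
import numpy as np

def canonical_variates(X, Y):
    """Return (U, V, r): canonical variates and canonical correlations."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered data (thin QR).
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # The singular values of Qx^T Qy are the canonical correlations.
    Wx, r, Wyt = np.linalg.svd(Qx.T @ Qy, full_matrices=False)
    # Rescale so each variate has unit sample variance.
    U = Qx @ Wx * np.sqrt(n - 1)
    V = Qy @ Wyt.T * np.sqrt(n - 1)
    return U, V, r

# Synthetic data (illustrative only).
rng = np.random.default_rng(1)
n = 400
X = rng.normal(size=(n, 3))
Y = np.column_stack([
    X[:, 0] + 0.5 * rng.normal(size=n),
    X[:, 1] + 2.0 * rng.normal(size=n),
])

U, V, r = canonical_variates(X, Y)
k = len(r)  # number of canonical pairs, min(p, q)
R = np.corrcoef(np.hstack([U, V]), rowvar=False)
print(np.round(R, 3))
```

In the printed matrix, the U block and the V block are identity matrices, and the cross block is diagonal with the canonical correlations on the diagonal: Uᵢ correlates only with its own partner Vᵢ.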
What is redundancy in CCA?
Redundancy refers to how much variance in one set of variables is explained by the canonical variates from the other set.
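What is Bartlett’s test?
Bartlett’s test assesses whether the canonical correlations are statistically significant. The test statistic χ² = –(n – (p + q + 3)/2) × ∑ ln(1 – λᵢ) approximately follows a chi-square distribution with pq degrees of freedom under the null hypothesis, where n is the sample size, p and q are the numbers of variables in the two sets, and λᵢ are the eigenvalues (squared canonical correlations).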
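The arithmetic of the statistic can be sketched in a few lines of plain Python. The function name and the example numbers (n = 100, p = 3, q = 2, eigenvalues 0.5 and 0.1) are made up purely for illustration.

```python
import math

def bartlett_statistic(n, p, q, eigenvalues):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    # Sum of ln(1 - lambda_i) over all eigenvalues (squared canonical correlations).
    log_sum = sum(math.log(1.0 - lam) for lam in eigenvalues)
    chi2 = -(n - (p + q + 3) / 2.0) * log_sum
    return chi2, p * q

# Hypothetical inputs: 100 observations, sets of 3 and 2 variables.
chi2, df = bartlett_statistic(n=100, p=3, q=2, eigenvalues=[0.5, 0.1])
print(round(chi2, 2), df)  # prints 76.66 6
```

Because each ln(1 – λᵢ) is negative, the leading minus sign makes the statistic positive, and larger eigenvalues push it higher.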
What should the null hypothesis be when we perform CCA?
That there is no relationship between the two sets: all canonical correlations between the X variables and the Y variables are zero.
How do we reject the null?
With a high χ² (and therefore a low p-value): if the test statistic is greater than the critical value, we reject the null and conclude that at least one canonical correlation is significantly different from zero, i.e., the two sets of variables are related.
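The decision rule can be sketched as follows. The critical value 12.592 is the standard tabulated 5% point of the chi-square distribution with 6 degrees of freedom. The p-value helper uses a closed-form expression that holds only when the degrees of freedom are even, an assumption made here to stay within the standard library. All function names and test statistics are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires even degrees of freedom")
    half = x / 2.0
    # P(chi2 > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total

def reject_null(chi2_stat, critical_value):
    """Reject H0 when the statistic exceeds the critical value."""
    return chi2_stat > critical_value

CRITICAL_6DF_05 = 12.592  # tabulated 5% critical value for 6 degrees of freedom

for stat in (76.66, 5.0):  # hypothetical test statistics
    print(stat, reject_null(stat, CRITICAL_6DF_05), chi2_sf_even_df(stat, 6))
```

The two criteria agree: a statistic above the critical value corresponds to a p-value below 0.05, and a statistic below it corresponds to a p-value above 0.05.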