canonical clustering Flashcards

(18 cards)

1
Q

What is the purpose of Canonical Correlation Analysis (CCA)?

A

To analyze relationships between two sets of variables by finding linear combinations that are maximally correlated.

2
Q

Give a real-world example where CCA can be used.

A

To assess whether gene frequencies in populations relate to environmental variables or whether personality traits relate to communication styles.

3
Q

How does CCA differ from simple correlation?

A

CCA looks at correlations between sets of variables, not just between individual variables.

4
Q

What are canonical variates?

A

Linear combinations of variables from each set that are maximally correlated with each other.

5
Q

What is the role of eigenvalues in CCA?

A

They represent the squared canonical correlations and indicate the strength of the relationship between the canonical variates.

6
Q

What does a large eigenvalue (close to 1) indicate?

A

A strong relationship between the canonical variates from the two sets.

7
Q

What does an eigenvalue close to 0 mean?

A

That the corresponding canonical variate pair has a weak or no relationship.

8
Q

How do you interpret the coefficients in canonical variates?

A

Variables with large positive or negative coefficients are most associated with that canonical variate.

9
Q

What is tested in hypothesis testing for CCA?

A

Whether the canonical correlations are significantly different from 0, i.e., whether a meaningful relationship exists between the two sets.

10
Q

Why should categorical variables be excluded in CCA?

A

Because CCA is based on correlation, which is defined for continuous variables. Including categorical variables can distort the analysis.

11
Q

How many canonical variate pairs can be formed in CCA?

A

The number of canonical variate pairs equals the smaller of the two set sizes: min(p, q), where p and q are the numbers of variables in the two sets.

12
Q

Why should data be standardized before running CCA?

A

Because variables with larger variance can dominate the canonical weights; standardizing gives each variable equal influence.
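
As a rough illustration of the standardization step, the sketch below z-scores each column so every variable has mean 0 and unit variance before CCA is run. The helper name `standardize` and the small data array are made up for this example and are not from the original cards. Rescaling changes the canonical coefficients, not the canonical correlations themselves.

```python
import numpy as np

def standardize(X):
    """Z-score each column: subtract the column mean, divide by the sample std."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Hypothetical data: two variables on very different scales (values are invented).
data = np.array([
    [170.0, 0.002],
    [160.0, 0.004],
    [180.0, 0.003],
    [175.0, 0.001],
])

Z = standardize(data)
print(Z.mean(axis=0))         # each column mean is (numerically) zero
print(Z.std(axis=0, ddof=1))  # each column standard deviation is one
```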

13
Q

What does the matrix B⁻¹CᵗA⁻¹C represent in CCA?

A

It represents the relationships between the two sets of variables after removing shared variation within each set. Its eigenvalues are the squared canonical correlations.
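
A minimal NumPy sketch of this matrix, under assumptions the card does not state: here A is taken to be the covariance matrix within the X set (p × p), B the covariance matrix within the Y set (q × q), and C the p × q cross-covariance between the sets. The function name and the simulated data are hypothetical, chosen only to show that the eigenvalues behave like squared canonical correlations.

```python
import numpy as np

def squared_canonical_correlations(X, Y):
    """Eigenvalues of B^-1 C^T A^-1 C, sorted from largest to smallest."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    p, q = X.shape[1], Y.shape[1]
    # Joint covariance matrix of all p + q variables (columns are variables).
    S = np.cov(np.hstack([X, Y]), rowvar=False)
    A = S[:p, :p]  # assumed: covariance within the X set
    B = S[p:, p:]  # assumed: covariance within the Y set
    C = S[:p, p:]  # assumed: cross-covariance between the sets
    # Use solve() instead of forming explicit inverses.
    M = np.linalg.solve(B, C.T @ np.linalg.solve(A, C))
    eigvals = np.sort(np.linalg.eigvals(M).real)[::-1]
    # Only min(p, q) eigenvalues can be non-zero.
    return eigvals[:min(p, q)]

# Simulated data: the first Y variable shares signal with the first X variable.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
Y = np.column_stack([X[:, 0] + 0.5 * rng.normal(size=n), rng.normal(size=n)])

lam = squared_canonical_correlations(X, Y)
print(lam)           # one large eigenvalue, one close to zero
print(np.sqrt(lam))  # the canonical correlations
```

With p = 3 and q = 2, the function returns two eigenvalues, matching the min(p, q) rule.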

14
Q

What does it mean for canonical pairs to be orthogonal?

A

Canonical variates from different pairs are uncorrelated: for i ≠ j, Uᵢ is uncorrelated with both Uⱼ and Vⱼ. Each pair (U₁, V₁), (U₂, V₂), etc., therefore captures a separate pattern of association.

15
Q

What is redundancy in CCA?

A

Redundancy refers to how much variance in one set of variables is explained by the canonical variates from the other set.

16
Q

What is Bartlett’s test?

A

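Bartlett’s test assesses whether the canonical correlations are statistically significant. The test statistic is χ² = −(n − (p + q + 3)/2) × ∑ ln(1 − λᵢ), where n is the sample size, p and q are the numbers of variables in the two sets, and λᵢ are the eigenvalues (squared canonical correlations). Under the null hypothesis it approximately follows a chi-square distribution with pq degrees of freedom.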
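
The statistic can be computed directly from the eigenvalues. The sketch below is one way to do it in plain Python; the function name `bartlett_statistic` and the example eigenvalues, sample size, and set sizes are invented for illustration.

```python
import math

def bartlett_statistic(eigenvalues, n, p, q):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test.

    eigenvalues: squared canonical correlations, each strictly below 1
    n: sample size; p, q: number of variables in each set
    """
    factor = n - (p + q + 3) / 2
    chi2 = -factor * sum(math.log(1.0 - lam) for lam in eigenvalues)
    dof = p * q
    return chi2, dof

# Hypothetical inputs: two eigenvalues from a study with n = 50, p = 3, q = 2.
chi2_stat, dof = bartlett_statistic([0.6, 0.1], n=50, p=3, q=2)
print(round(chi2_stat, 2), dof)  # about 47.0 with 6 degrees of freedom
```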

17
Q

What should the null hypothesis be when we do CCA?

A

That there is no correlation between the X set and the Y set, i.e., all canonical correlations are zero.

18
Q

How do we reject the null?

A

With a high χ² (and therefore a low p-value), meaning that there IS a significant correlation between at least one pair of canonical variates across the two sets.

Equivalently, we reject when the test statistic is greater than the critical value.
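
The decision rule can be sketched with SciPy's chi-square distribution. The helper `reject_null`, the 5% significance level, and the example statistic (47.0 with 6 degrees of freedom) are assumptions made for this illustration only.

```python
from scipy.stats import chi2

def reject_null(test_statistic, dof, alpha=0.05):
    """Compare a chi-square statistic with its critical value and p-value."""
    critical_value = chi2.ppf(1.0 - alpha, dof)  # upper-tail cutoff
    p_value = chi2.sf(test_statistic, dof)       # P(chi-square > statistic)
    return bool(test_statistic > critical_value), critical_value, p_value

# Hypothetical statistic and degrees of freedom.
reject, crit, p_val = reject_null(47.0, dof=6)
print(reject, round(crit, 2), p_val)
```

Both criteria agree: the statistic exceeds the critical value exactly when the p-value falls below alpha.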