canonical clustering Flashcards

Question 1

Q

What is the purpose of Canonical Correlation Analysis (CCA)?

Answer

A

To analyze relationships between two sets of variables by finding linear combinations that are maximally correlated.

Question 2

Q

Give a real-world example where CCA can be used.

Answer

A

To assess whether gene frequencies in populations relate to environmental variables or whether personality traits relate to communication styles.

Question 3

Q

How does CCA differ from simple correlation?

Answer

A

CCA looks at correlations between sets of variables, not just between individual variables.

Question 4

Q

What are canonical variates?

Answer

A

Linear combinations of variables from each set that are maximally correlated with each other.

Question 5

Q

What is the role of eigenvalues in CCA?

Answer

A

They represent the squared canonical correlations and indicate the strength of the relationship between the canonical variates.

Question 6

Q

What does a large eigenvalue (close to 1) indicate?

Answer

A

A strong relationship between the canonical variates from the two sets.

Question 7

Q

What does an eigenvalue close to 0 mean?

Answer

A

That the corresponding canonical variate pair has a weak or no relationship.

Question 8

Q

How do you interpret the coefficients in canonical variates?

Answer

A

Variables with large positive or negative coefficients are most associated with that canonical variate.

Question 9

Q

What is tested in hypothesis testing for CCA?

Answer

A

Whether the canonical correlations are significantly different from 0, i.e., whether a meaningful relationship exists between the two sets.

Question 10

Q

Why should categorical variables be excluded in CCA?

Answer

A

Because CCA is based on correlation, which is defined for continuous variables. Including categorical variables can distort the analysis.

Question 11

Q

How many canonical variate pairs can be formed in CCA?

Answer

A

The number of canonical variate pairs equals the smaller of the numbers of variables in the two sets: min(p, q).

Question 12

Q

Why should data be standardized before running CCA?

Answer

A

Because variables with larger variance can dominate the canonical weights; standardizing gives each variable equal influence.
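
As an illustration of standardizing, here is a minimal sketch that converts each variable to z-scores before any CCA is run. The variable names and values (height_cm, income) are hypothetical and are not part of the deck.

```python
import statistics

def standardize(column):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1 in the denominator)
    return [(x - mean) / sd for x in column]

# Hypothetical measurements on very different scales.
height_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
income = [20000.0, 35000.0, 50000.0, 65000.0, 80000.0]

z_height = standardize(height_cm)
z_income = standardize(income)

# After standardizing, both variables have mean 0 and standard deviation 1,
# so neither dominates simply because of its units.
print(z_height)
print(z_income)
```

Both lists end up on the same scale even though the raw incomes are hundreds of times larger than the raw heights.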

Question 13

Q

What does the matrix B⁻¹CᵗA⁻¹C represent in CCA?

Answer

A

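It represents the relationships between the two sets of variables after removing shared variation within each set. In the usual notation, A and B are the within-set correlation matrices and C is the cross-correlation matrix between the sets; the eigenvalues of B⁻¹CᵗA⁻¹C are the squared canonical correlations.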
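
A rough sketch of how this matrix might be computed with NumPy follows. It assumes the common textbook notation in which A is the correlation matrix within the X set, B the correlation matrix within the Y set, and C the p × q cross-correlation matrix; the card itself does not define these symbols. The function name and the simulated data are illustrative only.

```python
import numpy as np

def squared_canonical_correlations(X, Y):
    """Eigenvalues of B^-1 C^T A^-1 C, largest first.

    X is an (n, p) array and Y is an (n, q) array; rows are observations.
    """
    p, q = X.shape[1], Y.shape[1]
    R = np.corrcoef(np.hstack([X, Y]), rowvar=False)  # (p+q, p+q) correlations
    A = R[:p, :p]   # assumed: correlations within the X set
    B = R[p:, p:]   # assumed: correlations within the Y set
    C = R[:p, p:]   # assumed: cross-correlations, shape (p, q)
    # solve() avoids forming explicit inverses: M = B^-1 C^T A^-1 C
    M = np.linalg.solve(B, C.T @ np.linalg.solve(A, C))
    eigenvalues = np.sort(np.linalg.eigvals(M).real)[::-1]
    # Keep min(p, q) values and clip tiny numerical overshoots outside [0, 1].
    return np.clip(eigenvalues[: min(p, q)], 0.0, 1.0)

# Simulated data: Y depends linearly on X plus noise (purely illustrative).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X[:, :2] @ np.array([[1.0, 0.5], [0.0, 1.0]]) + rng.normal(size=(200, 2))

lam = squared_canonical_correlations(X, Y)
print(lam)           # squared canonical correlations
print(np.sqrt(lam))  # canonical correlations
```

With one variable in each set the single eigenvalue reduces to the squared Pearson correlation, which is a convenient sanity check.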

Question 14

Q

What does it mean for canonical pairs to be orthogonal?

Answer

A

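Variates from different pairs are uncorrelated with each other: U₁ is uncorrelated with U₂ and V₂, V₁ is uncorrelated with V₂ and U₂, and so on. Only the two variates within a pair, such as (U₁, V₁), are correlated, so each pair captures a separate pattern of association.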

Question 15

Q

What is redundancy in CCA?

Answer

A

Redundancy refers to how much variance in one set of variables is explained by the canonical variates from the other set.

Question 16

Q

What is Bartlett’s test?

Answer

A

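Bartlett’s test assesses whether the canonical correlations are statistically significant. The test statistic is χ² = –(n – (p + q + 3)/2) × ∑ ln(1 – λᵢ), where n is the sample size, p and q are the numbers of variables in the two sets, and the sum runs over the eigenvalues λᵢ (the squared canonical correlations). Under the null hypothesis it approximately follows a chi-square distribution with pq degrees of freedom.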
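
The formula above can be sketched in a few lines of Python. The function name and the example eigenvalues, sample size, and set sizes are hypothetical and chosen only to show the arithmetic.

```python
import math

def bartlett_statistic(eigenvalues, n, p, q):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test.

    eigenvalues: squared canonical correlations, each strictly below 1
    n: sample size; p, q: number of variables in each set
    """
    factor = n - (p + q + 3) / 2.0
    chi2 = -factor * sum(math.log(1.0 - lam) for lam in eigenvalues)
    return chi2, p * q

# Hypothetical inputs: two eigenvalues from sets with p = 2 and q = 3 variables.
chi2, df = bartlett_statistic([0.5, 0.1], n=50, p=2, q=3)
print(chi2, df)
```

Larger eigenvalues make ln(1 – λᵢ) more negative, so the statistic grows as the canonical correlations get stronger.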

Question 17

Q

What should the null hypothesis be when we run CCA?

Answer

A

That all canonical correlations are zero, i.e., there is no linear relationship between the X variables and the Y variables.

Question 18

Q

How do we reject the null?

Answer

A

With a large χ² (and therefore a small p-value), meaning that at least one canonical correlation is significantly different from 0.

Equivalently, reject when the test statistic is greater than the critical value.
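
The decision rule can be sketched without a statistics library when the degrees of freedom are even, because the chi-square upper-tail probability then has a closed form. This is an assumption made only to keep the example self-contained; a library routine such as scipy.stats.chi2.sf handles any degrees of freedom. The function names and the 0.05 significance level are illustrative.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X > x), valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form shown here needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    # sf = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def reject_null(chi2_stat, df, alpha=0.05):
    """Reject the null when the p-value falls below the chosen alpha."""
    return chi2_sf_even_df(chi2_stat, df) < alpha

# Hypothetical statistics with pq = 6 degrees of freedom.
print(reject_null(36.7, 6))  # large statistic, small p-value
print(reject_null(5.0, 6))   # small statistic, large p-value
```

Comparing the p-value with alpha gives the same decision as comparing the statistic with the critical value at that alpha.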

(18 cards)