misc test 3 prep Flashcards
(10 cards)
Clustering and LDA both separate data into groups. What is the main difference between Clustering and LDA?
With clustering, the groups are not known in advance
What question does canonical correlation analysis answer?
What, if any, relationships between groups of variables exist, and how strong are they?
T or F: If the correlation between U1 and V1 is .95, this means that there is a significant relationship between these groups of variables.
False. The canonical correlation tells us about the strength of the relationship, not its significance. Any combination of strong/weak and significant/insignificant is possible.
T or F: Scaling values before a cluster analysis can change the results
True. If variables have differing units, one may dominate the distance calculation if the data are not scaled.
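A minimal sketch of why this happens, using hypothetical two-variable points (height in meters, income in dollars are assumed units, not from the cards): the variable with the larger numeric range dominates Euclidean distance until the variables are rescaled.

```python
import math

# Hypothetical data: (height in meters, income in dollars).
a = (1.60, 30_000.0)   # point A
b = (1.90, 30_100.0)   # point B: very different height, similar income
c = (1.61, 35_000.0)   # point C: similar height, different income

def dist(p, q):
    # Euclidean distance between two points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Unscaled: income differences dwarf height differences,
# so A looks closer to B than to C.
print(dist(a, b) < dist(a, c))  # True

# After a crude rescale (height to cm, income to thousands),
# the ordering flips: A is now closer to C.
a2, b2, c2 = [(h * 100, inc / 1000) for h, inc in (a, b, c)]
print(dist(a2, b2) < dist(a2, c2))  # False
```

Since cluster assignments in k-means are driven entirely by these distances, a change in ordering like this can change which cluster a point lands in.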
T or F: Using Forgy usually results in more spread-out initial centroids than random partition.
True. Since random partition averages a random sample of points to find each centroid, each centroid will be closer to the overall average than a single sampled point would be.
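A quick simulation sketch of this card (1-D toy data, both initialization schemes implemented directly; the data and seed are illustrative assumptions):

```python
import random
import statistics

random.seed(0)
data = [random.gauss(0, 1) for _ in range(1000)]  # 1-D toy data
k = 3

# Forgy: pick k actual data points as the initial centroids.
forgy = random.sample(data, k)

# Random partition: assign every point to a random cluster,
# then average each cluster to get its initial centroid.
labels = [random.randrange(k) for _ in data]
partition = [statistics.mean(x for x, lab in zip(data, labels) if lab == j)
             for j in range(k)]

# Averaged centroids sit near the overall mean, so their spread
# is typically much smaller than the spread of Forgy's centroids.
print(statistics.pstdev(forgy), statistics.pstdev(partition))
```

Each random-partition centroid averages roughly n/k points, so its deviation from the grand mean shrinks like 1/sqrt(n/k), while a Forgy centroid is a raw data point with the full spread of the data.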
T or F: If the original variables are independent, there is a unique solution for the coefficients of U1 and V1
False. There are many solutions, but they are (usually) scalar multiples of each other. Note that every scalar multiple of an eigenvector is also an eigenvector for the same eigenvalue, and loadings are not normalized in CCA, so different algorithms may produce solutions that are scalar multiples of each other.
T or F: If the original variables are dependent, there are multiple solutions for the coefficients of U1 and V1 which are not scalar multiples of each other.
True. We can solve for one variable in terms of the others, substitute it into U1, and obtain a new solution.
T or F: Variables with a VIF score over 5 are highly dependent on other variables.
True. If VIF > 5, then R^2 > 0.8, so the multiple correlation satisfies |r| > .89, indicating a strong linear relationship.
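The arithmetic behind this card, sketched in a few lines (the function name is just illustrative): VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing variable j on the other predictors, so the threshold VIF = 5 corresponds exactly to R^2 = 0.8 and |r| = sqrt(0.8).

```python
import math

def vif(r_squared):
    # VIF_j = 1 / (1 - R_j^2).
    return 1.0 / (1.0 - r_squared)

# VIF > 5  <=>  R^2 > 0.8  <=>  multiple correlation |r| > sqrt(0.8)
print(round(vif(0.8), 6))        # 5.0 at the threshold
print(round(math.sqrt(0.8), 3))  # 0.894
```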
T or F: A higher number of observations makes Bartlett's test more sensitive (more likely to detect a true effect).
True. A higher n makes the chi^2 statistic larger for the same eigenvalues.
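A sketch of the dependence on n, using a common textbook form of Bartlett's statistic for testing that all canonical correlations are zero, chi^2 = -(n - 1 - (p + q + 1)/2) * sum(ln(1 - r_i^2)); the specific numbers below are illustrative assumptions:

```python
import math

def bartlett_chi2(n, p, q, canon_corrs):
    # Bartlett's chi-square statistic (one common form):
    #   chi2 = -(n - 1 - (p + q + 1) / 2) * sum(ln(1 - r_i^2))
    factor = n - 1 - (p + q + 1) / 2
    return -factor * sum(math.log(1 - r * r) for r in canon_corrs)

corrs = [0.6, 0.3]                       # same canonical correlations
small = bartlett_chi2(50, 3, 2, corrs)   # n = 50
large = bartlett_chi2(500, 3, 2, corrs)  # n = 500
print(small < large)  # True: chi^2 grows with n for fixed eigenvalues
```

Since the multiplier in front of the log-sum grows linearly in n while the eigenvalue term stays fixed, the same observed correlations produce a larger chi^2 (and a smaller p-value) with more observations.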
Give one specific way that multicollinearity affects the various calculations we perform
Some potential answers:
Some variables may be overemphasized, and thus contribute more to our analysis than others.
Solutions to maximization problems are not unique up to scaling, so solutions may become unstable.
Matrices may become singular (non-invertible) or nearly singular, which may prevent us from finding inverse matrices or make solutions more sensitive to their inputs.
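The near-singular case above can be sketched with a tiny 2x2 system whose columns are almost collinear (the numbers are illustrative): a perturbation of 0.001 in the right-hand side moves the solution from roughly (2, 0) to roughly (1, 1).

```python
def solve2(a, b, c, d, e, f):
    # Solve [[a, b], [c, d]] @ x = [e, f] by Cramer's rule.
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

# Columns (1, 1) and (1, 1.001) are almost collinear: det ≈ 0.001.
x1 = solve2(1, 1, 1, 1.001, 2, 2)      # solution ≈ (2, 0)
x2 = solve2(1, 1, 1, 1.001, 2, 2.001)  # solution ≈ (1, 1)
print(x1, x2)
```

With exactly collinear columns, det would be zero and the system would have no unique solution at all; near-collinearity is the numerical shadow of that, which is why multicollinear regressors make coefficient estimates unstable.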