Principal Components Analysis Flashcards
(9 cards)
What is Principal Components Analysis (PCA)?
An unsupervised approach that computes the principal components of a data set and then uses those components to understand the data.
How does PCA help with large data sets of correlated variables?
PCA summarizes the data with a smaller number of representative variables (the principal components) that collectively explain most of the variability in the original set.
What does the PCA process involve, and why is it significant?
Compute the principal components, then plot them against each other to produce low-dimensional views of the data, which are much simpler to inspect than the whole data set. Because the components are chosen to capture as much variance as possible, they define the low-dimensional linear surfaces that are closest to the observations.
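The process above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes the principal component loading vectors are taken as the right singular vectors of the column-centered data, and the function name `principal_components` and the toy data are hypothetical. Plotting `scores[:, 0]` against `scores[:, 1]` gives the two-dimensional view of the data.

```python
import numpy as np

def principal_components(X, n_components=2):
    """Return (scores, loadings) for the first n_components PCs of X.

    X is an (n_samples, n_features) array. Columns are centered first;
    the loading vectors are the right singular vectors of the centered data.
    """
    Xc = X - X.mean(axis=0)               # center each feature at mean 0
    # Economy SVD: Xc = U @ diag(S) @ Vt; rows of Vt are unit-length loading vectors
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T        # shape (n_features, n_components)
    scores = Xc @ loadings                # projection of each observation
    return scores, loadings

rng = np.random.default_rng(0)
# Hypothetical toy data: 100 observations of 5 correlated features
latent = rng.normal(size=(100, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(100, 5))

scores, loadings = principal_components(X, n_components=2)
print(scores.shape)  # (100, 2): one point per observation in the 2-D view
```

The signs of the loading vectors are arbitrary (flipping one flips the corresponding axis of the plot), so two correct implementations can produce mirror-image views.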
Describe the first component of PCA
The normalized linear combination of the features that has the largest variance, so it captures as much of the data's variability as any single direction can. "Normalized" means the squared loadings (coefficients) sum to 1.
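In symbols, using the loading notation common in textbooks (the symbols \phi_{j1} for loadings and x_{ij} for the j-th feature of observation i are introduced here, not taken from the card), for centered features X_1, ..., X_p the first principal component and the optimization that defines its loadings are:

```latex
Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p,
\qquad \sum_{j=1}^{p} \phi_{j1}^2 = 1,

\max_{\phi_{11},\dots,\phi_{p1}} \; \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{j=1}^{p} \phi_{j1} x_{ij} \Big)^2
\quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^2 = 1 .
```

Without the normalization constraint the variance could be made arbitrarily large simply by scaling the loadings up.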
Describe the second component of PCA
Among all normalized linear combinations of the features that are uncorrelated with the first principal component, it is the one with maximal variance.
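Written out, with Z_1 the first principal component and \phi_{j2} the second component's loadings (notation introduced here for illustration), the second principal component is:

```latex
Z_2 = \phi_{12} X_1 + \phi_{22} X_2 + \cdots + \phi_{p2} X_p,
\qquad \sum_{j=1}^{p} \phi_{j2}^2 = 1,
\qquad \operatorname{Cor}(Z_1, Z_2) = 0,
```

with the loadings \phi_{j2} chosen to maximize \operatorname{Var}(Z_2) subject to these two constraints.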
Why is the second component of PCA orthogonal to the first?
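Requiring the second component to be uncorrelated with the first forces its direction to be orthogonal to the first component's direction. It therefore summarizes as much as possible of the remaining variability, the variation not already captured once the first principal component is in place.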
What is the zero-correlation condition between the first and second components equivalent to?
Constraining the direction of the second principal component to be perpendicular (orthogonal) to the direction of the first principal component.
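The equivalence can be checked numerically. This sketch assumes, as is standard, that the loading vectors are the right singular vectors of the centered data; the toy data and variable names (`phi1`, `phi2`, `z1`, `z2`) are hypothetical. It shows that the two loading vectors are orthogonal and that the resulting component scores are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical toy data: 200 observations of 4 correlated features
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)  # center each feature

# Loading vectors phi1, phi2 are the first two right singular vectors
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi1, phi2 = Vt[0], Vt[1]

z1 = Xc @ phi1  # first principal component scores
z2 = Xc @ phi2  # second principal component scores

print(np.dot(phi1, phi2))         # ~0: the directions are orthogonal
print(np.corrcoef(z1, z2)[0, 1])  # ~0: the scores are uncorrelated
```

Both printed values are zero up to floating-point rounding, not exactly zero.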
What is the effect of scaling the variables in PCA?
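Because PCA places components along directions of maximal variance, the results depend on how each variable is scaled: a variable with a much larger variance (for example, simply because of its units) will dominate the first components. So variables should be centered to have mean 0 and, typically, scaled to have standard deviation 1 before performing PCA, which gives every variable equal weight. Scaling may be unnecessary when all variables are measured in the same units.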
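A small experiment illustrates the effect. In this sketch the data are hypothetical: two independent features with similar spread, where the second is recorded in units 1000 times larger. The helper `first_loading` is also hypothetical and takes the first right singular vector of the centered data as the first loading vector.

```python
import numpy as np

def first_loading(X):
    """Loading vector of the first principal component (columns centered)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(2)
# Hypothetical data: two independent features with similar spread,
# but the second is recorded in units 1000x larger
X = rng.normal(size=(500, 2))
X[:, 1] *= 1000.0

unscaled = first_loading(X)
# Standardize: mean 0 and standard deviation 1 for every column
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
scaled = first_loading(Xs)

print(np.abs(unscaled))  # close to [0, 1]: the large-unit feature dominates PC1
print(np.abs(scaled))    # both weights about 0.707: equal contribution
```

With only two standardized features, the covariance matrix is the 2 x 2 correlation matrix, whose eigenvectors always have entries of magnitude 1/sqrt(2), which is why both scaled weights come out equal here.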
What is a scree plot and what do we want from it?
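Used to identify the number of principal components to keep in our analysis. It plots the proportion of variance explained (PVE) by each component, and we want the smallest number of principal components required to get a good understanding of the data, often chosen by looking for the point (an "elbow") where the curve levels off.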
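The values a scree plot displays can be computed as below. This is a sketch under stated assumptions: each component's variance is taken as proportional to its squared singular value of the centered (and, by default, standardized) data, and the function name `proportion_variance_explained` and the toy data are hypothetical. Plotting `pve` (and optionally `cumulative`) against component number gives the scree plot.

```python
import numpy as np

def proportion_variance_explained(X, standardize=True):
    """Proportion of variance explained (PVE) by each principal component."""
    Xc = X - X.mean(axis=0)
    if standardize:
        Xc = Xc / X.std(axis=0)  # mean 0, standard deviation 1
    # Singular values come back sorted in descending order;
    # their squares are proportional to each component's variance
    S = np.linalg.svd(Xc, compute_uv=False)
    var = S**2
    return var / var.sum()

rng = np.random.default_rng(3)
# Hypothetical toy data: 6 features driven mostly by 2 latent factors
latent = rng.normal(size=(300, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(300, 6))

pve = proportion_variance_explained(X)
cumulative = np.cumsum(pve)
for k, (p, c) in enumerate(zip(pve, cumulative), start=1):
    print(f"PC{k}: PVE={p:.3f}  cumulative={c:.3f}")
```

For data like these, built from two underlying factors, the PVE typically drops sharply after the second component, which is the kind of elbow the scree plot is meant to reveal.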