Principal Components Analysis Flashcards

Question 1

Q

What is Principal Components Analysis (PCA)?

Answer

A

An unsupervised technique: the process by which principal components are computed, and the subsequent use of those components to understand the data.

Question 2

Q

How does PCA help with large data sets of correlated variables?

Answer

A

PCA summarizes the data with a smaller number of representative variables that collectively explain most of the variability in the original set.

Question 3

Q

Describe the PCA process and its significance.

Answer

A

Compute the principal components, then plot them against each other to produce low-dimensional views of the data. This is much simpler than looking at the whole dataset. Because the components follow the directions of highest variance, they also define the low-dimensional linear surfaces that are closest to the observations.
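
The process described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `pca_scores`, the synthetic data, and the choice of SVD as the way to compute the components are all assumptions made for the example.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return (scores, loadings) for the first n_components principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)              # center each variable at mean 0
    # Rows of Vt are the component directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[:n_components].T       # shape (p, n_components)
    scores = Xc @ loadings               # shape (n, n_components)
    return scores, loadings

# Synthetic data (illustrative only): 200 observations of 5 correlated variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(200, 5))

scores, loadings = pca_scores(X, n_components=2)
# Plotting scores[:, 0] against scores[:, 1] gives a two-dimensional view of the data.
```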

Question 4

Q

Describe the first component of PCA

Answer

A

The normalized linear combination of the features that has the largest variance. It captures as much of the variability in the data as a single direction can.
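
Written out, the definition above can be stated as a small optimization problem. The symbols are assumptions introduced for this sketch (they follow a common textbook convention): the features X_1, ..., X_p are taken to be centered, x_ij is the value of feature j for observation i, and phi_j1 are the loadings of the first component. "Normalized" means the squared loadings sum to 1.

```latex
Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p,
\qquad \sum_{j=1}^{p} \phi_{j1}^2 = 1

\max_{\phi_{11},\ldots,\phi_{p1}}
\frac{1}{n} \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{p} \phi_{j1} x_{ij} \Bigr)^{2}
\quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^2 = 1
```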

Question 5

Q

Describe the second component of PCA

Answer

A

The linear combination of the features that has maximal variance among all linear combinations that are uncorrelated with the first principal component.

Question 6

Q

Why is the second component of PCA orthogonal to the first?

Answer

A

So that it captures as much as possible of the variability that remains once the first principal component is in place, without repeating what the first component already explains.

Question 7

Q

What is the zero-correlation condition between the first and second components equivalent to?

Answer

A

The direction of the second component being perpendicular (orthogonal) to the direction of the first principal component.
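
The equivalence can be checked numerically. The sketch below uses synthetic data and computes the components with an SVD (both are assumptions of the example); it then compares the dot product of the two loading vectors with the correlation of the two score vectors. Both should be zero up to floating-point error.

```python
import numpy as np

# Synthetic correlated data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 3))
Xc = X - X.mean(axis=0)

_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
phi1, phi2 = Vt[0], Vt[1]            # first and second loading vectors
z1, z2 = Xc @ phi1, Xc @ phi2        # first and second component scores

dot = float(phi1 @ phi2)                   # orthogonality of the directions
corr = float(np.corrcoef(z1, z2)[0, 1])    # correlation of the scores
print(f"phi1 . phi2 = {dot:.2e}, corr(z1, z2) = {corr:.2e}")
```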

Question 8

Q

What is the effect of scaling the variables in PCA?

Answer

A

Variance drives where the components land, so a variable measured on a larger scale will dominate the loadings. Variables should therefore be centered to have mean 0 and, in most cases, scaled to have standard deviation 1 before PCA is performed, so that each variable carries comparable weight.
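
A small experiment makes the effect visible. In this sketch (synthetic data; the helper name `first_loading` is hypothetical), one of two correlated variables is multiplied by 1000, as if it were recorded in much smaller units. Without scaling, the first loading vector points almost entirely along that variable; after standardizing, the two variables share the weight.

```python
import numpy as np

def first_loading(X):
    """First principal component loading vector of X (columns are variables)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

rng = np.random.default_rng(2)
n = 500
a = rng.normal(size=n)
b = 0.5 * a + rng.normal(size=n)          # positively correlated with a
X = np.column_stack([a, 1000.0 * b])      # second variable on a much larger scale

raw = first_loading(X)                    # dominated by the large-scale variable

# Standardize: mean 0 and standard deviation 1 for every variable.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scaled = first_loading(Z)                 # weight shared between both variables

print("unscaled loadings:", np.round(np.abs(raw), 3))
print("scaled loadings:  ", np.round(np.abs(scaled), 3))
```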

Question 9

Q

What is a scree plot and what do we want from it?

Answer

A

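A plot of the proportion of variance explained by each principal component, used to decide how many components to keep in the analysis. We want the smallest number of principal components required for a good understanding of the data, often found by looking for an elbow where the curve levels off.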
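
The quantities a scree plot displays can be computed directly. This is a sketch under stated assumptions: the data are synthetic, the function name `proportion_variance_explained` is hypothetical, and the variances are taken from the squared singular values of the centered data. Plotting `pve` against the component number gives the scree plot itself.

```python
import numpy as np

def proportion_variance_explained(X):
    """Proportion of total variance explained by each principal component."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variances = singular_values ** 2      # proportional to each component's variance
    return variances / variances.sum()

# Synthetic data (illustrative only): 6 variables driven by 2 underlying factors.
rng = np.random.default_rng(3)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))

pve = proportion_variance_explained(X)
cumulative = np.cumsum(pve)
for k, (p, c) in enumerate(zip(pve, cumulative), start=1):
    print(f"PC{k}: PVE = {p:.3f}, cumulative = {c:.3f}")
```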

(9 cards)