Principal Components Analysis Flashcards

(9 cards)

1
Q

What is Principal Components Analysis (PCA)?

A

An unsupervised approach that computes the principal components of a data set and then uses those components to understand the data.

2
Q

How does PCA help with large data sets of correlated variables?

A

PCA summarizes the data with a smaller number of representative variables (the principal components) that collectively explain most of the variability in the original set.
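A minimal NumPy sketch of this idea, assuming synthetic data: three variables are generated from one hypothetical latent factor plus small noise, so they are strongly correlated. The proportion of variance explained (`pve`, an illustrative name) shows that a single component summarizes almost all of their variability.

```python
import numpy as np

# Hypothetical example data: 200 observations of 3 strongly correlated
# variables, all driven by one latent factor plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=200),
    2 * latent + 0.1 * rng.normal(size=200),
    -latent + 0.1 * rng.normal(size=200),
])

# Center each variable, then take the SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Squared singular values are proportional to the variance captured by
# each principal component; normalize them to get proportions.
pve = s**2 / np.sum(s**2)
print(np.round(pve, 3))
```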

3
Q

Describe the PCA process and its significance.

A

Compute the principal components, then plot them against each other to produce low-dimensional views of the data. This is much simpler than examining every pair of the original variables. Because the components are chosen to capture as much variance as possible, they define low-dimensional linear surfaces that are closest to the observations (in terms of average squared Euclidean distance).
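A minimal NumPy sketch, assuming hypothetical random data with 4 variables. It computes the scores on the first two components (the 2-D view one would plot) and the mean squared distance from the observations to the surface spanned by the first m components. The helper `reconstruction_error` is illustrative. An arbitrary 2-D subspace is included for comparison; its error is never smaller than the PCA one.

```python
import numpy as np

# Hypothetical data: 100 observations of 4 correlated variables.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))

Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
loadings = Vt.T  # columns are the principal component loading vectors

# Scores on the first two components: a 2-D view of the data to plot.
scores_2d = Xc @ loadings[:, :2]

def reconstruction_error(m):
    """Mean squared distance from each observation to its projection on
    the m-dimensional linear surface spanned by the first m loadings."""
    V = loadings[:, :m]
    approx = Xc @ V @ V.T
    return np.mean(np.sum((Xc - approx) ** 2, axis=1))

errors = [reconstruction_error(m) for m in range(1, 5)]
print(np.round(errors, 4))

# Compare with an arbitrary 2-D subspace (random orthonormal basis).
Q, _ = np.linalg.qr(rng.normal(size=(4, 2)))
random_error = np.mean(np.sum((Xc - Xc @ Q @ Q.T) ** 2, axis=1))
print(round(random_error, 4), ">=", round(errors[1], 4))
```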

4
Q

Describe the first component of PCA

A

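The normalized linear combination of the features, Z1 = φ11 X1 + φ21 X2 + ... + φp1 Xp, that has the largest variance. "Normalized" means the loadings satisfy φ11² + φ21² + ... + φp1² = 1; without this constraint the variance could be made arbitrarily large.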
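A minimal NumPy sketch, assuming hypothetical synthetic data: the first loading vector (`phi1` here) is taken as the eigenvector of the sample covariance matrix with the largest eigenvalue. The check confirms it is normalized, that its scores have variance equal to that eigenvalue, and that a random normalized combination does no better.

```python
import numpy as np

# Hypothetical data: 150 observations of 3 correlated variables.
rng = np.random.default_rng(2)
X = rng.normal(size=(150, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.3]])
Xc = X - X.mean(axis=0)

# Sample covariance matrix; eigh returns eigenvalues in ascending order.
S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
phi1 = eigvecs[:, -1]          # loading vector of the first component
z1 = Xc @ phi1                 # first principal component scores

print(np.sum(phi1**2))                    # sum of squared loadings
print(np.var(z1, ddof=1), eigvals[-1])    # score variance vs. top eigenvalue

# Any other normalized combination has variance no larger than z1's.
a = rng.normal(size=3)
a /= np.linalg.norm(a)
print(np.var(Xc @ a, ddof=1) <= np.var(z1, ddof=1))
```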

5
Q

Describe the second component of PCA

A

The normalized linear combination of the features that has maximal variance out of all linear combinations that are uncorrelated with the first principal component.
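A minimal NumPy sketch on hypothetical random data: after sorting the covariance eigenvectors by eigenvalue, the scores on the second one (`z2`) are uncorrelated with the first component's scores (`z1`) and have the next-largest variance.

```python
import numpy as np

# Hypothetical data: 200 observations of 4 correlated variables.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)

S = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]     # sort components by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

z1 = Xc @ eigvecs[:, 0]   # first principal component scores
z2 = Xc @ eigvecs[:, 1]   # second principal component scores

# Z2 is uncorrelated with Z1 and has the next-largest variance.
print(np.corrcoef(z1, z2)[0, 1])
print(np.var(z1, ddof=1), np.var(z2, ddof=1))
```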

6
Q

Why is the second component of PCA orthogonal to the first?

A

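So that it captures as much as possible of the variability that remains once the first principal component is in place, rather than re-capturing variation the first component already explains.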
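A minimal NumPy sketch of "what remains after the first component is in place", assuming hypothetical synthetic data. It subtracts each observation's projection onto the first direction and then finds the top direction of the residual. The helper `top_direction` is illustrative. The resulting `phi2` is orthogonal to `phi1`.

```python
import numpy as np

# Hypothetical data: 200 observations of 3 correlated variables.
rng = np.random.default_rng(4)
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
Xc = X - X.mean(axis=0)

def top_direction(M):
    """Unit vector along which the rows of M have the largest variance."""
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    return Vt[0]

phi1 = top_direction(Xc)

# Remove each observation's projection onto phi1, leaving only the
# variability that the first component does not explain.
residual = Xc - np.outer(Xc @ phi1, phi1)
phi2 = top_direction(residual)

print(phi1 @ phi2)  # essentially 0: the two directions are orthogonal
```

The direction found this way matches the second principal component direction (up to sign).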

7
Q

What is the zero-correlation condition between the first and second principal components equivalent to?

A

The second principal component direction being perpendicular (orthogonal) to the first principal component direction.
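A short derivation, assuming centered data and using the standard fact that the first loading vector φ1 is the leading eigenvector of the sample covariance matrix S (so S φ1 = λ1 φ1 with λ1 > 0), shows why the two conditions are equivalent:

```latex
% X: n x p column-centered data matrix, S = X^T X / (n-1),
% Z_1 = X\phi_1 and Z_2 = X\phi_2 are the score vectors (mean zero).
\begin{aligned}
\operatorname{Cov}(Z_1, Z_2)
  &= \frac{1}{n-1}\,(X\phi_1)^\top (X\phi_2)
   = \phi_1^\top S\,\phi_2 \\
  &= (S\phi_1)^\top \phi_2 \qquad \text{(since } S \text{ is symmetric)} \\
  &= \lambda_1\, \phi_1^\top \phi_2 .
\end{aligned}
```

Since λ1 > 0, Cov(Z1, Z2) = 0 exactly when φ1ᵀφ2 = 0, that is, when the two loading vectors are orthogonal.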

8
Q

What is the effect of scaling the variables in PCA?

A

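Because PCA chooses components by variance, variables with larger variance (often simply because of their units) dominate the leading components. Variables should always be centered to have mean 0; when they are measured in different units, each should also be scaled to have standard deviation 1 before performing PCA, so that every variable receives equal weight.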
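A minimal NumPy sketch, assuming two hypothetical correlated variables where the second is recorded on a scale 1000 times larger (for example millimetres instead of metres). Without scaling, the first loading vector puts nearly all its weight on the large-scale variable. After standardizing, the weight is shared. The helper `first_loading` is illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
# Hypothetical data: two correlated variables on very different scales.
a = rng.normal(size=n)
b = 0.5 * a + rng.normal(size=n)
X = np.column_stack([a, 1000 * b])

def first_loading(M):
    """Loading vector of the first principal component of M's columns."""
    Mc = M - M.mean(axis=0)              # always center first
    _, _, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Vt[0]

unscaled = first_loading(X)

# Standardize: mean 0 and standard deviation 1 for every column.
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
scaled = first_loading(Xs)

print(np.round(np.abs(unscaled), 3))  # nearly all weight on the large-scale column
print(np.round(np.abs(scaled), 3))    # weight shared between both columns
```

With exactly two standardized variables the first loading vector is always (1, ±1)/√2, so the equal split here is a property of the two-variable case, not of scaling in general.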

9
Q

What is a scree plot and what do we want from it?

A

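A plot of the proportion of variance explained (PVE) by each principal component, used to decide how many components to keep in our analysis. We want the smallest number of principal components that gives a good understanding of the data, typically found by looking for the "elbow" where the PVE of each additional component drops off.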
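A minimal NumPy sketch of the numbers behind a scree plot, assuming hypothetical data generated from two latent factors. It prints a text bar chart of the PVE per component and its cumulative sum. The 90% cumulative-PVE cutoff used to pick `n_keep` is an illustrative assumption; in practice the elbow is often judged by eye.

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical data: 6 variables generated from 2 latent factors plus noise.
factors = rng.normal(size=(250, 2))
X = factors @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(250, 6))

Xc = X - X.mean(axis=0)
Xs = Xc / Xc.std(axis=0, ddof=1)     # standardize before PCA
_, s, _ = np.linalg.svd(Xs, full_matrices=False)

pve = s**2 / np.sum(s**2)            # proportion of variance explained
cum_pve = np.cumsum(pve)

# Text version of a scree plot: one bar per component.
for m, (p, c) in enumerate(zip(pve, cum_pve), start=1):
    print(f"PC{m}: {'#' * int(round(50 * p)):<50} PVE={p:.3f} cumulative={c:.3f}")

# One simple rule (an assumption, not the only choice): keep the smallest
# number of components whose cumulative PVE reaches 90%.
n_keep = int(np.argmax(cum_pve >= 0.90)) + 1
print("components to keep:", n_keep)
```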
