PCA Final Flashcards

(33 cards)

1
Q

PCA

A

Principal Component Analysis

2
Q

PCA is a

A

dimensionality reduction technique

3
Q

Big idea 1: Take a dataset in high-dimensional space

A

and transform it so it can be represented in a low-dimensional space, with minimal or no loss of information

4
Q

Big idea 2: Extract

A

latent information from the data

5
Q

The PCA transformation results in

A

a smaller number of principal components that retain as much of the original dataset's variance as possible, in a lower-dimensional space

6
Q

These principal components are

A

linear combinations of the original variables, and become the new axes of the dataset in the lower-dimensional space

7
Q

3 goals of PCA

A

Feature reduction: reduce the number of features used to represent the data
The reduced feature set should explain a large amount of information (or maximize variance)
Make visible the latent information in the data

8
Q

PCA creates

A

projections (principal components) in the direction that captures most of the variance

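To make this concrete, here is a brief R sketch (the simulated matrix X is invented purely for this illustration, and eigen() is only covered later in this deck): the variance of the data projected onto the first principal direction is at least as large as along any other unit direction.

# Variance is largest along the first principal direction.
# X is simulated data, not from the deck.
set.seed(1)
X <- scale(matrix(rnorm(400), ncol = 2) %*% matrix(c(1, 0.8, 0, 0.6), 2, 2))
v1 <- eigen(cov(X))$vectors[, 1]   # first principal direction
var(X %*% v1)                      # maximal variance over all unit directions
var(X %*% c(1, 0))                 # variance along an arbitrary axis: smaller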
9
Q

Sparser data has

A

greater variance (spread out)

10
Q

Denser data has

A

less variance (clustered together)

11
Q

The projections will always be

A

orthogonal to each other

12
Q

Mathematics behind PCA

A

Eigenvalues and Eigenvectors

13
Q

The mathematical equation

A

Ax = λx: matrix A times eigenvector x equals the eigenvalue λ times the same eigenvector x

14
Q

Eigenvalue and Eigenvector meaning

A

An eigenvector of a matrix is a nonzero vector that does not change direction when it is multiplied by the matrix; it is simply scaled by some factor, and that scaling factor is the eigenvalue

15
Q

Eigenvectors are vectors that

A

remain unchanged in direction when multiplied by A, changing only in magnitude; applying the linear transformation merely scales them

16
Q

When we eigendecompose a matrix (perform eigendecomposition)

A

if the matrix has n columns (n dimensions), we get n eigenvalues and n eigenvectors
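A short R sketch of cards 12-17 (the 3-column matrix A is simulated purely for illustration): a matrix with n = 3 dimensions yields 3 eigenvalues and 3 eigenvectors, each pair satisfies the equation Ax = λx, and the eigenvectors are mutually orthogonal.

# n columns -> n eigenvalues and n eigenvectors.
set.seed(42)
A <- matrix(rnorm(300), ncol = 3)   # 100 observations, 3 dimensions
S <- cov(scale(A))                  # covariance of the standardized data
e <- eigen(S)
e$values                            # 3 eigenvalues, largest to smallest
e$vectors                           # 3 eigenvectors, one per column
# Verify S x = lambda x for the first eigenpair:
all.equal(c(S %*% e$vectors[, 1]), e$values[1] * e$vectors[, 1])
# Eigenvectors of a symmetric matrix are orthogonal (result is the identity):
round(crossprod(e$vectors), 10)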

17
Q

Our matrix/dataset gets decomposed into

A

Eigenvectors
Eigenvalues

18
Q

Should we standardize for PCA?

A

Yes, always standardize (center each variable to mean 0 and scale it to standard deviation 1), so that variables measured on larger scales do not dominate the variance
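As a sketch of what standardizing looks like in R (USArrests is a built-in dataset, chosen here only for illustration):

# Standardize: every column ends up with mean 0 and standard deviation 1.
Z <- scale(USArrests)          # center = TRUE, scale = TRUE by default
round(colMeans(Z), 10)         # all ~0
apply(Z, 2, sd)                # all 1
# prcomp can do the same internally: prcomp(USArrests, center = TRUE, scale. = TRUE)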

19
Q

Five fields returned by prcomp(A, …)

A

sdev
rotation
center
scale
x

20
Q

sdev

A

Square roots of the eigenvalues (the standard deviations of the principal components), ordered from largest to smallest

21
Q

rotation

A

Matrix whose columns contain the eigenvectors (also called principal loadings)

22
Q

center

A

Means of the columns of the matrix A (used for centering)

23
Q

scale

A

Standard deviations of the columns of the matrix A

24
Q

x

A

Data from matrix A in rotated space (also called principal component scores)
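A minimal R sketch tying cards 19-24 together (USArrests is again my choice of example data, not from the deck):

# The five fields of a prcomp object.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pca$sdev        # square roots of the eigenvalues, largest to smallest
pca$rotation    # eigenvectors / principal loadings, one per column
pca$center      # column means of the original data
pca$scale       # column standard deviations of the original data
head(pca$x)     # principal component scores: the data in rotated space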

25
Q

How is the data in rotated space computed?

A

By the dot product: each standardized observation is dotted with each loading vector (equivalently, the standardized data matrix is multiplied by the rotation matrix)
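To make the dot product concrete, a sketch under the same USArrests assumption: each score is the dot product of a standardized observation with a loading vector, which is exactly what the matrix product below computes.

# Scores = standardized data %*% rotation (row-by-column dot products).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- scale(USArrests, center = pca$center, scale = pca$scale) %*% pca$rotation
all.equal(scores, pca$x)   # TRUE: identical to the stored rotated-space data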
26
Q

Top and right axes (of a biplot) indicate

A

The scale for the loading vectors: they tell you where the loading arrows for the original variables point
27
Q

Bottom and left axes (of a biplot) indicate

A

The principal component scores, which situate each data point in the rotated space
28
Q

How many principal components do we need?

A

As many as explain most of the variance; stop once adding more components yields only diminishing gains in explained variance
29
Q

Key idea

A

What is the proportion of variance contributed by each principal component loading?
30
Q

Total Variation

A

The sum of the variances of all the principal components (equivalently, the sum of all the eigenvalues)
31
Q

Proportion of variance explained by the ith principal component loading

A

Var(PCi) / Total Variation, i.e., the ith eigenvalue divided by the sum of all eigenvalues
32
Q

Variance is

A

The squared standard deviation (so for prcomp, the variance of PCi is sdev[i]^2)
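Cards 28-32 combined in one R sketch (same illustrative USArrests pca object as above):

# PVE_i = Var(PCi) / total variation, where Var(PCi) = sdev[i]^2.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
pve <- pca$sdev^2 / sum(pca$sdev^2)
pve           # proportion of variance explained by each component
cumsum(pve)   # cumulative proportion: keep components until gains diminish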
33
Q

What do you have to do before attempting to use observations in any model?

A

Transform all of your observations (in-sample and out-of-sample) from their natural representation to principal component scores
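A sketch of this last step in R (new_obs is a stand-in for an unseen observation, invented for illustration): predict() applies the stored center, scale, and rotation to map new data into principal component scores.

# Out-of-sample observations must use the SAME center, scale, and rotation
# learned from the training data; predict() handles this.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
new_obs <- USArrests[1, ]           # stand-in for an unseen observation
predict(pca, newdata = new_obs)     # its principal component scores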