WK 9 Flashcards
What are data/dimension reduction and what do they do?
They are mathematical and statistical procedures that reduce a large set of variables to a smaller set of composite variables
What is the goal in principal components analysis?
The goal is to explain as much of the total variance in a data set as possible, using a smaller number of components
What are the steps in principal components analysis?
-starts with the original data
-calculates the covariances (correlations) between the variables
-applies a procedure called eigendecomposition to calculate a set of linear composites of the original variables (see the sketch below)
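A minimal sketch of these steps in Python/NumPy, assuming a hypothetical data matrix X (rows = observations, columns = variables):

```python
import numpy as np

# Hypothetical data: 100 observations of 5 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Calculate the correlations between the variables
R = np.corrcoef(X, rowvar=False)

# Eigendecomposition of the correlation matrix
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; reverse so the first
# component is the one that accounts for the most variance
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# The components (linear composites) are the standardised variables
# weighted by the eigenvectors
Z = (X - X.mean(axis=0)) / X.std(axis=0)
components = Z @ eigenvectors
```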
What does principal components analysis do?
It repackages the variance from the correlation matrix into a set of components, through the process of eigendecomposition
What is the first component?
It is the linear combination of the original variables that accounts for the largest possible amount of variance
What are the second and subsequent components?
The second component accounts for the second largest amount of variance, after the variance accounted for by the first component is removed
- the third accounts for the third largest, etc.
What does each component account for?
Each component accounts for as much remaining variance as possible
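A short, self-contained sketch of this ordering, using a made-up 3-variable correlation matrix: dividing each sorted eigenvalue by the total shows the decreasing share of variance each component accounts for.

```python
import numpy as np

# Made-up correlation matrix, purely for illustration
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

eigenvalues = np.linalg.eigvalsh(R)[::-1]      # sorted largest first
proportion = eigenvalues / eigenvalues.sum()   # share of the total variance
print(proportion)           # decreasing: first component largest, then second, ...
print(proportion.cumsum())  # cumulative variance accounted for, reaching 1.0
```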
If variables are closely related, what size are their correlations, and how do we represent them?
If variables are closely related, they have large correlations, and we can represent them with fewer composites
If variables are not very closely related, what size are their correlations, and how do we represent them?
If variables are not very closely related, they have small correlations, and we will need more composites to adequately represent them.
If variables are entirely uncorrelated, how many components do we need?
We will need as many components as there were variables in the original correlation matrix
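A hypothetical illustration of the two extremes: entirely uncorrelated variables give an identity correlation matrix whose eigenvalues are all 1 (every component is needed), whereas highly correlated variables concentrate most of the variance in the first eigenvalue.

```python
import numpy as np

# Entirely uncorrelated variables: identity correlation matrix
R_uncorr = np.eye(4)
print(np.linalg.eigvalsh(R_uncorr))       # [1. 1. 1. 1.] -> all 4 components needed

# Highly correlated variables (r = .8 everywhere off the diagonal)
R_corr = np.full((4, 4), 0.8)
np.fill_diagonal(R_corr, 1.0)
print(np.linalg.eigvalsh(R_corr)[::-1])   # ~[3.4, 0.2, 0.2, 0.2] -> one component dominates
```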
What is eigendecomposition?
It is a transformation of the correlation matrix to re-express it in terms of eigenvalues and eigenvectors
How many eigenvectors and eigenvalues do you have for each component?
There is one eigenvector and one eigenvalue for each component
What are eigenvalues?
Eigenvalues are a measure of the size of the variance packaged into a component
What do larger eigenvalues mean?
They mean that the component accounts for a larger proportion of the total variance
What do eigenvectors provide information on?
They provide information on the relationship of each variable to each component
What are eigenvectors?
They are sets of weights (one weight per variable in the original correlation matrix)
e.g., if we had 5 variables, each eigenvector would contain 5 weights
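A small sketch of the 5-variable example, assuming hypothetical data: the eigendecomposition of a 5 x 5 correlation matrix returns 5 eigenvalues (one per component) and 5 eigenvectors, each a column of 5 weights (one per variable).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))            # hypothetical data on 5 variables
R = np.corrcoef(X, rowvar=False)         # 5 x 5 correlation matrix

eigenvalues, eigenvectors = np.linalg.eigh(R)
print(eigenvalues.shape)    # (5,)   one eigenvalue per component
print(eigenvectors.shape)   # (5, 5) one eigenvector (column) per component
print(eigenvectors[:, -1])  # the 5 weights relating each variable to one component
```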
What will the sum of the eigenvalues equal?
The sum of the eigenvalues will equal the number of variables in the data set
What is the covariance of an item with itself?
The covariance of an item with itself is its variance; for the standardised variables in a correlation matrix, this is 1
When you add up the covariances of the items with themselves, what do you get?
Adding up these 1s on the diagonal gives the total variance, which equals the number of variables
What does a full eigendecomposition account for?
It accounts for all of the variance, distributed across the eigenvalues, so the sum of the eigenvalues must equal the total variance (the number of variables)
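A quick numerical check of these last points, using a made-up correlation matrix: the diagonal entries (each standardised variable's variance) are 1, their sum (the trace) is the total variance and equals the number of variables, and the eigenvalues of the full decomposition sum to that same total.

```python
import numpy as np

# Made-up correlation matrix, purely for illustration
R = np.array([[1.0, 0.3, 0.5],
              [0.3, 1.0, 0.2],
              [0.5, 0.2, 1.0]])

total_variance = np.trace(R)          # sum of the 1s on the diagonal = 3
eigenvalues = np.linalg.eigvalsh(R)

print(total_variance)                 # 3.0 (the number of variables)
print(eigenvalues.sum())              # also 3.0: the eigenvalues repackage all the variance
```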