WK 9 Flashcards
What are data/dimension reduction and what do they do?
They are mathematical and statistical procedures that reduce a large set of variables to a smaller set
What is the goal in principal components analysis?
Goal is to explain as much of the total variance in a data set as possible
What are the steps in principal components analysis?
-starts with original data
-calculates covariances (correlations) between variables
-applies a procedure called eigendecomposition to calculate a set of linear composites of the original variables
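The steps above can be sketched for the simplest possible case: two standardized variables with correlation r, whose 2x2 correlation matrix has a closed-form eigendecomposition (eigenvalues 1 + r and 1 - r). This toy Python example is an illustration, not part of the course materials:

```python
import math

# Correlation matrix of two standardized variables with correlation r.
r = 0.6
R = [[1.0, r],
     [r, 1.0]]

# Closed-form eigendecomposition of [[1, r], [r, 1]]:
# eigenvalues 1 + r and 1 - r, eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
eigenvalues = [1 + r, 1 - r]              # largest first (for r > 0)
s = 1 / math.sqrt(2)
eigenvectors = [[s, s], [s, -s]]          # one eigenvector per component

# Sanity check: R v = lambda v for the first component.
v = eigenvectors[0]
Rv = [R[0][0] * v[0] + R[0][1] * v[1],
      R[1][0] * v[0] + R[1][1] * v[1]]
assert all(abs(Rv[i] - eigenvalues[0] * v[i]) < 1e-12 for i in range(2))

# The eigenvalues repackage the total variance: they sum to the number
# of variables (here 2), and the first component holds the largest share.
print(eigenvalues)        # [1.6, 0.4]
print(sum(eigenvalues))   # 2.0
```

The larger the correlation, the more variance the first component soaks up; with r = 0, both eigenvalues would be 1 and no reduction would be possible.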
What does principal components analysis do?
It repackages the variance from the correlation matrix into a set of components, through the process of eigendecomposition
What is the first component?
It is the linear combination that accounts for the most possible variance
What are the second and subsequent components?
Second component accounts for second largest amount of variance after the variance accounted for by the first is removed
- third accounts for third largest etc
What does each component account for?
Each component accounts for as much remaining variance as possible
If variables are closely related, what number of correlations do they have, and how do we represent them?
If variables are closely related, they have large correlations, and we can represent them with fewer composites
If variables are not very closely related, what number of correlations do they have, and how do we represent them?
If variables are not very closely related, they have small correlations, and we will need more composites to adequately represent them.
If variables are entirely uncorrelated, how many components do we need?
We will need as many components as there were variables in the original correlation matrix
What is eigendecomposition?
It is a transformation of the correlation matrix to re-express it in terms of eigenvalues and eigenvectors
How many eigenvectors and eigenvalues do you have for each component?
There is one eigenvector and one eigenvalue for each component
What are eigenvalues?
Eigenvalues are a measure of the size of the variance packaged into a component
What do larger eigenvalues mean?
They mean that the component accounts for a large proportion of the variance
What do eigenvectors provide information on?
They provide information on the relationship of each variable to each component
What are eigenvectors?
They are sets of weights (one weight per variable in original correlation matrix)
e.g., if we had 5 variables each eigenvector would contain 5 weights
What will the sum of the eigenvalues equal?
The sum of the eigenvalues will equal the number of variables in the data set
What is the covariance of an item with itself?
For standardized variables (as in a correlation matrix), the covariance of an item with itself, i.e. its variance, is 1
When you add up these item variances, what do you get?
Adding them up gives the total variance, which equals the number of variables
What does a full eigendecomposition account for?
It will account for all of this variance, distributed across the eigenvalues, so the sum of the eigenvalues must equal the number of variables
We use eigenvectors to think about the nature of components. To do so, what do we do?
We convert eigenvectors to PCA loadings
What does a PCA loading give?
A PCA loading gives the strength of the relationship between the item and the component
What is the range of PCA loadings?
Range from -1 to 1
In a PCA loading, what does a higher absolute value indicate?
The higher the absolute value, the stronger the relationship
What will the sum of squared loadings for any variable on all components equal?
The sum of squared loadings for any variable on all components will equal 1
- that is all the variance in the item is explained by the full decomposition
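Continuing the two-variable illustration (hypothetical numbers, not from the deck): loadings are obtained by scaling each eigenvector by the square root of its eigenvalue. Squared loadings then sum to 1 across components for each variable, and to the eigenvalue across variables for each component:

```python
import math

# Two standardized variables with correlation r = 0.6 (toy example):
# eigenvalues 1 + r and 1 - r, eigenvectors (1, 1)/sqrt(2), (1, -1)/sqrt(2).
r = 0.6
eigenvalues = [1 + r, 1 - r]
s = 1 / math.sqrt(2)
eigenvectors = [[s, s], [s, -s]]          # one row per component

# PCA loadings: eigenvector weights scaled by sqrt(eigenvalue);
# loadings[c][v] is the correlation of variable v with component c.
loadings = [[w * math.sqrt(lam) for w in vec]
            for lam, vec in zip(eigenvalues, eigenvectors)]

# Across all components, the squared loadings of each variable sum to 1:
# the full decomposition explains all of that item's variance.
for v in range(2):
    print(round(sum(loadings[c][v] ** 2 for c in range(2)), 10))  # 1.0

# Across variables, the squared loadings of each component sum to its
# eigenvalue (the "SS loadings" reported in PCA output).
for c in range(2):
    print(round(sum(w ** 2 for w in loadings[c]), 10))  # 1.6, then 0.4
```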
Where does dimension reduction come from?
Comes from keeping only the largest components
What can our decisions on how many components to keep be guided by?
- set an amount of variance you wish to account for
- scree plot
- minimum average partial test (MAP)
- parallel analysis
What is the simplest method we can use to select a number of components?
Simply state a minimum variance we wish to account for
(We then keep the smallest number of components that together account for at least this amount of variance)
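This rule is easy to sketch in code (the eigenvalues below are hypothetical, for illustration only): keep components until their cumulative share of the total variance reaches the chosen threshold.

```python
def n_components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of components whose cumulative proportion
    of variance meets or exceeds `threshold`."""
    total = sum(eigenvalues)            # equals the number of variables
    cumulative = 0.0
    for k, lam in enumerate(eigenvalues, start=1):
        cumulative += lam / total
        if cumulative >= threshold:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues from a 5-variable PCA (they sum to 5):
# cumulative proportions are 0.50, 0.74, 0.88, 0.96, 1.00.
print(n_components_for_variance([2.5, 1.2, 0.7, 0.4, 0.2], 0.80))  # 3
```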
What is a scree plot based on?
Based on plotting the eigenvalues
What are you looking for in a scree plot?
Looking for a sudden change of slope
What is the scree plot assumed to show?
It is assumed to potentially reflect the point at which components become substantively unimportant
(The points should drop, as variance decreases across the components)
In a scree plot, what is inferred as the slope flattens?
As the slope flattens, each subsequent component is not explaining much additional variance
On a scree plot, what is on the x-axis?
The component number
On a scree plot, what is on the y-axis?
The eigenvalue for each component
How do we decide what components to keep using a scree plot?
Keep the components with eigenvalues above a kink in the plot
What does the minimum average partial (MAP) test do?
MAP extracts components iteratively from the correlation matrix
What quantity does the MAP test track?
After each component is extracted, MAP computes the average squared partial correlation among the variables
(the correlations remaining once the extracted components are partialled out)
What is the trend we see with MAP values?
At first this quantity goes down with each component extracted but then it starts to increase again
What components does MAP keep?
MAP keeps the components extracted up to the point at which the average squared partial correlation is at its smallest (the point just before it starts to increase)
How do we obtain results of the MAP test?
Using the vss() function from the psych package
What is parallel analysis?
Parallel analysis simulates datasets with same number of participants and variables but no correlations
What does parallel analysis compute?
It computes an eigendecomposition for each of the simulated datasets
What does parallel analysis compare?
It compares the average eigenvalue across the simulated datasets for each component
What happens if a real eigenvalue exceeds the corresponding average eigenvalue from the simulated datasets?
It is retained
How do we conduct parallel analysis in R?
Using the fa.parallel() function in the psych package
What is a limitation of scree plots?
Scree plots are subjective and may have multiple or no obvious kinks
What is a limitation of parallel analysis?
Parallel analysis sometimes suggests too many components (over-extraction)
What is a limitation of MAP?
MAP sometimes suggests too few components (under-extraction)
What should you do if your MAP and parallel analysis disagree?
If the two tests disagree, treat the parallel analysis result as a maximum and the MAP result as a minimum; the optimal number of components probably lies within that range
How are component loadings calculated and how can they be interpreted?
Component loadings are calculated from the values in the eigenvectors and they can be interpreted as the correlations between variables and components
In a component loading matrix, when looking at the output what are SS loadings?
They are the eigenvalues
What does a good PCA solution explain?
It explains the variance of the original correlation matrix in as few components as possible