week 7 exploratory factor analysis Flashcards
(21 cards)
What is Factor Analysis?
A factor analysis reduces the number of variables designed to measure a similar thing to one dimension, or factor. Factor analyses can confirm or reject theories by comparing patterns of relationships between variables. Used for concepts which are harder to measure precisely: a construct (eg anger) is often inferred to exist because other responses are present (eg "I feel like breaking things").
IMPORTANT POINTS:
1. Psychological constructs are often abstract & hypothetical.
2. We can never directly or physically measure them.
3. Scores on psychological tests rarely provide absolute, ratio measures of psychological attributes:
-all we can do is compare our scores to other scores (norms)
-or, examine the pattern of relationships b/n variables to see if they fit with our theories.
TERMINOLOGY
1. Factor = Unobserved Variable. eg Anger. The thing we are theorising exists, but cannot directly measure. May also sometimes be called a Dimension, or a Latent Variable.
2. Observed Variable = what we actually get scores on. eg in this example, indicators of Anger. May also be called Variables/Observed Measures/Indicators.
HOW IS FACTOR ANALYSIS USED?
1. To provide evidence of psychological constructs
eg Personality (how many dimensions or factors??)
eg Intelligence and abilities
eg Attitudes
eg Behaviours
2. To provide psychometric information on the validity of psychological measures.
eg Construct validity
eg Discriminant & convergent validity
eg Item selection
eg Factorial validity
FACTORIAL VALIDITY
There are 2 forms of factor analysis:
1. Exploratory
2. Confirmatory
EXPLORATORY FACTORIAL ANALYSIS
There are no specifications as to which factors the variables should load onto.
Used to explore how many factors are present in our data.
Statistically driven (but guided by theory).
Factorial Validity
In exploratory factor analysis, which is the focus here, the output is used to guide the interpretation of the results. If the output fits the theory, you have factorial validity. Factor analysis looks at the relationships between continuous variables and the patterns of correlations in the data. These relationships are assessed by correlation coefficients, which indicate how much variance is shared between variables. Unlike Pearson's correlation, which shows shared variance between two variables, a factor analysis looks at shared variance between three or more variables and is driven by data and statistics.
A correlation measures how much two variables vary together, whereas variance measures how much scores on a single variable spread along the scale (it is the square of the standard deviation). Stripped of the equations and statistics, factor analysis is essentially about how much the variables overlap. Using the previous example about anger, let's now assume we have ten variables. The factor analysis will then look at how these ten variables relate, eg some may overlap to form one factor, others may overlap differently to form a different factor. Either way, it is shared variance.
Relationships are assessed using Correlation Coefficients (r).
Factorial validity 2
Factor analysis looks at how much variables overlap together, often considering more than 2 variables at a time. It is about shared variance.
Eigen Value= a measure of how much shared variance there is across all of the items.
The pattern of shared variance must correspond to the hypothesised pattern of constructs in the "real world". eg we expect there to be a weak relationship (ie not much shared variance) between Authoritarian Parenting Style and Reasoning Parenting Style, because we believe a parent will mostly follow 1 style or the other.
Factorial validity 3
Factorial validity 4
The theory of Authoritarian Parenting vs Reasoned Parenting. ie this is what we anticipate the relationship looks like. So we are trying to determine the correlation between 2 latent/unobserved variables.
factorial validity 5
Shows how the authoritarian factors correlate positively with each other, and how the reasoned factors correlate positively with each other. Authoritarian factors correlate negatively with reasoned factors.
factorial validity 6
Imagine if we were able to 3D-plot the factors. The figure below is not accurate, but gives a sense of how factor analysis works.
Factor analysis works by iteration (the same calculation over and over). The eigenvalue is the first thing found in the factor analysis; it captures the shared variance across all of the items. eg 2.54 divided by 7 (we have 7 variables) = 0.36 = 36% of the variance explained by the first factor.
After finding the first factor's shared variance, that variance is removed, and the analysis recalculates to see if there is another shared variance that can further explain the data.
Factorial validity 7
The process of finding eigenvalues is repeated.
Factor 2 explains 25% of the variance (we still divide by 7, the number of variables, each time).
The iterations are done 7 times. Theoretically we could have 7 factors, but from our theory we are hoping to reduce them to 2 factors (authoritarian or reasoned).
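The eigenvalue arithmetic described above (eigenvalue ÷ number of variables = proportion of variance explained) can be sketched in Python. The correlation matrix here is a small invented example, not the course data:

```python
import numpy as np

# Toy correlation matrix for 3 items measuring a similar thing
# (hypothetical values): the items correlate ~0.5-0.6 with each other.
R = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.6],
    [0.5, 0.6, 1.0],
])

# Eigenvalues of the correlation matrix = variance captured per factor.
eigvals = np.linalg.eigvalsh(R)[::-1]   # sort descending

# Total variance equals the number of variables, because each
# standardised variable contributes a variance of 1.
n_vars = R.shape[0]
for i, ev in enumerate(eigvals, start=1):
    print(f"Factor {i}: eigenvalue = {ev:.2f}, "
          f"variance explained = {ev / n_vars:.0%}")
```

Note that the eigenvalues always sum to the number of variables, which is why dividing by 7 in the worked example gives a proportion.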
factorial validity 8.
The scree plot helps visualise the factors and how many we can reasonably reduce to. Look for where the "elbow" is. So visually, we would perhaps be looking at reducing 7 variables to 2 or 3 factors.
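A rough text-mode sketch of a scree plot: 2.54 and 1.75 echo the eigenvalues from the worked example above, while the remaining values are invented for illustration.

```python
# Hypothetical eigenvalues for 7 variables, largest first.
eigvals = [2.54, 1.75, 0.80, 0.55, 0.45, 0.35, 0.30]

for i, ev in enumerate(eigvals, start=1):
    bar = "#" * int(round(ev * 10))
    print(f"Factor {i}: {bar:<26} {ev:.2f}")
# The sharp drop after factor 2 is the "elbow": factors beyond it
# add little extra explained variance.
```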
Extractions, Interpretations & Correlations
When using SPSS, you’ll need to select a method of extraction. This is the assumption SPSS takes in deciding how many factors to obtain and what variables comprise those factors. There are several different types of methods, with principal component analysis (PCA) being one of the more commonly known. However, PCA assumes that all items share 100% of the variance, which is never the case due to various forms of error variance. Hence, while PCA can be a good method to make sense of what’s going on, it is more efficient and less error-prone to use a different factor analysis technique, e.g. maximum likelihood. The reason is that this leaves you with more refined data, as only the unique shared variance (not including error) is used to identify factors.
PCA is not really Factor Analysis. There are many different methods for Factor Analysis. Factor Analysis starts with the error (non-shared variance) removed.
PRINCIPAL COMPONENTS ANALYSIS:
-scores on each variable are standardised so that variance = 1.
-the total amount of variance is equal to the number of variables.
-the factors are extracted from all available variance (including error).
-assumes the factors are responsible for all of the variance in each variable. (Note this is a huge assumption and often wrong. eg in the example re parenting style, there is likely to be variation other than parenting style, eg due to social pressure and reluctance to be truthful, or eg due to "can't remember", etc. ie not all variation in answers is just due to parenting style.)
FACTOR ANALYSIS
-only uses the shared variance b/n all variables when extracting factors.
-assumes the factors are not responsible for all of the variance in each variable.
-acknowledges error in measurement and the uniqueness of each variable.
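The PCA-vs-factor-analysis contrast above can be illustrated in code. In principal-axis factoring (one common factor-analysis method; not necessarily the one used in the course), the 1s on the diagonal of the correlation matrix are replaced with communality estimates, so only shared variance is factored. The correlation matrix here is a toy example:

```python
import numpy as np

# Toy correlation matrix (hypothetical values).
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.6],
              [0.5, 0.6, 1.0]])

# PCA: eigen-decompose R as-is, so ALL variance (the 1s on the
# diagonal, including error/unique variance) goes into the components.
pca_eigvals = np.linalg.eigvalsh(R)[::-1]

# Principal-axis factoring: replace the diagonal with initial
# communality estimates - the squared multiple correlations (SMCs) -
# so only SHARED variance is factored.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
fa_eigvals = np.linalg.eigvalsh(R_reduced)[::-1]

print("PCA eigenvalues:", np.round(pca_eigvals, 2))
print("FA  eigenvalues:", np.round(fa_eigvals, 2))
```

Because the factor-analysis version removes unique variance from the diagonal, its eigenvalues are smaller: less total variance is available to be explained.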
Communalities
Communalities will vary depending on whether PCA or Factor Analysis was used.
For PCA, the initial communality will always be 1 (because nothing has been extracted yet). Look at the communality after extraction. eg for Smack, 54% of the variation in Smack (via PCA) is explained by all of the factors.
ML = Maximum Likelihood, a type of Factor Analysis. Under ML, 34% of the variance in Smack is shared variance explained by the factors.
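As a quick sketch of the arithmetic: a communality after extraction is the sum of an item's squared loadings across the factors. The loadings below are hypothetical, chosen so the result lands near the 54% quoted above:

```python
import numpy as np

# Hypothetical loadings for the item "Smack" on two extracted factors.
loadings_smack = np.array([0.70, 0.23])

# Communality = sum of squared loadings = proportion of this item's
# variance explained by the factors together.
communality = np.sum(loadings_smack ** 2)
print(f"{communality:.0%} of the variance in Smack is explained by the factors")
```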
how to work out how many factors?
SPSS is not wonderful for doing this.
There are 4 ways:
-Kaiser's criterion: traditionally the most popular. Uses the rule that an eigenvalue has to be at least 1 for a factor to be retained.
-Cattell's scree plot: items on the straight line (beyond "the elbow") are not big enough to be factors.
-Choose your own (theory driven): eg can tell SPSS that you want 2 factors. These first 3 are fairly subjective.
-Chi-squared comparison: ie chi-square compare a 2-factor model and a 3-factor model and see which fits best, etc.
The table below shows that 2 factors have been retained, as their extracted eigenvalues were greater than 1.
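Kaiser's criterion is easy to sketch in code. The eigenvalues below are hypothetical (2.54 and 1.75 echo the worked example; the rest are invented):

```python
import numpy as np

# Hypothetical extracted eigenvalues for 7 variables.
eigvals = np.array([2.54, 1.75, 0.80, 0.55, 0.45, 0.35, 0.30])

# Kaiser's criterion: retain factors whose eigenvalue is at least 1.
n_factors = int((eigvals >= 1).sum())
print(f"Factors retained under Kaiser's criterion: {n_factors}")
```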
factor loadings
Whilst there are various ways to determine how many factors, we should also take care to consider whether they are meaningful. ie from our theory, we would expect the authoritarian aspects to load together and the reasoned aspects to load together.
A factor loading tells us how close a variable is to the factor vector: the closer it is, the higher the factor loading. You probably want a factor loading of at least 0.4 (but this also depends on sample size).
Ideally, factor loadings should show a simple structure. Factor loadings tell us which factor a variable belongs to (ideally only 1 factor, but life is usually not that simple).
The smaller your sample size, the higher your cut-off needs to be for a factor loading to be significant.
what is a “good” factor loading?
Our sample was approx 300, so we used .35 or .4 as a good, meaningful factor-loading cut-off point.
eg good factor loading
Highlighted in the example are the variables considered to have good factor loadings. This was not a simple pattern (so the theory is not brilliant). There are also more complicated ways of calculating factors.
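Applying a loading cut-off can be sketched as follows; the 7-item x 2-factor loading matrix is invented for illustration:

```python
import numpy as np

# Hypothetical loading matrix: rows are items, columns are factors.
loadings = np.array([
    [ 0.72,  0.10],
    [ 0.65, -0.05],
    [ 0.58,  0.21],
    [-0.30,  0.68],
    [ 0.08,  0.61],
    [ 0.15,  0.44],
    [ 0.38,  0.39],   # fails the cut-off on both factors: not simple structure
])

cutoff = 0.4  # a reasonable cut-off for a sample of ~300
for i, row in enumerate(loadings, start=1):
    hits = [f"Factor {j + 1}" for j, v in enumerate(row) if abs(v) >= cutoff]
    print(f"Item {i}: {', '.join(hits) if hits else 'no clear factor'}")
```

Note the absolute value: a large negative loading still means the item belongs to that factor, just scored in the opposite direction.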
Rotating factors
If you are not getting a simple structure and want to represent the variables better, you can rotate the factors and move the axes that the factors are on. There are two primary types of rotation: orthogonal and oblique.
If the factors are expected not to be correlated, the more suitable rotation would be the orthogonal (right angle) rotation to separate the factors from each other. However, if factors are expected to be correlated, the oblique rotation would be more suitable.
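A minimal sketch of orthogonal (varimax) rotation in plain NumPy, assuming uncorrelated factors. The loading matrix is invented, and this is a textbook SVD-based implementation, not necessarily SPSS's exact algorithm:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Rotate a loadings matrix toward simple structure (varimax)."""
    n, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the varimax criterion.
        grad = loadings.T @ (rotated ** 3
                             - rotated @ np.diag((rotated ** 2).sum(axis=0)) / n)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt          # keep the rotation orthogonal
        if s.sum() < var * (1 + tol):
            break                  # criterion stopped improving
        var = s.sum()
    return loadings @ rotation

# Unrotated loadings: 4 items x 2 factors (hypothetical values).
L = np.array([[0.70, 0.30],
              [0.60, 0.40],
              [0.20, 0.80],
              [0.10, 0.70]])
Lr = varimax(L)

# Rotation is orthogonal, so each item's communality is unchanged:
print(np.round((L ** 2).sum(axis=1), 3))
print(np.round((Lr ** 2).sum(axis=1), 3))
```

The key design point: rotation only moves the axes, so the total variance explained per item (the communality) stays the same; only the split across factors changes, hopefully toward simple structure.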
The matrix will be different depending on the type of rotation used, the type of factor analysis you do and the principal components or factor analysis.
MATRICES
1. No rotation & orthogonal rotation:
a) Component matrix (PCA)
b) Factor matrix (maximum likelihood)
2. Oblique rotation:
a) Component matrix, structure matrix, pattern matrix (PCA)
b) Factor matrix, structure matrix, pattern matrix (maximum likelihood)
3. Component/pattern matrices: actual factor loadings (similar to regression coefficients).
4. Structure matrix: correlation b/n the item/variable and the factor.
Factorial validity, chi-square difference and conditions for factor analysis
After the factor analysis is completed, you can see whether the patterns in the data fit what was expected. For that, you could use a diagram as a visual representation of the results. In this diagram are the two factors that have a weak to negative relationship between them, as expected. The results also give us chi-square statistics, which tells us about how well the model fits the data. In this case, it is as follows:
Chi squared = 15.18, p=0.06
This is not significant, so the model is not significantly different from an ideal pattern of correlations. Therefore, the data is a good fit with our theory.
The ideal correlations are what we’d find in a perfect hypothetical world where there is no error variance, and our correlations were exactly in accordance with the theory. That is, all of the authoritarian parenting items share 100% of the variance, as do the reasoning parenting items.
So the parenting styles are explaining 47% of the variance in the parents' responses. Ideally we would have preferred it to be over 50%, but how much is needed changes with the situation.
(The chi-squared statistic is always comparing 1 table with another…)
Factorial validity, chi-square difference and conditions for factor analysis2
In the background, SPSS is also calculating the ideal correlations (ie those implied if the 2-factor model is true), but will not show them unless requested.
ie we want the chi-squared statistic to be non-significant, because this means our theory fits!
chi-squared difference test
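The chi-squared difference test compares the fit of two nested models (eg a 2-factor vs a 3-factor model). A sketch, assuming SciPy is available; the degrees of freedom and the 3-factor statistics are invented (only the 15.18 echoes the value above):

```python
from scipy.stats import chi2

# Hypothetical fit statistics for two nested models.
chisq_2f, df_2f = 15.18, 8   # 2-factor model
chisq_3f, df_3f = 9.40, 3    # 3-factor model (more parameters, fewer df)

# The difference in chi-square is itself chi-square distributed,
# with df equal to the difference in degrees of freedom.
diff = chisq_2f - chisq_3f
df_diff = df_2f - df_3f
p = chi2.sf(diff, df_diff)
print(f"chi-square difference = {diff:.2f}, df = {df_diff}, p = {p:.3f}")
# A non-significant difference means the extra factor does not
# significantly improve fit, so keep the simpler 2-factor model.
```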
conditions for factor analysis.