Latent Class Analysis Flashcards
What is the main goal of latent class analysis?
To categorise people into a small number of types on the basis of their responses (e.g. a "what calculator are you?"-style quiz)
Describe a latent class profile
The items go along the x-axis while the probabilities of scoring on each item go along the y-axis. A line is then drawn for each class, giving that class's probability for each item.
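A minimal sketch of such a profile plot in base R, using made-up conditional probabilities purely for illustration (two hypothetical classes, four items):

# Hypothetical conditional probabilities: rows = classes, columns = items
probs <- rbind(mastery     = c(0.90, 0.85, 0.80, 0.75),
               non_mastery = c(0.20, 0.15, 0.25, 0.10))
matplot(t(probs), type = "b", pch = 16, ylim = c(0, 1),
        xaxt = "n", xlab = "Item", ylab = "P(correct | class)")
axis(1, at = 1:4, labels = paste("Item", LETTERS[1:4]))
legend("topright", legend = rownames(probs), col = 1:2, lty = 1:2)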
What data is applicable to latent class analysis?
Categorical latent data and categorical observed data
The latent variable, θ, is categorical
• (e.g., θ = 1, θ = 2, θ = 3, or
θ = "avoidant", θ = "anxious", θ = "secure")
Give some examples of categorical latent variables in psychology
- Developmental stages (Piaget)
- Attachment types A,B,C,D
- Deviant behavior: Autism, Dyslexia
- Learning styles
- Personality types
- Types of anti-social behaviour
- Mastery versus non-mastery of a skill
How do some researchers (who aren’t super into methods) categorise subjects? What problems arise here? (4)
Using their own ad-hoc criteria (cutting off continuous data at certain points to create categories). The problems:
• Criteria are arbitrary
• You cannot falsify classes
• You cannot find new classes
• No explicit falsifiable model
You need a clear theory that justifies these cut-off points; without one, such categorisation is very difficult to defend.
What are the categories referred to as?
Each category is called a “latent class”. A subject only belongs to one class
What assumption is there about the items within each class?
Within each class, the items are independent (local independence)
Therefore, what is the goal of LCA?
Classify the subjects into the latent classes on the basis of the observed item scores
Compare the IRT model function to the LCA model function
In IRT, the conditional probabilities, P(Y_pi = 1 | θ_p), are given by a curve, as θ is continuous. The curve is characterised by the item parameters.
In LCA, the conditional probabilities, P(Y_pi = 1 | θ_p = t), are single points, as the latent trait is categorical. The item parameters are these conditional probabilities. (see doc). When graphed, the categories go along the x-axis and the conditional probabilities along the y-axis; however, a table is more commonly used.
Describe what you feed to R in LCA (or what R makes for you) with regard to the data structure, i.e. the analogue of the covariance matrix in factor analysis
There is a score pattern column listing the possible outcomes (e.g. 0000, all wrong; 0110, A and D wrong, B and C correct). Beside it is a column labelled F_ijkl which lists how often each pattern was observed in the data. So if pattern 0000 has 93 beside it, then 93 participants scored all 4 items incorrectly. F_ijkl just means the frequency of a response pattern, where i is the response to item A (i = 0 or i = 1), j the response to B, and so on.
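A small sketch of how such a pattern frequency table can be tabulated in base R, using hypothetical binary data (columns are items A–D scored 0/1):

# Hypothetical data: 400 subjects, 4 binary items
set.seed(1)
dat <- matrix(rbinom(400 * 4, 1, 0.5), ncol = 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))
pattern <- apply(dat, 1, paste0, collapse = "")  # e.g. "0110"
F_ijkl <- table(pattern)  # frequency of each observed response pattern
F_ijkl["0000"]            # number of subjects who got every item wrong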
Instead of a covariance matrix, there are M independent pieces of information (the number of response patterns minus 1): M = C^n − 1,
where C is the number of categories and n is the number of items. So with 2 categories and four items:
M = C^n − 1 = 2^4 − 1 = 15 independent pieces of information
𝐶^𝑛 = ?
Why do we subtract 1?
𝐶^𝑛 = the number of response patterns
We subtract 1 because, if we know all except one of the frequencies, we can calculate the last one from the total number of observations. Therefore it is not an independent piece of information.
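In R this count is a one-liner (a trivial check, two categories and four items):

C <- 2; n <- 4   # number of categories, number of items
C^n              # 16 response patterns
C^n - 1          # 15 independent pieces of information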
Describe the typical output of a latent class model
A table with the classes forming the rows, the class size forming the first column, and the items forming the remaining columns. The conditional probabilities are then given for each item within each class.
What is meant by the class size?
The proportion of people in that class(ification)
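A hypothetical version of such an output table, built in R purely for illustration (made-up class sizes and conditional probabilities):

# Rows = classes; first column = class size; rest = P(correct | class)
out <- data.frame(size = c(0.6, 0.4),
                  A = c(0.90, 0.20), B = c(0.85, 0.15),
                  C = c(0.80, 0.25), D = c(0.75, 0.10),
                  row.names = c("class 1", "class 2"))
out  # e.g. 60% of subjects fall in class 1, who pass item A with p = .90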
Describe a probability notation for the equation used in LCA
The probability of being in class t and observing response vector [ijkl] is given by:
P(X_p = [ijkl] & θ_p = t) = P(θ_p = t) × P(X_p1 = i | θ_p = t) × P(X_p2 = j | θ_p = t) × P(X_p3 = k | θ_p = t) × P(X_p4 = l | θ_p = t)
i.e.
The probability of a pattern and being in a class = the probability of being in the class (the class size) × the probability of that answer pattern given the class (e.g. prob of 1 on A × prob of 0 on B, etc.)
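A minimal R sketch of this joint probability, reusing the hypothetical class sizes and conditional probabilities from the table sketch above:

class_size <- c(0.6, 0.4)                 # P(class = t)
p <- rbind(c(0.90, 0.85, 0.80, 0.75),     # P(item = 1 | class 1)
           c(0.20, 0.15, 0.25, 0.10))     # P(item = 1 | class 2)
x <- c(1, 0, 1, 1)                        # observed pattern [ijkl] = 1011
t <- 1
# class size times the product of the item probabilities given the class
joint <- class_size[t] * prod(p[t, ]^x * (1 - p[t, ])^(1 - x))
joint  # P(X = 1011 & class = 1)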
Explain the formula given in the book for this calculation
They use weird notation:
π^(ABCDX)_(ijklt) = π^X_t × π^(A|X)_(it) × π^(B|X)_(jt) × π^(C|X)_(kt) × π^(D|X)_(lt)
π refers to probability
X is the latent variable (instead of θ)
π^X_t is the class size: the probability of being on level t of X. π^X_1 is the probability of being in class 1
π^(A|X)_(it) is the conditional probability of response i on item A given level t. π^(A|X)_(11) is the probability of scoring correctly on item A given that you're in class 1. (table in docs)
What are the main approaches to parameter estimation in LCA (2) and what approach will we use?
Bayesian estimation and maximum likelihood
We will do maximum likelihood using the 'lca.r' R package developed by Han van der Maas
How do you calculate the maximum likelihood?
If the probability of observing response vector [ijkl] in class t is the probability of response A × response B × …,
and the probability of observing response vector [ijkl] AND class t is the probability of being in the class × the probability of response A × response B × …,
then the overall probability of observing [ijkl] is found by summing the joint probability of the response vector and class t over the classes: π^(ABCD)_(ijkl) = Σ_t π^X_t × π^(A|X)_(it) × π^(B|X)_(jt) × π^(C|X)_(kt) × π^(D|X)_(lt)
So for the probability of scoring 1111, you would take the class size of class 1 times the probability of scoring 1111 in class 1, then the class size of class 2 times the probability of scoring 1111 in class 2, and add the results.
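The same sum in R, reusing class_size and p from the sketch above:

# Marginal probability of a pattern: sum the joint over the classes
p_pattern <- function(x, class_size, p) {
  sum(sapply(seq_along(class_size), function(t)
    class_size[t] * prod(p[t, ]^x * (1 - p[t, ])^(1 - x))))
}
p_pattern(c(1, 1, 1, 1), class_size, p)  # P(X = 1111)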
What is the purpose of the log-likelihood function?
In the likelihood, the probability of each pattern is raised to the power of its frequency F_ijkl, so the resulting numbers are extremely small. The logs of these values are easier to deal with (the products become sums).
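A sketch of the log-likelihood in R, reusing p_pattern and the hypothetical parameters from above; the frequencies are simulated purely for illustration:

patterns <- as.matrix(expand.grid(0:1, 0:1, 0:1, 0:1))  # all 16 patterns
pat_probs <- apply(patterns, 1, p_pattern, class_size = class_size, p = p)
freq <- as.vector(rmultinom(1, 400, pat_probs))          # hypothetical F_ijkl
loglik <- sum(freq * log(pat_probs))  # instead of prod(pat_probs^freq), which underflows
loglik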
What two algorithms can be used for maximisation?
Expectation Maximization (EM) algorithm
Newton-Raphson algorithm
Why is knowing these calculation steps useful?
1) understanding how you estimate the parameters of a latent class model
2) This process will be used to assess model fit in a bit
What two steps are involved in the expectation maximisation algorithm?
- E step: determine the expected values of the latent classes, given some initial values for the other parameters
- M step: maximize the likelihood using these expected values to obtain new values for the other parameters
Iterate between the E and M steps until the parameter values don't change much anymore. When they stay around the same value you know the model has converged.
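A bare-bones EM sketch for a two-class LCA on binary items, assuming the Bernoulli model from the earlier sketches; this is only an illustration, not the lca.r implementation (no convergence check or numerical safeguards):

em_lca <- function(dat, n_class = 2, n_iter = 200) {
  n <- nrow(dat); k <- ncol(dat)
  size <- rep(1 / n_class, n_class)                    # starting class sizes
  p <- matrix(runif(n_class * k, 0.2, 0.8), n_class)   # starting item probs
  for (iter in 1:n_iter) {
    # E step: posterior probability of each class for each subject
    joint <- sapply(1:n_class, function(t)
      size[t] * apply(dat, 1, function(x) prod(p[t, ]^x * (1 - p[t, ])^(1 - x))))
    post <- joint / rowSums(joint)
    # M step: update class sizes and conditional probabilities
    size <- colMeans(post)
    p <- t(post) %*% dat / colSums(post)
  }
  list(size = size, p = p)
}
fit <- em_lca(dat)  # dat: the hypothetical 0/1 matrix from the earlier sketch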
What is involved in the Newton-Raphson algorithm?
Approximates the log-likelihood function locally using linear functions to find directions towards the maximum
e.g. start somewhere random, use a linear approximation to figure out which direction to go, move in that direction, make a new linear approximation, and so on.
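A toy one-dimensional illustration in R (estimating a binomial proportion from 7 successes and 3 failures; not part of the course materials): each step linearly approximates the score (the first derivative of the log-likelihood) to decide where to move next.

score <- function(theta) 7 / theta - 3 / (1 - theta)           # first derivative
curvature <- function(theta) -7 / theta^2 - 3 / (1 - theta)^2  # second derivative
theta <- 0.5                       # arbitrary starting value
for (i in 1:20) {
  theta <- theta - score(theta) / curvature(theta)  # Newton-Raphson step
}
theta  # converges to the ML estimate 7/10 = 0.7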
How do we scale the latent variable in LCA?
For the first time in this course, we don't need to. The latent variable already has a scale defined by the number of categories