Latent Class Analysis Flashcards
What is the main goal of latent class analysis?
To categorise people into a small number of types on the basis of their responses (e.g. a "what calculator are you?"-style quiz)
Describe a latent class profile
The items go along the x-axis while the probabilities of scoring on each item go along the y-axis. A line is then drawn for each class, giving that class's probability for each item.
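A minimal sketch of such a profile plot in base R, using made-up conditional probabilities purely for illustration (two hypothetical classes, four items):

# Hypothetical conditional probabilities: rows = classes, columns = items
probs <- rbind(mastery     = c(0.90, 0.85, 0.80, 0.75),
               non_mastery = c(0.20, 0.15, 0.25, 0.10))
matplot(t(probs), type = "b", pch = 16, ylim = c(0, 1),
        xaxt = "n", xlab = "Item", ylab = "P(correct | class)")
axis(1, at = 1:4, labels = paste("Item", LETTERS[1:4]))
legend("topright", legend = rownames(probs), col = 1:2, lty = 1:2)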
What data is applicable to latent class analysis?
Categorical latent data and categorical observed data
The latent variable, θ, is categorical
• (e.g., θ = 1, θ = 2, θ = 3, or
θ = "avoidant", θ = "anxious", θ = "secure")
Give some examples of categorical latent variables in psychology
- Developmental stages (Piaget)
- Attachment types A,B,C,D
- Deviant behavior: Autism, Dyslexia
- Learning styles
- Personality types
- Types of anti-social behaviour
- Mastery versus non-mastery of a skill
How do some researchers (who aren’t super into methods) categorise subjects? What problems arise here? (4)
Using their own ad-hoc criteria (cutting off continuous data at certain points to create categories). The problems:
• Criteria are arbitrary
• You cannot falsify classes
• You cannot find new classes
• No explicit falsifiable model
You need a clear theory that justifies these cut-off points; without one, such categorisation is very difficult to defend.
What are the categories referred to as?
Each category is called a “latent class”. A subject only belongs to one class
What assumption is there about the items within each class?
Within each class, the items are independent (local independence)
Therefore, what is the goal of LCA?
Classify the subjects into the latent classes on the basis of the observed item scores
Compare the IRT model function to the LCA model function
In IRT, the conditional probabilities, P(Y_pi = 1 | θ_p), are given by a curve, as θ is continuous. The curve is characterised by the item parameters.
In LCA, the conditional probabilities, P(Y_pi = 1 | θ_p = t), are single points, as the latent trait is categorical. The item parameters are these conditional probabilities. (see doc). When graphed, the categories go along the x-axis and the conditional probabilities along the y-axis; however, a table is more commonly used.
Describe what you feed to R in LCA (or what R makes for you) with regard to the data structure, i.e. the analogue of the covariance matrix in factor analysis
There is a score pattern column listing the possible outcomes (e.g. 0000, all wrong; 0110, A and D wrong, B and C correct). Beside it is a column labelled F_ijkl which lists how often each pattern was observed in the data. So if pattern 0000 has 93 beside it, then 93 participants scored all 4 items incorrectly. F_ijkl just means the frequency of a response pattern, where i is the response to item A (i = 0 or i = 1), j the response to B, and so on.
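A small sketch of how such a pattern frequency table can be tabulated in base R, using hypothetical binary data (columns are items A–D scored 0/1):

# Hypothetical data: 400 subjects, 4 binary items
set.seed(1)
dat <- matrix(rbinom(400 * 4, 1, 0.5), ncol = 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))
pattern <- apply(dat, 1, paste0, collapse = "")  # e.g. "0110"
F_ijkl <- table(pattern)  # frequency of each observed response pattern
F_ijkl["0000"]            # number of subjects who got every item wrong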
Instead of a covariance matrix, there are M independent pieces of information (the number of response patterns minus 1): M = C^n − 1,
where C is the number of categories and n is the number of items. So with 2 categories and four items:
M = C^n − 1 = 2^4 − 1 = 15 independent pieces of information
𝐶^𝑛 = ?
Why do we subtract 1?
𝐶^𝑛 = the number of response patterns
We subtract 1 because, if we know all except one of the frequencies, we can calculate the last one from the total number of observations. Therefore it is not an independent piece of information.
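In R this count is a one-liner (a trivial check, two categories and four items):

C <- 2; n <- 4   # number of categories, number of items
C^n              # 16 response patterns
C^n - 1          # 15 independent pieces of information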
Describe the typical output of a latent class model
A table with the classes forming the rows, the class size forming the first column, and the items forming the remaining columns. The conditional probabilities are then given for each item within each class.
What is meant by the class size?
The proportion of people in that class(ification)
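A hypothetical version of such an output table, built in R purely for illustration (made-up class sizes and conditional probabilities):

# Rows = classes; first column = class size; rest = P(correct | class)
out <- data.frame(size = c(0.6, 0.4),
                  A = c(0.90, 0.20), B = c(0.85, 0.15),
                  C = c(0.80, 0.25), D = c(0.75, 0.10),
                  row.names = c("class 1", "class 2"))
out  # e.g. 60% of subjects fall in class 1, who pass item A with p = .90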
Describe a probability notation for the equation used in LCA
The probability of being in class t and observing response vector [ijkl] is given by:
P(X_p = [ijkl] & θ_p = t) = P(θ_p = t) × P(X_p1 = i | θ_p = t) × P(X_p2 = j | θ_p = t) × P(X_p3 = k | θ_p = t) × P(X_p4 = l | θ_p = t)
i.e.
The probability of a pattern and being in a class = the probability of being in the class (the class size) × the probability of that answer pattern given the class (e.g. prob of 1 on A × prob of 0 on B, etc.)
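A minimal R sketch of this joint probability, reusing the hypothetical class sizes and conditional probabilities from the table sketch above:

class_size <- c(0.6, 0.4)                 # P(class = t)
p <- rbind(c(0.90, 0.85, 0.80, 0.75),     # P(item = 1 | class 1)
           c(0.20, 0.15, 0.25, 0.10))     # P(item = 1 | class 2)
x <- c(1, 0, 1, 1)                        # observed pattern [ijkl] = 1011
t <- 1
# class size times the product of the item probabilities given the class
joint <- class_size[t] * prod(p[t, ]^x * (1 - p[t, ])^(1 - x))
joint  # P(X = 1011 & class = 1)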
Explain the formula given in the book for this calculation
They use weird notation:
π^(ABCDX)_(ijklt) = π^X_t × π^(A|X)_(it) × π^(B|X)_(jt) × π^(C|X)_(kt) × π^(D|X)_(lt)
π refers to probability
X is the latent variable (instead of θ)
π^X_t is the class size: the probability of being on level t of X. π^X_1 is the probability of being in class 1
π^(A|X)_(it) is the conditional probability of response i on item A given level t. π^(A|X)_(11) is the probability of scoring correctly on item A given that you're in class 1. (table in docs)
What are the main approaches to parameter estimation in LCA (2) and what approach will we use?
Bayesian estimation and maximum likelihood
We will do maximum likelihood using the 'lca.r' R package developed by Han van der Maas
How do you calculate the maximum likelihood?
If the probability of observing response vector [ijkl] in class t is the probability of response A × response B × …,
and the probability of observing response vector [ijkl] AND class t is the probability of being in the class × the probability of response A × response B × …,
then the overall probability of observing [ijkl] is found by summing the joint probability of the response vector and class t over the classes: π^(ABCD)_(ijkl) = Σ_t π^X_t × π^(A|X)_(it) × π^(B|X)_(jt) × π^(C|X)_(kt) × π^(D|X)_(lt)
So for the probability of scoring 1111, you would take the class size of class 1 times the probability of scoring 1111 in class 1, then the class size of class 2 times the probability of scoring 1111 in class 2, and add the results.
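The same sum in R, reusing class_size and p from the sketch above:

# Marginal probability of a pattern: sum the joint over the classes
p_pattern <- function(x, class_size, p) {
  sum(sapply(seq_along(class_size), function(t)
    class_size[t] * prod(p[t, ]^x * (1 - p[t, ])^(1 - x))))
}
p_pattern(c(1, 1, 1, 1), class_size, p)  # P(X = 1111)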
What is the purpose of the log-likelihood function?
In the likelihood, the probability of each pattern is raised to the power of its frequency F_ijkl, so the resulting numbers are extremely small. The logs of these values are easier to deal with (the products become sums).
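A sketch of the log-likelihood in R, reusing p_pattern and the hypothetical parameters from above; the frequencies are simulated purely for illustration:

patterns <- as.matrix(expand.grid(0:1, 0:1, 0:1, 0:1))  # all 16 patterns
pat_probs <- apply(patterns, 1, p_pattern, class_size = class_size, p = p)
freq <- as.vector(rmultinom(1, 400, pat_probs))          # hypothetical F_ijkl
loglik <- sum(freq * log(pat_probs))  # instead of prod(pat_probs^freq), which underflows
loglik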
What two algorithms can be used for maximisation?
Expectation Maximization (EM) algorithm
Newton-Raphson algorithm
Why is knowing these calculation steps useful?
1) understanding how you estimate the parameters of a latent class model
2) This process will be used to assess model fit in a bit
What two steps are involved in the expectation maximisation algorithm?
- E step: determine the expected values of the latent classes, given some initial values for the other parameters
- M step: maximize the likelihood using these expected values to obtain new values for the other parameters
Iterate between the E and M steps until the parameter values don't change much anymore. When they stay around the same value you know the model has converged.
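A bare-bones EM sketch for a two-class LCA on binary items, assuming the Bernoulli model from the earlier sketches; this is only an illustration, not the lca.r implementation (no convergence check or numerical safeguards):

em_lca <- function(dat, n_class = 2, n_iter = 200) {
  n <- nrow(dat); k <- ncol(dat)
  size <- rep(1 / n_class, n_class)                    # starting class sizes
  p <- matrix(runif(n_class * k, 0.2, 0.8), n_class)   # starting item probs
  for (iter in 1:n_iter) {
    # E step: posterior probability of each class for each subject
    joint <- sapply(1:n_class, function(t)
      size[t] * apply(dat, 1, function(x) prod(p[t, ]^x * (1 - p[t, ])^(1 - x))))
    post <- joint / rowSums(joint)
    # M step: update class sizes and conditional probabilities
    size <- colMeans(post)
    p <- t(post) %*% dat / colSums(post)
  }
  list(size = size, p = p)
}
fit <- em_lca(dat)  # dat: the hypothetical 0/1 matrix from the earlier sketch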
What is involved in the Newton-Raphson algorithm?
Approximates the log-likelihood function locally using linear functions to find directions towards the maximum
e.g. start somewhere random, use a linear approximation to figure out which direction to go, move in that direction, make a new linear approximation, and so on.
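A toy one-dimensional illustration in R (estimating a binomial proportion from 7 successes and 3 failures; not part of the course materials): each step linearly approximates the score (the first derivative of the log-likelihood) to decide where to move next.

score <- function(theta) 7 / theta - 3 / (1 - theta)           # first derivative
curvature <- function(theta) -7 / theta^2 - 3 / (1 - theta)^2  # second derivative
theta <- 0.5                       # arbitrary starting value
for (i in 1:20) {
  theta <- theta - score(theta) / curvature(theta)  # Newton-Raphson step
}
theta  # converges to the ML estimate 7/10 = 0.7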
How do we scale the latent variable in LCA?
For the first time in this course, we don't need to. The latent variable already has a scale defined by the number of categories