Coding Categorical Variables Flashcards

1
Q

What are the 2 ways to code for a dichotomous predictor?

A

Dummy Coding & Effect Coding

2
Q

What are the differences between dummy coding and effect coding?

A

Dummy codes are 0 and 1; effect codes are -1 and 1.

They serve different functions: dummy coding compares each group to a reference group, while effect coding compares each group to the grand mean.

3
Q

What is 0 considered in dummy coding?

Is there a reference group in effect coding?

A

0 is the reference group.

There’s no reference group in effect coding.

4
Q

When we are looking at sex differences in GRE scores, what information do we need?

A

Male mean
Female mean
Grand mean

5
Q

How do we set up dummy codes in R?

A

Instead of compute statements (creating the 0/1 variables by hand), we use contrast statements to set up dummy codes.
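A minimal sketch of what such a contrast statement looks like in base R, using a hypothetical five-level factor `group`; `contr.treatment()` is base R's built-in dummy-coding scheme.

```r
# Dummy (treatment) coding is requested with a contrast statement rather
# than by computing 0/1 variables ourselves.
group <- factor(c("intentional", "counting", "rhyming", "adjective", "imagery"),
                levels = c("intentional", "counting", "rhyming", "adjective", "imagery"))

contrasts(group) <- contr.treatment(5)  # 5 categories -> 4 dummy vectors
contrasts(group)                        # first level (intentional) is the 0 reference group
```

A model such as `lm(y ~ group)` would then use these codes automatically.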

6
Q

From the output in R, what will the estimate of the intercept always be when we dummy code?

What will the b1 coefficient be?

A

The estimate of the intercept will always be the mean of the reference group (the group coded 0).

The b1 coefficient will be the difference between the mean of the group coded 1 and the mean of the reference group.

ex) b0 = 585
b1 = -10

-10 is the difference between the female mean of 575 and the male (reference) mean of 585.
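A hedged sketch that reproduces the card's numbers: the raw scores below are hypothetical, chosen only so that the male mean is 585 and the female mean is 575, and `relevel()` makes male the reference (0) group.

```r
# Hypothetical GRE scores: male mean = 585, female mean = 575.
gre <- data.frame(
  score = c(580, 590, 585, 570, 580, 575),
  sex   = factor(c("male", "male", "male", "female", "female", "female"))
)
gre$sex <- relevel(gre$sex, ref = "male")  # male becomes the reference group (coded 0)

coef(lm(score ~ sex, data = gre))
# (Intercept) = 585  -> mean of the reference (male) group
# sexfemale   = -10  -> female mean (575) minus male mean (585)
```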

7
Q

From the output in R, what will the estimate of the intercept always be when we effect code?

What will the b1 coefficient be?

A

The intercept (b0) will be the grand mean (with equal group sizes).

The b1 coefficient is the difference between the male mean and the grand mean.

ex) b0 = 580
b1 = 5

5 is the difference between the male mean of 585 and the grand mean of 580.
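A sketch of the effect-coded version of the same hypothetical GRE scores used in the dummy-coding sketch above; `contr.sum()` supplies the 1/-1 codes, and with equal group sizes the intercept comes out as the grand mean.

```r
# Same hypothetical GRE scores, now effect (sum-to-zero) coded.
gre <- data.frame(
  score = c(580, 590, 585, 570, 580, 575),
  sex   = factor(c("male", "male", "male", "female", "female", "female"),
                 levels = c("male", "female"))
)
contrasts(gre$sex) <- contr.sum(2)  # effect codes: male = 1, female = -1

coef(lm(score ~ sex, data = gre))
# (Intercept) = 580 -> grand mean (equal n)
# sex1        =   5 -> male mean (585) minus grand mean (580)
```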

8
Q

How do we write the regression equation for dummy codes?

How do we interpret the coefficients?

A

We plug the 0's and 1's into the equation, starting with the reference group; the coefficients are differences of means.

ex) ŷ = b0 + b1D1 + b2D2 + b3D3 + b4D4

b0 is the mean of the reference group.

b1 – difference between the mean of the first group of interest and the reference group mean, and so on for b2, b3, b4.
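A small sketch of plugging the codes in, using coefficients built from the five group means that appear later in this deck (counting 7, rhyming 6.9, adjective 11, imagery 13.4, intentional 12), with intentional treated as the reference group; each row of 0s and 1s returns that group's mean.

```r
# b0 is the reference-group (intentional) mean; b1-b4 are differences from it.
b0 <- 12
b  <- c(counting = -5, rhyming = -5.1, adjective = -1, imagery = 1.4)

# Dummy-code rows: the reference group is all zeros.
D <- rbind(intentional = c(0, 0, 0, 0),
           counting    = c(1, 0, 0, 0),
           rhyming     = c(0, 1, 0, 0),
           adjective   = c(0, 0, 1, 0),
           imagery     = c(0, 0, 0, 1))

b0 + D %*% b  # plugging each row into yhat = b0 + b1*D1 + ... returns that group's mean
```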

9
Q

The intercept of effect coding gives us _________.

The coefficients are the difference between __________ and __________.

A

the grand mean

The coefficients are the difference between the mean of the group we are interested in (each coded group in turn) and the grand mean.

10
Q

For equal n dummy coding, how many code vectors are required?

A

Multiple coded vectors: number of coding vectors = number of categories - 1.

11
Q

How can we apply ANOVA-type partitioning to effect coding? Conceptually, this shows that ANOVA and regression are essentially the same.

A

The structural model is similar: Y = grand mean + treatment effect + error.

Each subject's score includes a contribution from the overall (grand) mean, the treatment effect, and error.

Overall = the grand mean
Treatment effect = the effect of the subject's group (its coefficient)
Error = the residual
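A tiny sketch of this partitioning with hypothetical scores for two groups: each score is rebuilt exactly as grand mean + treatment effect + residual.

```r
# Hypothetical scores for two groups, equal n.
y     <- c(10, 12, 14, 20, 22, 24)
group <- factor(rep(c("a", "b"), each = 3))

gm  <- mean(y)                 # overall contribution: the grand mean
trt <- ave(y, group) - gm      # treatment effect: group mean minus grand mean
err <- y - ave(y, group)       # error: the residual from the group mean

all.equal(y, gm + trt + err)   # TRUE: Y = GM + treatment effect + error
```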

12
Q

What are examples of categorical groups?

A

Ethnic groups
gender
SES

13
Q

When would we use effect coding over dummy coding, and vice versa?

A

Effect coding is used when we are looking at the effect of each group (relative to the grand mean).

Dummy coding is used to look at how each group varies compared to our reference group.

14
Q

What type of mean is now added for unequal n?

A

The (unweighted) average mean, which is the average of the group means rather than the average of all the individual scores.

15
Q

What’s the difference between grand mean and average mean (unequal n)?

A

The grand mean is the sum of all the scores divided by the total n.

The average mean is the sum of the group means divided by the number of groups.
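A quick sketch with hypothetical unequal-n groups showing that the two kinds of mean differ.

```r
# Hypothetical scores: group a has n = 4, group b has n = 2.
y     <- c(10, 12, 14, 16, 30, 32)
group <- rep(c("a", "b"), times = c(4, 2))

mean(y)                       # grand mean: sum of all scores / total n = 19
mean(tapply(y, group, mean))  # average mean: mean of the group means = (13 + 31) / 2 = 22
```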

16
Q

When the data are unbalanced (unequal n), how do the coding of vectors and interpretation of output from (1) dummy coded, (2) effect coded, and (3) orthogonally coded analyses differ (from the equal n case)?

A

Nothing changes for dummy coding; we interpret it the same way as for equal n.

For effect coding, however, IF THE EFFECT CODES STAY THE SAME (-1 and 1), the interpretations change: the intercept is no longer the grand mean, it is now the unweighted average mean of the groups, and the coefficients are the difference between each group mean and that average mean.

17
Q

How do we restore our interpretation for effect coding for unequal n?

A

We restore the interpretation by introducing Weighted effect codes.

18
Q

If we want the intercept to get back to the grand mean rather than the average mean, what do we do?

A

We have to use weighted effect codes.
We weight our groups by n: instead of using -1 for the comparison group, we use values based on the sample sizes of the two groups involved in the contrast.

ex) counting n = 8, intentional (comparison group) n = 10

so, -8/10 = -.8

These negative values are used in place of -1.
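A sketch of building the weighted effect codes by hand. The counting (n = 8) and intentional (n = 10) sizes come from the card; the rhyming n of 7 is a hypothetical placeholder chosen to match the -.7 weight that shows up on a later card. Assigning the matrix with `contrasts()` before calling `lm()` would put these codes to work.

```r
# Group sizes: counting and intentional are from the card; rhyming is a placeholder.
n <- c(counting = 8, rhyming = 7, intentional = 10)

# One code vector per non-comparison group: that group gets 1, the comparison
# (intentional) group gets -n_group / n_comparison, and everything else gets 0.
W <- cbind(counting = c(1, 0, -n["counting"] / n["intentional"]),  # -8/10 = -.8
           rhyming  = c(0, 1, -n["rhyming"]  / n["intentional"]))  # -7/10 = -.7
rownames(W) <- names(n)
W   # the bottom row holds the weighted codes that replace -1
```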

19
Q

How do we calculate the weights for weighted effect codes (unequal n)?

A

We take the negative n of the group we're interested in OVER the n of the comparison group.

ex) counting n = 8, intentional (comparison group) n = 10

so, -8/10 = -.8

These negative values are used in place of -1.

20
Q

How do we interpret the output of the intercept and b-coefficients in R after running weighted effect codes?

How do we create an equation for the weighted effect codes?

What does the equation equal?

A

The intercept is the grand mean.
The b1 coefficient is the difference between the first group's mean and the grand mean, and so on for b2, b3, etc.

The equation is as follows:
ŷ = GM + b1(W1) + b2(W2) + b3(W3); etc.

ex) ŷ = 10.20 + (-3.20)(-.8) + (-2.9)(-.7); etc…

GM = 10.20
b1 = -3.20, b2 = -2.9, …
W1 = -.8, W2 = -.7, …

When we plug in the comparison group's codes, the equation equals the mean of the comparison (intentional) group.

21
Q

How are weighted effect codes similar to ANOVA?

A

We get the same F information as if we ran an ANOVA.

The multiple R-squared here is the same as the eta-squared for ANOVA.

22
Q

Conceptually, what does it mean when the eta-squared goes up?

A

The more predictors we add, the larger the eta-squared value. This means the treatment itself might not be that effective; the size of the treatment groups can make the effect size look bigger.

23
Q

How do weighted effect codes in R correct issues with ANOVA?

A

When there's a significant ANOVA, we move on to multiple comparison tests.
The problem is that this inflates the Type I error rate.

The weighted effect codes, however, give significance tests for each coefficient without inflating the Type I error rate.

24
Q

How do we verify and check orthogonality for more than 2 codes (4+ groups) for EQUAL n?

Why do we verify and check for orthogonality?

A

The sum of the products of the codes (with their + or - signs) needs to equal zero, and we need to check each possible pair of code vectors separately (see the R sketch below):

c11·c21 + c12·c22 + c13·c23 + … = 0 (vector 1 vs. vector 2)
c11·c31 + c12·c32 + c13·c33 + … = 0 (vector 1 vs. vector 3), and so on for every pair (1 vs. 4, 2 vs. 3, 2 vs. 4, 3 vs. 4).

We verify orthogonality because we need the code vectors to be independent: zero correlation (no linear relationship) by itself does not guarantee that there is no collinearity among the vectors.
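A quick sketch of checking every pair at once in R: put the code vectors in the columns of a matrix and look at `crossprod()`; every off-diagonal entry is one of the pairwise sums of products and must be zero (equal n). The contrast matrix here is a hypothetical tree-style set for the five memory groups.

```r
# Hypothetical orthogonal contrasts for 5 groups (counting, rhyming,
# adjective, imagery, intentional), one column per comparison.
C <- cbind(c1 = c(-1, -1, -1, -1, 4),  # intentional vs. the other four
           c2 = c(-1, -1, -1,  3, 0),  # imagery vs. counting/rhyming/adjective
           c3 = c( 1,  1, -2,  0, 0),  # counting & rhyming vs. adjective
           c4 = c( 1, -1,  0,  0, 0))  # counting vs. rhyming

# crossprod(C) is t(C) %*% C; each off-diagonal cell is c_p1*c_q1 + c_p2*c_q2 + ...
# for a pair of vectors p and q, so all off-diagonal cells must be 0.
crossprod(C)
```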

25
Q

How many vectors do we need to represent a categorical variable for orthogonal coding?

A

o Orthogonal - a full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information.

26
Q

What types of conditions must contrasts (+ or - signs) meet to be considered orthogonal?

A

o Independence – no linear relationship between the vectors

o Zero correlation is necessary but not sufficient;
• zero correlation rules out a linear relationship, but we can still have zero correlation and some (nonlinear) relationship to a variable.

o Orthogonal vectors can be used to code a priori comparisons among group means

o A full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information
• Think of a tree diagram turned on its side; each split becomes a vector.

27
Q

How do we interpret the output of the intercept and b-coefficients in R after running equal n orthogonal codes?

What does the equation equal?

A
  • b0 – the intercept is the grand mean
  • b1 – each coefficient is the difference between the two sets of means being contrasted (treed out), divided by the range of the codes used (largest minus smallest); see the sketch below.

EXAMPLE)
o b1 = .485 = (mean of intentional – mean of counting, rhyming, adjective, and imagery) / range of codes = (12 – 9.575) / (4 – (-1))
o b2 = 1.275 = (mean of imagery – mean of counting, rhyming, and adjective) / range of codes = (13.4 – 8.3) / (3 – (-1))
o b3 = -1.35 = (mean of counting and rhyming – mean of adjective) / range of codes = (6.95 – 11) / (1 – (-2))
o b4 = .05 = (mean of counting – mean of rhyming) / range of codes = (7 – 6.9) / (1 – (-1))

The equation always equals the predicted values (the group means).
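A hedged sketch of running these codes in R: the raw scores are hypothetical (two per group), chosen only so the group means match the ones used above, and the contrast matrix is the same tree-style set sketched for the orthogonality check. With equal n, the intercept is the grand mean and the coefficients follow the "contrast of means ÷ range of codes" pattern.

```r
# Hypothetical data: 2 scores per group, giving means of 7, 6.9, 11, 13.4, 12.
memory <- data.frame(
  recall = c(6, 8,  6.4, 7.4,  10, 12,  12.4, 14.4,  11, 13),
  group  = factor(rep(c("counting", "rhyming", "adjective", "imagery", "intentional"),
                      each = 2),
                  levels = c("counting", "rhyming", "adjective", "imagery", "intentional"))
)

# Orthogonal codes (rows follow the factor levels above).
contrasts(memory$group) <- cbind(c(-1, -1, -1, -1, 4),  # intentional vs. rest
                                 c(-1, -1, -1,  3, 0),  # imagery vs. first three
                                 c( 1,  1, -2,  0, 0),  # counting & rhyming vs. adjective
                                 c( 1, -1,  0,  0, 0))  # counting vs. rhyming

coef(lm(recall ~ group, data = memory))
# (Intercept) = 10.06 -> grand mean (equal n)
# group1 = 0.485, group2 = 1.275, group3 = -1.35, group4 = 0.05
```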

28
Q

What’s a caveat of using equal n orthogonal coding?

A

The caveat is that we have to divide each contrast of means by the range of the codes we used (largest minus smallest; e.g., codes running from -4 to 1 give a range of 5).

29
Q

We use our orthogonal codes in 2 ways. What are they?

A
1. We create our coefficients by looking down the columns of the code matrix.

2. To get back our group means (the predicted values), we go across the rows.

30
Q

What is the purpose of criterion scaling? How is it carried out? What portions of standard regression output need to be corrected?

A

o Criterion scaling creates a single coded vector (not 4 like before) to represent group membership, since we can't apply selection algorithms or all-possible-regressions procedures to a categorical predictor spread across several code vectors.
o It is useful when there are many levels of categorical predictors in an analysis and for model selection procedures.
• If we have 2 categorical predictors and we want to see the interaction between age and rhyming, it's awkward to do that with all of the vectors, so we use criterion scaling to look at the interaction first, THEN break it apart if necessary (see the R sketch below).
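A sketch of building the criterion-scaled vector with `ave()`, using a small hypothetical outcome: each case's code is simply its own group's mean on the outcome.

```r
# Hypothetical outcome and group membership.
recall <- c(6, 8, 6.4, 7.4, 10, 12)
group  <- factor(rep(c("counting", "rhyming", "adjective"), each = 2))

# Criterion scaling: a single coded vector holding each case's group mean.
crit <- ave(recall, group)
crit   # 7.0 7.0 6.9 6.9 11.0 11.0

fit_crit <- lm(recall ~ crit)  # one vector instead of several code vectors;
summary(fit_crit)              # R^2 is fine, but the df (and so F and t) are not
```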

31
Q

How do we code things by criterion scaling?

A

We code the vector in criterion scaling by using the mean of each group as that group's code.

So instead of 0 and 1, or -1 and 1, etc., we use the group mean as the coding value.

32
Q

How do we code things in criterion scaling?

A

We code things in criterion scaling by using the mean of each group as that group's code.

This allows us to include categorical predictors with many categories in our selection algorithms.

33
Q

In the standard regression output, the following need to be corrected:

A
  • Using 1 vector to code the categorical predictor gives the wrong df, which throws off everything that depends on it, including the F-test and t-test.
  • We need to re-calculate the df, MS, and F-test by hand (using the correct df information); a sketch follows below.
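A hedged sketch of that by-hand correction, reusing the small criterion-scaling example from above: the regression df becomes (number of groups - 1), the residual df becomes (N - number of groups), and MS and F are rebuilt from the sums of squares.

```r
# Reusing the criterion-scaling sketch from above.
recall   <- c(6, 8, 6.4, 7.4, 10, 12)
group    <- factor(rep(c("counting", "rhyming", "adjective"), each = 2))
crit     <- ave(recall, group)
fit_crit <- lm(recall ~ crit)

N  <- length(recall)                  # total sample size
g  <- nlevels(group)                  # number of groups
r2 <- summary(fit_crit)$r.squared     # R^2 from the criterion-scaled model is trustworthy

SS_total <- sum((recall - mean(recall))^2)
SS_reg   <- r2 * SS_total
SS_res   <- SS_total - SS_reg

df_reg <- g - 1                       # corrected regression df (not 1)
df_res <- N - g                       # corrected residual df

F_corrected <- (SS_reg / df_reg) / (SS_res / df_res)
pf(F_corrected, df_reg, df_res, lower.tail = FALSE)   # corrected p-value
```
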
34
Q

How is criterion scaling useful?

A

o Criterion scaling is useful for testing whether the categorical predictor itself is even worth including in the model.

• It's useful for model selection and model building because the R² for the model is appropriate. That means that if we add other predictors, we can do an F-test of the R² change (ΔR²) to see whether the categorical predictor accounts for a significant proportion of variability in the outcome measure, over and above the other predictors in the model. If it does, we can remove the criterion-scaled vector and re-run the analysis with a different coding scheme that includes all the vectors in the model (see the sketch below).
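A hedged sketch of that ΔR² step with made-up data and a hypothetical covariate `age`: fit the model with and without the criterion-scaled vector and test the R² change by hand, using the corrected df for the categorical predictor rather than the 1 df that `lm()` reports for a single vector.

```r
set.seed(1)

# Made-up data: a continuous covariate plus a 3-level categorical predictor.
n      <- 30
age    <- rnorm(n, mean = 40, sd = 10)
group  <- factor(rep(c("counting", "rhyming", "adjective"), length.out = n))
recall <- 5 + 0.05 * age + as.numeric(group) + rnorm(n)   # crude group effect for illustration

crit <- ave(recall, group)            # criterion-scaled vector for the factor

reduced <- lm(recall ~ age)           # other predictors only
full    <- lm(recall ~ age + crit)    # plus the criterion-scaled categorical predictor

r2_full    <- summary(full)$r.squared
r2_reduced <- summary(reduced)$r.squared

df1 <- nlevels(group) - 1               # df for the categorical predictor (g - 1)
df2 <- n - length(coef(reduced)) - df1  # residual df for the full model, corrected

F_change <- ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
pf(F_change, df1, df2, lower.tail = FALSE)   # test of the R^2 change
```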

35
Q

• How many vectors do you need to represent a categorical variable?

A

o # of categories – 1; so if we have 5 categories, we would have 4 vectors.

o Orthogonal: a full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information.

36
Q

Under what circumstances might dummy, effect, and orthogonal coding be especially useful?

A

o Dummy coding is useful for looking at how each group varies compared to the reference group.
o Effect coding is useful for looking at the effect of each group in general (against the grand mean).
o Orthogonal coding is useful for testing planned (a priori) comparisons among group means.

37
Q

• How do you interpret the test of R2 in an analysis involving categorical predictor variables? Will this depend on the coding scheme?

A

o The test of R² is the overall (omnibus) test of group differences, and it is the same regardless of the coding scheme; what changes is what the individual coefficients test.

o Dummy code – the predicted values of the model give us back the group means (e.g., for gender); the coefficient tests compare each group to the reference group.

o Effect code – gives us the effect of each coefficient (the effect of counting, etc.); the tests show whether a group differs significantly from the grand mean (similar to SS between in ANOVA).

• The caveat is that we don't get the effect of the "reference" (omitted) group.

38
Q

• Be able to interpret b’s from regression analyses (with equal n) using dummy coding of categorical variables.

A

o Dummy code – the reference group is typically the one we care least about: its coefficient is compared to zero, which isn't of interest, while the other coefficients are compared to the reference group, so their significance tests are what matter.

  • b0 – the mean of the reference group
  • b1 – difference between the mean of the first group of interest and the reference group mean
  • b2 – difference between the mean of the 2nd group and the reference group mean; and so on up to b4 (no b5, because we do not include the reference group).
39
Q

• Be able to interpret b’s from regression analyses (with equal n) using effect coding of categorical variables.

A

o Effect code
• b0 – the intercept is always the grand mean
• b1 – difference between the mean of the first group and the grand mean (the effect of treatment 1)
• b2 – difference between the mean of the 2nd group and the grand mean; and so on (the effect of treatment 2, etc., up to b4, with no b5).

40
Q

• Be able to interpret b’s from regression analyses (with equal n) using orthogonal coding of categorical variables.

A

o Orthogonal
• b0 – the intercept is the grand mean
• b1 – each coefficient is the difference between the two sets of means being contrasted (treed out), divided by the range of the codes used (largest minus smallest).
• EXAMPLE)
o b1 = .485 = (mean of intentional – mean of counting, rhyming, adjective, and imagery) / range of codes = (12 – 9.575) / (4 – (-1))
o b2 = 1.275 = (mean of imagery – mean of counting, rhyming, and adjective) / range of codes = (13.4 – 8.3) / (3 – (-1))
o b3 = -1.35 = (mean of counting and rhyming – mean of adjective) / range of codes = (6.95 – 11) / (1 – (-2))
o b4 = .05 = (mean of counting – mean of rhyming) / range of codes = (7 – 6.9) / (1 – (-1))

41
Q

• When the data are unbalanced (unequal n), how do the coding of vectors and interpretation of output from (1) dummy coded, (2) effect coded, and (3) orthogonally coded analyses differ (from the equal n case)?

A

o Dummy code – the codes are identical to the equal n case, and the interpretations are unchanged.
o Effect code – IF the effect codes aren't changed, the interpretations change: the intercept is no longer the grand mean, it is now the unweighted average mean of the groups, and the coefficients are the difference between each group mean and the average mean.
• This poses an issue because we aren't really interested in the average mean; we're interested in the effect of a group against the grand mean. So we calculate weighted effect codes (-n_current ÷ n_comparison; e.g., counting n = 8, intentional (comparison group) n = 10 gives -8/10 = -.8). These negative values are used in place of -1.

42
Q

• Orthogonal vectors must meet what conditions?

A

o Independence – no linear relationship between the vectors
o Zero correlation is necessary but not sufficient;
• zero correlation rules out a linear relationship, but we can still have zero correlation and some (nonlinear) relationship to a variable.
o Orthogonal vectors can be used to code a priori comparisons among group means
o A full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information
• Think of a tree diagram turned on its side; each split becomes a vector.