Coding Categorical Variables Flashcards

(24 cards)

1
Q

How many vectors do we need to represent a variable with orthogonal coding?

A

A full set of orthogonal comparisons includes (number of levels - 1) comparisons.

2
Q

For Dummy coding, what is the intercept and the b1 coefficient in R output?

A

The intercept is the mean of the reference group (coded as 0).

If males are coded 0 and females coded 1 in the D1 vector, then the male mean is the intercept.

The b1 coefficient is the difference between the female and male means.

Ex:
Male mean is 585 - also the intercept
Female mean is 575

The difference, 575 - 585, is -10; -10 is also the b1 coefficient in the output.
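This card's arithmetic can be sketched numerically. The cards describe R output, but the same numbers fall out of an ordinary least-squares fit in Python; the four scores below are made up so that the male mean is 585 and the female mean is 575:

```python
import numpy as np

# Hypothetical scores: males average 585, females average 575.
y = np.array([580.0, 590.0, 570.0, 580.0])
d1 = np.array([0.0, 0.0, 1.0, 1.0])  # dummy code: male = 0 (reference), female = 1

# Least squares on an intercept column plus the dummy vector.
X = np.column_stack([np.ones_like(y), d1])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 ≈ 585 (the reference-group mean); b1 ≈ -10 (female mean minus male mean).
```

Swapping which group is coded 1 flips the sign of b1 but leaves the fit unchanged.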

3
Q

What else do we look for in the R output after we see the intercept and the slope?

A

We check the p-value to see whether gender (or another variable) is a significant predictor, i.e. whether p falls below our chosen alpha level.

4
Q

R: For Effect coding, what is the intercept and the b1 coefficient?

A

b0 is the Grand Mean and b1 is the difference between the male mean and the grand mean.

So if the Grand Mean is 580, then our b0 is 580, and b1 = 585 - 580 = 5.
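For comparison with the dummy-coding card, here is a minimal effect-coding sketch in Python, using the same made-up scores (male mean 585, female mean 575, grand mean 580 with equal n):

```python
import numpy as np

# Same hypothetical scores as the dummy-coding example.
y = np.array([580.0, 590.0, 570.0, 580.0])
e1 = np.array([1.0, 1.0, -1.0, -1.0])  # effect code: male = 1, female = -1

X = np.column_stack([np.ones_like(y), e1])
b0, b1 = np.linalg.lstsq(X, y, rcond=None)[0]

# b0 ≈ 580 (the grand mean); b1 ≈ 5 (male mean minus grand mean).
```

Same data, same fitted means - only the meaning of the coefficients changes with the coding scheme.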

5
Q

How do we interpret the test of R2 in Dummy coding in an analysis involving categorical predictor variables? Will this depend on the coding scheme?

A

Dummy code -
b0 is the mean of the comparison group (or the reference group)
b1 is the difference between the reference group and the Counting group (or whatever group we're comparing to)
b2 is the difference between the reference group and the Imagery group; etc...

If there are 5 categories, we only estimate the 4 comparisons of the other groups against the reference group.
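A sketch with five groups, assuming two made-up scores per group chosen to reproduce the group means used elsewhere in these cards (Counting 7, Rhyming 6.9, Adjective 11, Imagery 13.4, Intentional 12), with Intentional as the reference group:

```python
import numpy as np

# Two hypothetical scores per group, averaging to the card's group means.
groups = {"counting": [6.0, 8.0], "rhyming": [6.4, 7.4],
          "adjective": [10.0, 12.0], "imagery": [12.9, 13.9],
          "intentional": [11.0, 13.0]}
y = np.concatenate([groups[g] for g in groups])
labels = [g for g in groups for _ in groups[g]]

# Four dummy vectors (5 categories - 1); "intentional" is all zeros.
coded = ["counting", "rhyming", "adjective", "imagery"]
D = np.array([[1.0 if lab == c else 0.0 for c in coded] for lab in labels])

X = np.column_stack([np.ones(len(y)), D])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# b[0] ≈ 12 (reference mean); b[1] ≈ 7 - 12 = -5; b[2] ≈ 6.9 - 12 = -5.1;
# b[3] ≈ 11 - 12 = -1; b[4] ≈ 13.4 - 12 = 1.4
```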

6
Q

How do we interpret the test of R2 in Effect coding in an analysis involving categorical predictor variables? Will this depend on the coding scheme?

A

Effect code

b0 is the grand mean
b1 is the difference between group 1's mean and the grand mean (effect of treatment 1)
b2 is the difference between group 2's mean and the grand mean (effect of treatment 2)... etc.

7
Q

How is equal N Effect coding similar to ANOVA?

A

Each score represents the contribution of the overall (grand) mean + a treatment effect + error.

The structural model is Y = Mu + Unique (treatment effect) + Error.

8
Q

What is dummy code, effect code, and orthogonal code useful for?

A

Dummy coding is useful for looking at how each group compares to a reference group.

Effect coding is useful for looking at how each group compares to the grand mean; it shows us the treatment effect carried by each vector.

Orthogonal coding is useful for testing a priori comparisons among group means, with each contrast independent of the others.

9
Q

How many vectors do we need to represent categorical variables?

A

For dummy and effect coding, it is the number of categories - 1.

Orthogonal coding is the same, (number of levels - 1) comparisons, but a full set exhausts all available information.

10
Q

In an orthogonal contrast vector, what do zeros represent?

A

Zero indicates that a group is not involved in a contrast.

11
Q

What is the intercept and b coefficients in an orthogonal code?

A

b0 - the intercept is the Grand Mean.

b1 - each coefficient is the contrast it codes ("treed out" across the groups), divided by the range of the codes we use (largest to smallest).

12
Q

What are the rules for Orthogonality?

A
  1. Must be independent
  2. Zero correlation is necessary but not sufficient (no linear relationship)
  3. Orthogonal vectors can be used to code a priori comparisons among group means
  4. A full set of orthogonal comparisons includes (number of levels - 1) contrasts, exhausting available information (treeing out)
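The zero-correlation rule can be checked numerically. A sketch assuming five groups and one possible "treed out" contrast set like the one these cards use (first vector: the four learning groups vs. Intentional):

```python
import numpy as np

# One possible full set of orthogonal contrasts for 5 groups (4 = levels - 1),
# built by "treeing out": each vector splits a previous branch further.
V = np.array([
    [1, 1, 1, 1, -4],   # four learning groups vs. intentional
    [1, 1, -1, -1, 0],  # counting/rhyming vs. adjective/imagery
    [1, -1, 0, 0, 0],   # counting vs. rhyming
    [0, 0, 1, -1, 0],   # adjective vs. imagery
])

# With equal n, every pair of distinct vectors must have a zero dot product.
G = V @ V.T
off_diagonal = G - np.diag(np.diag(G))
print(np.all(off_diagonal == 0))  # True
```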
13
Q

What is the statistical sentence for orthogonal coding for equal n?

A

It’s just like ANOVA -> F (dfreg,dfres) = F-value, p

14
Q

How would we interpret equal n orthogonal code? *interpret using b0 and b1; etc…

Group means:
C (Counting) 7
R (Rhyming) 6.9
A (Adjective) 11
I (Imagery) 13.40
IN (Intentional) 12

TOTAL (grand mean) 10.06

Range of largest to smallest orthogonal code is 1 to -4 (treeing out vector)

A

b0 is the grand mean (10.06 here)

b1 is the difference between the mean of the compared groups' means and the last comparison group's mean, divided by the range of codes:

-> (mean of Counting, Rhyming, Adjective, Imagery) - (Intentional) ÷ (1 - (-4))

  • > ((7 + 6.9 + 11 + 13.40) ÷ 4 = 9.575) - 12
  • > (9.575 - 12) ÷ 5
  • > -2.425 ÷ 5
  • > -0.485
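The card's arithmetic can be checked in a few lines; a sketch using the group means listed in the question (the mean of the four learning-group means is (7 + 6.9 + 11 + 13.4) / 4 = 9.575, and the code range is 1 - (-4) = 5):

```python
# Group means from the card.
counting, rhyming, adjective, imagery, intentional = 7.0, 6.9, 11.0, 13.4, 12.0

# b1: mean of the four learning-group means, minus the intentional mean,
# divided by the range of the contrast codes (1 - (-4) = 5).
four_group_mean = (counting + rhyming + adjective + imagery) / 4
b1 = (four_group_mean - intentional) / (1 - (-4))

# four_group_mean ≈ 9.575; b1 ≈ -0.485
```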
15
Q

When the data are unbalanced (unequal n), how do the coding of vectors and the interpretation of output from dummy-, effect-, and orthogonally-coded analyses differ from equal n?

A

UNEQUAL N -

DUMMY CODE - Codes and interpretation are identical.

EFFECT CODE - The intercept is no longer the grand mean; it is now the unweighted average of the group means. Coefficients are the difference between a group mean and that unweighted average. This is not ideal, because we are interested in coefficients against the grand mean... so we must calculate Weighted Effect Codes (-ncurrent ÷ ncomparison):

EX)
counting n = 8, intentional (comparison group) n = 10.
(-ncurrent ÷ ncomparison)
Weighted effect code = -8 ÷ 10 = -0.8

*We use these negative values instead of -1
__________________________________________

ORTHOGONAL - We use the group ns in the vector (instead of the -1s and 1s), similar to the treeing-out concept.
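The weighted-effect-code arithmetic for the unequal-n example above can be sketched directly (the group sizes are the card's hypothetical values):

```python
# Unequal n: counting has 8 cases, intentional (the comparison group) has 10.
n_current, n_comparison = 8, 10

# Weighted effect code used in place of -1 in the comparison group's row.
weighted_code = -n_current / n_comparison
print(weighted_code)  # -0.8
```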

16
Q

What’s the purpose of Criterion Scaling?

A

Criterion scaling creates a SINGLE coded vector (instead of the multiple vectors from before) that contains the criterion mean for each category.

We need a single vector to represent group membership because we can't apply selection algorithms or all-possible-regressions with multi-vector categorical predictors.

17
Q

Why do we do Criterion Scaling?

A

Because we can’t run all possible regressions or selection algorithms on categorical predictors and we want to see how much of an effect 1 predictor has… so we are essentially wanting to look at the R2 value.

18
Q

When is Criterion Scaling useful?

A

It’s useful for predicting whether a categorical predictor itself is even worth including in our model… or whether we can SCALE down.

It’s generally useful for Model Selection and Model Building.

It's especially relevant when there are multiple levels of categorical predictors in an analysis and when using model selection procedures.

EX) IF we want to see the interaction between age and rhyming (2 categorical predictors), we can use criterion scaling first to look at the interaction, THEN break it apart if necessary. We don’t have to include the rest of the predictors.

19
Q

How is criterion scaling carried out?

A

We code by using the average of each group. This allows us to include categorical predictors with many categories into our selection algorithm.
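A minimal sketch of the mechanics, with made-up scores for three groups; the single criterion-scaled vector carries the same between-group information as a full set of coding vectors, so the R2 matches:

```python
import numpy as np

# Hypothetical scores for three groups.
scores = {"a": [1.0, 3.0], "b": [4.0, 6.0], "c": [8.0, 10.0]}
y = np.concatenate([scores[g] for g in scores])

# Criterion scaling: each case is coded with its own group's mean on y.
crit = np.concatenate([[np.mean(v)] * len(v) for v in scores.values()])

# Regress y on the single criterion-scaled vector and compute R2.
X = np.column_stack([np.ones(len(y)), crit])
b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

# r2 ≈ 0.8916, identical to what dummy-coding the three groups would give.
```

Note that the fitted intercept is 0 and the slope is 1 (the predictor already is the group mean), which is why the later card calls b0 and b1 here meaningless.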

20
Q

What needs to be corrected in the standard regression output after running the criterion scaling?

A

The df, because we used 1 vector to code all of the categorical predictors, which distorts the F test and the t-test.

21
Q

What do we need to recalculate after running a criterion scaling?

How would we recalculate the information on an ANOVA table?

A

We need to recalculate the df, MS, and F-test by hand.

Working from the output:

SS will be correct, so we can copy it into our table.

The regression df must be changed to the number of vectors the coding scheme would have required (number of groups - 1). The total df is the output's residual df + 1, and the corrected residual df is the total df minus the regression df.

Then we compute MS and F from the corrected values.
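The hand recalculation can be sketched as follows, assuming made-up SS values from criterion-scaled output, 5 groups (so 4 vectors' worth of regression df), and 50 cases:

```python
# Hypothetical values: SS from the criterion-scaled output is trusted;
# df must be rebuilt as if the full set of coding vectors had been used.
ss_reg, ss_res = 351.5, 435.3   # made-up sums of squares
n_groups, n_cases = 5, 50

df_reg = n_groups - 1            # number of vectors the coding would need
df_res = n_cases - n_groups      # total df (n - 1) minus df_reg

ms_reg = ss_reg / df_reg
ms_res = ss_res / df_res
f = ms_reg / ms_res
print(df_reg, df_res, round(f, 2))  # 4 45 9.08
```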

22
Q

Why would we compute an F test of the R2∆?

A

If we added another predictor to a model (in model selection, for example), we could compute the F-test of the R2∆ to see whether the categorical predictor accounts for a significant proportion of variability in the outcome measure, OVER AND ABOVE the effect of the other predictors.

If it does account for significant variability, we can remove the criterion scaling and rerun a different coding scheme (dummy, effect, orthogonal) with all the vectors in the model.

23
Q

What is inaccurate and accurate from results from criterion scaling?

A

Inaccurate - df, MS, F

Accurate - R2 and SS

24
Q

What will b0 and b1 be after the criterion scaling?

A

b0 and b1 after criterion scaling are constant (the intercept is 0 and the slope is 1, because the predictor is already the group mean) - therefore meaningless to interpret.