Coding Categorical Variables Flashcards

1
Q

What are the 2 ways to code for a dichotomous predictor?

A

Dummy Coding & Effect Coding

2
Q

What are the differences between dummy coding and effect coding?

A

Dummy codes are 0 and 1; effect codes are -1 and 1.

They serve different functions: dummy coding compares each group to a reference group, while effect coding compares each group to the grand mean.

3
Q

What is 0 considered in dummy coding?

Is there a reference group in effect coding?

A

0 is the reference group.

There’s no reference group in effect coding.

4
Q

When we are looking at sex differences in GRE scores, what information do we need?

A

Male mean
Female mean
Grand mean

5
Q

How do we set up dummy codes in R?

A

Instead of compute statements (creating the 0/1 variables by hand), we use contrast statements to set up dummy codes.
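A minimal sketch of what such a contrast statement looks like in base R, using a hypothetical five-level factor `group`; `contr.treatment()` is base R's built-in dummy-coding scheme.

```r
# Dummy (treatment) coding is requested with a contrast statement rather
# than by computing 0/1 variables ourselves.
group <- factor(c("intentional", "counting", "rhyming", "adjective", "imagery"),
                levels = c("intentional", "counting", "rhyming", "adjective", "imagery"))

contrasts(group) <- contr.treatment(5)  # 5 categories -> 4 dummy vectors
contrasts(group)                        # first level (intentional) is the 0 reference group
```

A model such as `lm(y ~ group)` would then use these codes automatically.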

6
Q

From the output in R, what will the estimate of the intercept always be when we dummy code?

What will the b1 coefficient be?

A

The estimate of the intercept will always be the mean of the reference group (the group coded 0).

The b1 coefficient will be the difference between the mean of the group coded 1 and the mean of the reference group.

ex) b0 = 585
b1 = -10

-10 is the difference between the female mean of 575 and the male (reference) mean of 585.
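A hedged sketch that reproduces the card's numbers: the raw scores below are hypothetical, chosen only so that the male mean is 585 and the female mean is 575, and `relevel()` makes male the reference (0) group.

```r
# Hypothetical GRE scores: male mean = 585, female mean = 575.
gre <- data.frame(
  score = c(580, 590, 585, 570, 580, 575),
  sex   = factor(c("male", "male", "male", "female", "female", "female"))
)
gre$sex <- relevel(gre$sex, ref = "male")  # male becomes the reference group (coded 0)

coef(lm(score ~ sex, data = gre))
# (Intercept) = 585  -> mean of the reference (male) group
# sexfemale   = -10  -> female mean (575) minus male mean (585)
```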

7
Q

From the output in R, what will the estimate of the intercept always be when we effect code?

What will the b1 coefficient be?

A

The intercept (b0) will be the grand mean (with equal group sizes).

The b1 coefficient is the difference between the male mean and the grand mean.

ex) b0 = 580
b1 = 5

5 is the difference between the male mean of 585 and the grand mean of 580.
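A sketch of the effect-coded version of the same hypothetical GRE scores used in the dummy-coding sketch above; `contr.sum()` supplies the 1/-1 codes, and with equal group sizes the intercept comes out as the grand mean.

```r
# Same hypothetical GRE scores, now effect (sum-to-zero) coded.
gre <- data.frame(
  score = c(580, 590, 585, 570, 580, 575),
  sex   = factor(c("male", "male", "male", "female", "female", "female"),
                 levels = c("male", "female"))
)
contrasts(gre$sex) <- contr.sum(2)  # effect codes: male = 1, female = -1

coef(lm(score ~ sex, data = gre))
# (Intercept) = 580 -> grand mean (equal n)
# sex1        =   5 -> male mean (585) minus grand mean (580)
```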

8
Q

How do we write the regression equation for dummy codes?

How do we interpret the coefficients?

A

We plug the 0's and 1's into the equation, starting with the reference group; the coefficients are differences of means.

ex) ŷ = b0 + b1D1 + b2D2 + b3D3 + b4D4

b0 is the mean of the reference group.

b1 – difference between the mean of the first group of interest and the reference group mean, and so on for b2, b3, b4.
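A small sketch of plugging the codes in, using coefficients built from the five group means that appear later in this deck (counting 7, rhyming 6.9, adjective 11, imagery 13.4, intentional 12), with intentional treated as the reference group; each row of 0s and 1s returns that group's mean.

```r
# b0 is the reference-group (intentional) mean; b1-b4 are differences from it.
b0 <- 12
b  <- c(counting = -5, rhyming = -5.1, adjective = -1, imagery = 1.4)

# Dummy-code rows: the reference group is all zeros.
D <- rbind(intentional = c(0, 0, 0, 0),
           counting    = c(1, 0, 0, 0),
           rhyming     = c(0, 1, 0, 0),
           adjective   = c(0, 0, 1, 0),
           imagery     = c(0, 0, 0, 1))

b0 + D %*% b  # plugging each row into yhat = b0 + b1*D1 + ... returns that group's mean
```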

9
Q

The intercept of effect coding gives us _________.

The coefficients are the difference between __________ and __________.

A

the grand mean

The coefficients are the difference between the mean of the group we are interested in (each coded group in turn) and the grand mean.

10
Q

For equal n dummy coding, how many code vectors are required?

A

Multiple coded vectors: number of coding vectors = number of categories - 1.

11
Q

How can we apply ANOVA-type partitioning to effect coding? Conceptually, this shows that ANOVA and regression are essentially the same.

A

The structural model is similar: Y = grand mean + treatment effect + error.

Each subject's score includes a contribution from the overall (grand) mean, the treatment effect, and error.

Overall = the grand mean
Treatment effect = the effect of the subject's group (its coefficient)
Error = the residual
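A tiny sketch of this partitioning with hypothetical scores for two groups: each score is rebuilt exactly as grand mean + treatment effect + residual.

```r
# Hypothetical scores for two groups, equal n.
y     <- c(10, 12, 14, 20, 22, 24)
group <- factor(rep(c("a", "b"), each = 3))

gm  <- mean(y)                 # overall contribution: the grand mean
trt <- ave(y, group) - gm      # treatment effect: group mean minus grand mean
err <- y - ave(y, group)       # error: the residual from the group mean

all.equal(y, gm + trt + err)   # TRUE: Y = GM + treatment effect + error
```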

12
Q

What are examples of categorical groups?

A

Ethnic groups
gender
SES

13
Q

When would we use effect coding over dummy coding, and vice versa?

A

Effect coding is used when we are looking at the effect of each group (relative to the grand mean).

Dummy coding is used to look at how each group varies compared to our reference group.

14
Q

What type of mean is now added for unequal n?

A

The (unweighted) average mean, which is the average of the group means rather than the average of all the individual scores.

15
Q

What’s the difference between grand mean and average mean (unequal n)?

A

The grand mean is the sum of all the scores divided by the total n.

The average mean is the sum of the group means divided by the number of groups.
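A quick sketch with hypothetical unequal-n groups showing that the two kinds of mean differ.

```r
# Hypothetical scores: group a has n = 4, group b has n = 2.
y     <- c(10, 12, 14, 16, 30, 32)
group <- rep(c("a", "b"), times = c(4, 2))

mean(y)                       # grand mean: sum of all scores / total n = 19
mean(tapply(y, group, mean))  # average mean: mean of the group means = (13 + 31) / 2 = 22
```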

16
Q

When the data are unbalanced (unequal n), how do the coding of vectors and interpretation of output from (1) dummy coded, (2) effect coded, and (3) orthogonally coded analyses differ (from the equal n case)?

A

Nothing changes for dummy coding; we interpret it the same way as for equal n.

For effect coding, however, IF THE EFFECT CODES STAY THE SAME (-1 and 1), the interpretations change: the intercept is no longer the grand mean, it is now the unweighted average mean of the groups, and the coefficients are the difference between each group mean and that average mean.

17
Q

How do we restore our interpretation for effect coding for unequal n?

A

We restore the interpretation by introducing Weighted effect codes.

18
Q

If we want the intercept to get back to the grand mean rather than the average mean, what do we do?

A

We have to use weighted effect codes.
We weight our groups by n: instead of using -1 for the comparison group, we use values based on the sample sizes of the two groups involved in the contrast.

ex) counting n = 8, intentional (comparison group) n = 10

so, -8/10 = -.8

These negative values are used in place of -1.
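A sketch of building the weighted effect codes by hand. The counting (n = 8) and intentional (n = 10) sizes come from the card; the rhyming n of 7 is a hypothetical placeholder chosen to match the -.7 weight that shows up on a later card. Assigning the matrix with `contrasts()` before calling `lm()` would put these codes to work.

```r
# Group sizes: counting and intentional are from the card; rhyming is a placeholder.
n <- c(counting = 8, rhyming = 7, intentional = 10)

# One code vector per non-comparison group: that group gets 1, the comparison
# (intentional) group gets -n_group / n_comparison, and everything else gets 0.
W <- cbind(counting = c(1, 0, -n["counting"] / n["intentional"]),  # -8/10 = -.8
           rhyming  = c(0, 1, -n["rhyming"]  / n["intentional"]))  # -7/10 = -.7
rownames(W) <- names(n)
W   # the bottom row holds the weighted codes that replace -1
```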

19
Q

How do we calculate the weights for weighted effect codes (unequal n)?

A

We take the negative n of the group we're interested in OVER the n of the comparison group.

ex) counting n = 8, intentional (comparison group) n = 10

so, -8/10 = -.8

These negative values are used in place of -1.

20
Q

How do we interpret the output of the intercept and b-coefficients in R after running weighted effect codes?

How do we create an equation for the weighted effect codes?

What does the equation equal?

A

The intercept is the grand mean.
The b1 coefficient is the difference between the first group's mean and the grand mean, and so on for b2, b3, etc.

The equation is as follows:
ŷ = GM + b1(W1) + b2(W2) + b3(W3); etc.

ex) ŷ = 10.20 + (-3.20)(-.8) + (-2.9)(-.7); etc…

GM = 10.20
b1 = -3.20, b2 = -2.9, …
W1 = -.8, W2 = -.7, …

When we plug in the comparison group's codes, the equation equals the mean of the comparison (intentional) group.

21
Q

How are weighted effect codes similar to ANOVA?

A

We get the same F information as if we ran an ANOVA.

The multiple R-squared here is the same as the eta-squared for ANOVA.

22
Q

Conceptually, what does it mean when the eta-squared goes up?

A

The more predictors we add, the larger the eta-squared value. This means the treatment itself might not be that effective; the size of the treatment groups can make the effect size look bigger.

23
Q

How do weighted effect codes in R correct issues with ANOVA?

A

When there's a significant ANOVA, we move on to multiple comparison tests.
The problem is that this inflates the Type I error rate.

The weighted effect codes, however, give significance tests for each coefficient without inflating the Type I error rate.

24
Q

How do we verify and check orthogonality for more than 2 codes (4+ groups) for EQUAL n?

Why do we verify and check for orthogonality?

A

The sum of the products of the codes (with their + or - signs) needs to equal zero, and we need to check each possible pair of code vectors separately (see the R sketch below):

c11·c21 + c12·c22 + c13·c23 + … = 0 (vector 1 vs. vector 2)
c11·c31 + c12·c32 + c13·c33 + … = 0 (vector 1 vs. vector 3), and so on for every pair (1 vs. 4, 2 vs. 3, 2 vs. 4, 3 vs. 4).

We verify orthogonality because we need the code vectors to be independent: zero correlation (no linear relationship) by itself does not guarantee that there is no collinearity among the vectors.
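A quick sketch of checking every pair at once in R: put the code vectors in the columns of a matrix and look at `crossprod()`; every off-diagonal entry is one of the pairwise sums of products and must be zero (equal n). The contrast matrix here is a hypothetical tree-style set for the five memory groups.

```r
# Hypothetical orthogonal contrasts for 5 groups (counting, rhyming,
# adjective, imagery, intentional), one column per comparison.
C <- cbind(c1 = c(-1, -1, -1, -1, 4),  # intentional vs. the other four
           c2 = c(-1, -1, -1,  3, 0),  # imagery vs. counting/rhyming/adjective
           c3 = c( 1,  1, -2,  0, 0),  # counting & rhyming vs. adjective
           c4 = c( 1, -1,  0,  0, 0))  # counting vs. rhyming

# crossprod(C) is t(C) %*% C; each off-diagonal cell is c_p1*c_q1 + c_p2*c_q2 + ...
# for a pair of vectors p and q, so all off-diagonal cells must be 0.
crossprod(C)
```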

25
Q

How many vectors do we need to represent a categorical variable for orthogonal coding?

A

o Orthogonal - a full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information.

26
Q

What types of conditions must contrasts (+ or - signs) meet to be considered orthogonal?

A

o Independence – no linear relationship between the vectors

o Zero correlation is necessary but not sufficient;
• zero correlation rules out a linear relationship, but we can still have zero correlation and some (nonlinear) relationship to a variable.

o Orthogonal vectors can be used to code a priori comparisons among group means

o A full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information
• Think of a tree diagram turned on its side; each split becomes a vector.

27
Q

How do we interpret the output of the intercept and b-coefficients in R after running equal n orthogonal codes?

What does the equation equal?

A
  • b0 – the intercept is the grand mean
  • b1 – each coefficient is the difference between the two sets of means being contrasted (treed out), divided by the range of the codes used (largest minus smallest); see the sketch below.

EXAMPLE)
o b1 = .485 = (mean of intentional – mean of counting, rhyming, adjective, and imagery) / range of codes = (12 – 9.575) / (4 – (-1))
o b2 = 1.275 = (mean of imagery – mean of counting, rhyming, and adjective) / range of codes = (13.4 – 8.3) / (3 – (-1))
o b3 = -1.35 = (mean of counting and rhyming – mean of adjective) / range of codes = (6.95 – 11) / (1 – (-2))
o b4 = .05 = (mean of counting – mean of rhyming) / range of codes = (7 – 6.9) / (1 – (-1))

The equation always equals the predicted values (the group means).
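A hedged sketch of running these codes in R: the raw scores are hypothetical (two per group), chosen only so the group means match the ones used above, and the contrast matrix is the same tree-style set sketched for the orthogonality check. With equal n, the intercept is the grand mean and the coefficients follow the "contrast of means ÷ range of codes" pattern.

```r
# Hypothetical data: 2 scores per group, giving means of 7, 6.9, 11, 13.4, 12.
memory <- data.frame(
  recall = c(6, 8,  6.4, 7.4,  10, 12,  12.4, 14.4,  11, 13),
  group  = factor(rep(c("counting", "rhyming", "adjective", "imagery", "intentional"),
                      each = 2),
                  levels = c("counting", "rhyming", "adjective", "imagery", "intentional"))
)

# Orthogonal codes (rows follow the factor levels above).
contrasts(memory$group) <- cbind(c(-1, -1, -1, -1, 4),  # intentional vs. rest
                                 c(-1, -1, -1,  3, 0),  # imagery vs. first three
                                 c( 1,  1, -2,  0, 0),  # counting & rhyming vs. adjective
                                 c( 1, -1,  0,  0, 0))  # counting vs. rhyming

coef(lm(recall ~ group, data = memory))
# (Intercept) = 10.06 -> grand mean (equal n)
# group1 = 0.485, group2 = 1.275, group3 = -1.35, group4 = 0.05
```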

28
Q

What’s a caveat of using equal n orthogonal coding?

A

The caveat is that we have to divide each contrast of means by the range of the codes we used (largest minus smallest; e.g., codes running from -4 to 1 give a range of 5).

29
Q

We use our orthogonal codes in 2 ways. What are they?

A
1. We create our coefficients by looking down the columns of the code matrix.

2. To get back our group means (the predicted values), we go across the rows.

30
Q

What is the purpose of criterion scaling? How is it carried out? What portions of standard regression output need to be corrected?

A

o Criterion scaling creates a single coded vector (not 4 like before) to represent group membership, since we can't apply selection algorithms or all-possible-regressions procedures to a categorical predictor spread across several code vectors.
o It is useful when there are many levels of categorical predictors in an analysis and for model selection procedures.
• If we have 2 categorical predictors and we want to see the interaction between age and rhyming, it's awkward to do that with all of the vectors, so we use criterion scaling to look at the interaction first, THEN break it apart if necessary (see the R sketch below).
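A sketch of building the criterion-scaled vector with `ave()`, using a small hypothetical outcome: each case's code is simply its own group's mean on the outcome.

```r
# Hypothetical outcome and group membership.
recall <- c(6, 8, 6.4, 7.4, 10, 12)
group  <- factor(rep(c("counting", "rhyming", "adjective"), each = 2))

# Criterion scaling: a single coded vector holding each case's group mean.
crit <- ave(recall, group)
crit   # 7.0 7.0 6.9 6.9 11.0 11.0

fit_crit <- lm(recall ~ crit)  # one vector instead of several code vectors;
summary(fit_crit)              # R^2 is fine, but the df (and so F and t) are not
```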

31
Q

How do we code things by criterion scaling?

A

We code the vector in criterion scaling by using the mean of each group as that group's code.

So instead of 0 and 1, or -1 and 1, etc., we use the group mean as the coding value.

32
Q

How do we code things in criterion scaling?

A

We code things in criterion scaling by using the mean of each group as that group's code.

This allows us to include categorical predictors with many categories in our selection algorithms.

33
Q

In the standard regression output, the following need to be corrected:

A
  • Using 1 vector to code the categorical predictor gives the wrong df, which throws off everything that depends on it, including the F-test and t-test.
  • We need to re-calculate the df, MS, and F-test by hand (using the correct df information); a sketch follows below.
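A hedged sketch of that by-hand correction, reusing the small criterion-scaling example from above: the regression df becomes (number of groups - 1), the residual df becomes (N - number of groups), and MS and F are rebuilt from the sums of squares.

```r
# Reusing the criterion-scaling sketch from above.
recall   <- c(6, 8, 6.4, 7.4, 10, 12)
group    <- factor(rep(c("counting", "rhyming", "adjective"), each = 2))
crit     <- ave(recall, group)
fit_crit <- lm(recall ~ crit)

N  <- length(recall)                  # total sample size
g  <- nlevels(group)                  # number of groups
r2 <- summary(fit_crit)$r.squared     # R^2 from the criterion-scaled model is trustworthy

SS_total <- sum((recall - mean(recall))^2)
SS_reg   <- r2 * SS_total
SS_res   <- SS_total - SS_reg

df_reg <- g - 1                       # corrected regression df (not 1)
df_res <- N - g                       # corrected residual df

F_corrected <- (SS_reg / df_reg) / (SS_res / df_res)
pf(F_corrected, df_reg, df_res, lower.tail = FALSE)   # corrected p-value
```
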
34
Q

How is criterion scaling useful?

A

o Criterion scaling is useful for testing whether the categorical predictor itself is even worth including in the model.

• It's useful for model selection and model building because the R² for the model is appropriate. That means that if we add other predictors, we can do an F-test of the R² change (ΔR²) to see whether the categorical predictor accounts for a significant proportion of variability in the outcome measure, over and above the other predictors in the model. If it does, we can remove the criterion-scaled vector and re-run the analysis with a different coding scheme that includes all the vectors in the model (see the sketch below).
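A hedged sketch of that ΔR² step with made-up data and a hypothetical covariate `age`: fit the model with and without the criterion-scaled vector and test the R² change by hand, using the corrected df for the categorical predictor rather than the 1 df that `lm()` reports for a single vector.

```r
set.seed(1)

# Made-up data: a continuous covariate plus a 3-level categorical predictor.
n      <- 30
age    <- rnorm(n, mean = 40, sd = 10)
group  <- factor(rep(c("counting", "rhyming", "adjective"), length.out = n))
recall <- 5 + 0.05 * age + as.numeric(group) + rnorm(n)   # crude group effect for illustration

crit <- ave(recall, group)            # criterion-scaled vector for the factor

reduced <- lm(recall ~ age)           # other predictors only
full    <- lm(recall ~ age + crit)    # plus the criterion-scaled categorical predictor

r2_full    <- summary(full)$r.squared
r2_reduced <- summary(reduced)$r.squared

df1 <- nlevels(group) - 1               # df for the categorical predictor (g - 1)
df2 <- n - length(coef(reduced)) - df1  # residual df for the full model, corrected

F_change <- ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
pf(F_change, df1, df2, lower.tail = FALSE)   # test of the R^2 change
```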

35
Q

• How many vectors do you need to represent a categorical variable?

A

o # of categories – 1; so if we have 5 categories, we would have 4 vectors.

o Orthogonal: a full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information.

36
Q

Under what circumstances might dummy, effect, and orthogonal coding be especially useful?

A

o Dummy coding is useful for looking at how each group varies compared to the reference group.
o Effect coding is useful for looking at the effect of each group in general (against the grand mean).
o Orthogonal coding is useful for testing planned (a priori) comparisons among group means.

37
Q

• How do you interpret the test of R2 in an analysis involving categorical predictor variables? Will this depend on the coding scheme?

A

o The test of R² is the overall (omnibus) test of group differences, and it is the same regardless of the coding scheme; what changes is what the individual coefficients test.

o Dummy code – the predicted values of the model give us back the group means (e.g., for gender); the coefficient tests compare each group to the reference group.

o Effect code – gives us the effect of each coefficient (the effect of counting, etc.); the tests show whether a group differs significantly from the grand mean (similar to SS between in ANOVA).

• The caveat is that we don't get the effect of the "reference" (omitted) group.

38
Q

• Be able to interpret b’s from regression analyses (with equal n) using dummy coding of categorical variables.

A

o Dummy code – the reference group is typically the one we care least about: its coefficient is compared to zero, which isn't of interest, while the other coefficients are compared to the reference group, so their significance tests are what matter.

  • b0 – the mean of the reference group
  • b1 – difference between the mean of the first group of interest and the reference group mean
  • b2 – difference between the mean of the 2nd group and the reference group mean; and so on up to b4 (no b5, because we do not include the reference group).
39
Q

• Be able to interpret b’s from regression analyses (with equal n) using effect coding of categorical variables.

A

o Effect code
• b0 – the intercept is always the grand mean
• b1 – difference between the mean of the first group and the grand mean (the effect of treatment 1)
• b2 – difference between the mean of the 2nd group and the grand mean; and so on (the effect of treatment 2, etc., up to b4, with no b5).

40
Q

• Be able to interpret b’s from regression analyses (with equal n) using orthogonal coding of categorical variables.

A

o Orthogonal
• b0 – the intercept is the grand mean
• b1 – each coefficient is the difference between the two sets of means being contrasted (treed out), divided by the range of the codes used (largest minus smallest).
• EXAMPLE)
o b1 = .485 = (mean of intentional – mean of counting, rhyming, adjective, and imagery) / range of codes = (12 – 9.575) / (4 – (-1))
o b2 = 1.275 = (mean of imagery – mean of counting, rhyming, and adjective) / range of codes = (13.4 – 8.3) / (3 – (-1))
o b3 = -1.35 = (mean of counting and rhyming – mean of adjective) / range of codes = (6.95 – 11) / (1 – (-2))
o b4 = .05 = (mean of counting – mean of rhyming) / range of codes = (7 – 6.9) / (1 – (-1))

41
Q

• When the data are unbalanced (unequal n), how do the coding of vectors and interpretation of output from (1) dummy coded, (2) effect coded, and (3) orthogonally coded analyses differ (from the equal n case)?

A

o Dummy code – the codes are identical to the equal n case, and the interpretations are unchanged.
o Effect code – IF the effect codes aren't changed, the interpretations change: the intercept is no longer the grand mean, it is now the unweighted average mean of the groups, and the coefficients are the difference between each group mean and the average mean.
• This poses an issue because we aren't really interested in the average mean; we're interested in the effect of a group against the grand mean. So we calculate weighted effect codes (-n_current ÷ n_comparison; e.g., counting n = 8, intentional (comparison group) n = 10 gives -8/10 = -.8). These negative values are used in place of -1.

42
Q

• Orthogonal vectors must meet what conditions?

A

o Independence – no linear relationship between the vectors
o Zero correlation is necessary but not sufficient;
• zero correlation rules out a linear relationship, but we can still have zero correlation and some (nonlinear) relationship to a variable.
o Orthogonal vectors can be used to code a priori comparisons among group means
o A full set of orthogonal comparisons includes (# of levels – 1) comparisons, exhausting the available information
• Think of a tree diagram turned on its side; each split becomes a vector.