Week 3 - Data Manipulation - Recode, Compute, and Count Flashcards

Question

Using RECODE, how can you locate where the missing values are in your data? Imagine you are a researcher who wants to compare salary bracket (variable = salary) by job level (variable = joblev). You find that you have 3 missing values somewhere in salary, but you don't know which job level the people with these missing values are from, i.e., how many workers fell in each bracket of salary, how many junior managers fell in each bracket of salary, and so on.

Answer 1

First you would need to RECODE the missing values on salary into a valid value: recode salary (missing = -99). Then you can use crosstabs: crosstabs salary by joblev. The missing values would then appear in the crosstab table as their own category (-99), showing you which categories of joblev the missing data are from. (Crosstabs displays a table that shows how many people at each job level fall in each salary bracket.)
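The recode-then-crosstab idea can be sketched outside SPSS. The Python snippet below is only an illustration: the records are made up, and using None for a missing salary is an assumption. It recodes missing salary to -99 and then counts each salary by job level combination, so the missing cases appear as their own row.

```python
from collections import Counter

# Hypothetical records: (salary bracket, job level); None marks a missing salary.
records = [(1, 1), (2, 1), (None, 2), (3, 3), (None, 2), (2, 3), (None, 1)]

# Step 1: recode missing salary into a valid value (-99),
# in the spirit of: recode salary (missing = -99).
recoded = [(-99 if salary is None else salary, joblev) for salary, joblev in records]

# Step 2: cross-tabulate salary by job level,
# in the spirit of: crosstabs salary by joblev.
crosstab = Counter(recoded)

# The -99 "row" shows which job levels the missing salaries come from.
missing_by_joblev = {
    joblev: n for (salary, joblev), n in crosstab.items() if salary == -99
}
print(missing_by_joblev)
```

In this made-up data, two of the three missing salaries belong to job level 2 and one to job level 1.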

Answer 2

Yes, it would tell you how many cases were missing, but NOT WHERE they are missing from.

Answer 3

```
Recode joblev (1=1) (else=0) into Jlev1.
Recode joblev (2=1) (else=0) into Jlev2.
```

Answer 4

This is creating dummy variables.
- recode joblev (1=1) (else=0) into Jlev1 creates a binary variable called 'Jlev1' where workers are represented by '1' and mid-management and top management are represented by '0'.
- recode joblev (2=1) (else=0) into Jlev2 creates a binary variable called 'Jlev2' where mid-management are represented by '1' and workers and top management are represented by '0'.
- crosstabs joblev by Jlev1 Jlev2 checks that the dummy variables are correct.
Top management are the reference group, as they are represented when BOTH Jlev1 and Jlev2 = 0. Jlev1 = workers and Jlev2 = mid-management. You couldn't use this syntax if there was MISSING data on joblev, because (else=0) would code all missing values as 0 as well.
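The missing-data pitfall with (else=0) can be shown in a small Python sketch. This is an illustration of the logic, not SPSS syntax: None stands in for a missing joblev, and the "safe" variant is an assumption about what you would want instead, not something from the cards.

```python
def dummy_else_zero(joblev, target):
    # Mirrors (target=1) (else=0): everything else, including missing, becomes 0.
    return 1 if joblev == target else 0


def dummy_keep_missing(joblev, target):
    # Hypothetical safer variant: a missing job level stays missing.
    if joblev is None:
        return None
    return 1 if joblev == target else 0


# 1 = worker, 2 = mid-management, 3 = top management, None = missing.
joblevs = [1, 2, 3, None]

naive = [(dummy_else_zero(j, 1), dummy_else_zero(j, 2)) for j in joblevs]
safe = [(dummy_keep_missing(j, 1), dummy_keep_missing(j, 2)) for j in joblevs]

print(naive)  # the missing case looks identical to top management: (0, 0)
print(safe)   # the missing case stays (None, None)
```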

Answer 5

Because (else = 0) would code all missing values as 0 as well, and they would therefore become part of our reference group, which was intended to include only top management.

Answer 6

RECODE!! e.g., recode b14b b14c b14g b14i b14l b14o (1=7) (2=6) (3=5) (5=3) (6=2) (7=1). These would be the negatively worded items on a 1-7 Likert scale, e.g., 1 = strongly disagree ---> 7 = strongly agree. (4 is the midpoint, so it stays as it is.)
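As a worked example of the reverse-coding map, here is a minimal Python sketch. The responses are made up, and treating None as a missing answer is an assumption.

```python
# Same value pairs as the recode: 4 is the midpoint and is left unchanged.
REVERSE_MAP = {1: 7, 2: 6, 3: 5, 5: 3, 6: 2, 7: 1}


def reverse_item(value):
    # Values not in the map (the midpoint 4, or a missing None) pass through unchanged.
    return REVERSE_MAP.get(value, value)


responses = [1, 4, 7, 2, None]
reversed_responses = [reverse_item(v) for v in responses]
print(reversed_responses)
```

For any valid response on the 1-7 scale, the mapped value equals 8 minus the original value.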

Answer 7

- Transforming variables to manage positive or negative skew
- Creating a scale (sum or mean of items)
- Creating dummy variables
- Creating new variables which represent combinations of variables (e.g., job levels by sex ---> SEXJOBLEV = 1 'male worker' 2 'male mid-manage' 3 'male top-manage' 4 'female worker' 5 'female mid-manage' 6 'female top-manage')
- Centring a variable

Answer 8

- equal, no skew
- positive skew
- negative skew
- positive
- negative
Values greater than 1 can be taken to mean there is substantial skew (though this is a rather arbitrary rule of thumb).

Answer 9

1. Compute sage=SQRT(age).
2. Compute lage=lg10(age).
3. Compute rage=1/age.

Answer 10

Creates a variable called 'sage' which is the square root of age - this tends to pull in the top half of the distribution and spread the bottom part a bit. A SLIGHT TRANSFORMATION

Answer 11

Creates a variable called 'lage' which is the log (base 10) of age. A MIDDLE-GROUND TRANSFORMATION (neither slight nor severe) for reducing positive skew.

Answer 12

Creates a variable called 'rage', which is the reciprocal of age (one divided by age). Because a higher original value now gives a lower transformed value, older people will have a lower value than those who are younger. A VERY SEVERE TRANSFORMATION. Reciprocal = the quantity obtained by dividing the number one by a given quantity.
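The three transformations can be compared side by side in a short Python sketch. The ages are made up, and the variable names simply echo sage, lage, and rage.

```python
import math

ages = [16, 25, 36, 100]  # hypothetical values

sage = [math.sqrt(a) for a in ages]   # square root: slight
lage = [math.log10(a) for a in ages]  # log base 10: middle ground
rage = [1 / a for a in ages]          # reciprocal: very severe, and reverses the order

for a, s, l, r in zip(ages, sage, lage, rage):
    print(a, round(s, 3), round(l, 3), round(r, 3))
```

Note how the gap between the two oldest ages (36 and 100) shrinks more with each step down the list, and how the reciprocal gives the oldest person the smallest value.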

Answer 13

Using these methods on a negative skew would just make it WORSE. First you need to either:
- square the values (so negative values become positive), e.g., compute perf2 = perf**2.
- reverse the scale, i.e., subtract each value from a number that is 1 higher than the maximum. For values which were 1-6, e.g., compute sperf = 7 - perf. (so that 1 becomes 6, 2 becomes 5, and so on). Once the reversed variable has been transformed, the direction has to be reversed again by adding 1 to the highest transformed value (2.45) and then taking away sperf, e.g., compute sperf2 = 3.45 - sperf.
NOW YOU CAN APPLY THE TOOLKIT OF TRANSFORMATIONS!!
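The reverse, transform, reverse-back sequence can be traced in a small Python sketch. The scores are made up, and the square root as the transformation in the middle is an assumption (it is consistent with the 2.45 above, since the square root of 6 is about 2.45).

```python
import math

perf = [1, 5, 6, 6, 4, 6]  # hypothetical negatively skewed scores on a 1-6 scale
max_value = 6

# Step 1: reverse the scale (7 - perf), turning negative skew into positive skew.
reflected = [(max_value + 1) - p for p in perf]

# Step 2: apply a transformation for positive skew (square root assumed here).
transformed = [math.sqrt(r) for r in reflected]

# Step 3: reverse again so that high scores are high once more.
# The highest transformed value is sqrt(6), about 2.45, so we subtract from about 3.45.
top = math.sqrt(max_value)
back = [(top + 1) - t for t in transformed]

print([round(b, 2) for b in back])
```

After step 3, people keep their original rank order: whoever scored highest on perf still scores highest on the final variable.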

Answer 14

Use the COMPUTE function to make a new variable which is the mean of the items.
- No missing data: compute meanofB = mean(b1 to b15).
- Missing data: compute meanofB = mean.12(b1 to b15). [the .12 says that at least 12 of the 15 items must be non-missing]
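The "at least 12 valid items" rule can be sketched in Python. The helper name mean_min_valid is hypothetical, and None stands in for a missing answer.

```python
def mean_min_valid(values, min_valid):
    # Average the non-missing values, but only if enough of them are present.
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None  # too few answers: the result is treated as missing
    return sum(valid) / len(valid)


twelve_answered = [4] * 12 + [None] * 3
eleven_answered = [4] * 11 + [None] * 4

print(mean_min_valid(twelve_answered, 12))  # enough answers: a mean is returned
print(mean_min_valid(eleven_answered, 12))  # too few answers: None
```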

Answer 15

- TIME CONSUMING
- If any value was missing, b14 would not be created (this might be good if you required everyone to answer all of the items!)

Answer 16

If there is any missing data, the sum function would just add up whatever responses are present (e.g., a single answered item would count as the person's whole score), which is terribly misleading. You could use the following syntax instead: compute b14sum = (mean.12(b14a to b14o))*15. (remember, the .12 means they must have answered at least 12 items!)
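A quick Python comparison shows why the mean-times-15 approach is less misleading than a plain sum. The respondents are made up, and the helper names are hypothetical.

```python
def naive_sum(values):
    # Adds up whatever is there and silently ignores missing answers.
    return sum(v for v in values if v is not None)


def prorated_sum(values, min_valid, n_items):
    # Mean of the valid answers scaled back up to the full number of items;
    # missing (None) if fewer than min_valid items were answered.
    valid = [v for v in values if v is not None]
    if len(valid) < min_valid:
        return None
    return (sum(valid) / len(valid)) * n_items


mostly_answered = [7] * 12 + [None] * 3  # strongly agrees, skipped 3 items
one_answer = [5] + [None] * 14           # answered a single item

print(naive_sum(mostly_answered), prorated_sum(mostly_answered, 12, 15))
print(naive_sum(one_answer), prorated_sum(one_answer, 12, 15))
```

The plain sum understates the first respondent (84 instead of 105) and gives the second a score of 5 from one answer, while the prorated version returns no score for the second respondent.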

Answer 17

```
Do if (joblev=1).
compute Jlev1=1.
else.
compute Jlev1=0.
end if.
```
```
Do if (joblev=2).
compute Jlev2=1.
else.
compute Jlev2=0.
end if.
```
- If we knew there was no missing data, we could use the following instead: compute Jlev1=0. if (joblev=1) Jlev1=1. compute Jlev2=0. if (joblev=2) Jlev2=1. (In this case, if there were missing data, the person would be assigned '0' on both variables, i.e., as top management, which would be very misleading.)

Answer 18

Computes dummy variables:
- Jlev1 = 1 & Jlev2 = 0 ---> Worker
- Jlev1 = 0 & Jlev2 = 1 ---> Mid Management
- Jlev1 = 0 & Jlev2 = 0 ---> Top Management
The second syntax does the same thing, but can only be utilised when there is NO MISSING DATA, because otherwise a person with missing data would be assigned '0' on both variables, i.e., as top management, which would be very misleading.

Answer 19

- In SPSS, if a statement is true, e.g., age = 6, then SPSS outputs a '1'.
- In contrast, for a false statement, e.g., gender = male when it is in fact female, SPSS will output a '0'.
Therefore the following syntax can also be used to create dummy variables: compute Jlev1 = (joblev = 1). compute Jlev2 = (joblev = 2). freq Jlev1 Jlev2.
For the first line of the above command:
- if joblev is NOT EQUAL to 1, then the output is 0 (false, and this becomes the reference category)
- if joblev is EQUAL to 1, then the output is 1 (true).
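The "true becomes 1, false becomes 0" idea carries over directly to a Python sketch, since converting a comparison to an integer gives the same 1/0 coding. The job levels here are made up and contain no missing values.

```python
def dummy(joblev, target):
    # A true comparison becomes 1, a false one becomes 0.
    return int(joblev == target)


joblevs = [1, 2, 3]  # worker, mid-management, top management

jlev1 = [dummy(j, 1) for j in joblevs]
jlev2 = [dummy(j, 2) for j in joblevs]

print(jlev1)
print(jlev2)
```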

Answer 20

This syntax is a short way to create a new variable which represents job levels by sex (this can then be crosstabulated with a third variable). Specifically:
- it multiplies sex by 10, such that male now = 10 and female now = 20
- it then adds the value of joblev to that value (e.g., 1, 2, or 3, depending on one's job level)
- recode is then used to assign each value on this variable [11, 12, 13 (males by job level) and 21, 22, 23 (females by job level)] into categories 1-6
- and it then labels these categories appropriately.
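The arithmetic behind the combined variable can be sketched in Python. The codings are assumptions for illustration: sex is taken to be coded 1 and 2 (with 1 = male, to match the SEXJOBLEV labels listed earlier in the deck), and joblev 1 to 3. The function name is hypothetical.

```python
# Two-digit code: tens digit = sex, units digit = job level.
RECODE_MAP = {11: 1, 12: 2, 13: 3, 21: 4, 22: 5, 23: 6}

LABELS = {
    1: "male worker", 2: "male mid-manage", 3: "male top-manage",
    4: "female worker", 5: "female mid-manage", 6: "female top-manage",
}


def sexjoblev(sex, joblev):
    # Step 1: multiply sex by 10 and add job level (e.g. 2 * 10 + 3 = 23).
    combined = sex * 10 + joblev
    # Step 2: recode the two-digit value into categories 1-6.
    return RECODE_MAP[combined]


print(LABELS[sexjoblev(1, 2)])
print(LABELS[sexjoblev(2, 3)])
```

Because the tens digit carries sex and the units digit carries job level, each combination gets a unique value before it is recoded.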

Answer 21

Each value to be reversed needs to be subtracted from the highest possible value (7) + 1, i.e., 8. This command makes a loop that does this efficiently.
- The 'x' stands for the first item in the list the first time through; the second time around, 'x' stands for the second item on the list, and so on.
- compute x = 8 - x is what you want to do EACH TIME AROUND with the next variable in the list.
(EXTRA note: a slash separates two lists, and both will be run through simultaneously; see notes if interested.)
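The looping idea is the same as a for loop in a general-purpose language. In this Python sketch, the respondent's answers and the list of items to reverse are made up; x takes each item name in turn.

```python
# One hypothetical respondent's answers on a 1-7 scale.
respondent = {"b14a": 2, "b14b": 1, "b14c": 7, "b14g": 4}

# Items to reverse; b14a is deliberately left out.
to_reverse = ["b14b", "b14c", "b14g"]

for x in to_reverse:
    # Same arithmetic each time around: new value = 8 - old value.
    respondent[x] = 8 - respondent[x]

print(respondent)
```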

Answer 22

Using the COMPUTE command, we would subtract the variable's mean from each individual's score. The syntax might look like this: descriptives perf. (gets the descriptives for perf, from which you retrieve the mean value) compute perfcent = perf - 4.1418. (perf minus the mean of perf)
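Centring can be checked with a few made-up scores in Python. Here the mean is computed from the data rather than typed in by hand.

```python
from statistics import mean

perf = [3, 4, 5, 4, 6]  # hypothetical scores

perf_mean = mean(perf)
perfcent = [p - perf_mean for p in perf]  # each score minus the mean

print(perf_mean)
print([round(c, 2) for c in perfcent])
```

A centred variable has a mean of zero (up to rounding), which is a handy check that the right mean was subtracted.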

Answer 23

Counts the number of times a response occurs over a set of variables.
- Used less frequently than the RECODE and COMPUTE commands.
- In the COUNT command, we give a list of variables and it counts how often a certain value occurs.

Answer 24

COUNT function, e.g., count b14agree = b14a to b14o (5,6,7). (A value of 2 means agreement with two items, and so on.)

Answer 25

Count doesn't care if a variable is missing or not; it simply looks at each variable for the specified values. If one of the specified values is present, it adds '1' to the count. If it isn't (including when the value is missing), it just doesn't count it. It doesn't say "well, the person didn't answer the question, so I can't really count them"; rather, it erroneously treats them as not having the values, e.g., as not agreeing. THEREFORE, we might want to exclude people who didn't answer the questions. To do this, we can create a new variable that counts how many items each person is missing: count b14mis = b14a to b14o (missing).

Answer 26

We can create a new variable that counts how many of the items each person is missing, then set the agreement count to missing for anyone with missing items, e.g.,
```
count b14agree = b14a to b14o (5,6,7).
count b14mis = b14a to b14o (missing).
do if (b14mis ne 0).
recode b14agree (lo thru hi=sysmis).
end if.
```
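The count-then-exclude logic can be sketched in Python. The answers are made up, None stands in for a missing answer, and the function names are hypothetical.

```python
AGREE_VALUES = {5, 6, 7}


def count_values(items, targets):
    # Counts how many items hold one of the target values; missing items are just not counted.
    return sum(1 for v in items if v in targets)


def count_missing(items):
    return sum(1 for v in items if v is None)


def agree_count(items):
    # Only report an agreement count for people who answered every item.
    if count_missing(items) != 0:
        return None
    return count_values(items, AGREE_VALUES)


complete = [5, 6, 7, 1, 2]
incomplete = [5, None, 7]

print(count_values(incomplete, AGREE_VALUES))  # the plain count ignores the gap
print(agree_count(complete))
print(agree_count(incomplete))                 # excluded: None
```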

Answer 27

compute agesq=age**2

Answer 28

recode age (lo thru 18=1)(18 thru 50=2)(50 thru hi=3) (else=copy).

Answer 29

recode bmi (18 thru 25=0) (else=copy).

(53 cards)