Week 3 - Data Manipulation - Recode, Compute, and Count Flashcards
(53 cards)
In SPSS, what are operators?
These are written (words, e.g., AND, EQ) or symbolic (symbols, e.g., &, =) keywords that tell SPSS to perform a particular operation on values or variables.
What are the logical operators?
– OR / |
Do something if EITHER condition (or both) is met
– AND / &
Do something if both conditions are met
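What are assignment/testing (relational) operators?
Do SOMETHING if a variable is equal to a value (= or EQ). Others include:
- not equal to (~= or NE)
- less than (< or LT)
- greater than (> or GT)
- less than or equal to (<= or LE)
- greater than or equal to (>= or GE)
- not (~ or NOT), which reverses a condition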
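As a rough illustration, the Python sketch below maps the keyword forms of SPSS's relational operators onto Python's `operator` module. The `compare` helper and the example age are hypothetical. The sketch only imitates how these tests evaluate and is not SPSS itself.

```python
import operator

# Hypothetical lookup: SPSS keyword form -> equivalent Python comparison.
# The SPSS symbol forms are =, ~= (or <>), <, >, <= and >=.
RELATIONAL = {
    "EQ": operator.eq,
    "NE": operator.ne,
    "LT": operator.lt,
    "GT": operator.gt,
    "LE": operator.le,
    "GE": operator.ge,
}


def compare(value, keyword, target):
    """Return True if `value <keyword> target` holds, e.g. compare(age, "GE", 30)."""
    return RELATIONAL[keyword](value, target)


age = 34
print(compare(age, "GE", 30))      # True
print(compare(age, "EQ", 30))      # False
print(not compare(age, "LT", 30))  # NOT reverses the condition: True
```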
What are standard arithmetic operators?
Mathematical symbols/operations, e.g., +, -, *, /, **
- ** is exponentiation, e.g., we could say X = Y**2 (i.e., Y squared)
In SPSS, what is a function?
Consists of a word (which SPSS recognises) that carries out some operation on the variables or values that follow the function name in brackets.
e.g., MISSING(age) or RV.NORMAL(0,1), which can be used inside commands such as SELECT IF (sex=1 & age>=30).
How is the select function written in SPSS Syntax?
SELECT IF (sex=1 & age>=30).
or alternatively: SELECT IF (sex eq 1 and age ge 30).
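To make the selection logic concrete, here is a minimal Python sketch that keeps only the rows meeting the condition, as SELECT IF (sex=1 & age>=30) does. The cases and the `select_if` helper are hypothetical. The sketch ignores SPSS's handling of missing values in the condition.

```python
# Hypothetical cases; each dict is one row of the data file.
cases = [
    {"id": 1, "sex": 1, "age": 25},
    {"id": 2, "sex": 1, "age": 30},
    {"id": 3, "sex": 2, "age": 45},
    {"id": 4, "sex": 1, "age": 52},
]


def select_if(rows, condition):
    """Keep only the rows for which condition(row) is True."""
    return [row for row in rows if condition(row)]


# SELECT IF (sex=1 & age>=30).
kept = select_if(cases, lambda r: r["sex"] == 1 and r["age"] >= 30)
print([r["id"] for r in kept])  # [2, 4]
```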
What does the function ‘compute rv.normal(0,1)’ do?
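Creates a new variable whose values are random draws from a normal distribution with a mean of zero and a SD of 1. (In full, COMPUTE needs a target variable, e.g., compute z = rv.normal(0,1).)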
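The Python sketch below imitates compute z = rv.normal(0,1). by using `random.gauss`. The variable name `z`, the 1,000 cases and the seed are assumptions. Python's random generator is not SPSS's, so the individual values will differ, but the sample mean should be close to 0 and the SD close to 1.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Imitates COMPUTE z = RV.NORMAL(0,1). for 1,000 hypothetical cases.
z = [random.gauss(0, 1) for _ in range(1000)]

print(round(statistics.mean(z), 2), round(statistics.stdev(z), 2))
```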
What is the difference between system missing and user missing?
System missing is assigned automatically by SPSS whenever there is a blank cell.
User missing is when we tell SPSS to treat a value of a variable that would otherwise be a normal value (e.g., 9) LIKE it is missing, without it actually being missing from the data (in syntax: missing values variablename (9).).
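One way to see the difference is the Python sketch below. `None` stands in for system-missing, and a hypothetical set of codes stands in for user-missing values. It mirrors the idea behind SPSS's SYSMIS() and MISSING() functions, but the helper names and the data are illustrative assumptions.

```python
SYSMIS = None  # stands in for SPSS's system-missing value (a blank cell)


def is_sysmis(value):
    """Like SYSMIS(): true only for system-missing values."""
    return value is SYSMIS


def is_missing(value, user_missing):
    """Like MISSING(): true for system-missing OR user-missing values."""
    return value is SYSMIS or value in user_missing


# Hypothetical answers to one question, where 9 was declared user missing.
q1 = [1, 3, SYSMIS, 9, 2]
user_missing_q1 = {9}

print([is_sysmis(v) for v in q1])                    # [False, False, True, False, False]
print([is_missing(v, user_missing_q1) for v in q1])  # [False, False, True, True, False]
```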
What does writing ‘temporary.’ before your syntax do?
e.g.,
temporary.
select if (sex = 2 and age >= 30).
This makes the selection (or change) temporary: it only applies to the next procedure that reads the data (e.g., freq age.), after which all cases are available again.
How can you tell that cases have been excluded in the SPSS spreadsheet (Data View)?
An oblique (diagonal) line crosses out the row number of each excluded case.
Using syntax, how would you select cases that are NOT MISSING on a variable?
Select if not(missing(variablename)).
OR
Select if ~missing(variablename).
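In Python terms, SELECT IF NOT(MISSING(age)). behaves roughly like the filter below. It drops both blank (system-missing) cells and values declared user missing. The code 99 and the ages are hypothetical.

```python
SYSMIS = None        # stands in for a blank (system-missing) cell
user_missing = {99}  # hypothetical code declared as user missing for age

ages = [23, SYSMIS, 41, 99, 35]


def missing(value):
    """Like MISSING(): true for system-missing or user-missing values."""
    return value is SYSMIS or value in user_missing


# SELECT IF NOT(MISSING(age)).  keeps only cases with a valid age
kept = [a for a in ages if not missing(a)]
print(kept)  # [23, 41, 35]
```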
How can sort cases and split file be used to perform the same job as the following syntax?
temporary.
select if (sex = 1).
freq age.
temporary.
select if (sex = 2).
freq age.
sort cases by sex. (orders the data by sex; SPLIT FILE needs the data sorted)
split file by sex. (splits the data into separate groups, one for each value of sex)
freq age. (gives the frequencies of age for each group)
split file off. (removes the split file)
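The Python sketch below checks, on hypothetical data, that the two approaches give the same per-sex frequencies of age. `itertools.groupby` is used as a stand-in for SPLIT FILE. Like SPLIT FILE, it starts a new group whenever the value changes, which is why the cases are sorted first.

```python
from collections import Counter
from itertools import groupby

# Hypothetical (sex, age) cases, deliberately not sorted by sex.
cases = [(1, 25), (2, 30), (1, 30), (2, 30), (1, 41), (2, 25)]

# Approach 1: TEMPORARY. / SELECT IF (sex = s). / FREQ age. run once per sex.
freq_by_select = {
    s: Counter(age for sex, age in cases if sex == s) for s in (1, 2)
}

# Approach 2: SORT CASES BY sex. / SPLIT FILE BY sex. / FREQ age.
sorted_cases = sorted(cases, key=lambda case: case[0])
freq_by_split = {
    sex: Counter(age for _, age in group)
    for sex, group in groupby(sorted_cases, key=lambda case: case[0])
}

print(freq_by_split)
```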
What is the command ‘select if’ used for?
To select the cases on which you want to perform some analyses (either inferential or descriptive), i.e., to keep a subset of your data and exclude the rest from your operations. Without TEMPORARY, the unselected cases are removed from the active dataset.
What can the recode command be used for?
- Permanently creating new variables (based on available variables)
  - e.g., grouping the values of a numeric variable, such as questionnaire answers, into a new variable (a total score itself would be calculated with COMPUTE)
- Altering variables
  - e.g., breaking a continuous variable (e.g., age) into categories
  - splitting the variable/data into two (or more) groups, e.g., by the median
- Finding missing values, and excluding or removing them (e.g., when making a dummy variable)
- Other: using recode to reverse a scale, or conditional recoding
How would you use syntax to create a variable which takes the continuous variable of age (of people aged between 0-100) and makes it into a single categorical variable with 4 categories?
You would use RECODE
e.g., recode age (lo thru 25=1) (26 thru 30 = 2) (31 thru 40 =3) (41 thru hi=4) into agecat.
This can also be completed through the Transform menu in SPSS point-and-click (Recode into Different Variables).
A researcher wants to recode age from a continuous variable into a categorical variable with 4 age brackets, so they write the following syntax:
recode age (lo thru 25=1) (26 thru 30 = 2) (31 thru 40 =3) (41 thru hi=4) into agecat.
HOWEVER, one of the cases has an age of 25.9 and has not been included in a category; how could this issue be fixed?
Explain why you would use the new syntax.
The problem is that with the original syntax, 25.9 and other values that fall between the categories (e.g., 30.5) are not included in any group, so they become system-missing in agecat. The syntax would need to be rewritten as:
recode age (lo thru 25=1) (25 thru 30=2) (30 thru 40=3) (40 thru hi=4) into agerec.
By overlapping values which are recoded we can make sure that nothing slips through the cracks. Note that once a value has been recoded it is not recoded again.
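To see why the overlapping version works, here is a Python sketch that models RECODE ... INTO as "check the ranges left to right, the first match wins, no match means system-missing". The helper `recode` and the test ages are hypothetical, and `None` stands in for system-missing.

```python
SYSMIS = None
LO, HI = float("-inf"), float("inf")


def recode(value, rules):
    """First matching (low, high, new) rule wins; no match -> system-missing."""
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return SYSMIS


old_rules = [(LO, 25, 1), (26, 30, 2), (31, 40, 3), (41, HI, 4)]
new_rules = [(LO, 25, 1), (25, 30, 2), (30, 40, 3), (40, HI, 4)]

ages = [25, 25.9, 30, 30.5, 40, 40.2]
print([recode(a, old_rules) for a in ages])  # [1, None, 2, None, 3, None]
print([recode(a, new_rules) for a in ages])  # [1, 2, 2, 3, 3, 4]
```

Note that 25 still lands in category 1 under the new rules, because (lo thru 25=1) is checked before (25 thru 30=2).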
Is: recode age (lo thru 25=1) (25 thru 30=2) (30 thru 40=3) (40 thru hi=4) into agerec.
The same as:
recode age (lo thru 25=1) (lo thru 30=2) (lo thru 40=3) (lo thru hi=4) into agerec.
Why?
YES!!
Because each case is assigned to a category only once. Once a value has been assigned to a category it is not re-assigned; the value specifications are evaluated from left to right, and the first one that matches wins.
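A quick Python check of this claim, using the same first-match-wins model, is shown below. The `recode` helper and the 0.1-step grid of ages from 0 to 100 are assumptions for illustration.

```python
LO, HI = float("-inf"), float("inf")


def recode(value, rules):
    """First matching (low, high, new) rule wins; no match -> None (system-missing)."""
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return None


overlapping = [(LO, 25, 1), (25, 30, 2), (30, 40, 3), (40, HI, 4)]
cumulative = [(LO, 25, 1), (LO, 30, 2), (LO, 40, 3), (LO, HI, 4)]

# Check every age from 0 to 100 in steps of 0.1.
ages = [i / 10 for i in range(0, 1001)]
same = all(recode(a, overlapping) == recode(a, cumulative) for a in ages)
print(same)  # True
```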
Describe what the following syntax would do:
temporary.
recode salary (1=2) (6 thru hi=5).
add value labels salary 2 'up to $30k' 5 '>50k'.
freq salary.
This creates a TEMPORARY RECODE: it does not alter the file; the recoded values only exist for the next procedure (here, the frequencies).
It collapses the extreme salary brackets into their neighbours (1 becomes 2; 6 and above become 5), adds descriptive labels to these new end categories, and then gets the frequencies.
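The Python sketch below imitates the temporary recode. It builds a recoded copy, counts it, and leaves the original values untouched. It assumes hypothetical bracket codes 1 to 6. For a RECODE without INTO, values that match no rule keep their old value.

```python
from collections import Counter

HI = float("inf")


def recode(value, rules):
    """RECODE on an existing variable: first match wins, unmatched values are kept."""
    for low, high, new in rules:
        if low <= value <= high:
            return new
    return value


salary = [1, 2, 3, 6, 5, 4, 6]  # hypothetical salary-bracket codes

# temporary. / recode salary (1=2) (6 thru hi=5). / freq salary.
rules = [(1, 1, 2), (6, HI, 5)]
temp = [recode(v, rules) for v in salary]  # recoded copy used for one procedure
print(Counter(temp))
print(salary)  # the original values are unchanged
```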
Describe how you would use recode (syntax) to perform a median split (e.g., for age)
First you would need to find the median:
freq age /statistics=median. (imagine the median is 30)
Then you would perform a recode based on this information:
recode age (lo thru 30=1) (30 thru hi=2) into agemed.
freq agemed.
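Below is a minimal Python sketch of the same two steps on hypothetical ages. Because (lo thru 30=1) is checked first, a case exactly at the median goes into group 1. With ties at the median, the two groups can therefore end up unequal in size.

```python
import statistics

ages = [19, 24, 30, 30, 35, 42, 58]  # hypothetical ages

# Step 1: freq age /statistics=median.
median = statistics.median(ages)

# Step 2: recode age (lo thru 30=1) (30 thru hi=2) into agemed.
agemed = [1 if a <= median else 2 for a in ages]
print(median, agemed)  # 30 [1, 1, 1, 1, 2, 2, 2]
```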
What is visual binning?
Found in the Transform menu, it shows a diagram of the distribution of a variable (e.g., age) and gives you a number of options for dividing the variable into categories, e.g.:
- Equal percentiles based on scanned cases (e.g., with two bins, an equal number of cases in each group)
- Cutpoints at the mean and selected standard deviations
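Visual Binning is a dialog, so there is no syntax for it to show here. The Python sketch below only imitates the "cutpoints at mean and selected SD" option, using ±1 SD. It assumes the sample SD, and it assumes that a value equal to a cutpoint goes into the lower bin. Both are assumptions to check against SPSS. The ages are hypothetical.

```python
import statistics

ages = [20, 25, 30, 35, 40]  # hypothetical ages

mean = statistics.mean(ages)
sd = statistics.stdev(ages)  # sample SD (assumption about which SD SPSS uses)
cutpoints = [mean - sd, mean, mean + sd]


def bin_of(value, cuts):
    """Bin number 1..len(cuts)+1; a value equal to a cutpoint stays in the lower bin."""
    return 1 + sum(value > cut for cut in cuts)


print([bin_of(age, cutpoints) for age in ages])  # [1, 2, 2, 3, 4]
```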
Why might you want to make data missing?
You may want to make values that can’t be used in a meaningful way ‘missing’, e.g., the few outlying cases with income in the lowest or highest bracket.
What function might you use to exclude (i.e., make missing) data on a variable? (e.g., outliers in the low and high salary brackets)
Recode
e.g., recode salary (1, 6 thru hi=sysmis). [this makes income category 1, and categories 6 and above, system missing]
This can also be performed through the Transform menu in point-and-click.
(The temporary command can be used to make the recode TEMPORARY.)
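In Python terms, recode salary (1, 6 thru hi=sysmis). acts roughly like the sketch below. Code 1 and any code of 6 or above become missing (`None` stands in for system-missing), and all other values are left as they are. The bracket codes are hypothetical.

```python
SYSMIS = None  # stands in for SPSS's system-missing value

salary = [1, 2, 3, 4, 5, 6, 3, 1]  # hypothetical bracket codes


def drop_extremes(value):
    """recode salary (1, 6 thru hi=sysmis).: 1 and codes of 6 or above become missing."""
    if value == 1 or value >= 6:
        return SYSMIS
    return value  # a RECODE without INTO leaves other values unchanged


salary = [drop_extremes(v) for v in salary]
print(salary)  # [None, 2, 3, 4, 5, None, 3, None]
```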
What function might you use to LOCATE data that is missing?
Crosstabs (and recode) to find where the missing values are.
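A sketch of the recode-then-crosstab idea in Python is shown below. First, each case is flagged as missing or not on income (the "recode"). Then the flags are counted within each sex (the "crosstab"), which shows which group the missing values come from. The data and the 0/1 flag are hypothetical.

```python
from collections import Counter

SYSMIS = None  # stands in for a blank (system-missing) cell

# Hypothetical (sex, income) cases.
cases = [(1, 3), (1, SYSMIS), (2, 4), (2, 2), (1, 5), (2, SYSMIS), (2, SYSMIS)]

# Recode: 1 = income missing, 0 = income present. Crosstab: count flags by sex.
table = Counter((sex, int(income is SYSMIS)) for sex, income in cases)

for sex in (1, 2):
    print(f"sex={sex}: valid={table[(sex, 0)]}, missing={table[(sex, 1)]}")
```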
What is crosstabulation?
A process or function that combines and/or summarises data from one or more sources into a concise format for analysis or reporting. Crosstabs display the joint distribution of two or more variables and they are usually represented in the form of a contingency table in a matrix.
It tells you how much data is missing but NOT WHERE it is missing from.