Week 3 - Aggregation and Restructuring Flashcards

Flashcards in Week 3 - Aggregation and Restructuring Deck (10)
1
Q

What does aggregate do?

A

Aggregates data across a number of observations (e.g., all the rows belonging to one person) so that we can produce a new data set containing summaries, such as sums or averages, of the other variables.

2
Q

What does restructuring do?

A

Enables us to reshape a data set (e.g., from wide to long form) in a way that facilitates graphing. It is also necessary for certain procedures: for example, the mixed-model procedure expects data to be stacked (long form), which avoids the list-wise deletion of cases that are missing any data.

3
Q

Imagine you have a data set with the performance of 5 individuals across various times at various hearing frequencies (see below).

We want to make a new data set where each person has only one line, with scores that represent an average over all the times (average sensitivity across time, per frequency).

What procedure would we use?
What would be the break variable?

e.g.,
ID  Time  db250  db500
1   1     55     35
1   2     45     25
2   1     35     60
2   2     50     40
2   3     40     25
3   1     45     35
3   2     65     60
3   3     30     30

A

Aggregate. The break variable would be ID.

e.g.,
* Combine data over observations (to get mean, SD, and number).

aggregate outfile=*
  /break=id
  /db250 db500=mean(db250 to db500).

aggregate outfile=*
  /break=id
  /mdb250 mdb500=mean(db250 to db500)
  /sdb250 sdb500=sd(db250 to db500)
  /ndb250 ndb500=n(db250 to db500).

aggregate outfile=* mode=addvariables
  /break=id
  /mdb250 mdb500=mean(db250 to db500).

The first two commands replace the active data set with the aggregated one (one line per ID); the third adds the aggregated variables to the same data set instead of making a new one.

__________________
Point-and-Click:

Data -> Aggregate -> move ‘ID’ into ‘Break Variable(s)’ -> add the variables you want to aggregate (here db250, db500, etc.) under ‘Summaries of Variable(s)’ (the summary function, e.g., mean or SD, can be changed with ‘Function...’) -> under ‘Save’, check ‘Create a new dataset containing only the aggregated variables’ -> click Continue/OK.
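
As a cross-check on what the second AGGREGATE command computes, here is a minimal plain-Python sketch (not SPSS) that aggregates the card-3 data with ID as the break variable. The helper name `aggregate_by_id` and the tuple layout of `rows` are assumptions for illustration, and `statistics.stdev` (sample, n - 1) is assumed to match SPSS's SD function.

```python
from statistics import mean, stdev

# Card-3 example rows as (id, time, db250, db500) tuples.
rows = [
    (1, 1, 55, 35), (1, 2, 45, 25),
    (2, 1, 35, 60), (2, 2, 50, 40), (2, 3, 40, 25),
    (3, 1, 45, 35), (3, 2, 65, 60), (3, 3, 30, 30),
]


def aggregate_by_id(rows):
    """Hypothetical helper mimicking AGGREGATE with BREAK=id:
    returns one summary row per distinct id."""
    groups = {}
    for id_, _time, db250, db500 in rows:
        groups.setdefault(id_, []).append((db250, db500))
    summary = []
    for id_, pairs in sorted(groups.items()):
        col250 = [p[0] for p in pairs]
        col500 = [p[1] for p in pairs]
        summary.append({
            "id": id_,
            "mdb250": mean(col250), "mdb500": mean(col500),
            # stdev is the sample (n - 1) SD; assumed to match SPSS's SD.
            "sdb250": stdev(col250), "sdb500": stdev(col500),
            # len counts every row; there are no missing values in this data.
            "ndb250": len(col250), "ndb500": len(col500),
        })
    return summary


for row in aggregate_by_id(rows):
    print(row["id"], row["mdb250"], row["mdb500"], row["ndb250"])
```

The eight input rows collapse to three, one per ID, which is exactly what the break variable controls.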

4
Q

What are break variables (in aggregation)?

A
  • ‘Break variable(s)’ define the groups within which we summarize: we produce some function (e.g., the mean) of the original variables for each distinct value of the break variable(s).
    • If we DO NOT select a break variable (e.g., ID), everything is aggregated into one line.
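
The role of the break variable, including the no-break case, can be sketched in plain Python (not SPSS). The helper `aggregate_mean` is a hypothetical name; it groups rows by the tuple of break-variable values, so an empty break list yields a single group.

```python
from statistics import mean


def aggregate_mean(rows, break_vars, var):
    """Hypothetical helper: mean of `var` within each combination of
    values of `break_vars`. An empty break list puts every row in a
    single group, so only one line comes back."""
    groups = {}
    for row in rows:
        key = tuple(row[b] for b in break_vars)
        groups.setdefault(key, []).append(row[var])
    return {key: mean(vals) for key, vals in groups.items()}


# db250 scores from the card-3 example.
rows = [
    {"id": 1, "time": 1, "db250": 55}, {"id": 1, "time": 2, "db250": 45},
    {"id": 2, "time": 1, "db250": 35}, {"id": 2, "time": 2, "db250": 50},
    {"id": 2, "time": 3, "db250": 40}, {"id": 3, "time": 1, "db250": 45},
    {"id": 3, "time": 2, "db250": 65}, {"id": 3, "time": 3, "db250": 30},
]

print(aggregate_mean(rows, ["id"], "db250"))  # one line per person
print(aggregate_mean(rows, [], "db250"))      # no break variable: one line
```
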
5
Q

What is the difference between data in wide and long formats?

A

Wide: each case has one row, and its multiple values are spread across several variables, so the data literally looks wide (e.g., ID 1 might have 4 different observations, each stored in its own variable).

Long (aka stacked): each observation gets its own row, so the data grow vertically (e.g., ID 1 would occupy 4 rows, with an index variable recording which observation each row holds).

6
Q

Does SPSS tend to create better graphs with long or wide data?

A

Long (stacked) structure.

7
Q

What is the syntax for restructuring data from wide to long?

ID  Time  db250  db500
1   1     55     35
1   2     45     25
2   1     35     60
2   2     50     40
2   3     40     25
3   1     45     35
3   2     65     60
3   3     30     30

A

varstocases
  /make db from db250 db500
  /index=frequency(2)
  /keep=id time.

  • make ‘db’ (a new variable) from (list the variables we want to stack, in the order we want to stack them).
  • index=frequency(2) names the index variable, frequency, which has 2 levels here (1 = db250, 2 = db500).
  • keep=id time means we want to keep these variables (time is listed so that it is not dropped).
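
To make the stacking concrete, here is a minimal plain-Python sketch (not SPSS) of what the VARSTOCASES command does to the card-7 rows. The function `varstocases` and its parameters are hypothetical names mirroring MAKE, INDEX and KEEP, and the sketch assumes `time` is kept alongside `id`.

```python
def varstocases(rows, make, sources, index, keep):
    """Hypothetical helper mirroring VARSTOCASES /MAKE /INDEX /KEEP:
    each wide row becomes one long row per source variable."""
    long_rows = []
    for row in rows:
        for level, src in enumerate(sources, start=1):
            new = {k: row[k] for k in keep}  # carried-over variables
            new[index] = level               # 1 = first source, 2 = second, ...
            new[make] = row[src]             # the stacked value
            long_rows.append(new)
    return long_rows


# Card-7 rows (id, time, db250, db500) as dicts.
wide = [
    {"id": i, "time": t, "db250": a, "db500": b}
    for i, t, a, b in [
        (1, 1, 55, 35), (1, 2, 45, 25),
        (2, 1, 35, 60), (2, 2, 50, 40), (2, 3, 40, 25),
        (3, 1, 45, 35), (3, 2, 65, 60), (3, 3, 30, 30),
    ]
]

long = varstocases(wide, "db", ["db250", "db500"], "frequency", ["id", "time"])
for row in long[:4]:
    print(row)
```

Each of the 8 input rows becomes 2 output rows, so the stacked file has 16 rows, with `frequency` telling you which original variable each `db` value came from.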
8
Q

When conducting a mixed-design analysis using GLM (i.e., with both between- and within-subjects factors) where some data are missing, what happens if you analyse the data in the wide format?

How can we fix this problem?

e.g.,

glm pre post fu by group with age
  /wsfactor=time 3
  /design.

A

We would lose all information from cases where any data were missing: the whole subject would be removed from the analysis (i.e., list-wise deletion).

We can restructure the data into long format and then use a mixed-model analysis (MIXED). Only the missing observations (not whole cases) are then removed.

e.g.,
varstocases
  /id=id
  /make score from pre post fu
  /index=time(3)
  /keep=group age.

mixed score by group time with age
  /fixed=intercept age group time age by time group by time
  /random=intercept | subject(id)
  /print=solution testcov.
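
A toy illustration (plain Python, made-up scores) of why the long format helps: list-wise deletion in the wide file discards whole subjects, whereas stacking and then dropping only the missing rows keeps every subject with at least one score. Dropping missing rows during stacking is assumed here to mirror how the restructured data reach MIXED; the helper names are hypothetical.

```python
# Made-up wide data: one row per subject; None marks a missing score.
wide = [
    {"id": 1, "group": 1, "pre": 10, "post": 12, "fu": 11},
    {"id": 2, "group": 1, "pre": 9, "post": None, "fu": 10},
    {"id": 3, "group": 2, "pre": 14, "post": 15, "fu": None},
    {"id": 4, "group": 2, "pre": 13, "post": 16, "fu": 15},
]
measures = ["pre", "post", "fu"]


def listwise_complete(rows, measures):
    """Wide-format GLM behaviour: keep only subjects with no missing score."""
    return [r for r in rows if all(r[m] is not None for m in measures)]


def stack_dropping_missing(rows, measures, keep):
    """Long format: one row per (subject, time); only missing scores are dropped."""
    out = []
    for r in rows:
        for time, m in enumerate(measures, start=1):
            if r[m] is None:
                continue  # drop this observation, not the whole subject
            out.append({**{k: r[k] for k in keep}, "time": time, "score": r[m]})
    return out


complete = listwise_complete(wide, measures)
long = stack_dropping_missing(wide, measures, ["id", "group"])
print(len(complete), "subjects survive list-wise deletion")
print(len({r["id"] for r in long}), "subjects contribute rows in long format")
```
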

9
Q

Suppose you have an SPSS dataset open which has 110 observations. You use the AGGREGATE command with the append option (i.e., adding the aggregated variable to the active dataset rather than creating a new file) to create a new variable that is the overall mean of an anxiety variable. How many records will the resulting data file have if the anxiety variable has 12 missing values?

A

110

Each case keeps its row and receives the same value (the mean of the 98 non-missing anxiety scores) in a new variable that has been appended to the data sheet.
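
A plain-Python sketch of this scenario, assuming missing values are represented as None and that the mean skips them (as the mean summary does in SPSS). The helper `add_overall_mean` and the anxiety values are hypothetical.

```python
from statistics import mean


def add_overall_mean(values):
    """Sketch of AGGREGATE with MODE=ADDVARIABLES and no break variable:
    every case keeps its row and receives the same overall mean."""
    valid = [v for v in values if v is not None]  # missing values are skipped
    overall = mean(valid)
    return [(v, overall) for v in values]


# Hypothetical anxiety scores: 98 valid values followed by 12 missing (None).
anxiety = [float(i % 10) for i in range(98)] + [None] * 12
result = add_overall_mean(anxiety)
print(len(result))  # 110 records: no rows are added or removed
```

Even the 12 cases with a missing anxiety score receive the overall mean in the new variable.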

10
Q

What is aggregation? (long answer)

A

Aggregate Data aggregates groups of cases in the active dataset into single cases and either creates a new data file (or dataset) containing the aggregated data, or adds new variables containing the aggregated data to the active dataset. Cases are aggregated based on the value of one or more break (grouping) variables.

If you create a new, aggregated data file, it contains one case for each group defined by the break variables. For example, if there is one break variable with two values (e.g., gender: male/female), the new data file will contain only two cases, with the other selected variables aggregated over each of those two groups in the way you specified.

If you add aggregate variables to the active dataset, the data file itself is not aggregated. Each case with the same value(s) of the break variable(s) receives the same values for the new aggregate variables. For example, if gender is the only break variable and the new aggregate variable is average age, all males would receive the average age of the males, and all females would receive the average age of the females.
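
The two options can be contrasted in a short plain-Python sketch (not SPSS), using made-up ages with gender as the only break variable. The helper names and the `mean_age` variable name are hypothetical.

```python
from statistics import mean


def aggregate_new_file(rows, brk, var):
    """New aggregated file: one case per value of the break variable."""
    groups = {}
    for r in rows:
        groups.setdefault(r[brk], []).append(r[var])
    return [{brk: key, f"mean_{var}": mean(vals)} for key, vals in groups.items()]


def aggregate_add_variables(rows, brk, var):
    """Add to the active data: every case keeps its row and gets its group's mean."""
    means = {g[brk]: g[f"mean_{var}"] for g in aggregate_new_file(rows, brk, var)}
    return [{**r, f"mean_{var}": means[r[brk]]} for r in rows]


# Made-up example with gender as the only break variable.
people = [
    {"gender": "male", "age": 30}, {"gender": "female", "age": 25},
    {"gender": "male", "age": 50}, {"gender": "female", "age": 35},
    {"gender": "female", "age": 45},
]

print(aggregate_new_file(people, "gender", "age"))            # two cases
print(len(aggregate_add_variables(people, "gender", "age")))  # still five cases
```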