Statistics And Data Analysis 21061 Flashcards

1
Q

What is the difference between categorical level data and continuous data

Week 1

A

Categorical data is nominal only (e.g. names, gender), whereas continuous data can be placed on a continuous scale

2
Q

What two descriptive statistics do we typically use

Week 1

A

Central tendency & spread

3
Q

What is the difference between how independent variables and dependent variables are measured

Week 1

A

The IV is ALWAYS measured on a categorical scale
The DV is IDEALLY measured on a discrete/continuous scale

4
Q

What is the benefit of measuring the DV on a continuous scale

Week 1

A

So that we can use parametric statistics

5
Q

What is the difference between a true-experimental vs a quasi-experimental design

Week 1

A

We actively manipulate the IVs in a true experimental design whereas the IVs in a quasi experimental design reflect fixed characteristics

6
Q

Is handedness a quasi or true experimental IV

Week 1

A

Quasi - it is a fixed characteristic

7
Q

What are the 3 main types of subject design

Week 1

A

Between subjects, within subjects, mixed design

8
Q

What is a (2 × 3) mixed design

Week 1

A

Has two IVs, one between, one within.
Between IV has two levels, within IV has 3 levels
(e.g. males' and females' preferences for horror, action and romance movies)

9
Q

What does normally distributed data allow us to do

Week 1

A

Use parametric stats

10
Q

What are the properties of normally distributed data

Week 1

A

Symmetrical about the mean
Bell shaped - mesokurtic

11
Q

What is Platykurtic data

Week 1

A

Data which has more variation/spread than normally distributed data
(-ve kurtosis value)

12
Q

What is leptokurtic data

Week 1

A

Data which has less variation/spread than normally distributed data (+ve kurtosis value)

13
Q

What type of skew does normal data have

Week 1

A

normally distributed data has no skew

14
Q

What is sampling error

Week 1

A

degree to which sample statistics differ from underlying population parameters

15
Q

What are Z scores

Week 1

A

Scores from normally distributed populations converted into standard deviation units
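As a minimal sketch (the raw scores are hypothetical), the conversion to z-scores is just subtracting the mean and dividing by the standard deviation:

```python
import numpy as np

scores = np.array([85., 90., 78., 92., 88., 76., 95., 89.])  # hypothetical raw scores

# z = (x - mean) / s.d.: each score re-expressed in standard deviation units
z = (scores - scores.mean()) / scores.std(ddof=0)

# standardised scores always end up with mean 0 and s.d. 1
print(z.mean(), z.std(ddof=0))
```

Whatever the original units, the z-scores have mean 0 and standard deviation 1, which is what lets us read probabilities off the standard normal distribution.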

16
Q

What is sampling distribution

Week 1

A

Distribution of a statistic across an infinite number of samples

17
Q

What is the sampling distribution of the mean

Week 1

A

Distribution of all possible sample means.

18
Q

What are standard error (SE) and estimated standard error (ESE)

Week 1

A

Standard deviation of sampling distribution

ESE is simply an estimate of the standard error based on our sample

19
Q

What do we use sample statistics for

Week 1

A

to estimate the population parameters

20
Q

What is a T-test

Week 2

A

An inferential statistic used when we have 1 IV with 2 levels and 1 DV; it estimates whether the population means under the 2 IV levels are different

21
Q

What contributes to variance between IV levels in an independent t-test

Week 2

A
  • manipulation of IV (treatment effects)
  • individual differences
  • experimental error
    * random error
    * constant error
22
Q

what contributes to variance within IV levels in an independent t-test

week 2

A

individual differences
random experimental error

23
Q

What would happen if we continued to determine the mean of the difference for infinite samples

Week 2

A

it would essentially be like calculating the population mean difference

24
Q

What is the null hypothesis when talking about sampling distribution of differences

Week 2

A

the sampling distribution of differences will have a mean of 0 as there is no difference between the sample means of 2 different samples

25
Why do we use estimated standard error instead of standard deviation in T-distribution ## Footnote Week 2
Because it is a sampling distribution, we use the standard error rather than the standard deviation: the standard error expresses the extent to which an individual sample mean difference deviates from 0. As we do not have all possible samples with which to calculate the standard error, we estimate it, hence we use the e.s.e.
26
What is the equation for t in an independent design ## Footnote Week 2
t = Xd / ESEd, i.e. mean of the difference / estimated standard error of the difference (equivalently, variance between IV levels / variance within IV levels)
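A sketch of that ratio with hypothetical (randomly generated) samples, checked against scipy.stats.ttest_ind; with equal group sizes the hand calculation and SciPy's pooled-variance t agree exactly:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a = rng.normal(10, 2, 30)  # hypothetical scores under IV level 1
b = rng.normal(12, 2, 30)  # hypothetical scores under IV level 2

# mean of the difference / estimated standard error of the difference
mean_diff = a.mean() - b.mean()
ese = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
t_manual = mean_diff / ese

t_scipy, p = stats.ttest_ind(a, b)
print(t_manual, t_scipy)
```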
27
What does the distance to 0 of the t value indicate? ## Footnote Week 2
If the t-value is closer to 0: smaller variance between IV levels relative to within IV levels.
If the t-value is further from 0: larger variance between IV levels relative to within IV levels.
28
What does it mean if the null hypothesis is true for t-dist | Think CI ## Footnote week 2
If the null hypothesis is true, 95% of sampled t-values will fall within the 95% bounds of the t-distribution; only 5% of sampled t-values will fall outside the 95% bounds.
29
What are degrees of freedom and how are they calculated ## Footnote Week 2
The difference between the number of measurements made (sample size) and the number of parameters estimated (usually one, the mean): sample size − number of parameters.
N − 2 for an independent t-test; n − 1 for a paired t-test.
30
What happens to the critical t-value as the degrees of freedom get larger ## Footnote Week 2
It tends towards 1.96 (the critical z value)
31
What are some of the assumptions we make for an independent t-test ## Footnote Week 2
* **Normality**: the DV should be normally distributed under each level of the IV
* **Homogeneity of variance**: the variance in the DV, under each level of the IV, should be reasonably equivalent
* **Equivalent sample size**: sample size under each level of the IV should be roughly equal (matters more with smaller samples)
* **Independence of observations**: scores under each level of the IV should be independent
32
What test do we use when the assumptions for the independent t-test are violated ## Footnote Week 2
We use the non-parametric equivalent: the Mann-Whitney U test
33
What is Levene's test ## Footnote Week 2
A test for equality of variance --> homogeneity of variances
34
what does Levene's test tell us and what does it not tell us ## Footnote Week 2
**Tells us**: whether there is a difference in variances under the IV levels. **Doesn't tell us**: whether our means are different, or anything about the IV manipulation.
35
What is the null hypothesis of Levene's test ## Footnote Week 2
No difference between the variance under each level of the IV (i.e. homogeneity of variance)
36
If we reject Levene's test, what does this mean ## Footnote Week 2
There is heterogeneity of variance: the way in which the data vary under the IV levels is different
37
What assumptions do we want when it comes to variance between IV levels? ## Footnote Week 2
equal variance and homogeneity
38
What contributes to variance **between IV levels** in a **paired t test** ## Footnote Week 2
* Manipulation of IV (treatment effects)
* Experimental error
39
what contributes to variance **within IV levels** in a **paired t test**
Experimental error (RM designs - can discount the variance due to individual differences (leaving **only variance due to error**))
40
What assumptions do we make during a paired t-test
* **Normality**: distribution of difference scores between the IV levels should be approximately normal (assume OK if n > 30)
* **Sample size**: sample size under each IV level should be roughly equal
41
What do we do when our assumptions are violated during a paired t-test ## Footnote Week 2
We use the non-parametric equivalent - Wilcoxon test
42
How do we interpret 95% Confidence intervals for repeated measure designs ## Footnote Week 2
We can't determine whether a result is likely to be significant by looking at a 95% CI plot; instead we need to look at the influence of the IV in terms of the size and consistency of its effect.
43
For a repeated measures design, what would happen if the confidence intervals cross 0 (lower value is negative and higher value is positive) ## Footnote Week 2
you cannot reject the null hypothesis as you cannot conclude that the true population mean difference is different from 0
44
What is Cohen's D ## Footnote Week 2
The magnitude of the difference between two IV level means, expressed in s.d. units, i.e. a standardised value expressing the difference between the IV level means
45
What are the values for effect size of Cohen's d ## Footnote week 2
Small: d = 0.2
Medium: d = 0.5
Large: d = 0.8
46
How does cohen's d differ from T? Define both. ## Footnote week 2
d = magnitude of the difference between two IV level means, **expressed in s.d. units**
t = magnitude of the difference between two IV level means, **expressed in ESE units**
**t takes sample size into account: it qualifies the size of the effect in the context of the sample size.**
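A sketch of Cohen's d with hypothetical scores, using a pooled standard deviation as the standardiser (one common choice):

```python
import numpy as np

a = np.array([10., 12., 11., 14., 13., 9., 12., 11.])   # hypothetical IV level 1 scores
b = np.array([13., 15., 14., 16., 12., 14., 15., 13.])  # hypothetical IV level 2 scores

# pooled standard deviation across the two IV levels
n1, n2 = len(a), len(b)
sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))

# d: the difference between the level means, expressed in s.d. units
d = (b.mean() - a.mean()) / sp
print(round(d, 3))  # 1.708 -- 'large' by the 0.8 threshold above
```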
47
When do we use a One way anova ## Footnote Week 3
When we have 1 IV with more than 2 levels
48
What does a one way anova do? ## Footnote Week 3
Estimates whether the population means under the different levels of the IV are different
49
What is an ANOVA like (think of t-tests) ## Footnote Week 3
An extension of the t-test: if you conducted a one-way ANOVA on an IV with 2 levels, you'd obtain the same result (F = t^2)
50
Why do we use ANOVA instead of running multiple t-tests ## Footnote Week 3
The more comparisons we run on samples drawn from a population, the more likely we are to make a Type I error and reject the null hypothesis even if it is true
51
What is the familywise error rate and what does amending it provide ## Footnote Week 3
The probability that at least one of a 'family' of comparisons run on the same data will result in a Type I error. Amending it provides a corrected significance level (α), reducing the probability of making a Type I error.
52
How do we calculate the familywise error rate? ## Footnote Week 3
α′ = 1 − (1 − α)^c, where c is the number of comparisons.
E.g. for 3 IV levels (3 comparisons: ab, ac, bc): 1 − (1 − 0.05)^3 = .143, i.e. a 14% chance of a Type I error.
For 4 IV levels (6 comparisons: ab, ac, ad, bc, bd, cd): 1 − (1 − 0.05)^6 = .265, i.e. a 26% chance of a Type I error.
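The formula can be checked with a short sketch (the alpha and comparison counts are the card's own examples):

```python
# familywise error rate: a' = 1 - (1 - a)^c for c comparisons at significance level a
def familywise(alpha, c):
    return 1 - (1 - alpha) ** c

print(round(familywise(0.05, 3), 3))  # 0.143 -- 3 IV levels, 3 comparisons
print(round(familywise(0.05, 6), 3))  # 0.265 -- 4 IV levels, 6 comparisons
```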
53
Why do we use omnibus tests? ## Footnote Week 3
To control familywise error rate
54
What is the null hypothesis of the F ratio/ANOVA? ## Footnote Week 3
There is no difference between the population means under the different levels of the IV: H0: μ1 = μ2 = μ3
55
what is the ratio for the F value. ## Footnote Week 3
Variance between IV levels/ Variance within IV levels
56
What does the closeness of the F value to 0 indicate ## Footnote Week 3
An F value close to 0 = small variance between IV levels relative to within IV levels; an F value further from 0 = large variance between IV levels relative to within IV levels
57
What assumptions do we make for an independent one way ANOVA ## Footnote Week 3
Same as those for the independent t-test:
* **Normality**: DV should be normally distributed under each level of the IV
* **Homogeneity of variance**: variance in the DV, under each level of the IV, should be (reasonably) equivalent
* **Equivalent sample size**: sample size under each level of the IV should be roughly equal
* **Independence of observations**: scores under each level of the IV should be independent
58
What do we do when the assumptions of the independent one-way anova aren't met? ## Footnote Week 3
We use the non-parametric equivalent, the Kruskal Wallis test
59
What is the model sum of squares? | Equation ## Footnote Week 3
Model Sum of Squares (SSM): sum of squared differences between IV level means and grand mean (i.e. between IV level variance)
60
What is the residual sum of squares? ## Footnote Week 3
Residual Sum of Squares (SSR): sum of squared differences between individual values and corresponding IV level mean (i.e. within IV level variance)
61
What is SSt and how is it calculated ## Footnote Week 3
Sum of squares total = SSm( Sum of squares model ) + SSr (Sum of squares residual)
62
What is the mean square value and how is it calculated? What are the two types? ## Footnote Week 3
MS = SS/df (sum of squares / degrees of freedom)
MSm = model mean square value
MSr = residual mean square value
63
What do we use mean square values for? ## Footnote Week 3
To calculate the F statistic
64
How do we calculate the F statistic | mean square values ## Footnote Week 3
MSm/MSr aka model mean square value / residual mean square value
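The SSm → MS → F chain from the last few cards can be sketched with hypothetical data and checked against scipy.stats.f_oneway:

```python
import numpy as np
from scipy import stats

# hypothetical scores under three IV levels
groups = [np.array([4., 5., 6., 5.]),
          np.array([7., 8., 6., 7.]),
          np.array([9., 8., 10., 9.])]

grand = np.concatenate(groups).mean()
k = len(groups)
N = sum(len(g) for g in groups)

# SSm: squared differences of the IV level means from the grand mean
ssm = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)
# SSr: squared differences of individual values from their own level mean
ssr = sum(((g - g.mean()) ** 2).sum() for g in groups)

msm = ssm / (k - 1)  # model mean square, dfM = k - 1
msr = ssr / (N - k)  # residual mean square, dfR = N - k
F = msm / msr

print(F, stats.f_oneway(*groups).statistic)
```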
65
What do we do when the assumption of homogeneity is violated in an independent 1-way ANOVA ## Footnote Week 3
We report Welch's F instead of ANOVA F
66
What happens to the degrees of freedom when we use Welch's F? ## Footnote Week 3
The degrees of freedom are adjusted (to make the test more conservative)
67
How is the ANOVA F value reported ## Footnote Week 3
*F*(dfm,dfr)=F-value, *p* =p-value
68
How do we calculate degrees of freedom for an independent 1 way ANOVA ## Footnote Week 3
find the difference between the number of measurements and the number of parameters estimated i.e. no. of measurements – no. parameters estimated
69
How do we calculate df for between IV level (model) variance where N is total sample size and k is number of IV levels ## Footnote Week 3
K-1
70
How do we calculate df for within IV level (residual) variance where N is total sample size and k is number of IV levels ## Footnote Week 3
N-k
71
What are post hoc tests ## Footnote Week 3
Secondary analyses used to assess which IV level mean pairs differ
72
When do we use post-hoc tests ## Footnote Week 3
only when the F-value is significant
73
How do we run post-hoc tests? ## Footnote Week 3
As t-tests, but we include correction for multiple comparisons
74
what are the 3 type of post-hoc test ## Footnote Week 3
* Bonferroni
* Least significant difference (LSD)
* Tukey honestly significant difference (HSD)
75
Which post hoc test has a very low Type I error risk, very high type II error risk and is classified as 'very conservative' ## Footnote week 3
Bonferroni
76
Which post-hoc test has a high type I error risk, a low type II error risk and is classified as 'liberal'
Least significant difference (LSD)
77
Which post-hoc test has a low type I error risk , a high type II error risk and is classified as 'reasonably conservative' ## Footnote week 3
Tukey Honestly significant difference (HSD)
78
What are the three levels of effect size for partial eta^2 for ANOVA ## Footnote week 3
> 0.01 is small; > 0.06 is medium; > 0.14 is large
79
what is effect size measured in for ANOVA
It can be calculated in two ways: Cohen's d and partial eta squared
80
How do you calculate partial eta squared ## Footnote week 3
Model sum of squares/ (model sum of squares + residual sum of squares)
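A one-line sketch of that ratio (the SS values are hypothetical):

```python
# partial eta^2 = SSm / (SSm + SSr)
def partial_eta_squared(ss_model, ss_residual):
    return ss_model / (ss_model + ss_residual)

eta = partial_eta_squared(32.0, 6.0)  # hypothetical sums of squares
print(round(eta, 3))  # 0.842 -- a large effect by the > 0.14 threshold
```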
81
In a repeated measures design for a one way ANOVA, what contributes to variance between IV levels ## Footnote Week 4
* Manipulation of IV (treatment effects)
* Experimental error (random & potentially constant error)
82
In a repeated measures design for one way ANOVA, what contributes to variance within IV levels ## Footnote Week 4
Experimental error (random error)
83
how do we calculate total variance? ## Footnote Week 4
**Model variance** (variance between IV levels) + **residual variance** (variance within IV levels), which includes **individual differences** in independent designs
84
what is the t/F ratio and how do we calculate it? ## Footnote Week 4
variance between IV levels/ variance within IV levels (excluding variance due to individual diffs WHEN IN RM design)
85
how is the F ratio calculated in terms of mean square values ## Footnote Week 4
Mean sum of squares model/ mean sum of squares residual
86
What are the 3 assumptions made in a repeated measures 1-way ANOVA ## Footnote Week 4
* **Normality**: distribution of difference scores under each IV level pair should be normally distributed
* **Sphericity (homogeneity of covariance)**: the variance in difference scores under each IV level pair should be reasonably equivalent (unique to the RM one-way ANOVA)
* **Equivalent sample size**: sample size under each level of the IV should be roughly the same
87
What corrects for the sphericity assumption. ## Footnote Week 4
Greenhouse-geisser
88
What test do we do to check for sphericity and what is its respective value? ## Footnote Week 4
Mauchly's test & the W value
89
What is the null hypothesis of the assumption of sphericity in the repeated measures ANOVA ## Footnote Week 4
There is no difference between the covariances under each IV level pair (i.e. homogeneity).
If p ≤ .05 we reject the null hypothesis (i.e. heterogeneity).
90
What do we do if our data seriously violates the assumptions of a repeated measures One-way ANOVA ## Footnote Week 4
we should use the non-parametric equivalent - Friedman test
91
If Mauchlys is significant, what do we use in SPSS output ## Footnote Week 4
The row labelled Greenhouse-Geisser, as sphericity cannot be assumed
92
If Mauchlys is not significant, which row do we use in SPSS output? ## Footnote Week 4
The row labelled sphericity assumed
93
How do we report the F statistic in repeated measures ANOVA ## Footnote Week 4
*F*(dfM, dfR) = *F-value*, *p* = *p-value* (Greenhouse-Geisser / sphericity assumed)
94
How do we calculate the degrees of freedom for RM 1-way anova | for model and residual ## Footnote week 4
dfM = k − 1 (where k = number of IV levels/parameters)
dfR = dfM × (n − 1) (where n = number of participants)
95
Which post Hoc test do we use for RM 1 way anova ## Footnote week 4
bonferroni
96
why is recruitment an advantage of repeated measures designs ## Footnote Week 4
It needs fewer participants to gain the same number of measurements
97
How does the model of repeated measures design cause error variance to be reduced and why is this advantageous ## Footnote Week 4
Remove variance due to individual differences from error variances --> leading to less variance within IV levels
98
Apart from recruitment and reduction in error variance,what is another advantage of repeated measures designs
There is more power with the same number of participants: it is easier to find a significant difference (and avoid a Type II error)
99
what are order effects and what effect can they have on repeated measure designs ## Footnote Week 4
The effects of having participants go through the same task under different conditions and becoming habituated to it in a variety of ways. They **introduce confounds**: error introduced systematically between IV levels.
100
What are the 4 types of order effects ## Footnote Week 4
* Practice effects
* Fatigue
* Sensitisation
* Carry-over effects
101
What are practice effects in terms of order effects ## Footnote Week 4
P's get better at the task which positively skews how they do in subsequent IV levels
102
What is fatigue in terms of order effects ## Footnote Week 4
Participants get bored/ tired of engaging which negatively skews how they do in subsequent tasks
103
What is sensitisation in terms of order effects ## Footnote Week 4
P's start behaving in a particular way to please or annoy the experimenter due to understanding IV manipulation
104
What are carry over effects in terms of order effects ## Footnote Week 4
The effect of taking part in one IV level affects how one acts in subsequent IV levels
105
What is counterbalancing and how is it used to minimise order effects ## Footnote Week 4
Counterbalancing the order in which participants undergo the IV levels; it must be done to ensure as much randomness as possible. This does not get rid of order effects, but spreads their impact.
106
What are alternatives for each type of order effect when counterbalancing is not possible (4) | week 4
* Practice: extensive pre-study practice
* Fatigue: short experiments
* Sensitisation: intervals between exposure to IV levels
* Carry-over effects: include a control group
107
When do we use factorial ANOVAs ## Footnote Week 5
to test for differences when we have more than one IV with at least 2 levels
108
What are the 3 broad factorial ANOVA designs ## Footnote Week 5
* all IVs are between-subjects (independent) * all IVs are within-subjects (repeated measures) * a mixture of between-subjects and within-subjects IVs (mixed)
109
what would a 2 * 2 ANOVA mean ## Footnote Week 5
2 IVs/factors, each with 2 levels
110
what would a 2 * 4 ANOVA mean ## Footnote week 5
2 IVs/factors, one with 2 levels and one with 4 levels
111
What are the three effects we would be looking for in a 2 × 3 ANOVA design if the primary IV is gender (male, female) and the secondary IV is colour (red, white and blue)

* Is there a significant main effect of gender?
* Is there a significant main effect of colour?
* Is there a significant interaction between gender and colour?
112
If we are doing a study to see whether there is a difference between how much men and women like chocolate, and we are also looking to see whether the texture of the chocolate (chunks vs tablets) has an effect, what is the primary IV, what is the secondary IV, why are they respectively so, and what do these terms mean? ## Footnote Week 5
The primary IV is gender; the secondary IV is texture. Gender is the primary IV as it is the main IV we are looking for an effect of. Texture is the secondary IV as we are looking to see whether the addition of this variable also creates an effect; it is secondary because it is not the focus.
113
In a between subjects 2 * 3 ANOVA, how many possible conditions are there? ## Footnote Week 5
6
114
What is the null hypothesis for factorial ANOVA and how many are there? ## Footnote Week 5
**There is one per IV and one for each possible IV interaction pair.** E.g. in a 2 × 2 ANOVA there is a null hypothesis of no difference in means for IV one, one for IV two, and one for the interaction between IV one and IV two.
115
What does a significant interaction indicate in ANOVA? ## Footnote Week 5
that the effect of manipulating one IV depends on the level of the other IV
116
What is an interaction in terms of ANOVA. ## Footnote Week 5
The combined effects of multiple IVs/factors on the DV
117
What are Marginal means used for in ANOVA ## Footnote Week 5
To determine if there is a significant effect for either IV
118
In an ANOVA line chart, what does it mean if the lines for the IVs are parallel ## Footnote Week 5
There is no interaction of the two IVs
119
What does it mean if the marginal mean of one of the IVs is at roughly the same level as the means for both populations ## Footnote Week 5
there is no main effect
120
What are the assumptions made in an independent factorial (two way) ANOVA (5) ## Footnote Week 5
* **Normality**: DV should be normally distributed under each level of the IV
* **Homogeneity of variance**: variance in the DV, under each level of the IV, should be (reasonably) equivalent (Levene's test: we DON'T want a significant result; **NO correction**)
* **Equivalent sample size**: sample size under each level of the IV should be roughly equal
* **Independence of observations**: scores under each level of the IV should be independent
121
What is the non-parametric equivalent for the Independent factorial ANOVA ## Footnote Week 5
**There is no non-parametric equivalent for factorial ANOVA.**
If our data seriously violate these assumptions we can attempt a 'fix', or we can simplify the design.
122
How many F statistics do we report in factorial ANOVA ## Footnote Week 5
one for each IV i.e the main effect for each IV
123
What is the difference between classical eta squared and partial eta squared ## Footnote Week 5
Classical eta^2: proportion of total variance attributable to the factor
Partial eta^2: only takes into account variance from one IV at a time (proportion of total variance attributable to the factor, partialling out/excluding variance due to other factors)
124
when do we use Post Hoc tests ## Footnote Week 5
If the main effect of at least one of the IVs is significant, then we reject the null hypothesis.
***Only relevant when*** the main effect of an IV is significant and the IV has more than 2 levels.
125
For one-way ANOVA what do we report alongside post hoc results ## Footnote Week 5
Cohens D
126
For factorial ANOVA what do we report alongside post hoc results ## Footnote Week 5
Nothing; we don't report Cohen's d
127
What are simple effects in terms of interaction effects and how do we check them ## Footnote Week 5
The effect of an IV at a single level of another IV. We check them by doing comparisons of cell mean conditions (i.e. t-tests).
128
For an IV with a between subjects design, how do we check for simple effects ## Footnote Week 5
we do independent t-test for each comparison
129
What is the Bonferroni correction and what calculation does it perform? ## Footnote Week 5
A correction that divides the required alpha level by the number of comparisons (e.g. for 6 comparisons, .05/6 = .008)
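A minimal sketch of that division (the 6-comparison figure is the card's own example):

```python
# Bonferroni: divide the required alpha level by the number of comparisons
def bonferroni_alpha(alpha, n_comparisons):
    return alpha / n_comparisons

print(round(bonferroni_alpha(0.05, 6), 3))  # 0.008
```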
130
How can ANOVAs in general be described as ## Footnote Week 7
a flexible and powerful technique appropriate for many experimental designs
131
What questions are necessary to ask before collecting any data and performing an ANOVA ## Footnote Week 7
* Do I have a clear research question?
* Do I know what analyses I will need to conduct to answer this?
* Will I be able to carry out and interpret the results of these analyses?
* Have I considered and controlled for potential confounds?
* Will I understand the answer I get?
132
What does our choice of statistical test depend on ## Footnote Week 1
* **Scale of measurement**
* **Research aim**
  * Descriptive only
  * Relational (relationships)
  * Experimental (differences)
* **Experimental design**
  * Subject design: between/within
  * Number of IVs
  * Number of IV levels
* **Properties of dependent/outcome variable**
  * Normally distributed: parametric
  * Not normally distributed: non-parametric
133
What do descriptive statistics not allow us to do ## Footnote Week 1
Make predictions or infer causality
134
What does a 95% confidence interval mean ## Footnote Week 1
95% of all sampled means will fall within the 95% bound of the population mean
135
When writing proportions (such as partial eta squared) what is the correct notation for them? ## Footnote General
you drop the leading zero and report it to 3dp
136
What can relationships vary in ## Footnote Week 8
Form, Direction ,Magnitude/strength
137
What are the two types of form a relationship can take ## Footnote Week 8
linear or curvilinear
138
What are the two directions a relationship can go in ## Footnote week 8
positive or negative
139
What is the magnitude/strength of a relationship measured in ## Footnote Week 8
The R value
140
What R values are indicative of a perfect positive relationship, a perfect negative relationship and no relationship ## Footnote Week 8
1, -1 & 0
141
What does an r value of 0 look like on a scatter graph ## Footnote Week 8
The dots are random and there is no systematic relationship
142
What are the values for weak, moderate and strong correlation ## Footnote Week 8
± 0.1 - 0.39 = weak correlation
± 0.4 - 0.69 = moderate correlation
± 0.7 - 0.99 = strong correlation
143
What is meant by non-linear correlation? ## Footnote Week 8
The idea that some DV’s peak at a certain point of an IV (e.g confidence in ability to pass course, too low = do worse, too high = do worse, at optimum = do best)
144
what does bivariate linear correlation involve ## Footnote Week 8
Linear correlation involves measuring the relationship between 2 variables measured in a sample. We use sample statistics to estimate population parameters: the whole logic of inferential statistical testing.
145
What is the null hypothesis when doing a bivariate linear correlation? ## Footnote Week 8
no relationship between population variables
146
What parametric assumptions do we have when doing a bivariate linear correlation? (4) ## Footnote Week 8
* Both **variables should be continuous**
* **Related pairs**: each participant (or observation) should have a pair of values (one for each axis)
* **Absence of outliers**: outliers skew results; we can usually just remove them
* **Linearity**: points in the scatterplot should be best explained with a straight line
147
Apart from the parametric assumptions, what other things are important to consider in regards to Correlation and correlation coefficients ## Footnote Week 8
**They are sensitive to range restrictions**
* E.g. floor and ceiling effects: a floor effect is a clustering of scores at the bottom of the scale; a ceiling effect is a clustering at the top
* It can be hard to see the relationship between variables as you don't see how far they stretch, due to the cap

**There is debate over Likert scales**: with 6-7 points you can get away with parametric tests; with fewer, it is best to use non-parametric
148
What happens if our data seriously violates our parametric assumptions for a correlation coefficient test? ## Footnote Week 8
We use the non-parametric equivalent: **Spearman's rho** (or **Kendall's tau** if fewer than 20 cases)
149
What does Pearson's correlation coefficient do, and what does its outcome show? ## Footnote Week 8
* Investigates the relationship between 2 quantitative continuous variables
* The resulting correlation coefficient (r) is a measure of the strength of association between the two variables
150
What is covariance ## Footnote Week 8
The extent to which the X and Y variables vary together (their shared variance)
151
How do you calculate Covariance? (we will never have to do this by hand but good practice to know) | The process ## Footnote Week 8
1. For each datapoint, calculate the difference from the mean of X and the difference from the mean of Y
2. Multiply the paired differences
3. Sum the multiplied differences
4. Divide by N − 1
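The four steps can be sketched with hypothetical X and Y scores and checked against np.cov:

```python
import numpy as np

x = np.array([2., 4., 6., 8., 10.])  # hypothetical X scores
y = np.array([1., 3., 5., 4., 7.])   # hypothetical Y scores

# 1. each datapoint's difference from the mean of X and from the mean of Y
dx = x - x.mean()
dy = y - y.mean()
# 2-3. multiply the paired differences and sum them
cross = (dx * dy).sum()
# 4. divide by N - 1
cov = cross / (len(x) - 1)

print(cov, np.cov(x, y)[0, 1])  # both 6.5
```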
152
What does Pearson's correlation coefficient provide us with, and what actually is it? ## Footnote Week 8
A measure of the variance shared between our X and Y variables: it is a ratio of the covariance (the shared variance) to the separate variances
153
What does the distance of the r value in relation to 0 mean in regression? | Covariance and variances ## Footnote Week 8
If the covariance is large relative to the separate variances, r will be further from 0; if the covariance is small relative to the separate variances, r will be closer to 0.
In other words, if the variables tend to go up and down together a lot (large covariance), the correlation (r) will be far from 0, indicating a strong relationship; if they don't move together much (small covariance), r will be closer to 0, indicating a weaker relationship.
154
What does R tell us in terms of a scatter graph? - How does the spread of the data points relate to R? ## Footnote Week 8
How well a straight line fits the data points (i.e. the strength of the correlation: strength is about how tightly the data points fit the straight line).
If data points cluster closely around the line, r will be further from 0; if data points are scattered some distance from the line, r will be closer to 0.
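A sketch (hypothetical paired scores) of r as the covariance divided by the product of the separate standard deviations, checked against scipy.stats.pearsonr:

```python
import numpy as np
from scipy import stats

x = np.array([2., 4., 6., 8., 10.])  # hypothetical X scores
y = np.array([1., 3., 5., 4., 7.])   # hypothetical Y scores

# r = covariance(x, y) / (s.d. of x * s.d. of y)
r_manual = np.cov(x, y)[0, 1] / (x.std(ddof=1) * y.std(ddof=1))
r_scipy, p = stats.pearsonr(x, y)

print(round(r_manual, 3), round(r_scipy, 3))  # ~.919: a strong positive correlation
```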
155
What difference reflects sampling error? ## Footnote Week 8
The fact that if you took two samples from the same populations you're likely to get two different R values.
156
If we plotted the sampling distribution of correlation coefficients, what would it look like? ## Footnote Week 8
If we plotted the r values, the majority would cluster around a common point: the true population mean.
157
What would the null hypothesis be for the sampling distribution of correlation coefficients ## Footnote Week 8
The mean would be 0; thus most r values would cluster close to 0.
158
What is the r-distribution, what does it tell us and what is its mean value? ## Footnote Week 8
* It is the extent to which an individual sampled correlation coefficient (r) deviates from 0, which can be expressed in standard error units
* From it we can determine the probability of obtaining an r value of a given magnitude when the null hypothesis is true (the p-value)
* **The mean is 0**
159
What is the relationship between the R-value and the population ## Footnote Week 8
the obtained r-value is a point estimate of the underlying population r-value
160
When is linear regression used, and what is its purpose? ## Footnote Week 9
* Similarly to linear correlation, it is used when the relationship between variables X & Y can be described with a straight line
* By proposing a model of the relationship between X & Y, regression allows us to estimate how much Y will change as a result of a given change in X
161
What is the Y variable in linear regression? ## Footnote Week 9
The variable that is being predicted → **the outcome variable**
162
What is variable X in linear regression and what is special about it? ## Footnote Week 9
The variable that is being used to predict → **the predictor variable**. There **can be multiple predictor variables**.
163
What is regression used for? (3) ## Footnote Week 9
* Investigating the strength of the effect X has on Y
* Estimating how much Y will change as a result of a given change in X
* Predicting a value of Y, based on a known value of X
164
What assumption is made in regression that is not done in correlation and what does this mean in regards to what evidence can be obtained from regression? ## Footnote Week 9
Regression assumes that Y is (to some extent) dependent on X; this dependence may or may not reflect causal dependency. This means **regression does not provide direct evidence of causality**.
165
Does a significant regression imply causality? ## Footnote Week 9
No. Factors other than our predictor variables may be at play, so we can't infer causality.
166
What are the 3 stages of performing a linear regression? ## Footnote Week 9
1. Analysing the relationship between variables
2. Proposing a model to explain the relationship
3. Evaluating the model
167
What does 'analysing the relationship between variables' mean as a stage during linear regression? ## Footnote Week 9
Determining the strength & direction of the relationship
168
What kind of model is being proposed in linear regression and what is expected of this model? ## Footnote Week 9
A line of best fit, where the distance between the line and the individual datapoints is minimised as much as possible.
169
Ideally, for a line of best fit, where should the datapoints be relative to it ## Footnote Week 9
* Half above, half below the line
* Clustered as close as possible to the line (signifies a strong relationship)
* Distance is minimised as much as possible
170
What are the 2 properties of a regression line? ## Footnote Week 9
* **The intercept:** the value of Y when X is 0 (typically the baseline) (the a value)
* **The slope:** how much Y changes as a result of a 1-unit increase in X (the gradient) (the b value)
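A minimal sketch of reading off these two properties with `scipy.stats.linregress` (the numbers are invented):

```python
import numpy as np
from scipy import stats

# Invented data where y rises roughly 2 units per unit of x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
print(res.intercept)  # a: the value of y when x = 0
print(res.slope)      # b: how much y changes per 1-unit increase in x
```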
171
When 'evaluating the model', what are we doing and how do we do this? ## Footnote Week 9
Assessing the goodness of fit of our model (the best model / line of best fit) vs the simplest model (b = 0; comparing data points to the mean of Y).
172
What is the simplest model? ## Footnote Week 9
* Using the average Y value (the mean) to estimate what Y might be
* **Assumes no relationship between X and Y (b = 0)**
173
What is the 'best model'? (What is it based on, what functions can it serve?) ## Footnote Week 9
* Based on the relationship between X & Y
* Uses the regression line (line of best fit) to **determine what the value of Y would be at a particular value of X**
* Allows for better prediction
174
When calculating the goodness of fit of your model, what is the first thing you do? What does this provide? ## Footnote Week 9
First check how much variance remains when using the simplest model (the mean of Y) to predict Y. **This provides the sum of squares total** (take the difference between each data point and the mean value, square it, and sum the squares).
175
How do you calculate the variance not explained by the regression line, and what does this give you? | Week 9
Calculate the difference between each data point and the point on the line it matches up to (the score that would be predicted), square these differences, and then add them together. **This gives you the sum of squares of the residuals.**
176
What does more clustering around the regression line indicate for the model? | Week 9
The model is providing a better fit, meaning there is smaller error variance and the model is more accurate (more of the variance is accounted for by the variable in question).
177
What is the sum of squares total in relation to regression? | Week 9
*The difference between the observed values of Y and the mean of Y, i.e. the variance in Y not explained by the simplest model (b = 0)*
178
What best matches the description '*the difference between the observed values of y and those predicted by the regression line i.e. the variance in y not explained by the regression model* ' | Week 9
Sum of squares residual
179
What is reflective of the improvement in prediction using the regression model when compared to the simplest model? | Week 9
The difference between the sum of squares total and the sum of squares residual, in other words **the model sum of squares**: **SST - SSR = SSM**
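A sketch of SST, SSR and SSM with invented numbers, checking the identity SST - SSR = SSM:

```python
import numpy as np
from scipy import stats

# Invented data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
y_hat = res.intercept + res.slope * x   # best-model predictions

sst = ((y - y.mean()) ** 2).sum()   # variance not explained by the simplest model
ssr = ((y - y_hat) ** 2).sum()      # variance not explained by the regression line
ssm = sst - ssr                     # improvement due to the model
print(sst, ssr, ssm)
print(ssm / sst)                    # this ratio is R^2
```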
180
What does a large model sum of squares value indicate in regression? | Week 9
A large(r) improvement in the prediction using the regression model over the simplest model
181
What can we use F-tests (what we call the ANOVA in cases of regression, to avoid confusion) to evaluate, and what is this reported as? | Week 9
The improvement due to the model (SSM) relative to the variance the model does not explain (SSR). It is reported as the F-ratio.
182
What does the F ratio do in goodness of fit tests and how do you calculate it? | Week 9
* Provides a measure of how much the model has improved the prediction of Y, relative to the level of inaccuracy of the model
* F = model mean squares / residual mean squares
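Sketching the F-ratio for a one-predictor model (invented data; with one predictor the model has 1 degree of freedom and the residuals have n - 2):

```python
import numpy as np
from scipy import stats

# Invented data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

res = stats.linregress(x, y)
y_hat = res.intercept + res.slope * x
ssm = ((y_hat - y.mean()) ** 2).sum()   # model sum of squares
ssr = ((y - y_hat) ** 2).sum()          # residual sum of squares

n = len(x)
msm = ssm / 1          # model mean squares (df = 1)
msr = ssr / (n - 2)    # residual mean squares (df = n - 2)
F = msm / msr
p = stats.f.sf(F, 1, n - 2)   # p-value from the F distribution
print(F, p)
```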
183
What would you expect to see in terms of model mean squares (MSM) and residual means squares (MSR) if the regression model is good at predicting y? | Week 9
The improvement in prediction due to the model (MSM) will be large, while the level of inaccuracy of the model (MSR) will be small
184
What are the assumptions we make for simple linear regression? (5) | Week 9
* Linearity: X and Y must be linearly related
* Absence of outliers
* Normality
* Homoscedasticity
* Independence of residuals
185
How do we check for the assumption of normality in regression models and what would we expect to see (idk if this'll be on the exam but just know it init) | Week 9
Using a normal P-P plot of the regression standardised residual.
* Ideally, data points will lie in a reasonably straight diagonal line from bottom left to top right; this would suggest no major deviations from normality
186
How do we check for the assumption of homoscedasticity in regression models and what would we expect to see? | Week 9
Using the scatterplot of the regression standardised residual.
* Ideally, residuals will be roughly rectangularly distributed, with most scores concentrated in the centre (around 0)
187
What do the values of R, R^2 and adjusted R^2 each tell you about the regression in the SPSS output? | Week 9
* **R** - strength of the relationship between X and Y
* **R^2** - proportion of variance explained by the model
* **Adjusted R^2** - R^2 adjusted to account for the degrees of freedom (number of participants and number of parameters being estimated)
188
Why would we use adjusted R^2 | Week 9
If we wanted to use the regression model to generalise the results of our sample to the population, R^2 is too optimistic
189
What are the key values to identify when evaluating the regression model and what do they mean? (3 values) | Week 9
* **a** - the constant, also the intercept, where the line intersects the Y axis
* **b** - the gradient of the slope
* **beta** - the slope converted to a standardised score
190
If there is only one predictor variable, what does this mean for the beta coefficient? | Week 9
Beta coefficient and R are the same value
191
Why would we use a T-test in a regression model | Week 9
The t-value is equivalent to √F when we have only one predictor variable, i.e. **it does the same job as the F-test when we have just one predictor variable**
192
What additional info do we have regarding the b value in regression models? | Week 9
The b value has 95% confidence intervals
193
What else can R^2 be interpreted as | Week 9
the amount of variance in y explained by the model (SSM), relative to the total variance in y (SST)
194
In what ways can we express R^2 | Week 9
as a proportion or as a percentage
195
What is the fundamental difference between correlation and regression | Week 9
Correlation shows what variance is shared; regression explains the variance by showing that a certain amount of it can be explained by the model
196
What does multiple regression allow us to do | Week 9
to assess the influence of several predictor variables (e.g. x1, x2, x3 etc…) on the outcome variable (y)
197
How does multiple regression work (basic description)/what do you need to do in order to conduct it? | Week 9
You need to combine the predictor variables to see their joint effect on the outcome variable
198
Why do we have to use a plane of best fit when proposing a model in multiple regression | Week 9
Because you're looking at 3 things (the outcome variable and predictor variables one and two), the best model is in 3 dimensions instead of 2; that's why we use a plane instead of a line
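A minimal sketch of fitting such a plane with ordinary least squares via `numpy.linalg.lstsq` (the data are simulated and the variable names are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)   # predictor 1 (simulated)
x2 = rng.normal(size=n)   # predictor 2 (simulated)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix: a column of ones for the intercept plus the two predictors
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coef   # intercept and the two slopes that define the plane
print(a, b1, b2)   # should be close to 1.0, 2.0 and -0.5
```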
199
What are some of the assumptions being made in multiple regression? (4)
* **Sufficient sample size**
* **Linearity** - predictor variables should be linearly related to the outcome variable
* **Absence of outliers**
* **Multicollinearity** - ideally, predictor variables will be correlated with the outcome variable but not with one another
200
What does a violation of the assumption of multicollinearity mean? What is a way to tell if this has been violated? | Week 9
* There is some overlap in the variables you are measuring (the predictor variables might be one thing under two different names, e.g. confidence and self-esteem are basically the same)
* Predictor variables which are highly correlated with one another (r = .9 and above) are measuring much the same thing
201
If a multiple regression model is significant, what does this mean? | Week 9
* The regression model provides a better fit (explains more variance) than the simplest model
* **I.e. at least one of the slopes is not 0 (without specifying which)**
202
What does hierarchical regression involve and what does this allow us to see? | Week 10
Hierarchical regression involves entering predictor variables in a specified order of 'steps' based on theoretical grounds. This allows us to see the relative contribution of each 'step' (set of predictor variables) in making the prediction stronger.
203
Why do we use hierarchical regression ## Footnote Week 10
To examine the influence of predictor variable(s) on an outcome variable after 'controlling for' (i.e. partialling out) the influence of other variables
204
When doing a hierarchical regression, what is the difference between steps one and two?
Step 1 contains what you want to partial out; Step 2 contains what you want to measure (e.g. optimism)
205
When looking at hierarchical regression in SPSS, what are we looking at? ## Footnote Week 11
The row labelled Model 2. Particularly the R square change, F Change and Sig F change values. (Check SPSS, this will make sense)
206
What does the sig f change column tell us in Hierarchical regression? ## Footnote Week 11
Whether this predictor variable alone explains a significant proportion of the variance in the outcome variable
207
What types of non-parametric test are there and what are their parametric equivalents? ## Footnote Week 11
* Between Ps: Independent t-test → Mann-Whitney U test
* Within Ps: Paired t-test → Wilcoxon test
* Between Ps: 1-way independent ANOVA → Kruskal-Wallis test
* Within Ps: 1-way repeated measures ANOVA → Friedman test
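All four non-parametric tests are available in `scipy.stats`; a sketch on simulated data (group sizes and effect sizes are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Three simulated conditions of 15 scores each
g1, g2, g3 = (rng.normal(loc, 1.0, 15) for loc in (0.0, 0.5, 1.0))

u, p_u = stats.mannwhitneyu(g1, g2)              # between-Ps: independent t-test analogue
w, p_w = stats.wilcoxon(g1, g2)                  # within-Ps: paired t-test analogue
h, p_h = stats.kruskal(g1, g2, g3)               # between-Ps: 1-way independent ANOVA analogue
chi2, p_f = stats.friedmanchisquare(g1, g2, g3)  # within-Ps: 1-way RM ANOVA analogue
print(p_u, p_w, p_h, p_f)
```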
208
What are the non-parametric tests for factorial designs? ## Footnote Week 11
Factorial designs do not have a non-parametric equivalent; they either need a simplified design or adjustments made
209
What is the non-parametric equivalent of Pearson's correlation coefficient when N > 20? ## Footnote Week 11
Spearman's rho
210
What is the non-parametric equivalent of Pearson's correlation coefficient when N < 20? ## Footnote Week 11
Kendall's tau
211
What types of non-parametric test exist for tests of relationships? (2) ## Footnote Week 11
Spearman's rho and Kendall's tau are both non-parametric equivalents of Pearson's correlation coefficient
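Both are available in `scipy.stats`; a quick sketch with invented ranks:

```python
import numpy as np
from scipy import stats

# Invented ordinal-ish scores
x = np.array([1, 2, 3, 4, 5, 6])
y = np.array([2, 1, 4, 3, 6, 5])

rho, p_rho = stats.spearmanr(x, y)    # convention in this course: N > 20
tau, p_tau = stats.kendalltau(x, y)   # convention in this course: N < 20
print(rho, tau)
```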
212
What is the non-parametric equivalent of partial correlation? ## Footnote Week 11
Partial correlation has no non-parametric equivalent
213
What is the non-parametric equivalent for regression?
Regression has no non-parametric equivalent
214
What types of test do we use when analysing categorical data ## Footnote Week 11
Chi-square (one variable or test of independence)
215
What type of test is a chi-square test ## Footnote Week 11
non-parametric
216
What are the parametric equivalents of the One-variable Chi-Square (a.k.a. Goodness of Fit Test) and the Chi-Square Test of Independence (two variables)?
Neither of them has a parametric equivalent; they are non-parametric only
217
What is an example of an Omnibus test? ## Footnote Week 3
An ANOVA (because it controls for the familywise error rate)
218
How do you calculate the number of comparisons for an IV with n levels? ## Footnote Week 3
n(n - 1)/2
e.g. n = 3: 3(3 - 1)/2 = 6/2 = 3
e.g. n = 6: 6(6 - 1)/2 = 30/2 = 15
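The formula n(n - 1)/2 as a one-liner, reproducing the two worked examples:

```python
def n_comparisons(n: int) -> int:
    """Number of pairwise comparisons for an IV with n levels: n(n - 1) / 2."""
    return n * (n - 1) // 2

print(n_comparisons(3))  # 3
print(n_comparisons(6))  # 15
```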