Tests and Assumptions Flashcards

1
Q

Continuous data vs discrete data

A

Continuous data can take any value within a range (e.g. height or mass); discrete data can only take certain values, typically whole-number counts (e.g. number of offspring).

2
Q

Independent Samples T-test

A
  • Used to compare unpaired/independent samples where each subject is measured only once, to determine whether the two datasets come from the same population. e.g. analysing differences between cultivars of wheat, or the proportions of sweets in two different jars.
  • Null: the two samples come from the same population (no difference between their means).
  • Assumptions: the data are approximately normally distributed, the data are continuous, and the variances of the two sets of data are homogeneous.
  • Outcome: Group statistics show measures of central tendency; the T-test output shows Levene's test, degrees of freedom, the Sig. (2-tailed) p-value and 95% confidence intervals (see the sketch below).
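A minimal sketch of this test in Python with SciPy (the data and variable names are illustrative, not from the card):

```python
from scipy import stats

# illustrative grain-size measurements for two wheat cultivars (made-up numbers)
cultivar_a = [4.1, 3.9, 4.4, 4.0, 4.2, 3.8]
cultivar_b = [3.6, 3.7, 3.5, 3.9, 3.4, 3.8]

# Levene's test checks the homogeneity-of-variance assumption
lev_stat, lev_p = stats.levene(cultivar_a, cultivar_b)

# equal_var=True gives the classic independent-samples t-test;
# equal_var=False (Welch's test) is used if Levene's test is significant
t_stat, p_value = stats.ttest_ind(cultivar_a, cultivar_b, equal_var=lev_p > 0.05)
print(t_stat, p_value)
```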
3
Q

Mann Whitney U-test

A
  • The non-parametric equivalent of the Independent Samples T-test. Used to compare independent values in two sets of data when the data do not follow a normal distribution; the test uses ranks instead.
  • Null: the two sets of data are the same (come from the same distribution).
  • Assumptions: no distributional assumptions (normality is not required).
  • Output: details the null hypothesis and states whether it should be rejected or retained (see the sketch below).
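A minimal SciPy sketch, assuming two independent, non-normal samples (numbers are illustrative):

```python
from scipy import stats

# illustrative non-normal data for two independent groups
group_a = [1, 2, 2, 3, 5, 8, 13, 40]
group_b = [2, 3, 4, 4, 6, 9, 15, 12]

# two-sided Mann-Whitney U test works on the ranks of the pooled data
u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')
print(u_stat, p_value)
```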
4
Q

Paired Samples t-test

A
  • Used when an individual has been measured twice, or when clones have been subjected to different treatments, so the values in the two sets of data are dependent (paired within the same unit). e.g. air particulate measurements before and after a power plant is installed; the reaction speed of the dominant and non-dominant hand.
  • Null: there is no difference between the two data sets.
  • Assumptions: the paired differences follow an approximately normal distribution and the variables are measured on a continuous scale.
  • Output: Sig. (2-tailed) p-value with a critical threshold of 0.05 (if significant, reject the null hypothesis; see the sketch below).
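A minimal SciPy sketch of the paired test (reaction times and names are illustrative):

```python
from scipy import stats

# illustrative reaction times (s) for dominant vs non-dominant hand, same subjects
dominant     = [0.21, 0.25, 0.19, 0.23, 0.27, 0.22]
non_dominant = [0.26, 0.29, 0.22, 0.25, 0.31, 0.24]

# paired (related-samples) t-test on the within-subject differences
t_stat, p_value = stats.ttest_rel(dominant, non_dominant)
print(t_stat, p_value)
```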
5
Q

Wilcoxon Matched-pair Signed Rank Test

A

The non-parametric equivalent of the Paired Samples T-test. Used for paired data where the values are dependent between the two data sets. e.g. measures of river flow at the same sites in winter compared to summer.

  • Null: there is no difference in the subject being analysed between samples one and two.
  • Assumptions: the data should be on a continuous scale and there should be a minimum of 6 pairs.
  • Output: details the null hypothesis and states whether it should be rejected or retained; also displays the test statistics with the Asymp. Sig. (2-tailed) p-value (see the sketch below).
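A minimal SciPy sketch using the river-flow example (flow values are made up):

```python
from scipy import stats

# illustrative river-flow measurements at the same sites in winter vs summer
winter = [12.1, 9.8, 15.3, 7.6, 11.0, 13.4, 8.9]
summer = [6.2, 7.1, 9.8, 5.0, 6.9, 8.3, 7.5]

# Wilcoxon signed-rank test ranks the paired differences rather than raw values
w_stat, p_value = stats.wilcoxon(winter, summer)
print(w_stat, p_value)
```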
6
Q

One-Sample T-test

A
  • Used to compare the mean of a set of values to a fixed value. e.g. analysing whether the boxes on a cereal production line weigh 500 g on average.
  • Null: the mean of the set of values is equal to the fixed value.
  • Assumptions: the dependent variable should be approximately normally distributed, the data should be on a continuous scale and the values should be independent of one another.
  • Output: Sig. (2-tailed) p-value and the 95% confidence interval of the difference, which should correspond with it (if the interval excludes zero, the p-value is significant; see the sketch below).
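A minimal SciPy sketch using the 500 g cereal-box example (weights are illustrative):

```python
from scipy import stats

# illustrative cereal-box weights (g) from a production line
weights = [498.2, 501.1, 499.5, 502.3, 497.8, 500.4, 499.9, 501.7]

# test the sample mean against the fixed target value of 500 g
t_stat, p_value = stats.ttest_1samp(weights, popmean=500)
print(t_stat, p_value)
```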
7
Q

Wilcoxon Signed Rank Test

A

The non-parametric equivalent of the One-Sample T-test. Ranks the data rather than assuming a normal distribution, i.e. it tests whether the median of the data is the same as the desired fixed value.
  • Output: details the null hypothesis and states whether it should be rejected or retained (see the sketch below).
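A minimal SciPy sketch, reusing the 500 g example from the previous card (weights are illustrative); SciPy's one-sample Wilcoxon tests a median of zero, so the hypothesised value is subtracted first:

```python
import numpy as np
from scipy import stats

# illustrative cereal-box weights (g); hypothesised median is 500 g
weights = np.array([498.2, 501.1, 499.5, 502.3, 497.8, 500.4, 499.9, 501.7])

# subtract the hypothesised value so the test of "median = 0" applies
w_stat, p_value = stats.wilcoxon(weights - 500)
print(w_stat, p_value)
```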

8
Q

One-way ANOVA

A
  • ANOVA is the analysis of variance. One-way ANOVA is an extension of the Independent Samples T-test in that it can compare 3 or more samples. It compares two or more group means by measuring the between-group variance against the within-group variance. e.g. analysing variation in grain size between wheat cultivars, or differences in angle perception between males and females. Note that factors must be declared as fixed or random, i.e. whether the levels are specifically chosen categories or were sampled from a larger set of possible levels.
  • Null: there is no difference between x and y in z, e.g. there is no difference between males and females in angle-perception test scores.
  • Assumptions: the residuals are normally distributed, variance is approximately equal across groups, values are independent both within and between groups, and the data are continuous.
  • Output: F-ratio (large between-group variance relative to within-group variance = significant) with the corresponding p-value, and an R-squared value showing how much of the variance is explained by the variable (see the sketch below).
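A minimal SciPy sketch comparing three groups (cultivar names and grain sizes are illustrative):

```python
from scipy import stats

# illustrative grain sizes (mm) for three wheat cultivars
cultivar_a = [4.1, 4.3, 3.9, 4.2, 4.0]
cultivar_b = [3.6, 3.8, 3.5, 3.7, 3.9]
cultivar_c = [4.5, 4.6, 4.4, 4.8, 4.3]

# one-way ANOVA: F-ratio of between-group to within-group variance
f_stat, p_value = stats.f_oneway(cultivar_a, cultivar_b, cultivar_c)
print(f_stat, p_value)
```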
9
Q

Kruskal Wallis

A
  • The non-parametric equivalent of one-way ANOVA, used when the residuals have been shown (e.g. with Q-Q plots) to be non-normally distributed. Compares unpaired samples using ranks.
  • Null: there is no significant difference between x and y.
  • Assumptions: the data are on a 'fairly' continuous scale.
  • Output: the p-value is labelled as the Asymp. Sig. (see the sketch below).
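A minimal SciPy sketch of the rank-based comparison of three independent groups (numbers are illustrative):

```python
from scipy import stats

# illustrative non-normal measurements for three independent groups
group_a = [2, 3, 5, 8, 13]
group_b = [1, 2, 2, 4, 6]
group_c = [5, 7, 9, 12, 20]

# Kruskal-Wallis H test compares the groups using ranks
h_stat, p_value = stats.kruskal(group_a, group_b, group_c)
print(h_stat, p_value)
```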
10
Q

Bonferroni Correction

A

The Bonferroni correction controls for the effect of multiple testing by accounting for the chance of a type 1 error (a false positive) accumulating across tests.
The chance of not getting a false positive in a single test at the 0.05 level is 0.95, so the combined probability across e.g. 4 tests is 0.95^4 = 0.81. The probability of getting at least one false positive is therefore 1 - 0.81 = 0.19, i.e. about 19%.
To control for this, multiply each p-value by the number of tests performed (or, equivalently, divide the significance threshold by the number of tests); the arithmetic is worked through in the sketch below.
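A short worked example in Python (the raw p-values are illustrative):

```python
# family-wise false-positive risk for 4 tests at alpha = 0.05
alpha, n_tests = 0.05, 4
p_no_false_positive = (1 - alpha) ** n_tests   # 0.95**4 ~= 0.81
p_at_least_one = 1 - p_no_false_positive       # ~= 0.19, i.e. about 19%

# Bonferroni correction: multiply each raw p-value by the number of tests
raw_p = [0.03, 0.20, 0.004, 0.049]             # illustrative p-values
corrected = [min(p * n_tests, 1.0) for p in raw_p]
print(p_at_least_one, corrected)
```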

11
Q

2-way ANOVA

A
  • Used when there are two factors that have been divided into levels and there are one or more observations for each factor-level combination, i.e. a two-way ANOVA deals with two factors simultaneously.
  • Null: three null hypotheses: no effect of the first factor, no effect of the second factor, and no interaction between them.
  • Assumptions: the data are continuous, there is approximately equal variance in each factor combination, the residuals should be normally distributed, and each factor has at least 2 levels, which should be coded when analysing with a programme.
  • Output: there are 3 p-values corresponding to the 3 hypotheses (see the sketch below).
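A minimal sketch using statsmodels (the factor names, yields and column labels are illustrative):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# illustrative data: yield for each cultivar x fertiliser combination
df = pd.DataFrame({
    'cultivar':   ['A', 'A', 'B', 'B', 'A', 'A', 'B', 'B'],
    'fertiliser': ['low', 'high', 'low', 'high', 'low', 'high', 'low', 'high'],
    'yield_t':    [3.1, 3.9, 2.8, 3.5, 3.0, 4.1, 2.9, 3.6],
})

# two-way ANOVA: main effect of each factor plus their interaction
model = smf.ols('yield_t ~ C(cultivar) * C(fertiliser)', data=df).fit()
print(anova_lm(model, typ=2))   # three p-values, one per null hypothesis
```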
12
Q

Chi squared: Goodness of fit and test of association

A
  • The Goodness of fit test is widely used, and easily applied when you have a table of observed and expected values. It is customary to group the data so that no expected value is less than 1. It differs from the other tests in that the data are counts of discrete events. e.g. observed counts of progeny can be tested against the expected Mendelian ratios.
  • Null: the observed values conform to the expected values.
  • Assumptions: no assumptions of normally distributed data etc., just that the counts are independent.
  • Output: calculate the chi-squared statistic as the sum over categories of (observed minus expected) squared, divided by the expected value; compare it to the chi-squared distribution with (number of categories - 1) degrees of freedom, or look it up in a table.
  • Test of association: used when there are no fixed ratios or expected results for comparison. The expected counts are calculated on the assumption that there is no relationship between the rows and columns (row total × column total / grand total). e.g. a pilot drug is tested to determine whether it can combat the effects of angina, giving a 2×2 table: drug and placebo groups, each split into with-symptoms and without-symptoms. The p-value is then obtained as above.
  • Null: there is no association between x and y.
  • Assumptions: the values are independent both in rows and columns; no more than 20% of expected counts should be less than 5 (both tests are sketched below).

note: SPSS: both columns should be set to nominal and numeric.
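A minimal SciPy sketch of both tests (the counts and the 9:3:3:1 ratio are illustrative choices):

```python
from scipy import stats

# goodness of fit: observed progeny counts vs a 9:3:3:1 Mendelian ratio
observed = [315, 108, 101, 32]
total = sum(observed)
expected = [total * r / 16 for r in (9, 3, 3, 1)]
chi2_gof, p_gof = stats.chisquare(observed, f_exp=expected)

# test of association: 2x2 drug/placebo vs symptoms/no-symptoms table
table = [[20, 30],   # drug:    symptoms, no symptoms
         [35, 15]]   # placebo: symptoms, no symptoms
chi2_a, p_assoc, dof, expected_counts = stats.chi2_contingency(table)
print(p_gof, p_assoc)
```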

13
Q

Linear Regression

A
  • Used for two variables that have been shown (with a scatterplot) to have a linear relationship. Linear regression determines the form and strength of the relationship, which allows any value of y to be predicted from a value of x. The slope of the line and the noise around it are the crucial measures of the relationship. e.g. how the uptake of an experimental drug is affected by the pH of the stomach.
  • Null: there is no relationship between x and y.
  • Assumptions: the residuals are normally distributed, there is no relationship between the residuals and x or y, there is equal variance in y across the range of predicted values of x, and the relationship is linear.
  • Output: the Model Summary provides the R-squared value; the ANOVA table provides the regression and residual sums of squares, the F statistic and the associated p-value; the Coefficients box gives (working down the first column) the intercept (constant) and the slope of the line, so that any value of y can be predicted (see the sketch below).

note: homoscedasticity vs heteroscedasticity, i.e. there should be no obvious relationship between the predicted values and the residuals.
note: plot the standardised predicted values against the standardised residuals to check for equal variance across the range of y.
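A minimal SciPy sketch using the stomach-pH example (the pH and uptake values are made up):

```python
from scipy import stats

# illustrative data: drug uptake (%) measured at different stomach pH values
ph     = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
uptake = [12.0, 15.1, 18.2, 22.0, 24.8, 28.3, 31.0, 34.5]

# least-squares fit: slope, intercept, R-squared and p-value for the slope
result = stats.linregress(ph, uptake)
print(result.slope, result.intercept, result.rvalue**2, result.pvalue)

# predict y for a new x from the fitted line y = intercept + slope * x
print(result.intercept + result.slope * 3.2)
```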

14
Q

Binary Logistic Regression

A
  • Used to analyse the relationship between two variables where the outcome is binary, i.e. measuring how a predictor variable influences the proportion of the two possible outcomes. e.g. how increasing the amount of toxin in a pond affects the proportion of dead to alive fish.
  • Null: there is no relationship between the predictor variable and the measured proportion of x.
  • Assumptions: unlike linear regression, binary logistic regression does NOT assume equal variance or normally distributed residuals.
  • Output: the Variables in the Equation box provides the coefficient (slope) and constant, together with the Wald statistic and its associated p-value (see the sketch below).

note: the fitted equation is not y = mx + c; the coefficients enter the logistic formula for a probability, p = 1 / (1 + e^-(a + bx)).
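A minimal sketch using statsmodels and the toxin/fish example (concentrations and outcomes are illustrative):

```python
import numpy as np
import statsmodels.api as sm

# illustrative data: toxin concentration vs whether each fish died (1) or survived (0)
toxin = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0])
dead  = np.array([0,   0,   0,   1,   0,   1,   1,   1,   1,   1])

# fit p(dead) = 1 / (1 + exp(-(a + b * toxin)))
model = sm.Logit(dead, sm.add_constant(toxin)).fit()
print(model.summary())                     # coefficients with Wald z statistics and p-values

# predicted probabilities of death at two new toxin levels
new_x = sm.add_constant(np.array([1.0, 4.0]))
print(model.predict(new_x))
```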

15
Q

Repeated Measures ANOVA

A
  • Used when the data are dependent from sample to sample. It is an extension of the Paired Samples T-test as it allows 3 or more comparisons to be made. A common design measures subjects before, during and after a treatment, i.e. change over time in response to a factor; e.g. measuring sulphur dioxide levels at the same sites annually.
  • Null: a null hypothesis for each factor and for their interaction.
  • Assumptions: normally distributed residuals, continuous data, and equal variance within each factor level (sphericity, checked with Mauchly's test).
  • Output: Mauchly's test of sphericity (if significant, use the second, corrected line of output for the p-values in the next box); the first p-value deals with the significance of the within-subject effect (e.g. time) and the next with the between-subject effect (e.g. site). See the sketch below.
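A minimal sketch using statsmodels' AnovaRM for the within-subject factor only (sites, years and SO2 values are illustrative; the between-subject effect from the SPSS output is not covered here):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# illustrative data: SO2 level measured at the same 4 sites in 3 successive years
df = pd.DataFrame({
    'site': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'year': [2018, 2019, 2020] * 4,
    'so2':  [30, 28, 25, 42, 40, 37, 35, 33, 30, 50, 47, 44],
})

# repeated-measures ANOVA: each site (subject) is measured at every year (within factor)
res = AnovaRM(df, depvar='so2', subject='site', within=['year']).fit()
print(res)   # F value and p-value for the within-subject effect of year
```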
16
Q

Friedman Test

A
  • The non-parametric equivalent of the Repeated Measures ANOVA, used if the residuals have been shown to be non-normally distributed. It can only deal with the within-subject factor, not the between-subject factor, i.e. it will analyse the change over time but not the difference between sites.
  • The test output details the hypothesis, the p-value and whether to reject or retain H0 (see the sketch below).
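A minimal SciPy sketch of the rank-based repeated-measures comparison (the before/during/after values are illustrative):

```python
from scipy import stats

# illustrative measurements at the same 5 sites before, during and after a treatment
before = [10, 12, 9, 14, 11]
during = [8, 11, 7, 12, 10]
after  = [6, 9, 5, 10, 8]

# Friedman test ranks the values within each subject (site) across the conditions
chi2, p_value = stats.friedmanchisquare(before, during, after)
print(chi2, p_value)
```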
17
Q

ANCOVA

A
  • Used when there is another factor (a covariate) that cannot be controlled but may still be affecting the outcome of the investigation. e.g. determining whether two shrimp are different species by analysing their leg-beat pattern while being unable to control for sea temperature.
  • Null: there is no significant difference between x and y.
  • Assumptions: as for ANOVA.
  • Output: F statistic and corresponding p-value; if significant, there is a statistically significant difference between x and y even when z is accounted for (see the sketch below).
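A minimal statsmodels sketch of an ANCOVA using the shrimp example (species labels, temperatures and beat rates are made up):

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# illustrative data: leg-beat rate of two shrimp species, with sea temperature as covariate
df = pd.DataFrame({
    'species': ['A'] * 6 + ['B'] * 6,
    'temp':    [12, 14, 15, 16, 18, 20, 12, 14, 15, 16, 18, 20],
    'beats':   [30, 33, 34, 36, 39, 42, 25, 27, 29, 30, 33, 35],
})

# ANCOVA: effect of species on beat rate after adjusting for temperature
model = smf.ols('beats ~ C(species) + temp', data=df).fit()
print(anova_lm(model, typ=2))   # F and p for species with temperature accounted for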
18
Q

Poisson Probability

A
  • Used when the Poisson distribution is more appropriate than the normal. It calculates the probability of an event occurring a given number of times when the average rate is already known. Events are counted over a fixed interval of space or time and are assumed to occur randomly and independently. e.g. calculating the likelihood of birds visiting a feeder either 5 or 10 times in an hour when the mean visit rate is known to be 5 per hour.
  • Assumptions: events are independent and the mean is known.
  • Output: in Excel, the first column calculates the probability of an exact number of occurrences, the next the cumulative probability of that many or fewer, and the third the probability of that many or more (see the sketch below).
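A minimal SciPy sketch of the bird-feeder example (the mean of 5 visits per hour is taken from the card; the specific probabilities computed are illustrative):

```python
from scipy.stats import poisson

mean_visits = 5   # known average number of feeder visits per hour

# probability of exactly 5 and exactly 10 visits in an hour
print(poisson.pmf(5, mean_visits), poisson.pmf(10, mean_visits))

# cumulative probability of 5 or fewer visits, and probability of 10 or more visits
print(poisson.cdf(5, mean_visits), poisson.sf(9, mean_visits))
```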
19
Q

Correlation

A

x

20
Q

Principal Component Analysis

A

x