Hypothesis Testing Flashcards
What does the p value tell us?
The p value tells us the likelihood of obtaining results at least as extreme as ours by chance (i.e. when our defined null hypothesis is true).
What does a p value of 0.025 mean?
It tells us that the probability of obtaining results as extreme as ours by chance alone is 0.025 (2.5%). This is very unlikely, and we can use this result to reject the null hypothesis.
What does a small p value tell us?
It indicates that there is evidence of a difference (or association), and we can reject the null hypothesis.
What does a large p value indicate?
It indicates that there is no evidence of a difference (or association) and we fail to reject the null hypothesis.
True or false - the p value, the size of the effect and the number of observations (sample size) are all interrelated.
True. If you carry out a small study, you can get a p value which is not significant even when the effect is large. If you carry out a large study, even small differences which are clinically and epidemiologically irrelevant, may achieve statistical significance.
Summarise how to go about hypothesis testing.
1) Start by specifying the study hypothesis and the null hypothesis (which is usually that there is no difference between the groups).
2) We assume the null hypothesis is true, i.e. that there is no difference between our two groups.
3) We calculate the chance that we would get the difference that we observed if the null hypothesis were true. This chance is called the p value.
4) We then reject or fail to reject the null hypothesis on the basis of the size of the p value. If the p value is small, we reject the null hypothesis. If the p value is large, we fail to reject it.
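The logic of steps 2-4 can be sketched as a simple permutation test, where the p value is estimated by repeatedly shuffling the group labels. This is only an illustration of the idea, not one of the named tests below, and the data are invented for the example.

```python
import random
from statistics import mean

# Hypothetical measurements for two groups (invented numbers, for illustration only)
group_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4]
group_b = [4.6, 4.9, 4.5, 4.7, 5.0, 4.4]

observed_diff = mean(group_a) - mean(group_b)

# Under the null hypothesis the group labels are arbitrary, so we shuffle
# them and count how often a difference at least as large arises by chance.
random.seed(1)
pooled = group_a + group_b
n_a = len(group_a)
n_perms = 10_000
at_least_as_extreme = 0
for _ in range(n_perms):
    random.shuffle(pooled)
    diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
    if abs(diff) >= abs(observed_diff):  # two-sided
        at_least_as_extreme += 1

# The p value is the proportion of shuffles at least as extreme as the data
p_value = at_least_as_extreme / n_perms
print(observed_diff, p_value)
```

If the resulting p value is small, differences as large as the one observed rarely arise when the labels carry no information, so we reject the null hypothesis.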
How do we choose what hypothesis test to use in order to obtain the p value?
We use a different test depending on the design of the study (unpaired or paired) and on what sort of outcome variable we are dealing with: continuous or categorical, and, if continuous, whether or not it is normally distributed. Each test may have additional assumptions associated with it, and you should always check that these are valid before carrying out and reporting the test.
What is the most important question to answer when deciding how to analyse continuous variables?
One of the most important questions to answer when deciding how to analyse continuous variables is whether they are following a normal distribution or not, so this should be the first thing you look at. You need to assess the histogram and summary statistics.
What is another name for the Student t-test?
The independent samples t-test.
What is the independent sample t-test used for?
The independent sample t-test (or student t-test) is used when we want to compare two groups and the outcome variable is a continuous normally distributed variable, such as birth weight. The standard t-test also assumes that the variation (scatter) in the two groups is approximately the same.
The t-test provides a p value which is interpreted as previously described. The probability that we are after is the probability of getting the difference in the sample means that we observed when the true difference is zero. This probability will depend on how big the difference is between our two sample means, and on how much (sampling) error there might be in our estimate of the difference in means. We already know that we have a measure of the amount of error in our estimate: this is called the standard error of the mean difference. We used this when we calculated the confidence interval for the difference between two means, but all you need to realise is that, just as for a single mean, this standard error is a formula which depends on the sample sizes and the standard deviation of the thing we are measuring.
The t-test uses the observed difference in the sample means, and the standard error (sampling error) for the difference in means to calculate the p value.
What is a t-statistic?
A t-statistic is the difference in sample means divided by the standard error of the difference in means.
A large difference in sample means and a small standard error will lead to a large t-statistic, indicating that the probability that the observed difference happened by chance is small and producing a small p value.
A small difference in means or a large standard error will produce a small t statistic meaning that the probability that the observed difference happened by chance is large and the p value will be large.
The t statistics can be converted into a p value using statistical tables and the t-distribution, but more commonly using statistical software.
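As a concrete illustration, the t-statistic can be computed by hand from the two sample means and the pooled standard error of the difference. The data below are invented for the example; converting t to a p value still needs tables or software.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical measurements for two independent groups (invented numbers)
group_a = [3.1, 3.5, 2.8, 3.3, 3.0]
group_b = [2.5, 2.9, 2.6, 2.4, 2.7]
n_a, n_b = len(group_a), len(group_b)

# Pooled variance (the standard t-test assumes equal variances in the two groups)
pooled_var = ((n_a - 1) * stdev(group_a) ** 2 +
              (n_b - 1) * stdev(group_b) ** 2) / (n_a + n_b - 2)

# Standard error of the difference in means
se_diff = sqrt(pooled_var) * sqrt(1 / n_a + 1 / n_b)

# t-statistic = difference in sample means / standard error of the difference
t = (mean(group_a) - mean(group_b)) / se_diff
print(round(t, 2))  # referred to a t-distribution with n_a + n_b - 2 = 8 df
```

A large t (here well above zero) corresponds to a small p value, as described above.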
When is it appropriate to use the t-test?
When comparing observations on a continuous variable between two groups the t-test is valid only when the data are normally distributed and when the two populations have equal variances. Because of these assumptions about the underlying distribution, it is known as a parametric test.
What do we do if the assumptions for using a t-test (parametric test) do not hold?
If your data do not appear to be normally distributed, it will be necessary to use a non-parametric test, and to present medians (and difference in medians). Non-parametric tests make no assumptions about the underlying distribution of the data.
What are the advantages and disadvantages of using non-parametric tests?
Non-parametric tests make no underlying assumptions of the data. However, they do have their disadvantages:
- they are less powerful than parametric tests, i.e. they are less likely to detect a true effect as significant.
- it is not easy to obtain confidence intervals using the non-parametric approach (SPSS does not calculate these; programs such as Minitab will give you an estimated confidence interval for some non-parametric tests.)
Describe the Wilcoxon rank sum test.
The Wilcoxon rank sum test is the non-parametric equivalent of the independent samples t-test. The Mann-Whitney U test is an alternative to the Wilcoxon rank sum test that uses a different formula but results in the same p-value. The theory behind this test is given in the notes for information, but you will only be expected to carry out this test using SPSS.
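A minimal sketch of the rank-sum idea, assuming no tied values (ties would need average ranks) and invented data:

```python
# Hypothetical data for two groups, deliberately chosen with no ties
group_a = [12, 15, 18, 20]
group_b = [10, 11, 14, 16]

# Rank all observations across both groups combined (smallest value = rank 1)
combined = sorted(group_a + group_b)
rank = {value: i + 1 for i, value in enumerate(combined)}

# Wilcoxon rank sum statistic: the sum of the ranks in one group
w = sum(rank[v] for v in group_a)

# Equivalent Mann-Whitney U statistic for the same group
n_a = len(group_a)
u = w - n_a * (n_a + 1) // 2

print(w, u)  # either statistic is referred to tables or software for the p-value
```

Because only the ranks are used, no assumption is made about the shape of the underlying distribution.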
What hypothesis test would you use to assess the differences between categorical variables statistically?
A chi-squared test.
What is the particular hypothesis test used for assessing the association between two categorical variables called?
The chi-squared test or the chi-squared test for independence.
Describe how you would go about carrying out a chi-squared test.
1) State the null hypothesis (e.g. there is no association between smoking and disease).
2) Compute the test statistic
The chi-squared test statistic tells us how close the actual values seen in the table (observed values) are to the values we would have expected (expected values) were there no association between the two variables. Large values of the test statistic suggest they are not close, so the data are inconsistent with the null hypothesis. Small values suggest the observed and expected values are similar, which is consistent with the null hypothesis. If you are using SPSS it will compute this statistic for you. However, you should also understand how to compute it by hand.
First compute the expected values. From the contingency table, we know the overall proportion with disease for the smoking example was 20/200=0.1 or 10%. Therefore if the null hypothesis were true, we would expect 10% of smokers to have disease and 10% of the non-smokers to have disease. So the expected number of smokers with the disease would be 10% of 78 = 10/100 x 78 = 7.8.
As there are 78 smokers in the study and we would expect 7.8 of them to have disease, it follows that 78 - 7.8 = 70.2 would be expected not to have disease.
Expected numbers of non-smokers with and without disease can be calculated similarly. An easy way to compute the expected numbers is to use the general formula:
Expected number = row total x column total / overall total
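Using the smoking example from the text (200 people in total, 20 with disease, 78 smokers), the formula gives all four expected numbers directly:

```python
# Marginal totals from the smoking example in the text
row_totals = {"smoker": 78, "non-smoker": 122}    # exposure
col_totals = {"disease": 20, "no disease": 180}   # outcome
overall_total = 200

# Expected number = row total x column total / overall total
expected = {
    (row, col): (r_tot * c_tot) / overall_total
    for row, r_tot in row_totals.items()
    for col, c_tot in col_totals.items()
}

print(expected[("smoker", "disease")])      # 7.8, matching the text
print(expected[("smoker", "no disease")])   # 70.2, matching the text
```

The remaining two cells come out as 12.2 and 109.8 for non-smokers with and without disease.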
We can then construct a table of the numbers of each expected value (smokers with disease, non-smokers with disease, smokers with no disease, non-smokers with no disease).
The test statistic simply compares the observed values (O) in the cells of the table with the expected values (E). It is computed by taking each cell of the table in turn:
- subtract the expected value (E) from the observed value (O)
- square this value
- divide this by the expected value
- having done this for all cells, sum the numbers obtained together
X^2 = Σ (O-E)^2 / E (summing over all the cells of the table)
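Putting these steps together for the 2x2 smoking example: the marginal totals come from the text, but the split of the 20 diseased people into 13 smokers and 7 non-smokers is invented for illustration, since the text gives only the totals.

```python
# Observed counts: marginals (78 smokers, 20 diseased, 200 overall) are from
# the text; the 13 vs 7 split of diseased people is invented for illustration
observed = {("smoker", "disease"): 13, ("smoker", "no disease"): 65,
            ("non-smoker", "disease"): 7, ("non-smoker", "no disease"): 115}

# Expected counts, from row total x column total / overall total (as in the text)
expected = {("smoker", "disease"): 7.8, ("smoker", "no disease"): 70.2,
            ("non-smoker", "disease"): 12.2, ("non-smoker", "no disease"): 109.8}

# X^2 = sum over all cells of (O - E)^2 / E
chi_squared = sum((observed[cell] - expected[cell]) ** 2 / expected[cell]
                  for cell in observed)

# Degrees of freedom = (rows - 1) x (columns - 1) = 1 for a 2x2 table
print(round(chi_squared, 2))  # about 6.31, above 3.84 (the 5% point for 1 df)
```

With these (invented) observed counts the statistic exceeds the 5% critical value, so the data would be inconsistent with the null hypothesis of no association.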
To decide how big this test statistic has to be before we conclude it to be inconsistent with the null hypothesis we refer the test statistic to the chi-squared distribution and obtain the corresponding p-value.
3) Obtain a p-value
Again, using the computer the p-value will be given to you automatically. However, if doing a chi-squared test by hand, you will need to look up your value of the test statistic in the table of the chi-squared distribution and read off the p-value. To read the table, you need to know the degrees of freedom, which are given by 'the number of rows minus one' multiplied by 'the number of columns minus one':
Degrees of freedom = (r-1)x(c-1) where r = number of rows and c = number of columns.
Find the row on the table that corresponds to the degrees of freedom that you have. Read across the table to find where your test statistic falls. For the two numbers between which your statistic falls, read the top of the columns and see what p-values they correspond to. So if your test statistic falls between the two values under P=0.05 and P=0.025, then the p-value is between these values.
4) Interpret the p-value
Remember the p-value tells us how likely differences as large as those seen in our sample would have arisen were the null hypothesis true, i.e. if there truly was no association. Standard practice is to use a cut-off of p<0.05: if p<0.05, we reject the null hypothesis and conclude that there is evidence of an association; if p>0.05, we fail to reject the null hypothesis and conclude that we have no evidence for a significant association.
What kind of testing do we use to obtain the p value?
Hypothesis testing
The chi squared test can only be used on 2x2 tables. True or false?
False. The chi-squared test can be used on any sized tables. Larger tables are sometimes called r x c tables where r is the number of rows and c the number of columns.
What is the continuity correction or Fisher's exact test used for?
If you have a small sample, the chi-squared test may not be valid. If we have a 2x2 table, we can use a continuity correction or Fisher's exact test when we have small numbers.
If the expected numbers in the cells of a 2x2 table are small, we can improve the chi-squared test by applying a continuity correction (known as Yates' continuity correction). Instead of the previous chi-squared formula we use a slightly different formula. SPSS will also do this for you. This test statistic is then used to obtain a p-value in the manner previously described.
There is no set rule as to when to apply this correction, but one suggested guide is to use it if any of the expected numbers in the table are less than 10. If you look at any statistical literature you will find some people that recommend this correction and some people who claim it is too conservative and may lead to a type 2 error (failure to reject the null hypothesis when you should).
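The text does not give the corrected formula; a commonly quoted form of Yates' correction subtracts 0.5 from each |O - E| before squaring. The sketch below applies it to a hypothetical 2x2 smoking table in which only the marginal totals come from the text and the cell split is invented.

```python
# Hypothetical 2x2 table: marginals from the text, cell split invented
observed = {("smoker", "disease"): 13, ("smoker", "no disease"): 65,
            ("non-smoker", "disease"): 7, ("non-smoker", "no disease"): 115}
expected = {("smoker", "disease"): 7.8, ("smoker", "no disease"): 70.2,
            ("non-smoker", "disease"): 12.2, ("non-smoker", "no disease"): 109.8}

# Yates' continuity correction: X^2 = sum of (|O - E| - 0.5)^2 / E
chi_squared_yates = sum((abs(observed[c] - expected[c]) - 0.5) ** 2 / expected[c]
                        for c in observed)

print(round(chi_squared_yates, 2))  # smaller than the uncorrected statistic
```

The corrected statistic is always smaller than the uncorrected one for the same table, which is exactly why critics call the correction conservative.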
Describe Fisher's exact test for 2x2 tables.
If the expected numbers are very small, the chi-squared test (even with the continuity correction) is not valid and a special test called Fisher's exact test must be used. As a guide, use the exact test rather than the chi-squared test when either of the following is true:
- the overall total is
Describe a Chi-squared test for trend.
A special case arises when the outcome variable is binary and the exposure variable is an ordered categorical variable. The usual chi-squared test assesses whether the proportion with the outcome differs between the different levels of the exposure. However when the exposure variable is an ordered categorical variable, a more sensitive test is to look for an increasing or decreasing trend in the proportions across the exposure categories. This test is called the chi-squared test for trend.
The null hypothesis is the same as for an ordinary chi-squared test, namely that there is no association between the two variables. The test is conducted by assigning the numerical scores 1,2,3 etc to the columns of the table, and then calculating the mean scores in those who have and in those who don’t have the outcome/disease and comparing them.
We will not delve further into the details of this test, but simply recognise that this test exists and is provided by SPSS as we shall see later where either the row or column variable is binary. Remember, it only makes sense to look at this test when the other variable is ordered in some way e.g. age group.
Describe significance and magnitude.
We have seen how to assess the significance of an association and we have seen previously how to assess the magnitude using the appropriate measure of effect such as an odds ratio. It is important to understand that these are different and tell us different things, so both should be presented. For example, we could get an odds ratio of 1.2 (small magnitude), but if the sample was very large, this could be statistically significant (i.e. unlikely to have arisen by chance). Conversely, a study could yield an odds ratio of 3 (large magnitude), but if based on a small sample, statistical significance may not be reached (e.g. p=0.1), suggesting such a difference could have arisen by chance alone.