Test on Samples Flashcards
(17 cards)
Outline the process for testing on two samples
- PDA 2. If the data are normally distributed and the variances are equal, use the t-test. If the data do not meet these assumptions (non-parametric), use the Mann-Whitney test.
- Look at the confidence interval for the difference: if it contains zero, there is no evidence of a statistically significant difference in the sample means.
- The p-value is the probability of seeing data at least as extreme as the sample if the null hypothesis (Ho) were true; if it is low (commonly below 0.05), reject Ho.
What is a Mann-Whitney Test?
A non-parametric statistical test that ranks the data set and can be used on ordinal or non-normally distributed data.
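The ranking idea can be sketched in a few lines. This is a minimal illustration, not a full test: the function names `average_ranks` and `mann_whitney_u` and the sample values are made up for the example, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return the smaller of the two U statistics for samples a and b."""
    ranks = average_ranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Illustrative data only (not from the flashcards).
sample_a = [3, 5, 8, 10]
sample_b = [1, 2, 4, 6]
u = mann_whitney_u(sample_a, sample_b)
print(u)  # 3.0
```

A small U means the two samples' ranks overlap very little; in practice the statistic is compared with a critical value or converted to a p-value by a statistics package.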
What is a T-test and the assumptions for this?
It is a parametric statistical test which uses means and variances.
2 assumptions: the samples are randomly drawn from a normally distributed population, and the variances of the populations are equal.
It uses interval or ratio data with no outliers.
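As a rough sketch of how the means and variances enter the calculation, the equal-variance (pooled) t statistic can be computed as below. The function name `pooled_t` and the data are assumptions for illustration; turning t into a p-value still needs a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic assuming equal population variances."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / std_error, n_a + n_b - 2


# Illustrative data only.
t_stat, df = pooled_t([2, 4, 6, 8], [1, 3, 5, 7])
print(round(t_stat, 4), df)  # 0.5477 6
```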
Outline the process for tests on more than 2 samples.
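If the data are normally distributed and the variances are equal (Bartlett’s test p value > 0.05 means no evidence that the variances differ), use ANOVA, which gives a p value; if it is low, at least one sample mean must be different, and Tukey’s Test shows which pairs differ.
Use the Kruskal-Wallis (KW) test if the data are non-parametric with no outliers; get the p value and, if it is low, use the z scores to see which group differs.
Use Mood’s median test if outliers are present.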
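For the ANOVA step, the F ratio behind the p value compares the variation between sample means with the variation within samples. The sketch below uses a hypothetical helper `one_way_anova_f` and made-up data; it stops at F and does not compute the p value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """F ratio: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Illustrative data only: three clearly separated samples.
f_ratio = one_way_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_ratio)  # 27.0
```

A large F ratio corresponds to a low p value, which is the cue to follow up with a pairwise comparison.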
What is Tukey’s Test?
It compares the difference in means between each pair of samples and gives a confidence interval for each pair. If a pair’s interval crosses zero, there is no significant difference between that pair.
What is the Kruskal-Wallis Test?
It is the preferred non-parametric test; it compares medians and is based on ranks. The null hypothesis says each sample comes from the same population. If the p value is low, use the z scores: a group with a z score larger than +/- 2 is different.
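A minimal sketch of the rank-based calculation follows. It assumes there are no tied values, the function name `kruskal_wallis` is hypothetical, and the z scores use the usual normal approximation for each group's mean rank; the data are made up.

```python
from math import sqrt


def kruskal_wallis(groups):
    """Return the H statistic and a z score per group (assumes no ties)."""
    pooled = sorted(x for g in groups for x in g)
    rank_of = {x: i + 1 for i, x in enumerate(pooled)}  # valid only without ties
    n_total = len(pooled)
    weighted_sum = 0.0
    z_scores = []
    for g in groups:
        rank_sum = sum(rank_of[x] for x in g)
        weighted_sum += rank_sum ** 2 / len(g)
        mean_rank = rank_sum / len(g)
        # Standard error of a group's mean rank under the null hypothesis.
        std_error = sqrt((n_total + 1) * (n_total / len(g) - 1) / 12)
        z_scores.append((mean_rank - (n_total + 1) / 2) / std_error)
    h = 12 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)
    return h, z_scores


# Illustrative data only.
h_stat, z = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_stat, 2), [round(v, 2) for v in z])  # 7.2 [-2.32, 0.0, 2.32]
```

Here the first and third groups have z scores beyond +/- 2, so they are the ones flagged as different.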
What is Mood’s Median?
Looks at the number of values in each sample above or below the overall median. It calculates the overall median for the whole data set and checks how this differs from the median of each sample. Output: look for where the N values are unbalanced, and a p value.
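The counting step can be sketched as below. The helper name `mood_median_counts` and the data are assumptions for illustration; the p value would come from a chi-squared test on the resulting table, which is not shown here.

```python
from statistics import median


def mood_median_counts(groups):
    """Count values above vs. at-or-below the overall median in each sample."""
    overall = median([x for g in groups for x in g])
    table = []
    for g in groups:
        above = sum(1 for x in g if x > overall)
        table.append((above, len(g) - above))  # (above, at or below)
    return overall, table


# Illustrative data only.
overall_median, counts = mood_median_counts(
    [[1, 2, 3, 10], [4, 5, 6, 7], [8, 9, 11, 12]]
)
print(overall_median, counts)  # 6.5 [(1, 3), (1, 3), (4, 0)]
```

The third sample is the unbalanced one: all four of its values sit above the overall median.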
What is the Chi-Squared Test?
Used to analyse categorical data and to test for an association between two variables. Commonly used on two way tables.
Assumptions: the data are collected by random sampling, and all data points are independent measurements or observations.
What does Chi-squared do?
Compares the observed frequencies to the frequencies that would be expected if there was no pattern in the data (the null hypothesis).
Outline the Chi-Squared process?
Find the difference (chi-squared) for the whole table, then compare chi calc to chi crit. If the p value is significant and chi calc is larger than chi crit, we reject Ho, so there is a relationship between one variable and the other.
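The calculation of chi calc for a two-way table can be sketched as follows. The function name `chi_squared` and the table values are made up for illustration; the critical value 3.841 is the standard one for 1 degree of freedom at the 0.05 level.

```python
def chi_squared(observed):
    """Chi-squared statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated (Ho).
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof


# Illustrative 2 x 2 table only.
chi_calc, dof = chi_squared([[20, 30], [30, 20]])
chi_crit = 3.841  # critical value for dof = 1 at the 0.05 level
print(chi_calc, dof, chi_calc > chi_crit)  # 4.0 1 True
```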
What are the three time series trends and their definitions?
Smooth trend pattern (overall trend): A reasonably smooth increasing or decreasing trend between data points.
Seasonal trend: A pattern in which the values repeat at regular intervals or certain time periods.
Irregular: The data set lacks any systematic pattern or has outliers, e.g. an unexplained peak or trough.
What is a moving average?
Smooths the data set and removes short-term fluctuations to highlight the overall longer-term trend.
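A simple (unweighted) moving average can be sketched like this. The function name `moving_average`, the window length, and the series are assumptions for illustration only.

```python
def moving_average(values, window):
    """Average each run of `window` consecutive values."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Illustrative series only: a rising trend with short-term ups and downs.
series = [10, 14, 9, 15, 11, 16]
smoothed = moving_average(series, 3)
print([round(v, 2) for v in smoothed])  # [11.0, 12.67, 11.67, 14.0]
```

The smoothed series is shorter than the original by `window - 1` points, and a longer window smooths more heavily.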
What is cluster analysis?
A technique used to classify people or things into groups and assess their similarity.
Linkage Techniques?
Groups individuals according to those that are most similar to each other.
Variance Techniques?
Individuals are split into groups according to their variance.
Single Linkage Clustering
Create a data matrix of distances for all pairs in a data set, repeatedly join the closest groups, and produce a dendrogram.
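The joining order that a dendrogram draws can be sketched as below. This is a minimal illustration on one-dimensional data: the function name `single_linkage` and the values are hypothetical, and the distance between two groups is taken as the distance between their closest members.

```python
def single_linkage(points):
    """Return the merges (group, group, distance) in the order they happen."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(
                    abs(points[i] - points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Illustrative data only: each value is one individual.
merges = single_linkage([1.0, 2.0, 6.0, 7.5, 15.0])
for left, right, distance in merges:
    print(left, right, distance)
# [0] [1] 1.0
# [2] [3] 1.5
# [0, 1] [2, 3] 4.0
# [0, 1, 2, 3] [4] 7.5
```

Each printed line corresponds to one join in the dendrogram, and the distance is the height at which that join is drawn.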
What is a dendrogram?
It represents the structure/grouping of individuals in the data matrix. Interpret it by identifying groups of individuals within the dendrogram and the distance measure where all groups are joined together.