statistics Flashcards
types of variables
- qualitative: dichotomous (binary), nominal or ordinal
- quantitative: discrete or continuous
dichotomous
qualitative variable
data where every observation is in one of two categories (yes/no)
nominal
qualitative variable
- 3 or more categories; no inherent ordering
- ex cow breeds
ordinal
qualitative variable
- 3 or more categories that have an inherent order
- ex: gum colour (normal, pale, white)
discrete data (counts)
quantitative variable
- can only take whole-number values
- ex: number of animals, heart rate, bacterial count
continuous
quantitative variable
can take any value within a defined range
obtained by measurement
ex: body weight, blood pressure, age, hormone concentration
descriptive statistics
conducted to explore patterns in the data and to validate/check the data
which ones are appropriate depends on the type of data
mean
average: sum of the values divided by the number of values
median
line the values up in order and choose the one in the middle; not affected by a few extremes
- a better indicator of the typical value than the mean when the data are skewed or contain extremes
mode
most commonly observed value
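A minimal sketch of the three averages using Python's standard statistics module. The data are made up for illustration and include one extreme value (18) to show how it pulls the mean but not the median or mode.

```python
import statistics

# Hypothetical data with one extreme value (18)
values = [2, 3, 3, 4, 18]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle value after sorting
mode_value = statistics.mode(values)      # most frequently observed value

print(mean_value, median_value, mode_value)
```

Here the mean is 6, while the median and mode both stay at 3.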
normal distribution
mode, median and mean are very similar
percentile
- the value at or below which a given percentage of the data fall
- 50th percentile is the median
- 25th percentile means that 25% of the data are less than or equal to that value
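A short sketch of percentiles with statistics.quantiles. The data are hypothetical, and the "inclusive" interpolation method is one choice among several; other software may give slightly different cut points on small samples.

```python
import statistics

# Hypothetical data, already in order for readability
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=100 returns 99 cut points; the k-th percentile sits at index k - 1
percentiles = statistics.quantiles(values, n=100, method="inclusive")

p25 = percentiles[24]  # 25th percentile (lower quartile)
p50 = percentiles[49]  # 50th percentile
p75 = percentiles[74]  # 75th percentile (upper quartile)

# The 50th percentile matches the median
print(p25, p50, p75, statistics.median(values))
```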
box and whisker plot
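the box spans the 25th to 75th percentile (lower and upper quartile); the line in the box is the median
any dots beyond the whiskers are outliers: by the usual convention, values more than 3/2 times the interquartile range (upper quartile minus lower quartile) below the lower quartile or above the upper quartile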
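The outlier rule can be sketched as below. This assumes the common 1.5 x interquartile range convention and the "inclusive" quartile method; plotting packages may compute quartiles slightly differently. The data are made up.

```python
import statistics

# Hypothetical data with one suspiciously large value
values = [1, 2, 3, 4, 5, 6, 7, 8, 30]

# Lower quartile, median, upper quartile (the box and its centre line)
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range = height of the box

# Fences: 1.5 x IQR below the lower quartile and above the upper quartile
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences would be drawn as a separate dot
outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(q1, median, q3, outliers)
```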
variance and standard deviation are measures of
the spread of data around the mean
variance s^2
the sum of the squared differences of each of the n values from the mean, divided by the degrees of freedom (n-1)
so take the mean, then for each data point take that point minus the mean
square each difference
add the squares up, then divide by n-1
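The calculation can be followed step by step in a few lines. The data are hypothetical; statistics.variance is used only as a cross-check, since it also divides by n-1.

```python
import statistics

values = [2, 4, 4, 6, 9]  # hypothetical sample
n = len(values)

mean = sum(values) / n                      # step 1: take the mean
differences = [v - mean for v in values]    # step 2: each point minus the mean
squares = [d ** 2 for d in differences]     # step 3: square each difference
variance = sum(squares) / (n - 1)           # step 4: sum, divide by n - 1

# Cross-check against the standard library's sample variance
print(variance, statistics.variance(values))
```

For these values the mean is 5, the squared differences sum to 28, and the variance is 28 / 4 = 7.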
standard deviation s
square root of variance
estimates the average variation of the n values from the mean
tells us how much variability can be expected among individuals
for normally distributed data, about 2/3 (68%) of the values will be within mean +/- one standard deviation
about 95% of values will be within mean +/- 2 standard deviations
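A rough check of the 2/3 and 95% rules on simulated data. The population mean (50), standard deviation (10), sample size and seed are all arbitrary assumptions; the rule only holds approximately, and only for roughly normal data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable
sample = [rng.gauss(50, 10) for _ in range(10_000)]  # simulated normal data

mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # square root of the sample variance (n - 1)

def fraction_within(values, centre, half_width):
    """Fraction of values in [centre - half_width, centre + half_width]."""
    inside = [v for v in values if abs(v - centre) <= half_width]
    return len(inside) / len(values)

within_1_sd = fraction_within(sample, mean, sd)
within_2_sd = fraction_within(sample, mean, 2 * sd)
print(round(within_1_sd, 2), round(within_2_sd, 2))
```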
standard error of the mean (SEM)
standard deviation / square root of the number sampled (s / sqrt(n))
indicates how close the sample mean is likely to be to the actual mean in the target population
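A small sketch of the SEM formula on a hypothetical sample of five values.

```python
import math
import statistics

values = [2, 4, 4, 6, 9]  # hypothetical sample
n = len(values)

sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
sem = sd / math.sqrt(n)        # standard error of the mean

print(round(sd, 3), round(sem, 3))
```

Because the SEM divides by the square root of n, a larger sample with the same standard deviation gives a smaller SEM.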
confidence interval (one sample only ie one type of experiment)
if you have an x% confidence interval, then of every 100 samples you collect, about x of the intervals calculated from them will contain the actual population mean
NOT CORRECT: if I do an experiment today there is an x% chance I get the actual population mean
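The repeated-sampling interpretation can be illustrated by simulation. Everything here is an assumption made for the sketch: a normal population with mean 100 and standard deviation 15, samples of 5, and the tabulated t value 2.776 for 95% confidence with 4 degrees of freedom.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
TRUE_MEAN = 100.0        # hypothetical population mean
TRUE_SD = 15.0           # hypothetical population standard deviation
N = 5                    # size of each sample
T_95 = 2.776             # tabulated t for 95% confidence, df = N - 1 = 4

def interval_contains_true_mean():
    """Draw one sample, build its 95% interval, report whether it covers the truth."""
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)
    return mean - T_95 * sem <= TRUE_MEAN <= mean + T_95 * sem

repeats = 2000
hits = sum(interval_contains_true_mean() for _ in range(repeats))
coverage = hits / repeats  # should land near 0.95
print(coverage)
```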
confidence interval example
mean +/- t x SEM
t will be given to us
gives a range
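A worked sketch of the formula on a hypothetical sample of five values. The t value is the tabulated one for 95% confidence with 4 degrees of freedom (2.776); in practice it is looked up or given.

```python
import math
import statistics

values = [2, 4, 4, 6, 9]  # hypothetical sample
n = len(values)
t_value = 2.776           # given: 95% confidence, df = n - 1 = 4

mean = statistics.mean(values)
sem = statistics.stdev(values) / math.sqrt(n)

# Confidence interval: mean +/- t x SEM
lower = mean - t_value * sem
upper = mean + t_value * sem
print(round(lower, 2), round(upper, 2))
```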
null hypothesis
there is NO difference between groups
alternative hypothesis
hypothesis that there is a difference between groups
want to disprove the
null hypothesis
steps in hypothesis testing
1) from observed data, a test statistic is calculated
2) the probability (p-value) of observing a test statistic as large as or larger than that observed, if the null hypothesis is true, is calculated
3) the p-value is compared to a cut-off termed the level of significance (alpha); this should be small because we don’t want to reject the null hypothesis when it is true
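The three steps can be sketched with an exact permutation test, chosen here only because it needs no statistical tables; it is not necessarily the test these notes have in mind. The two groups and the choice of test statistic (absolute difference in means) are hypothetical.

```python
import itertools
import statistics

# Hypothetical measurements from two groups
group_a = [5, 6, 7, 8]
group_b = [1, 2, 3, 4]
alpha = 0.05  # level of significance

def mean_difference(a, b):
    """Test statistic: absolute difference between the group means."""
    return abs(statistics.mean(a) - statistics.mean(b))

# Step 1: calculate the test statistic from the observed data
observed = mean_difference(group_a, group_b)

# Step 2: if the null hypothesis is true, the group labels are arbitrary,
# so count how many relabellings give a statistic as large as or larger
combined = group_a + group_b
as_extreme = 0
total = 0
for idx in itertools.combinations(range(len(combined)), len(group_a)):
    a = [combined[i] for i in idx]
    b = [combined[i] for i in range(len(combined)) if i not in idx]
    total += 1
    if mean_difference(a, b) >= observed - 1e-12:
        as_extreme += 1
p_value = as_extreme / total

# Step 3: compare the p-value with the level of significance
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```

With these numbers only 2 of the 70 possible relabellings are as extreme as the observed split, so the p-value is 2/70 and the null hypothesis is rejected at alpha = 0.05.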
p value
probability of observing a test statistic as large as or larger than that observed, if the null hypothesis is true
- if p is very small (below alpha, commonly 0.05), data this extreme would be unlikely if the null hypothesis were true; reject the null hypothesis
- if p is large, the data are consistent with the null hypothesis