stats Flashcards
(32 cards)
null hypothesis
the default assumption that conditions are normal, e.g. the probability of heads when tossing a fair coin is 0.5
alternative hypothesis
the claim that the probability is not the "normal" value stated in the null hypothesis (it is higher, lower, or simply different)
significance level
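the probability of rejecting the null hypothesis when it is actually true, e.g. 5%. It fixes the critical values: in a two-tailed test you set a lower-tail and an upper-tail cutoff, and any result below the lower cutoff or above the upper cutoff allows you to reject the null hypothesis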
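As a rough illustration of how the tail cutoffs follow from the significance level, here is a minimal sketch for a fair-coin test. The number of tosses (20), the 5% level split equally between the two tails, and the function names are all assumptions chosen for the example, not part of the original card.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_tailed_critical_values(n, p, alpha):
    """Return (lower, upper) cutoffs with at most alpha/2 probability in each tail."""
    tail = alpha / 2

    # Lower cutoff: largest k with P(X <= k) <= alpha/2.
    lower, cumulative = None, 0.0
    for k in range(n + 1):
        cumulative += binomial_pmf(k, n, p)
        if cumulative > tail:
            break
        lower = k

    # Upper cutoff: smallest k with P(X >= k) <= alpha/2.
    upper, cumulative = None, 0.0
    for k in range(n, -1, -1):
        cumulative += binomial_pmf(k, n, p)
        if cumulative > tail:
            break
        upper = k

    return lower, upper


# Assumed setup: 20 tosses, null hypothesis p = 0.5, 5% significance level.
lower, upper = two_tailed_critical_values(n=20, p=0.5, alpha=0.05)
print(lower, upper)  # reject the null if heads <= lower or heads >= upper
```

Under these assumptions, a count of heads at or below the lower cutoff, or at or above the upper cutoff, would lead to rejecting the null hypothesis.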
combination
doesn’t care about the order of things, e.g. 3 cards in the order a, b, c count as the same as 3 cards in the order b, c, a
permutation
does care about order, e.g. the code for a safe: different orders of the same numbers are different possible answers
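The difference between the two counts can be checked with Python's standard library. This is only a sketch: the 52-card deck and the three letters are assumed examples.

```python
from itertools import combinations, permutations
from math import comb, perm

# Choosing 3 cards from an assumed 52-card deck.
unordered_hands = comb(52, 3)  # order ignored (combinations)
ordered_deals = perm(52, 3)    # order matters (permutations)

# Listing the selections of 2 letters from a, b, c makes the difference visible.
letters = ["a", "b", "c"]
pairs_unordered = list(combinations(letters, 2))  # ('a', 'b') appears once
pairs_ordered = list(permutations(letters, 2))    # ('a', 'b') and ('b', 'a') both appear

print(unordered_hands, ordered_deals)
print(pairs_unordered)
print(pairs_ordered)
```

Each unordered selection of 3 cards can be arranged in 3! = 6 ways, which is why the permutation count is 6 times the combination count.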
If you multiply all your data points by a number, what happens to the mean and sd
the mean is multiplied by that number; the sd is multiplied by the size of that number (its absolute value, since sd is never negative)
If you add a number to all data points, what happens to the mean and sd
the number is added to the mean; nothing happens to the sd
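Both effects can be checked numerically. The sketch below uses a small made-up data set, a multiplier of 3 and a shift of 10, all of which are assumptions for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data for illustration

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample sd (divides by n - 1)

# Multiply every data point by 3: mean and sd are both multiplied by 3.
scaled = [3 * x for x in data]
scaled_mean = statistics.mean(scaled)
scaled_sd = statistics.stdev(scaled)

# Add 10 to every data point: the mean goes up by 10, the sd does not change.
shifted = [x + 10 for x in data]
shifted_mean = statistics.mean(shifted)
shifted_sd = statistics.stdev(shifted)

print(mean, sd)
print(scaled_mean, scaled_sd)
print(shifted_mean, shifted_sd)
```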
sum of squared deviations (sxx)
sum of (x - mean)^2 over all data points
sd
1. root(sxx/(n-1))
2. root((sum of x^2 - n x mean^2)/(n-1))
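The two sd formulas should give the same answer, because sxx can also be written as the sum of x^2 minus n times the mean squared. A minimal check on made-up data (the data set and variable names are assumptions):

```python
import math

data = [3, 7, 8, 10, 12]  # made-up data for illustration
n = len(data)
mean = sum(data) / n

# Formula 1: from the sum of squared deviations.
sxx = sum((x - mean) ** 2 for x in data)
sd_from_sxx = math.sqrt(sxx / (n - 1))

# Formula 2: from the sum of the squares of the data.
sum_of_squares = sum(x**2 for x in data)
sd_shortcut = math.sqrt((sum_of_squares - n * mean**2) / (n - 1))

print(sxx, sd_from_sxx, sd_shortcut)
```

The second form is usually quicker by hand because it does not need each deviation worked out separately.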
lower boundary outliers
1. LQ - 1.5 x IQR
2. mean - 2 x sd
upper boundary outliers
1. UQ + 1.5 x IQR
2. mean + 2 x sd
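Both outlier rules can be applied to the same data to compare them. This is a sketch on made-up data; note that quartile conventions vary between textbooks and software, so the LQ and UQ from `statistics.quantiles` (default method) may differ slightly from a hand calculation.

```python
import statistics

data = [2, 4, 5, 6, 7, 8, 9, 10, 30]  # made-up data for illustration

# Rule 1: quartiles and interquartile range.
lq, _median, uq = statistics.quantiles(data, n=4)
iqr = uq - lq
iqr_lower = lq - 1.5 * iqr
iqr_upper = uq + 1.5 * iqr
iqr_outliers = [x for x in data if x < iqr_lower or x > iqr_upper]

# Rule 2: mean and sample standard deviation.
mean = statistics.mean(data)
sd = statistics.stdev(data)
sd_lower = mean - 2 * sd
sd_upper = mean + 2 * sd
sd_outliers = [x for x in data if x < sd_lower or x > sd_upper]

print(iqr_lower, iqr_upper, iqr_outliers)
print(sd_lower, sd_upper, sd_outliers)
```

The two rules do not always agree, so it is worth stating which one is being used.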
how do you know if 2 events are mutually exclusive
P(A u B) = P(A) + P(B)
P(A n B) = 0
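A quick way to see both conditions is to model events as sets of outcomes. The sketch below assumes one roll of a fair six-sided die; the events A, B and C are made up for the example.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


def mutually_exclusive(a, b):
    """Two events are mutually exclusive when they share no outcomes."""
    return prob(a & b) == 0


A = {1, 2}     # roll is 1 or 2
B = {5, 6}     # roll is 5 or 6
C = {2, 4, 6}  # roll is even

print(mutually_exclusive(A, B), prob(A | B) == prob(A) + prob(B))
print(mutually_exclusive(A, C), prob(A | C) == prob(A) + prob(C))
```

In Python, `a & b` is the intersection (A n B) and `a | b` is the union (A u B).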
P(B|A)
P(B|A) = P(B n A)/P(A)
P(B|A’)
P(B|A’) = P(B n A’)/P(A’)
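Both conditional probability formulas can be worked through on a small example. The sketch assumes one roll of a fair six-sided die, with A = "roll is even" and B = "roll is greater than 3"; these events are made up for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed example)
A = {2, 4, 6}                      # roll is even
B = {4, 5, 6}                      # roll is greater than 3
A_complement = sample_space - A    # A', i.e. roll is odd


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


# P(B|A) = P(B n A) / P(A)
p_b_given_a = prob(B & A) / prob(A)

# P(B|A') = P(B n A') / P(A')
p_b_given_not_a = prob(B & A_complement) / prob(A_complement)

print(p_b_given_a, p_b_given_not_a)
```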
parent population
set of all possible data points from which you will draw your sample
census
Collecting data from every member of the population, which is only practical when the population is small enough
Sampling fraction
Sample size/population size
Sampling error
The difference between an estimate derived from a sample and the true value for the population
Simple random sampling
Assign everyone in the population a number and generate random numbers (on a calculator: shift, decimal point, then multiply by the total in the population). Take information from those with the corresponding numbers. You should take a sample of no fewer than 30
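The same procedure can be sketched in code instead of on a calculator. The population size of 500, the sample size of 30 and the fixed seed are assumptions for illustration.

```python
import random

population_size = 500  # hypothetical population, numbered 1 to 500
sample_size = 30

# A fixed seed makes the example repeatable; drop it for a fresh sample each time.
rng = random.Random(42)

# Draw distinct numbers so nobody is picked twice; every member is equally likely.
chosen_numbers = rng.sample(range(1, population_size + 1), sample_size)

print(sorted(chosen_numbers))
```

`random.sample` draws without replacement, which avoids the repeated numbers a calculator can produce.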
pros and cons of simple random sampling
- every member has an equal chance of being chosen, so the sample is unbiased and should give a fair picture of the population and its spread
- however, it’s time consuming and access to the entire population is unlikely
Stratified sampling
The population is divided into different groups which are expected to give different information (for example tomato sizes, with the population divided into tomato varieties). Each group should be represented in your sample in proportion: its percentage of the sample should equal its percentage of the population
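The proportional allocation can be sketched as below. The variety names, group sizes and total sample size are invented for the example, and simple rounding is an assumption: with awkward numbers the rounded group sizes may not add up exactly to the intended sample size and would need adjusting by hand.

```python
import random

# Hypothetical population of 1000 tomatoes split by variety.
strata = {"cherry": 200, "plum": 300, "beefsteak": 500}
sample_size = 50

total = sum(strata.values())

# Each group's share of the sample matches its share of the population.
allocation = {
    name: round(sample_size * size / total) for name, size in strata.items()
}

# Within each group, pick members by simple random sampling (numbered 1..size).
rng = random.Random(0)  # fixed seed so the example is repeatable
sample = {
    name: rng.sample(range(1, strata[name] + 1), count)
    for name, count in allocation.items()
}

print(allocation)
```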
Pros and cons of stratified sampling
- Results are likely to reflect the population accurately, because every group is represented in proportion to its size
- You can’t always divide the population into groups, and sometimes members will not fit into any group or will fit into several
Cluster sampling
The population is already in groups (clusters), but there is no reason to suspect the information will differ hugely between groups, so one or more whole groups are used as the sample
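A minimal sketch of the idea follows. The class names and members are invented, and choosing the clusters at random is an assumption made here to keep the choice free of personal preference; the card itself does not say how the clusters are picked.

```python
import random

# Hypothetical population that is already grouped into classes.
clusters = {
    "class A": ["Ava", "Ben", "Cal"],
    "class B": ["Dee", "Eli"],
    "class C": ["Fay", "Gus", "Hal", "Ivy"],
    "class D": ["Jon", "Kim", "Lou"],
}

rng = random.Random(1)  # fixed seed so the example is repeatable

# Pick 2 whole clusters (assumption: chosen at random, not by convenience).
chosen_clusters = rng.sample(sorted(clusters), 2)

# Everyone in a chosen cluster is in the sample.
sample = [member for name in chosen_clusters for member in clusters[name]]

print(chosen_clusters)
print(sample)
```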
Pros and cons of cluster sampling
- very easy to conduct
- clusters are often picked by people for convenience, which introduces bias and limits how representative the sample is