Probability and Statistics - Part 2 Flashcards
What is an estimator of a population parameter?
This is:
-A random variable that depends on the sample information,
-whose value provides an approximation to the unknown parameter.
A specific value of that random variable is called an estimate.
What are the 4 criteria we use when deciding which of two estimators to use?
1) Unbiasedness
2) Efficiency
3) Consistency
4) Mean Square Error
What is unbiasedness?
An estimator will be unbiased if the expected value of the estimator is equal to the true population value.
What is efficiency of an estimator?
Efficiency of an estimator refers to how reliable it is.
-If you have two estimators based on samples of the same size, the more efficient estimator is the one with the lower variance.
NOTE: Efficiency takes priority over bias.
What is the consistency of estimators?
An estimator is considered consistent if the difference between its expected value and the parameter decreases as the sample size increases; in other words, as the sample size approaches infinity, the bias diminishes.
What is Mean Square Error (MSE)?
If the error is the difference between the true value of the parameter and the estimated value, the MSE is the mean of the squared error. This happens to equal Var(estimator) + bias^2.
If unbiased, the MSE will equal the variance.
An estimator with a smaller MSE is said to be more efficient.
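The bias/variance/MSE trade-off above can be seen in a small simulation. This is an illustrative sketch (not from the flashcards): it compares the divisor-(n-1) and divisor-n variance estimators on normal data, where the second is biased but has the lower variance and, here, the lower MSE.

```python
import random
import statistics

# Illustrative simulation: estimate the variance of N(0, 1) data
# (true variance = 1) with two competing estimators.
random.seed(42)
TRUE_VAR = 1.0
N = 10           # sample size
TRIALS = 20000   # number of simulated samples

est_unbiased = []   # sum of squares / (n - 1), unbiased
est_biased = []     # sum of squares / n, biased but lower variance
for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(N)]
    m = sum(sample) / N
    ss = sum((x - m) ** 2 for x in sample)
    est_unbiased.append(ss / (N - 1))
    est_biased.append(ss / N)

def mse_parts(estimates, truth):
    """Return (MSE, bias, variance), using MSE = variance + bias^2."""
    bias = sum(estimates) / len(estimates) - truth
    var = statistics.pvariance(estimates)
    return var + bias ** 2, bias, var

mse_u, bias_u, var_u = mse_parts(est_unbiased, TRUE_VAR)
mse_b, bias_b, var_b = mse_parts(est_biased, TRUE_VAR)
print(f"unbiased estimator: bias={bias_u:+.3f}  MSE={mse_u:.3f}")
print(f"biased estimator:   bias={bias_b:+.3f}  MSE={mse_b:.3f}")
```

Despite its bias, the divisor-n estimator ends up with the smaller MSE, which is why MSE (not bias alone) is used to compare estimators.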
How can we find a confidence interval for a parameter?
We can do this thanks to the central limit theorem (CLT), as for large samples the standardised mean approaches a standard normal random variable.
-Depending on if we are assessing the mean, variance or proportion, the calculation will be slightly different.
-There’s no guarantee the true parameter will actually fall in this interval, that depends on our confidence level. We can never be 100% confident.
What is the difference between point and interval estimates?
A point estimate is a single value, whereas an interval estimate consists of a range of values and has the advantage of providing greater confidence than a point estimate. This is also called a confidence interval.
What is significance level?
If 100(1 - alpha)% is the confidence level, then alpha is the significance level, and it lies between 0 and 1.
What do we do when we want to find the confidence interval, but the variance of the population is unknown?
Usually, we replace it with the sample variance and use t-distribution tables (if looking at the mean).
If we are looking for the interval estimate of the variance, then we will use the chi-squared table.
What assumptions do we use when finding the confidence interval?
-The population variance has to be known
-The population is normally distributed
-If the population isn’t normal, use a large sample.
For the mean, we use:
Sample mean +- Z x (s.d/sqrt(n)), this is equal to the point estimate +- the margin of error.
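As a minimal sketch of the formula above (all values made up for illustration; the z critical value comes from standard normal tables):

```python
import math

# z-interval for a mean, assuming the population s.d. is known.
sample_mean = 50.0
pop_sd = 8.0
n = 64
z = 1.96  # critical value for 95% confidence

margin = z * pop_sd / math.sqrt(n)          # margin of error
ci = (sample_mean - margin, sample_mean + margin)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")  # -> 95% CI: (48.04, 51.96)
```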
How can we reduce the margin of error?
Increase the sample size,
population standard deviation can be reduced,
Decrease the confidence level.
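All three levers above act directly on the margin formula Z x (s.d./sqrt(n)); a quick illustrative check (values are made up):

```python
import math

def margin_of_error(z, sd, n):
    return z * sd / math.sqrt(n)

sd = 10.0
base = margin_of_error(1.96, sd, 100)         # 95% confidence, n = 100
bigger_n = margin_of_error(1.96, sd, 400)     # quadrupling n halves the margin
lower_conf = margin_of_error(1.645, sd, 100)  # dropping to 90% confidence
print(base, bigger_n, lower_conf)
```

Note that because of the square root, quadrupling the sample size only halves the margin of error.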
When do we use the t-test for a confidence interval rather than the z-test?
-We don’t know the population variance so estimate it using the sample variance
-The population variance is known but the sample size is small (less than 25).
t = (xbar- mu)/(sample s.d/ sqrt(n))
Then, the confidence interval is the same as the z-test, just using t instead of z.
The degree of freedom for a t-test is n-1.
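A sketch of the t-interval with made-up data; the critical value is taken from t-tables for n - 1 = 9 degrees of freedom:

```python
import math
import statistics

# t-interval for a mean: population variance unknown, so the sample
# standard deviation stands in for it. Data is illustrative.
data = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.2, 11.9, 12.4, 11.7]
n = len(data)
xbar = statistics.mean(data)
s = statistics.stdev(data)   # sample s.d. (divisor n - 1)
t = 2.262                    # t_{0.025} with 9 d.f., from tables

margin = t * s / math.sqrt(n)
print(f"95% CI: ({xbar - margin:.3f}, {xbar + margin:.3f})")
```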
How do we find the confidence interval for population proportion?
If p is the sample proportion, standard deviation = sqrt(p(1-p)/n).
The confidence interval is then:
p +- Z x sqrt(p(1-p)/n)
NOTE: YOU ALWAYS USE A Z TEST FOR PROPORTION, NEVER A T TEST.
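A sketch of the proportion interval with illustrative counts:

```python
import math

# z-interval for a population proportion (always z, never t).
successes = 120
n = 400
p_hat = successes / n            # sample proportion = 0.3
z = 1.96                         # 95% confidence

se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - z * se, p_hat + z * se)
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```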
How do we find the confidence interval for the population variance?
First we assume the population is normally distributed, and then the confidence interval is based on the sample variance.
The chi-squared distribution has n-1 degrees of freedom.
Then use variance = (n-1) x sample variance / chi-squared value.
For the chi-squared values, the lower bound of the interval uses the larger value (upper-tail probability = significance level/2), and the upper bound uses the smaller value (upper-tail probability = 1 - significance level/2).
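A sketch with made-up data; the critical values are from standard chi-squared tables for n - 1 = 9 degrees of freedom:

```python
import math
import statistics

# Chi-squared interval for a population variance, assuming normal data.
data = [5.2, 4.8, 5.5, 5.0, 4.6, 5.3, 5.1, 4.9, 5.4, 4.7]
n = len(data)
s2 = statistics.variance(data)   # sample variance, divisor n - 1
chi2_upper = 19.023              # chi^2 with upper-tail prob 0.025, 9 d.f.
chi2_lower = 2.700               # chi^2 with upper-tail prob 0.975, 9 d.f.

# Larger chi-squared value gives the lower bound, and vice versa.
ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
print(f"95% CI for variance: ({ci[0]:.4f}, {ci[1]:.4f})")
```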
What are dependent samples?
Dependent samples are samples whose observations are linked pair by pair; we use them for confidence interval estimation of the difference between two normal population means.
These two samples could be:
-Paired/matched samples
-Repeated measures
d = x - y
How do we calculate the confidence interval difference between two means?
If you have n pairs (Xi, Yi), where the observations within each pair are related, the mean difference dbar is the sum of the Di divided by n.
The sample standard deviation is then S = sqrt(sum of squared differences from the mean / (n-1)).
To then find the confidence interval, use dbar +- t(sample s.d/sqrt(n)).
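A paired-sample sketch with made-up before/after measurements; the t critical value is from tables for n - 1 = 7 degrees of freedom:

```python
import math
import statistics

# Paired (dependent) samples: repeated measures on the same units.
before = [72, 75, 70, 68, 74, 71, 69, 73]
after  = [70, 72, 69, 65, 71, 70, 66, 70]
d = [x - y for x, y in zip(before, after)]   # per-pair differences
n = len(d)
dbar = statistics.mean(d)
sd = statistics.stdev(d)
t = 2.365                    # t_{0.025} with 7 d.f., from tables

margin = t * sd / math.sqrt(n)
print(f"95% CI for mean difference: ({dbar - margin:.2f}, {dbar + margin:.2f})")
```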
What are independent samples?
This occurs when we have non-paired data. E.g. some units are assigned only treatment A, and others only treatment B.
Or units in two different groups are compared on some survey variable.
How do we find the confidence interval for the difference between two means when the samples are independent and the variances are known?
The confidence interval for mu(x) - mu(y):
(Xbar - Ybar) +- Z x sqrt(var(popX)/n(x) + var(popY)/n(y)).
The part that comes after the Z value is the standard deviation of (Xbar - Ybar).
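A sketch with illustrative numbers, assuming both population variances are known:

```python
import math

# Known-variance z-interval for the difference of two independent means.
xbar, var_x, nx = 85.0, 16.0, 40   # group X: mean, population variance, size
ybar, var_y, ny = 80.0, 25.0, 50   # group Y
z = 1.96                           # 95% confidence

se = math.sqrt(var_x / nx + var_y / ny)   # s.d. of (Xbar - Ybar)
ci = ((xbar - ybar) - z * se, (xbar - ybar) + z * se)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```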
How do we find the confidence interval when the two population variances are unknown and equal?
1)If they are equal, we can calculate the ‘pooled variance’:
= [(n(x) - 1)sampleVAR(x) + (n(y) - 1)sampleVAR(y)]/(n(x) + n(y) - 2)
2) Use the t-test with n(x) + n(y) - 2 degrees of freedom.
3) Compute the confidence interval: (Xbar - Ybar) +- t x sqrt(pooledVAR/n(x) + pooledVAR/n(y)).
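The three steps above can be sketched with made-up data; the t critical value is from tables for n(x) + n(y) - 2 = 10 degrees of freedom:

```python
import math
import statistics

# Pooled-variance t-interval, assuming equal but unknown variances.
x = [23.1, 24.5, 22.8, 23.9, 24.2, 23.5]
y = [21.9, 22.7, 21.4, 22.3, 22.1, 21.8]
nx, ny = len(x), len(y)
xbar, ybar = statistics.mean(x), statistics.mean(y)
sx2, sy2 = statistics.variance(x), statistics.variance(y)

# Step 1: pooled variance
sp2 = ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)
# Step 2: t with nx + ny - 2 = 10 degrees of freedom
t = 2.228   # t_{0.025, 10} from tables
# Step 3: the interval
margin = t * math.sqrt(sp2 / nx + sp2 / ny)
print(f"95% CI: ({(xbar - ybar) - margin:.2f}, {(xbar - ybar) + margin:.2f})")
```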
How do we calculate the confidence interval estimation of the difference between two population proportions?
If the samples are randomly and independently drawn, the populations are large and hence can use central limit theorem (and normal distribution), and the population variances are unknown and assumed unequal:
(p(x) - p(y)) +- Z x sqrt(p(x)(1-p(x))/n(x) + p(y)(1-p(y))/n(y)), which contains the true difference with probability 1 - alpha.
NOTE: You will get two possible correct answers depending on which way round you set x and y. Both can be correct unless the question specifies a specific order to use.
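A sketch of the two-proportion interval with illustrative counts:

```python
import math

# z-interval for the difference of two independent proportions.
x_succ, nx = 90, 300    # group X successes and size
y_succ, ny = 60, 250    # group Y successes and size
px, py = x_succ / nx, y_succ / ny
z = 1.96                # 95% confidence

se = math.sqrt(px * (1 - px) / nx + py * (1 - py) / ny)
ci = ((px - py) - z * se, (px - py) + z * se)
print(f"95% CI for p(x) - p(y): ({ci[0]:.3f}, {ci[1]:.3f})")
```

Swapping the roles of x and y simply negates both endpoints, which is why either ordering can be a correct answer.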
When should you assume a sample is small?
Some say a sample is small when its small-sample statistics differ from the large-sample statistics, and others say to consider a sample small if it has 60 or fewer observations.
A good rule of thumb is to play it safe and assume a sample is small if unclear.