Stats Concepts Flashcards
(40 cards)
What is a variable?
A feature of individual units within a study (e.g. people); something that we can observe or measure.
What are the four things an outcome could be?
An observation (at one moment in time, e.g. attained weight of a baby at 6 months, or mortality status: dead/alive); a time to an event that may or may not happen (e.g. time to death); a count (independent of time, e.g. number of measles cases); or a rate (dependent on time, e.g. number of deaths per 1000 person-years)
What is binary data?
Categorical (not numerical) data which only has 2 alternatives/options - an example would be dead/alive
What is nominal data?
Categorical (not numerical) data which has more than 2 alternatives/options but which has no natural order - classic examples are ethnic groups or blood type
What is ordinal data?
Categorical (not numerical) data which has more than 2 alternatives/options and a natural order to it; for example - hypertensive/borderline/normal/hypotensive
What is discrete data?
Numerical data (quantitative) that is a count - for example, number of measles cases
What is continuous data?
Numerical data (quantitative) that can take any value within a range, so there is an infinite number of possible values - for example, blood pressure, weight, age
What is positively skewed data?
Data whose frequency distribution has a long tail to the right on the x axis (most values are low, with a few very high values)
What is negatively skewed data?
Data whose frequency distribution has a long tail to the left on the x axis (most values are high, with a few very low values)
What are some examples of continuous probability distributions?
Normal distribution, t-distribution, F-distribution, chi-squared distribution
What are some examples of discrete probability distributions?
Binomial, Poisson, discrete uniform
What is meant by the range of data?
Simply the highest score minus the lowest score; it is the spread of scores you would see in your sample
What is VARIANCE?
The average squared deviation from the mean
What does variance tell us?
On average, how widely the scores are spread around the mean (expressed in squared units)
What is STANDARD DEVIATION?
It is the square root of the variance
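A minimal sketch of the two definitions above, using a made-up list of scores (the numbers are an assumption for illustration only). It follows the "average squared deviation" definition by dividing by n; the sample versions, which by convention divide by n - 1, are shown alongside for comparison.

```python
import math

# Hypothetical scores, invented purely for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

mean = sum(scores) / n

# Squared deviation of each score from the mean.
squared_deviations = [(x - mean) ** 2 for x in scores]

# Variance as defined on the card: the average squared deviation (divide by n).
population_variance = sum(squared_deviations) / n

# Sample variance: same numerator, but divided by n - 1.
sample_variance = sum(squared_deviations) / (n - 1)

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(mean, population_variance, population_sd)
print(sample_variance, sample_sd)
```

The standard deviation is back in the original units of the scores, which is why it is usually the one reported.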
What happens to the mean in the context of a skewed distribution?
It no longer gives a good impression of the central tendency of the observations
What is a more appropriate measure than the mean for skewed distributions?
When data are skewed, it is more appropriate to use the median and interquartile range (25th to 75th percentile) as your descriptive statistics. If data are positively skewed, a log transformation may also help
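A rough illustration of the point above, using Python's standard `statistics` module on an invented, positively skewed data set (the values are an assumption, not real data). It compares the mean with the median, computes the quartiles, and shows the effect of a log transformation.

```python
import math
import statistics

# Hypothetical positively skewed data: mostly small values, a few very large.
data = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]

mean = statistics.mean(data)
median = statistics.median(data)

# quantiles(..., n=4) returns the three cut points: Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Log transformation (natural log); only valid for strictly positive values.
logged = [math.log(x) for x in data]
log_mean = statistics.mean(logged)
log_median = statistics.median(logged)

print("mean:", mean, "median:", median)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("after log: mean", log_mean, "median", log_median)
```

In this example the few large values pull the mean well above the median, while on the log scale the two sit much closer together.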
What is the only time the mean is appropriate?
When the data are approximately normally distributed (i.e. roughly symmetric)
What are population values?
The true values of a measure in a population. They define the population.
• Mean, μ
• standard deviation, σ
What is a sample statistic?
The value of the measure in a sample of the population. It is calculated from the observations in the sample
• Sample mean, x̄
• Sample standard deviation (SD), s
What is sampling error?
Information gained from a single sample is the “best estimate” of what is true in the population
• In truth, the sample statistic may be somewhat larger or smaller than the true population value (i.e. uncertainty)
• This is due to sampling error
What is standard error?
It is a measure of the precision of the sample estimate. It indicates how far from the true (but unknown) population value the sample estimate is likely to be - basically, how large an error we are likely to be making.
The standard error of the mean is calculated as: SE = SD / √n = s / √n, where n is the sample size
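A small sketch of the formula above, assuming a made-up sample (the numbers and the helper name `standard_error` are hypothetical). It uses `statistics.stdev`, which returns the sample standard deviation s (n - 1 in the denominator).

```python
import math
import statistics

def standard_error(sample):
    """Standard error of the mean: s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation
    n = len(sample)
    return s / math.sqrt(n)

# Hypothetical sample, invented purely for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se_small = standard_error(sample)

# The same values repeated twice: similar spread, but double the sample size.
se_large = standard_error(sample * 2)

print("SE with n = 8: ", se_small)
print("SE with n = 16:", se_large)
```

With a similar spread of values, the larger sample gives a smaller standard error, so the sample mean is likely to be closer to the population mean.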
What are the two main methods of statistical inference?
Hypothesis testing (significance testing) and estimation (confidence intervals)
What does low SD indicate?
Data points are close to the mean