Midterm 1 Material Flashcards
(32 cards)
What is the mean?
- The mean is the average of the values.
- It is calculated by adding all the values together and dividing the sum by the number of values.
- ȳ denotes the mean of a sample.
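A minimal Python sketch of this calculation, for illustration only. The function name `sample_mean` and the data are made up for this example and are not part of the course material.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the mean of an empty sample is undefined")
    return sum(values) / n


sample = [2, 4, 4, 5, 10]  # made-up observations
y_bar = sample_mean(sample)
print(y_bar)  # 5.0
```

Here the sum is 25 and n is 5, so ȳ = 25 / 5 = 5.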
What does “n” signify?
n signifies sample size
What does yi represent?
yi represents one observation in the sample
What is the median?
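Median is the midpoint of the ordered sample
- Position of the median in the ordered sample = (n + 1) / 2
- If n is even, that position falls between two observations, and the median is the average of those two.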
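A short Python sketch of finding the median, reading (n + 1) / 2 as the position of the median in the ordered sample rather than its value. The function name `median` and the data are assumptions made for this example.

```python
def median(values):
    """Return the midpoint of the ordered sample."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: position (n + 1) / 2 is a single observation.
        return ordered[mid]
    # Even n: position (n + 1) / 2 falls between two observations,
    # so take the average of the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 1, 3]))     # 3   (position 2 of [1, 3, 7])
print(median([7, 1, 3, 5]))  # 4.0 (average of 3 and 5)
```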
What is mode?
Value that occurs most frequently in the distribution.
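As a quick illustration, Python's standard `statistics.multimode` returns every value tied for the highest frequency. The data below are made up.

```python
from statistics import multimode

one_mode = [2, 4, 4, 5, 10]   # 4 occurs twice, everything else once
two_modes = [1, 1, 2, 2, 3]   # 1 and 2 both occur twice

print(multimode(one_mode))    # [4]
print(multimode(two_modes))   # [1, 2]
```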
What is distribution?
Distribution is just a way of organizing and showing data. Think of it as a list or picture that shows where all the data points are and how often they appear.
- It’s like grouping similar data points together to see patterns.
- Unimodal (having one mode)
- Bimodal (having 2 modes)
- Multimodal (having 3 or more modes)
What is a deviation?
- Difference between an observation and the mean (i.e. how far an observation ‘falls’ from the mean of the population or the sample)
To calculate the deviation of an observation yi from the sample mean:
deviation of yi = yi - ȳ
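A small Python sketch of the deviation calculation on made-up data. The variable names are chosen for this example only.

```python
sample = [2, 4, 4, 5, 10]            # made-up observations
y_bar = sum(sample) / len(sample)    # sample mean, 5.0

# Deviation of each observation: yi minus the sample mean.
deviations = [y - y_bar for y in sample]

print(deviations)       # [-3.0, -1.0, -1.0, 0.0, 5.0]
print(sum(deviations))  # 0.0
```

Negative deviations belong to observations below the mean and positive ones to observations above it. The deviations from the mean always sum to zero.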
What is a standard deviation?
Typical (average) deviation from the
mean for an observation in the set
(population or sample).
- σ means population standard
deviation (‘sigma’)
To calculate population standard deviation:
σ = √[Σ(yi - μ)² / N]
- For a sample, the standard deviation s divides by n - 1 instead: s = √[Σ(yi - ȳ)² / (n - 1)]
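A hedged Python sketch of the calculation, assuming the usual convention that the population standard deviation divides the sum of squared deviations by N, while the sample standard deviation divides by n - 1. The function names and data are made up for this example.

```python
import math


def population_sd(values):
    """Population standard deviation: divide the squared deviations by N."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((y - mu) ** 2 for y in values) / n)


def sample_sd(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    return math.sqrt(sum((y - y_bar) ** 2 for y in values) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations, mean 5

print(population_sd(data))  # 2.0
print(sample_sd(data))      # slightly larger, because of the n - 1 divisor
```

The squared deviations here sum to 32, so the population version gives √(32 / 8) = 2, while the sample version gives √(32 / 7).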
What is the Empirical Rule?
If the distribution is normal (symmetrical and bell-shaped):
- About 68% of observations fall within one standard deviation of the mean.
- About 95% of observations fall within two standard deviations of the mean.
- Over 99% (about 99.7%) of observations fall within three standard deviations of the mean.
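The percentages in this rule can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

These shares apply only to a normal distribution. For skewed or multimodal data they can be far off.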
What is a quartile?
Measure of data dispersion that breaks the ordered distribution into four segments.
- With the data in ascending order, the lower quartile (Q1) is the value below which the first 25% of the data fall, and the upper quartile (Q3) is the value below which the first 75% of the data fall. Quartiles are typically ordered as follows:
- Min: 0% of the data
- Q1: First 25% of the data
- Med: First 50% of the data
- Q3: First 75% of the data
- Max: Fully 100% of the data
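A sketch of the five values listed above, using `statistics.quantiles` from the Python standard library (Python 3.8 or later). The choice of `method="inclusive"` is an assumption for this example. Textbooks and software use several slightly different quartile rules, so results for Q1 and Q3 can differ on other data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for the data."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, med, q3, max(values))


data = [9, 1, 4, 7, 2, 8, 3, 6, 5]  # made-up observations, unordered on purpose
summary = five_number_summary(data)
print(summary)  # (1, 3.0, 5.0, 7.0, 9)
```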
Data
Factual information about things that
we observe
Population
Total set of subjects of interest in a
study
Sample
Subset of the population on which the
study collects data
Parameter
Numerical summary of the population
Statistic
Numerical summary of the sample data
Descriptive statistics:
Statistics summarizing (outlining)
sample or population data
Inferential statistics:
Statistics making predictions about population parameters based on sample
data
Variable
Characteristic that can vary in value
among subjects in a sample or a population.
Discrete variable:
Variable taking the form of a set of
separate numbers, such as 0, 1, 2, 3.
Continuous variable
Variable that can take an infinite continuum of real number values
Random sampling
Drawing a sample of n subjects who each
have the same probability of being drawn
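A minimal Python sketch of drawing a simple random sample without replacement using `random.sample` from the standard library. The function name, the population of ID numbers, and the seed are all made up for illustration.

```python
import random


def draw_random_sample(population, n, seed=None):
    """Draw n distinct subjects so that each subject is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)


population = list(range(1, 101))  # hypothetical subject IDs 1..100
sample = draw_random_sample(population, n=10, seed=42)
print(sample)
```

Running it with a different seed, or with no seed, gives a different sample. That sample-to-sample variation is what sampling error refers to.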
What are the three main ways data can be biased, apart from sampling error?
- Sampling bias
- Response bias (e.g. under- or over-reporting)
- Non-response bias
Sampling bias
occurs when nonprobability samples are used, such as the selection
bias inherent in volunteer samples.
Response bias
occurs when the subject gives an incorrect response (perhaps
lying), or the question wording or the way the interviewer asks the questions is
confusing or misleading