data summary Flashcards
what is quantitative data
Quantitative data measure some quantity resulting in a numerical value, e.g. weight, salary.
what is qualitative data
Qualitative data measure the quality of something resulting in a value that does not have a numerical meaning, e.g. colour, religion, season.
what is discrete quantitative data
Discrete: data that can take only distinct, separate values, typically counts (e.g. number of traffic accidents, number of children born to a woman).
what is continuous quantitative data
Continuous: data that can take any value within a range and can, in principle, be measured ever more precisely (e.g. height, speed).
what is ordinal qualitative data
Ordinal: non-numeric value but the values have some natural ordering; e.g. poor, fair, good, excellent.
what is nominal qualitative data
Nominal: unordered, distinct by name only; e.g. retail, construction, manufacturing.
what is a frequency distribution
A frequency distribution summarizes discrete variables or qualitative data by counting how often each value occurs.
what is the mode
The mode is the most frequently occurring value in a dataset.
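A minimal Python sketch of a frequency distribution and its mode. The ratings list is invented purely for illustration.

```python
from collections import Counter

# Hypothetical ordinal ratings, made up for this example
ratings = ["good", "fair", "good", "poor", "excellent", "good", "fair"]

# Frequency distribution: count how often each value occurs
freq = Counter(ratings)

# Mode: the value with the highest count
mode_value, mode_count = freq.most_common(1)[0]

print(dict(freq))
print(mode_value, mode_count)
```

If two values tie for the highest count, most_common(1) reports only one of them, so it is worth inspecting the full counts as well.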
What is a bimodal distribution?
A bimodal distribution has two distinct peaks in the frequency of values.
What are the 3 measures of centre in statistics?
mode
mean
median
4 measures of spread
range
interquartile range (IQR)
sample variance
standard deviation.
Why is it important to know both the centre and spread of a dataset?
Knowing both provides a better understanding of the data’s behaviour. The centre gives us a “typical” value, while the spread tells us how much variability or dispersion exists in the data.
what is the population mean and sample mean
The population mean (𝜇) is a parameter that is typically unknown.
The sample mean (𝜇̂) is an estimate of 𝜇 calculated from a sample.
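how to find the position of the sample median for even and odd sample sizes
even 𝑛: the median is the average of the values at positions 𝑛/2 and (𝑛 + 2)/2 in the ordered data
odd 𝑛: the median is the value at position (𝑛 + 1)/2 in the ordered data
𝑛: sample size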
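The position rule can be sketched in Python as follows. The function name sample_median is a hypothetical helper, and the sketch assumes the usual convention that, for even 𝑛, the median is the average of the two middle values at positions 𝑛/2 and (𝑛 + 2)/2.

```python
def sample_median(values):
    """Median found via 1-based positions in the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: single middle value at position (n + 1) / 2
        position = (n + 1) // 2
        return ordered[position - 1]  # convert to 0-based index
    # Even n: average the values at positions n / 2 and (n + 2) / 2
    lower = n // 2
    upper = (n + 2) // 2
    return (ordered[lower - 1] + ordered[upper - 1]) / 2


print(sample_median([7, 1, 5]))
print(sample_median([7, 1, 5, 3]))
```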
what is the range
The range is the difference between the maximum and minimum value.
one disadvantage of range
It can be misleading when the data contain an outlier, because a single extreme value changes the range substantially.
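A small Python illustration of how one extreme value inflates the range. The numbers and the helper name value_range are invented for the example.

```python
def value_range(data):
    """Range: maximum value minus minimum value."""
    return max(data) - min(data)


# Hypothetical measurements, made up for this example
values = [12, 14, 15, 13, 16]
with_outlier = values + [95]  # one extreme value added

range_without = value_range(values)
range_with = value_range(with_outlier)

print(range_without)
print(range_with)
```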
what is an outlier
An outlier is a value that is very different to the other values recorded.
What are percentiles and how are they used?
Percentiles: Values that divide the dataset into 100 equal parts.
25th percentile (lower quartile or 1st quartile): 25% of data lies below it.
75th percentile (upper quartile or 3rd quartile): 75% of data lies below it.
what is the interquartile range
The difference between the 75th percentile and 25th percentile, representing the spread of the middle 50% of data.
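A minimal Python sketch of the quartiles and the IQR using the standard library. The data are invented, and statistics.quantiles is used with its default "exclusive" method; other software may use a different quartile convention and give slightly different values.

```python
import statistics

# Hypothetical data, made up for this example
data = [1, 2, 3, 4, 5, 6, 7]

# n=4 splits the data into quarters and returns the three cut points
q1, q2, q3 = statistics.quantiles(data, n=4)

# IQR: spread of the middle 50% of the data
iqr = q3 - q1

print(q1, q2, q3)
print(iqr)
```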
population variance formula
𝜎² = ∑(𝑦𝑖 - 𝜇)² / 𝑁
𝑁: population size
𝑦𝑖: each value.
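The formula can be checked with a short Python sketch. It assumes the list below is the entire population; the values are invented for illustration.

```python
import statistics

# Hypothetical values, treated here as the whole population
population = [2, 4, 4, 4, 5, 5, 7, 9]

N = len(population)           # population size
mu = sum(population) / N      # population mean

# Sum of squared deviations from the mean, divided by N
sigma_sq = sum((y - mu) ** 2 for y in population) / N

print(mu)
print(sigma_sq)
print(statistics.pvariance(population))  # standard-library cross-check
```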
what does variance measure
Measures the spread of data from the population mean (𝜇).
What is sample variance and how is it different from population variance?
Sample variance measures the spread of data around the sample mean (𝜇̂).
It divides by (𝑛 - 1) instead of 𝑁 to correct for bias in estimating the population variance.
sample variance formula
𝑠² = ∑(𝑦𝑖 - 𝜇̂)² / (𝑛 - 1)
where 𝑛 - 1 is the degrees of freedom.
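A minimal Python sketch of the sample variance formula. It assumes the invented values below are a sample rather than a full population, and also computes the divide-by-𝑛 version for comparison.

```python
import statistics

# Hypothetical values, treated here as a sample
sample = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(sample)               # sample size
mu_hat = sum(sample) / n      # sample mean

# Sum of squared deviations from the sample mean
sum_sq = sum((y - mu_hat) ** 2 for y in sample)

s_sq = sum_sq / (n - 1)       # sample variance: divide by n - 1
divide_by_n = sum_sq / n      # dividing by n gives a smaller value

print(s_sq)
print(divide_by_n)
print(statistics.variance(sample))  # standard-library cross-check
```

Dividing by 𝑛 - 1 always gives a slightly larger value than dividing by 𝑛, and the gap shrinks as the sample size grows.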
why do we use standard deviation
Variance is expressed in squared units, so we take its square root to get the standard deviation, which is in the same units as the data.
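A short Python sketch of the units argument. The heights are invented and assumed to be a sample measured in centimetres.

```python
import math
import statistics

# Hypothetical sample of heights in cm, made up for this example
heights_cm = [150, 160, 170, 180, 190]

n = len(heights_cm)
mean_cm = sum(heights_cm) / n

# Sample variance is in squared units (cm^2)
variance_cm2 = sum((y - mean_cm) ** 2 for y in heights_cm) / (n - 1)

# Taking the square root returns to the original units (cm)
sd_cm = math.sqrt(variance_cm2)

print(variance_cm2)
print(sd_cm)
print(statistics.stdev(heights_cm))  # standard-library cross-check
```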