Descriptive Statistics Flashcards
What is the mean?
The sum of a set of numbers divided by how many numbers there are (the arithmetic average).
How do you calculate the median?
Sort the data from smallest to largest. If the number of observations is odd, the median is the middle value; if it is even, the median is the average of the two middle values.
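The odd/even rule for the median can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function name `median` and the sample lists are assumptions made for the example.

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)          # the rule only works on ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: take the middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = [7, 1, 5]        # sorted: [1, 5, 7]
even = [7, 1, 5, 3]    # sorted: [1, 3, 5, 7]
print(median(odd))     # 5
print(median(even))    # 4.0
```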
What does the mode represent in a dataset?
The most frequently occurring value in a dataset.
Define the range in statistics.
The difference between the highest and lowest values.
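As a quick check on the mean, mode, and range cards, here is a small Python sketch on made-up data. The values in `data` are illustrative; `multimode` comes from Python's standard `statistics` module and returns a list because a dataset can have more than one mode.

```python
from statistics import multimode

data = [2, 4, 4, 5, 10]  # illustrative values

mean = sum(data) / len(data)          # 25 / 5 = 5.0
modes = multimode(data)               # [4]; a list, since ties are possible
value_range = max(data) - min(data)   # 10 - 2 = 8

print(mean, modes, value_range)
```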
How is variance calculated in a dataset?
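The average of the squared differences from the mean. For a sample (rather than a whole population), the sum of squared differences is usually divided by n - 1 instead of n.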
Explain how standard deviation is used in data analysis.
It measures how much the values vary around the mean, expressed in the same units as the data; it is the square root of the variance.
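The variance and standard deviation cards can be worked through step by step. The sketch below uses made-up data and shows both the population form (divide by n) and the common sample form (divide by n - 1); the variable names are illustrative.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean is 5

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]   # these sum to 32

population_variance = sum(squared_diffs) / len(data)        # divide by n
sample_variance = sum(squared_diffs) / (len(data) - 1)      # divide by n - 1
population_sd = population_variance ** 0.5                  # square root of variance

print(population_variance)  # 4.0
print(population_sd)        # 2.0

# The standard library offers the same calculations:
print(statistics.pvariance(data), statistics.variance(data))
```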
What does a high variance indicate about a dataset?
It indicates that the data points are spread widely around the mean.
How do you find the interquartile range?
Subtract the 25th percentile (Q1) from the 75th percentile (Q3): IQR = Q3 - Q1.
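A minimal Python sketch of the interquartile range, using `statistics.quantiles` from the standard library. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice here is an assumption and other methods can give slightly different quartiles for the same data.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # illustrative values

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q3, iqr)  # 3.0 7.0 4.0
```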
What is a boxplot and what does it show?
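A graph of the five-number summary (minimum, first quartile, median, third quartile, maximum). It shows the center, spread, and skew of the data, and often marks outliers as separate points.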
Why is it important to know the shape of the distribution?
It provides insights into the symmetry and spread of data.
What is skewness in statistical terms?
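A measure of the asymmetry of a distribution: positive when the right tail is longer, negative when the left tail is longer, and zero when the data are symmetrical.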
Explain kurtosis in a dataset.
A measure of the “tailedness” of the probability distribution.
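Skewness and kurtosis can both be computed from central moments. The sketch below uses the simple moment-based (population) definitions, which is an assumption: statistical packages often apply sample corrections or report excess kurtosis (kurtosis minus 3). The function names are illustrative.

```python
def central_moment(values, k):
    """Return the k-th central moment: the average of (x - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** k for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (zero for symmetric data)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis: m4 / m2 ** 2 (about 3 for normal data)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 10]   # one large value stretches the right tail

print(skewness(symmetric))        # 0.0
print(skewness(right_skewed))     # positive
print(kurtosis(symmetric))
```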
How does one identify outliers in data?
By flagging data points that differ markedly from the rest, for example values more than 1.5 × IQR below the first quartile or above the third quartile.
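One common convention, assumed here rather than being the only option, flags values that lie more than 1.5 × IQR beyond the quartiles. The function name `iqr_outliers`, the multiplier `k`, and the quartile method are illustrative choices.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper]

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]   # 40 sits far from the rest
print(iqr_outliers(data))                # [40]
```

Other rules, such as flagging values several standard deviations from the mean, can give different answers on the same data.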
What is a frequency distribution?
The organization of data by the frequency of their values.
How can a histogram help in understanding data?
It shows the shape of the distribution by drawing bars whose heights are the counts of values falling in each interval (bin).
What is a scatter plot used for?
To show the relationship between two numerical variables by plotting each observation as a point.
How do quartiles divide a dataset?
They divide the ordered dataset into four parts, each containing about 25% of the observations.
What is the difference between absolute deviation and mean deviation?
An absolute deviation is the absolute difference between a single value and the mean; the mean deviation is the average of these absolute differences across all values.
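The distinction is easier to see with numbers. A short Python sketch on made-up data, measuring deviations from the mean (variable names are illustrative):

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values, mean is 5

mean = sum(data) / len(data)
absolute_deviations = [abs(x - mean) for x in data]     # one per value
mean_deviation = sum(absolute_deviations) / len(data)   # their average

print(absolute_deviations)  # [3.0, 1.0, 1.0, 1.0, 0.0, 0.0, 2.0, 4.0]
print(mean_deviation)       # 1.5
```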
How do you calculate a percentile rank?
Count the values in the dataset that fall below the given value, divide by the total number of data points, and multiply by 100. Some definitions also count values equal to it.
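A minimal Python sketch of a percentile rank, assuming the "strictly below" convention. Other conventions count ties fully or by half, so results can differ slightly; the function name and scores are illustrative.

```python
def percentile_rank(values, score):
    """Return the percentage of values strictly below `score`."""
    below = sum(1 for v in values if v < score)
    return 100 * below / len(values)

scores = [50, 60, 70, 80, 90]
print(percentile_rank(scores, 80))  # 60.0 (three of five scores are below 80)
```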
What is a cumulative frequency distribution?
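A running total of the frequencies up to and including each value or class. When relative frequencies are summed instead, the result is a cumulative relative frequency distribution.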
Explain the concept of a relative frequency distribution.
It shows the proportion of each class relative to the total number of cases.
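Frequency, relative frequency, and cumulative frequency can be tabulated together. The sketch below uses `Counter` and `accumulate` from Python's standard library on made-up data; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

data = [1, 2, 2, 3, 3, 3, 4, 4]  # illustrative values

counts = Counter(data)
values = sorted(counts)                            # [1, 2, 3, 4]
frequencies = [counts[v] for v in values]          # [1, 2, 3, 2]
relative = [f / len(data) for f in frequencies]    # [0.125, 0.25, 0.375, 0.25]
cumulative = list(accumulate(frequencies))         # [1, 3, 6, 8]

# One row per distinct value: value, frequency, relative, cumulative
for v, f, r, c in zip(values, frequencies, relative, cumulative):
    print(v, f, r, c)
```

The relative frequencies sum to 1, and the last cumulative frequency equals the number of data points.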
What role does the mean play in symmetrical distributions?
It is the balance point of the distribution and coincides with the median.
What is the best measure of central tendency for skewed data?
Median.
Why might one use the median instead of the mean?
It is less affected by outliers and skewed data.
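The effect of a single outlier on the mean and the median can be seen in a short Python sketch. The income figures are made up for illustration.

```python
import statistics

incomes = [30, 32, 35, 38, 40]     # illustrative values
with_outlier = incomes + [1000]    # add one extreme value

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)  # both 35
print(mean_after, median_after)    # mean jumps to about 195.8, median is 36.5
```

The median moves by 1.5 while the mean moves by more than 160, which is why the median is usually preferred for skewed data.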