Data Analytics Theory Flashcards
(333 cards)
Which of the mean, mode and median are resistant to outliers?
The mean is very sensitive to the presence of outliers. The median and mode are very resistant to outliers.
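A small sketch of this behaviour, using made-up sample values (the numbers and variable names are illustrative assumptions, not part of the deck): adding one extreme value shifts the mean a lot, while the median and mode barely move.

```python
import statistics

# Hypothetical sample values, chosen only for illustration.
sample = [2, 3, 3, 4, 5]
with_outlier = sample + [100]  # the same sample plus one extreme value

mean_before = statistics.mean(sample)            # 3.4
mean_after = statistics.mean(with_outlier)       # 19.5

median_before = statistics.median(sample)        # 3
median_after = statistics.median(with_outlier)   # 3.5

mode_before = statistics.mode(sample)            # 3
mode_after = statistics.mode(with_outlier)       # 3

print(mean_before, mean_after)
print(median_before, median_after)
print(mode_before, mode_after)
```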
True or false: the median is calculated differently depending on whether the sample contains an even or odd number of observations.
True
What are the steps for determining the mean?
Sum all the sample values (Xi), then divide by the number of observations in the sample.
What are the steps for determining the median?
Sort the sample values in ascending order. For odd n, the median is the value at position (n+1)/2. For even n, the median is the average of the values at positions n/2 and (n+2)/2.
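The steps above can be sketched as follows. The function name median_by_position is a hypothetical helper, and positions are counted from 1 as on the card, so each is shifted by one for Python's 0-based indexing.

```python
def median_by_position(values):
    """Median found by the position rule; positions are 1-based as on the card."""
    ordered = sorted(values)  # step 1: ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    if n % 2 == 1:
        # Odd n: the value at position (n + 1) / 2.
        return ordered[(n + 1) // 2 - 1]
    # Even n: the average of the values at positions n / 2 and (n + 2) / 2.
    lower = ordered[n // 2 - 1]
    upper = ordered[(n + 2) // 2 - 1]
    return (lower + upper) / 2


print(median_by_position([7, 1, 5]))      # odd n
print(median_by_position([7, 1, 5, 3]))   # even n
```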
What are the steps for determining the mode?
Create a frequency table, find the highest frequency, then identify the value that occurs with that frequency.
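One possible sketch of the frequency-table approach, using collections.Counter from the Python standard library. The function name modes is hypothetical, and returning every tied value is an assumption about how ties are handled, which the card does not specify.

```python
from collections import Counter


def modes(values):
    """Return every value that shares the highest frequency (ties are kept)."""
    if not values:
        raise ValueError("the mode of an empty sample is undefined")
    frequency_table = Counter(values)        # value -> number of occurrences
    highest = max(frequency_table.values())  # the highest frequency
    return sorted(v for v, count in frequency_table.items() if count == highest)


print(modes([1, 2, 2, 3, 3, 3]))
print(modes([1, 1, 2, 2]))
```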
What is the mode?
The observation that occurs most frequently in the dataset.
Why should we not look at measures of centrality in isolation?
Comparing the measures of centrality between datasets may indicate that they are similar when in reality they have different amounts of dispersion.
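As an illustration, the two made-up datasets below (hypothetical values) share the same mean and median, yet one is tightly clustered and the other is widely spread.

```python
import statistics

# Hypothetical datasets with identical centre but different dispersion.
tight = [9, 10, 10, 10, 11]
spread = [0, 5, 10, 15, 20]

for name, data in (("tight", tight), ("spread", spread)):
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "standard deviation:", round(statistics.pstdev(data), 2),
    )
```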
What are the measures of centrality?
Mean, mode, median
What do measures of variability describe?
Measures of variability describe how dispersed the observations in a univariate dataset are, that is, whether they are tightly clustered or spread out.
What are the measures of variability?
Variance, standard deviation and range (though the range is very sensitive to outliers). The five-number summary also provides basic information about variability.
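A rough sketch of a five-number summary (minimum, lower quartile, median, upper quartile, maximum). The helper name five_number_summary is hypothetical, and the quartile rule used here (the "inclusive" method of statistics.quantiles) is only one of several conventions, so other tools may report slightly different quartiles.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a numerical sample."""
    ordered = sorted(values)
    # Assumption: quartiles follow the "inclusive" interpolation method.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])


summary = five_number_summary([9, 1, 8, 2, 7, 3, 6, 4, 5])
print(summary)
```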
What are synonyms of the mean?
Arithmetic mean or average
What is the formula for calculating the mean?
mean = (X1 + X2 + ... + Xn) / n, where X1, ..., Xn are the sample values and n is the number of observations.
What is the mean?
The mean is considered to be the central (typical) measurement of a collection of observations.
What is the formula for calculating the standard deviation?
[See flashcard]
What is the formula for calculating the variance?
[See flashcard]
What is the variance?
The average squared distance of each observation from the mean. Measured in units squared.
What units is the variance measured in?
Units squared
What is the standard deviation?
The square root of the variance. It is useful for judging how close the observations are to the mean. Measured in the same units and on the same scale as the observations in the numerical variable.
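The two definitions can be sketched as below, using made-up observations. Dividing by n (the population form) is an assumption here, since the deck's formula cards are not reproduced; the sample form, which divides by n - 1, is shown for comparison.

```python
import math
import statistics

# Hypothetical observations, chosen so the arithmetic is easy to follow.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(observations) / len(observations)                 # 5.0
squared_distances = [(x - mean) ** 2 for x in observations]  # squared distance from the mean

# Assumption: divide by n (population form), so the result is in units squared.
population_variance = sum(squared_distances) / len(observations)  # 4.0
# The square root brings the result back to the original units.
population_sd = math.sqrt(population_variance)                    # 2.0

# The sample form divides by n - 1 instead and is slightly larger.
sample_variance = statistics.variance(observations)

print(population_variance, population_sd, sample_variance)
```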
What units is the standard deviation measured in?
Same units/scale as the observations in the numerical variable.
How much of the data is usually within one standard deviation from the mean?
About 68%, for data that are approximately normally distributed.
How much of the data is usually within two standard deviations from the mean?
About 95%, for data that are approximately normally distributed.
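These percentages can be checked against a normal distribution, which is the assumption under which this rule of thumb holds. A minimal sketch using statistics.NormalDist from the Python standard library (the helper share_within is a hypothetical name):

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_one = share_within(1)  # roughly 0.68
within_two = share_within(2)  # roughly 0.95
print(round(within_one, 3), round(within_two, 3))
```

Data that are strongly skewed or heavy-tailed can depart noticeably from these shares.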
What are order statistics?
Statistics based on sorted (ranked) data
Define a quantile.
A value computed from a collection of numerical measurements sorted in ascending order, below which a given fraction of the measurements lie. It indicates an observation's rank relative to all other observations present. The fraction can take a value between 0 and 1.
What does the 0.5 quantile mean?
This is the median value, below which half (50%) of the measurements lie.
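A brief sketch with made-up measurements (hypothetical values): the single cut point returned by statistics.quantiles with n=2 is the 0.5 quantile, and it matches the median. The interpolation rule is the library's default and is an assumption here, as the cards do not name one.

```python
import statistics

# Hypothetical measurements, used only for illustration.
measurements = [12, 7, 3, 9, 15, 4]
ordered = sorted(measurements)  # quantiles are order statistics, so sort first

# n=2 splits the data into two equal halves; the one cut point is the 0.5 quantile.
(half_quantile,) = statistics.quantiles(ordered, n=2)

# Share of measurements that lie below the 0.5 quantile.
share_below = sum(x < half_quantile for x in ordered) / len(ordered)

print(half_quantile, statistics.median(measurements), share_below)
```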