QM Flashcards
(15 cards)
Sampling distribution
A sampling distribution is the probability distribution of a given statistic (such as the
sample mean or sample proportion) based on random sampling from a population.
Central Limit Theorem
The Central Limit Theorem (CLT) states that, for a large enough sample size, the
distribution of the sample mean will be approximately normal, regardless of the shape
of the population distribution (provided the population has a finite variance).
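A small simulation can make the two cards above concrete. The sketch below assumes a skewed Exponential(1) population (an illustrative choice, not from the cards), builds the sampling distribution of the sample mean for two sample sizes, and compares how skewed each one is. The helper names sample_means and skewness are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

def sample_means(sample_size, n_samples=2000):
    """Draw n_samples samples from an Exponential(1) population (assumed
    for illustration) and return the mean of each sample."""
    means = []
    for _ in range(n_samples):
        sample = [random.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means

def skewness(values):
    """Moment-based skewness: about 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

means_small = sample_means(sample_size=2)    # sampling distribution, n = 2
means_large = sample_means(sample_size=100)  # sampling distribution, n = 100

print("skewness, n = 2:  ", skewness(means_small))  # still clearly skewed
print("skewness, n = 100:", skewness(means_large))  # much closer to 0
```

The population is strongly right-skewed, yet the distribution of sample means becomes more symmetric (closer to normal) as the sample size grows.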
Power of a Test
The power of a hypothesis test is the probability of correctly rejecting a false null hypothesis.
In other words, it is the probability of avoiding a Type II error (power = 1 - β, where β is the probability of a Type II error).
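As a rough illustration, power can be computed directly for a simple case. The sketch below assumes an upper-tailed z-test with known population standard deviation; the function name power_one_sided_z and the numbers passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test of H0: mu = mu0
    when the true mean is mu1 (sigma assumed known)."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection threshold on the z scale
    shift = (mu1 - mu0) / (sigma / sqrt(n))    # true effect in standard-error units
    beta = NormalDist().cdf(z_crit - shift)    # P(Type II error)
    return 1 - beta

# Hypothetical numbers: same effect size, two sample sizes.
power_n25 = power_one_sided_z(mu0=100, mu1=105, sigma=15, n=25)
power_n100 = power_one_sided_z(mu0=100, mu1=105, sigma=15, n=100)
print(power_n25, power_n100)  # the larger sample gives the higher power
```

When the true mean equals mu0 there is no false null to reject, and the function returns the significance level alpha.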
If the correlation coefficient is close to -1
strong negative linear relationship (the random variables tend to move in opposite directions relative to their means).
If the correlation coefficient is close to +1
strong positive linear relationship (the random variables tend to move in the same direction relative to their means).
If the correlation coefficient is close to 0
no linear relationship (a non-linear relationship may still exist).
If random variables X and Y are independent, their covariance and correlation…
are 0!
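Independence implies zero covariance, but the reverse need not hold. The sketch below uses a made-up example in which Y is completely determined by X (Y = X^2), yet the covariance is still 0 because the relationship is not linear. The helper covariance is a hypothetical name.

```python
def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n

xs = [-2, -1, 0, 1, 2]       # symmetric around 0
ys = [x ** 2 for x in xs]    # Y depends on X exactly, but not linearly

print(covariance(xs, ys))    # 0.0, even though X and Y are dependent
```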
Why the correlation coefficient might be preferred over covariance
Covariance is dependent on the units of the variables.
The correlation coefficient is standardised and ranges between -1 and 1.
It provides a unit-free measure.
Correlation allows for a more meaningful interpretation and comparison of the strength and direction of the relationship.
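The unit dependence is easy to see numerically. The sketch below uses made-up height and weight data, measures height in metres and then in centimetres, and compares the covariance and the correlation under each unit. The helper names covariance and correlation are hypothetical.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Made-up illustrative data.
heights_m = [1.60, 1.70, 1.75, 1.80, 1.90]
weights_kg = [55, 65, 70, 72, 85]
heights_cm = [h * 100 for h in heights_m]   # same heights, different unit

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
corr_m = correlation(heights_m, weights_kg)
corr_cm = correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)     # covariance is 100 times larger in centimetres
print(corr_m, corr_cm)   # correlation is the same in both units
```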
Advantages and disadvantages of using the mean as a measure of central location
The mean is a useful measure of central tendency because it takes into account all observations in a dataset.
However, the mean is sensitive to extreme values (outliers): a few very low or very high values, such as prices, can shift it significantly.
Advantages and disadvantages of using the median as a measure of central location
The median is the middle value in an ordered dataset.
The median has the advantage of being less affected by outliers and skewed data.
However, the median depends only on the middle value (or the two middle values when the number of observations is even), so it ignores the magnitudes of all other data points.
It therefore says little about how the rest of the data are distributed.
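The contrast between the mean and the median cards can be checked with a tiny example. The sketch below uses made-up prices and replaces one value with an extreme outlier.

```python
import statistics

# Made-up prices; the second list swaps the largest value for an outlier.
prices = [10, 11, 12, 13, 14]
prices_with_outlier = [10, 11, 12, 13, 500]

mean_before = statistics.mean(prices)
mean_after = statistics.mean(prices_with_outlier)
median_before = statistics.median(prices)
median_after = statistics.median(prices_with_outlier)

print(mean_before, mean_after)      # the mean jumps from 12 to 109.2
print(median_before, median_after)  # the median stays at 12
```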
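IQR
The interquartile range (IQR) measures the spread of the “middle 50%” of the data. It is the difference between the third quartile and the first quartile. Because it ignores the lowest and highest quarters of the data, it captures dispersion while limiting the impact of outliers, which makes it a more robust measure of variability than the range or the variance.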
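A minimal sketch of the IQR, using statistics.quantiles from the Python standard library with its default method. Quartile conventions differ between textbooks and software, so the exact quartile values may not match a hand calculation; the data and the helper name iqr are made up for illustration.

```python
import statistics

def iqr(values):
    """Third quartile minus first quartile (default 'exclusive' method)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

data = [10, 11, 12, 13, 14, 15, 16]
data_with_outlier = [10, 11, 12, 13, 14, 15, 500]

# The range reacts strongly to the outlier; the IQR does not change here.
print(max(data) - min(data), max(data_with_outlier) - min(data_with_outlier))
print(iqr(data), iqr(data_with_outlier))
```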
Variance
Measures deviation of data from the mean value.
Variance measures the average of the squared differences between each observation and the mean. Variance quantifies the overall spread of the data.
Because it uses squared differences, it is in squared units of the original data, making it less interpretable in terms of the original units.
Coefficient of Variation
The ratio of the standard deviation to the mean.
It standardises the measure of dispersion relative to the mean (CV = standard deviation / mean), allowing comparisons across variables measured in different units. Like the mean and standard deviation it is built from, the CV is sensitive to outliers.
Standard deviation
Square root of the variance.
Indicates how far data typically deviate from the mean. It provides a measure of variability in the same units as the original data. The lower the standard deviation, the lower the spread of the data.
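The variance, coefficient of variation and standard deviation cards above can be compared on one dataset. The sketch below uses made-up prices in pounds and the same prices in pence. It uses the population formulas (dividing by n), matching the "average of squared differences" wording; the sample versions statistics.variance and statistics.stdev divide by n - 1 instead.

```python
import statistics

# Made-up prices, in pounds and then in pence.
prices_gbp = [20.0, 22.0, 19.0, 21.0, 23.0]
prices_pence = [p * 100 for p in prices_gbp]

def summarise(values):
    """Return (variance, standard deviation, coefficient of variation)."""
    var = statistics.pvariance(values)   # average squared deviation from the mean
    sd = statistics.pstdev(values)       # square root of the variance
    cv = sd / statistics.fmean(values)   # unit-free ratio
    return var, sd, cv

var_gbp, sd_gbp, cv_gbp = summarise(prices_gbp)
var_pence, sd_pence, cv_pence = summarise(prices_pence)

print(var_gbp, var_pence)  # squared units: scales by 100 squared
print(sd_gbp, sd_pence)    # original units: scales by 100
print(cv_gbp, cv_pence)    # unit-free: unchanged
```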
Coefficient of determination
Measures the goodness of fit of a regression (denoted R^2).
R^2 is a proportion, i.e. 0 <= R^2 <= 1
R^2 = 0: the regression explains none of the variation in Yi
R^2 = 1: the regression explains all of the variation in Yi
A higher R^2 therefore indicates a better fit. When comparing regressions of price on different factors, the factor with the higher R^2 explains more of the variation in prices.
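A minimal sketch of how R^2 can be computed, assuming a simple linear regression with one explanatory variable fitted by least squares. The function name r_squared and the data are hypothetical.

```python
def r_squared(xs, ys):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Unexplained variation (residuals) and total variation in y.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data: price against a hypothetical factor (e.g. size).
sizes = [50, 60, 70, 80, 90]
prices = [150, 185, 205, 245, 265]
r2_prices = r_squared(sizes, prices)

r2_perfect = r_squared([1, 2, 3, 4], [3, 5, 7, 9])    # exact line: R^2 = 1
r2_none = r_squared([1, 2, 3, 4], [1, -1, -1, 1])     # no linear trend: R^2 = 0

print(r2_prices, r2_perfect, r2_none)
```

The last example has a clear pattern in y but no linear trend, so the fitted line is flat and explains none of the variation.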