Statistical Learning Methods Flashcards
mean
arithmetic mean: the sum of the values divided by their count
-> denoted by the Greek letter μ (mu)
median
The median of a data set is the middle value when the data are arranged in order from least to greatest. If there is an even number of items in the set, the median is the mean (average) of the two middle values.
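A quick check of both rules above (odd and even counts), using Python's built-in statistics module:

```python
# Minimal median check with Python's statistics module.
from statistics import median

odd = [7, 1, 3, 9, 5]   # sorted: [1, 3, 5, 7, 9] -> middle value
print(median(odd))      # 5

even = [1, 3, 5, 7]     # mean of the two middle values: (3 + 5) / 2
print(median(even))     # 4.0
```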
standard deviation
Deviation just means how far from the normal
- is just the square root of Variance
- while Variance gives you a rough idea of spread, the standard deviation is more concrete, giving you exact distances from the mean
- The standard deviation is an especially useful measure of variability when the distribution is normal or approximately normal, because the proportion of the distribution within a given number of standard deviations from the mean can be calculated. For example, 68% of the distribution is within one standard deviation of the mean and approximately 95% of the distribution is within two standard deviations of the mean. Therefore, if you had a normal distribution with a mean of 50 and a standard deviation of 10, then 68% of the distribution would be between 50 - 10 = 40 and 50 + 10 = 60. Similarly, about 95% of the distribution would be between 50 - 2 x 10 = 30 and 50 + 2 x 10 = 70.
=> a measure of the spread of the values around the mean
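The 68%/95% figures for the worked example (mean 50, standard deviation 10) can be verified numerically with statistics.NormalDist (Python 3.8+):

```python
# Check the 68%/95% rule for a normal distribution with mean 50, sd 10.
from statistics import NormalDist

d = NormalDist(mu=50, sigma=10)
within_1sd = d.cdf(60) - d.cdf(40)   # P(40 < X < 60), about 0.6827
within_2sd = d.cdf(70) - d.cdf(30)   # P(30 < X < 70), about 0.9545
print(round(within_1sd, 4), round(within_2sd, 4))
```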
Sample
a selection taken from a bigger population
normal distribution
It is a continuous, bell-shaped distribution (single peak)
which is symmetric about its mean and can take on values from negative infinity to positive infinity
Each normal curve is characterized by two parameters (is completely described by):
- the mean
- the standard deviation (its symbol is the greek letter sigma)
variance
measures how far a data set is spread out.
The technical definition is “the average of the squared differences from the mean,” but all it really does is give you a very general idea of the spread of your data.
A value of zero means that there is no variability: All the numbers in the data set are the same.
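The "average of the squared differences from the mean" definition, written out directly on made-up numbers and compared against the standard-library population variance:

```python
# Variance = average of the squared differences from the mean.
from statistics import pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]                       # made-up sample
mean = sum(data) / len(data)                          # 5.0
var = sum((x - mean) ** 2 for x in data) / len(data)  # population variance
print(var)              # 4.0
print(pvariance(data))  # same result via the statistics module
```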
population
a sample is a part of a population. A population is a whole, it’s every member of a group.
A population is the opposite of a sample, which is a fraction or percentage of a group. Sometimes it’s possible to survey every member of a group; if you do, it is called a census. A classic example is the U.S. Census, where responding is required by law.
In most cases, it’s impractical to survey everyone. In addition, sometimes people either don’t want to respond or forget to respond, leading to incomplete censuses. Incomplete censuses become samples by definition.
p-value
probability value: the probability of obtaining the observed sample result (or one more extreme) if the null hypothesis is assumed to be true -> if very small, H_0 should most likely be rejected
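A hypothetical numeric illustration (the observed statistic here is made up): a two-sided p-value for a z statistic, assuming a standard normal null distribution.

```python
# Two-sided p-value for a z test statistic under a standard normal null.
from statistics import NormalDist

z = 2.5                                 # made-up observed test statistic
p = 2 * (1 - NormalDist().cdf(abs(z)))  # probability of a result at least this extreme
print(round(p, 4))                      # small p -> evidence against H_0
```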
test statistic
a value computed from the sample data (e.g. a t- or z-statistic) that is compared against its distribution under the null hypothesis to obtain a p-value
Gaussian distribution
= Normal distribution = bell curve
simple linear regression
predicts a quantitative response Y from a single predictor X, assuming an approximately linear relationship: Y ≈ β_0 + β_1 X
least squares method
estimate the coefficients β_0, β_1, …, β_p such that the RSS (residual sum of squares) is minimized (i.e. has the smallest possible value)
intercept
β_0: the predicted value of the response when the predictor X = 0, i.e. where the regression line crosses the y-axis
Linear models
Linear models describe a continuous response variable as a function of one or more predictor variables. They can help you understand and predict the behavior of complex systems or analyze experimental, financial, and biological data. Linear regression is a statistical method used to create a linear model.
residuals
the differences between the observed and the fitted values, e_i = y_i − ŷ_i; the RSS is the sum of their squares
qualitative response
a response that takes values in one of K classes or categories (e.g. spam / not spam)
quantitative response
a response that takes numerical values (e.g. price, height)
regression
problems with a quantitative response
classification
problems with a qualitative (categorical) response
prediction
using the estimated function f̂ to produce Ŷ = f̂(X) for new observations; f̂ can be treated as a black box, since accuracy is what matters
inference
- when inference is the goal, there are clear advantages to using simple and relatively inflexible statistical learning methods
- In some settings, however, we are only interested in prediction, and the interpretability of the predictive model is simply not of interest
- if we seek to develop an algorithm to predict the price of a stock, our sole requirement for the algorithm is that it predict accurately—interpretability is not a concern
prediction vs inference
prediction: accuracy of Ŷ is the goal and interpretability is secondary; inference: understanding how Y changes with each predictor X_j is the goal, so interpretable (often simpler) models are preferred
parametric
reduces the problem of estimating the function f down to one of estimating a set of parameters
overfitting (the data)
fitting too flexible a model can lead to overfitting the data, i.e. the model follows the errors, or noise, too closely
- As model flexibility increases, training MSE will decrease, but the test MSE may not. When a given method yields a small training MSE but a large test MSE, we are said to be overfitting the data.
- This happens because the statistical learning procedure is working too hard to find patterns in the training data, and may be picking up patterns that are caused by random chance rather than by true properties of the unknown function f. When we overfit the training data, the test MSE will be very large because the supposed patterns the method found in the training data simply don’t exist in the test data. Note that regardless of whether or not overfitting has occurred, we almost always expect the training MSE to be smaller than the test MSE, because most statistical learning methods either directly or indirectly seek to minimize the training MSE. Overfitting refers specifically to the case in which a less flexible model would have yielded a smaller test MSE.
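A sketch of this effect on made-up data (all numbers below are invented): a maximally flexible model (a degree-7 polynomial interpolating all 8 training points) achieves zero training MSE but a larger test MSE than an inflexible straight-line fit evaluated on fresh noise draws at the same design points.

```python
# Overfitting sketch: interpolating polynomial vs. least-squares line.
xs = [i / 4 for i in range(8)]                              # shared design points
noise_tr = [0.4, -0.3, 0.2, 0.5, -0.4, 0.1, -0.2, 0.3]      # made-up noise
noise_te = [-0.2, 0.3, -0.5, 0.1, 0.4, -0.3, 0.2, -0.1]     # fresh noise draws
train_y = [2 * x + 1 + e for x, e in zip(xs, noise_tr)]     # true f(x) = 2x + 1
test_y = [2 * x + 1 + e for x, e in zip(xs, noise_te)]

def lagrange(x):
    """Degree-7 polynomial through all 8 training points: training MSE is 0."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, train_y)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def line_fit():
    """Inflexible least-squares straight line fitted to the training data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(train_y) / n
    b1 = sum((x - mx) * (y - my) for x, y in zip(xs, train_y)) / sum(
        (x - mx) ** 2 for x in xs)
    b0 = my - b1 * mx
    return lambda x: b0 + b1 * x

def mse(model, ys):
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

fit = line_fit()
interp_train, interp_test = mse(lagrange, train_y), mse(lagrange, test_y)
line_train, line_test = mse(fit, train_y), mse(fit, test_y)
print(interp_train, interp_test)   # 0.0 vs. a noticeably larger test MSE
print(line_train, line_test)       # both moderate and roughly similar
```

The interpolant "follows the noise" exactly, so its training MSE is zero while its test MSE is the worst of the four numbers; the straight line has nonzero training MSE but generalizes better.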