Week 3 Flashcards
(11 cards)
What can extraneous information do to a model?
Complicates the model
Makes the solution harder to interpret correctly
Outlier
A data point that is very different from the rest of the data
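Contextual outlier
An outlier that depends on the context of the points: the value is unusual relative to nearby points (for example, at that time), even if it is not unusual overall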
Collective outlier
Something is missing or unusual in a range of points, even though no single point stands out on its own
Plot for finding outliers
Box and whisker plot
Box and whisker plot
The bottom and top of the box are the 25th and 75th percentiles of the values, and the horizontal line inside the box is the 50th percentile (the median). The vertical lines extending up and down from the box are called the whiskers. Points outside the whisker range are outliers
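The box-and-whisker idea can be sketched in code. This is a minimal illustration, not a definitive method: it assumes the common convention that whiskers reach 1.5 times the interquartile range beyond the box (the card does not say how long the whiskers are), and the function name `boxplot_outliers` is hypothetical.

```python
import statistics

def boxplot_outliers(values, whisker=1.5):
    """Return the points that fall outside the whisker range.

    Assumption: whiskers extend `whisker` * IQR beyond the box,
    a common convention that the card itself does not specify.
    """
    # 25th, 50th and 75th percentiles (linear interpolation between points)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1                      # height of the box
    low = q1 - whisker * iqr           # end of the lower whisker
    high = q3 + whisker * iqr          # end of the upper whisker
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 14, 12, 40]
print(boxplot_outliers(data))
```

Whether a flagged point should be removed is a separate question; the plot only says that the point is far from the box.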
CUSUM Equation
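S_t = max{0, S_{t-1} + (x_t - mu - C)}
where x_t is the observation at time t, mu is the expected mean, and C is a slack value
If S_t > T (the threshold), a change is detected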
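As a rough sketch of how the CUSUM recurrence for detecting an increase could be computed: the running sum S_t = max{0, S_{t-1} + (x_t - mu - C)} is updated for each observation and compared with the threshold T. The function name `cusum_increase` and the sample numbers are assumptions made for illustration only.

```python
def cusum_increase(xs, mu, C, T):
    """Return the index of the first observation where S_t > T, else None.

    xs: observations x_t, mu: expected mean, C: slack value, T: threshold.
    """
    s = 0.0  # S_0 starts at zero
    for t, x in enumerate(xs):
        # Accumulate only the amount by which x_t exceeds mu + C,
        # and never let the running sum drop below zero.
        s = max(0.0, s + (x - mu - C))
        if s > T:
            return t
    return None

# Hypothetical series: the level shifts from 10 to 13 at index 3.
print(cusum_increase([10, 10, 10, 13, 13, 13], mu=10, C=1, T=5))
```

A larger C or T makes detection slower but less sensitive to random variation; smaller values do the opposite.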
When deciding what to do with outliers,
it is important to understand how they ended up in your dataset. Is it an error or a real data point? If it is an error, you may want to remove the data point
What things do you need to look at when assessing an outlier?
1) Where did the data come from?
2) How was the data compiled?
3) What was the situation in which the data was compiled?
CUSUM from high level
Determines whether the mean of the observed distribution has gone beyond a critical level. It can detect when a process gets to a higher level than before, a lower level, or both
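To illustrate the "higher, lower, or both" part, here is a hedged sketch that tracks two running sums at once: one that grows when observations sit above mu + C and one that grows when they sit below mu - C. The name `cusum_two_sided`, the shared C and T for both directions, and the sample data are assumptions for illustration; in practice the two directions may use different settings.

```python
def cusum_two_sided(xs, mu, C, T):
    """Return (index, direction) for the first detected change, else None."""
    up = 0.0    # running sum for an increase in the mean
    down = 0.0  # running sum for a decrease in the mean
    for t, x in enumerate(xs):
        up = max(0.0, up + (x - mu - C))      # grows when x_t > mu + C
        down = max(0.0, down + (mu - x - C))  # grows when x_t < mu - C
        if up > T:
            return t, "increase"
        if down > T:
            return t, "decrease"
    return None

# Hypothetical series: the level drops from 10 to 7 at index 2.
print(cusum_two_sided([10, 10, 7, 7, 7], mu=10, C=1, T=5))
```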