Lecture 18, 19 & 20 Flashcards
Why is Correlation useful?
Discover relationship/possible causality
One step towards finding Causality
What is Correlation?
A measure of whether, and how strongly, a pair (or set) of variables are related.
How can Correlations be identified visually?
Via Scatter Plot
Why is Correlation important?
Discover Relationships
Why is Correlation different to Causation?
Because two variables can move together without one causing the other; they may share a common cause. E.g. sunglasses sales and ice cream sales both rise in hot weather.
What is Euclidean distance?
The straight-line distance between two points x and y: the square root of the sum of squared coordinate differences.
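A minimal sketch of the distance, assuming points are given as equal-length lists of numbers. The function name `euclidean_distance` is made up for illustration.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two points with the same number of coordinates."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (the 3-4-5 triangle)
```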
Why is Euclidean distance a poor measure of correlation?
- Objects measured on different scales give arbitrary distances
- Cannot discover similar behaviour at different scales
- Cannot discover negative correlation
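The points above can be seen on toy data. This is a sketch with made-up numbers: a series that rises exactly like x but at 100 times the scale ends up far away, a perfect mirror image of x also ends up far away, and a series that is only weakly related to x ends up closest.

```python
import math

x = [1, 2, 3, 4, 5]
same_shape_bigger_scale = [100 * v for v in x]  # rises exactly like x, just bigger
weakly_related = [3, 1, 4, 2, 3]                # similar magnitude, little trend
mirror = [-v for v in x]                        # perfect negative relationship

d_scaled = math.dist(x, same_shape_bigger_scale)
d_weak = math.dist(x, weakly_related)
d_mirror = math.dist(x, mirror)

# The weakly related series is "closest", even though the other two
# are perfectly (positively or negatively) related to x.
print(d_weak < d_mirror < d_scaled)  # True
```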
Advantages of Pearson’s correlation
- Range within [ -1 , 1 ]
- Scale Invariant: r(x,y)=r(x,Ky), K is a positive real constant
- Location Invariant: r(x,y)=r(x,y+C), C is any real constant
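A quick check of the two invariance properties on made-up data. The helper `pearson` and the values K = 10 and C = 7 are picked for illustration only.

```python
import math


def pearson(x, y):
    """Pearson's r for two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

r = pearson(x, y)
r_scaled = pearson(x, [10 * v for v in y])   # scale y by K = 10
r_shifted = pearson(x, [v + 7 for v in y])   # shift y by C = 7

# Both comparisons hold up to floating-point rounding.
print(math.isclose(r, r_scaled), math.isclose(r, r_shifted))
```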
Disadvantage of Pearson’s correlation
Cannot detect non-linear relationships
How is Pearson’s correlation calculated?
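r = sum((x_i - mean(x)) * (y_i - mean(y))) / sqrt(sum((x_i - mean(x))^2) * sum((y_i - mean(y))^2))
Practise working it through by hand.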
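For practice, here is a worked calculation written out step by step on a small made-up data set. It uses the standard definition: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations.

```python
import math

x = [1, 2, 3, 4]
y = [2, 4, 5, 9]

# Step 1: the means.
mean_x = sum(x) / len(x)   # 2.5
mean_y = sum(y) / len(y)   # 5.0

# Step 2: deviations from the mean.
dx = [a - mean_x for a in x]   # [-1.5, -0.5, 0.5, 1.5]
dy = [b - mean_y for b in y]   # [-3.0, -1.0, 0.0, 4.0]

# Step 3: numerator = sum of products of deviations.
numerator = sum(a * b for a, b in zip(dx, dy))   # 11.0

# Step 4: denominator = sqrt(sum of squared dx * sum of squared dy).
sum_sq_x = sum(a ** 2 for a in dx)   # 5.0
sum_sq_y = sum(b ** 2 for b in dy)   # 26.0
denominator = math.sqrt(sum_sq_x * sum_sq_y)

r = numerator / denominator
print(round(r, 3))  # 0.965
```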
Advantages of Mutual Information?
- Range within [0, 1] (when normalised)
- Detects non-linear relationships
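A rough sketch of mutual information on discrete values. Dividing by min(H(X), H(Y)) is one of several normalisation conventions and is an assumption here, not necessarily the one used in the lectures. The example uses y = x^2 on x = -2..2: Pearson's r is exactly 0 for this data because the products of deviations cancel, yet y is fully determined by x.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information (in bits) between two lists of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def normalised_mi(xs, ys):
    """MI divided by min(H(X), H(Y)); this normalisation is an assumption."""
    denom = min(entropy(xs), entropy(ys))
    return mutual_information(xs, ys) / denom if denom > 0 else 0.0


x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]  # non-linear: y = x^2

print(round(normalised_mi(x, y), 3))  # 1.0, y is fully determined by x
```

Continuous variables have to be put into bins before a count-based estimate like this can be used.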
What is Variable Discretization?
Converting from continuous to discrete values via bins
What methods of Variable Discretization are there?
Domain Knowledge, Equal-width bin, Equal frequency bin
What is Domain Knowledge Variable Discretization?
Manually assigning thresholds e.g. Speed
- 0-40km/h Slow
- 40-70km/h Medium
- 70km/h+ Fast
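The speed thresholds above as a small function. The card does not say which bin the boundary values 40 and 70 belong to, so this sketch assumes each bin includes its lower bound; the name `speed_category` is made up.

```python
def speed_category(speed_kmh):
    """Map a speed in km/h to a hand-picked label (lower bound inclusive: an assumption)."""
    if speed_kmh < 0:
        raise ValueError("speed cannot be negative")
    if speed_kmh < 40:
        return "Slow"
    if speed_kmh < 70:
        return "Medium"
    return "Fast"


print([speed_category(s) for s in [10, 40, 69.9, 70, 120]])
# ['Slow', 'Medium', 'Medium', 'Fast', 'Fast']
```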
What is Equal-width bin Variable Discretization?
Every bin spans the same width: (max - min) / number of bins
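A minimal sketch of equal-width binning. The function name `equal_width_bins` and the data are made up; the maximum value is placed in the last bin, which is a common convention but an assumption here.

```python
def equal_width_bins(values, n_bins):
    """Return a bin index (0 .. n_bins - 1) for each value using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls in the first bin.
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


values = [0, 5, 10, 15, 20, 30]
print(equal_width_bins(values, 3))  # [0, 0, 1, 1, 2, 2], bin width is 10
```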