Lecture 18, 19 & 20 Flashcards by Mariam Shahid

Why is Correlation useful?

Discover relationship/possible causality

One step towards finding Causality

What is Correlation?

Whether there is a relationship between a pair (or set) of variables, and how strong that relationship is.

How can Correlations be identified visually?

Via Scatter Plot

Why is Correlation important?

Discover Relationships

Why is Correlation different to Causation?

Because two variables may share a common cause without one causing the other, e.g. sunglasses sales and ice cream sales both rise in hot weather.

What is Euclidean distance?

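The straight-line distance between two points x and y, i.e. the square root of the sum of squared differences between their coordinates.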
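
As a rough illustration of the definition, a minimal Python sketch is given here. The function name `euclidean_distance` is made up for this example and is not taken from the lectures.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    # Square each coordinate difference, add them up, take the square root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))  # 5.0 (the 3-4-5 triangle)
```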

Why is Euclidean distance shit?

Objects can be on different scales, so the distance values become arbitrary
Cannot discover similar behaviour at different scales
Cannot discover negative correlation
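
The points above can be seen on a small made-up example (the three series are invented for illustration). Here `y` behaves exactly like `x` at ten times the scale and `z` moves in the opposite direction, yet Euclidean distance ranks `z` as the closer one.

```python
import math

x = [1, 2, 3, 4, 5]
y = [10, 20, 30, 40, 50]   # same behaviour as x, different scale
z = [5, 4, 3, 2, 1]        # opposite behaviour to x

dist_xy = math.dist(x, y)  # math.dist needs Python 3.8+
dist_xz = math.dist(x, z)

# Distance alone suggests x is far more similar to z than to y,
# which hides both the scaled relationship and the negative one.
print(round(dist_xy, 2), round(dist_xz, 2))
```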

Advantages of Pearson’s correlation

Range within [-1, 1]
Scale Invariant: r(x,y)=r(x,Ky), K is a positive real constant
Location Invariant: r(x,y)=r(x,y+C), C is any real constant
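
A short derivation of why both invariances hold, assuming the usual definition of r as covariance divided by the product of the standard deviations (the notation below is a sketch and may differ from the slides):

```latex
\begin{align*}
r(x,\, Ky + C)
  &= \frac{\operatorname{cov}(x,\, Ky + C)}{\sigma_x \, \sigma_{Ky + C}} \\
  &= \frac{K \operatorname{cov}(x, y)}{\sigma_x \, |K| \, \sigma_y} \\
  &= \frac{K}{|K|} \, r(x, y)
\end{align*}
```

The constant C drops out of both the covariance and the standard deviation, so it can be any real number. For K > 0 the factor K/|K| is 1, giving r(x,y); for K < 0 it is -1, which is why scale invariance needs a positive K.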

Disadvantage of Pearson’s correlation

Cannot detect non-linear relationships

How is Pearson’s correlation calculated?

Practice that shit
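
For practice, here is a minimal sketch of the standard Pearson formula: the sum of products of deviations from the means, divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and the sample data are made up for this example; the slides may lay out the calculation differently.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Deviations of each value from its mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    sum_sq_x = sum(a * a for a in dx)
    sum_sq_y = sum(b * b for b in dy)
    if sum_sq_x == 0 or sum_sq_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return numerator / math.sqrt(sum_sq_x * sum_sq_y)

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # perfectly positive
print(pearson_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))   # perfectly negative
```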

Advantages of Mutual Information?

Range within [0, 1] (for normalised mutual information)
Can detect non-linear relationships
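
A minimal sketch for already-discretised values. It assumes mutual information is computed as H(X) + H(Y) - H(X,Y) and normalised by min(H(X), H(Y)); other normalisations exist, so check which one the slides use. The function names are invented for this example.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def normalised_mutual_information(x, y):
    """MI(X, Y) / min(H(X), H(Y)); the normalisation is an assumption."""
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))      # joint entropy of the pairs
    mi = h_x + h_y - h_xy
    denom = min(h_x, h_y)
    return mi / denom if denom > 0 else 0.0

# y = x squared is non-linear, yet knowing x fully determines y.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(normalised_mutual_information(x, y))

# Independent variables share no information.
print(normalised_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```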

What is Variable Discretization?

Converting from continuous to discrete values via bins

What methods of Variable Discretization are there?

Domain Knowledge, Equal-width bin, Equal frequency bin

What is Domain Knowledge Variable Discretization?

Manually assigning thresholds e.g. Speed

0-40 km/h: Slow
40-70 km/h: Medium
70 km/h+: Fast
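
The speed thresholds can be sketched as a small lookup function. The card does not say which bin a boundary value such as exactly 40 km/h falls into, so putting it in the higher bin is an assumption, as is the function name.

```python
def speed_category(speed_kmh):
    """Map a speed in km/h to a label using hand-picked thresholds."""
    if speed_kmh < 0:
        raise ValueError("speed cannot be negative")
    if speed_kmh < 40:
        return "Slow"
    if speed_kmh < 70:      # assumption: exactly 40 counts as Medium
        return "Medium"
    return "Fast"           # assumption: exactly 70 counts as Fast

print([speed_category(s) for s in (25, 40, 69.9, 110)])
```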

What is Equal-width bin Variable Discretization?

Where every bin covers an interval of the same width
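
A minimal sketch of equal-width binning, assuming the range is taken from the minimum to the maximum of the data and the maximum is placed in the last bin. The function name and sample values are made up.

```python
def equal_width_bins(values, n_bins):
    """Return a bin index (0 .. n_bins - 1) for each value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)        # all values identical: one bin
    width = (hi - lo) / n_bins
    # The maximum would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# One outlier stretches the range, so most points share the first bin.
print(equal_width_bins([0, 1, 2, 3, 4, 20], 4))  # [0, 0, 0, 0, 0, 3]
```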

What is Equal Frequency Variable Discretization?

Where bins have the same number of points.
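
A minimal sketch of equal-frequency binning by rank. The function name and sample values are made up, and tied values may end up in different bins in this simple version.

```python
def equal_frequency_bins(values, n_bins):
    """Return a bin index per value so bins hold (nearly) equal counts."""
    n = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // n    # rank 0..n-1 mapped to 0..n_bins-1
    return bins

# Two points per bin, regardless of the outlier at 20.
print(equal_frequency_bins([0, 1, 2, 3, 4, 20], 3))  # [0, 0, 1, 1, 2, 2]
```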

What is Entropy?

A measure of the information content (uncertainty) of a variable
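
For reference, the usual Shannon entropy of a discrete variable X whose values occur with probabilities p_i, followed by a worked example for a fair coin. Using log base 2 (so the unit is bits) is an assumption here; the slides may use another base.

```latex
\begin{align*}
H(X) &= -\sum_{i} p_i \log_2 p_i \\
H(\text{fair coin}) &= -\left( \tfrac{1}{2} \log_2 \tfrac{1}{2}
                            + \tfrac{1}{2} \log_2 \tfrac{1}{2} \right)
                     = 1 \text{ bit}
\end{align*}
```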

What is Classification?

Given a training data set, find a model that predicts the class attribute as a function of the values of the other attributes.

What is the Goal of Classification?

To assign a class to previously unseen data.

What is Regression?

Given a training data set, learn a model that predicts a continuous (numeric) value rather than a class.

What is required by a K Nearest Neighbour Classifier?

Set of records
Metric to compute distance between records
The value of k, i.e. the number of neighbours to retrieve.

What is the methodology of a K Nearest Neighbour Classifier?

Compute the distance from the unknown record to every training record (e.g. Euclidean distance, possibly weighted)
Identify the k nearest neighbours
Use the classes of those neighbours (e.g. by majority vote) to determine the class of the unknown record
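
The three steps above can be sketched as follows. This is an unweighted version using Euclidean distance and a plain majority vote; the function name `knn_classify` and the toy records are made up for illustration.

```python
import math
from collections import Counter

def knn_classify(training, query, k):
    """training is a list of (features, label) pairs; returns a label."""
    if not 1 <= k <= len(training):
        raise ValueError("k must be between 1 and the number of records")
    # Step 1: distance from the query to every training record.
    by_distance = sorted(training, key=lambda rec: math.dist(rec[0], query))
    # Step 2: keep the labels of the k nearest neighbours.
    nearest_labels = [label for _, label in by_distance[:k]]
    # Step 3: majority vote (ties go to the label seen first, i.e. nearest).
    return Counter(nearest_labels).most_common(1)[0][0]

training = [
    ((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
    ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B"),
]
print(knn_classify(training, (2, 2), k=3))  # A
print(knn_classify(training, (7, 7), k=3))  # B
```

Sorting the whole training set for every query is the search cost that grows with the number of stored points.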

Problems with K Nearest Neighbour?

K needs to be selected carefully
A large number of points adds storage cost and search cost

How do you calculate accuracy?

(TP+TN)/(TP+TN+FP+FN)
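
A minimal sketch of the formula, where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives. The function name and the counts in the example are made up.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no predictions to score")
    return (tp + tn) / total

# 40 + 45 correct predictions out of 100.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```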

How do decision trees work?

Seriously you need me to answer that ಠ_ಠ

Problems with decision trees?

Determining how to split values
Determining when to stop splitting

How do you specify test condition for a tree?

Depends on attribute types and number of splits

How to determine the best split?

Prefer the split that produces nodes with a homogeneous class distribution, i.e. a low level of impurity

How is Entropy used to calculate impurity?

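Entropy of a node = -sum over classes i of p_i * log2(p_i), where p_i is the fraction of the node's records in class i. It is 0 when all records belong to one class and largest when the classes are evenly mixed.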

How to determine how good is a split?

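(Formula from slides) Compare the impurity of the parent node before the split with the weighted impurity of the child nodes after the split; the larger the reduction, the better the split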
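
A minimal sketch of that comparison using entropy as the impurity measure, commonly called information gain. It assumes each child is weighted by its share of the parent's records; the function names and labels are invented, and the formula on the slides may be written differently.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of the class labels in one node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = ["yes"] * 4 + ["no"] * 4

# A split into two pure children removes all impurity.
print(information_gain(parent, [["yes"] * 4, ["no"] * 4]))

# A split whose children are as mixed as the parent gains nothing.
mixed = ["yes", "yes", "no", "no"]
print(information_gain(parent, [mixed, mixed]))
```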

(30 cards)