L3 - Classification Flashcards

Question 1

Q

Give an overview of the classification workflow

Answer

A

1. Import Data
2. Organise & Preprocess Data
3. Explore Data (derive features)
4. Build a model
5. Evaluate the model
6. Iterate from step 3
7. Deploy the model
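
The first five steps of the workflow above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy height/weight data set is made up, and a nearest-centroid classifier stands in for whatever model would really be built.

```python
import random

# Import data: a hypothetical toy set of ([height_cm, weight_kg], label) rows.
raw = [
    ([150.0, 50.0], "small"), ([155.0, 55.0], "small"),
    ([160.0, 58.0], "small"), ([152.0, 52.0], "small"),
    ([180.0, 85.0], "large"), ([185.0, 90.0], "large"),
    ([178.0, 82.0], "large"), ([190.0, 95.0], "large"),
]

# Organise and preprocess: shuffle, split into train/test, then min-max scale.
rng = random.Random(0)
rng.shuffle(raw)
train, test = raw[:6], raw[6:]

def fit_scaler(rows):
    """Return (min, max) per predictor column, learned from training rows only."""
    columns = zip(*[x for x, _ in rows])
    return [(min(col), max(col)) for col in columns]

def scale(x, bounds):
    """Rescale each predictor using the training bounds."""
    return [(v - lo) / (hi - lo) for v, (lo, hi) in zip(x, bounds)]

bounds = fit_scaler(train)

# Explore data / derive features is skipped here; the raw predictors are used as-is.

# Build a model: nearest centroid (an assumed stand-in for a real classifier).
def fit_centroids(rows, bounds):
    groups = {}
    for x, label in rows:
        groups.setdefault(label, []).append(scale(x, bounds))
    return {
        label: [sum(col) / len(col) for col in zip(*pts)]
        for label, pts in groups.items()
    }

def predict(x, centroids, bounds):
    z = scale(x, bounds)
    return min(
        centroids,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(z, centroids[label])),
    )

centroids = fit_centroids(train, bounds)

# Evaluate the model: accuracy on the held-out test rows.
correct = sum(predict(x, centroids, bounds) == y for x, y in test)
accuracy = correct / len(test)
print("test accuracy:", accuracy)
```

If the accuracy were poor, the next move would be to go back to the feature step, change the predictors or the model, and evaluate again.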

Question 2

Q

What is K-Nearest Neighbours (KNN)?

Answer

A

A model that finds the known samples most similar to a new sample and assigns the new sample to the majority class among them.
K is the number of nearest neighbours to consider.

KNN relies on the assumption that categories are nicely clustered, making it very sensitive to the known data.
KNN requires that the predictors are either all numerical or all categorical.

Question 3

Q

Describe how KNN with k=3 assigns a new data point to a class.

Answer

A

Imagine all the known data on a graph.
Plot the new data.
Find the 3 closest known data points.
Assign the new data point to the class held by the majority of those 3 nearest neighbours.
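
The steps above can be sketched in plain Python. The six known points and the names `known` and `knn_predict` are hypothetical, chosen only so that the three nearest neighbours disagree and the majority vote matters.

```python
import math
from collections import Counter

# Known (labelled) data: hypothetical 2-D points in two classes.
known = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.0), "A"),
    ((6.0, 6.0), "B"), ((7.0, 5.5), "B"), ((6.5, 7.0), "B"),
]

def knn_predict(new_point, known, k=3):
    """Classify new_point by majority vote among its k nearest known points."""
    # Sort the known points by Euclidean distance to the new point.
    by_distance = sorted(known, key=lambda item: math.dist(item[0], new_point))
    # Keep the labels of the k closest points and take the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# The 3 nearest neighbours of (4, 4) are two "B" points and one "A" point.
prediction = knn_predict((4.0, 4.0), known, k=3)
print(prediction)
```

Euclidean distance is an assumption here; other distance measures work the same way as long as the predictors are comparable.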

Question 4

Q

What is a Decision Tree model?

Answer

A

A model which classifies data by asking a series of Yes/No questions about the predictors.
Graphically, this divides the predictor space into bounded regions. On real data the tree can become very complex and overfit.
Pruning is used to simplify the tree, but may trade off some accuracy.
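
A tree of Yes/No questions can be written down directly as nested dictionaries. The sketch below is hand-built rather than learned from data; the feature names, thresholds and class labels are hypothetical and serve only to show how a sample is routed and what pruning does.

```python
# Each internal node asks one Yes/No question: "is feature <= threshold?".
# Leaves are plain strings holding the predicted class.
tree = {
    "feature": "petal_length", "threshold": 2.5,
    "yes": "setosa",
    "no": {
        "feature": "petal_width", "threshold": 1.7,
        "yes": "versicolor",
        "no": "virginica",
    },
}

def classify(sample, node):
    """Follow the Yes/No answers from the root until a leaf is reached."""
    while isinstance(node, dict):
        answer = "yes" if sample[node["feature"]] <= node["threshold"] else "no"
        node = node[answer]
    return node

# Pruning: replace the lower subtree with a single leaf. The tree is simpler,
# but it can no longer tell the two remaining classes apart.
pruned = {**tree, "no": "versicolor"}

sample = {"petal_length": 5.5, "petal_width": 2.1}
print(classify(sample, tree), classify(sample, pruned))
```

The same sample gets a different answer from the pruned tree, which is the accuracy trade-off in miniature.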

Question 5

Q

What is a Naive Bayes Model?

Answer

A

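A classifier model, based on Bayes' theorem, which assumes the independence of predictors within each class.
(i.e., the predictors are conditionally independent given the class; this is a different assumption from the samples being independent and identically distributed, i.i.d.)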

Question 6

Q

What is “Naive” about Naive Bayes?

Answer

A

Naive Bayes assumes that all features (predictors) are independent of each other given the class label. In reality, this is rarely true—features often have some degree of correlation.

For example, in an email spam classifier:

The words “free” and “money” might both appear in spam emails.
These words are not truly independent, but Naive Bayes treats them as if they are.
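
The spam example can be made concrete with a small sketch. All probabilities below are made-up numbers, and `class_scores` is a hypothetical helper. The point is only to show where the independence assumption enters: the per-word probabilities are simply multiplied together.

```python
# Hypothetical prior probability of each class.
priors = {"spam": 0.4, "ham": 0.6}

# Hypothetical probability of seeing each word in an email of each class.
word_given_class = {
    "spam": {"free": 0.30, "money": 0.20},
    "ham": {"free": 0.03, "money": 0.02},
}

def class_scores(words):
    """Return the posterior probability of each class for the given words."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for word in words:
            # The "naive" step: multiply as if the words were independent
            # of each other within the class.
            score *= word_given_class[label][word]
        scores[label] = score
    # Normalise so that the class probabilities sum to 1.
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

posterior = class_scores(["free", "money"])
print(posterior)
```

Because "free" and "money" tend to occur together, multiplying their probabilities counts partly the same evidence twice, so the resulting posterior is often more extreme than it should be even when the predicted class is right.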

Question 7

Q

What is a Support Vector Machine Model?

Answer

A

A model which aims to find the optimal hyperplane in an N-dimensional space to separate data points into different classes. The algorithm maximizes the margin between the closest points of different classes.
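
To make "maximizes the margin" concrete, the sketch below compares three hand-picked candidate hyperplanes (lines, in 2-D) on a toy data set and keeps the one with the largest margin. This is not how an SVM is trained: a real SVM solves an optimisation problem over all hyperplanes. The points and candidates are hypothetical.

```python
import math

# Toy 2-D data with labels -1 and +1.
points = [
    ((1.0, 1.0), -1), ((2.0, 1.0), -1),
    ((4.0, 4.0), +1), ((5.0, 4.0), +1),
]

def margin(w, b, points):
    """Smallest signed distance from any point to the hyperplane w.x + b = 0.

    The value is negative if at least one point lies on the wrong side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in points)

# Candidate hyperplanes as (w, b) pairs.
candidates = {
    "a": ((1.0, 0.0), -3.0),    # the vertical line x = 3
    "b": ((1.0, 1.0), -5.5),    # the line x + y = 5.5
    "c": ((2.0, 3.0), -13.5),   # the line 2x + 3y = 13.5
}

margins = {name: margin(w, b, points) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)
print(margins, "best:", best)
```

All three lines separate the classes, but they leave different amounts of room around the closest points; the SVM prefers the separator with the most room.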

Question 8

Q

Can SVM be used when the data is not linearly separable?

Answer

A

Yes, by using a kernel: a function that implicitly maps the data to a higher-dimensional space, where a separating hyperplane can be found even though the data is not linearly separable in the original space.
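
A minimal sketch of the idea, using made-up 1-D data and a hypothetical feature map `phi(x) = (x, x^2)`: no single threshold on x separates the two classes, but after mapping to 2-D a threshold on the second coordinate does. The matching kernel gives the same inner products without building the mapped points.

```python
# 1-D data: the "inner" class sits between the two halves of the "outer" class.
inner = [-1.0, 0.0, 1.0]
outer = [-4.0, -3.0, 3.0, 4.0]

def separable_by_threshold(a, b):
    """True if a single cut point puts all of a on one side and all of b on the other."""
    return max(a) < min(b) or max(b) < min(a)

def phi(x):
    """Explicit feature map from 1-D to 2-D (an assumption for illustration)."""
    return (x, x * x)

def kernel(x, z):
    """Inner product phi(x) . phi(z), computed without building phi."""
    return x * z + (x * z) ** 2

# Original space: no threshold on x works.
before = separable_by_threshold(inner, outer)

# Mapped space: a threshold on the second coordinate (x squared) works.
after = separable_by_threshold([phi(x)[1] for x in inner],
                               [phi(x)[1] for x in outer])

# The kernel agrees with the explicit inner product in the mapped space.
x, z = 3.0, -1.0
explicit = sum(p * q for p, q in zip(phi(x), phi(z)))
print(before, after, kernel(x, z) == explicit)
```

In practice the mapped space is never built; the SVM only needs the kernel values between pairs of points.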

Question 9

Q

Can SVM be used for multiclass data?

Answer

A

Yes, using an error-correcting output codes (ECOC) classifier, which combines several binary SVMs into one multiclass model.
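
The decoding half of ECOC can be sketched as follows. Each class is given a codeword, each position in the codeword corresponds to one binary learner (for example one binary SVM), and a new sample is assigned to the class whose codeword is closest to the learners' outputs. The 4-class code matrix and the learner outputs below are assumptions chosen for illustration, not the coding design of any particular library.

```python
# One codeword per class; column j is the target output of binary learner j.
# Any two codewords here differ in 4 positions, so one wrong learner
# can be tolerated.
CODE = {
    "class_1": (1, 1, 1, 1, 1, 1, 1),
    "class_2": (0, 0, 0, 0, 1, 1, 1),
    "class_3": (0, 0, 1, 1, 0, 0, 1),
    "class_4": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(a, b):
    """Number of positions where two codewords disagree."""
    return sum(x != y for x, y in zip(a, b))

def decode(outputs):
    """Pick the class whose codeword is nearest to the binary learners' outputs."""
    return min(CODE, key=lambda label: hamming(CODE[label], outputs))

# Hypothetical outputs of the 7 binary learners for one sample.
clean = (0, 0, 1, 1, 0, 0, 1)   # matches the class_3 codeword exactly
noisy = (0, 0, 1, 1, 0, 1, 1)   # same, but learner 6 got it wrong

print(decode(clean), decode(noisy))
```

Both inputs decode to the same class, which is where the "error-correcting" part of the name comes from.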

(9 cards)