L3 - Classification Flashcards

(9 cards)

1
Q

Give an overview of the classification workflow

A
  1. Import Data
  2. Organise & Preprocess Data
  3. Explore Data (derive features)
  4. Build a model
  5. Evaluate the model
  6. Iterate from step 3
  7. Deploy the model
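
The steps above can be sketched end to end on toy data. This is only an illustration: the rows, the derived feature, and the nearest-centroid model are all assumptions chosen to keep the example short, not part of the original card.

```python
import random

# 1. Import data (hypothetical in-memory rows: two measurements and a label).
rows = [
    (1.0, 1.2, "A"), (0.8, 1.0, "A"), (1.3, 0.9, "A"), (1.1, 1.4, "A"),
    (4.0, 4.2, "B"), (3.8, 4.5, "B"), (4.3, 3.9, "B"), (4.1, 4.4, "B"),
]

# 2. Organise and preprocess: shuffle, then hold out two rows for testing.
random.seed(0)
random.shuffle(rows)
train_rows, test_rows = rows[:6], rows[6:]

# 3. Explore data / derive features: here, the mean of the two measurements.
def derive_feature(row):
    x1, x2, _ = row
    return (x1 + x2) / 2

# 4. Build a model: one centroid (mean feature value) per class.
def build_model(training_rows):
    sums, counts = {}, {}
    for row in training_rows:
        label = row[2]
        sums[label] = sums.get(label, 0.0) + derive_feature(row)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, row):
    # Assign the class whose centroid is closest to the derived feature.
    value = derive_feature(row)
    return min(model, key=lambda label: abs(model[label] - value))

# 5. Evaluate the model on the held-out rows.
model = build_model(train_rows)
correct = sum(predict(model, row) == row[2] for row in test_rows)
accuracy = correct / len(test_rows)
print(f"Held-out accuracy: {accuracy:.2f}")

# 6. If accuracy is poor, go back to step 3 and try other features.
# 7. Deploy: reuse `model` and `predict` on new, unlabelled rows.
```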
2
Q

What is K-Nearest Neighbours (KNN)?

A

A model that classifies a new sample by finding the known samples most similar to it and assigning it the class most common among them.
K is the number of nearest neighbours to consider.

KNN relies on the assumption that categories are nicely clustered, which makes it very sensitive to the known data.
KNN requires that the predictors are either all numerical or all categorical.

3
Q

Describe how KNN with k=3 assigns a new data point to a class.

A

Imagine all the known data on a graph.
Plot the new data.
Find the 3 closest known data points.
Assign the new data point the class held by the majority of those 3 nearest neighbours.
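
A minimal sketch of these steps, assuming numerical predictors and Euclidean distance. The function name `knn_predict` and the toy points are hypothetical, chosen only for illustration.

```python
from collections import Counter
import math

def knn_predict(known_points, known_labels, new_point, k=3):
    # Distance from the new point to every known point (Euclidean).
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(known_points, known_labels)
    ]
    # Keep the k closest known points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the neighbours' classes.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two well-separated clusters in 2-D.
known_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
                (6.0, 6.0), (7.0, 7.5), (6.5, 5.5)]
known_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(known_points, known_labels, (2.0, 2.0), k=3))
```

With an odd k and two classes the vote cannot tie; with more classes a tie-breaking rule would be needed.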

4
Q

What is a Decision Tree model?

A

A model that classifies data by asking a series of Yes/No questions.
Graphically, this divides the feature space into bounded regions. On real data the tree can become very complex and overfit.
Pruning reduces the size of the tree, but may trade off accuracy.
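
As a rough illustration, a small tree can be written out by hand as nested Yes/No questions. The features `x` and `y`, the thresholds, and the class names are hypothetical; a real tree would be learned from data.

```python
# Each internal node asks one Yes/No question about a feature;
# each leaf holds a class label.
tree = {
    "question": ("x", 3.0),          # "Is x <= 3.0?"
    "yes": {"label": "A"},
    "no": {
        "question": ("y", 2.0),      # "Is y <= 2.0?"
        "yes": {"label": "B"},
        "no": {"label": "C"},
    },
}

def classify(node, sample):
    # Follow the Yes/No answers until a leaf is reached.
    while "label" not in node:
        feature, threshold = node["question"]
        node = node["yes"] if sample[feature] <= threshold else node["no"]
    return node["label"]

print(classify(tree, {"x": 4.0, "y": 1.0}))
```

The two questions carve the plane into three rectangular regions: x <= 3 (class A), x > 3 with y <= 2 (class B), and x > 3 with y > 2 (class C).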

5
Q

What is a Naive Bayes Model?

A

A classifier model which assumes the independence of predictors within each class
(i.e., the predictors are conditionally independent given the class label).
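
Under this assumption the class posterior factorises into one term per predictor. The symbols below (class C, predictors x_1 to x_n) are notation introduced here for illustration:

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The predicted class is the one that maximises this product.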

6
Q

What is “Naive” about Naive Bayes?

A

Naive Bayes assumes that all features (predictors) are independent of each other given the class label. In reality, this is rarely true: features often have some degree of correlation.

For example, in an email spam classifier:

The words “free” and “money” might both appear in spam emails.
These words are not truly independent, but Naive Bayes treats them as if they are.
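
A small sketch of the spam example, assuming word counts per class and Laplace (+1) smoothing. The training emails and the function names `train` and `predict` are made up for illustration.

```python
import math
from collections import Counter

def train(emails):
    # emails: list of (list_of_words, label) pairs.
    class_counts = Counter(label for _, label in emails)
    word_counts = {label: Counter() for label in class_counts}
    vocabulary = set()
    for words, label in emails:
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def predict(model, words):
    class_counts, word_counts, vocabulary = model
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from log P(class).
        score = math.log(class_counts[label] / total_emails)
        n_words = sum(word_counts[label].values())
        for word in words:
            # The "naive" step: add one log-probability per word, as if
            # words were independent given the class. The +1 (Laplace
            # smoothing) avoids log(0) for unseen words.
            p = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training emails.
emails = [
    (["free", "money", "now"], "spam"),
    (["free", "offer"], "spam"),
    (["meeting", "tomorrow"], "ham"),
    (["project", "money", "report"], "ham"),
]
model = train(emails)
print(predict(model, ["free", "money"]))
```

"free" and "money" each contribute their own factor to the score, even though in real spam they tend to appear together.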

7
Q

What is a Support Vector Machine Model?

A

A model which aims to find the optimal hyperplane in an N-dimensional space to separate data points into different classes. The algorithm maximizes the margin between the closest points of different classes.
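
To make the hyperplane and margin concrete, here is a sketch that uses a hand-picked weight vector `w` and offset `b` (both hypothetical, not the result of training). It assumes the usual scaling in which the closest points satisfy |w·x + b| = 1, so the margin width is 2 / ||w||.

```python
import math

# Hypothetical hyperplane w.x + b = 0 in 2-D (values chosen by hand).
w = (0.5, 0.5)
b = -2.0

def decision_value(x):
    # Signed score: positive on one side of the hyperplane, negative on the other.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    return 1 if decision_value(x) >= 0 else -1

# Width of the margin under the |w.x + b| = 1 scaling.
margin_width = 2 / math.hypot(*w)

for point in [(1.0, 1.0), (3.0, 3.0)]:
    print(point, decision_value(point), classify(point))
print("margin width:", margin_width)
```

Here (1, 1) and (3, 3) lie exactly on the two margin boundaries, so the distance between them equals the margin width.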

8
Q

Can SVM be used when data is not linearly separable?

A

Yes, by using a kernel: a function that implicitly maps the data to a higher-dimensional space, where a separating hyperplane can exist even though the data is not linearly separable in the original space.
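
A toy illustration of the idea, assuming 1-D data and the map x -> (x, x^2). The data, the map, and the threshold 2.5 are all hand-picked for this sketch; a real SVM would learn the separating hyperplane.

```python
# Hypothetical 1-D data: class "inner" sits between the two "outer" groups,
# so no single threshold on x separates the classes.
samples = [(-2.0, "outer"), (-1.0, "inner"), (1.0, "inner"), (2.0, "outer")]

def feature_map(x):
    # Explicit map into 2-D: x -> (x, x^2).
    return (x, x * x)

def kernel(x, z):
    # Same inner product as feature_map(x) . feature_map(z),
    # computed without building the mapped vectors.
    return x * z + (x * x) * (z * z)

def classify_in_mapped_space(x, threshold=2.5):
    # In the mapped space the line "second coordinate = 2.5" separates the classes.
    _, squared = feature_map(x)
    return "outer" if squared > threshold else "inner"

for x, label in samples:
    print(x, label, classify_in_mapped_space(x))
```

The `kernel` function shows why the mapping can stay implicit: the SVM only needs inner products between mapped points, and those can be computed from the original values.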

9
Q

Can SVM be used for multiclass data?

A

Yes, using an error-correcting output codes (ECOC) classifier, which combines several binary SVMs into one multiclass model.
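
A sketch of the decoding step only, assuming three classes and the simplest coding design (one-vs-all, one binary learner per class). The coding matrix and the example outputs are hypothetical; in practice each position would be the output of a trained binary SVM.

```python
# Each class has a codeword; position j says which side of binary
# learner j the class belongs to (+1 or -1).
coding_matrix = {
    "A": (+1, -1, -1),
    "B": (-1, +1, -1),
    "C": (-1, -1, +1),
}

def decode(binary_outputs):
    # Pick the class whose codeword disagrees with the binary
    # learners' outputs in the fewest positions (Hamming distance).
    def hamming(codeword):
        return sum(c != o for c, o in zip(codeword, binary_outputs))
    return min(coding_matrix, key=lambda label: hamming(coding_matrix[label]))

# Hypothetical outputs from the three binary learners for one sample.
print(decode((-1, +1, -1)))
```

Longer codewords with larger Hamming distances between classes let the decoder tolerate some wrong binary outputs, which is where the "error-correcting" name comes from.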
