Class Imbalance and Dimensionality Reduction Flashcards

1
Q

What is the advantage of Leaky ReLU?

A

Allows small negative values to avoid ‘dead neurons’

Leaky ReLU modifies the traditional ReLU to allow a small, non-zero gradient when the unit is not active.
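
A minimal NumPy sketch of Leaky ReLU, assuming the commonly used slope of 0.01 for negative inputs:

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # Positive values pass through; negatives are scaled by alpha instead of zeroed
    return np.where(x > 0, x, alpha * x)

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))  # negatives survive as small values
```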

2
Q

What does the sigmoid function do?

A

Maps to (0, 1); good for binary classification

The sigmoid function is often used in logistic regression and binary classification tasks.

3
Q

What is the range of the tanh function?

A

Maps to (-1, 1)

The tanh function is symmetric around the origin, which can be beneficial for learning.
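
The ranges stated on this card and the previous one can be checked numerically; an illustrative sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 1001)
# Sigmoid stays strictly inside (0, 1); tanh stays strictly inside (-1, 1)
print(sigmoid(x).min(), sigmoid(x).max())
print(np.tanh(x).min(), np.tanh(x).max())
```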

4
Q

What is the purpose of optimisers in training?

A

Update weights to minimise loss

Optimisers adjust the weights of the model to improve performance based on the loss function.

5
Q

What does SGD stand for?

A

Stochastic gradient descent

SGD is a common optimisation algorithm that updates weights using the gradient computed on a single sample or small mini-batch rather than the full dataset.

6
Q

What does Adam optimiser combine?

A

Momentum + RMSprop

Adam is an adaptive learning rate optimisation algorithm that is popular due to its efficiency.
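
A single Adam update can be sketched in NumPy to show the combination: the first-moment estimate m is the momentum part, the second-moment estimate v is the RMSprop part (default hyperparameters assumed):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad       # momentum: running mean of gradients
    v = b2 * v + (1 - b2) * grad ** 2  # RMSprop: running mean of squared gradients
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, np.array([0.5]), m, v, t=1)
print(w)  # weight nudged down by roughly the learning rate
```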

7
Q

Why is visualising CNNs important?

A

See what filters are learning, Debug issues, Understand model behaviour

Visualisation helps in interpreting the features learned by convolutional neural networks.

8
Q

What is one technique for visualising CNNs?

A

Feature map visualisation

This technique helps in understanding which features are being activated by certain inputs.

9
Q

Fill in the blank: Data augmentation helps fight overfitting by training on ‘______’ versions of your data.

A

new

10
Q

What is class imbalance?

A

Class imbalance = When one class has way more examples than another.

This can lead to biased model predictions.

11
Q

What is an example of class imbalance?

A

Negative samples: 998, Positive samples: 2.

Such a scenario can heavily skew model performance.

12
Q

What accuracy could a model achieve by always predicting ‘Negative’ in a class imbalance scenario?

A

99.8% accuracy.

This illustrates how misleading accuracy can be in imbalanced datasets.
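
The 99.8% figure follows directly from the 998/2 example on the earlier card; a quick check:

```python
# 998 negatives, 2 positives; a model that always predicts "negative"
correct = 998              # every negative is classified correctly
total = 998 + 2
accuracy = correct / total
print(accuracy)  # 0.998, i.e. 99.8%, while recall on positives is 0
```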

13
Q

Why is class imbalance a problem?

A

Model ignores the minority class.

This can lead to poor performance in predicting the minority class.

14
Q

What is the impact of class imbalance on model predictions?

A

Biased boundaries = Bad predictions.

Class imbalance can result in a model that is biased towards the majority class.

15
Q

In which areas is class imbalance especially problematic?

A

Medical diagnoses, Fraud detection, Rare event prediction

These domains often involve critical decisions based on minority classes.

16
Q

What is Binary Cross-Entropy Loss (BCE)?

A

L_BCE = -[ y_i * log(ŷ_i) + (1 - y_i) * log(1 - ŷ_i) ], where y_i is the true label and ŷ_i the predicted probability.

BCE is commonly used for binary classification tasks.
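
A NumPy sketch of the per-sample BCE loss above, with a small epsilon assumed for numerical stability:

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    # Clip predictions so log never sees exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Confident correct prediction -> small loss; confident wrong -> large loss
print(bce(1.0, 0.9))  # ~0.105
print(bce(1.0, 0.1))  # ~2.303
```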

17
Q

How does class imbalance affect the Binary Cross-Entropy Loss?

A

Majority class dominates the loss function.

This can lead to suboptimal learning for the minority class.

18
Q

What metric does the model optimize for in the presence of class imbalance?

A

Overall accuracy, not fair balance.

This can result in a model that performs well overall but poorly on minority classes.

19
Q

What is a solution to class imbalance in model training?

A

Weighted Loss Functions.

These functions adjust the loss to give more importance to the minority class.

20
Q

What is Weighted Binary Cross Entropy?

A

Assign higher importance (weight) to the minority class.

This helps to mitigate the effects of class imbalance.

21
Q

Provide a Keras example for setting class weights.

A

class_weights = {0: 1.0, 1: 5.0}; model.fit(X_train, y_train, class_weight=class_weights)

This dict gives the minority class (label 1) five times the weight; note that the model.fit argument is class_weight (singular).

22
Q

What is Weighted Categorical Cross Entropy used for?

A

For multi-class problems.

This is an extension of weighted binary cross-entropy for multiple classes.

23
Q

What is one strategy for fixing imbalanced data?

A

Collect more data.

Increasing the number of examples for the minority class can help balance the dataset.

24
Q

What is oversampling?

A

Duplicate minority class samples.

This can help to balance the dataset by increasing the representation of the minority class.
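
Naive oversampling (duplicating minority samples until class counts match) can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 2)  # 8 majority, 2 minority

minority = np.where(y == 1)[0]
# Resample minority indices with replacement until the counts match
extra = rng.choice(minority, size=(y == 0).sum() - len(minority), replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])
print(np.bincount(y_bal))  # [8 8]
```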

25
Q

What is undersampling?

A

Remove majority class samples.

This technique reduces the number of examples in the majority class to balance the dataset.

26
Q

What is data augmentation?

A

Make more diverse samples for the minority.

This technique generates new training examples by modifying existing ones.

27
Q

What does SMOTE stand for?

A

Synthetic Minority Over-sampling Technique.

SMOTE is a popular method for generating synthetic samples in the minority class.

28
Q

What are the steps involved in SMOTE?

A

Pick a minority sample, find its k-nearest neighbors, interpolate a new sample between it and a neighbor.

This process helps to create new, synthetic examples of the minority class.

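
The three steps can be sketched for a single synthetic sample (k=1 neighbour, NumPy only; real implementations such as imbalanced-learn repeat this over many minority points):

```python
import numpy as np

rng = np.random.default_rng(0)
minority = np.array([[1.0, 1.0], [2.0, 2.0], [1.5, 0.5]])

sample = minority[0]                               # 1. pick a minority sample
dists = np.linalg.norm(minority - sample, axis=1)
neighbor = minority[np.argsort(dists)[1]]          # 2. find its nearest neighbour
new = sample + rng.random() * (neighbor - sample)  # 3. interpolate between them
print(new)  # lies on the segment between sample and neighbor
```
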
29
Q

What is an analogy for the SMOTE process?

A

Draw a line between two known dots and place a new dot somewhere along it.

This visualization helps to understand how SMOTE generates new samples.

30
Q

What is the limitation of SMOTE?

A

Great for structured datasets – not ideal for raw images.

SMOTE's effectiveness can vary based on the type of data being used.

31
Q

What is the purpose of data augmentation for images?

A

Trick your model into seeing new examples by tweaking real ones.

This technique helps improve model robustness and generalization.

32
Q

Provide a Keras example for data augmentation.

A

datagen = ImageDataGenerator(rotation_range=40, width_shift_range=0.2, height_shift_range=0.2, shear_range=0.2, zoom_range=0.2, horizontal_flip=True, fill_mode='nearest')

This example shows how to create a data generator for augmenting image data.

33
Q

What is a warning regarding data augmentation and SMOTE?

A

Don’t augment or SMOTE your whole dataset before splitting into train/test.

This can lead to data leakage and biased evaluations.

34
Q

What set should you apply balancing/augmentation to?

A

Only on the training set.

This ensures that the model is trained on augmented data without contaminating the test set.

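
The safe ordering can be sketched as follows (hypothetical arrays, with naive oversampling standing in for SMOTE or augmentation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = np.array(([0] * 9 + [1]) * 10)  # 10% positives, evenly spread

# 1. Split FIRST so the test set stays untouched
train, test = np.arange(80), np.arange(80, 100)
X_tr, y_tr = X[train], y[train]
X_te, y_te = X[test], y[test]

# 2. Balance ONLY the training split
minority = np.where(y_tr == 1)[0]
extra = rng.choice(minority, size=(y_tr == 0).sum() - len(minority), replace=True)
X_tr = np.vstack([X_tr, X_tr[extra]])
y_tr = np.concatenate([y_tr, y_tr[extra]])
# y_te keeps its original, imbalanced distribution -> honest evaluation
```
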
35
Q

Why is accuracy misleading in the presence of class imbalance?

A

Accuracy does not reflect the performance on the minority class.

High accuracy can be achieved by favoring the majority class.

36
Q

What are better metrics to use when class imbalance exists?

A

Recall, Precision, F1-score

These metrics provide a more nuanced view of model performance on imbalanced datasets than accuracy alone.

37
Q

What does Recall measure?

A

TP / (TP + FN).

Recall indicates the ability of a model to find all relevant cases.

38
Q

What does Precision measure?

A

TP / (TP + FP).

Precision reflects the accuracy of positive predictions made by the model.

39
Q

What does Accuracy measure?

A

(TP + TN) / total.

Accuracy gives the overall proportion of correct predictions.

40
Q

What is the ideal combination of metrics for a model?

A

High recall + high precision.

This combination indicates that the model is effectively identifying positive cases while minimizing false positives.
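
The metric definitions on the cards above can be computed from a small confusion matrix; the counts here are hypothetical:

```python
# Hypothetical counts from an imbalanced test set
TP, FP, FN, TN = 8, 4, 2, 986

recall = TP / (TP + FN)     # share of actual positives that were found
precision = TP / (TP + FP)  # share of positive predictions that were right
accuracy = (TP + TN) / (TP + FP + FN + TN)

# Accuracy looks excellent even though precision is mediocre
print(recall, precision, accuracy)  # 0.8 0.666... 0.994
```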