Lecture 10: Imbalanced Data Flashcards

1
Q

random undersampling

A

Drop samples from the majority class
Fast training
Loses data
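A minimal pure-Python sketch of the idea (the helper name `random_undersample` is hypothetical; it assumes hashable class labels and balances every class down to the smallest one):

```python
import random

def random_undersample(X, y, seed=0):
    """Drop samples until every class is as small as the minority class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())  # minority-class size
    Xr, yr = [], []
    for label, samples in by_class.items():
        # keep only a random subset of each class, discarding the rest
        for xi in rng.sample(samples, n_min):
            Xr.append(xi)
            yr.append(label)
    return Xr, yr

Xr, yr = random_undersample([[i] for i in range(10)], [0] * 8 + [1] * 2)
# the 8-vs-2 split becomes 2-vs-2; 6 majority samples are lost
```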

2
Q

random oversampling

A

Repeat samples from the minority class
Much slower training
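The mirror image of undersampling, as a sketch (the helper name `random_oversample` is hypothetical; it repeats minority samples until every class matches the largest one, which is why training on the result is slower):

```python
import random

def random_oversample(X, y, seed=0):
    """Repeat samples until every class is as large as the majority class."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_max = max(len(v) for v in by_class.values())  # majority-class size
    Xr, yr = [], []
    for label, samples in by_class.items():
        Xr.extend(samples)
        yr.extend([label] * len(samples))
        for _ in range(n_max - len(samples)):
            Xr.append(rng.choice(samples))  # duplicate a random sample
            yr.append(label)
    return Xr, yr

Xr, yr = random_oversample([[i] for i in range(10)], [0] * 8 + [1] * 2)
# the 8-vs-2 split becomes 8-vs-8; the dataset grows to 16 samples
```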

3
Q

class weight

A

Reweight the loss function
Same effect as oversampling, but not as expensive
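A sketch of one common weighting scheme, the `n_samples / (n_classes * count)` formula used by scikit-learn's `class_weight='balanced'` (the helper name `balanced_class_weights` is hypothetical):

```python
from collections import Counter

def balanced_class_weights(y):
    """Per-class weights: weight_c = n_samples / (n_classes * count_c)."""
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

# Multiplying each sample's loss term by its class weight gives every
# class equal total influence on the loss, mimicking oversampling
# without duplicating any data.
w = balanced_class_weights([0] * 8 + [1] * 2)
# w[0] == 0.625 and w[1] == 2.5: a minority error costs 4x as much
```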

4
Q

ensemble resampling

A

Random resampling done separately for each estimator in an ensemble
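A sketch of the resampling step only (the helper name `ensemble_resamples` is hypothetical): each ensemble member gets its own balanced random undersample, so together they see more of the majority class than any single undersampled model would.

```python
import random

def ensemble_resamples(X, y, n_estimators=5, seed=0):
    """Draw a separate balanced undersample for each ensemble member."""
    rng = random.Random(seed)
    by_class = {}
    for xi, yi in zip(X, y):
        by_class.setdefault(yi, []).append(xi)
    n_min = min(len(v) for v in by_class.values())
    subsets = []
    for _ in range(n_estimators):
        Xs, ys = [], []
        for label, samples in by_class.items():
            # an independent random draw per estimator
            for xi in rng.sample(samples, n_min):
                Xs.append(xi)
                ys.append(label)
        subsets.append((Xs, ys))  # train one base learner per subset
    return subsets

subsets = ensemble_resamples([[i] for i in range(10)], [0] * 8 + [1] * 2)
# 5 different balanced subsets, each 2-vs-2
```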

5
Q

edited nearest neighbors

A

Reduces the dataset for kNN
Remove all samples that are misclassified by kNN from the training set
Cleans up outliers and class boundaries
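A toy 1-D sketch of the rule (the helper name `edited_nearest_neighbors` is hypothetical): every sample whose label disagrees with the majority vote of its k nearest neighbors is dropped.

```python
def edited_nearest_neighbors(X, y, k=3):
    """Keep a sample only if the majority of its k nearest neighbors
    agrees with its label (1-D features for simplicity)."""
    kept_X, kept_y = [], []
    for i, (xi, yi) in enumerate(zip(X, y)):
        dists = sorted(
            (abs(xi - xj), yj)
            for j, (xj, yj) in enumerate(zip(X, y)) if j != i
        )
        votes = [label for _, label in dists[:k]]
        if votes.count(yi) * 2 > k:  # majority agrees -> keep
            kept_X.append(xi)
            kept_y.append(yi)
    return kept_X, kept_y

# the class-0 outlier at 5.05, deep inside class-1 territory, is removed
X = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 5.05]
y = [0, 0, 0, 1, 1, 1, 0]
Xk, yk = edited_nearest_neighbors(X, y)
```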

6
Q

condensed nearest neighbors

A

Add samples that are misclassified by kNN to the kept data
Focuses on the boundaries
Removes many redundant samples
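A toy 1-D sketch (the helper name `condensed_nearest_neighbors` is hypothetical): start from one sample per class, then repeatedly add any sample that a 1-NN classifier on the current subset gets wrong. Interior points that 1-NN already classifies correctly are never added, which is how the method discards many redundant samples while keeping the boundary.

```python
def condensed_nearest_neighbors(X, y):
    """Grow a subset from one seed per class by adding every sample
    misclassified by 1-NN on the current subset (1-D features)."""
    keep = {}  # index -> label of kept samples
    for i, label in enumerate(y):
        if label not in keep.values():  # seed: first sample of each class
            keep[i] = label
    changed = True
    while changed:
        changed = False
        for i, (xi, yi) in enumerate(zip(X, y)):
            if i in keep:
                continue
            nearest = min(keep, key=lambda j: abs(xi - X[j]))
            if y[nearest] != yi:  # misclassified -> add to the subset
                keep[i] = yi
                changed = True
    kept = sorted(keep)
    return [X[i] for i in kept], [y[i] for i in kept]

# two well-separated clusters condense to a single seed point each
Xc, yc = condensed_nearest_neighbors([0.0, 0.1, 0.2, 1.0, 1.1, 1.2],
                                     [0, 0, 0, 1, 1, 1])
```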

7
Q

synthetic sample generator

A

Add synthetic interpolated data to the smaller class
For each sample in the minority class:
Pick a random neighbor from its k nearest neighbors
Pick a point uniformly on the line connecting the two
Produces a large dataset
Often combined with undersampling strategies
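The recipe above is essentially SMOTE. A toy 1-D sketch (the helper name `smote_1d` is hypothetical; it assumes distinct minority values):

```python
import random

def smote_1d(X_min, k=2, n_new=4, seed=0):
    """SMOTE-style interpolation on 1-D minority samples: pick a
    minority sample, one of its k nearest minority neighbors, and a
    uniform point on the segment between them."""
    rng = random.Random(seed)
    new = []
    for _ in range(n_new):
        xi = rng.choice(X_min)
        neighbors = sorted((x for x in X_min if x != xi),
                           key=lambda x: abs(x - xi))[:k]
        xj = rng.choice(neighbors)
        t = rng.random()  # uniform position on the connecting line
        new.append(xi + t * (xj - xi))
    return new

synth = smote_1d([1.0, 1.5, 2.0, 3.0])
# every synthetic point lies between two real minority samples
```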
