Preprocessing Flashcards

1
Q

What is the first step when creating an ML algorithm?

A

Preprocessing

2
Q

What is Preprocessing?

A

Any manipulation of the dataset before running it through the model.

3
Q

What is an example of preprocessing?

A

Converting a .csv or .xlsx file to an .npz file

Logarithmic transformations

Standardization
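
The first example above (CSV to .npz) can be sketched with NumPy. This is a minimal illustration, not a prescribed workflow: the CSV content, the column layout (last column as the target), and the array names `inputs` and `targets` are all assumptions made for the example.

```python
import io
import numpy as np

# Hypothetical CSV content; in practice this would be a file on disk.
csv_text = "x1,x2,target\n1.0,200.0,0\n2.0,150.0,1\n3.0,175.0,0\n"

# Parse the CSV into a 2-D float array, skipping the header row.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

# Assumption: every column except the last is an input, the last is the target.
inputs = data[:, :-1]
targets = data[:, -1].astype(int)

# Store both arrays in one .npz archive. An in-memory buffer is used here;
# passing a filename such as "data.npz" would write to disk instead.
buffer = io.BytesIO()
np.savez(buffer, inputs=inputs, targets=targets)

# Reload the archive and read the arrays back by name.
buffer.seek(0)
loaded = np.load(buffer)
print(loaded["inputs"].shape, loaded["targets"])
```

The reloaded arrays can then be converted to tensors by whichever framework is in use.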

4
Q

What are the main reasons for preprocessing?

A
  1. Compatibility - TensorFlow works with tensors, not .csv files
  2. Orders of Magnitude - standardize inputs so features are on comparable scales
  3. Generalization -
5
Q
A

Relative metrics are especially useful when we have time-series data
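
One common relative metric for a time series is the period-over-period relative change. The sketch below assumes a made-up price series; the name `relative_change` and the values are illustrative only.

```python
import numpy as np

# Hypothetical price series, one value per time step.
prices = np.array([100.0, 102.0, 99.96, 104.958])

# Relative change between consecutive time steps:
# (current - previous) / previous, written as a ratio minus one.
relative_change = prices[1:] / prices[:-1] - 1.0

print(relative_change)
```

The result has one fewer element than the input, because the first observation has no predecessor to compare against.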

6
Q

What are some advantages of logarithmic transformations?

A

Faster computation
Lower order of magnitude
Clearer relationships
Homogeneous Variance
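
Two of the points above can be seen in a small sketch: a log transform compresses values that span several orders of magnitude, and it turns multiplicative relationships into additive ones. The sample values are made up, and base 10 is an arbitrary choice for readability.

```python
import numpy as np

# Hypothetical feature spanning five orders of magnitude.
values = np.array([1.0, 10.0, 1_000.0, 100_000.0])

# After the transform the values sit on a much narrower scale.
logged = np.log10(values)

# Spread before and after: largest value minus smallest value.
raw_range = values.max() - values.min()
log_range = logged.max() - logged.min()

# Multiplication becomes addition: log(a * b) = log(a) + log(b).
product_log = np.log10(values[1] * values[2])
sum_of_logs = logged[1] + logged[2]

print(logged, raw_range, log_range, product_log, sum_of_logs)
```

Note that logarithms are only defined for positive values, so data containing zeros or negatives needs a different treatment.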

7
Q

What is the most common problem when working with numerical data?

A

Differences in orders of magnitude

8
Q

How do we solve the numerical orders of magnitude challenge?

A

Standardization

Also called feature scaling or normalization.

9
Q

What is standardization or feature scaling?

A

The process of transforming data onto a standard scale.

Standard (z-score) form: subtract the mean, then divide by the standard deviation.
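
The rule above can be sketched in a few lines of NumPy. The matrix `X` is a made-up example with two features on very different scales; each column is standardized independently.

```python
import numpy as np

# Hypothetical data: rows are observations, columns are features.
X = np.array([
    [1.0, 1000.0],
    [2.0, 3000.0],
    [3.0, 2000.0],
])

# Per-feature mean and standard deviation (computed down each column).
mean = X.mean(axis=0)
std = X.std(axis=0)

# Subtract the mean, then divide by the standard deviation.
X_scaled = (X - mean) / std

print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```

After scaling, each column has mean 0 and standard deviation 1, so both features contribute on a comparable scale. A column with zero variance would cause a division by zero and needs to be handled separately.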

10
Q

What is PCA?

A

Principal component analysis

A dimensionality reduction technique that combines several variables into a smaller number of broader (latent) variables.
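
As a rough illustration, PCA can be computed from the singular value decomposition of the centered data. The sketch below builds synthetic data in which three observed variables are driven by one hidden variable plus noise; the data, the weights, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: one latent variable drives three observed variables.
latent = rng.normal(size=(200, 1))
weights = np.array([[1.0, 2.0, -1.0]])
noise = 0.1 * rng.normal(size=(200, 3))
X = latent @ weights + noise

# PCA works on centered data.
X_centered = X - X.mean(axis=0)

# Rows of Vt are the principal directions, ordered by explained variance.
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project onto the first principal direction: three variables become one.
scores = X_centered @ Vt[:1].T

print(explained_variance_ratio, scores.shape)
```

Because the three columns share a single underlying driver, the first component captures nearly all of the variance, which is the situation where replacing several variables with one latent variable loses little information.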

11
Q

What is Whitening?

A

Often performed after PCA.

Removes most of the underlying correlation between variables.

Useful when the data should be uncorrelated.
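
One way to whiten data, sketched here under the assumption that PCA whitening is meant, is to rotate the centered data onto the eigenvectors of its covariance matrix and then rescale each direction to unit variance. The mixing matrix `A` and the synthetic data are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic correlated data: independent noise mixed by a hypothetical matrix A.
A = np.array([[2.0, 0.0],
              [1.5, 0.5]])
X = rng.normal(size=(500, 2)) @ A.T

# Center the data and estimate its covariance (columns are variables).
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# Eigen-decomposition of the symmetric covariance matrix.
eigvals, eigvecs = np.linalg.eigh(cov)

# Rotate onto the principal axes, then scale each axis to unit variance.
X_white = (X_centered @ eigvecs) / np.sqrt(eigvals)

# The covariance of the whitened data should be close to the identity matrix.
cov_white = np.cov(X_white, rowvar=False)
print(np.round(cov_white, 6))
```

In this sketch the sample correlation is removed entirely on the data used to fit the transform; on new data drawn from the same source it would only be approximately removed. Very small eigenvalues make the division unstable, so implementations often add a small constant before taking the square root.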

12
Q

What are two methods for encoding categorical data?

A

One-hot encoding

Binary encoding
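
The two encodings can be compared in a short NumPy sketch. The category list is made up, and the binary scheme shown (assign each category an integer code, then split that code into bits) is one common interpretation rather than the only one.

```python
import numpy as np

# Hypothetical categorical column.
categories = np.array(["bread", "yogurt", "muffin", "bread"])

# Map each category to an integer code (labels come back sorted).
labels, codes = np.unique(categories, return_inverse=True)

# One-hot encoding: one column per category, a single 1 in each row.
one_hot = np.eye(len(labels), dtype=int)[codes]

# Binary encoding: write each integer code in base 2, one column per bit.
n_bits = max(1, int(np.ceil(np.log2(len(labels)))))
bit_positions = np.arange(n_bits)[::-1]          # most significant bit first
binary = (codes[:, None] >> bit_positions) & 1

print(one_hot)
print(binary)
```

With three categories, one-hot encoding needs three columns while binary encoding needs two; the gap widens quickly as the number of categories grows, at the cost of binary columns that are harder to interpret individually.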

13
Q

What is the one big problem with one-hot encoding?

A

It requires many new variables, roughly one per category.
