How to Scale Numerical Data Flashcards

1
Q

WHICH TYPES OF ALGORITHMS BENEFIT FROM SCALING NUMERICAL VALUES TO A STANDARD RANGE? P230

A

Algorithms that use a weighted sum of the input (e.g. Linear Regression, ANN)
Algorithms that use distance measures (e.g. K-nearest neighbors, SVM)
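To see why distance-based algorithms are sensitive to scale, here is a minimal sketch in plain Python. The two features (age in years, income in dollars), the sample points, and the assumed feature ranges are all made up for illustration.

```python
import math

# Two hypothetical observations: (age in years, income in dollars).
a = (25.0, 50_000.0)
b = (60.0, 51_000.0)

# On the raw values, the income difference (1000) swamps the age difference (35).
raw_distance = math.dist(a, b)

# Assumed feature ranges, chosen only for this illustration.
age_min, age_max = 0.0, 100.0
income_min, income_max = 0.0, 100_000.0

def scale(point):
    """Min-max scale one (age, income) point with the assumed ranges."""
    age, income = point
    return (
        (age - age_min) / (age_max - age_min),
        (income - income_min) / (income_max - income_min),
    )

# After scaling, both features contribute on a comparable footing.
scaled_distance = math.dist(scale(a), scale(b))

print(raw_distance, scaled_distance)
```

Before scaling, the distance is driven almost entirely by income; after scaling, the large age gap is what separates the two points.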

2
Q

WHAT ARE THE TWO MOST POPULAR TECHNIQUES FOR SCALING NUMERICAL DATA? P230

A

Normalization

Standardization

3
Q

WHAT DOES NORMALIZATION DO? P230

A

Scales each input variable separately to the range 0-1

4
Q

WHY DO WE CHOOSE THE RANGE 0-1 FOR NORMALIZING? P230

A

Floating-point values have the most precision in this range

5
Q

WHAT DOES STANDARDIZATION DO? P230

A

Scales each input variable separately by subtracting the mean and dividing by the standard deviation (STD), so that the resulting distribution has a mean of 0 and an STD of 1
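The calculation in this answer can be sketched in plain Python. The helper name `standardize` and the sample values are hypothetical, and the sketch assumes the population standard deviation.

```python
from statistics import mean, pstdev

def standardize(values):
    """Subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption here)
    return [(v - mu) / sigma for v in values]

temperatures = [10.0, 20.0, 30.0, 40.0, 50.0]  # made-up sample data
scaled = standardize(temperatures)

# The scaled values are centered on 0 with a standard deviation of 1.
print(mean(scaled), pstdev(scaled))
```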

6
Q

FOR WHICH MODEL IS IT A CRITICAL STEP TO SCALE THE TARGET? P231

A

For regression predictive modeling problems, where scaling the target can make the problem easier to learn, most notably in the case of neural network models

7
Q

WHY DO WE NEED TO KNOW THE RANGE OF VALUES, BEFORE NORMALIZING? P232

A

Because we need all the values to be between 0 and 1, we either divide all values by the maximum value (which works when the minimum is 0) or, more generally, subtract the minimum and divide by the range (maximum value − minimum value).
Normalization formula: y = (x − min) / (max − min)
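As a worked example of the min-max formula, here is a small sketch in plain Python. The function name `normalize` and the price values are hypothetical; the optional `minimum` and `maximum` arguments stand in for a range known ahead of time.

```python
def normalize(values, minimum=None, maximum=None):
    """Apply y = (x - min) / (max - min) to every value."""
    # Fall back to the observed range when no known range is supplied.
    lo = min(values) if minimum is None else minimum
    hi = max(values) if maximum is None else maximum
    return [(v - lo) / (hi - lo) for v in values]

prices = [50.0, 75.0, 100.0, 150.0]  # made-up sample data
scaled = normalize(prices)

print(scaled)
```

The smallest value maps to 0 and the largest to 1, which is why the minimum and maximum must be known, or estimated from the data, before the transform can be applied.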

8
Q

WHAT SHOULD WE DO IF THERE’S AN OBSERVATION ABOVE THE MAXIMUM OR BELOW THE MINIMUM VALUE? P232

A

Either remove such observations or clip them to the pre-defined maximum and minimum values
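Both options can be sketched in a few lines of plain Python. The bounds and the new observations below are invented for illustration, and `clip` is a hypothetical helper rather than a library function.

```python
def clip(value, minimum, maximum):
    """Limit a value to the closed interval [minimum, maximum]."""
    return max(minimum, min(value, maximum))

# Pre-defined range, assumed to have been learned from the training data.
known_min, known_max = 0.0, 100.0
new_observations = [-5.0, 42.0, 130.0]

# Option 1: limit out-of-range observations to the known bounds.
clipped = [clip(v, known_min, known_max) for v in new_observations]

# Option 2: remove out-of-range observations entirely.
kept = [v for v in new_observations if known_min <= v <= known_max]

print(clipped, kept)
```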

9
Q

USING WHICH CLASS CAN WE NORMALIZE DATA IN SCIKIT-LEARN? P232

A

MinMaxScaler
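A minimal usage sketch, assuming NumPy and scikit-learn are installed. The small two-column array is made-up data; each column is scaled independently.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: two input variables on very different scales.
data = np.array([
    [100.0, 0.001],
    [8.0, 0.05],
    [50.0, 0.005],
    [88.0, 0.07],
    [4.0, 0.1],
])

scaler = MinMaxScaler()              # default feature_range is (0, 1)
scaled = scaler.fit_transform(data)  # learn per-column min/max, then scale

print(scaled)
```

In practice the scaler would usually be fit on the training data only and then applied to new data with `transform()`.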

10
Q

USING WHICH METHOD OF MINMAXSCALER CAN WE REVERSE THE TRANSFORMATION? P233

A

inverse_transform()
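A short round-trip sketch, assuming NumPy and scikit-learn are installed and using made-up values: scaling followed by the inverse should recover the original data.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([[10.0], [20.0], [40.0]])  # made-up single-column data

scaler = MinMaxScaler()
scaled = scaler.fit_transform(data)          # values now lie in [0, 1]
restored = scaler.inverse_transform(scaled)  # back to the original units

print(restored)
```

This is useful, for example, when predictions made on a scaled target need to be reported in the original units.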

11
Q

WHAT IS THE ASSUMPTION OF STANDARDIZATION? P234

A

That the observations fit a Gaussian distribution, with a well-behaved mean and STD.

12
Q

WHAT’S ANOTHER NAME FOR STANDARDIZATION? P234

A

Center scaling

13
Q

WHAT IS SCIKIT-LEARN’S CLASS FOR STANDARDIZATION? P234

A

StandardScaler
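A minimal usage sketch, assuming NumPy and scikit-learn are installed. The two-column array is made-up data; each column is standardized independently.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: two input variables on different scales.
data = np.array([
    [10.0, 0.1],
    [20.0, 0.2],
    [30.0, 0.3],
    [40.0, 0.4],
])

scaler = StandardScaler()
scaled = scaler.fit_transform(data)  # learn per-column mean/std, then scale

# Each column now has mean 0 and standard deviation 1.
print(scaled.mean(axis=0), scaled.std(axis=0))
```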

14
Q

WHEN IS IT BETTER TO NORMALIZE AND WHEN IS IT BETTER TO STANDARDIZE? P244

A

Whether input variables require scaling depends on the specifics of your problem and of each variable. You may have a sequence of quantities as inputs, such as prices or temperatures. If the distribution of the quantity is normal, it should be standardized; otherwise, it should be normalized. This applies whether the range of quantity values is large (10s, 100s, etc.) or small (0.01, 0.0001).
