How to Make Distributions More Gaussian Flashcards

1
Q

WHEN ARE THE TRANSFORMATIONS FOR MAKING A DISTRIBUTION MORE GAUSSIAN MOST EFFECTIVE? P273

A

These transforms are most effective when the data distribution is nearly Gaussian to begin with and is affected by skew or outliers.

2
Q

WHY IS IT BETTER FOR REAL-VALUED FEATURES TO HAVE A GAUSSIAN DISTRIBUTION? P273

A

Some algorithms, such as linear regression and logistic regression, explicitly assume that real-valued variables have a Gaussian distribution. Other, nonlinear algorithms may not make this assumption, yet they often perform better when variables have a Gaussian distribution.

3
Q

WHAT ARE POWER TRANSFORMS? P273

A

Power transforms are a class of techniques that use a power function (such as a logarithm or an exponent) to make the probability distribution of a variable Gaussian or more Gaussian-like.
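
As a minimal sketch of the idea, the snippet below applies scikit-learn's PowerTransformer to a right-skewed feature and compares skewness before and after. The log-normal sample data and the random seed are illustrative assumptions, not taken from the source.

```python
import numpy as np
from scipy.stats import skew
from sklearn.preprocessing import PowerTransformer

# Illustrative skewed feature: exponentiating Gaussian samples
# gives a right-skewed (log-normal) distribution.
rng = np.random.default_rng(1)
X = np.exp(rng.normal(size=(1000, 1)))

# Yeo-Johnson is PowerTransformer's default method; it fits one lambda per column.
pt = PowerTransformer(method="yeo-johnson")
X_trans = pt.fit_transform(X)

print("skew before:", round(float(skew(X[:, 0])), 2))
print("skew after: ", round(float(skew(X_trans[:, 0])), 2))
print("fitted lambda:", pt.lambdas_)
```

The fitted exponent is exposed as `lambdas_`, which is what makes this a "power" transform: the data is raised to a learned power rather than a fixed one.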

4
Q

WHAT HAPPENS IF WE SET THE STANDARDIZE PARAMETER OF SCIKIT-LEARN’S POWERTRANSFORMER TO FALSE? P274

A

The output is not standardized, so the transformed data will not necessarily have a mean of 0 and a standard deviation of 1. By default, standardize=True.
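
A small sketch contrasting the two settings; the exponential sample data is a hypothetical stand-in for any skewed feature.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

rng = np.random.default_rng(2)
X = rng.exponential(scale=2.0, size=(500, 1))  # illustrative skewed data

# Default (standardize=True): power transform, then zero mean / unit variance.
X_std = PowerTransformer(standardize=True).fit_transform(X)
# standardize=False: only the power transform is applied.
X_raw = PowerTransformer(standardize=False).fit_transform(X)

print("standardize=True :", X_std.mean(), X_std.std())
print("standardize=False:", X_raw.mean(), X_raw.std())
```

With standardize=True the mean is 0 and the standard deviation is 1 up to floating-point error; with standardize=False the values keep whatever location and scale the power transform produces.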

5
Q

WHAT CAN WE DO TO APPLY THE BOX-COX TRANSFORM WHEN THE DATA CONTAINS ZERO OR NEGATIVE VALUES? P280

A

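First scale the data to strictly positive values with a MinMaxScaler, then apply the Box-Cox transform. The scaler needs a strictly positive feature range such as (1, 2), because the default range of (0, 1) still maps the minimum to 0.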
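
A minimal pipeline sketch of this workaround, assuming a small hypothetical feature that contains zero and negative values:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

# Illustrative data with zero and negative values; Box-Cox alone rejects it.
X = np.array([[-3.0], [-1.0], [0.0], [0.5], [2.0], [8.0], [20.0]])

pipeline = Pipeline([
    # Map values into [1, 2] so every input to Box-Cox is strictly positive.
    ("scaler", MinMaxScaler(feature_range=(1, 2))),
    ("power", PowerTransformer(method="box-cox")),
])
X_trans = pipeline.fit_transform(X)
print(X_trans.ravel())
```

Wrapping both steps in a Pipeline keeps the scaler's min/max and the Box-Cox lambda fitted on training data only when the pipeline is later used inside cross-validation.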

6
Q

CAN USING YEO-JOHNSON VERSUS A SCALER FOLLOWED BY BOX-COX RESULT IN DIFFERENT SCORES? P285

A

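Yes. The two approaches transform the inputs differently, so a model trained on each can score differently.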
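
A rough sketch of why the scores can differ: on the same hypothetical skewed data, Yeo-Johnson and MinMaxScaler + Box-Cox produce different transformed features, which a downstream model would then see. The data and seed are assumptions for illustration.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

rng = np.random.default_rng(3)
X = rng.exponential(size=(200, 3))  # illustrative skewed features

# Option 1: Yeo-Johnson directly on the raw data.
yeo = PowerTransformer(method="yeo-johnson").fit_transform(X)

# Option 2: shift into [1, 2] first, then Box-Cox.
box = Pipeline([
    ("scaler", MinMaxScaler(feature_range=(1, 2))),
    ("power", PowerTransformer(method="box-cox")),
]).fit_transform(X)

# Largest element-wise difference between the two transformed datasets.
print(np.abs(yeo - box).max())
```

For non-negative inputs Yeo-Johnson behaves like Box-Cox applied to x + 1, whereas the scaler applies a different shift, so the fitted lambdas and outputs differ.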

7
Q

WHAT CAN SOMETIMES LIFT THE PERFORMANCE OF A YEO-JOHNSON POWER TRANSFORM? P285

A

Performance can sometimes improve if the raw dataset is standardized before the Yeo-Johnson transform is applied.
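
A sketch of how such a comparison might be set up, assuming a synthetic classification dataset (exponentiated to add skew) and a k-nearest-neighbors model; whether standardizing first actually helps depends on the dataset, so no particular result is implied.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Synthetic stand-in dataset; exponentiating introduces skew.
X, y = make_classification(n_samples=300, n_features=5, random_state=1)
X = np.exp(X)

yj_only = Pipeline([
    ("power", PowerTransformer(method="yeo-johnson")),
    ("model", KNeighborsClassifier()),
])
std_then_yj = Pipeline([
    # Standardize the raw data first, then apply Yeo-Johnson.
    ("scaler", StandardScaler()),
    ("power", PowerTransformer(method="yeo-johnson")),
    ("model", KNeighborsClassifier()),
])

results = {}
for name, pipe in [("yeo-johnson", yj_only), ("standardize + yeo-johnson", std_then_yj)]:
    scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name}: {results[name]:.3f}")
```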