How to Make Distributions More Gaussian Flashcards

Question 1

Q

WHEN ARE TRANSFORMATIONS FOR MAKING A DISTRIBUTION MORE GAUSSIAN MOST EFFECTIVE? P273

Answer

A

These transforms are most effective when the data distribution is nearly Gaussian to begin with but is afflicted by skew or outliers.

Question 2

Q

WHY IS IT BETTER TO HAVE A GAUSSIAN DISTRIBUTION FOR REAL-VALUED FEATURES? P273

Answer

A

Some algorithms, like linear regression and logistic regression, explicitly assume the real-valued variables have a Gaussian distribution. Other nonlinear algorithms may not make this assumption, yet often perform better when variables have a Gaussian distribution.

Question 3

Q

WHAT ARE POWER TRANSFORMS? P273

Answer

A

Power transforms are a class of techniques that use a power function (such as a logarithm or exponent) to make the probability distribution of a variable Gaussian or more Gaussian-like.
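
To make the idea of a power function concrete, here is a minimal sketch of the two standard power transforms, Box-Cox and Yeo-Johnson, for a single value and a fixed lambda. The function names and sample values are illustrative assumptions, not taken from the source, and in practice a library estimates lambda from the data.

```python
import math


def box_cox(x, lmbda):
    """Box-Cox transform of one strictly positive value for a fixed lambda."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lmbda == 0:
        return math.log(x)  # lambda = 0 is the log transform
    return (x ** lmbda - 1.0) / lmbda


def yeo_johnson(x, lmbda):
    """Yeo-Johnson transform; also defined for zero and negative values."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


# Made-up, right-skewed values: lambda = 0 (log) and lambda = 0.5 (square root)
# both compress the long right tail.
values = [1.0, 10.0, 100.0, 1000.0]
logged = [box_cox(v, 0.0) for v in values]
rooted = [box_cox(v, 0.5) for v in values]
print(logged)
print(rooted)
```

With lambda = 1 both transforms leave the shape of the data unchanged (Box-Cox only shifts it by 1), which is why lambda is treated as a tunable parameter.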

Question 4

Q

WHAT HAPPENS IF WE SET THE STANDARDIZE PARAMETER OF POWERTRANSFORMER IN SCIKIT-LEARN TO FALSE? P274

Answer

A

The power transform is still applied, but the output is not standardized afterwards, so it won’t be rescaled to a mean of 0 and a standard deviation of 1.
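
A small sketch of the difference, assuming scikit-learn and NumPy are installed. The synthetic log-normal data and the variable names are assumptions made for illustration only.

```python
import numpy as np
from sklearn.preprocessing import PowerTransformer

# Made-up, right-skewed data (one feature, 500 rows).
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))

# Same transform, with and without the final standardization step.
pt_std = PowerTransformer(method="yeo-johnson", standardize=True)
pt_raw = PowerTransformer(method="yeo-johnson", standardize=False)

X_std = pt_std.fit_transform(X)
X_raw = pt_raw.fit_transform(X)

print("standardize=True  mean/std:", X_std.mean(), X_std.std())
print("standardize=False mean/std:", X_raw.mean(), X_raw.std())
```

Both transformers estimate the same lambda; only the rescaling of the output differs.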

Question 5

Q

WHAT CAN WE DO IF THERE ARE 0 OR NEGATIVE VALUES IN THE DATA, TO BE ABLE TO USE THE BOX-COX TRANSFORMATION? P280

Answer

A

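We can first use a MinMaxScaler with a strictly positive output range, for example feature_range=(1, 2), and then apply the Box-Cox transform. The default range of (0, 1) is not enough, because the minimum value maps to 0.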
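
One way this could look in scikit-learn, as a sketch rather than a definitive recipe. The synthetic data, the step names, and the choice of feature_range=(1, 2) are illustrative assumptions; any strictly positive range would serve.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, PowerTransformer

# Made-up skewed data that contains zero or negative values.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(200, 1)) - 1.0

# Box-Cox alone rejects non-positive input.
try:
    PowerTransformer(method="box-cox").fit(X)
    box_cox_failed = False
except ValueError:
    box_cox_failed = True

# Shift the data into [1, 2] first, then apply Box-Cox.
pipeline = Pipeline(steps=[
    ("scale", MinMaxScaler(feature_range=(1, 2))),
    ("power", PowerTransformer(method="box-cox")),
])
X_bc = pipeline.fit_transform(X)
X_scaled = pipeline.named_steps["scale"].transform(X)

print("Box-Cox on raw data failed:", box_cox_failed)
print("scaled range:", X_scaled.min(), X_scaled.max())
```

Wrapping both steps in a Pipeline keeps the scaler and the transform fitted on the same training data when the pipeline is later used inside cross-validation.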

Question 6

Q

CAN USING YEO-JOHNSON AND SCALER+BOX-COX RESULT IN DIFFERENT SCORES? P285

Question 7

Q

WHAT CAN SOMETIMES LIFT PERFORMANCE WHEN USING THE YEO-JOHNSON POWER TRANSFORM? P285

Answer

A

Sometimes a lift in performance can be achieved by standardizing the raw dataset before performing the Yeo-Johnson transform.
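
A minimal sketch of that ordering in scikit-learn, assuming a StandardScaler is used for the standardization step. The synthetic data and step names are illustrative assumptions, and whether this actually improves a model depends on the dataset.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# Made-up skewed data with two features on very different scales.
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.lognormal(mean=0.0, sigma=1.0, size=300),
    1000.0 * rng.exponential(scale=2.0, size=300),
])

# Standardize the raw data first, then apply Yeo-Johnson.
pipeline = Pipeline(steps=[
    ("standardize", StandardScaler()),
    ("power", PowerTransformer(method="yeo-johnson")),
])
X_yj = pipeline.fit_transform(X)

print("column means:", X_yj.mean(axis=0))
print("column stds: ", X_yj.std(axis=0))
```

Yeo-Johnson accepts the negative values that standardization produces, so no extra shift is needed between the two steps.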
