How to Transform Numerical to Categorical Data: Suitable for Highly Skewed or Non-Standard Distributions Flashcards

1
Q

WHAT DO DISCRETIZATION TRANSFORMS DO? P303

A

Discretization transforms are a technique for transforming numerical input or output variables so that they have:
Discrete ordinal labels
A different data distribution

2
Q

WHICH SCIKIT-LEARN CLASS DO WE USE TO CHANGE THE STRUCTURE AND DISTRIBUTION OF NUMERIC VARIABLES INTO CATEGORICAL ONES TO IMPROVE THE PERFORMANCE OF PREDICTIVE MODELS? P303

A

KBinsDiscretizer (from scikit-learn's sklearn.preprocessing module)
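A minimal sketch of the basic workflow, assuming scikit-learn and NumPy are installed; the exponential sample is synthetic and only for illustration. KBinsDiscretizer expects a 2D array (rows are samples, columns are features) and, with encode="ordinal", replaces each value with the index of the bin it falls in.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Illustrative right-skewed variable (synthetic data, not from the book)
rng = np.random.default_rng(0)
X = rng.exponential(scale=2.0, size=(1000, 1))  # 2D: 1000 rows, 1 feature

# Replace each value by the index (0..4) of the equal-width bin it falls in
disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="uniform")
X_binned = disc.fit_transform(X)

print(np.unique(X_binned))  # the discrete ordinal labels
print(disc.bin_edges_[0])   # 6 edges delimiting the 5 bins of feature 0
```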

3
Q

WHAT ARE 3 COMMON METHODS WE CAN USE FOR GROUPING VALUES INTO K DISCRETE BINS? P305

A

• Uniform: all bins in each feature have identical widths.
• Quantile: all bins in each feature have approximately the same number of points.
• K-means: a 1-D k-means clustering is run on each feature, and values that share the same nearest cluster center go into the same bin.
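To compare the three strategies side by side, here is a sketch that fits each one to the same synthetic log-normal sample (assumed data, chosen only because it is skewed) and prints the bin edges and the number of samples per bin. Uniform edges are equally spaced, quantile counts are roughly equal, and k-means edges follow the clusters in the data.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Illustrative log-normal (right-skewed) sample
rng = np.random.default_rng(1)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(1000, 1))

results = {}
for strategy in ("uniform", "quantile", "kmeans"):
    disc = KBinsDiscretizer(n_bins=4, encode="ordinal", strategy=strategy)
    labels = disc.fit_transform(X).ravel().astype(int)
    counts = np.bincount(labels, minlength=4)  # number of samples per bin
    results[strategy] = (disc.bin_edges_[0], counts)
    print(f"{strategy:>8}: edges={np.round(disc.bin_edges_[0], 2)} counts={counts}")
```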

4
Q

WHAT VALUES CAN WE USE FOR THE ‘STRATEGY’ PARAMETER OF KBINSDISCRETIZER? P305

A

"uniform", "quantile" and "kmeans" (the default is "quantile").

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

WHAT DOES THE N_BINS PARAMETER OF KBINSDISCRETIZER CONTROL, AND WHAT IS IT SET BASED ON? P305

A

It controls the number of bins that will be created.
It must be set based on the choice of strategy.

6
Q

HOW IS THE N_BINS CHOSEN FOR DIFFERENT STRATEGIES IN KBINSDISCRETIZER? P305

A

Uniform: flexible.
Quantile: fewer bins than the number of observations, or a number that gives sensible percentiles.
K-means: a number of clusters that can reasonably be found in the data.
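The quantile constraint matters in practice: if n_bins asks for more quantiles than the data can separate (for example because of many tied values), several edges coincide. The sketch below assumes a recent scikit-learn, which drops bins narrower than about 1e-8 for the "quantile" and "kmeans" strategies and emits a warning; the tied data set is made up for illustration.

```python
import warnings
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Illustrative data with heavy ties: 90 zeros followed by the values 1..10
X = np.concatenate([np.zeros(90), np.arange(1, 11)]).reshape(-1, 1)

disc = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="quantile")
with warnings.catch_warnings():
    # scikit-learn warns when it removes bins that are too narrow
    warnings.simplefilter("ignore")
    disc.fit(X)

# Most requested quantile edges coincide at 0, so the zero-width bins
# are removed and far fewer than 10 bins remain.
print("requested: 10, kept:", disc.n_bins_[0])
print("edges:", disc.bin_edges_[0])
```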

7
Q

WHAT DOES THE ENCODE ARGUMENT CONTROL IN KBINSDISCRETIZER? P305

A

It controls whether the transform maps each value to an integer label (encode="ordinal") or to a one-hot encoding (encode="onehot", which returns a sparse matrix, or "onehot-dense").
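A short sketch contrasting the two encodings on a handful of made-up values; "onehot-dense" is used instead of "onehot" only so the result prints as a regular NumPy array rather than a sparse matrix. Each one-hot row has a single 1 in the column of the bin whose index the ordinal encoding reports.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(2)
X = rng.normal(size=(8, 1))  # illustrative data

ordinal = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="uniform")
onehot = KBinsDiscretizer(n_bins=3, encode="onehot-dense", strategy="uniform")

X_ord = ordinal.fit_transform(X)  # shape (8, 1): one integer label per row
X_hot = onehot.fit_transform(X)   # shape (8, 3): one 0/1 column per bin

print(X_ord.ravel())
print(X_hot)
```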

8
Q

WHICH METHOD OF ENCODING IS PREFERRED IN KBINSDISCRETIZER? P305

A

Ordinal, in almost all cases.

9
Q

WHEN DO WE USE “ONEHOT” FOR THE ENCODE PARAMETER OF KBINSDISCRETIZER? P305

A

When we want the model to be able to learn non-ordinal relationships between the groups, for example with the k-means clustering strategy.

10
Q

WHAT SORT OF RELATIONSHIPS CAN A MODEL LEARN WHEN WE USE ONEHOT ENCODING? P305

A

Non-ordinal relationships between the groups (bins)
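To make "non-ordinal" concrete, this sketch uses a synthetic target that is 1 only in the middle bin and fits scikit-learn's LinearRegression (an arbitrary choice of model) on both encodings. A single ordinal column forces a monotonic fit, while one column per bin lets the model give each group its own coefficient.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import KBinsDiscretizer

# Illustrative data on [0, 3]; the endpoints are added so the three
# equal-width bins are exactly [0, 1), [1, 2) and [2, 3].
rng = np.random.default_rng(3)
x = np.concatenate([[0.0, 3.0], rng.uniform(0.0, 3.0, size=600)])
X = x.reshape(-1, 1)
# Target is 1 only in the middle bin: a non-monotonic, non-ordinal pattern
y = ((x >= 1.0) & (x < 2.0)).astype(float)

scores = {}
for encode in ("ordinal", "onehot-dense"):
    Xt = KBinsDiscretizer(n_bins=3, encode=encode, strategy="uniform").fit_transform(X)
    # R^2 of a linear model on its own training data
    scores[encode] = LinearRegression().fit(Xt, y).score(Xt, y)
    print(f"{encode:>12}: R^2 = {scores[encode]:.3f}")
```

The ordinal fit scores close to zero, whereas the one-hot fit reproduces the bump.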

11
Q

WHICH RANGE IS SUITABLE FOR N_BINS WITH THE K-MEANS STRATEGY IN KBINSDISCRETIZER? P312

A

3-5, unless the empirical distribution of the variable is complex.
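A sketch of why a small number of clusters is usually enough: the synthetic variable below (an assumption for illustration) has three well-separated modes, so n_bins=3 lets the k-means strategy place the inner edges, which are midpoints between adjacent 1-D cluster centers, in the gaps between the modes.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Illustrative variable with three well-separated modes (around 0, 10 and 20)
rng = np.random.default_rng(4)
X = np.concatenate([rng.normal(loc, 1.0, size=300) for loc in (0.0, 10.0, 20.0)]).reshape(-1, 1)

disc = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy="kmeans")
labels = disc.fit_transform(X).ravel().astype(int)

# Inner edges sit halfway between the cluster centers (roughly 5 and 15 here)
print(np.round(disc.bin_edges_[0], 2))
print(np.bincount(labels))  # about 300 samples per bin
```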

12
Q

DOES THE UNIFORM DISCRETIZATION TRANSFORM CHANGE THE PROBABILITY DISTRIBUTION? P309

A

No. My note: because all the bin widths are the same, the transform is like building a histogram, so the shape of the original distribution is preserved.
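The histogram analogy can be checked directly. This sketch, on synthetic exponential data, counts how many samples receive each uniform-strategy label and compares that with np.histogram over the same edges; the counts should agree (up to rare floating-point ties at an edge), so the skewed shape survives the transform.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

rng = np.random.default_rng(5)
X = rng.exponential(scale=1.0, size=(2000, 1))  # illustrative skewed data

disc = KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="uniform")
labels = disc.fit_transform(X).ravel().astype(int)

# Samples per label vs. a 10-bin histogram of the raw values on the same edges
bin_counts = np.bincount(labels, minlength=10)
hist_counts, _ = np.histogram(X[:, 0], bins=disc.bin_edges_[0])

print(bin_counts)
print(hist_counts)
```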

13
Q

WHAT RANGE IS SUITABLE FOR N_BINS WITH THE QUANTILE STRATEGY OF THE KBINSDISCRETIZER TRANSFORM? P315

A

5-10. Unless there is a large number of observations or a complex empirical distribution, this number should be kept small.
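One way to pick a value within that range is to evaluate a few candidates with cross-validation. In this sketch, make_classification stands in for a real data set and LogisticRegression for the predictive model; both are arbitrary assumptions, and the discretizer sits inside a Pipeline so the bins are fit on each training fold only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic classification data (a stand-in for a real data set)
X, y = make_classification(n_samples=500, n_features=5, random_state=1)

mean_acc = {}
for n_bins in (5, 7, 10):
    model = Pipeline([
        ("kbd", KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy="quantile")),
        ("lr", LogisticRegression(max_iter=1000)),
    ])
    # Mean accuracy over 5 folds for this number of bins
    mean_acc[n_bins] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"n_bins={n_bins:>2}: mean accuracy {mean_acc[n_bins]:.3f}")
```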
