How to Mark and Remove Missing Data Flashcards

1
Q

WHAT INDICATES MISSING VALUES IN THE OUTPUT OF “.DESCRIBE()”? WHAT COMMAND CAN WE USE TO MARK THEM? P85,86

A

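In the .describe() summary, missing values often show up as out-of-range entries, e.g., a minimum of 0 for a column that cannot legitimately be 0.
We can mark them as NaN with dataset[cols] = dataset[cols].replace(0, nan), where nan is imported from numpy (replace() returns a new object, so the result must be assigned back).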
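A small pandas sketch of both steps, using hypothetical toy columns (blood_pressure, bmi) that cannot legitimately be 0:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: neither column can legitimately be 0.
df = pd.DataFrame({
    "blood_pressure": [72.0, 0.0, 64.0, 66.0],
    "bmi": [33.6, 26.6, 0.0, 28.1],
})

# A minimum of 0 in describe() hints at hidden missing values.
print(df.describe().loc["min"])

cols = ["blood_pressure", "bmi"]
# replace() returns a new frame, so assign the result back.
df[cols] = df[cols].replace(0, np.nan)
print(df.isnull().sum())  # one missing value per column
```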

2
Q

CAN WE COUNT MISSING VALUES AS A CATEGORY IN DISCRETE (CATEGORICAL) FEATURES? P86

A

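Yes. A missing value in a categorical feature can be given its own label and treated as a separate category.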
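A minimal pandas sketch, assuming a hypothetical categorical column and the label "missing" for the new category:

```python
import pandas as pd

# Hypothetical categorical feature with missing entries.
colors = pd.Series(["red", None, "blue", "red", None])

# Treat missing entries as just another category.
colors_filled = colors.fillna("missing")
print(colors_filled.value_counts())
```

Inside a scikit-learn pipeline, SimpleImputer(strategy="constant", fill_value="missing") achieves the same effect.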

3
Q

WHAT IS STATISTICAL IMPUTATION? P92

A

Calculating a statistic for each column (such as the mean) and replacing all missing values in that column with that statistic.
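A short sketch with scikit-learn's SimpleImputer on a hypothetical toy matrix, using the mean as the statistic:

```python
import numpy as np
from sklearn.impute import SimpleImputer

X_train = np.array([[1.0, np.nan],
                    [3.0, 4.0],
                    [np.nan, 8.0]])

# strategy can also be "median", "most_frequent" or "constant".
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X_train)
print(imputer.statistics_)  # column means: [2. 6.]
print(X_filled)
```

In practice the imputer is fit on the training data only and its statistics_ are reused to transform the test data, so no information leaks from the test set.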

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

FOR IMPUTATION USING KNN, WHAT PARAMETERS SHOULD WE SELECT? P104

A

1-Distance measure
2-Number of contributing neighbors for each prediction (k parameter)
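To make the two choices concrete, here is a hand-rolled sketch (the helper knn_fill and the toy matrix are hypothetical) that uses scikit-learn's nan_euclidean_distances as the distance measure and averages the k closest rows that have a value in the target column:

```python
import numpy as np
from sklearn.metrics.pairwise import nan_euclidean_distances

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])


def knn_fill(X, row, col, k):
    """Hypothetical helper: estimate X[row, col] from its k nearest donors."""
    donors = np.where(~np.isnan(X[:, col]))[0]  # rows with a value in this column
    donors = donors[donors != row]
    # Distance measure: Euclidean distance that skips missing coordinates.
    d = nan_euclidean_distances(X[[row]], X[donors])[0]
    nearest = donors[np.argsort(d)[:k]]  # k parameter: how many neighbors contribute
    return X[nearest, col].mean()


for k in (1, 2, 3):
    print(k, knn_fill(X, row=0, col=2, k=k))
```

Changing k changes the estimate (3.0, 4.0 and 5.0 here), and a different distance measure could change which rows count as nearest.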

5
Q

WHAT IS THE “NA_VALUES” PARAMETER IN PD.READ_CSV? P105

A

It lets us specify the character (or string) that marks missing values in the file; pandas reads those entries as NaN.
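A minimal sketch with an in-memory CSV (hypothetical data), assuming "?" is the missing-value marker:

```python
import io

import pandas as pd

csv_text = "a,b\n1,?\n?,4\n5,6\n"

# Every "?" in the file is read as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values="?")
print(df.isnull().sum())  # one missing value in each column
```

Without na_values, the "?" entries would stay as strings and both columns would be read with object dtype instead of float.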

6
Q

WHICH CLASS IN SCIKIT-LEARN SUPPORTS KNN IMPUTATION? P106

A

KNNImputer
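A short usage sketch of KNNImputer (from sklearn.impute) on a hypothetical toy matrix, spelling out the parameters for k and the distance measure:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])

# n_neighbors is k; "nan_euclidean" is already the default metric.
imputer = KNNImputer(n_neighbors=2, weights="uniform", metric="nan_euclidean")
X_filled = imputer.fit_transform(X)
print(X_filled)
```

Each missing entry becomes the mean of that column over the two nearest rows that have a value there: 4.0 for row 0 and 5.5 for row 2 in this toy data.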

7
Q

HOW DOES ITERATIVE IMPUTATION WORK? P115

A

Iterative imputation models each feature that has missing values as a function of the other features, e.g., as a regression problem in which the missing values are the predictions. Features are imputed sequentially, one after another, so values imputed earlier can be used as inputs when modeling later features. The process is iterative because this whole cycle is repeated several times, and each repetition refines the estimates of the missing values across all features.
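A minimal from-scratch sketch of that loop, assuming an initial mean fill and scikit-learn's LinearRegression as the per-feature model. The helper iterative_impute is hypothetical and is not how IterativeImputer is implemented internally (its default estimator is BayesianRidge):

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def iterative_impute(X, n_iter=10):
    """Hypothetical helper: chained-equations imputation with linear models."""
    X = np.asarray(X, dtype=float).copy()
    mask = np.isnan(X)  # remember where the original gaps are
    # Start from a mean fill so every model has complete inputs.
    col_means = np.nanmean(X, axis=0)
    X[mask] = np.take(col_means, np.where(mask)[1])
    for _ in range(n_iter):  # repeat the whole cycle several times
        for j in range(X.shape[1]):  # impute features one after another
            miss = mask[:, j]
            if not miss.any():
                continue
            others = np.delete(X, j, axis=1)  # includes earlier imputed values
            model = LinearRegression().fit(others[~miss], X[~miss, j])
            X[miss, j] = model.predict(others[miss])
    return X


data = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 6.0],
                 [4.0, np.nan]])
print(iterative_impute(data)[3, 1])  # approximately 8.0
```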

8
Q

WHAT ARE OTHER NAMES FOR THE ITERATIVE IMPUTATION APPROACH? P115

A

Fully Conditional Specification (FCS)
Multivariate Imputation by Chained Equations (MICE)

9
Q

HOW MANY ITERATIONS ARE USUALLY ENOUGH FOR ITERATIVE IMPUTATION? P115

A

A small number, typically 10-20.

10
Q

WHAT KIND OF ALGORITHMS ARE OFTEN USED FOR ITERATIVE IMPUTATION? P115

A

Linear models, for their simplicity.
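A short sketch of plugging a linear model into scikit-learn's IterativeImputer through its estimator parameter; the toy data is hypothetical, and LinearRegression is swapped in for the default BayesianRidge (also a linear model):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, np.nan]])

# Pass a plain linear regression as the per-feature model.
imputer = IterativeImputer(estimator=LinearRegression(), max_iter=10, random_state=0)
X_filled = imputer.fit_transform(X)
print(X_filled[3, 1])  # close to 8.0, since column 1 = 2 * column 0
```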

11
Q

IN WHAT ORDER CAN WE PROCESS THE FEATURES WITH MISSING VALUES? P115

A

The order in which features are processed can be chosen deliberately, for example from the feature with the fewest missing values to the feature with the most.
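A plain NumPy sketch of deriving such an order by counting missing values per column; the toy matrix is hypothetical:

```python
import numpy as np

X = np.array([[np.nan, 1.0, np.nan],
              [np.nan, 2.0, np.nan],
              [np.nan, np.nan, 3.0],
              [4.0, 5.0, 6.0]])

missing_counts = np.isnan(X).sum(axis=0)  # [3, 1, 2]

# Fewest missing values first, then the reverse.
ascending = np.argsort(missing_counts, kind="stable")
descending = np.argsort(-missing_counts, kind="stable")
print(ascending, descending)  # [1 2 0] [0 2 1]
```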

12
Q

WHAT SHOULD WE IMPORT BEFORE USING ITERATIVEIMPUTER? P118

A

from sklearn.experimental import enable_iterative_imputer
from sklearn.impute import IterativeImputer

13
Q

WHAT ARE THE DIFFERENT IMPUTATION ORDERS IN ITERATIVE IMPUTATION? P121

A

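Ascending, descending, right-to-left (Arabic), left-to-right (Roman), and random, set via the imputation_order parameter ("ascending", "descending", "arabic", "roman", "random").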
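A sketch that runs IterativeImputer once per imputation_order value on hypothetical synthetic data and records the feature order of the first round (read from the fitted imputation_sequence_, whose entries start with the feature index):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X = rng.normal(size=(50, 4))
X[:, 1] += X[:, 0]  # give the imputer some correlation to learn from
X[rng.uniform(size=X.shape) < 0.1] = np.nan  # knock out about 10% of entries

results, orders = {}, {}
for order in ["ascending", "descending", "roman", "arabic", "random"]:
    imputer = IterativeImputer(imputation_order=order, random_state=0)
    results[order] = imputer.fit_transform(X)
    # First round: one (feat_idx, neighbor_feat_idx, estimator) entry per feature.
    orders[order] = [int(step[0]) for step in imputer.imputation_sequence_[: X.shape[1]]]
    print(order, orders[order])
```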

14
Q

WHICH PARAMETER OF ITERATIVEIMPUTER SETS THE NUMBER OF ITERATIONS? P122

A

max_iter
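A small sketch on hypothetical synthetic data: max_iter caps the number of imputation rounds, and the fitted n_iter_ attribute reports how many rounds actually ran (fewer than max_iter if the tol-based stopping criterion was met first):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(1)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=100)
X[rng.uniform(size=X.shape) < 0.15] = np.nan

# Allow up to 20 rounds; tol lets the imputer stop earlier.
imputer = IterativeImputer(max_iter=20, tol=1e-3, random_state=0)
X_filled = imputer.fit_transform(X)
print("rounds used:", imputer.n_iter_)
```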
