Preparing Data for Feature Engineering and Machine Learning in Microsoft Azure Flashcards

1
Q

What issue could you face with a credit card fraud detection dataset?

Problem of outliers

Problem of high-dimensionality

Problem of imbalanced data

Multicollinearity problem

A

Problem of imbalanced data
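Why imbalance matters here can be shown with a minimal sketch. The class counts below (990 legitimate, 10 fraudulent transactions) are made-up numbers for illustration, not figures from any real dataset. A "model" that always predicts the majority class looks accurate while catching no fraud at all.

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate (990 legitimate, 10 fraud).
labels = [0] * 990 + [1] * 10

# A trivial "model" that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: share of all transactions labelled correctly.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the fraud class: share of actual fraud cases that were caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
fraud_recall = true_positives / sum(labels)

print(accuracy)      # 0.99
print(fraud_recall)  # 0.0
```

This is why accuracy alone is usually a poor metric for fraud detection, and why recall or precision on the minority class is worth checking.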

2
Q

What happens when we increase the amount of data for a machine learning problem?

A. The training accuracy increases, test accuracy decreases

B. The training accuracy increases, test accuracy increases

C. The training accuracy decreases, test accuracy decreases

D. The training accuracy decreases, test accuracy increases

A

D. The training accuracy decreases, test accuracy increases

3
Q

Under which missingness assumption can you delete the records that have missing values?

Missing at Random

Missing Completely at Random

Either of MCAR or MAR

Missing not at Random

A

Missing Completely at Random

4
Q

Which is the best method for handling missing data when the feature has outliers?

Mode imputation

Mean imputation

Listwise deletion

Median imputation

A

Median imputation
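A small sketch of why the median is preferred when outliers are present. The `incomes` values are invented for illustration: one extreme value pulls the mean far from the typical value, while the median stays put.

```python
import statistics

# Hypothetical feature values with one extreme outlier and one missing entry.
incomes = [30, 32, 35, 31, None, 1000]
observed = [x for x in incomes if x is not None]

# The outlier (1000) drags the mean upward; the median is barely affected.
mean_fill = statistics.mean(observed)
median_fill = statistics.median(observed)

# Replace the missing entry with the median of the observed values.
imputed = [median_fill if x is None else x for x in incomes]

print(mean_fill)    # 225.6
print(median_fill)  # 32
print(imputed)      # [30, 32, 35, 31, 32, 1000]
```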

5
Q

Which of the following machine learning models does not have a target value?

Clustering

Anomaly detection

Regression

Classification

A

Clustering

6
Q

Which of the following machine learning models has a continuous value as its target?

Regression

Classification

Anomaly detection

Clustering

A

Regression

7
Q

Which of the following is the BEST way to create features for high-cardinality categorical data?

One-hot encoding

Learning with counts

Dummy coding

Binning

A

Learning with counts
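The idea behind count-based features can be sketched in plain Python. This is an illustration of the general technique, not the Azure module's implementation; the `rows` data and the `count_features` helper are hypothetical. Each category value is replaced by a fixed number of class-count columns, so the feature width does not grow with the number of distinct values the way one-hot encoding does.

```python
from collections import defaultdict

# Hypothetical (zip_code, label) rows; label 1 = positive class.
rows = [
    ("98101", 1), ("98101", 0), ("98101", 1),
    ("10001", 0), ("10001", 0),
    ("60601", 1),
]

# value -> [count of label 0, count of label 1]
counts = defaultdict(lambda: [0, 0])
for value, label in rows:
    counts[value][label] += 1

def count_features(value):
    """Return (negatives, positives) for a category; unseen values get (0, 0)."""
    negatives, positives = counts.get(value, [0, 0])
    return negatives, positives

print(count_features("98101"))  # (1, 2)
print(count_features("10001"))  # (2, 0)
print(count_features("00000"))  # (0, 0)
```

In practice the counts should be built on data separate from the rows being featurized, otherwise the labels leak into the features.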

8
Q

Which of the following is a disadvantage of linear models?

They run slower

They are not scalable

They may not give accurate predictions

They are harder to train

A

They may not give accurate predictions

9
Q

What is TRUE about leave-one-out cross-validation?

It produces low bias and high variance models

It produces low bias and low variance models

It produces high bias and low variance models

It produces high bias and high variance models

A

It produces low bias and high variance models
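The mechanics can be sketched as follows; `leave_one_out_splits` is a hypothetical helper written for this card, not a library function. Each sample is held out exactly once, so every training set uses almost all of the data (hence low bias), while the training sets overlap almost completely and each test set is a single point (which is commonly given as the reason for high variance).

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_indices) with exactly one held-out sample."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]

splits = list(leave_one_out_splits(4))
for train, test in splits:
    print(train, test)
# [1, 2, 3] [0]
# [0, 2, 3] [1]
# [0, 1, 3] [2]
# [0, 1, 2] [3]
```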

10
Q

Suppose you need to create 7 folds for K-fold cross-validation. How would you do it?

Use Partition and Sample module with ‘Assign to folds’ mode

Use Partition and Sample module with ‘Pick folds’ mode

Use Split data module to assign folds

Use the Cross-validate model module

A

Use Partition and Sample module with ‘Assign to folds’ mode
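What assigning rows to folds means can be sketched outside Azure. The `assign_folds` function below is a hypothetical stand-in and does not reproduce the module's behavior or options; it only shows the idea of shuffling the rows and giving each one a fold number from 0 to 6.

```python
import random

def assign_folds(n_rows, n_folds=7, seed=0):
    """Return a list where entry i is the fold number assigned to row i."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # randomize the row order reproducibly
    folds = [0] * n_rows
    for position, row in enumerate(indices):
        folds[row] = position % n_folds   # deal rows out round-robin
    return folds

fold_ids = assign_folds(21)

# Each of the 7 folds serves once as the test set; the other 6 form the training set.
for k in range(7):
    test_rows = [i for i, f in enumerate(fold_ids) if f == k]
    train_rows = [i for i, f in enumerate(fold_ids) if f != k]
    assert len(test_rows) == 3 and len(train_rows) == 18
```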
