Experimental Design for Data Analysis Flashcards

Flashcards in Experimental Design for Data Analysis Deck (10)
1

Which of the following are valid hyperparameters for decision trees?

A: Depth of a tree

B: Min samples per node

C: Max samples per node

D: Length of a tree

B and C only

A and B only

C and D only

A and C only

A and B only

2

What kind of machine learning model can predict whether an email is spam or ham?

Dimensionality reduction

Classification

Regression

Clustering

Classification

3

What is a benefit of Azure Machine Learning Studio?

It allows for the building and training of machine learning models with no code.

It includes pre-trained machine learning models for all to use.

It includes powerful APIs for common machine learning problems.

It allows for the building and training of machine learning models with no code.

4

If you want to prototype your machine learning models in Python on Azure, what service would you choose?

Azure Machine Learning Studio

Azure APIs

Azure Notebooks

Azure Machine Learning Service

Azure Notebooks

5

How is a confusion matrix constructed?

First row = accuracy, recall; Second row = precision, F1-score

Rows = predicted labels; columns = actual labels; cell values = harmonic mean of instance counts for corresponding pair of actual and predicted labels

Rows = actual labels; columns = predicted labels; cell values = instance counts for corresponding pair of actual and predicted labels

Rows = actual labels; columns = predicted labels; cell values = instance counts for corresponding pair of actual and predicted labels
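The layout described in this answer can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the helper name `confusion_matrix` and the spam/ham sample labels are made up for the example.

```python
def confusion_matrix(actual, predicted, labels):
    """Count instances per (actual, predicted) pair.

    Rows are actual labels, columns are predicted labels,
    both ordered as given in `labels`.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical sample data, invented for illustration only.
labels = ["ham", "spam"]
actual = ["spam", "ham", "spam", "ham", "ham", "spam"]
predicted = ["spam", "ham", "ham", "ham", "spam", "spam"]

cm = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, cm):
    print(f"actual={label}: {row}")
```

Each cell is a plain count, so the row for an actual label sums to the number of instances that truly belong to that label.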

6

How is the accuracy of a classifier calculated?

TP/(TP + FN) where TP = number of true positives and FN = number of false negatives

Sum of all diagonal elements from confusion matrix; divide by sum of all elements in confusion matrix

TP/(TP + FP) where TP = number of true positives and FP = number of false positives

Average of all diagonal elements from confusion matrix; divide by average of all elements in confusion matrix

Sum of all diagonal elements from confusion matrix; divide by sum of all elements in confusion matrix
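As a rough worked example, the accuracy formula can be applied to a small confusion matrix with rows as actual labels and columns as predicted labels. The counts below are invented for illustration. The two TP-based formulas among the options are shown as well, since they are recall and precision rather than accuracy.

```python
def accuracy(cm):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


# Hypothetical counts: index 0 = negative class, index 1 = positive class.
cm = [[50, 10],
      [5, 35]]

tp = cm[1][1]  # actual positive, predicted positive
fn = cm[1][0]  # actual positive, predicted negative
fp = cm[0][1]  # actual negative, predicted positive

acc = accuracy(cm)          # (50 + 35) / 100
recall = tp / (tp + fn)     # TP / (TP + FN)
precision = tp / (tp + fp)  # TP / (TP + FP)

print(f"accuracy={acc:.3f} recall={recall:.3f} precision={precision:.3f}")
```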

7

What is the best definition of hyperparameters in a machine learning model?

Model configuration parameters that learn from test data

Model inputs that train models

Model parameters that learn from training data

Design parameters of the machine learning algorithm that stay constant during training

Design parameters of the machine learning algorithm that stay constant during training
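To make the distinction concrete, here is a small sketch that fits a one-parameter model with gradient descent. The function `fit_slope` and its data are hypothetical. `learning_rate` and `epochs` are hyperparameters: they are chosen before training and stay fixed. The weight `w` is a model parameter: it is learned from the training data.

```python
def fit_slope(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter, updated during training
    n = len(xs)
    for _ in range(epochs):  # epochs: hyperparameter, fixed up front
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # learning_rate: hyperparameter
    return w


# Hypothetical training data generated from y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w = fit_slope(xs, ys, learning_rate=0.05, epochs=200)
print(f"learned w = {w:.4f}")
```

Changing `learning_rate` or `epochs` changes how training proceeds, but neither value is adjusted by the training loop itself.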

8

If you want to ensure that grouped data does not cross validation-fold boundaries, what kind of cross validation would you choose?

Singular cross validation

Stratified k-fold

Repeated k-fold

Group k-fold

K-fold

Group k-fold
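The idea behind group k-fold can be sketched without any library: assign whole groups to folds, so that no group ever appears in both the training and the test side of a split. The helper `group_k_fold` below is a simplified illustration that assigns groups round-robin. Library implementations such as scikit-learn's `GroupKFold` also try to balance fold sizes, which this sketch does not attempt.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs with no group shared between them."""
    unique_groups = sorted(set(groups))
    if n_splits > len(unique_groups):
        raise ValueError("n_splits cannot exceed the number of distinct groups")
    # Assign each whole group to exactly one fold (simple round-robin).
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx


# Hypothetical example: several records per patient.
groups = ["p1", "p1", "p2", "p2", "p3", "p3", "p4"]
splits = list(group_k_fold(groups, n_splits=2))

for train_idx, test_idx in splits:
    print("train groups:", sorted({groups[i] for i in train_idx}),
          "test groups:", sorted({groups[i] for i in test_idx}))
```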

9

If you want to ensure that each fold of your validation data has similar representations of records of each category or class, what kind of cross validation would you choose?

Repeated k-fold

K-fold

Singular cross validation

Group k-fold

Stratified k-fold

Stratified k-fold
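A minimal sketch of stratification, assuming a hypothetical helper `stratified_k_fold`: the indices of each class are dealt across the folds in turn, so every fold receives roughly the same class proportions as the full dataset. It omits shuffling, which scikit-learn's `StratifiedKFold` offers as an option.

```python
from collections import Counter, defaultdict


def stratified_k_fold(labels, n_splits):
    """Return n_splits lists of indices with similar class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(n_splits)]
    # Deal each class's indices across the folds like cards.
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % n_splits].append(i)
    return [sorted(fold) for fold in folds]


# Hypothetical imbalanced dataset: 3 spam records, 9 ham records.
labels = ["spam"] * 3 + ["ham"] * 9
folds = stratified_k_fold(labels, n_splits=3)

for fold in folds:
    print(dict(Counter(labels[i] for i in fold)))
```

With plain k-fold on the same unshuffled data, all three spam records would land in the first fold, which is the problem stratification avoids.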

10

Which test looks across multiple samples, compares their means, and computes one test statistic and one p-value?

Paired difference test

Analysis of variance (ANOVA)

T-test

Linear regression

Analysis of variance (ANOVA)
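The single test statistic in one-way ANOVA is the F ratio: variance between the sample means divided by variance within the samples. The sketch below computes it by hand for three small invented samples. The function name `one_way_anova_f` is hypothetical, and the p-value step is left out because it needs the F distribution. In practice `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(samples):
    """Return (F statistic, df_between, df_within) for one-way ANOVA."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Variation of the sample means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2
                     for s, m in zip(samples, means))
    # Variation of the observations around their own sample mean.
    ss_within = sum((x - m) ** 2
                    for s, m in zip(samples, means) for x in s)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements from three groups.
samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_between, df_within = one_way_anova_f(samples)
print(f"F = {f_stat:.2f} with ({df_between}, {df_within}) degrees of freedom")
```

However many samples are compared, the result is still one F statistic, which is what distinguishes ANOVA from running a separate t-test for each pair.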