Introduction to Machine Learning Flashcards
If the business case is to detect fraud, which is the correct Objective to choose in Vertex AI?
Segmentation
Forecasting
Regression/Classification
Clustering
Regression/Classification
Which of the following metrics can be used to find a suitable balance between precision and recall in a model?
F1 Score
ROC AUC
Log Loss
PR AUC
F1 score, the harmonic mean of precision and recall. F1 is a useful metric if you're looking for a balance between precision and recall and there's an uneven class distribution.
The other options measure different things: PR AUC is the area under the precision-recall (PR) curve, ROC AUC is the area under the receiver operating characteristic (ROC) curve, and log loss is the cross-entropy between the model predictions and the target values.
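A quick way to see the balance F1 strikes is to compute it alongside precision and recall; a minimal sketch with scikit-learn and made-up labels:

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Hypothetical labels for an imbalanced (fraud-style) problem.
    y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
    y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]

    p = precision_score(y_true, y_pred)   # TP / (TP + FP)
    r = recall_score(y_true, y_pred)      # TP / (TP + FN)
    f1 = f1_score(y_true, y_pred)

    # F1 is the harmonic mean of precision and recall.
    assert abs(f1 - 2 * p * r / (p + r)) < 1e-9
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")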
MAE, MAPE, RMSE, RMSLE, and R² are all available in the Evaluate section of Vertex AI and are common examples of what type of metric?
Decision Trees Progression Metrics
Linear Regression Metrics
Clustering Regression Metrics
Forecasting Regression Metrics
Linear Regression Metrics
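These metrics are straightforward to reproduce outside Vertex AI; a small NumPy sketch with invented target values:

    import numpy as np

    # Hypothetical regression targets and predictions.
    y_true = np.array([10.0, 12.0, 15.0, 20.0, 25.0])
    y_pred = np.array([11.0, 11.5, 16.0, 18.0, 26.0])

    mae   = np.mean(np.abs(y_true - y_pred))
    mape  = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    rmse  = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rmsle = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
    r2    = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

    print(mae, mape, rmse, rmsle, r2)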
If a dataset is presented in a Comma Separated Values (CSV) file, which is the correct data type to choose in Vertex AI?
Video
Tabular
Text
Image
Tabular
For a user who can use SQL, has little Machine Learning experience, and wants a 'low-code' solution, which Machine Learning framework should they use?
Python
BigQuery ML
AutoML
Scikit-Learn
BigQuery ML
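Because BigQuery ML trains models with plain SQL, the 'low-code' workflow can come down to a single CREATE MODEL statement; a hedged sketch run through the google-cloud-bigquery client, where the dataset, table, and column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes credentials and a default project are configured

    # CREATE MODEL is plain SQL -- no Python ML framework is involved.
    sql = """
    CREATE OR REPLACE MODEL `my_dataset.sales_model`
    OPTIONS (model_type = 'linear_reg', input_label_cols = ['sales']) AS
    SELECT price, promo, weekday, sales
    FROM `my_dataset.sales_history`
    """
    client.query(sql).result()  # wait for training to finish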
What is the default setting in AutoML Tables for the data split in model evaluation?
80% Training, 10% Validation, 10% Testing
80% Training, 15% Validation, 5% Testing
80% Training, 5% Validation, 15% Testing
70% Training, 20% Validation, 10% Testing
80% Training, 10% Validation, 10% Testing
What does the Feature Importance attribution in Vertex AI display?
How much each feature impacts the model, expressed as a decimal
How much each feature impacts the model, expressed as a ratio
How much each feature impacts the model, expressed as a ranked list
How much each feature impacts the model, expressed as a percentage
How much each feature impacts the model, expressed as a percentage
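Vertex AI's own attribution values aren't reproduced here, but the idea of importance "expressed as a percentage" can be sketched with scikit-learn, whose feature_importances_ sum to 1 and scale directly to percentages (synthetic data):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    # feature_importances_ sums to 1.0; multiply by 100 to express each as a percentage.
    for i, imp in enumerate(model.feature_importances_):
        print(f"feature_{i}: {imp * 100:.1f}%")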
Which of the following are stages of the Machine Learning workflow that can be managed with Vertex AI?
All of the options.
Train an ML model on your data.
Deploy your trained model to an endpoint for serving predictions.
Create a dataset and upload data.
All of the options.
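As a rough sketch only, the three stages map onto the Vertex AI Python SDK roughly as below; the project, bucket, column, and machine names are placeholders, not values from the course:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # 1. Create a dataset and upload data.
    dataset = aiplatform.TabularDataset.create(
        display_name="fraud-data", gcs_source=["gs://my-bucket/transactions.csv"])

    # 2. Train an ML model on your data.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="fraud-automl", optimization_prediction_type="classification")
    model = job.run(dataset=dataset, target_column="is_fraud")

    # 3. Deploy the trained model to an endpoint for serving predictions.
    endpoint = model.deploy(machine_type="n1-standard-4")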
What is the main benefit of using an automated Machine Learning workflow?
It reduces the time it takes to develop trained models and assess their performance.
It makes the model run faster.
It deploys the model into production.
It makes the model perform better.
It reduces the time it takes to develop trained models and assess their performance.
Which of the following are advantages of BigQuery ML when compared to Python-based ML frameworks?
All of the options
BigQuery ML automates multiple steps in the ML workflow
BigQuery ML custom models can be created without the use of multiple tools
Moving and formatting large amounts of data takes longer with Python-based models than with model training in BigQuery
All of the options
Which of these BigQuery supported classification models is most relevant for predicting binary results, such as True/False?
DNN Classifier (TensorFlow)
AutoML Tables
XGBoost
Logistic Regression
Logistic Regression
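A hedged BigQuery ML example for a binary (True/False) label, with invented table and column names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # model_type='logistic_reg' trains a binary classifier on a True/False label.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM `my_dataset.customers`
    """).result()

    # ML.PREDICT returns the predicted label plus class probabilities.
    rows = client.query("""
    SELECT * FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
      (SELECT tenure_months, monthly_spend, support_tickets FROM `my_dataset.customers`))
    """).result()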
Where labels are not available, for example where customer segmentation is required, which of the following BigQuery supported models is useful?
Time Series Anomaly Detection
Recommendation - Matrix Factorization
Time Series Forecasting
K-Means Clustering
K-Means Clustering
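A hedged sketch of unsupervised segmentation in BigQuery ML; note there is no label column (names are invented):

    from google.cloud import bigquery

    client = bigquery.Client()

    # No label column: k-means groups customers into clusters for segmentation.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.customer_segments`
    OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
    SELECT recency_days, order_count, avg_order_value
    FROM `my_dataset.customer_stats`
    """).result()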
For Classification or Regression problems with decision trees, which of the following models is most relevant?
XGBoost
AutoML Tables
Wide and Deep NNs
Linear Regression
XGBoost
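In BigQuery ML, XGBoost-style models appear as boosted tree models; a hedged example with invented names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Boosted tree models are BigQuery ML's XGBoost-based decision-tree option.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.default_risk`
    OPTIONS (model_type = 'boosted_tree_classifier',
             input_label_cols = ['defaulted'],
             max_iterations = 50) AS
    SELECT income, loan_amount, credit_score, defaulted
    FROM `my_dataset.loans`
    """).result()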
What are the 3 key steps for creating a Recommendation System with BigQuery ML?
Prepare training data in BigQuery, specify the model options in BigQuery ML, export the predictions to Google Analytics
Import training data to BigQuery, train a recommendation system with BigQuery ML, tune the hyperparameters
Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
Prepare training data in BigQuery, select a recommendation system from BigQuery ML, deploy and test the model
Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
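The training step typically uses matrix factorization; a hedged sketch of steps 2 and 3 with invented table and column names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Step 2: train a recommender with matrix factorization on (user, item, rating) data.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.product_recs`
    OPTIONS (model_type = 'matrix_factorization',
             user_col = 'user_id', item_col = 'product_id', rating_col = 'rating') AS
    SELECT user_id, product_id, rating FROM `my_dataset.ratings`
    """).result()

    # Step 3: ML.RECOMMEND returns predicted ratings for every user/item pair.
    recs = client.query(
        "SELECT * FROM ML.RECOMMEND(MODEL `my_dataset.product_recs`)").result()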
Which of the following loss functions is used for classification problems?
MSE
Cross entropy
None of the options are correct.
Both MSE & Cross entropy
Cross entropy
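Cross entropy (log loss) compares predicted probabilities with the true labels; a minimal NumPy sketch for the binary case:

    import numpy as np

    def cross_entropy(y_true, p_pred, eps=1e-12):
        """Binary cross-entropy (log loss) between labels and predicted probabilities."""
        p = np.clip(p_pred, eps, 1 - eps)          # avoid log(0)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    y_true = np.array([1, 0, 1, 1])
    p_pred = np.array([0.9, 0.2, 0.7, 0.6])
    print(cross_entropy(y_true, p_pred))   # lower is better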
Which of the following gradient descent methods computes the gradient using the entire dataset?
Batch gradient descent
Gradient descent
None of the options are correct.
Mini-batch gradient descent
Batch gradient descent
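What makes it "batch" is that every example contributes to each gradient step; a minimal sketch fitting y = mx + b with mean-squared-error loss on synthetic data:

    import numpy as np

    # Synthetic data roughly following y = 3x + 2.
    x = np.linspace(0, 1, 100)
    y = 3 * x + 2 + np.random.normal(0, 0.1, size=x.shape)

    m, b, lr = 0.0, 0.0, 0.1
    for _ in range(1000):
        y_hat = m * x + b
        # Gradients of MSE computed over the ENTIRE dataset -- that is the "batch" part.
        grad_m = np.mean(2 * (y_hat - y) * x)
        grad_b = np.mean(2 * (y_hat - y))
        m -= lr * grad_m
        b -= lr * grad_b

    print(m, b)   # should end up close to 3 and 2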
What are the basic steps in an ML workflow (or process)?
Collect data
Check for anomalies, missing data and clean the data
All options are correct.
Perform statistical analysis and initial visualization
All options are correct.
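A minimal pandas sketch of those steps ("transactions.csv" is a placeholder file, and the histogram line assumes matplotlib is installed):

    import pandas as pd

    df = pd.read_csv("transactions.csv")          # 1. collect data

    print(df.isna().sum())                        # 2. check for missing data
    df = df.dropna()                              #    ...and clean it
    print(df.describe())                          # 3. initial statistical analysis
    df.hist(figsize=(10, 6))                      #    ...and visualization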
For the formula used to model a linear relationship, y = mx + b, what does 'm' stand for?
It captures the amount of change we’ve observed in our label in response to a small change in our feature.
It refers to a bias term which can be used for regression.
None of the options are correct.
It refers to a bias term which can be used for regression and it captures the amount of change we’ve observed in our label in response to a small change in our feature.
It captures the amount of change we’ve observed in our label in response to a small change in our feature.
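A small NumPy check of that interpretation, using invented points that roughly follow y = 2x + 2:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 4.0, 5.9, 8.2, 10.1])     # roughly y = 2x + 2

    m, b = np.polyfit(x, y, deg=1)               # fit a straight line
    print(m, b)                                  # m ~ 2 (slope/weight), b ~ 2 (bias)

    # m is the change observed in the label y for a one-unit change in the feature x.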
Which of the following are benefits of Performance metrics over loss functions?
Performance metrics are easier to understand.
Performance metrics are directly connected to business goals.
None of the options are correct.
Performance metrics are easier to understand and are directly connected to business goals.
Performance metrics are easier to understand and are directly connected to business goals.
Which of the following allows you to create repeatable samples of your data?
Use the first few digits of a hash function on the field that you’re using to split or bucketize your data.
Use the last few digits of a hash function on the field that you’re using to split or bucketize your data.
Use the first few digits or the last few digits of a hash function on the field that you’re using to split or bucketize your data.
None of the options are correct.
Use the last few digits of a hash function on the field that you’re using to split or bucketize your data.
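Outside BigQuery the same idea works with any deterministic hash; a sketch (not the exact course recipe) that buckets rows by the last hex digits of an MD5 hash of the split field:

    import hashlib

    def bucket(value: str, num_buckets: int = 10) -> int:
        """Deterministic bucket from the last few hex digits of a hash of the split field."""
        digest = hashlib.md5(value.encode("utf-8")).hexdigest()
        return int(digest[-4:], 16) % num_buckets

    rows = [{"date": f"2024-01-{d:02d}", "amount": d * 10} for d in range(1, 31)]
    train = [r for r in rows if bucket(r["date"]) < 8]   # ~80%
    valid = [r for r in rows if bucket(r["date"]) == 8]  # ~10%
    test  = [r for r in rows if bucket(r["date"]) == 9]  # ~10%
    # Re-running this code always yields the same split -- that is what makes it repeatable.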
How do you decide when to stop training a model?
When your loss metrics start to decrease
When your loss metrics start to increase
When your loss metrics start to both increase and decrease
None of the options are correct
When your loss metrics start to increase
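In practice the loss being watched is the validation loss, and the rule is usually automated as early stopping; a minimal sketch of the stopping condition:

    def should_stop(val_losses, patience=3):
        """Stop when the validation loss has not improved for `patience` epochs."""
        if len(val_losses) <= patience:
            return False
        best = min(val_losses[:-patience])
        return min(val_losses[-patience:]) >= best

    history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
    print(should_stop(history))   # True: the loss has started to increase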
Which of the following allows you to split the dataset based upon a field in your data?
BUCKETIZE, an open-source hashing algorithm that is implemented in BigQuery SQL.
FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
ML_FEATURE FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
None of the options are correct.
FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
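A hedged example of using FARM_FINGERPRINT for an ~80% training split, run from Python against an invented table:

    from google.cloud import bigquery

    client = bigquery.Client()

    # FARM_FINGERPRINT hashes the split field; MOD of the absolute value picks a bucket.
    # Here roughly 80% of dates land in the training set.
    train = client.query("""
    SELECT *
    FROM `my_dataset.flights`
    WHERE MOD(ABS(FARM_FINGERPRINT(CAST(date AS STRING))), 10) < 8
    """).result()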
Which is the best way to assess the quality of a model?
Observing how well a model performs against a new dataset that it hasn’t seen before and observing how well a model performs against an existing known dataset.
Observing how well a model performs against an existing known dataset.
Observing how well a model performs against a new dataset that it hasn’t seen before.
None of the options are correct
Observing how well a model performs against a new dataset that it hasn’t seen before.
Which of the following actions can you perform on your model when it is trained and validated?
You can write it once, and only once, against the independent test dataset.
You can write it multiple times against the dependent test dataset.
You can write it once, and only once against the dependent test dataset.
You can write it multiple times against the independent test dataset.
You can write it once, and only once, against the independent test dataset.