Introduction to Machine Learning Flashcards
If the business case is to detect fraud, which is the correct Objective to choose in Vertex AI?
Segmentation
Forecasting
Regression/Classification
Clustering
Regression/Classification
Which of the following metrics can be used to find a suitable balance between precision and recall in a model?
F1 Score
ROC AUC
Log Loss
PR AUC
F1 score, the harmonic mean of precision and recall. F1 is a useful metric if you're looking for a balance between precision and recall and there's an uneven class distribution.
The other options measure different things: PR AUC is the area under the precision-recall (PR) curve, ROC AUC is the area under the receiver operating characteristic (ROC) curve, and log loss is the cross-entropy between the model predictions and the target values.
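A quick way to see the balance F1 strikes is to compute it alongside precision and recall; a minimal sketch with scikit-learn and made-up labels:

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Hypothetical labels for an imbalanced (fraud-style) problem.
    y_true = [0, 0, 0, 0, 1, 1, 1, 0, 0, 1]
    y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 0, 1]

    p = precision_score(y_true, y_pred)   # TP / (TP + FP)
    r = recall_score(y_true, y_pred)      # TP / (TP + FN)
    f1 = f1_score(y_true, y_pred)

    # F1 is the harmonic mean of precision and recall.
    assert abs(f1 - 2 * p * r / (p + r)) < 1e-9
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")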
MAE, MAPE, RMSE, RMSLE, and R² are all available in the Evaluate section of Vertex AI and are common examples of what type of metric?
Decision Trees Progression Metrics
Linear Regression Metrics
Clustering Regression Metrics
Forecasting Regression Metrics
Linear Regression Metrics
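These metrics are straightforward to reproduce outside Vertex AI; a small NumPy sketch with invented target values:

    import numpy as np

    # Hypothetical regression targets and predictions.
    y_true = np.array([10.0, 12.0, 15.0, 20.0, 25.0])
    y_pred = np.array([11.0, 11.5, 16.0, 18.0, 26.0])

    mae   = np.mean(np.abs(y_true - y_pred))
    mape  = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    rmse  = np.sqrt(np.mean((y_true - y_pred) ** 2))
    rmsle = np.sqrt(np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2))
    r2    = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

    print(mae, mape, rmse, rmsle, r2)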
If a dataset is presented in a Comma Separated Values (CSV) file, which is the correct data type to choose in Vertex AI?
Video
Tabular
Text
Image
Tabular
For a user who can use SQL, has little Machine Learning experience, and wants a 'low-code' solution, which Machine Learning framework should they use?
Python
BigQuery ML
AutoML
Scikit-Learn
BigQuery ML
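Because BigQuery ML trains models with plain SQL, the 'low-code' workflow can come down to a single CREATE MODEL statement; a hedged sketch run through the google-cloud-bigquery client, where the dataset, table, and column names are placeholders:

    from google.cloud import bigquery

    client = bigquery.Client()  # assumes credentials and a default project are configured

    # CREATE MODEL is plain SQL -- no Python ML framework is involved.
    sql = """
    CREATE OR REPLACE MODEL `my_dataset.sales_model`
    OPTIONS (model_type = 'linear_reg', input_label_cols = ['sales']) AS
    SELECT price, promo, weekday, sales
    FROM `my_dataset.sales_history`
    """
    client.query(sql).result()  # wait for training to finish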
What is the default setting in AutoML Tables for the data split in model evaluation?
80% Training, 10% Validation, 10% Testing
80% Training, 15% Validation, 5% Testing
80% Training, 5% Validation, 15% Testing
70% Training, 20% Validation, 10% Testing
80% Training, 10% Validation, 10% Testing
What does the Feature Importance attribution in Vertex AI display?
How much each feature impacts the model, expressed as a decimal
How much each feature impacts the model, expressed as a ratio
How much each feature impacts the model, expressed as a ranked list
How much each feature impacts the model, expressed as a percentage
How much each feature impacts the model, expressed as a percentage
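Vertex AI's own attribution values aren't reproduced here, but the idea of importance "expressed as a percentage" can be sketched with scikit-learn, whose feature_importances_ sum to 1 and scale directly to percentages (synthetic data):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=4, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X, y)

    # feature_importances_ sums to 1.0; multiply by 100 to express each as a percentage.
    for i, imp in enumerate(model.feature_importances_):
        print(f"feature_{i}: {imp * 100:.1f}%")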
Which of the following are stages of the Machine Learning workflow that can be managed with Vertex AI?
All of the options.
Train an ML model on your data.
Deploy your trained model to an endpoint for serving predictions.
Create a dataset and upload data.
All of the options.
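As a rough sketch only, the three stages map onto the Vertex AI Python SDK roughly as below; the project, bucket, column, and machine names are placeholders, not values from the course:

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    # 1. Create a dataset and upload data.
    dataset = aiplatform.TabularDataset.create(
        display_name="fraud-data", gcs_source=["gs://my-bucket/transactions.csv"])

    # 2. Train an ML model on your data.
    job = aiplatform.AutoMLTabularTrainingJob(
        display_name="fraud-automl", optimization_prediction_type="classification")
    model = job.run(dataset=dataset, target_column="is_fraud")

    # 3. Deploy the trained model to an endpoint for serving predictions.
    endpoint = model.deploy(machine_type="n1-standard-4")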
What is the main benefit of using an automated Machine Learning workflow?
It reduces the time it takes to develop trained models and assess their performance.
It makes the model run faster.
It deploys the model into production.
It makes the model perform better.
It reduces the time it takes to develop trained models and assess their performance.
Which of the following are advantages of BigQuery ML when compared to Python-based ML frameworks?
All of the options
BigQuery ML automates multiple steps in the ML workflow
BigQuery ML custom models can be created without the use of multiple tools
Moving and formatting large amounts of data takes longer with Python-based models than with model training in BigQuery
All of the options
Which of these BigQuery supported classification models is most relevant for predicting binary results, such as True/False?
DNN Classifier (TensorFlow)
AutoML Tables
XGBoost
Logistic Regression
Logistic Regression
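A hedged BigQuery ML example for a binary (True/False) label, with invented table and column names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # model_type='logistic_reg' trains a binary classifier on a True/False label.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.churn_model`
    OPTIONS (model_type = 'logistic_reg', input_label_cols = ['churned']) AS
    SELECT tenure_months, monthly_spend, support_tickets, churned
    FROM `my_dataset.customers`
    """).result()

    # ML.PREDICT returns the predicted label plus class probabilities.
    rows = client.query("""
    SELECT * FROM ML.PREDICT(MODEL `my_dataset.churn_model`,
      (SELECT tenure_months, monthly_spend, support_tickets FROM `my_dataset.customers`))
    """).result()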
Where labels are not available, for example where customer segmentation is required, which of the following BigQuery supported models is useful?
Time Series Anomaly Detection
Recommendation - Matrix Factorization
Time Series Forecasting
K-Means Clustering
K-Means Clustering
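A hedged sketch of unsupervised segmentation in BigQuery ML; note there is no label column (names are invented):

    from google.cloud import bigquery

    client = bigquery.Client()

    # No label column: k-means groups customers into clusters for segmentation.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.customer_segments`
    OPTIONS (model_type = 'kmeans', num_clusters = 4) AS
    SELECT recency_days, order_count, avg_order_value
    FROM `my_dataset.customer_stats`
    """).result()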
For Classification or Regression problems with decision trees, which of the following models is most relevant?
XGBoost
AutoML Tables
Wide and Deep NNs
Linear Regression
XGBoost
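In BigQuery ML, XGBoost-style models appear as boosted tree models; a hedged example with invented names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Boosted tree models are BigQuery ML's XGBoost-based decision-tree option.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.default_risk`
    OPTIONS (model_type = 'boosted_tree_classifier',
             input_label_cols = ['defaulted'],
             max_iterations = 50) AS
    SELECT income, loan_amount, credit_score, defaulted
    FROM `my_dataset.loans`
    """).result()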
What are the 3 key steps for creating a Recommendation System with BigQuery ML?
Prepare training data in BigQuery, specify the model options in BigQuery ML, export the predictions to Google Analytics
Import training data to BigQuery, train a recommendation system with BigQuery ML, tune the hyperparameters
Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
Prepare training data in BigQuery, select a recommendation system from BigQuery ML, deploy and test the model
Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
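The training step typically uses matrix factorization; a hedged sketch of steps 2 and 3 with invented table and column names:

    from google.cloud import bigquery

    client = bigquery.Client()

    # Step 2: train a recommender with matrix factorization on (user, item, rating) data.
    client.query("""
    CREATE OR REPLACE MODEL `my_dataset.product_recs`
    OPTIONS (model_type = 'matrix_factorization',
             user_col = 'user_id', item_col = 'product_id', rating_col = 'rating') AS
    SELECT user_id, product_id, rating FROM `my_dataset.ratings`
    """).result()

    # Step 3: ML.RECOMMEND returns predicted ratings for every user/item pair.
    recs = client.query(
        "SELECT * FROM ML.RECOMMEND(MODEL `my_dataset.product_recs`)").result()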
Which of the following loss functions is used for classification problems?
MSE
Cross entropy
None of the options are correct.
Both MSE & Cross entropy
Cross entropy
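Cross entropy (log loss) compares predicted probabilities with the true labels; a minimal NumPy sketch for the binary case:

    import numpy as np

    def cross_entropy(y_true, p_pred, eps=1e-12):
        """Binary cross-entropy (log loss) between labels and predicted probabilities."""
        p = np.clip(p_pred, eps, 1 - eps)          # avoid log(0)
        return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

    y_true = np.array([1, 0, 1, 1])
    p_pred = np.array([0.9, 0.2, 0.7, 0.6])
    print(cross_entropy(y_true, p_pred))   # lower is better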
Which of the following gradient descent methods computes the gradient using the entire dataset?
Batch gradient descent
Gradient descent
None of the options are correct.
Mini-batch gradient descent
Batch gradient descent
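What makes it "batch" is that every example contributes to each gradient step; a minimal sketch fitting y = mx + b with mean-squared-error loss on synthetic data:

    import numpy as np

    # Synthetic data roughly following y = 3x + 2.
    x = np.linspace(0, 1, 100)
    y = 3 * x + 2 + np.random.normal(0, 0.1, size=x.shape)

    m, b, lr = 0.0, 0.0, 0.1
    for _ in range(1000):
        y_hat = m * x + b
        # Gradients of MSE computed over the ENTIRE dataset -- that is the "batch" part.
        grad_m = np.mean(2 * (y_hat - y) * x)
        grad_b = np.mean(2 * (y_hat - y))
        m -= lr * grad_m
        b -= lr * grad_b

    print(m, b)   # should end up close to 3 and 2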
What are the basic steps in an ML workflow (or process)?
Collect data
Check for anomalies, missing data and clean the data
All options are correct.
Perform statistical analysis and initial visualization
All options are correct.
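A minimal pandas sketch of those steps ("transactions.csv" is a placeholder file, and the histogram line assumes matplotlib is installed):

    import pandas as pd

    df = pd.read_csv("transactions.csv")          # 1. collect data

    print(df.isna().sum())                        # 2. check for missing data
    df = df.dropna()                              #    ...and clean it
    print(df.describe())                          # 3. initial statistical analysis
    df.hist(figsize=(10, 6))                      #    ...and visualization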
For the formula used to model a linear relationship, y = mx + b, what does 'm' stand for?
It captures the amount of change we’ve observed in our label in response to a small change in our feature.
It refers to a bias term which can be used for regression.
None of the options are correct.
It refers to a bias term which can be used for regression and it captures the amount of change we’ve observed in our label in response to a small change in our feature.
It captures the amount of change we’ve observed in our label in response to a small change in our feature.
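A small NumPy check of that interpretation, using invented points that roughly follow y = 2x + 2:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 4.0, 5.9, 8.2, 10.1])     # roughly y = 2x + 2

    m, b = np.polyfit(x, y, deg=1)               # fit a straight line
    print(m, b)                                  # m ~ 2 (slope/weight), b ~ 2 (bias)

    # m is the change observed in the label y for a one-unit change in the feature x.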
Which of the following are benefits of Performance metrics over loss functions?
Performance metrics are easier to understand.
Performance metrics are directly connected to business goals.
None of the options are correct.
Performance metrics are easier to understand and are directly connected to business goals.
Performance metrics are easier to understand and are directly connected to business goals.
Which of the following allows you to create repeatable samples of your data?
Use the first few digits of a hash function on the field that you’re using to split or bucketize your data.
Use the last few digits of a hash function on the field that you’re using to split or bucketize your data.
Use the first few digits or the last few digits of a hash function on the field that you’re using to split or bucketize your data.
None of the options are correct.
Use the last few digits of a hash function on the field that you’re using to split or bucketize your data.
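Outside BigQuery the same idea works with any deterministic hash; a sketch (not the exact course recipe) that buckets rows by the last hex digits of an MD5 hash of the split field:

    import hashlib

    def bucket(value: str, num_buckets: int = 10) -> int:
        """Deterministic bucket from the last few hex digits of a hash of the split field."""
        digest = hashlib.md5(value.encode("utf-8")).hexdigest()
        return int(digest[-4:], 16) % num_buckets

    rows = [{"date": f"2024-01-{d:02d}", "amount": d * 10} for d in range(1, 31)]
    train = [r for r in rows if bucket(r["date"]) < 8]   # ~80%
    valid = [r for r in rows if bucket(r["date"]) == 8]  # ~10%
    test  = [r for r in rows if bucket(r["date"]) == 9]  # ~10%
    # Re-running this code always yields the same split -- that is what makes it repeatable.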
How do you decide when to stop training a model?
When your loss metrics start to decrease
When your loss metrics start to increase
When your loss metrics start to both increase and decrease
None of the options are correct
When your loss metrics start to increase
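In practice the loss being watched is the validation loss, and the rule is usually automated as early stopping; a minimal sketch of the stopping condition:

    def should_stop(val_losses, patience=3):
        """Stop when the validation loss has not improved for `patience` epochs."""
        if len(val_losses) <= patience:
            return False
        best = min(val_losses[:-patience])
        return min(val_losses[-patience:]) >= best

    history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
    print(should_stop(history))   # True: the loss has started to increase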
Which of the following allows you to split the dataset based upon a field in your data?
BUCKETIZE, an open-source hashing algorithm that is implemented in BigQuery SQL.
FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
ML_FEATURE FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
None of the options are correct.
FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
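A hedged example of using FARM_FINGERPRINT for an ~80% training split, run from Python against an invented table:

    from google.cloud import bigquery

    client = bigquery.Client()

    # FARM_FINGERPRINT hashes the split field; MOD of the absolute value picks a bucket.
    # Here roughly 80% of dates land in the training set.
    train = client.query("""
    SELECT *
    FROM `my_dataset.flights`
    WHERE MOD(ABS(FARM_FINGERPRINT(CAST(date AS STRING))), 10) < 8
    """).result()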
Which is the best way to assess the quality of a model?
Observing how well a model performs against a new dataset that it hasn’t seen before and observing how well a model performs against an existing known dataset.
Observing how well a model performs against an existing known dataset.
Observing how well a model performs against a new dataset that it hasn’t seen before.
None of the options are correct
Observing how well a model performs against a new dataset that it hasn’t seen before.
Which of the following actions can you perform on your model when it is trained and validated?
You can write it once, and only once, against the independent test dataset.
You can write it multiple times against the dependent test dataset.
You can write it once, and only once against the dependent test dataset.
You can write it multiple times against the independent test dataset.
You can write it once, and only once, against the independent test dataset.