ML Training 2 Flashcards
(34 cards)
What is Precision?
Of all the test examples that were assigned a given label, the fraction that actually belong to that label.
TP/(TP+FP)
What is Recall?
Of all the test examples that should have been assigned a given label, the fraction that actually were assigned it.
TP/(TP+FN)
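A minimal Python sketch of both formulas for a single label, assuming true and predicted labels arrive as parallel lists. Returning 0.0 when a denominator is zero is a convention chosen here, not part of the definition.

```python
def counts(y_true, y_pred, label):
    """Count true positives, false positives and false negatives for one label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == label and t == label)
    fp = sum(1 for t, p in pairs if p == label and t != label)
    fn = sum(1 for t, p in pairs if p != label and t == label)
    return tp, fp, fn


def precision(tp, fp):
    """TP / (TP + FP): share of examples assigned the label that truly have it."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """TP / (TP + FN): share of examples that truly have the label that were assigned it."""
    return tp / (tp + fn) if tp + fn else 0.0


# Hypothetical labels for seven test examples.
y_true = ["cat", "cat", "cat", "cat", "cat", "dog", "dog"]
y_pred = ["cat", "cat", "cat", "dog", "dog", "cat", "dog"]
tp, fp, fn = counts(y_true, y_pred, "cat")
print(tp, fp, fn)                          # 3 1 2
print(precision(tp, fp), recall(tp, fn))   # 0.75 0.6
```

Here the model is fairly precise for "cat" (3 of its 4 "cat" predictions are right) but misses 2 of the 5 real cats, so recall is lower.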
How can you evaluate an AutoML model?
Precision, recall, the confusion matrix (correct predictions lie along the diagonal), and the precision-recall curve, which helps you choose a score threshold (thresholds can be set individually per label).
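Choosing a threshold from the precision-recall curve can be sketched as follows, assuming per-example scores and 0/1 labels for one label, and a hypothetical target precision. The rule used here (the lowest threshold that still meets the target, which keeps recall as high as possible) is one reasonable policy, not the only one. For a multi-label model, the same procedure would run once per label on that label's scores.

```python
def pr_points(scores, labels, thresholds):
    """(threshold, precision, recall) for each threshold; score >= threshold means positive."""
    total_pos = sum(labels)
    points = []
    for th in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= th and y == 0)
        # Convention: precision is 1.0 when nothing is predicted positive.
        prec = tp / (tp + fp) if tp + fp else 1.0
        rec = tp / total_pos if total_pos else 0.0
        points.append((th, prec, rec))
    return points


def pick_threshold(points, min_precision):
    """Lowest threshold meeting the precision target (None if none does).

    Recall never rises as the threshold rises, so this is the qualifying
    threshold with the highest recall.
    """
    ok = [p for p in points if p[1] >= min_precision]
    return min(ok, key=lambda p: p[0]) if ok else None


# Hypothetical scores for one label (1 = example truly has the label).
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 1]
points = pr_points(scores, labels, thresholds=[0.9, 0.7, 0.5, 0.3])
print(pick_threshold(points, min_precision=0.75))  # (0.5, 0.8, 0.8)
```

Note that precision is not monotonic in the threshold (0.75 at 0.7, 0.8 at 0.5), which is why every candidate is checked rather than stopping at the first failure.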
What is the difference between Colab Enterprise and Vertex AI Workbench?
Colab Enterprise: A collaborative, managed notebook environment with the security and compliance capabilities of Google Cloud. Choose this if your project’s priorities are to collaborate with others and to avoid spending time managing infrastructure.
Vertex AI Workbench: A Jupyter notebook-based environment provided through virtual machine (VM) instances with features that support the entire data science workflow. Choose this if your project’s priorities are control and customizability.
What platforms or features does Vertex AI Workbench support?
Importing conda environments, accessing data in Cloud Storage and BigQuery, automated notebook runs, idle shutdown, custom containers, third-party credentials, instance monitoring, and full control over the underlying infrastructure (the VM instance).
How do you overcome imbalanced datasets?
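Downsample the majority class, then upweight the downsampled examples by the same factor; the upweighting corrects the prediction bias that downsampling alone would introduce. Treat the rebalancing ratio as a hyperparameter and experiment with it. The batch size should be several times greater than the imbalance ratio (roughly 5× or more) so that each batch contains enough minority-class examples.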
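A minimal sketch of downsampling plus upweighting, assuming examples are `(features, label)` pairs and a hypothetical downsampling factor of 10. The returned weights would be passed to a trainer as per-example loss weights (for example, Keras `Model.fit(sample_weight=...)`).

```python
import random


def downsample_and_upweight(examples, majority_label, factor, seed=0):
    """Downsample the majority class by `factor` and upweight kept examples by `factor`.

    examples: list of (features, label) pairs.
    Returns a shuffled list of (features, label, weight) triples.
    """
    rng = random.Random(seed)
    majority = [e for e in examples if e[1] == majority_label]
    minority = [e for e in examples if e[1] != majority_label]
    # Keep 1/factor of the majority class, chosen uniformly at random.
    kept = rng.sample(majority, len(majority) // factor)
    # Each kept majority example stands in for `factor` originals, so it gets weight `factor`.
    out = [(x, y, float(factor)) for x, y in kept] + [(x, y, 1.0) for x, y in minority]
    rng.shuffle(out)
    return out


# Hypothetical 100:1 dataset: 1000 negatives, 10 positives.
data = [(i, 0) for i in range(1000)] + [(i, 1) for i in range(10)]
train = downsample_and_upweight(data, majority_label=0, factor=10)
n_major = sum(1 for _, y, _ in train if y == 0)
n_minor = sum(1 for _, y, _ in train if y == 1)
print(n_major, n_minor)                          # 100 10 (imbalance is now 10:1)
print(sum(w for _, y, w in train if y == 0))     # 1000.0 (same total weight as before)
```

After rebalancing to 10:1, the batch-size rule of thumb above suggests batches of at least about 50 examples.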
What is prediction bias?
A value indicating how far apart the average of predictions is from the average of labels in the dataset.
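As a formula, prediction bias is the mean prediction minus the mean label. A small Python sketch with hypothetical values:

```python
def prediction_bias(predictions, labels):
    """Average prediction minus average label; positive means the model over-predicts."""
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)


preds = [0.25, 0.5, 0.75, 0.5]  # hypothetical predicted probabilities
labels = [0, 0, 1, 0]           # observed labels
print(prediction_bias(preds, labels))  # 0.25
```

The model predicts a 50% positive rate on average while only 25% of labels are positive, so the bias is +0.25.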
What is selection bias?
Errors in conclusions drawn from sampled data due to a selection process that generates systematic differences between samples observed in the data and those not observed.
Includes coverage bias, sampling bias, and non-response (participation) bias.
What is coverage bias?
The population represented in the dataset doesn’t match the population that the machine learning model is making predictions about.
What is sampling bias?
Data is not collected randomly from the target group.
What is non-response/participation bias?
Users from certain groups opt out of surveys at different rates than users from other groups.
What is a collaborative filtering model?
Collaborative filtering is a recommendation technique that filters and predicts items a user might like based on the reactions and preferences of similar users.
The fundamental premise is that people who agreed in their evaluation of certain items are likely to agree again in the future.
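The premise can be illustrated with a simple user-based variant, assuming a small hypothetical ratings dictionary. Similarity here is cosine similarity over co-rated items, a deliberate simplification; production systems typically mean-center ratings and use far larger, sparser data.

```python
import math

# Hypothetical user -> {item: rating} data.
ratings = {
    "alice": {"A": 5, "B": 4, "C": 1},
    "bob":   {"A": 5, "B": 5, "D": 4},
    "carol": {"A": 1, "C": 5, "D": 1},
}


def similarity(u, v):
    """Cosine similarity computed over the items both users rated (0.0 if none)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Similarity-weighted average of other users' ratings for `item` (None if nobody rated it)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = similarity(ratings[user], theirs)
        if sim > 0:
            num += sim * theirs[item]
            den += sim
    return num / den if den else None


# Alice agreed with Bob far more than with Carol, so Bob's 4 dominates Carol's 1.
print(round(predict("alice", "D", ratings), 2))  # 3.16
```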
What are the three main approaches to building recommendation systems on Google Cloud?
The three approaches are Matrix Factorization in BigQuery Machine Learning (BQML), Recommendations AI, and the Two-Tower built-in algorithm.
What is required to train a matrix factorization model on BigQuery?
A table with three input columns: user, item, and a feedback value (either implicit, such as clicks or watch time, or explicit, such as ratings).
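A Python sketch that assembles the BigQuery ML training statement, assuming a hypothetical `my_dataset.ratings` table with `user_id`, `item_id` and `rating` columns. The `model_type`, `user_col`, `item_col`, `rating_col` and `feedback_type` options are BigQuery ML's; the helper function itself is hypothetical.

```python
def bqml_matrix_factorization_sql(model, table, user_col, item_col, rating_col,
                                  feedback_type="EXPLICIT"):
    """Build a BigQuery ML CREATE MODEL statement for a matrix factorization model.

    The three columns named here are the only training input the model needs.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        "OPTIONS(\n"
        "  model_type = 'matrix_factorization',\n"
        f"  feedback_type = '{feedback_type}',\n"
        f"  user_col = '{user_col}',\n"
        f"  item_col = '{item_col}',\n"
        f"  rating_col = '{rating_col}'\n"
        ") AS\n"
        f"SELECT {user_col}, {item_col}, {rating_col}\n"
        f"FROM `{table}`"
    )


# Hypothetical dataset, table and column names.
print(bqml_matrix_factorization_sql(
    "my_dataset.mf_model", "my_dataset.ratings", "user_id", "item_id", "rating"))
```

The resulting string could be run in the BigQuery console or submitted with the `google-cloud-bigquery` client (`bigquery.Client().query(sql).result()`). For implicit feedback, `feedback_type` would be `'IMPLICIT'` and the rating column would hold a signal such as watch time.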
What are the main benefits of Matrix Factorization on BigQuery?
The benefits include minimal ML expertise required (uses SQL), simple data input requirements, and ability to discover new user interests through collaborative filtering.
What are the limitations of Matrix Factorization?
The limitations include: it cannot use large feature sets (it handles only two dimensions, users and items); new items are hard to incorporate (the matrix cannot be continuously updated); and it needs sufficient feedback data because the input matrix is sparse.
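Where these limitations come from can be seen in a plain alternating-least-squares (ALS) factorization in NumPy. This is a generic sketch on hypothetical data, not BigQuery ML's implementation; the point is that the model learns one vector per user id and per item id, and nothing else.

```python
import numpy as np


def als(R, observed, k=2, reg=0.1, iters=20, seed=0):
    """Factor R (users x items) as U @ V.T using only the observed entries."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    ridge = reg * np.eye(k)
    for _ in range(iters):
        # Fix V and solve a small ridge regression for each user's vector.
        for u in range(n_users):
            Vu = V[observed[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + ridge, Vu.T @ R[u, observed[u]])
        # Fix U and solve a small ridge regression for each item's vector.
        for i in range(n_items):
            Ui = U[observed[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + ridge, Ui.T @ R[observed[:, i], i])
    return U, V


# Hypothetical ratings; 0 marks "not rated".
R = np.array([[5, 4, 0, 1],
              [4, 5, 1, 0],
              [1, 0, 5, 4],
              [0, 1, 4, 5]], dtype=float)
U, V = als(R, observed=R > 0)
pred = U @ V.T
print(np.round(pred, 1))  # filled-in matrix, including the unrated cells

# The only inputs were user ids, item ids and ratings: there is nowhere to put item
# text, price or other features. An item added after training has no row in V,
# so it cannot be scored until the model is retrained.
print(V.shape)  # (4, 2)
```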
How does one test recommendation systems?
Set up A/B experiments.
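One common way to read out such an experiment is a two-proportion z-test on a metric like click-through rate. The sketch below assumes hypothetical click counts for the current recommender (A) and a candidate (B); the choice of metric and test is an assumption, not something the card prescribes.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical: 10,000 impressions per arm, 200 clicks for A vs 260 for B.
z, p = two_proportion_z_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(round(z, 2), round(p, 4))
```

With these numbers z is about 2.83 and the p-value is well below 0.01, so the lift for B would usually be treated as unlikely to be noise.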
What is Recommendations AI?
It’s a fully managed service that deploys scalable recommendation systems using state-of-the-art deep learning techniques, including two-tower encoders.
By using deep learning models, it addresses the limitations of matrix factorization listed above.
How often are Recommendations AI models updated?
Models are automatically retrained daily and tuned quarterly to capture changes in customer behavior, product assortment, pricing, and promotions.
How does Recommendations AI achieve low serving latency?
It utilizes a scalable approximate nearest neighbors (ANN) service for efficient item retrieval during inference, resulting in low latency.
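To show why ANN retrieval is fast, here is a sketch of one simple ANN technique, random-hyperplane locality-sensitive hashing for cosine similarity. It is a swapped-in illustration on hypothetical embeddings; the managed service's actual retrieval algorithm is not described on this card.

```python
import numpy as np


class HyperplaneLSH:
    """Approximate nearest neighbours (cosine) via random-hyperplane hashing."""

    def __init__(self, item_vecs, n_planes=8, seed=0):
        rng = np.random.default_rng(seed)
        self.items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
        self.planes = rng.normal(size=(n_planes, item_vecs.shape[1]))
        self.buckets = {}
        for idx, key in enumerate(self._hash(self.items)):
            self.buckets.setdefault(key, []).append(idx)

    def _hash(self, vecs):
        # Each bit records which side of one random hyperplane a vector falls on.
        bits = (vecs @ self.planes.T) > 0
        return [tuple(row) for row in bits]

    def query(self, q, k=5):
        q = q / np.linalg.norm(q)
        candidates = self.buckets.get(self._hash(q[None, :])[0], [])
        # Exact scoring only inside one bucket: fast, but true neighbours in
        # other buckets can be missed, which is what makes it approximate.
        return sorted(candidates, key=lambda i: -float(self.items[i] @ q))[:k]


rng = np.random.default_rng(1)
item_embeddings = rng.normal(size=(1000, 16))  # hypothetical item embeddings
index = HyperplaneLSH(item_embeddings, n_planes=8)
print(index.query(item_embeddings[42], k=3))   # starts with 42
```

With 8 hyperplanes there are up to 256 buckets, so a query scores only a few items instead of all 1000. Real systems typically use multiple hash tables or more refined methods to recover the neighbours a single table misses.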
What feature ensures data consistency in Recommendations AI?
It employs a scalable feature store that maintains consistency between online and offline tasks, preventing data leakage and training-serving skew issues.
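One mechanism behind both guarantees is sketched below with a toy, hypothetical in-memory store (not the Recommendations AI API): training and serving read the same stored values, and training rows use a point-in-time lookup so a label never sees feature values written after its own timestamp.

```python
import bisect


class TinyFeatureStore:
    """Toy store of timestamped feature values per entity (hypothetical, for illustration)."""

    def __init__(self):
        self._times = {}   # entity -> sorted list of timestamps
        self._values = {}  # entity -> values aligned with _times

    def write(self, entity, ts, value):
        times = self._times.setdefault(entity, [])
        values = self._values.setdefault(entity, [])
        pos = bisect.bisect_right(times, ts)
        times.insert(pos, ts)
        values.insert(pos, value)

    def get_as_of(self, entity, ts):
        """Offline: latest value written at or before ts, so no future data leaks in."""
        pos = bisect.bisect_right(self._times.get(entity, []), ts)
        return self._values[entity][pos - 1] if pos else None

    def get_online(self, entity):
        """Online: latest value, read from the same storage as training (no skew)."""
        values = self._values.get(entity, [])
        return values[-1] if values else None


store = TinyFeatureStore()
store.write("user_1", ts=10, value=3)  # e.g. purchases in the last 30 days
store.write("user_1", ts=20, value=5)
print(store.get_as_of("user_1", ts=15))  # 3: a label at t=15 must not see the t=20 value
print(store.get_online("user_1"))        # 5: serving uses the freshest value
```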
What makes the deployment process reliable in Recommendations AI?
It uses a robust CI/CD routine that validates models before deployment and ensures zero-downtime transitions to production.
What is the main purpose of Two-Tower encoders?
They surface the most relevant items for users by encoding both candidate and query data into the same embedding space.
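The shape of a two-tower model can be sketched in NumPy, assuming hypothetical feature sizes and random, untrained weights. The two towers take different inputs but emit vectors of the same size, which is what puts queries and candidates in one embedding space; a real model would learn the weights jointly (for example, with a softmax loss over in-batch negatives).

```python
import numpy as np

rng = np.random.default_rng(0)
EMB_DIM = 8  # shared embedding size (hypothetical)


def make_tower(in_dim, hidden=16, out_dim=EMB_DIM):
    """Random two-layer MLP weights; training would learn these."""
    return (rng.normal(scale=0.1, size=(in_dim, hidden)),
            rng.normal(scale=0.1, size=(hidden, out_dim)))


def encode(tower, x):
    """Map raw features to unit-length embeddings in the shared space."""
    W1, W2 = tower
    e = np.tanh(x @ W1) @ W2
    return e / np.linalg.norm(e, axis=-1, keepdims=True)


query_tower = make_tower(in_dim=10)     # e.g. user / context features
candidate_tower = make_tower(in_dim=6)  # e.g. item features

users = rng.normal(size=(3, 10))        # 3 hypothetical queries
items = rng.normal(size=(100, 6))       # 100 hypothetical candidates

q = encode(query_tower, users)          # (3, 8)
c = encode(candidate_tower, items)      # (100, 8)
scores = q @ c.T                        # (3, 100) cosine relevance scores
top5 = np.argsort(-scores, axis=1)[:, :5]
print(top5)
```

Because the candidate tower does not depend on the query, all item embeddings can be precomputed and indexed, leaving only the query tower plus a nearest-neighbour lookup at serving time.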
What are the key benefits of the Two-Tower approach?
Benefits include greater control over model training, ability to handle various feature types (text, images), and better handling of cold-start cases.