ML Models Flashcards by Katie D

Cons of Logistic Regression

Non-linear problems can’t be solved with LR (only produces a linear decision boundary).
Can’t capture feature interactions, where the effect of one feature depends on the value of another (between user, ad, and publisher features, for instance).

Pros of Logistic Regression

Pros:
- Easy to implement
- Easy to train
- Fast inference
- Interpretable
- Often useful as a baseline model

Logistic Regression

Models the probability of a binary outcome by applying a sigmoid to a weighted linear combination of the features.
Works well when the data is linearly separable.
Not a good choice for ad click prediction, since it cannot capture feature interactions on its own.
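
As a rough illustration of the card above, here is a minimal sketch of how a logistic regression model turns a weighted sum of features into a probability. The weights, bias, and feature values are hypothetical and chosen only for the example.

```python
import math

def predict_proba(weights, bias, features):
    """Return P(y = 1 | features) for a logistic regression model."""
    # Weighted linear combination of the features (the logit).
    z = bias + sum(w * x for w, x in zip(weights, features))
    # The sigmoid squashes the logit into a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical parameters and input, for illustration only.
weights = [0.8, -0.4]
bias = 0.1
p = predict_proba(weights, bias, [1.0, 2.0])
```

Because the logit is linear in the features, the decision boundary (where the probability equals 0.5) is always a hyperplane.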

Gradient-boosted decision trees

Pros
- Interpretable and easy to understand
- Can be used for feature selection (importance) and feature extraction

Cons
- Inefficient for continual learning. Not designed to be fine-tuned with new data. Usually need to retrain the model from scratch.

Two-tower neural network

Generate user embeddings for user features
Generate ad embeddings for ad features
The similarity between the user and ad embeddings is used to calculate relevance
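
The relevance step can be sketched as follows. This assumes the two towers have already produced embeddings, and it uses the dot product as the similarity measure (cosine similarity is another common choice). All embedding values and ad names are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical outputs of the user tower and the ad tower.
user_embedding = [0.2, 0.9, -0.1]
ad_embeddings = {
    "ad_sneakers": [0.1, 0.8, 0.0],
    "ad_insurance": [-0.7, 0.1, 0.3],
}

# Relevance of each ad is its similarity to the user embedding.
scores = {ad_id: dot(user_embedding, emb) for ad_id, emb in ad_embeddings.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Since the ad embeddings do not depend on the user, they can be computed ahead of time and only the user tower needs to run at request time.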

Challenges of ad click prediction

Feature space is large and sparse: most feature values are zero.
Pairwise feature interactions are difficult to capture.
The model needs continual retraining on new data.
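
To make the sparsity point concrete, here is a small sketch of one-hot encoding a categorical feature. The category list is hypothetical; real ad systems have vocabularies with millions of entries, so the fraction of zeros is far higher.

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as a vector with a single 1."""
    return [1 if value == v else 0 for v in vocabulary]

# Hypothetical vocabulary of ad categories.
ad_categories = ["travel", "finance", "sports", "fashion", "autos"]
vec = one_hot("sports", ad_categories)

# Fraction of entries that are zero.
sparsity = vec.count(0) / len(vec)
```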

Deep & Cross Network

Can replace manual feature cross method
Deep network: Learns complex, generalizable features using a DNN architecture
Cross network: Automatically captures feature interactions and learns good feature crosses
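
A minimal sketch of a single cross layer, assuming the formulation usually given for the original DCN: x_{l+1} = x_0 * (w . x_l) + b + x_l. The input, weights, and bias below are hypothetical.

```python
def cross_layer(x0, xl, w, b):
    """One cross layer: x_{l+1} = x0 * (w . xl) + b + xl."""
    # Scalar projection of the current layer input onto the weight vector.
    scale = sum(wi * xi for wi, xi in zip(w, xl))
    # Multiplying x0 by that scalar creates products of input features,
    # and adding xl back keeps the lower-order terms (a residual connection).
    return [x0_i * scale + b_i + xl_i for x0_i, b_i, xl_i in zip(x0, b, xl)]

# Hypothetical input and parameters.
x0 = [1.0, 2.0]
w = [0.5, -1.0]
b = [0.0, 0.0]
x1 = cross_layer(x0, x0, w, b)
```

Each additional cross layer raises the highest order of feature interaction by one, which is why stacked cross layers can stand in for hand-written feature crosses.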

Factorization machines

Efficiently captures pairwise interactions between features
Improves on logistic regression by adding pairwise interaction terms
Useful for ad click prediction

How do factorization machines work?

Learns an embedding vector for each feature. The interaction strength between two features is the dot product of their embeddings.
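
The scoring rule can be sketched as a linear model plus one dot-product term per feature pair. This is an illustrative, unoptimized version with hypothetical parameter values; practical implementations use an algebraic rewrite that avoids the explicit loop over pairs.

```python
from itertools import combinations

def fm_score(x, w0, w, v):
    """Factorization machine score: linear part plus pairwise interactions."""
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for i, j in combinations(range(len(x)), 2):
        # Interaction weight is the dot product of the two feature embeddings.
        interaction = sum(a * b for a, b in zip(v[i], v[j]))
        pairwise += interaction * x[i] * x[j]
    return linear + pairwise

# Hypothetical input (features 0 and 2 are active) and parameters.
x = [1.0, 0.0, 1.0]
w0 = 0.0
w = [0.1, 0.2, 0.3]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
score = fm_score(x, w0, w, v)
```

For click prediction the score would typically be passed through a sigmoid, as in logistic regression. Only pairs where both features are non-zero contribute, which suits sparse inputs.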

Support Vector Machines

Like logistic regression, a linear SVM separates classes with a linear decision boundary in n-dimensional feature space

Finds the hyperplane that separates the data points of each class with the largest margin
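
For the linear case, the separating shape is a hyperplane, and a common training objective is the hinge loss. The sketch below assumes labels of +1 and -1 and uses hypothetical weights; it shows only the decision function and the per-example loss, not a full training loop.

```python
def decision(w, b, x):
    """Signed score: positive on one side of the hyperplane, negative on the other."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def hinge_loss(w, b, x, y):
    """Hinge loss for a label y in {+1, -1}; zero once the point clears the margin."""
    return max(0.0, 1.0 - y * decision(w, b, x))

# Hypothetical hyperplane: the first feature alone decides the class.
w, b = [1.0, 0.0], 0.0
loss_far = hinge_loss(w, b, [2.0, 0.0], 1)      # correct and outside the margin
loss_inside = hinge_loss(w, b, [0.5, 0.0], 1)   # correct but inside the margin
loss_wrong = hinge_loss(w, b, [2.0, 0.0], -1)   # on the wrong side
```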

What is Learn To Rank?

Supervised machine learning to solve ranking problems
Given a query and a list of items, determine the optimal ordering of the items from most relevant to least relevant

What are the types of Learn to Rank?

Pointwise
Pairwise
Listwise

Point-wise Learn to Rank

The score of each item is predicted independently of the other items
The final ranking is achieved by sorting the predicted relevance scores
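
A minimal sketch of the pointwise procedure: score every item on its own, then sort. The scores here are hypothetical stand-ins for the output of a trained model.

```python
def rank_pointwise(items, score_fn):
    """Sort items from most to least relevant using independent per-item scores."""
    return sorted(items, key=score_fn, reverse=True)

# Hypothetical relevance scores predicted for one query.
predicted = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}
ranking = rank_pointwise(list(predicted), predicted.get)
```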

Pair-wise Learn to Rank

Given a query and two items, predicts which item is more relevant to the query
Examples: RankNet, LambdaRank, LambdaMART
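
As an illustration, a RankNet-style pairwise objective can be sketched like this: the model scores both items, and the score difference is turned into the probability that the first item should rank higher. The function names are hypothetical and the sketch omits the scoring model itself.

```python
import math

def prob_i_beats_j(s_i, s_j):
    """Probability that item i is more relevant than item j, from their scores."""
    return 1.0 / (1.0 + math.exp(-(s_i - s_j)))

def pairwise_loss(s_i, s_j):
    """Cross-entropy loss when item i is labeled as more relevant than item j."""
    return -math.log(prob_i_beats_j(s_i, s_j))

# The loss is small when the scores agree with the label and large otherwise.
loss_correct_order = pairwise_loss(2.0, 0.0)
loss_wrong_order = pairwise_loss(0.0, 2.0)
```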

List-wise Learn to Rank

Given a query and a list of items, predict the optimal ordering of an entire list
Examples: SoftRank, ListNet, and AdaRank
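
One way to see how a listwise loss looks at the whole list at once is a ListNet-style sketch: both the predicted scores and the relevance labels are turned into distributions over the list with a softmax, and the loss is the cross-entropy between them. The labels and scores below are hypothetical.

```python
import math

def softmax(values):
    """Convert a list of scores into a probability distribution."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_loss(predicted_scores, relevance_labels):
    """Cross-entropy between label and prediction distributions over one list."""
    target = softmax(relevance_labels)
    predicted = softmax(predicted_scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# Hypothetical graded relevance labels for three items of one query.
labels = [3.0, 1.0, 0.0]
loss_good = listnet_loss([3.0, 1.0, 0.0], labels)  # scores agree with labels
loss_bad = listnet_loss([0.0, 1.0, 3.0], labels)   # scores reverse the order
```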

Tradeoffs for Learn to Rank Approaches?

Pairwise and listwise approaches usually produce more accurate rankings, but they are more difficult to implement and train.

Pros of Decision Trees

Fast training
Fast inference
Minimal data prep (doesn’t require normalization or scaling) since the algorithm doesn’t depend on the distributions of input features
Interpretable

Cons of Decision Trees

Over-fitting. Decision trees are very sensitive to small variations in data. A small change in input may lead to a different outcome at serving time, and small changes in training data can produce a different tree structure.
Too sensitive. Predictions are less reliable. Naive decision trees are rarely used in practice.

Techniques to reduce the sensitivity of decision trees

Bagging
Boosting

Bagging (Decision Trees)

Ensemble learning method that trains a set of ML models in parallel on multiple subsets of training data
Predictions of all these trained models are combined to make a final prediction
Reduces the model sensitivity
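
The procedure can be sketched with a deliberately tiny base model. The example assumes one numeric feature, binary labels, and a threshold classifier (a decision stump) standing in for a full decision tree; the data and function names are hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a threshold classifier on (x, y) pairs: predict 1 when x >= threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in samples:
        errors = sum((x >= threshold) != bool(y) for x, y in samples)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold

def bagging_fit(samples, n_models, rng):
    """Train each model on its own bootstrap sample of the training data."""
    models = []
    for _ in range(n_models):
        # Bootstrap: draw with replacement, same size as the training set.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_stump(bootstrap))
    return models

def bagging_predict(models, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(int(x >= threshold) for threshold in models)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (feature value, label).
data = [(0.1, 0), (0.3, 0), (0.4, 0), (0.6, 1), (0.8, 1), (0.9, 1)]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
```

The models are independent of each other, so the loop in `bagging_fit` could run in parallel. An odd number of models avoids tied votes for binary labels.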

Random Forest

Builds multiple decision trees in parallel during training
A voting mechanism is used to combine the predictions to make a final prediction
An example of bagging applied to decision trees

Advantages of Bagging

- Reduces the effect of over-fitting (high variance)
- Doesn’t increase training time very much because the decision trees can be trained in parallel
- Does not add much latency at the inference time because decision trees can process the input in parallel

Disadvantages of Bagging

Not helpful when the model faces under-fitting (high bias)
Need boosting for that

Linear Regression vs Logistic Regression

Linear regression estimates a continuous dependent variable from the independent variables. For example, predicting the price of a house.
Logistic regression estimates the probability of an event. For example, classifying whether tissue is benign or malignant.

Boosting

- Ensemble learning technique to improve the performance of weak learners by combining predictions.
- The final model is a weighted combination of the weak learners
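
One concrete form of this idea is gradient boosting, sketched below for regression with squared error. Each weak learner is a one-split stump fitted to the current residuals, and the learning rate acts as the weight on every learner. This is an illustrative assumption about the setup, not the only way to boost (AdaBoost, for example, assigns each learner its own weight). The data is hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (squared error)."""
    best = None
    # Skipping the smallest x guarantees both sides of the split are non-empty.
    for threshold in sorted(set(xs))[1:]:
        left = [r for x, r in zip(xs, residuals) if x < threshold]
        right = [r for x, r in zip(xs, residuals) if x >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x < threshold else right_mean

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Sequentially add stumps, each one correcting the errors left so far."""
    base = sum(ys) / len(ys)
    learners = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        learners.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    # Final model: base value plus the weighted sum of all weak learners.
    return lambda x: base + learning_rate * sum(s(x) for s in learners)

# Hypothetical training data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 1.5, 3.0, 3.5]
model = boost(xs, ys)
```

Unlike bagging, the rounds depend on each other, so the learners cannot be trained in parallel.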

Bias

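Error from overly simple model assumptions: a high-bias algorithm is stubborn and barely changes its predictions when confronted with new data, which leads to under-fitting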

Deep Factorization Machines

- Combines the strengths of a neural network (NN) and a factorization machine (FM)
- The DNN captures higher-order feature interactions
- The FM captures low-order pairwise interactions