Full Course Flashcards

(157 cards)

1
What is a Machine Learning Model?
A function that maps input data to predicted outputs based on training.
2
What is a Cost Function?
A mathematical function measuring how wrong the model's predictions are (e.g., Mean Squared Error).
3
What are Training and Test Errors?
Training error: Error on the training data. Test error: Error on unseen data — more important for model evaluation.
4
What is Overfitting?
Model fits noise instead of the underlying pattern; low training error, high test error.
5
What is the Bias-Variance Trade-off?
Bias: Error from wrong assumptions. Variance: Error from model sensitivity to data. Need a balance to minimise total prediction error.
6
What is Cross-Validation?
Repeatedly splits data into training and validation sets to estimate test error reliably (e.g., k-fold CV).
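As a study aid, a minimal Python sketch of k-fold CV, assuming scikit-learn is available; the model and data below are toy placeholders, not from the course:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # toy design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 5-fold CV: each fold serves once as the validation set
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())                               # averaged validation score
```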
7
What is a Machine Learning Pipeline?
A structured sequence: data cleaning → feature engineering → model training → validation → deployment.
8
What is a Linear Regression Model?
Predicts a continuous outcome as a linear function of inputs.
9
What is Polynomial Regression?
Extends linear regression by including polynomial terms (e.g., x², x³).
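A hedged sketch of polynomial regression as linear regression on powers of x, in plain numpy with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.3, size=x.size)

X = np.vander(x, N=4, increasing=True)        # columns [1, x, x², x³]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
print(coef)                                   # estimated β₀..β₃
```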
10
What is a General Linear Model?
Includes multiple predictors and interaction terms.
11
What are Linear Basis Functions?
Transform inputs into a new space (e.g., polynomials, splines) before linear modeling.
12
What are Estimators and the Likelihood Function?
Estimator: Rule for estimating parameters. Likelihood function: Probability of the data given the parameters.
13
What are Maximum Likelihood Estimates (MLEs)?
Parameter values that maximise the likelihood function.
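A sketch of numerical MLE for a Normal(μ, σ) model using scipy on synthetic data; here the closed-form answers are simply the sample mean and standard deviation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.5, size=200)

def neg_log_likelihood(params):
    mu, log_sigma = params               # optimise log σ so σ stays positive
    return -norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))        # ≈ data.mean(), data.std()
```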
14
What are the Bias, Variance, and Mean Squared Error of Estimators?
Bias: Difference between the estimator's expected value and the true parameter. Variance: Variability of the estimator. MSE = Bias² + Variance.
15
What is the Asymptotic Optimality of MLEs?
MLEs are consistent and asymptotically efficient as the sample size increases.
16
What are Confidence Intervals?
Range estimates for a parameter, constructed to contain the true value with a specified long-run probability (the confidence level).
17
What are Hypothesis Testing and p-values?
Test whether an effect exists; the p-value measures the strength of evidence against the null hypothesis.
18
What is Frequentist Inference for Linear Regression?
Uses OLS estimates, confidence intervals, t-tests, and F-tests for model inference.
19
What are the Assumptions and Limitations of Regression Output?
Assumptions: Linearity, Independence, Homoscedasticity, Normality of errors. Limitations: Sensitive to outliers and multicollinearity.
20
What are the Elements of Bayesian Inference?
Updates beliefs using Bayes' theorem: Posterior ∝ Likelihood × Prior.
21
What are the Prior and Posterior Distributions?
Prior: Beliefs before seeing data. Posterior: Updated beliefs after seeing the data.
22
What are Conjugate Models?
Priors chosen so that the posterior belongs to the same family as the prior.
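A minimal illustration of conjugacy with the Beta-Binomial pair: a Beta(a, b) prior on a coin's heads probability plus k heads in n flips gives a Beta(a + k, b + n - k) posterior. The counts below are invented:

```python
from scipy.stats import beta

a, b = 2, 2                      # prior pseudo-counts (illustrative)
k, n = 7, 10                     # observed: 7 heads in 10 flips

posterior = beta(a + k, b + n - k)
print(posterior.mean())          # posterior mean of the heads probability
print(posterior.interval(0.95))  # a 95% credible interval (card 24)
```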
23
What are Bayes Estimators?
Posterior mean or median used as point estimates.
24
What are Credible Intervals?
The Bayesian version of confidence intervals; contains the parameter with a stated posterior probability.
25
What are Model Evidence and Bayes Factors?
Used for model comparison based on how well models explain data.
26
What are Posterior-Predictive Distributions?
Predict future observations based on the posterior.
27
What is Monte Carlo for Bayesian Inference?
Sampling methods (e.g., MCMC) to approximate posterior distributions.
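A bare-bones random-walk Metropolis sampler, one flavour of MCMC, targeting an invented 1D posterior; step size and burn-in are illustrative:

```python
import numpy as np

def log_post(theta):
    return -0.5 * (theta - 2.0) ** 2 / 0.5    # unnormalised N(2, 0.5) log-density

rng = np.random.default_rng(3)
theta, samples = 0.0, []
for _ in range(10_000):
    proposal = theta + rng.normal(scale=0.5)  # random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                      # accept; otherwise keep current
    samples.append(theta)

print(np.mean(samples[1000:]))                # posterior mean after burn-in ≈ 2
```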
28
What is Laplace Approximation?
Approximates posterior near its mode using a Gaussian.
29
What are Forward, Backward, and Stepwise Selection?
Forward: Add variables one at a time. Backward: Start with all variables, remove one at a time. Stepwise: Mix of forward and backward.
30
What is Best Subset Selection?
Tries all combinations of predictors and selects the best according to a criterion (e.g., AIC, BIC).
31
What is Lasso Regression?
L1 penalty; can shrink some coefficients exactly to zero (feature selection).
32
What is Ridge Regression?
L2 penalty; shrinks coefficients but none to exactly zero.
33
What is the Elastic Net?
Combines L1 and L2 penalties; a balance between Lasso and Ridge.
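A quick comparison of the three penalties, assuming scikit-learn is available; the alpha and l1_ratio values are arbitrary, not tuned:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # 2 true signals

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, "zeroed coefficients:",
          int(np.sum(model.coef_ == 0)))
# Lasso (and usually Elastic Net) zero out noise features; Ridge never does.
```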
34
What is Bayesian Linear Regression?
Places a prior on coefficients and derives a posterior.
35
What is Classification?
Predict categorical outcomes (classes) rather than continuous values.
36
What is Logistic Regression?
Models log-odds of outcome as a linear function of inputs.
37
What is Maximum Likelihood for Logistic Regression?
Finds parameters that maximise the likelihood of the observed outcomes.
38
What is Gradient Descent?
Iterative minimisation of the cost function by stepping along the negative gradient.
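A bare-bones gradient descent sketch for the linear-regression MSE cost in plain numpy; learning rate and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.c_[np.ones(100), rng.normal(size=100)]    # [1, x] design matrix
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)

beta, lr = np.zeros(2), 0.1
for _ in range(1000):
    grad = 2 / len(y) * X.T @ (X @ beta - y)     # gradient of the MSE cost
    beta -= lr * grad                            # step against the gradient
print(beta)                                      # ≈ [2, -1]
```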
39
What is Newton-Raphson?
Faster convergence than gradient descent by using second-order derivatives (the Hessian).
40
What is Penalised and Bayesian Logistic Regression?
Adds penalties (L1 or L2) or Bayesian priors to logistic regression.
41
What are Generative Models?
Models joint distribution p(x, y) and uses Bayes' rule for prediction.
42
What is Linear Discriminant Analysis (LDA)?
Assumes a common covariance matrix across classes.
43
What is Quadratic Discriminant Analysis (QDA)?
Allows a separate covariance matrix for each class.
44
How are Classification Models Evaluated?
Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC.
45
What is Hierarchical Clustering?
Builds a tree-like structure (dendrogram) of nested clusters.
46
What is k-means Clustering?
Partitions data into k groups minimising within-cluster variance.
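A compact sketch of Lloyd's algorithm for k-means in plain numpy; initialisation and stopping are simplified, and it assumes no cluster ever empties out:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # assign each point to its nearest centre
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(axis=1)
        # move each centre to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
```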
47
What are Mixture Models?
Probabilistic models assuming data comes from a mixture of distributions.
48
What is Dimension Reduction?
Reduces features while preserving structure (e.g., PCA).
49
What is Principal Components Analysis (PCA)?
Finds directions of maximum variance.
50
What is Factor Analysis?
Identifies underlying latent factors.
51
How is a Decision Tree Interpreted?
Each split is based on feature values; leaf nodes represent final predictions.
52
What is Recursive Binary Splitting?
Greedy algorithm that splits data at each node to reduce impurity.
53
What are Classification Trees?
Predict categorical outcomes; use measures like Gini impurity or entropy.
54
What is Tree Pruning?
Cut back the tree to avoid overfitting (based on validation error).
55
What is Majority Voting?
Combines multiple models' predictions by majority rule.
56
What is Bagging?
Build multiple models on bootstrapped datasets, then average.
57
What is Boosting?
Sequentially improves models by focusing on previous errors.
58
What is Model Averaging?
Combines predictions from multiple models for better performance.
59
What is Stacking?
Train a second-level model to combine first-level models' predictions.
60
What are Bagged Trees and Random Forests?
Bagging applied to decision trees; random forests add feature randomness at splits.
61
What are Boosted Trees?
Gradient boosting sequentially improves trees on residual errors.
62
What are Neural Networks?
Models inspired by the human brain with neurons (nodes) and connections.
63
What is a Single-Layer Perceptron?
Can only learn linear boundaries.
64
What is a Multi-Layer Perceptron?
Can model complex, non-linear functions.
65
What is the Architecture of Neural Networks?
Input layer, hidden layers, output layer.
66
What are Activation Functions?
Introduce non-linearity (e.g., sigmoid, ReLU, tanh).
67
What are Deep Neural Networks?
Networks with many hidden layers; learn high-level features.
68
How are Neural Networks Fitted?
By minimising a loss (cost) function during training.
69
What is Stochastic Gradient Descent (SGD)?
Updates weights incrementally per mini-batch of data.
70
What is Backpropagation?
Algorithm for computing gradients efficiently for training.
71
What are Cubic Splines?
Piecewise polynomials ensuring smoothness at joins (knots).
72
What are Smoothing Splines?
Fit a smooth curve through data by penalising roughness.
73
What are Kernels?
Functions measuring similarity between data points (e.g., RBF kernel).
74
What are Gaussian Processes?
Probabilistic models over functions; provide uncertainty estimates.
75
What are Gaussian Process Regression and Classification?
Use GPs to make predictions with quantified uncertainty.
76
What is an Application to Natural Language Processing?
Classify text data using logistic regression or discriminant models.
77
What is the Role of a machine learning model?
Map inputs to outputs by learning patterns from data.
78
What is the Cost function?
How wrong the model's predictions are compared to actual outcomes.
79
What is training error vs test error?
Training error is error on seen data; test error is error on new unseen data.
80
What is Overfitting?
The model learns noise instead of true patterns, leading to high test error.
81
What is the Bias-variance trade-off?
The balance between error from overly simple assumptions (bias) and error from sensitivity to the training data (variance).
82
What is Cross-validation?
A resampling method for estimating the model's test error reliably.
83
What is a machine learning pipeline?
Structured workflow from data preprocessing to model deployment.
84
What is the Likelihood function?
Probability of observed data given parameter values.
85
What is the MLE (Maximum Likelihood Estimation)?
Chooses parameter values that maximise the likelihood function.
86
What is the MSE (Mean Squared Error)?
For an estimator: MSE = Bias² + Variance.
87
What is a confidence interval?
A range where the true parameter is likely to fall with a certain probability.
88
What are the Assumptions of linear regression?
Linearity, Independence, Homoscedasticity, Normality of errors.
89
What is Bayesian inference?
Updating prior beliefs using observed data.
90
What is a conjugate prior?
A prior that results in a posterior from the same distribution family.
91
What is a credible interval in Bayesian inference?
A probability-based interval for the parameter.
92
What is the Main goal of Ridge Regression?
Shrink coefficients to prevent overfitting.
93
What is Lasso Regression's unique feature?
Shrinks some coefficients exactly to zero (feature selection).
94
What is logistic regression?
Models the log-odds of class membership.
95
What is the Role of gradient descent?
Minimising the cost function iteratively.
96
What is the Difference between LDA and QDA?
LDA assumes a common covariance matrix; QDA allows different covariance matrices per class.
97
What is Hierarchical clustering?
Grouping data into a nested tree of clusters.
98
What is K-means clustering?
Partitioning data into k groups minimising within-cluster variance.
99
What is a regression tree?
Predicts outcomes by splitting data into decision paths.
100
What is Bagging in ensemble methods?
Building models on bootstrapped datasets and averaging them.
101
What is Boosting?
Sequentially improving models by focusing on previous mistakes.
102
What is a single-layer perceptron?
Learns linear decision boundaries.
103
What is an activation function?
Introduces non-linearity into a neural network.
104
What is Stochastic gradient descent?
Updates weights using small batches of data.
105
What is the purpose of backpropagation?
To calculate gradients efficiently for training neural networks.
106
What are Gaussian Processes used for?
Used for probabilistic predictions with uncertainty quantification.
107
What is the Bias-Variance Trade-off?
High bias → underfit; high variance → overfit.
108
What is the Linear Regression Model?
y = β₀ + β₁x + ε
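A sketch of fitting this model by least squares with numpy's SVD-based solver; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=50)
y = 1.0 + 3.0 * x + rng.normal(scale=0.2, size=50)

X = np.c_[np.ones_like(x), x]                  # add an intercept column
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(b0, b1)                                  # ≈ 1.0, 3.0
```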
109
What is the Logistic Regression Model?
log(p / (1-p)) = β₀ + β₁x
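A sketch of how the model turns a linear score into a probability via the sigmoid (the inverse of the log-odds); the coefficients are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -1.0, 2.0                   # hypothetical fitted coefficients
x = np.array([-2.0, 0.0, 2.0])
p = sigmoid(b0 + b1 * x)             # P(y = 1 | x)
print(p)                             # log(p/(1-p)) recovers b0 + b1*x
```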
110
What is MLE?
Maximise the likelihood; in practice, maximise the log-likelihood.
111
What is Bayes' Rule?
Posterior ∝ Prior × Likelihood
112
What is the Confusion Matrix?
TP(True Positive), TN(True Negative), FP(False Positive), FN(False Negative)
113
What is Precision?
TP / (TP + FP), i.e., True Positives / (True Positives + False Positives)
114
What is Recall (Sensitivity)?
TP / (TP + FN), i.e., True Positives / (True Positives + False Negatives)
115
What is F1 Score?
2 × (Precision × Recall) / (Precision + Recall)
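The formulas from cards 112-115 computed directly, using invented confusion-matrix counts:

```python
tp, fp, fn, tn = 40, 10, 5, 45       # invented counts for illustration

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```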
116
What is ROC-AUC?
Area under the ROC curve; higher AUC = better classifier.
117
What is SVM?
Maximises the margin; uses the kernel trick for non-linear cases.
118
What is PCA?
Projects data onto principal components to reduce dimensions
119
What is K-Means Clustering?
Minimise within-cluster variance
120
What is Ensemble Learning?
Bagging = reduce variance; Boosting = reduce bias
121
What is Random Forest?
Many decorrelated trees, majority voting
122
What is the Deployment Reminder?
Monitor for model drift post-deployment
123
What is the Ethics Reminder?
Watch for bias, ensure fairness
124
What is overfitting?
A model learns noise in the training data and performs poorly on new data.
125
What is underfitting?
A model is too simple to capture patterns in the data.
126
What are the three components of a machine learning system?
Model, Cost Function, Data.
127
What is the bias-variance trade-off?
Increasing model complexity decreases bias but increases variance; aim to balance total error.
128
What is the purpose of a cost (loss) function?
To measure the error between predicted and actual values and guide learning.
129
What is the equation for simple linear regression?
y = β₀ + β₁x + ε
130
What assumptions does OLS rely on?
Linearity, Independence, Homoscedasticity, No perfect multicollinearity, Normality of errors.
131
How is R² computed from sums of squares?
R² = 1 - (SS_residual / SS_total)
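A sketch of that formula on made-up observed and fitted values:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])        # observed (invented)
y_hat = np.array([2.8, 5.1, 7.2, 8.9])    # fitted (invented)

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)                # R²
```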
132
In y = β₀ + β₁x + ε, what do β₁ and β₀ represent?
β₁: the expected change in y for a one-unit increase in x. β₀: the intercept (predicted y when x = 0).
133
What does the sigmoid function model in logistic regression?
It maps any real value into a probability between 0 and 1.
134
What is the logistic regression model?
log(p / (1-p)) = β₀ + β₁x
135
What estimation method is used in logistic regression?
Maximum Likelihood Estimation (MLE).
136
What is k-nearest neighbors (k-NN)?
A method that predicts based on the majority class of the k nearest points.
137
What distance is commonly used in k-NN?
Euclidean distance.
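A from-scratch k-NN sketch using Euclidean distance and a majority vote (cards 136-137); the training points are invented and there is no tie-breaking logic:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority class

X_train = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # → 1
```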
138
What is Naive Bayes?
A classification technique based on Bayes' theorem assuming feature independence.
139
What is the decision tree splitting criterion?
Maximise information gain (reduce entropy or Gini impurity).
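A sketch of the two impurity measures a split tries to reduce; p is a vector of class proportions at a node:

```python
import numpy as np

def gini(p):
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    p = p[p > 0]                     # avoid log(0)
    return -np.sum(p * np.log2(p))

p = np.array([0.5, 0.5])             # a maximally impure binary node
print(gini(p), entropy(p))           # 0.5, 1.0
```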
140
What is bagging?
Training multiple models on bootstrapped data and averaging results.
141
What is boosting?
Sequentially improving models by focusing on previous errors.
142
What is a Random Forest?
An ensemble of decorrelated decision trees, each trained on a bootstrap sample with a random feature subset considered at each split.
143
What does SVM aim to find?
The hyperplane that maximises the margin between classes.
144
What is the kernel trick?
A method to implicitly map data into a higher-dimensional space where it becomes linearly separable.
145
What is PCA?
Principal Component Analysis; reduces dimensionality by projecting data onto directions of maximum variance.
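A sketch of PCA via the SVD in plain numpy: centre the features, then project onto the leading directions of variance; the data and the choice of 2 components are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)              # centre each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_2d = Xc @ Vt[:2].T                 # project onto first 2 principal components
explained = S**2 / np.sum(S**2)      # proportion of variance per component
print(X_2d.shape, explained[:2])
```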
146
What is k-means clustering?
Partitioning data into k clusters minimising within-cluster variance.
147
What is Precision?
TP / (TP + FP)
148
What is Recall?
TP / (TP + FN)
149
What is F1 Score?
2 × (Precision × Recall) / (Precision + Recall)
150
What is cross-validation?
A method to estimate a model's performance by splitting data into training and validation sets.
151
What is regularisation?
Penalising model complexity to prevent overfitting.
152
What is a confusion matrix?
A table showing true positives, false positives, true negatives, and false negatives.
153
What is ROC-AUC?
Area under the ROC curve; higher AUC indicates better classifier.
154
What is model deployment?
Integrating a trained model into a production environment for real-time predictions.
155
Why is monitoring deployed models important?
To detect data drift and performance degradation.
156
What is algorithmic bias?
Systematic errors that unfairly favour certain groups.
157
What is an ML pipeline?
A workflow including data collection, preprocessing, modeling, evaluation, and deployment.