Full Course Flashcards

(157 cards)

1
What is a Machine Learning Model?
A function that maps input data to predicted outputs based on training.
2
What is a Cost Function?
A mathematical function measuring how wrong the model's predictions are (e.g., Mean Squared Error).
3
What are Training and Test Errors?
Training error: Error on the training data. Test error: Error on unseen data — more important for model evaluation.
4
What is Overfitting?
Model fits noise instead of the underlying pattern; low training error, high test error.
5
What is the Bias-Variance Trade-off?
Bias: Error from wrong assumptions. Variance: Error from model sensitivity to data. Need a balance to minimise total prediction error.
6
What is Cross-Validation?
Repeatedly splits data into training and validation sets to estimate test error reliably (e.g., k-fold CV).
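As a study aid, a minimal Python sketch of k-fold CV, assuming scikit-learn is available; the model and data below are toy placeholders, not from the course:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # toy design matrix
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 5-fold CV: each fold serves once as the validation set
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores.mean())                               # averaged validation score
```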
7
What is a Machine Learning Pipeline?
A structured sequence: data cleaning → feature engineering → model training → validation → deployment.
8
What is a Linear Regression Model?
Predicts a continuous outcome as a linear function of inputs.
9
What is Polynomial Regression?
Extends linear regression by including polynomial terms (e.g., x², x³).
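A hedged sketch of polynomial regression as linear regression on powers of x, in plain numpy with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x - 2.0 * x**2 + rng.normal(scale=0.3, size=x.size)

X = np.vander(x, N=4, increasing=True)        # columns [1, x, x², x³]
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
print(coef)                                   # estimated β₀..β₃
```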
10
What is a General Linear Model?
Includes multiple predictors and interaction terms.
11
What are Linear Basis Functions?
Transform inputs into a new space (e.g., polynomials, splines) before linear modeling.
12
What are Estimators and the Likelihood Function?
Estimator: Rule for estimating parameters. Likelihood function: Probability of the data given the parameters.
13
What are Maximum Likelihood Estimates (MLEs)?
Parameter values that maximise the likelihood function.
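A sketch of numerical MLE for a Normal(μ, σ) model using scipy on synthetic data; here the closed-form answers are simply the sample mean and standard deviation:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=1.5, size=200)

def neg_log_likelihood(params):
    mu, log_sigma = params               # optimise log σ so σ stays positive
    return -norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)).sum()

res = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(res.x[0], np.exp(res.x[1]))        # ≈ data.mean(), data.std()
```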
14
What are the Bias, Variance, and Mean Squared Error of Estimators?
Bias: Difference between the estimator's expected value and the true parameter. Variance: Variability of the estimator. MSE = Bias² + Variance.
15
What is the Asymptotic Optimality of MLEs?
MLEs are consistent and asymptotically efficient as the sample size increases.
16
What are Confidence Intervals?
Range estimates for a parameter, constructed to contain the true value with a specified long-run probability (the confidence level).
17
What are Hypothesis Testing and p-values?
Test whether an effect exists; the p-value measures the strength of evidence against the null hypothesis.
18
What is Frequentist Inference for Linear Regression?
Uses OLS estimates, confidence intervals, t-tests, and F-tests for model inference.
19
What are the Assumptions and Limitations of Regression Output?
Assumptions: Linearity, Independence, Homoscedasticity, Normality of errors. Limitations: Sensitive to outliers and multicollinearity.
20
What are the Elements of Bayesian Inference?
Updates beliefs using Bayes' theorem: Posterior ∝ Likelihood × Prior.
21
What are the Prior and Posterior Distributions?
Prior: Beliefs before seeing data. Posterior: Updated beliefs after seeing the data.
22
What are Conjugate Models?
Priors chosen so that the posterior belongs to the same family as the prior.
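A minimal illustration of conjugacy with the Beta-Binomial pair: a Beta(a, b) prior on a coin's heads probability plus k heads in n flips gives a Beta(a + k, b + n - k) posterior. The counts below are invented:

```python
from scipy.stats import beta

a, b = 2, 2                      # prior pseudo-counts (illustrative)
k, n = 7, 10                     # observed: 7 heads in 10 flips

posterior = beta(a + k, b + n - k)
print(posterior.mean())          # posterior mean of the heads probability
print(posterior.interval(0.95))  # a 95% credible interval (card 24)
```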
23
What are Bayes Estimators?
Posterior mean or median used as point estimates.
24
What are Credible Intervals?
The Bayesian version of confidence intervals; contains the parameter with a stated posterior probability.
25
What are Model Evidence and Bayes Factors?
Used for model comparison based on how well models explain data.
26
What are Posterior-Predictive Distributions?
Predict future observations based on the posterior.
27
What is Monte Carlo for Bayesian Inference?
Sampling methods (e.g., MCMC) to approximate posterior distributions.
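A bare-bones random-walk Metropolis sampler, one flavour of MCMC, targeting an invented 1D posterior; step size and burn-in are illustrative:

```python
import numpy as np

def log_post(theta):
    return -0.5 * (theta - 2.0) ** 2 / 0.5    # unnormalised N(2, 0.5) log-density

rng = np.random.default_rng(3)
theta, samples = 0.0, []
for _ in range(10_000):
    proposal = theta + rng.normal(scale=0.5)  # random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                      # accept; otherwise keep current
    samples.append(theta)

print(np.mean(samples[1000:]))                # posterior mean after burn-in ≈ 2
```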
28
What is Laplace Approximation?
Approximates posterior near its mode using a Gaussian.
29
What are Forward, Backward, and Stepwise Selection?
Forward: Add variables one at a time. Backward: Start with all variables, remove one at a time. Stepwise: Mix of forward and backward.
30
What is Best Subset Selection?
Tries all combinations of predictors and selects the best according to a criterion (e.g., AIC, BIC).
31
What is Lasso Regression?
L1 penalty; can shrink some coefficients exactly to zero (feature selection).
32
What is Ridge Regression?
L2 penalty; shrinks coefficients but none to exactly zero.
33
What is the Elastic Net?
Combines L1 and L2 penalties; a balance between Lasso and Ridge.
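A quick comparison of the three penalties, assuming scikit-learn is available; the alpha and l1_ratio values are arbitrary, not tuned:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 10))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)  # 2 true signals

for model in (Lasso(alpha=0.1), Ridge(alpha=1.0),
              ElasticNet(alpha=0.1, l1_ratio=0.5)):
    model.fit(X, y)
    print(type(model).__name__, "zeroed coefficients:",
          int(np.sum(model.coef_ == 0)))
# Lasso (and usually Elastic Net) zero out noise features; Ridge never does.
```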
34
What is Bayesian Linear Regression?
Places a prior on coefficients and derives a posterior.
35
What is Classification?
Predict categorical outcomes (classes) rather than continuous values.
36
What is Logistic Regression?
Models log-odds of outcome as a linear function of inputs.
37
What is Maximum Likelihood for Logistic Regression?
Finds parameters that maximise the likelihood of the observed outcomes.
38
What is Gradient Descent?
Iterative minimisation of the cost function by stepping along the negative gradient.
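A bare-bones gradient descent sketch for the linear-regression MSE cost in plain numpy; learning rate and iteration count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
X = np.c_[np.ones(100), rng.normal(size=100)]    # [1, x] design matrix
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)

beta, lr = np.zeros(2), 0.1
for _ in range(1000):
    grad = 2 / len(y) * X.T @ (X @ beta - y)     # gradient of the MSE cost
    beta -= lr * grad                            # step against the gradient
print(beta)                                      # ≈ [2, -1]
```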
39
What is Newton-Raphson?
Faster convergence than gradient descent by using second-order derivatives (the Hessian).
40
What is Penalised and Bayesian Logistic Regression?
Adds penalties (L1 or L2) or Bayesian priors to logistic regression.
41
What are Generative Models?
Models joint distribution p(x, y) and uses Bayes' rule for prediction.
42
What is Linear Discriminant Analysis (LDA)?
Assumes a common covariance matrix across classes.
43
What is Quadratic Discriminant Analysis (QDA)?
Allows a separate covariance matrix for each class.
44
How are Classification Models Evaluated?
Metrics: Accuracy, Precision, Recall, F1 Score, ROC-AUC.
45
What is Hierarchical Clustering?
Builds a tree-like structure (dendrogram) of nested clusters.
46
What is k-means Clustering?
Partitions data into k groups minimising within-cluster variance.
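A compact sketch of Lloyd's algorithm for k-means in plain numpy; initialisation and stopping are simplified, and it assumes no cluster ever empties out:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # assign each point to its nearest centre
        labels = ((X[:, None] - centers) ** 2).sum(-1).argmin(axis=1)
        # move each centre to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers
```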
47
What are Mixture Models?
Probabilistic models assuming data comes from a mixture of distributions.
48
What is Dimension Reduction?
Reduces features while preserving structure (e.g., PCA).
49
What is Principal Components Analysis (PCA)?
Finds directions of maximum variance.
50
What is Factor Analysis?
Identifies underlying latent factors.
51
How is a Decision Tree Interpreted?
Each split is based on feature values; leaf nodes represent final predictions.
52
What is Recursive Binary Splitting?
Greedy algorithm that splits data at each node to reduce impurity.
53
What are Classification Trees?
Predict categorical outcomes; use measures like Gini impurity or entropy.
54
What is Tree Pruning?
Cut back the tree to avoid overfitting (based on validation error).
55
What is Majority Voting?
Combines multiple models' predictions by majority rule.
56
What is Bagging?
Build multiple models on bootstrapped datasets, then average.
57
What is Boosting?
Sequentially improves models by focusing on previous errors.
58
What is Model Averaging?
Combines predictions from multiple models for better performance.
59
What is Stacking?
Train a second-level model to combine first-level models' predictions.
60
What are Bagged Trees and Random Forests?
Bagging applied to decision trees; random forests add feature randomness at splits.
61
What are Boosted Trees?
Gradient boosting sequentially improves trees on residual errors.
62
What are Neural Networks?
Models inspired by the human brain with neurons (nodes) and connections.
63
What is a Single-Layer Perceptron?
Can only learn linear boundaries.
64
What is a Multi-Layer Perceptron?
Can model complex, non-linear functions.
65
What is the Architecture of Neural Networks?
Input layer, hidden layers, output layer.
66
What are Activation Functions?
Introduce non-linearity (e.g., sigmoid, ReLU, tanh).
67
What are Deep Neural Networks?
Networks with many hidden layers; learn high-level features.
68
How are Neural Networks Fitted?
By minimising a loss (cost) function during training.
69
What is Stochastic Gradient Descent (SGD)?
Updates weights incrementally per mini-batch of data.
70
What is Backpropagation?
Algorithm for computing gradients efficiently for training.
71
What are Cubic Splines?
Piecewise polynomials ensuring smoothness at joins (knots).
72
What are Smoothing Splines?
Fit a smooth curve through data by penalising roughness.
73
What are Kernels?
Functions measuring similarity between data points (e.g., RBF kernel).
74
What are Gaussian Processes?
Probabilistic models over functions; provide uncertainty estimates.
75
What are Gaussian Process Regression and Classification?
Use GPs to make predictions with quantified uncertainty.
76
What is an Application to Natural Language Processing?
Classify text data using logistic regression or discriminant models.
77
What is the Role of a machine learning model?
Map inputs to outputs by learning patterns from data.
78
What is the Cost function?
How wrong the model's predictions are compared to actual outcomes.
79
What is training error vs test error?
Training error is error on seen data; test error is error on new unseen data.
80
What is Overfitting?
The model learns noise instead of true patterns, leading to high test error.
81
What is the Bias-variance trade-off?
The balance between error from overly simple assumptions (bias) and error from sensitivity to the training data (variance).
82
What is Cross-validation?
A resampling method for estimating the model's test error reliably.
83
What is a machine learning pipeline?
Structured workflow from data preprocessing to model deployment.
84
What is the Likelihood function?
Probability of observed data given parameter values.
85
What is the MLE (Maximum Likelihood Estimation)?
Chooses parameter values that maximise the likelihood function.
86
What is the MSE (Mean Squared Error)?
For an estimator: MSE = Bias² + Variance.
87
What is a confidence interval?
A range where the true parameter is likely to fall with a certain probability.
88
What are the Assumptions of linear regression?
Linearity, Independence, Homoscedasticity, Normality of errors.
89
What is Bayesian inference?
Updating prior beliefs using observed data.
90
What is a conjugate prior?
A prior that results in a posterior from the same distribution family.
91
What is a credible interval in Bayesian inference?
A probability-based interval for the parameter.
92
What is the Main goal of Ridge Regression?
Shrink coefficients to prevent overfitting.
93
What is Lasso Regression's unique feature?
Shrinks some coefficients exactly to zero (feature selection).
94
What is logistic regression?
Models the log-odds of class membership.
95
What is the Role of gradient descent?
Minimising the cost function iteratively.
96
What is the Difference between LDA and QDA?
LDA assumes a common covariance matrix; QDA allows different covariance matrices per class.
97
What is Hierarchical clustering?
Grouping data into a nested tree of clusters.
98
What is K-means clustering?
Partitioning data into k groups minimising within-cluster variance.
99
What is a regression tree?
Predicts outcomes by splitting data into decision paths.
100
What is Bagging in ensemble methods?
Building models on bootstrapped datasets and averaging them.
101
What is Boosting?
Sequentially improving models by focusing on previous mistakes.
102
What is a single-layer perceptron?
Learns linear decision boundaries.
103
What is an activation function?
Introduces non-linearity into a neural network.
104
What is Stochastic gradient descent?
Updates weights using small batches of data.
105
What is the purpose of backpropagation?
To calculate gradients efficiently for training neural networks.
106
What are Gaussian Processes used for?
Used for probabilistic predictions with uncertainty quantification.
107
What is the Bias-Variance Trade-off?
High bias → underfit; high variance → overfit.
108
What is the Linear Regression Model?
y = β₀ + β₁x + ε
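A sketch of fitting this model by least squares with numpy's SVD-based solver; the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=50)
y = 1.0 + 3.0 * x + rng.normal(scale=0.2, size=50)

X = np.c_[np.ones_like(x), x]                  # add an intercept column
(b0, b1), *_ = np.linalg.lstsq(X, y, rcond=None)
print(b0, b1)                                  # ≈ 1.0, 3.0
```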
109
What is the Logistic Regression Model?
log(p / (1-p)) = β₀ + β₁x
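A sketch of how the model turns a linear score into a probability via the sigmoid (the inverse of the log-odds); the coefficients are made up:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -1.0, 2.0                   # hypothetical fitted coefficients
x = np.array([-2.0, 0.0, 2.0])
p = sigmoid(b0 + b1 * x)             # P(y = 1 | x)
print(p)                             # log(p/(1-p)) recovers b0 + b1*x
```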
110
What is MLE?
Maximise the likelihood; in practice, maximise the log-likelihood.
111
What is Bayes' Rule?
Posterior ∝ Prior × Likelihood
112
What is the Confusion Matrix?
TP(True Positive), TN(True Negative), FP(False Positive), FN(False Negative)
113
What is Precision?
TP / (TP + FP), i.e., True Positives / (True Positives + False Positives)
114
What is Recall (Sensitivity)?
TP / (TP + FN), i.e., True Positives / (True Positives + False Negatives)
115
What is F1 Score?
2 × (Precision × Recall) / (Precision + Recall)
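The formulas from cards 112-115 computed directly, using invented confusion-matrix counts:

```python
tp, fp, fn, tn = 40, 10, 5, 45       # invented counts for illustration

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, f1)
```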
116
What is ROC-AUC?
Area under the ROC curve; higher AUC = better classifier.
117
What is SVM?
Maximises the margin; uses the kernel trick for non-linear cases.
118
What is PCA?
Projects data onto principal components to reduce dimensions
119
What is K-Means Clustering?
Minimise within-cluster variance
120
What is Ensemble Learning?
Bagging = reduce variance; Boosting = reduce bias
121
What is Random Forest?
Many decorrelated trees, majority voting
122
What is the Deployment Reminder?
Monitor for model drift post-deployment
123
What is the Ethics Reminder?
Watch for bias, ensure fairness
124
What is overfitting?
A model learns noise in the training data and performs poorly on new data.
125
What is underfitting?
A model is too simple to capture patterns in the data.
126
What are the three components of a machine learning system?
Model, Cost Function, Data.
127
What is the bias-variance trade-off?
Increasing model complexity decreases bias but increases variance; aim to balance total error.
128
What is the purpose of a cost (loss) function?
To measure the error between predicted and actual values and guide learning.
129
What is the equation for simple linear regression?
y = β₀ + β₁x + ε
130
What assumptions does OLS rely on?
Linearity, Independence, Homoscedasticity, No perfect multicollinearity, Normality of errors.
131
How is R² computed from sums of squares?
R² = 1 - (SS_residual / SS_total)
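A sketch of that formula on made-up observed and fitted values:

```python
import numpy as np

y = np.array([3.0, 5.0, 7.0, 9.0])        # observed (invented)
y_hat = np.array([2.8, 5.1, 7.2, 8.9])    # fitted (invented)

ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)                # R²
```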
132
In y = β₀ + β₁x + ε, what do β₁ and β₀ represent?
β₁: the expected change in y for a one-unit increase in x. β₀: the intercept (predicted y when x = 0).
133
What does the sigmoid function model in logistic regression?
It maps any real value into a probability between 0 and 1.
134
What is the logistic regression model?
log(p / (1-p)) = β₀ + β₁x
135
What estimation method is used in logistic regression?
Maximum Likelihood Estimation (MLE).
136
What is k-nearest neighbors (k-NN)?
A method that predicts based on the majority class of the k nearest points.
137
What distance is commonly used in k-NN?
Euclidean distance.
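A from-scratch k-NN sketch using Euclidean distance and a majority vote (cards 136-137); the training points are invented and there is no tie-breaking logic:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))  # Euclidean distances
    nearest = np.argsort(dists)[:k]                        # k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]  # majority class

X_train = np.array([[0, 0], [0, 1], [5, 5], [5, 6]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))  # → 1
```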
138
What is Naive Bayes?
A classification technique based on Bayes' theorem assuming feature independence.
139
What is the decision tree splitting criterion?
Maximise information gain (reduce entropy or Gini impurity).
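A sketch of the two impurity measures a split tries to reduce; p is a vector of class proportions at a node:

```python
import numpy as np

def gini(p):
    return 1.0 - np.sum(p ** 2)

def entropy(p):
    p = p[p > 0]                     # avoid log(0)
    return -np.sum(p * np.log2(p))

p = np.array([0.5, 0.5])             # a maximally impure binary node
print(gini(p), entropy(p))           # 0.5, 1.0
```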
140
What is bagging?
Training multiple models on bootstrapped data and averaging results.
141
What is boosting?
Sequentially improving models by focusing on previous errors.
142
What is a Random Forest?
An ensemble of decorrelated decision trees, each trained on a bootstrap sample with a random feature subset considered at each split.
143
What does SVM aim to find?
The hyperplane that maximises the margin between classes.
144
What is the kernel trick?
A method to implicitly map data into a higher-dimensional space where it becomes linearly separable.
145
What is PCA?
Principal Component Analysis; reduces dimensionality by projecting data onto directions of maximum variance.
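A sketch of PCA via the SVD in plain numpy: centre the features, then project onto the leading directions of variance; the data and the choice of 2 components are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 5))
Xc = X - X.mean(axis=0)              # centre each feature

U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
X_2d = Xc @ Vt[:2].T                 # project onto first 2 principal components
explained = S**2 / np.sum(S**2)      # proportion of variance per component
print(X_2d.shape, explained[:2])
```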
146
What is k-means clustering?
Partitioning data into k clusters minimising within-cluster variance.
147
What is Precision?
TP / (TP + FP)
148
What is Recall?
TP / (TP + FN)
149
What is F1 Score?
2 × (Precision × Recall) / (Precision + Recall)
150
What is cross-validation?
A method to estimate a model's performance by splitting data into training and validation sets.
151
What is regularisation?
Penalising model complexity to prevent overfitting.
152
What is a confusion matrix?
A table showing true positives, false positives, true negatives, and false negatives.
153
What is ROC-AUC?
Area under the ROC curve; higher AUC indicates better classifier.
154
What is model deployment?
Integrating a trained model into a production environment for real-time predictions.
155
Why is monitoring deployed models important?
To detect data drift and performance degradation.
156
What is algorithmic bias?
Systematic errors that unfairly favour certain groups.
157
What is an ML pipeline?
A workflow including data collection, preprocessing, modeling, evaluation, and deployment.