ML and AI Flashcards
(197 cards)
What is the difference between supervised, unsupervised, and reinforcement learning?
Supervised learning is task-driven and learns from labelled data (e.g., classification, regression), unsupervised learning is data-driven and finds structure in unlabelled data (e.g., clustering), and reinforcement learning learns from reward feedback through trial and error.
Why is probabilistic machine learning important?
It quantifies uncertainty, which is crucial for real-world decision-making (e.g., estimating COVID-19’s R number), so probabilistic ML supports principled decisions under uncertainty.
What are aleatoric and epistemic uncertainties?
Aleatoric uncertainty comes from inherent randomness in the process (e.g., the outcome of a coin flip), while epistemic uncertainty arises from lack of knowledge (e.g., not knowing which side an already-flipped coin landed on).
What is probabilistic modelling?
It’s the use of statistical models to specify a probability distribution over data using parameters (e.g., θ). These models support prediction and uncertainty quantification.
What distribution is known as the bell curve and what are its parameters?
The Gaussian (normal) distribution. Parameters: mean (μ) and standard deviation (σ).
How is the likelihood function defined for IID data?
L(θ) = Πᵢ₌₁ⁿ p(yᵢ | θ). It measures how likely the observed data y are under a given parameter value θ; different models yield different likelihood functions.
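A minimal Python sketch of this product-form likelihood, using a Bernoulli model; the coin flips and the two θ values below are purely illustrative, not from the card:

```python
# Sketch of the IID likelihood L(theta) = prod_i p(y_i | theta) for a Bernoulli model.
import numpy as np

y = np.array([1, 0, 1, 1, 0, 1])          # observed flips (1 = heads), made-up data

def likelihood(theta, y):
    # p(y_i | theta) = theta if y_i == 1 else (1 - theta); take the product over all i
    return np.prod(theta ** y * (1 - theta) ** (1 - y))

print(likelihood(0.5, y))   # likelihood under a fair coin
print(likelihood(0.7, y))   # a biased coin explains this particular sample slightly better
```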
What is the goal of Maximum Likelihood Estimation (MLE)?
To find parameter values θ that maximise the likelihood function given the data.
Why is log-likelihood used in MLE?
It simplifies mathematics, especially for product-based likelihoods, by turning them into sums.
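A small sketch of why the log helps in practice: for a large simulated sample, the raw product underflows to zero in floating point while the log-likelihood (a sum) stays finite. The sample size and the assumed θ are illustrative:

```python
# The log turns the product-form likelihood into a sum: simpler to differentiate
# and numerically stabler for large n.
import numpy as np

rng = np.random.default_rng(0)
y = rng.binomial(1, 0.7, size=2000)        # 2000 simulated coin flips

theta = 0.7
log_lik = np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))   # sum of log-densities
lik = np.prod(theta ** y * (1 - theta) ** (1 - y))                  # product of densities

print(log_lik)   # a manageable negative number
print(lik)       # underflows to 0.0 for this many observations
```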
Give an example of using MLE with normally distributed data.
If yᵢ ~ N(μ, σ²) and σ is known, MLE estimates μ by maximising the (log-)likelihood of the data; the resulting estimate is the sample mean.
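A sketch of this case, assuming simulated data with a made-up true mean and a known σ; numerically maximising the log-likelihood recovers the sample mean:

```python
# MLE for the mean of normally distributed data with known sigma.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
sigma = 2.0
y = rng.normal(loc=5.0, scale=sigma, size=500)   # true mu = 5.0, treated as unknown

def neg_log_lik(mu):
    # negative Gaussian log-likelihood in mu, up to constants that do not depend on mu
    return 0.5 * np.sum((y - mu) ** 2) / sigma ** 2

result = minimize_scalar(neg_log_lik)
print(result.x, y.mean())   # the numerical MLE matches the sample mean
```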
How is MLE used for a Bernoulli distribution (e.g., coin flips)?
Encoding heads as 1 and tails as 0, estimate the parameter θ (probability of heads) by maximising the likelihood of the observed outcomes; the MLE is the observed proportion of heads.
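A sketch of the Bernoulli MLE on made-up flips, evaluating the log-likelihood on a grid of θ values; the maximiser matches the sample proportion of heads:

```python
# Bernoulli MLE for theta (probability of heads) via a simple grid search.
import numpy as np

y = np.array([1, 1, 0, 1, 0, 1, 1, 0, 1, 1])   # 7 heads out of 10 (illustrative)

thetas = np.linspace(0.01, 0.99, 99)
log_lik = y.sum() * np.log(thetas) + (len(y) - y.sum()) * np.log(1 - thetas)

theta_mle = thetas[np.argmax(log_lik)]
print(theta_mle, y.mean())   # both are approximately 0.7
```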
Why is the choice of distribution critical in probabilistic modelling?
It influences parameter estimates and model behaviour. Different distributions represent different assumptions about the data.
What is the Bayesian perspective in probabilistic inference?
It treats parameters as uncertain and models them with a prior distribution, combining it with the likelihood to get the posterior.
Define Prior, Likelihood, and Posterior in Bayesian inference.
Prior (p(θ)): belief before data. Likelihood (p(y | θ)): probability of data given parameters. Posterior (p(θ | y)): updated belief after observing data.
What is Bayes’ rule?
p(θ | y) = [p(y | θ) × p(θ)] / p(y). The posterior is proportional to the likelihood times the prior; p(y), the marginal likelihood (evidence), is the normalisation constant.
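A sketch of Bayes’ rule on a parameter grid, with a Bernoulli likelihood, a flat prior, and hypothetical flips; dividing by the sum over the grid plays the role of dividing by p(y):

```python
# Grid approximation of the posterior: posterior ∝ likelihood × prior, then normalise.
import numpy as np

y = np.array([1, 1, 0, 1, 1, 0, 1, 1])           # hypothetical coin flips
thetas = np.linspace(0.001, 0.999, 999)          # grid over the parameter

prior = np.ones_like(thetas)                     # flat prior
likelihood = thetas ** y.sum() * (1 - thetas) ** (len(y) - y.sum())

unnormalised = likelihood * prior
posterior = unnormalised / unnormalised.sum()    # normalisation stands in for p(y)

print(thetas[np.argmax(posterior)])              # posterior mode ≈ 0.75
```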
What is Maximum A Posteriori Estimation (MAP)?
MAP estimates the most probable parameter values by maximising the posterior: θ_MAP = argmax_θ [log p(y | θ) + log p(θ)].
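A sketch of MAP estimation for a coin’s bias, assuming a Beta(2, 2) prior and made-up flips; the prior pulls the estimate towards 0.5 relative to the MLE:

```python
# MAP = argmax of log-likelihood + log-prior, here found on a grid.
import numpy as np
from scipy.stats import beta

y = np.array([1, 1, 1, 1, 0, 1])                 # 5 heads, 1 tail (illustrative)
a, b = 2.0, 2.0                                  # Beta prior hyperparameters (assumed)

thetas = np.linspace(0.001, 0.999, 999)
log_posterior = (y.sum() * np.log(thetas)
                 + (len(y) - y.sum()) * np.log(1 - thetas)
                 + beta.logpdf(thetas, a, b))    # log-likelihood + log-prior

theta_map = thetas[np.argmax(log_posterior)]
theta_mle = y.mean()
print(theta_map, theta_mle)   # MAP (≈ 0.75) sits between the MLE (≈ 0.83) and 0.5
```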
What is the difference between MLE and MAP?
MLE uses only the likelihood. MAP includes both the likelihood and the prior, favouring more plausible parameter values.
Why can the product of likelihood and prior not be used directly as a probability?
It must be normalised to integrate to 1. Bayes’ rule achieves this by dividing by the marginal likelihood p(y).
What distribution is typically used as a prior for Bernoulli likelihoods?
The Beta distribution: it is conjugate to the Bernoulli likelihood, so the posterior is also a Beta distribution, which simplifies the calculation.
What are conjugate priors and why are they useful?
A prior is conjugate to a likelihood if the posterior is in the same distribution family. It simplifies computations and avoids numerical integration.
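A sketch of the Beta–Bernoulli conjugate update with assumed hyperparameters and counts; the posterior is available in closed form, with no numerical integration:

```python
# Conjugate update: Beta(a, b) prior + h heads in n flips -> Beta(a + h, b + n - h) posterior.
from scipy.stats import beta

a, b = 2.0, 2.0            # prior hyperparameters (assumed for illustration)
n, h = 10, 7               # 7 heads in 10 flips

posterior = beta(a + h, b + n - h)   # Beta(9, 5), obtained without any integration
print(posterior.mean())              # posterior mean = 9 / 14 ≈ 0.64
print(posterior.interval(0.95))      # a 95% credible interval for theta
```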
What are informative and non-informative priors?
Informative priors encode strong beliefs (low variance); non-informative priors are flat (or nearly so), allowing the data to dominate inference.
How do weakly informative priors differ from non-informative priors?
Weakly informative priors slightly constrain plausible values based on domain knowledge, while non-informative priors assume minimal knowledge.
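A sketch comparing prior strength on the same made-up data, using a flat Beta(1, 1), a weakly informative Beta(2, 2), and a strongly informative Beta(50, 50) prior:

```python
# How prior strength shifts the posterior mean for identical data.
from scipy.stats import beta

n, h = 10, 9   # 9 heads in 10 flips (illustrative)

for name, (a, b) in {"non-informative": (1, 1),
                     "weakly informative": (2, 2),
                     "informative": (50, 50)}.items():
    post = beta(a + h, b + n - h)
    print(f"{name:>20}: posterior mean = {post.mean():.2f}")
# The strong prior keeps the estimate near 0.5; the flat prior lets the data dominate.
```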
What are the advantages of using posterior distributions?
They allow prediction, quantification of uncertainty, and model checking using posterior predictive distributions.
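A sketch of the posterior predictive for the next coin flip under an assumed Beta posterior, approximated by simulating parameter draws and then new data:

```python
# Posterior predictive: p(next = heads | y), here estimated by simulation.
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(2)
posterior = beta(9, 5)                       # e.g. a Beta(2, 2) prior after 7 heads, 3 tails

theta_draws = posterior.rvs(size=100_000, random_state=rng)   # sample parameter values
next_flips = rng.binomial(1, theta_draws)                     # simulate one new flip per draw

print(next_flips.mean(), posterior.mean())   # both ≈ 9 / 14 ≈ 0.64
```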
What approximation methods exist for Bayesian inference when analytical solutions are hard?
Common methods include the Laplace approximation, variational inference, and Monte Carlo methods (e.g., Markov chain Monte Carlo, MCMC).
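A minimal random-walk Metropolis sampler (one Monte Carlo approach) for the coin-bias posterior with an implicit flat prior; the proposal scale, data, and iteration count are assumptions for illustration:

```python
# Random-walk Metropolis: propose a nearby theta, accept with probability
# min(1, posterior ratio); the accepted chain approximates the posterior.
import numpy as np

rng = np.random.default_rng(3)
y = rng.binomial(1, 0.7, size=50)             # simulated flips

def log_post(theta):
    if not 0 < theta < 1:
        return -np.inf                        # outside the support
    # log-likelihood; with a flat prior on (0, 1) this is the log-posterior up to a constant
    return y.sum() * np.log(theta) + (len(y) - y.sum()) * np.log(1 - theta)

samples, theta = [], 0.5
for _ in range(5000):
    proposal = theta + rng.normal(0, 0.1)     # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                      # accept
    samples.append(theta)

print(np.mean(samples[1000:]))                # posterior mean estimate, near 0.7
```

Discarding the first part of the chain (burn-in) before averaging is standard practice, since early samples still reflect the arbitrary starting value.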
Summarise the Bayesian workflow in three key terms.
Prior (belief), Likelihood (data given belief), Posterior (updated belief after data). This trio defines Bayesian inference.