Linear Regression Flashcards
(10 cards)
What are basis functions? What do they enable?
They are functions that transform the input.
They are used to extend linear models so they can capture non-linear relationships.
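A minimal sketch of the idea, assuming a polynomial basis phi_j(x) = x^j (any fixed non-linear functions would work); note the model remains linear in the weights:

```python
import numpy as np

def polynomial_basis(x, degree):
    """Map scalar inputs x to rows [1, x, x^2, ..., x^degree]."""
    return np.stack([x**j for j in range(degree + 1)], axis=1)

x = np.linspace(-1, 1, 50)
Phi = polynomial_basis(x, degree=3)  # design matrix, shape (50, 4)
# y = Phi @ w is still linear in w, even though it is non-linear in x
```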
What is Ordinary Least Squares? When is it most suitable?
It is a method that computes the optimal weight vector in closed form: w* = (ΦᵀΦ)⁻¹Φᵀt, where Φ is the design matrix of basis-function values and t is the target vector.
It derives the optimal weights analytically using the normal equations.
It yields the exact global optimum when the error function is quadratic and convex.
It is most suitable for smaller datasets.
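A minimal NumPy sketch of the closed-form solve (Phi, t, and the true weights here are illustrative); np.linalg.solve is used rather than forming the inverse explicitly, for numerical stability:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))  # design matrix (N samples, M basis functions)
t = Phi @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=50)

# Normal equations: (Phi^T Phi) w = Phi^T t
w_ols = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)
```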
How can we solve regression for large-scale problems?
For large-scale problems, where the closed-form solution is impractical, iterative methods such as stochastic gradient descent are used instead.
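A minimal SGD sketch for squared-error linear regression; the learning rate eta and epoch count are illustrative choices, not prescribed values:

```python
import numpy as np

def sgd_linear_regression(Phi, t, eta=0.01, epochs=100, seed=0):
    """Stochastic gradient descent on the sum-of-squares error."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for n in rng.permutation(len(t)):  # visit samples in random order
            error = Phi[n] @ w - t[n]      # prediction error for sample n
            w -= eta * error * Phi[n]      # gradient step on 0.5 * error**2
    return w
```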
Why is the closed-form solution not suitable for large problems?
Because it requires computing the inverse of ΦᵀΦ, which is computationally expensive when the number of basis functions M is large (roughly O(M³)).
It also assumes that ΦᵀΦ is nonsingular (invertible); if this condition fails, the closed-form solution may not exist or may require regularization.
What is the central goal when using maximum likelihood to solve linear regression? What does the likelihood function measure?
The maximum likelihood approach treats the observed data as being generated by a model with Gaussian noise.
The likelihood function quantifies how probable the observed data is for a given set of parameters w.
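In symbols (a standard formulation under the Gaussian-noise model, with N independent samples), the likelihood of the targets t is:

```latex
p(\mathbf{t} \mid \mathbf{X}, \mathbf{w}, \sigma^2)
  = \prod_{n=1}^{N} \mathcal{N}\!\left(t_n \mid y(\mathbf{x}_n, \mathbf{w}), \sigma^2\right)
```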
What are the steps taken to maximize the likelihood that a given set of parameters generated the data we have?
- We assume the target value t is generated by a deterministic function y(x, w) plus some random noise
- We assume the noise follows a Gaussian (normal) distribution with zero mean and variance sigma squared
- Then we write down the probability of observing a target t given the input, the weights, and the noise variance
- Next, assuming all samples are independent, we form the likelihood for the whole dataset as the product of the per-sample probabilities
- Then we maximize the log of this function (see the derivation below)
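Carrying out these steps gives (a standard derivation):

```latex
\ln p(\mathbf{t} \mid \mathbf{w}, \sigma^2)
  = -\frac{N}{2}\ln\!\left(2\pi\sigma^2\right)
    - \frac{1}{2\sigma^2}\sum_{n=1}^{N}\bigl(t_n - y(\mathbf{x}_n, \mathbf{w})\bigr)^2
```

Since the first term does not depend on w, maximizing this log-likelihood in w is equivalent to minimizing the sum-of-squares error, which recovers the least-squares solution.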
What form do the penalties take for Ridge and Lasso? Which one can be solved with a closed-form solution? Which one produces sparse models?
The penalty in Ridge regression is proportional to the sum of the squared weights (an L2 penalty).
For Lasso, it is proportional to the sum of the absolute values of the weights (an L1 penalty).
Ridge can be solved with a closed-form solution.
Lasso produces sparse models (it can drive some weights exactly to zero).
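A minimal NumPy sketch of the Ridge closed form (assuming design matrix Phi, targets t, and regularization strength lam); Lasso has no closed form because the L1 penalty is not differentiable at zero, so it is fit iteratively (e.g., by coordinate descent):

```python
import numpy as np

def ridge_closed_form(Phi, t, lam):
    """Solve (lam*I + Phi^T Phi) w = Phi^T t, the regularized normal equations."""
    M = Phi.shape[1]
    return np.linalg.solve(lam * np.eye(M) + Phi.T @ Phi, Phi.T @ t)
```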
What is the basic assumption made in the Bayesian approach to linear regression? What does this enable?
We model the parameters w as random variables with a specified prior distribution
This method allows for incorporating prior beliefs and provides a natural way to quantify uncertainty
What does it mean to consider the weights as random variables?
It means that rather than having a fixed point estimate for w, we have a probability distribution that reflects our beliefs about the likely values of w
Give a general framework, starting from the prior distribution over the weights up to the prediction step. How does the Bayesian method relate to regularization?
The essence is: we want to estimate the probability of the weights given the data; by Bayes' rule, this is proportional to the probability of the data given the weights times the prior probability of the weights.
We start with the same steps as in the maximum likelihood method, up to the point where we have the likelihood of all targets given the inputs, weights, and variance. This is our probability of the data given the weights.
Then we take the prior over the weights to be a Gaussian.
To obtain the posterior (up to a normalizing constant), we multiply these two quantities.
We finish by maximizing the log of the posterior (equivalently, minimizing the negative log posterior), and what we obtain is an objective similar to the Ridge one.
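Written out (a standard derivation, assuming a zero-mean Gaussian prior p(w) = N(w | 0, α⁻¹I) and Gaussian noise with variance σ²), the negative log posterior is:

```latex
-\ln p(\mathbf{w} \mid \mathbf{t})
  = \frac{1}{2\sigma^2}\sum_{n=1}^{N}\bigl(t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(\mathbf{x}_n)\bigr)^2
    + \frac{\alpha}{2}\,\mathbf{w}^{\top}\mathbf{w}
    + \text{const}
```

Minimizing this is exactly the Ridge objective with λ = ασ², which is why MAP estimation with a Gaussian prior recovers L2 regularization.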