Intro To Linear Regression + General Flashcards

1
Q

Give a definition of ML

A

A computer program is said to learn from experience E with respect to some class of tasks T and a performance measure P, if its performance at tasks in T, as measured by P, improves with experience E (Mitchell's definition).

2
Q

Give a definition of supervised learning

A

Supervised learning is a type of machine learning where an algorithm learns from a labeled dataset, which means that each input data point is associated with a corresponding target output.

3
Q

what is the difference between classification and regression?

A

the aim of classification is to assign the output to one of a set of predefined classes (if there are two classes, it is binary classification), whereas the aim of regression is to predict a continuous numerical value

4
Q

give two examples, one of classification and one of regression

A

two classic examples are the classification of spam email and the prediction of house prices

5
Q

give a definition of unsupervised learning

A

Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled data, meaning that the input data does not have corresponding target outputs or class labels. The main objective of unsupervised learning is to discover patterns, structures, or relationships within the data without explicit guidance.

6
Q

what is a discriminative model?

A

These models are trained on a training set.
Given an input, they estimate the most probable output.
The purpose is to estimate the conditional probability p(y|x).

7
Q

what is a recommender system?

A

A recommender system, also known as a recommendation system or recommendation engine, is a type of machine learning system that provides personalized suggestions or recommendations to users.

8
Q

what is a generative model?

A

The purpose is to estimate the joint probability p(x, y).
These are probabilistic models that can generate both inputs and outputs.
After the model is trained, the conditional probability p(y|x) can be inferred from the joint distribution.

9
Q

what is the difference between joint probability and conditional probability

A

The joint probability of two or more events, p(x, y), is the probability that all of those events occur simultaneously, while the conditional probability p(y|x) is the probability of an event occurring given that another event has already occurred.

10
Q

what is linear regression?

A

Linear regression is a supervised machine learning technique used for modeling the relationship between a dependent variable (or target) and one or more independent variables (or predictors) by fitting a linear equation to the observed data.

11
Q

how can we choose theta in linear regression models?

A

In linear regression, the goal is to choose the values of the model parameters θ (the coefficients) that best fit the observed data.
First we define a cost function J(θ); then we use a method to find its minimum, such as Gradient Descent.

12
Q

what is the cost function?

A

the cost function, also known as the objective function or loss function, is a mathematical function that quantifies the error or discrepancy between the predicted values generated by a model and the actual target values in a supervised learning problem.

FORMULA
** J(θ) = 1/(2m) * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2 **
where m is the number of samples; the average of the squared errors is the **mean squared error (MSE)**.

The division by 2m is a convenience factor that simplifies the derivative and doesn't affect the optimization process.
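As a sketch, the cost above can be computed in a few lines of NumPy (hypothetical toy data, with h_θ(x) = θ^T x and a bias column x_0 = 1):

```python
import numpy as np

def cost(theta, X, y):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    residuals = X @ theta - y  # h_theta(x) = theta^T x
    return residuals @ residuals / (2 * m)

# hypothetical toy data: y = 2x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias term x_0 = 1
y = np.array([2.0, 4.0, 6.0])
print(cost(np.array([0.0, 2.0]), X, y))  # perfect fit -> cost 0.0
```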

13
Q

what is gradient descent?

A

Gradient Descent is an optimization algorithm used to minimize a cost function.

** θ_j := θ_j − α * ∂J(θ)/∂θ_j **
where α is the learning rate, which affects the convergence, and j = 0…n indexes the parameters.
For example, in simple linear regression we use θ_0 and θ_1, so j = 0, 1.

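A minimal batch gradient-descent sketch for the update rule above, on hypothetical toy data generated from y = 1 + 2x:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    # theta_j := theta_j - alpha * dJ/dtheta_j, using all m samples
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of the MSE cost J(theta)
        theta -= alpha * grad
    return theta

# hypothetical toy data: y = 1 + 2x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is x_0 = 1
y = np.array([3.0, 5.0, 7.0])
print(gradient_descent(X, y))  # converges to approximately [1, 2]
```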
14
Q

what is the learning rate α

A

α is a hyperparameter, which means we must set it before training; it is not learned by the algorithm.
If α is small, convergence is guaranteed but slow; if it is large, convergence is fast but not guaranteed (the updates may overshoot the minimum and diverge).
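The effect of α can be seen on a toy 1-D cost (an illustrative sketch, not from the course): for J(t) = t², the gradient is 2t and GD converges only when |1 − 2α| < 1, i.e. α < 1.

```python
# toy 1-D cost J(t) = t^2 with gradient 2t
def run(alpha, steps=20, t=1.0):
    for _ in range(steps):
        t -= alpha * 2 * t  # GD update: t := t - alpha * dJ/dt
    return t

print(run(0.1))  # small alpha: t shrinks towards the minimum at 0
print(run(1.1))  # too-large alpha: t oscillates and diverges
```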

15
Q

what is the idea behind linear regression? Intuition of LR

A

the idea is to search for a function that, given an input, predicts an output.
In this case the hypothesis is a straight line -> h(x) = θ_0 + θ_1 * x.
h(x) approximates the behaviour of the true function f(x); the aim is to find the optimal θ* so that h(x) is as accurate as possible.

16
Q

Derive the GD formula starting from the definition of the cost function

A

Starting from J(θ) = 1/(2m) * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2, differentiate with respect to θ_j:
∂J/∂θ_j = 1/m * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) * x_j^(i)
(the 1/2 cancels the 2 coming from the chain rule). Substituting into the update rule gives
θ_j := θ_j − α/m * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) * x_j^(i)

17
Q

Explain linear regression with multiple features and derive the closed form of the formula

A

With n features the hypothesis becomes h_θ(x) = θ^T x = θ_0 + θ_1 x_1 + … + θ_n x_n (with x_0 = 1). Stacking the samples into a matrix X (m × (n+1)) and the targets into a vector y, setting the gradient of J(θ) to zero gives the closed form (normal equation):
** θ = (X^T X)^{−1} X^T y **
(full derivation on the slides)
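A sketch of the normal equation on hypothetical toy data with one feature plus the bias column:

```python
import numpy as np

# hypothetical data: one feature, with the bias column x_0 = 1 prepended
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# normal equation: theta = (X^T X)^{-1} X^T y
# (np.linalg.solve is preferred over forming the inverse explicitly)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [intercept, slope] of the best-fit line
```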

18
Q

explain the difference between BGD, SGD and MBGD

A

BGD (Batch Gradient Descent): uses all the training samples to compute the gradient at each step.
Advantage: more precise, guaranteed convergence (for a suitable α).
Disadvantage: slow computation, not suitable for large datasets.

SGD (Stochastic Gradient Descent): uses one random sample at each step.
Advantage: faster than BGD.
Disadvantage: convergence is not guaranteed, and the path to the minimum is noisy rather than smooth.

MBGD (Mini-Batch Gradient Descent): a compromise between BGD and SGD; each step uses a subset D^(i) of b samples taken from the dataset D of size m, where b << m.
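The three variants differ only in which samples feed the gradient. A sketch on hypothetical data (y = 1 + 2x plus small noise), running the mini-batch version:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(theta, X, y):
    # MSE gradient computed on whichever samples are passed in
    return X.T @ (X @ theta - y) / len(y)

# hypothetical data: y = 1 + 2x + small Gaussian noise
X = np.c_[np.ones(100), rng.normal(size=100)]
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
theta = np.zeros(2)

# BGD uses all m samples:     grad(theta, X, y)
# SGD uses one random sample: i = rng.integers(len(y)); grad(theta, X[i:i+1], y[i:i+1])
# MBGD uses a batch of b = 16 << m samples:
for _ in range(500):
    idx = rng.choice(len(y), size=16, replace=False)
    theta -= 0.1 * grad(theta, X[idx], y[idx])
print(theta)  # ends up close to [1, 2]
```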

19
Q

Make some considerations about the criteria for choosing the b-sample group in MBGD

A

At each iteration we must make sure that, among the b samples, we only take elements that have never been considered before.
After approximately m/b iterations we will have considered all the samples in the dataset of size m, and then this whole process (an epoch) is repeated.
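This criterion is commonly implemented by shuffling the sample indices once per epoch and slicing off consecutive batches (a minimal sketch with hypothetical sizes m = 10, b = 3):

```python
import numpy as np

rng = np.random.default_rng(42)
m, b = 10, 3  # hypothetical dataset size and mini-batch size

perm = rng.permutation(m)  # shuffle the indices once per epoch
batches = [perm[i:i + b] for i in range(0, m, b)]
# every sample index appears exactly once across the ceil(m/b) = 4 batches;
# after them, all m samples have been seen and the next epoch reshuffles
print(batches)
```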

20
Q

give a probabilistic interpretation of linear regression

A

Assume each target is generated as y^(i) = θ^T x^(i) + ε^(i), where the errors ε^(i) are i.i.d. Gaussian with zero mean and variance σ^2. Then p(y^(i) | x^(i); θ) is a Gaussian centred at θ^T x^(i), which gives linear regression a probabilistic interpretation.

21
Q

explain the likelihood function

A

The likelihood is the probability of the observed data seen as a function of the parameters:
L(θ) = Π_{i=1}^{m} p(y^(i) | x^(i); θ)
Maximum likelihood chooses the θ that makes the observed data most probable.

22
Q

explain how the likelihood function is maximized over the parameters and explain the use of the logarithm in the formula

A

We maximize L(θ) over θ. Since the logarithm is monotonically increasing, maximizing L(θ) is equivalent to maximizing the log-likelihood ℓ(θ) = log L(θ); the logarithm turns the product over the samples into a sum, which is easier to differentiate and more numerically stable.

23
Q

derive the optimal result for ℓ(θ)

A

Under the Gaussian noise assumption,
ℓ(θ) = m * log(1/(√(2π) σ)) − 1/(2σ^2) * Σ_{i=1}^{m} (y^(i) − θ^T x^(i))^2
The first term does not depend on θ, so maximizing ℓ(θ) is equivalent to minimizing Σ_{i=1}^{m} (y^(i) − θ^T x^(i))^2, i.e. the least-squares cost J(θ). Maximum likelihood and least squares therefore yield the same θ*.