Intro To Linear Regression + General Flashcards

1
Q

Give a definition of ML

A

A computer program is said to learn from experience E with respect to some class of tasks T and a performance measure P, if its performance at tasks in T, as measured by P, improves with experience E (Mitchell's definition).

2
Q

Give a definition of supervised learning

A

Supervised learning is a type of machine learning where an algorithm learns from a labeled dataset, which means that each input data point is associated with a corresponding target output.

3
Q

what is the difference between classification and regression?

A

the aim of classification is to assign the output to one of a set of predefined classes (if there are two classes, it is binary classification), whereas the aim of regression is to predict a continuous numerical value

4
Q

give two examples, one of classification and one of regression

A

two classic examples are the classification of spam email and the prediction of house prices

5
Q

give a definition of unsupervised learning

A

Unsupervised learning is a type of machine learning where the algorithm is trained on unlabeled data, meaning that the input data does not have corresponding target outputs or class labels. The main objective of unsupervised learning is to discover patterns, structures, or relationships within the data without explicit guidance.

6
Q

what is a discriminative model?

A

These models are trained on a training set.
Given an input, they estimate the most probable output.
The purpose is to estimate the conditional probability p(y|x).

7
Q

what is a recommender system?

A

A recommender system, also known as a recommendation system or recommendation engine, is a type of machine learning system that provides personalized suggestions or recommendations to users.

8
Q

what is a generative model?

A

The purpose is to estimate the joint probability p(x, y).
These are probabilistic models that can generate both inputs and outputs.
After the model is trained, the conditional probability p(y|x) can be inferred from the joint distribution.

9
Q

what is the difference between joint probability and conditional probability

A

The joint probability of two or more events, p(x, y), is the probability that all of those events occur simultaneously, while the conditional probability p(y|x) is the probability of an event occurring given that another event has already occurred.

10
Q

what is linear regression?

A

Linear regression is a supervised machine learning technique used for modeling the relationship between a dependent variable (or target) and one or more independent variables (or predictors) by fitting a linear equation to the observed data.

11
Q

how can we choose theta in linear regression models?

A

In linear regression, the goal is to choose the values of the model parameters θ (the coefficients) that best fit the observed data.
First we define a cost function J(θ); then we use a method to find its minimum, such as Gradient Descent.

12
Q

what is the cost function?

A

the cost function, also known as the objective function or loss function, is a mathematical function that quantifies the error or discrepancy between the predicted values generated by a model and the actual target values in a supervised learning problem.

FORMULA
** J(θ) = 1/(2m) * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2 **
where m is the number of samples; the average of the squared errors is the **mean squared error (MSE)**.

The division by 2m is a convenience factor that simplifies the derivative and doesn't affect the optimization process.
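As a sketch, the cost above can be computed in a few lines of NumPy (hypothetical toy data, with h_θ(x) = θ^T x and a bias column x_0 = 1):

```python
import numpy as np

def cost(theta, X, y):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    residuals = X @ theta - y  # h_theta(x) = theta^T x
    return residuals @ residuals / (2 * m)

# hypothetical toy data: y = 2x exactly
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the bias term x_0 = 1
y = np.array([2.0, 4.0, 6.0])
print(cost(np.array([0.0, 2.0]), X, y))  # perfect fit -> cost 0.0
```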

13
Q

what is gradient descent?

A

Gradient Descent is an optimization algorithm used to minimize a cost function.

** θ_j := θ_j − α * ∂J(θ)/∂θ_j **
where α is the learning rate, which affects the convergence, and j = 0…n indexes the parameters.
For example, in simple linear regression we use θ_0 and θ_1, so j = 0, 1.

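A minimal batch gradient-descent sketch for the update rule above, on hypothetical toy data generated from y = 1 + 2x:

```python
import numpy as np

def gradient_descent(X, y, alpha=0.1, iters=5000):
    # theta_j := theta_j - alpha * dJ/dtheta_j, using all m samples
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m  # gradient of the MSE cost J(theta)
        theta -= alpha * grad
    return theta

# hypothetical toy data: y = 1 + 2x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is x_0 = 1
y = np.array([3.0, 5.0, 7.0])
print(gradient_descent(X, y))  # converges to approximately [1, 2]
```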
14
Q

what is the learning rate α

A

α is a hyperparameter, which means we must set it before training; it is not learned by the algorithm.
If α is small, convergence is guaranteed but slow; if it is large, convergence is fast but not guaranteed (the updates may overshoot the minimum and diverge).
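The effect of α can be seen on a toy 1-D cost (an illustrative sketch, not from the course): for J(t) = t², the gradient is 2t and GD converges only when |1 − 2α| < 1, i.e. α < 1.

```python
# toy 1-D cost J(t) = t^2 with gradient 2t
def run(alpha, steps=20, t=1.0):
    for _ in range(steps):
        t -= alpha * 2 * t  # GD update: t := t - alpha * dJ/dt
    return t

print(run(0.1))  # small alpha: t shrinks towards the minimum at 0
print(run(1.1))  # too-large alpha: t oscillates and diverges
```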

15
Q

what is the idea behind linear regression? Intuition of LR

A

the idea is to search for a function that, given an input, predicts an output.
In this case the hypothesis is a straight line -> h(x) = θ_0 + θ_1 * x.
h(x) approximates the behaviour of the true function f(x); the aim is to find the optimal θ* so that h(x) is as accurate as possible.

16
Q

Derive the GD formula starting from the definition of the cost function

A

Starting from J(θ) = 1/(2m) * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2, differentiate with respect to θ_j:
∂J/∂θ_j = 1/m * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) * x_j^(i)
(the 1/2 cancels the 2 coming from the chain rule). Substituting into the update rule gives
θ_j := θ_j − α/m * Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) * x_j^(i)

17
Q

Explain linear regression with multiple features and derive the closed form of the formula

A

With n features the hypothesis becomes h_θ(x) = θ^T x = θ_0 + θ_1 x_1 + … + θ_n x_n (with x_0 = 1). Stacking the samples into a matrix X (m × (n+1)) and the targets into a vector y, setting the gradient of J(θ) to zero gives the closed form (normal equation):
** θ = (X^T X)^{−1} X^T y **
(full derivation on the slides)
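A sketch of the normal equation on hypothetical toy data with one feature plus the bias column:

```python
import numpy as np

# hypothetical data: one feature, with the bias column x_0 = 1 prepended
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])

# normal equation: theta = (X^T X)^{-1} X^T y
# (np.linalg.solve is preferred over forming the inverse explicitly)
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # [intercept, slope] of the best-fit line
```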

18
Q

explain the difference between BGD, SGD and MBGD

A

BGD (Batch Gradient Descent): uses all the training samples to compute the gradient at each step.
Advantage: more precise, guaranteed convergence (for a suitable α).
Disadvantage: slow computation, not suitable for large datasets.

SGD (Stochastic Gradient Descent): uses one random sample at each step.
Advantage: faster than BGD.
Disadvantage: convergence is not guaranteed, and the path to the minimum is noisy rather than smooth.

MBGD (Mini-Batch Gradient Descent): a compromise between BGD and SGD; each step uses a subset D^(i) of b samples taken from the dataset D of size m, where b << m.
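The three variants differ only in which samples feed the gradient. A sketch on hypothetical data (y = 1 + 2x plus small noise), running the mini-batch version:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(theta, X, y):
    # MSE gradient computed on whichever samples are passed in
    return X.T @ (X @ theta - y) / len(y)

# hypothetical data: y = 1 + 2x + small Gaussian noise
X = np.c_[np.ones(100), rng.normal(size=100)]
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=100)
theta = np.zeros(2)

# BGD uses all m samples:     grad(theta, X, y)
# SGD uses one random sample: i = rng.integers(len(y)); grad(theta, X[i:i+1], y[i:i+1])
# MBGD uses a batch of b = 16 << m samples:
for _ in range(500):
    idx = rng.choice(len(y), size=16, replace=False)
    theta -= 0.1 * grad(theta, X[idx], y[idx])
print(theta)  # ends up close to [1, 2]
```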

19
Q

Make some considerations about the criteria for choosing the b-sample group in MBGD

A

At each iteration we must make sure that, among the b samples, we only take elements that have never been considered before.
After approximately m/b iterations we will have considered all the samples in the dataset of size m, and then this whole process (an epoch) is repeated.
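This criterion is commonly implemented by shuffling the sample indices once per epoch and slicing off consecutive batches (a minimal sketch with hypothetical sizes m = 10, b = 3):

```python
import numpy as np

rng = np.random.default_rng(42)
m, b = 10, 3  # hypothetical dataset size and mini-batch size

perm = rng.permutation(m)  # shuffle the indices once per epoch
batches = [perm[i:i + b] for i in range(0, m, b)]
# every sample index appears exactly once across the ceil(m/b) = 4 batches;
# after them, all m samples have been seen and the next epoch reshuffles
print(batches)
```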

20
Q

give a probabilistic interpretation of linear regression

A

Assume each target is generated as y^(i) = θ^T x^(i) + ε^(i), where the errors ε^(i) are i.i.d. Gaussian with zero mean and variance σ^2. Then p(y^(i) | x^(i); θ) is a Gaussian centred at θ^T x^(i), which gives linear regression a probabilistic interpretation.

21
Q

explain the likelihood function

A

The likelihood is the probability of the observed data seen as a function of the parameters:
L(θ) = Π_{i=1}^{m} p(y^(i) | x^(i); θ)
Maximum likelihood chooses the θ that makes the observed data most probable.

22
Q

explain how the likelihood function is maximized over the parameters and explain the use of the logarithm in the formula

A

We maximize L(θ) over θ. Since the logarithm is monotonically increasing, maximizing L(θ) is equivalent to maximizing the log-likelihood ℓ(θ) = log L(θ); the logarithm turns the product over the samples into a sum, which is easier to differentiate and more numerically stable.

23
Q

derive the optimal result for ℓ(θ)

A

Under the Gaussian noise assumption,
ℓ(θ) = m * log(1/(√(2π) σ)) − 1/(2σ^2) * Σ_{i=1}^{m} (y^(i) − θ^T x^(i))^2
The first term does not depend on θ, so maximizing ℓ(θ) is equivalent to minimizing Σ_{i=1}^{m} (y^(i) − θ^T x^(i))^2, i.e. the least-squares cost J(θ). Maximum likelihood and least squares therefore yield the same θ*.