04 Training Models Flashcards

1
Q

In the equation y = b1x1 + b2x2 + c, what does the ML model calculate?

A

x1 and x2 are feature values already present in the data (columns). The model learns the coefficients b1 and b2 and the intercept c.

2
Q

In what order are the equations used in linear regression?

A

The Normal Equation is solved first to find theta; the linear regression model equation then uses that theta to make predictions.

3
Q

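Why is the Normal Equation used?

A

To compute, in closed form, the value of theta that minimizes the cost function (MSE). It does not calculate the cost function itself.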
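
A minimal sketch of the Normal Equation, theta = (X^T X)^-1 X^T y, using NumPy. The data is synthetic (y = 4 + 3x plus a little noise) and the variable names are illustrative assumptions, not part of the original card.

```python
import numpy as np

rng = np.random.default_rng(0)      # fixed seed so the run is repeatable
m = 100                             # number of training instances (illustrative)
x = 2 * rng.random((m, 1))          # one feature, values in [0, 2)
y = 4 + 3 * x + 0.1 * rng.standard_normal((m, 1))  # y = 4 + 3x + small noise

X_b = np.c_[np.ones((m, 1)), x]     # prepend x0 = 1 for the intercept term

# Normal Equation: theta = (X^T X)^-1 X^T y
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
```

theta_best[0] should land near the intercept 4 and theta_best[1] near the slope 3.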

4
Q

What is the main component of a linear regression model?

A

Theta, the parameter vector (bias term plus feature weights). Training means finding the value of theta that minimizes the RMSE (in practice the MSE, which has the same minimum).

5
Q

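What is the drawback of the Normal Equation?

A

It becomes very slow when the number of features is large, because it has to invert a matrix whose size grows with the number of features. It scales linearly with the number of instances, but the whole training set must fit in memory.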

6
Q

Which methods are used to train a linear regression model on a large dataset?

A
  Gradient Descent, in one of three variants:
  1. Batch Gradient Descent
  2. Stochastic Gradient Descent
  3. Mini-batch Gradient Descent
7
Q

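What is Gradient Descent?

A

It is a generic optimization algorithm that can find optimal solutions to a wide range of problems.
It measures the local gradient of the error function with respect to theta and moves theta step by step in the direction of descending gradient until it reaches a minimum.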

8
Q

What is the relation between the learning rate and theta?

A

If the learning rate is too low, the algorithm takes many iterations to reach the optimal theta. If it is too high, each step can jump past the optimal theta, and the algorithm may diverge.
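
The effect of the learning rate can be seen on a toy problem. The sketch below minimizes f(theta) = (theta - 3)^2, a hypothetical one-parameter cost function chosen only for illustration; the three eta values are assumptions.

```python
def gradient_descent(eta, n_steps=50, theta=0.0):
    """Minimize f(theta) = (theta - 3)^2 with a fixed learning rate eta."""
    for _ in range(n_steps):
        gradient = 2 * (theta - 3)      # derivative of (theta - 3)^2
        theta = theta - eta * gradient  # step against the gradient
    return theta

too_low = gradient_descent(eta=0.001)   # barely moves toward 3 in 50 steps
good = gradient_descent(eta=0.1)        # ends very close to 3
too_high = gradient_descent(eta=1.1)    # overshoots further on every step
```

With the same number of steps, the low rate is still far from the optimum, the moderate rate converges, and the high rate diverges.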

9
Q

What is the difference between Gradient Descent and Batch Gradient Descent?

A

Gradient Descent is the general algorithm: at each step it computes the gradient of the cost function with respect to the parameters and updates them. Batch Gradient Descent is the variant that computes that gradient over the entire training set at every step, which makes each step slow on large training sets.
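
A minimal sketch of Batch Gradient Descent for linear regression with an MSE cost, assuming synthetic data (y = 4 + 3x plus noise). The learning rate and epoch count are illustrative choices, not values from the card.

```python
import numpy as np

rng = np.random.default_rng(42)
m = 100
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + 0.1 * rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), x]     # add the bias column x0 = 1

eta = 0.1                           # learning rate (assumed value)
n_epochs = 1000
theta = np.zeros((2, 1))            # start from all-zero parameters

for epoch in range(n_epochs):
    # Gradient of the MSE computed over the WHOLE training set
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients
```

Every update touches all m instances, which is what makes each step expensive when m is large.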

10
Q

What is Stochastic Gradient Descent?

A

At every step it picks one random instance from the training set, computes the gradient from that single instance, and updates theta.
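
A minimal sketch of Stochastic Gradient Descent for linear regression, assuming synthetic data and a small constant learning rate (in practice the rate is usually reduced over time). All names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + 0.1 * rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), x]

eta = 0.01                          # small constant learning rate (assumed)
n_epochs = 50
theta = np.zeros((2, 1))

for epoch in range(n_epochs):
    for _ in range(m):
        i = rng.integers(m)         # pick ONE random training instance
        xi = X_b[i:i + 1]           # shape (1, 2)
        yi = y[i:i + 1]             # shape (1, 1)
        # Gradient estimated from that single instance only
        gradients = 2 * xi.T @ (xi @ theta - yi)
        theta = theta - eta * gradients
```

The path to the minimum is noisy, but each step is cheap because it never looks at more than one instance.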

11
Q

What are the advantages of Stochastic Gradient Descent?

A

1. Very fast on large datasets, since each step uses a single instance.
2. Its randomness helps it jump out of local minima when the cost function has several of them.

12
Q

What is each iteration over the training set called in Gradient Descent?

A

Epoch

13
Q

What is a learning schedule?

A

A function that determines the learning rate at each iteration, typically starting large and shrinking over time.
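
One simple form such a function can take is sketched below. The constants t0 and t1 are hypothetical hyperparameters picked for illustration.

```python
t0, t1 = 5, 50   # hypothetical schedule hyperparameters

def learning_schedule(t):
    """Return the learning rate for iteration t; it shrinks as t grows."""
    return t0 / (t + t1)

# Learning rate at iterations 0, 50 and 450
rates = [learning_schedule(t) for t in (0, 50, 450)]
```

The rate starts at 0.1 and decays to 0.05 and then 0.01, so early steps are large and later steps settle near the minimum.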

14
Q

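What is Mini-batch Gradient Descent?

A

It is a middle ground between Stochastic and Batch Gradient Descent: at each step it computes the gradient on a small random set of instances (a mini-batch) rather than on one instance or on the full training set.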
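
A minimal sketch of Mini-batch Gradient Descent for linear regression, assuming synthetic data. The batch size, learning rate and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
m = 200
x = 2 * rng.random((m, 1))
y = 4 + 3 * x + 0.1 * rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), x]

eta = 0.05
n_epochs = 200
batch_size = 20                     # assumed mini-batch size
theta = np.zeros((2, 1))

for epoch in range(n_epochs):
    order = rng.permutation(m)      # reshuffle the instances every epoch
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        X_batch, y_batch = X_b[idx], y[idx]
        # Gradient of the MSE over this mini-batch only
        gradients = 2 / len(idx) * X_batch.T @ (X_batch @ theta - y_batch)
        theta = theta - eta * gradients
```

Each update is cheaper than a full-batch step and less noisy than a single-instance step.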

15
Q

Do the Gradient Descent methods require feature scaling?

A

Yes. All features should have a similar scale; otherwise convergence takes much longer.
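
Standardization is one common way to put features on a similar scale. The sketch below does it by hand with NumPy on made-up values; it is an illustration, not the only scaling method.

```python
import numpy as np

# Two features on very different scales (made-up values for illustration)
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 1000.0],
              [4.0, 4000.0]])

mean = X.mean(axis=0)               # per-feature mean
std = X.std(axis=0)                 # per-feature standard deviation
X_scaled = (X - mean) / std         # each column now has mean 0 and std 1
```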

16
Q

Performance of linear regression training algorithms on large datasets (many instances)

A

Normal Equation - Fast;
Batch GD - Slow;
Stochastic GD - Fast;
Mini-batch GD - Fast

17
Q

Performance of linear regression training algorithms with many features

A

Normal Equation - Slow;
Batch GD - Fast;
Stochastic GD - Fast;
Mini-batch GD - Fast

18
Q

Which linear regression algorithm has 0 hyperparameters?

A

Normal Equation (LinearRegression)

19
Q

How can the Normal Equation be used to fit polynomial (nonlinear) data?

A

Add powers of each feature (for example x^2) as new features, then train a linear model on the extended feature set.
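
A minimal sketch of this idea: x^2 is appended as an extra column and the Normal Equation is applied to the extended matrix. The quadratic data (y = 0.5x^2 + x + 2 plus noise) is synthetic and assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 100
x = 6 * rng.random((m, 1)) - 3                      # values in [-3, 3)
y = 0.5 * x**2 + x + 2 + 0.1 * rng.standard_normal((m, 1))

# Columns: bias term, x, and x^2 added as a new feature
X_poly = np.c_[np.ones((m, 1)), x, x**2]

# The model is still linear in theta, so the Normal Equation applies
theta = np.linalg.inv(X_poly.T @ X_poly) @ X_poly.T @ y
```

theta should come out close to [2, 1, 0.5], the intercept, linear and quadratic coefficients used to generate the data.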

20
Q

What is overfitting?

A

The model performs well on the training data but poorly on the validation set.

21
Q

What is underfitting?

A

The model performs poorly on both the training set and the validation set.

22
Q

How to deal with an underfitting model?

A

Add better features, choose a more complex model, or reduce the constraints (regularization) on the model.

23
Q

How to deal with an overfitting model?

A

Add more training data, simplify the model, or regularize it.

24
Q

Name the 3 types of model error

A

1. Bias
2. Variance
3. Irreducible error

25
Q

What is bias error and how can we recognize it?

A

It is due to wrong assumptions, e.g. assuming the data is linear when it is actually quadratic. A high-bias model is likely to underfit the training data.

26
Q

What is variance error and how can we recognize it?

A

It is due to the model's excessive sensitivity to small variations in the training data.
A high-variance model is likely to overfit the training data.

27
Q

What is regularization?

A

It means constraining the model.
It is a way to reduce overfitting.
For a linear model it is typically achieved by constraining the weights of the model.

28
Q

What are the different types of regularized linear models?

A

1. Ridge Regression
2. Lasso Regression
3. Elastic Net

29
Q

What is Ridge Regression?

A

A regularization term is added to the cost function, forcing the learning algorithm to fit the data while keeping the model weights as small as possible.
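
A minimal sketch of Ridge Regression in closed form, assuming the cost is the sum of squared errors plus alpha times the sum of squared weights (the bias term is left unpenalized). The helper name ridge_closed_form and the data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 50
x = rng.standard_normal((m, 1))
y = 1 + 2 * x + 0.5 * rng.standard_normal((m, 1))
X_b = np.c_[np.ones((m, 1)), x]

def ridge_closed_form(X_b, y, alpha):
    """theta = (X^T X + alpha * A)^-1 X^T y, with A = identity except A[0, 0] = 0."""
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0                   # do not penalize the bias term
    return np.linalg.inv(X_b.T @ X_b + alpha * A) @ X_b.T @ y

theta_plain = ridge_closed_form(X_b, y, alpha=0.0)     # no regularization
theta_ridge = ridge_closed_form(X_b, y, alpha=100.0)   # strong regularization
```

The feature weight in theta_ridge is smaller in magnitude than in theta_plain, which is the shrinking effect the regularization term is meant to produce.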

30
Q

Is scaling necessary for Ridge Regression?

A

Yes. Ridge Regression is sensitive to the scale of the input features, so standardize them before training.

31
Q

Which hyperparameter controls how much a model is regularized?

A

alpha

32
Q

Which scikit-learn class is used for Ridge Regression?

A

Ridge()

33
Q

Which scikit-learn class is used for Ridge Regression with SGD?

A

SGDRegressor(penalty="l2")

34
Q

What is the full form of Lasso?

A

Least Absolute Shrinkage and Selection Operator.

35
Q

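How does Lasso Regression work?

A

It adds an l1 penalty on the weights to the cost function. This tends to drive the weights of the least important features to exactly 0, so it performs automatic feature selection.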
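
A small sketch of this behavior using scikit-learn's Lasso class. The data is synthetic: only the first of three features actually influences y, and alpha=1.0 is an assumed value chosen to make the effect obvious.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 3 * X[:, 0] + 0.1 * rng.standard_normal(200)   # only feature 0 matters

lasso = Lasso(alpha=1.0)            # alpha controls the strength of the l1 penalty
lasso.fit(X, y)
coef = lasso.coef_                  # one weight per feature
```

The weights of the two irrelevant features should be exactly 0, while the weight of the first feature is shrunk but stays clearly positive.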

36
Q

How is Ridge Regression different from Lasso Regression?

A

Ridge Regression shrinks all weights but rarely sets any to exactly 0; Lasso Regression tends to set the weights of the least important features to exactly 0.

37
Q

Which scikit-learn class is used for Lasso Regression?

A

Lasso()

38
Q

Which scikit-learn class is used for Lasso Regression with SGD?

A

SGDRegressor(penalty="l1")

39
Q

What is Elastic Net?

A

It is a middle ground between Ridge and Lasso Regression: its regularization term is a mix of both penalties.

40
Q

Which hyperparameter controls the mix of Lasso and Ridge regularization in Elastic Net?

A

l1_ratio: when it is 0, Elastic Net is equivalent to Ridge Regression, and when it is 1, it is equivalent to Lasso Regression.
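
A short sketch of setting this hyperparameter with scikit-learn's ElasticNet class. The synthetic data and the values alpha=0.1 and l1_ratio=0.5 are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = 3 * X[:, 0] + 0.1 * rng.standard_normal(200)   # only feature 0 matters

# l1_ratio=0.5 gives an even mix of the l1 (Lasso) and l2 (Ridge) penalties
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)
coef = elastic_net.coef_
```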

41
Q

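Which model among plain Linear Regression, Ridge and Lasso is better?

A

Ridge and Lasso are generally preferable because they include regularization; plain Linear Regression without any regularization is usually best avoided.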

42
Q

If the dataset has highly correlated features, which model should I use between Lasso and Ridge?

A

Ridge is the better starting point; Lasso may behave erratically when several features are strongly correlated.

43
Q

What is Early Stopping?

A

Early stopping is another way to regularize an iteratively trained model: training stops as soon as the validation error reaches its minimum.
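
A minimal sketch of early stopping around Batch Gradient Descent. It assumes synthetic quadratic data, degree-5 polynomial features, and a hypothetical patience setting (how many epochs without improvement to tolerate before stopping); none of these come from the card.

```python
import numpy as np

rng = np.random.default_rng(11)
m = 80
x = 6 * rng.random((m, 1)) - 3
y = 0.5 * x**2 + x + 2 + rng.standard_normal((m, 1))

# Degree-5 polynomial features of x scaled into [-1, 1]; column 0 is the bias term
X_poly = np.hstack([(x / 3) ** d for d in range(6)])
X_train, y_train = X_poly[:60], y[:60]
X_val, y_val = X_poly[60:], y[60:]

eta = 0.1
patience = 20                       # assumed: epochs to wait without improvement
theta = np.zeros((6, 1))
best_val_error, best_theta, best_epoch = float("inf"), None, 0

for epoch in range(2000):
    gradients = 2 / len(X_train) * X_train.T @ (X_train @ theta - y_train)
    theta = theta - eta * gradients
    val_error = float(np.mean((X_val @ theta - y_val) ** 2))   # validation MSE
    if val_error < best_val_error:
        # New best model so far: remember its parameters
        best_val_error, best_theta, best_epoch = val_error, theta.copy(), epoch
    elif epoch - best_epoch >= patience:
        break                       # validation error stopped improving
```

After the loop, best_theta holds the parameters from the epoch with the lowest validation error, which is the model to keep.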

44
Q

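Can the Logistic Regression model be used for classification and regression?

A

Yes, in a sense: it is a regression algorithm that estimates the probability that an instance belongs to a class, and it is used for classification by thresholding that probability.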

45
Q

What is Logistic Regression?

A

Like Linear Regression, it computes a weighted sum of the input features (plus a bias term), but it then passes the result through a sigmoid function, which outputs a probability between 0 and 1. The instance is classified as 1 if the probability is at least 0.5, and as 0 otherwise.
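
A minimal sketch of the prediction step, assuming a model with one feature whose parameters are already known. The theta values and inputs are hypothetical, chosen so the scores are easy to check by hand.

```python
import numpy as np

def sigmoid(t):
    """Logistic function: maps any real score to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-t))

theta = np.array([-4.0, 2.0])       # hypothetical bias and feature weight
X = np.array([[1.0], [2.0], [3.0]]) # three instances, one feature each
X_b = np.c_[np.ones((3, 1)), X]     # add the bias column

proba = sigmoid(X_b @ theta)        # scores are -2, 0 and 2
y_pred = (proba >= 0.5).astype(int) # threshold the probabilities at 0.5
```

The three instances get probabilities of roughly 0.12, 0.5 and 0.88, so they are classified as 0, 1 and 1.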

46
Q

What is a decision boundary?

A

It is the boundary between class 0 and class 1: the set of points where the model's estimated probability is 50%, so instances on either side are assigned to different classes.
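
For a logistic regression model with a single feature, the boundary is the value of x where the score theta0 + theta1 * x equals 0 (probability 0.5). The sketch below uses hypothetical parameter values.

```python
theta0, theta1 = -4.0, 2.0          # hypothetical bias and feature weight

# Solve theta0 + theta1 * x = 0 for x
boundary = -theta0 / theta1

def predict(x):
    """Return class 1 when x is on or above the boundary, else class 0."""
    return int(theta0 + theta1 * x >= 0)

below, above = predict(1.5), predict(2.5)
```

Here the boundary sits at x = 2.0: an instance at 1.5 is assigned class 0 and one at 2.5 is assigned class 1.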