Optimisation and gradient descent algorithm Flashcards

1
Q

explain simply how a machine learning model works?

A

predict → calculate error → learn → predict
this is called an algorithm

2
Q

what is an algorithm?

A

an algorithm is a set of mathematical instructions for solving a problem
it is basically a word used by programmers when they don't want to explain what they did

3
Q

where did the term algorithm originate from?

A

Muhammad ibn Musa al-Khwarizmi, a ninth-century Persian mathematician, wrote a popular mathematics book of his time; when the book was translated into Latin, the translators were confused by his name and rendered it as "algorithm"

4
Q

what is a cost function in machine learning?

A

-a cost function is an important parameter in deciding how well a machine learning model fits the dataset
-it is the sum of squares of the differences between actual and fitted values
-we need a function that can find where the model is most accurate, anywhere between an undertrained and an overtrained model
-by minimizing the value of the cost function we get an optimal solution
-a cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output)
-also called a loss function, error function, etc.
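
Writing the card's sum-of-squares definition in LaTeX markdown (the symbols y_i for actual and ŷ_i for fitted values are my own labels):

$$J = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$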

5
Q

what is LaTeX markdown?

A

it is a syntax to write down mathematical expressions
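
For example, typing the line below in a markdown cell renders as a formatted equation (the function shown is just an illustration):

$$f(x) = x^2 + x + 1$$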

6
Q

what is linspace in numpy?

A

it generates an array of a given number of evenly spaced values between a and b
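
A quick illustration (the endpoints and count are arbitrary):

import numpy as np

x = np.linspace(-3, 3, 7)   # 7 evenly spaced values from -3 to 3 inclusive
print(x)                    # [-3. -2. -1.  0.  1.  2.  3.]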

7
Q

what is subplot and how to implement it in matplotlib?

A

used to display multiple plots side by side within a single figure
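
A minimal sketch with plt.subplot (the curves plotted are arbitrary examples):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 100)
plt.subplot(1, 2, 1)    # 1 row, 2 columns, first plot
plt.plot(x, x**2)
plt.subplot(1, 2, 2)    # 1 row, 2 columns, second plot
plt.plot(x, 2*x)
plt.show()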

8
Q

explain cost function implementation in python

A

-represent the function
-represent the derivative of the function
-at the minimum of f(x) the slope of the function is zero, which can be found from the derivative plot
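
A minimal sketch of these steps (the quadratic f is an assumed example, not the card's own function):

import numpy as np
import matplotlib.pyplot as plt

def f(x):       # represent the function
    return x**2 + x + 1

def df(x):      # represent the derivative of the function
    return 2*x + 1

x = np.linspace(-3, 3, 100)
plt.subplot(1, 2, 1)
plt.plot(x, f(x))    # cost curve with its minimum at x = -0.5
plt.subplot(1, 2, 2)
plt.plot(x, df(x))   # the derivative crosses zero exactly at that minimum
plt.show()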

9
Q

explain briefly the gradient descent algorithm?

A

Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms.
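
Its update rule in LaTeX markdown (alpha, the learning rate, is my notation):

$$x_{new} = x_{old} - \alpha \, f'(x_{old})$$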

10
Q

visualize 3d model of cost function

A

a convex, bowl-shaped surface that slopes down to a single minimum point

11
Q

Implement an optimization algorithm in python

A

Gradient Descent

# f and df are assumed defined as on the previous card, e.g. f(x) = x**2 + x + 1 and df(x) = 2*x + 1
new_x = 3
previous_x = 0
step_multiplier = 0.1
precision = 0.00001

x_list = [new_x]
slope_list = [df(new_x)]

for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient

    step_size = abs(new_x - previous_x)
    # print(step_size)

    x_list.append(new_x)
    slope_list.append(df(new_x))

    if step_size < precision:
        print('Loop ran this many times:', n)
        break

print('Local minimum occurs at:', new_x)
print('Slope or df(x) value at this point is:', df(new_x))
print('f(x) value or cost at this point is:', f(new_x))

12
Q

the scatter function can plot with a list of x values, true or false?

A

False
the scatter call in our example needs arrays rather than lists, because a plain list does not support the elementwise arithmetic the plot needs, so we first convert the list to an array using numpy
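
A minimal sketch of that conversion, reusing x_list and f from the gradient descent card above:

import numpy as np
import matplotlib.pyplot as plt

x_array = np.array(x_list)          # convert the Python list to a NumPy array
plt.scatter(x_array, f(x_array))    # f can now be applied elementwise
plt.show()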

13
Q

what happens with gradient descent when there are maxima, local minima and a global minimum?

A

gradient descent depends on the initial guess
if the initial guess is near a local minimum, the algorithm converges there instead of the global minimum, giving us the wrong output

14
Q

implement gradient descent by calling a function?

A

def gradient_descent(df, initial_guess, step_multiplier=0.02, precision=0.001):
    new_x = initial_guess

    x_list = [new_x]
    slope_list = [df(new_x)]

    for n in range(500):
        previous_x = new_x
        gradient = df(previous_x)
        new_x = previous_x - step_multiplier * gradient

        step_size = abs(new_x - previous_x)

        x_list.append(new_x)
        slope_list.append(df(new_x))

        if step_size < precision:
            print('Loop ran this many times:', n)
            break

    return new_x, x_list, slope_list

localmin, x_list, slope_list = gradient_descent(df, 0, 0.02, 0.001)
print(localmin)

15
Q

what is the difference between stochastic and batch gradient descent?

A

stochastic gradient descent has the feature of randomness: it updates the parameters using a randomly chosen data point (or small random subset) per step, while batch gradient descent uses the entire dataset for every step
the added randomness can help it escape local minima and find the correct minimum better than batch

16
Q

what are divergence and overflow in gradient descent, how do they occur, and how can you solve them?

A

divergence occurs when the steps keep growing instead of shrinking (for example when the learning rate is too large), so the algorithm moves away from the minimum; overflow occurs when the resulting numbers become too large for the system to handle
it can be solved by limiting the number of iterations (or by reducing the learning rate)

17
Q

what is sys module in python?

A

the sys module gives various information about the Python runtime environment
for example, the maximum floating point number Python can deal with
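
A quick check of that limit:

import sys

print(sys.float_info.max)   # largest representable float, about 1.8e308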

18
Q

what is tuple packing and tuple unpacking?

A

packing: breakfast = "bacon", "beans", "avocado"
unpacking: x, y, z = breakfast

19
Q

what is learning rate in gradient descent algorithm?

A

the learning rate decides how fast the algorithm converges to the minimum
if the learning rate is small, it takes more time to converge
if the learning rate is large, it might diverge and never converge to the minimum
in our example the learning rate can be changed by changing the step multiplier

20
Q

Bold driver learning rate mechanism?

A

if your cost function has decreased since the last iteration, increase the learning rate by 5%
if your cost function has increased since the last iteration (the algorithm overshot the minimum), go back to the last iteration and reduce the learning rate by 50%
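
A minimal sketch of this schedule (f, df and the starting values are assumed examples, not the card's own code):

def f(x):       # example cost function (assumed)
    return x**2 + x + 1

def df(x):      # its derivative
    return 2*x + 1

x = 3.0
learning_rate = 0.1
prev_cost = f(x)

for n in range(200):
    candidate = x - learning_rate * df(x)
    if f(candidate) < prev_cost:    # cost fell since the last iteration
        x = candidate
        prev_cost = f(x)
        learning_rate *= 1.05       # increase the learning rate by 5%
    else:                           # cost rose: the step overshot the minimum
        learning_rate *= 0.5        # keep the old x and halve the learning rate

print(x)    # ends near the minimum at x = -0.5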

21
Q

how can you create a 3d model of the cost function in python? what is cmap and how is it implemented?

A

Generate the 3D plot:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib import cm

# x4 and y4 are 2D meshgrid arrays and f is the two-variable cost function (defined on other cards)
fig = plt.figure(figsize=(16, 12))
ax = fig.gca(projection="3d")   # gca - get current axes (newer matplotlib: fig.add_subplot(projection="3d"))
ax.set_xlabel("x", fontsize=20)
ax.set_ylabel("y", fontsize=20)
ax.set_zlabel("f(x,y) - cost", fontsize=20)
ax.plot_surface(x4, y4, f(x4, y4), cmap=cm.coolwarm, alpha=0.4)
plt.show()

cmap assigns a colour map to the surface heights (here cm.coolwarm) and alpha sets the surface transparency

22
Q

what is a bug?

A

an unintended behaviour or defect in a program that causes it to crash or malfunction

23
Q

how do you find the partial derivative of a function in python? what does symbols do?

A

from sympy import symbols, diff

a, b = symbols('x, y')                           # a and b are symbolic variables printed as x and y
f(a, b)                                          # prints the function symbolically (f defined elsewhere)
diff(f(a, b), a)                                 # partial derivative of f with respect to a
f(a, b).evalf(subs={a: 1.8, b: 1.0})             # evaluates f(1.8, 1.0)
diff(f(a, b), a).evalf(subs={a: 1.8, b: 1.0})    # evaluates that partial derivative at (1.8, 1.0)

24
Q

implement batch gradient descent for multivariable cost function?

A

Batch gradient descent with Python

in the case of a multivariable function we have two partial derivatives, w.r.t. both x and y; both have to be considered to find the minimum point

import numpy as np
# a, b and f are the sympy symbols and cost function from the card above

multiplier = 0.1
max_iter = 200
params = np.array([1.8, 1.0])   # initial guess

for i in range(max_iter):
    gradient_x = diff(f(a, b), a).evalf(subs={a: params[0], b: params[1]})
    gradient_y = diff(f(a, b), b).evalf(subs={a: params[0], b: params[1]})
    gradients = np.array([gradient_x, gradient_y])
    params = params - multiplier * gradients

print(params[0], params[1])
print('cost is', f(params[0], params[1]))

25
Q

what is the drawback of the sympy module?

A

computational time is higher because sympy has to differentiate the function symbolically every time the code runs, so we can write the partial derivative as a plain Python function to reduce the time required
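
A sketch of hand-coding the partials instead of calling diff on every run (assuming the example f(x, y) = x**2 + y**2):

def fpx(x, y):      # partial derivative of f with respect to x
    return 2 * x

def fpy(x, y):      # partial derivative of f with respect to y
    return 2 * y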

26
Q

what type of data structure can be used to plot a 3d function? how do you create that data structure?

A

a 2D array

import numpy as np

kirk = np.array([['Captain', 'Guitar']])
print(kirk.shape)   # (1, 2)

hs_band = np.array([['Black Thought', 'MC'], ['Questlove', 'Drums']])
print(hs_band.shape)   # (2, 2)

print('hs_band[0] :', hs_band[0])         # the first row
print('hs_band[1][0] :', hs_band[1][0])   # the first element of the second row

or you can use the reshape function

27
Q

how do you append data to a 2d array? what is axis?

A

import numpy as np

kirk = np.array([['Captain', 'Guitar']])
print(kirk.shape)   # (1, 2)

hs_band = np.array([['Black Thought', 'MC'], ['Questlove', 'Drums']])
print(hs_band.shape)   # (2, 2)

the_roots = np.append(arr=hs_band, values=kirk, axis=0)   # adds kirk as a new row
print(the_roots)

axis defines whether you add the data by row (axis=0) or by column (axis=1)
if you add data by row, the number of columns must match
if you add data by column, the number of rows must match
i.e. the dimensions should match, which you can ensure by reshaping the array

28
Q

how do you access a particular row or column in a 2d array?

A

print('Printing nicknames...', the_roots[:, 0])

: selects all the rows
0 selects the first column

29
Q

explain ways in which you can add elements to a 2d array?

A

values_array = np.append(values_array, params.reshape(1, 2), axis=0)          # appends params as a new row
values_array = np.concatenate((values_array, params.reshape(1, 2)), axis=0)   # the same result using concatenate

30
Q

what is the need for MSE when there is RSS?

A

when there are a large number of datapoints, RSS becomes very big and we might encounter an overflow error, but when we divide it by the number of datapoints it becomes easy to deal with
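
In LaTeX markdown (y_i is the actual value, ŷ_i the predicted value, n the number of datapoints):

$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad MSE = \frac{1}{n} RSS$$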

31
Q

write a python code to return MSE without using a for loop when two arrays are passed as input

A

define a function to return the MSE

def MSE(pred, actu):
    # elementwise subtraction needs NumPy arrays, so no for loop is required
    mse_calc = (1/len(pred)) * sum((pred - actu)**2)
    return mse_calc

mse = MSE(pred_v, actu_v)
print(mse)

where pred_v and actu_v are NumPy arrays of predicted and actual values

32
Q

what is an array? is a tuple an array? what about a dictionary? what is the difference between an array and a dictionary?

A

an array is a collection of elements of the same datatype stored in contiguous memory locations
a tuple can act like an array if all its elements have the same datatype
a dictionary is like an array, but keys are used instead of indices to access its elements

33
Q

what is the difference between meshgrid and reshape function?

A

meshgrid adds more elements by repeating the input arrays, building 2D coordinate grids from 1D arrays

x = [0, 1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6, 7, 8]
x_1, y_1 = np.meshgrid(x, y)

x_1 = array([[0., 1., 2., 3., 4., 5.],
             [0., 1., 2., 3., 4., 5.],
             [0., 1., 2., 3., 4., 5.],
             [0., 1., 2., 3., 4., 5.],
             [0., 1., 2., 3., 4., 5.],
             [0., 1., 2., 3., 4., 5.],
             [0., 1., 2., 3., 4., 5.]])

y_1 = array([[2., 2., 2., 2., 2., 2.],
             [3., 3., 3., 3., 3., 3.],
             [4., 4., 4., 4., 4., 4.],
             [5., 5., 5., 5., 5., 5.],
             [6., 6., 6., 6., 6., 6.],
             [7., 7., 7., 7., 7., 7.],
             [8., 8., 8., 8., 8., 8.]])

reshape cannot add new elements; it can only change the shape of the existing array

x = np.arange(12)
y = np.reshape(x, (4, 3))   # 12 elements rearranged into 4 rows of 3

34
Q

how do you access all elements of rows and columns separately using two for loops?

A

for i in range(no):          # no is the size of the square matrix
    for j in range(no):
        x = matrix[i][j]     # accesses the elements of row i
        y = matrix[j][i]     # accesses the elements of column i

35
Q

what does unravel_index do in numpy?

A

it converts a flat index into a (row, column) index for an array of a given shape, e.g. to locate the minimum of a cost matrix

ij_min = np.unravel_index(indices=plot_cost.argmin(), shape=plot_cost.shape)
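
A small worked example (the plot_cost values are arbitrary):

import numpy as np

plot_cost = np.array([[3.0, 1.0],
                      [4.0, 0.5]])
ij_min = np.unravel_index(indices=plot_cost.argmin(), shape=plot_cost.shape)
print(ij_min)   # (1, 1): the minimum 0.5 sits in row 1, column 1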

36
Q

find the partial derivatives of the mean squared error by substituting the hypothesis equation

A

we get two separate equations, one for each partial derivative
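
Written out in LaTeX markdown, with the linear hypothesis ŷ = θ0 + θ1·x substituted into the MSE (notation is mine; n datapoints):

$$\frac{\partial MSE}{\partial \theta_0} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \theta_0 - \theta_1 x_i)$$

$$\frac{\partial MSE}{\partial \theta_1} = -\frac{2}{n} \sum_{i=1}^{n} (y_i - \theta_0 - \theta_1 x_i)\, x_i$$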

37
Q

how do the actual cost function and the study cost function differ?

A

in the actual cost function the variables are theta0 and theta1
in the study cost function the variables are x and y

in real machine learning problems we have to find the optimal values of the thetas using the gradient descent algorithm