Optimisation and gradient descent algorithm Flashcards
(37 cards)
explain simply how a machine learning model works?
predict → calculate error → learn → predict
this is called an algorithm
what is an algorithm?
an algorithm is a set of mathematical instructions for solving a problem
it is basically a word used by programmers when they don’t want to explain what they did
where did the term algorithm originate from?
Muhammad ibn Musa al-Khwarizmi, a ninth-century Persian mathematician, wrote a popular mathematics book of his time; when that book was translated into Latin, the translators rendered his name in a form that eventually became the word "algorithm"
what is a cost function in machine learning?
-a cost function is an important parameter in deciding how well a machine learning model fits the dataset
-it is the sum of squares of the differences between the actual and fitted values (see the sketch below)
-we need a function that can find where the model is most accurate, anywhere between an undertrained and an overtrained model
-by minimizing the value of the cost function we get an optimal solution
-the cost function is a measure of how wrong the model is in estimating the relationship between X (input) and Y (output)
-also called the loss function, error function, etc.
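a minimal sketch in python of the sum-of-squares cost described above (the function name and data are illustrative):
import numpy as np

def sse_cost(actual, predicted):
    # sum of squares of the differences between actual and fitted values
    return np.sum((actual - predicted) ** 2)

print(sse_cost(np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])))   # about 0.06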
what is LaTeX markup?
it is a syntax for writing down mathematical expressions
what is linspace in numpy?
it generates an array of linearly spaced numbers between a and b
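for example (num defaults to 50 if omitted):
import numpy as np
x = np.linspace(start=-3, stop=3, num=100)   # 100 evenly spaced values from -3 to 3, endpoints included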
what is subplot and how do you implement it in matplotlib?
subplot displays multiple plots in a grid inside one figure, e.g. two plots side by side (see the sketch below)
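a minimal sketch; plt.subplot(rows, cols, index) selects one cell of the grid:
import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)               # 1 row, 2 columns, first plot
plt.plot([0, 1, 2], [0, 1, 4])
plt.subplot(1, 2, 2)               # second plot, shown side by side
plt.plot([0, 1, 2], [4, 1, 0])
plt.show()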
explain the cost function implementation in python
-represent the function f(x)
-represent the derivative of the function, df(x)
-at the minimum of f(x) the slope of the function is zero, which can be read off the derivative plot (see the sketch below)
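a minimal sketch, assuming the example cost function f(x) = x**2 + x + 1 (the course's actual function may differ); this f and df are reused by the gradient descent cards below:
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return x**2 + x + 1        # example cost function

def df(x):
    return 2*x + 1             # its derivative (slope)

x = np.linspace(start=-3, stop=3, num=100)
plt.subplot(1, 2, 1)
plt.plot(x, f(x))              # the cost function
plt.subplot(1, 2, 2)
plt.plot(x, df(x))             # slope crosses zero where f(x) is minimal
plt.show()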
briefly explain the gradient descent algorithm?
Gradient Descent is an optimization algorithm used for minimizing the cost function in various machine learning algorithms.
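the update rule it repeats, written in LaTeX notation (γ is the learning rate that scales each step):
$x_{\text{new}} = x_{\text{old}} - \gamma \, f'(x_{\text{old}})$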
visualize the 3d model of the cost function
it is downward convex: a bowl-shaped surface with a single minimum at the bottom
Implement an optimization algorithm in python
Gradient Descent
# f and df are assumed defined, e.g. f(x) = x**2 + x + 1 and df(x) = 2*x + 1 (see the earlier card)
new_x = 3                     # initial guess
previous_x = 0
step_multiplier = 0.1         # learning rate
precision = 0.00001           # stop once the step size drops below this

x_list = [new_x]
slope_list = [df(new_x)]

for n in range(500):
    previous_x = new_x
    gradient = df(previous_x)
    new_x = previous_x - step_multiplier * gradient
    step_size = abs(new_x - previous_x)
    x_list.append(new_x)
    slope_list.append(df(new_x))
    if step_size < precision:
        print('Loop ran this many times:', n)
        break

print('Local minimum occurs at:', new_x)
print('Slope or df(x) value at this point is:', df(new_x))
print('f(x) value or cost at this point is:', f(new_x))
the scatter function can plot with a list of x values, true or false?
False, according to the course
in practice plt.scatter does accept Python lists, but our cost function f uses arithmetic that works elementwise only on arrays, so we convert the list to an array using numpy (see the sketch below)
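a minimal sketch, reusing f and the x_list collected by gradient descent (assumed defined):
import numpy as np
import matplotlib.pyplot as plt

x_arr = np.array(x_list)       # convert the Python list to a NumPy array
plt.scatter(x_arr, f(x_arr))   # f(x_arr) now applies elementwise
plt.show()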
what happens with gradient descent when there are maxima, local minima and global minima?
gradient descent depends on the initial guess
if the initial guess is near a local minimum, the algorithm converges to that local minimum rather than the global minimum, giving a suboptimal result; at a maximum the slope is zero, so a guess placed exactly there makes no progress at all
implement gradient descent by calling a function?
def gradient_descent(df, initial_guess, step_multiplier=0.02, precision=0.001):
    new_x = initial_guess
    x_list = [new_x]
    slope_list = [df(new_x)]
    for n in range(500):
        previous_x = new_x
        gradient = df(previous_x)
        new_x = previous_x - step_multiplier * gradient
        step_size = abs(new_x - previous_x)
        x_list.append(new_x)
        slope_list.append(df(new_x))
        if step_size < precision:
            print('Loop ran this many times:', n)
            break
    return new_x, x_list, slope_list

localmin, x_list, slope_list = gradient_descent(df, 0, 0.02, 0.001)
print(localmin)
what is the difference between stochastic and batch gradient descent?
stochastic gradient descent has an element of randomness: each update uses one randomly chosen sample (or a small random subset) instead of the whole dataset
this built-in noise can shake the search out of shallow local minima, so it often finds the correct minimum better than batch gradient descent (see the sketch below)
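a rough illustrative sketch of the difference, fitting a single slope w (all names and data are hypothetical):
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=100)                       # toy inputs
y = 2 * X + rng.normal(scale=0.1, size=100)    # toy targets, true slope is 2

# batch: every update averages the gradient over the whole dataset
w_batch, lr = 0.0, 0.05
for _ in range(200):
    grad = np.mean(2 * (w_batch * X - y) * X)
    w_batch -= lr * grad

# stochastic: every update uses one randomly picked sample
w_sgd = 0.0
for _ in range(1000):
    i = rng.integers(len(X))
    grad = 2 * (w_sgd * X[i] - y[i]) * X[i]
    w_sgd -= lr * grad

print(w_batch, w_sgd)   # both end up near the true slope 2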
what are divergence and overflow in gradient descent, how do they occur, and how can you solve them?
divergence: with too large a learning rate each step overshoots the minimum, so x grows without bound instead of converging
overflow: the diverging value eventually becomes too large for the system's floating-point type to represent
it can be solved by lowering the learning rate and by limiting the number of iterations
what is sys module in python?
the sys module gives various information about the Python runtime environment
for example, the maximum floating-point number Python can deal with (see the example below)
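for example:
import sys
print(sys.float_info.max)   # largest float Python can handle, about 1.8e308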
what is tuple packing and tuple unpacking?
packing: breakfast = "bacon", "beans", "avocado" (the three values are packed into one tuple)
unpacking: x, y, z = breakfast (the tuple's values are assigned to the names x, y, z)
what is learning rate in gradient descent algorithm?
the learning rate decides how fast the algorithm converges to the minimal point
if the learning rate is small, it takes more time to converge
if the learning rate is large, it might diverge and never converge to the minimum
in our example the learning rate is changed via step_multiplier (see the sketch below)
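for instance, reusing the gradient_descent function and df from the earlier cards (assumed defined):
localmin, _, _ = gradient_descent(df, initial_guess=3, step_multiplier=0.005)  # small rate: slow, many iterations
localmin, _, _ = gradient_descent(df, initial_guess=3, step_multiplier=2.0)    # too large: overshoots and diverges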
Bold driver learning rate mechanism?
if your cost function has decreased since the last iteration, increase the learning rate by 5%
if your cost function has increased since the last iteration (the algorithm overshot the minimal point), go back to the last iteration and cut the learning rate by 50% (see the sketch below)
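a minimal sketch of the bold driver rule, reusing f and df from the earlier cards (assumed defined):
lr = 0.1
x = 3.0
prev_cost = f(x)
for n in range(500):
    candidate = x - lr * df(x)
    cost = f(candidate)
    if cost < prev_cost:
        lr *= 1.05                          # cost went down: speed up by 5%
        x, prev_cost = candidate, cost
    else:
        lr *= 0.5                           # cost went up: stay at the last x and halve the rate
print('minimum near:', x, 'final learning rate:', lr)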
how can you create a 3d model of the cost function in python? what is cmap and how is it implemented?
# TODO: generate 3d plot
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D   # registers the 3d projection (needed in older matplotlib)
from matplotlib import cm                        # cm holds the colormaps; cmap colours the surface by height

fig = plt.figure(figsize=(16, 12))
ax = fig.add_subplot(projection='3d')   # replaces fig.gca(projection='3d') (gca = get current axes), removed in newer matplotlib
ax.set_xlabel('x', fontsize=20)
ax.set_ylabel('y', fontsize=20)
ax.set_zlabel('f(x,y) - cost', fontsize=20)
# x4, y4 are meshgrid arrays spanning the plotting range (see below)
ax.plot_surface(x4, y4, f(x4, y4), cmap=cm.coolwarm, alpha=0.4)
plt.show()
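x4 and y4 in the snippet above are assumed to be meshgrid arrays; a minimal sketch of building them (the range is illustrative):
import numpy as np
x = np.linspace(start=-2, stop=2, num=200)
y = np.linspace(start=-2, stop=2, num=200)
x4, y4 = np.meshgrid(x, y)   # 2d coordinate grids for the surface plot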
what is a bug?
an unintended behaviour or defect in a program that causes it to crash or malfunction
How do you find the partial derivative of a function in python? what does symbols do?
from sympy import symbols, diff
a, b = symbols('x, y')   # creates symbolic variables x and y, bound to the Python names a and b (now we can print the function by calling f(a, b))
f(a, b)                  # prints the function in symbolic form
diff(f(a, b), a)         # partial derivative of f with respect to a
f(a, b).evalf(subs={a: 1.8, b: 1.0})            # evaluates f(1.8, 1.0)
diff(f(a, b), a).evalf(subs={a: 1.8, b: 1.0})   # evaluates the partial derivative at (1.8, 1.0); differentiate first, then evalf
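a minimal worked example with a hypothetical f (the course defines its own function):
from sympy import symbols, diff

a, b = symbols('x, y')

def f(x, y):
    return x**2 + x*y + y**2   # hypothetical example function

print(diff(f(a, b), a))                                 # 2*x + y
print(f(a, b).evalf(subs={a: 1.8, b: 1.0}))             # 6.04
print(diff(f(a, b), a).evalf(subs={a: 1.8, b: 1.0}))    # 4.6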
implement batch gradient descent for a multivariable cost function?
# TODO: batch gradient descent with python
for a multivariable function there are two partial derivatives, w.r.t. x and w.r.t. y; both must be considered when searching for the minimal point
multiplier = 0.1
max_iter = 200
params = np.array([1.8, 1.0])   # initial guess for (x, y)

for i in range(max_iter):
    # evaluate both partial derivatives at the current point
    gradient_x = diff(f(a, b), a).evalf(subs={a: params[0], b: params[1]})
    gradient_y = diff(f(a, b), b).evalf(subs={a: params[0], b: params[1]})
    gradients = np.array([gradient_x, gradient_y])
    params = params - multiplier * gradients   # step against the gradient

print(params[0], params[1])
print('cost is', f(params[0], params[1]))