probability / gradient descent Flashcards
(38 cards)
what is gradient descent
numerical method for finding the input to a function f that minimizes the function
most common use for gradient descent
minimizing empirical risk
when is gradient descent guaranteed to work
convex function (happy face)
convex function
a function f is convex if, for every a, b in the domain of f, the line segment between (a, f(a)) and (b, f(b))
if f(t) is a function of a single variable and twice differentiable
f(t) is convex if and only if (d^2f)/(dt^2) (t) >= 0 for all t
if f(t) is convex and differentiable
then gradient descent converges to a global minimum of f as long as the step size is small enough
nonconvex functions and gradient descent
gradient descent might work, but not guaranteed
choosing a step size
constant step size, alpha
t(i+1) = ti - alpha(df/dt)(ti)
experiment
some process whose outcome is random
set
an unordered collection of items, usually denoted with curly brackets
sample space
set of all possible outcomes of an experiment
event
subset of the sample space or a set of outcomes
what do probabilities mean
if probability(E) = p, then if we repeat our experiment infinitely many times, the proportion of repetitions in which event E occurs is p
the sum of the probabilities of each outcome
must be exactly 1
probability distribution
describes the probability of each outcome s in a sample space S
probability of each outcome
must be between 0 and 1
the probability of an event
sum of the probabilities of the outcomes in the event
if S (sample space) w/ n possible outcomes and all outcomes are equally likely
then the probability of any one outcome occurring is 1/n
probability of an event
of outcomes in event/ # of outcomes in sample space
complement rule
it is a complement if it contains the set of all outcome in the space that are not in the circle or A
mutually exclusive
no overlap, or they both can’t happen at the same time
if A and B are mutually exclusive, then the probability that A or B happens is:
P(A U B) = P(A) + P(B)
U = union or symbol for or
principle of inclusion-exclusion
if A and B are any two events, then P(A U B) = P(A) + P(B) - P(A intersect B)
the probability that events A and B both happen is
P(A intersect B ) = P(A)P(B|A)
<– given that