Midterm Flashcards
(47 cards)
Backpropagation Steps. Input and Output Params
(1) Plug the input vector in and run it through the network until the output values are found
(2) Use the delta learning rule to modify the weight values based on the difference between the expected and actual output values
Input is Alpha, Inputs, Expected Outputs
(1) Run the inputs through forward propagation to get the actual output values o
(2) Apply the found o to the formula ∆W_ij = α (c_i − o_i) o_i (1 − o_i) x_j
(3) Add found ∆W to old W.
For a 2-layer network, ∆V_jk = ∑_i W_ij ∆W_ij (1 − h_j) x_k, where each ∆W_ij was computed using h in place of x.
Example: ∆V_2,2 = W_1,2 ∆W_1,2 (1 − h_2) x_2; the summation runs over every output i, so with only 1 output, i is always 1.
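The steps above can be sketched in code (a minimal sketch using plain Python lists; `backprop_step`, the tiny network shape, and the intermediate δ_i = (c_i − o_i) o_i (1 − o_i) are my own illustrative choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(V, W, x, c, alpha):
    """One backprop update for a 2-layer sigmoid net; W and V are
    lists of lists, updated in place. Returns the pre-update output."""
    # Forward pass: h = sigmoid(V x), o = sigmoid(W h)
    h = [sigmoid(sum(V[j][k] * x[k] for k in range(len(x)))) for j in range(len(V))]
    o = [sigmoid(sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(W))]
    # Output-layer error terms: delta_i = (c_i - o_i) o_i (1 - o_i)
    d = [(c[i] - o[i]) * o[i] * (1 - o[i]) for i in range(len(o))]
    # Backpropagated signal per hidden unit, using the OLD weights W
    back = [sum(d[i] * W[i][j] for i in range(len(W))) for j in range(len(h))]
    # Delta W_ij = alpha * delta_i * h_j  (h_j replaces x_j in the 2-layer case)
    for i in range(len(W)):
        for j in range(len(h)):
            W[i][j] += alpha * d[i] * h[j]
    # Delta V_jk = alpha * (sum_i delta_i W_ij) * h_j (1 - h_j) * x_k
    for j in range(len(V)):
        for k in range(len(x)):
            V[j][k] += alpha * back[j] * h[j] * (1 - h[j]) * x[k]
    return o
```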
Delta Learning Rule
∆W_ij = α (c_i − o_i) o_i (1 − o_i) x_j, where c and o are the expected and actual output vectors (indexed by output unit i) and x is the input vector (indexed by input j).
Each weight contributes some amount to the output. Want to change weight so output gets more accurate.
Move each weight individually down the gradient using alpha
Since we want to move down the error gradient, use:
−α ( ∂E/∂W_ij ) → −α ( ∂E/∂o_i ) ( ∂o_i/∂W_ij )
When 2 layer:
Same process as above; in addition, use h instead of x in the ∆W update.
∆V_jk = −α ∑_i ( ∂E/∂o_i ) ( ∂o_i/∂h_j ) ( ∂h_j/∂V_jk )
Mean Squared Error
E = ½ ∑_i (c_i − o_i)²
output vector equation
o_i = step ( ∑_j W_ij h_j )
McCulloch Pitts Network, how to solve a next step problem
Step function: X_i^(t+1) = step ( ∑_j W_ij X_j^(t) − U_i )
Multiply the weight of each edge by the value of its source node, sum, then subtract the threshold. If the result is 0 or more the node becomes 1; otherwise it becomes 0.
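A sketch of one synchronous next-step update (`mp_next_state` is my own name, and firing when the sum exactly hits 0 is my assumed convention):

```python
def step(z):
    # one common convention: fire when the sum meets or exceeds 0
    return 1 if z >= 0 else 0

def mp_next_state(W, U, X):
    """X_i^(t+1) = step(sum_j W_ij * X_j^(t) - U_i)."""
    return [step(sum(W[i][j] * X[j] for j in range(len(X))) - U[i])
            for i in range(len(W))]
```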
Forward Propagation Steps. What are its input and output values?
(1) h = σ ( V x )
(2) o = σ ( W h )
Use the sigmoid function σ to squash each component of the matrix-vector product.
Generates a predicted output vector o for a given input vector x
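A minimal sketch of the two steps above (`forward` and `mat_vec` are my own helper names):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mat_vec(M, v):
    # matrix times vector, with M as a list of rows
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def forward(V, W, x):
    h = [sigmoid(z) for z in mat_vec(V, x)]   # hidden layer
    return [sigmoid(z) for z in mat_vec(W, h)]  # output layer
```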
Genetic Algorithms 3 Crossover Types: What is Crossover?
(1) 1 point, (2) 2 Point, (3) Uniform
Exchanging pieces of two parents' representations to produce offspring (for tree representations, swapping subtrees between the two parent trees).
1 2 3 4 5 6 7 8 9
A B C D E F G H I
1 Point:
Offspring 1: 1 2 3 4 E F G H I
Offspring 2: A B C D 5 6 7 8 9
2 Point:
Offspring 1: 1 2 3 4 E F G 8 9
Offspring 2: A B C D 5 6 7 H I
Uniform (Any Point):
Offspring 1: 1 2 C 4 E F 7 H 9
Offspring 2: A B 3 D 5 6 G 8 I
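The three crossover types can be sketched directly from the example above (function names and the explicit cut-point arguments are my own choices; a real GA would pick the cut points at random):

```python
import random

def one_point(p1, p2, cut):
    """Trade everything after one cut point."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, a, b):
    """Trade only the middle segment between two cut points."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def uniform(p1, p2, rng):
    """Each gene is independently swapped with probability 1/2."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2
```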
What is the 1/5th Success Rule
Helps balance exploration vs exploitation
If > 20% of mutations are good, increase the strength to increase exploration (toward complete-randomization mutation): you were exploiting the current optimum too much.
If < 20% of mutations are good, decrease the strength to get a more fine-tuned solution (toward random-walk mutation, "incremental changes"): you were exploring too much.
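Rechenberg's 1/5th rule can be sketched as follows (the function name and the 1.1 adjustment factor are my own illustrative choices):

```python
def adjust_sigma(sigma, success_rate, factor=1.1):
    """1/5th success rule: if more than 1/5 of recent mutations
    succeeded, grow sigma (explore more); if fewer than 1/5 did,
    shrink sigma (fine-tune the current solution)."""
    if success_rate > 0.2:
        return sigma * factor
    if success_rate < 0.2:
        return sigma / factor
    return sigma
```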
What is a Mutation? What is a Tweak. Why use mutations?
The process of randomly altering an individual's representation (genotype). The alteration is called a tweak.
The operator injects diversity into the search process by making small random changes to candidate solutions
ES (Evolution Strategies) Style evolutionary loop. Tweak Operations? Selection Operations
Eval -> Resample -> Repeat
A group of parents is selected (truncation selection) and produces offspring (possibly by mutation alone)
Full or partial replacement can occur
Emphasizes adaptive mutation and self-adaptation
GA (Genetic Algorithm) Evolutionary Loop. What are its 3 knobs? Tweak operation? Selection algorithm?
Eval -> Resample -> Repeat
- Selection: choose individuals for reproduction using tournament selection, fitness-proportionate selection, or SUS (stochastic universal sampling)
- Breeding: 2 Selected parents do crossover (+ mutation) to make new offspring
- Replacement: Generational Replacement (new replace old) or Merged with old
Iterative and Individual based.
Knobs: population size, selection pressure, mutation variance
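The GA loop above could be sketched like this on a toy OneMax bit-string problem (`ga` and all its defaults are my own illustrative choices: tournament selection of size 2, one-point crossover, per-gene bit-flip mutation, full generational replacement):

```python
import random

def ga(fitness, genome_len, pop_size=20, gens=30, rng=None):
    """Minimal generational GA sketch."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            # Selection: two size-2 tournaments pick the parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Breeding: one-point crossover, then bit-flip mutation
            cut = rng.randrange(1, genome_len)
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                nxt.append([g ^ (rng.random() < 1.0 / genome_len)
                            for g in child])
        pop = nxt[:pop_size]   # Replacement: new generation replaces old
    return max(pop, key=fitness)
```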
ES (Evolution Strategies) VS GA (Genetic Algo)
Both are Eval -> Resample -> Repeat, are generational, and have elitist mechanisms available
GA Resampling: select parents iteratively and use crossover/mutation for children until new population filled in
ES Resampling: done in bulk; parents are chosen by truncation selection, usually using mutation only, and each parent makes an equal number of children.
Fitness Proportionate Selection
Similar to roulette: normalize the fitness values of all individuals, then a random number selects an individual with probability proportional to its normalized fitness
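A sketch of one roulette-wheel spin (the function name is mine; assumes non-negative fitnesses):

```python
import random

def fitness_proportionate(pop, fitnesses, rng):
    """Pick individual i with probability fitness_i / total fitness."""
    total = sum(fitnesses)
    r = rng.random() * total   # a random point on the wheel
    acc = 0.0
    for ind, f in zip(pop, fitnesses):
        acc += f               # walk the wheel until we pass r
        if r <= acc:
            return ind
    return pop[-1]             # guard against float round-off
```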
Tournament Selection
A set number of individuals (the tournament size) is selected at random with replacement, and the best among them is chosen.
With replacement means an individual can be selected multiple times, making it more likely that fit individuals are selected often.
Gradient Descent / Ascent
Ascent: x ← x + α∇f(x); descent: x ← x − α∇f(x)
Uses only the first derivative to update parameters.
Repeatedly add portions of the slope: if the slope is positive x increases, if it is negative x decreases
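A minimal sketch of the ascent loop (`gradient_ascent` and its defaults are my own choices):

```python
def gradient_ascent(grad, x, alpha=0.1, steps=100):
    """Repeatedly apply x <- x + alpha * f'(x); flip the sign
    of the update for gradient descent (minimization)."""
    for _ in range(steps):
        x = x + alpha * grad(x)
    return x
```

For example, maximizing f(x) = −(x − 3)² means following its gradient −2(x − 3) toward x = 3.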
Newtons Method
Uses the first and second derivatives (the gradient and Hessian in n dimensions) to update parameters. Near a maximum, the second derivative is negative.
x ← x − α ( ∇f(x) / f″(x) )
In one dimension: xₙ₊₁ = xₙ − f′(xₙ) / f″(xₙ)
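A one-step sketch of the 1-d update (function name is mine). On a quadratic like f(x) = (x − 3)², where f′(x) = 2(x − 3) and f″(x) = 2, Newton's method lands on the optimum in a single step, which contrasts with the many small steps gradient descent needs:

```python
def newton_step(x, d1, d2):
    """x_{n+1} = x_n - f'(x_n) / f''(x_n); d1 and d2 are the
    first- and second-derivative functions of f."""
    return x - d1(x) / d2(x)
```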
Newton’s Method vs Gradient Descent / Ascent
Newton's method converges more rapidly near an optimum and overshoots less, because it adjusts the step size based on curvature
How would you perform genetic mutation on a variable length list, how can you change the length of the list? What crossover algo can do the same thing? How to do a mutation based on gene duplication
Delete a random subsequence of the list. Or, inspired by gene duplication, select a subsequence of the list, make a COPY of it, and insert the copy somewhere else in the list. For
crossover, you might pick a random index N, where N is at most the minimum length of the two lists, and do one-point crossover at that point, trading the tails of the lists.
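The three operations above can be sketched as follows (function names are my own; subsequence bounds are picked uniformly at random):

```python
import random

def delete_mutation(genome, rng):
    """Remove a random subsequence, shrinking the list."""
    i = rng.randrange(len(genome))
    j = rng.randrange(i, len(genome)) + 1
    return genome[:i] + genome[j:]

def duplicate_mutation(genome, rng):
    """Gene duplication: copy a random subsequence and insert
    the copy at a random position, growing the list."""
    i = rng.randrange(len(genome))
    j = rng.randrange(i, len(genome)) + 1
    copy = genome[i:j]
    pos = rng.randrange(len(genome) + 1)
    return genome[:pos] + copy + genome[pos:]

def var_len_one_point(p1, p2, rng):
    """One-point crossover at an index no greater than the shorter
    length; trading tails lets offspring lengths differ from parents'."""
    cut = rng.randrange(1, min(len(p1), len(p2)) + 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```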
How would you do tournament selection with REAL VALUED tournament
sizes >= 1.0?
Since t is not an integer, use its fractional part t − Floor[t] as a probability, then run a tournament of the ceiling or floor size accordingly:
for a real-valued tournament size T >= 1.0,
with probability T - Floor[T]
do a tournament of size Ceiling[T]
else
do a tournament of size Floor[T]
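A sketch of tournament selection with a real-valued size (function name is mine; entrants are drawn with replacement):

```python
import math
import random

def tournament(pop, fitness, t, rng):
    """Tournament selection with real-valued size t >= 1.0: with
    probability t - floor(t) use size ceil(t), otherwise floor(t)."""
    lo, hi = math.floor(t), math.ceil(t)
    size = hi if rng.random() < t - lo else lo
    entrants = [rng.choice(pop) for _ in range(size)]  # with replacement
    return max(entrants, key=fitness)
```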
Simulated Annealing
Uses the annealing schedule (temperature starts high and lowers over time). At high temperatures, the algorithm accepts both better and worse moves, allowing movement out of local optima. As the temperature decreases, it mostly accepts only improvements.
If better always replace, else sometimes replace with:
Prob(t, R, S) = e ^ ( (Quality(R) − Quality(S)) / t ), where R is the child, S the original, and t the current temperature
Starts with high exploration, then moves toward pure exploitation. Knob = the temperature schedule. Can find the global optimum if the temperature starts high enough and cools slowly enough
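The whole card can be sketched as follows (the toy quality function, Gaussian tweak, starting temperature, and geometric cooling schedule used in the example are my own illustrative choices):

```python
import math
import random

def anneal(quality, tweak, s, temp=10.0, cooling=0.95, steps=200, rng=None):
    """Simulated annealing: always keep a better child; keep a worse
    one with probability e^((Quality(R) - Quality(S)) / temp)."""
    rng = rng or random.Random(0)
    best = s
    for _ in range(steps):
        r = tweak(s, rng)
        dq = quality(r) - quality(s)
        if dq > 0 or rng.random() < math.exp(dq / temp):
            s = r
        if quality(s) > quality(best):
            best = s                       # remember the best ever seen
        temp = max(temp * cooling, 1e-9)   # cool down each step
    return best
```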
Hill Climbing with random restarts
Hill climbing gets stuck in local optima a lot; random restarts overcome this by restarting the climb from a new random starting point.
Hill Climbing:
Start with a candidate solution and compute its fitness. Make a small tweak and compute the new fitness; if it is higher, the tweaked version replaces the original, and repeat. Very low exploration and high exploitation. Knob = the replacement period (how long to climb before restarting).
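The two cards above combine into a short sketch (function names, the toy quality function, and the restart/step counts are my own choices):

```python
import random

def hill_climb_restarts(quality, tweak, random_start, restarts=5,
                        steps=100, rng=None):
    """Plain hill climbing, restarted from fresh random points to
    escape local optima; returns the best solution seen overall."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(restarts):
        s = random_start(rng)
        for _ in range(steps):
            r = tweak(s, rng)
            if quality(r) > quality(s):   # accept only improvements
                s = r
        if best is None or quality(s) > quality(best):
            best = s
    return best
```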
(1+1) Evolution Strategy; how is the mutation strength usually adjusted?
Uses single candidate solution and uses mutation to make 1 offspring at a time.
1. Start with a candidate solution S and set an initial mutation strength for the Gaussian convolution operator
2. Mutate one offspring R by applying Gaussian convolution to S
3. Compare fitness R to S, update S only if R is more fit
4. Optionally adjust the mutation strength using the 1/5th rule
5. Repeat
Knob is the mutation strength; because Gaussian mutation can produce any value, any point can eventually be reached.
Similar to hill climbing, and in the limit guaranteed to be a global optimizer.
(μ+λ) Evolution Strategy (ES). (1+λ).
A population of μ parents generates λ offspring. Both parents and children compete to form the next generation.
1. Generate an initial population (can be random); larger λ means more exploration
2. Compute fitness for all individuals
3. Combine parents with children and select the μ best individuals
4. The parents each make an equal number of children, using Gaussian convolution for mutation
5. Loop
Knobs are μ and λ
Similar to steepest-ascent hill climbing, but uses Gaussian convolution and is in the limit guaranteed to be a global optimizer
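A compact sketch of the loop (function name and the toy defaults in the test are my own; truncation selection keeps the μ best from parents plus children):

```python
import random

def mu_plus_lambda(quality, mu, lam, init, sigma=0.5, gens=50, rng=None):
    """(mu+lambda) ES: each generation, the mu best of the combined
    parent+child pool survive, and each makes lam/mu Gaussian-mutated
    children."""
    rng = rng or random.Random(0)
    pop = [init(rng) for _ in range(lam)]
    for _ in range(gens):
        # truncation selection: keep the mu fittest
        parents = sorted(pop, key=quality, reverse=True)[:mu]
        kids = [[v + rng.gauss(0, sigma) for v in p]
                for p in parents for _ in range(lam // mu)]
        pop = parents + kids   # parents and children compete next round
    return max(pop, key=quality)
```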
Gaussian Convolution
With probability p, choose a random number n from a normal distribution with mean 0 and variance σ²
Add n to the respective element of the vector being convolved, resampling n as needed so the new element is greater than min and less than max
Do this for every v in V
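The operator can be sketched as follows (function name is mine; `random.gauss` draws from a normal distribution with the given mean and standard deviation, so σ is passed directly):

```python
import random

def gaussian_convolution(v, p, sigma, lo, hi, rng):
    """With probability p per slot, add N(0, sigma^2) noise,
    resampling the noise until the new value stays in [lo, hi]."""
    out = []
    for x in v:
        if rng.random() < p:
            while True:
                n = rng.gauss(0, sigma)
                if lo <= x + n <= hi:   # resample out-of-bounds noise
                    break
            x = x + n
        out.append(x)
    return out
```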