Midterm Flashcards
(47 cards)
Backpropagation Steps. Input and Output Params
(1) Plug the input vector in and run it through the network until the output values are found
(2) Use the delta learning rule to modify the weight values based on the difference between the expected and actual output values
Input is Alpha, Inputs, Expected Outputs
(1) Run the inputs through forward propagation to get the actual output values o
(2) Apply the found o to the formula ∆W_ij = α (c_i − o_i) o_i (1 − o_i) x_j
(3) Add found ∆W to old W.
For a 2-layer network, ∆V_jk = ∑_i W_ij ∆W_ij (1 − h_j) x_k, where each ∆W_ij was computed using h in place of x.
Example: ∆V_2,2 = W_1,2 ∆W_1,2 (1 − h_2) x_2; the summation runs over every output i, so with only 1 output, i is always 1.
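The steps above can be sketched in code (a minimal sketch using plain Python lists; `backprop_step`, the tiny network shape, and the intermediate δ_i = (c_i − o_i) o_i (1 − o_i) are my own illustrative choices):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(V, W, x, c, alpha):
    """One backprop update for a 2-layer sigmoid net; W and V are
    lists of lists, updated in place. Returns the pre-update output."""
    # Forward pass: h = sigmoid(V x), o = sigmoid(W h)
    h = [sigmoid(sum(V[j][k] * x[k] for k in range(len(x)))) for j in range(len(V))]
    o = [sigmoid(sum(W[i][j] * h[j] for j in range(len(h)))) for i in range(len(W))]
    # Output-layer error terms: delta_i = (c_i - o_i) o_i (1 - o_i)
    d = [(c[i] - o[i]) * o[i] * (1 - o[i]) for i in range(len(o))]
    # Backpropagated signal per hidden unit, using the OLD weights W
    back = [sum(d[i] * W[i][j] for i in range(len(W))) for j in range(len(h))]
    # Delta W_ij = alpha * delta_i * h_j  (h_j replaces x_j in the 2-layer case)
    for i in range(len(W)):
        for j in range(len(h)):
            W[i][j] += alpha * d[i] * h[j]
    # Delta V_jk = alpha * (sum_i delta_i W_ij) * h_j (1 - h_j) * x_k
    for j in range(len(V)):
        for k in range(len(x)):
            V[j][k] += alpha * back[j] * h[j] * (1 - h[j]) * x[k]
    return o
```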
Delta Learning Rule
∆W_ij = α (c_i − o_i) o_i (1 − o_i) x_j, where c and o are the expected and actual output vectors (indexed by output unit i) and x is the input vector (indexed by input j).
Each weight contributes some amount to the output. Want to change weight so output gets more accurate.
Move each weight individually down the gradient using alpha
Since we want to move down the error gradient, use:
−α ( ∂E/∂W_ij ) → −α ( ∂E/∂o_i ) ( ∂o_i/∂W_ij )
When 2 layer:
Same process as above; in addition, use h instead of x in the ∆W update.
∆V_jk = −α ∑_i ( ∂E/∂o_i ) ( ∂o_i/∂h_j ) ( ∂h_j/∂V_jk )
Mean Squared Error
E = ½ ∑_i (c_i − o_i)²
output vector equation
o_i = step ( ∑_j W_ij h_j )
McCulloch Pitts Network, how to solve a next step problem
Step function: X_i^(t+1) = step ( ∑_j W_ij X_j^(t) − U_i )
Multiply the weight of each edge by the value of its source node, sum, then subtract the threshold. If the result is 0 or more the node becomes 1; otherwise it becomes 0.
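A sketch of one synchronous next-step update (`mp_next_state` is my own name, and firing when the sum exactly hits 0 is my assumed convention):

```python
def step(z):
    # one common convention: fire when the sum meets or exceeds 0
    return 1 if z >= 0 else 0

def mp_next_state(W, U, X):
    """X_i^(t+1) = step(sum_j W_ij * X_j^(t) - U_i)."""
    return [step(sum(W[i][j] * X[j] for j in range(len(X))) - U[i])
            for i in range(len(W))]
```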
Forward Propagation Steps. What are its input and output values?
(1) h = σ ( V x )
(2) o = σ ( W h )
Use the sigmoid function σ to squash each component of the matrix-vector product.
Generates a predicted output vector o for a given input vector x
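A minimal sketch of the two steps above (`forward` and `mat_vec` are my own helper names):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mat_vec(M, v):
    # matrix times vector, with M as a list of rows
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def forward(V, W, x):
    h = [sigmoid(z) for z in mat_vec(V, x)]   # hidden layer
    return [sigmoid(z) for z in mat_vec(W, h)]  # output layer
```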
Genetic Algorithms 3 Crossover Types: What is Crossover?
(1) 1 point, (2) 2 Point, (3) Uniform
Exchanging pieces of two parents' representations to produce offspring (for tree representations, swapping subtrees between the two parent trees).
1 2 3 4 5 6 7 8 9
A B C D E F G H I
1 Point:
Offspring 1: 1 2 3 4 E F G H I
Offspring 2: A B C D 5 6 7 8 9
2 Point:
Offspring 1: 1 2 3 4 E F G 8 9
Offspring 2: A B C D 5 6 7 H I
Uniform (Any Point):
Offspring 1: 1 2 C 4 E F 7 H 9
Offspring 2: A B 3 D 5 6 G 8 I
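The three crossover types can be sketched directly from the example above (function names and the explicit cut-point arguments are my own choices; a real GA would pick the cut points at random):

```python
import random

def one_point(p1, p2, cut):
    """Trade everything after one cut point."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, a, b):
    """Trade only the middle segment between two cut points."""
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def uniform(p1, p2, rng):
    """Each gene is independently swapped with probability 1/2."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1); c2.append(g2)
        else:
            c1.append(g2); c2.append(g1)
    return c1, c2
```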
What is the 1/5th Success Rule
Helps balance exploration vs exploitation
If > 20% of mutations are good, increase the strength to increase exploration (toward complete-randomization mutation): you were exploiting the current optimum too much.
If < 20% of mutations are good, decrease the strength to get a more fine-tuned solution (toward random-walk mutation, "incremental changes"): you were exploring too much.
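Rechenberg's 1/5th rule can be sketched as follows (the function name and the 1.1 adjustment factor are my own illustrative choices):

```python
def adjust_sigma(sigma, success_rate, factor=1.1):
    """1/5th success rule: if more than 1/5 of recent mutations
    succeeded, grow sigma (explore more); if fewer than 1/5 did,
    shrink sigma (fine-tune the current solution)."""
    if success_rate > 0.2:
        return sigma * factor
    if success_rate < 0.2:
        return sigma / factor
    return sigma
```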
What is a Mutation? What is a Tweak. Why use mutations?
The process of randomly altering an individual's representation (genotype). The alteration is called a tweak.
The operator injects diversity into the search process by making small random changes to candidate solutions
ES (Evolution Strategies) Style evolutionary loop. Tweak Operations? Selection Operations
Eval -> Resample -> Repeat
A group of parents is selected (truncation selection) and produces offspring (possibly by mutation alone)
Full or partial replacement can occur
Emphasizes adaptive mutation and self-adaptation
GA (Genetic Algorithm) Evolutionary Loop. What are its 3 knobs? Tweak operation? Selection algorithm?
Eval -> Resample -> Repeat
- Selection: choose individuals for reproduction using tournament selection, fitness-proportionate selection, or SUS (stochastic universal sampling)
- Breeding: 2 Selected parents do crossover (+ mutation) to make new offspring
- Replacement: Generational Replacement (new replace old) or Merged with old
Iterative and Individual based.
Knobs: population size, selection pressure, mutation variance
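The GA loop above could be sketched like this on a toy OneMax bit-string problem (`ga` and all its defaults are my own illustrative choices: tournament selection of size 2, one-point crossover, per-gene bit-flip mutation, full generational replacement):

```python
import random

def ga(fitness, genome_len, pop_size=20, gens=30, rng=None):
    """Minimal generational GA sketch."""
    rng = rng or random.Random(0)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(gens):
        nxt = []
        while len(nxt) < pop_size:
            # Selection: two size-2 tournaments pick the parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Breeding: one-point crossover, then bit-flip mutation
            cut = rng.randrange(1, genome_len)
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                nxt.append([g ^ (rng.random() < 1.0 / genome_len)
                            for g in child])
        pop = nxt[:pop_size]   # Replacement: new generation replaces old
    return max(pop, key=fitness)
```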
ES (Evolution Strategies) VS GA (Genetic Algo)
Both are Eval -> Resample -> Repeat, are generational, and have elitist mechanisms available
GA Resampling: select parents iteratively and use crossover/mutation for children until new population filled in
ES Resampling: done in bulk; parents are chosen by truncation selection, usually using mutation only, and each parent makes an equal number of children.
Fitness Proportionate Selection
Similar to roulette: normalize the fitness values of all individuals, then a random number selects an individual with probability proportional to its normalized fitness
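A sketch of one roulette-wheel spin (the function name is mine; assumes non-negative fitnesses):

```python
import random

def fitness_proportionate(pop, fitnesses, rng):
    """Pick individual i with probability fitness_i / total fitness."""
    total = sum(fitnesses)
    r = rng.random() * total   # a random point on the wheel
    acc = 0.0
    for ind, f in zip(pop, fitnesses):
        acc += f               # walk the wheel until we pass r
        if r <= acc:
            return ind
    return pop[-1]             # guard against float round-off
```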
Tournament Selection
A set number of individuals (the tournament size) is selected at random with replacement, and the best among them is chosen.
With replacement means an individual can be selected multiple times, making it more likely that fit individuals are selected often.
Gradient Descent / Ascent
Ascent: x ← x + α∇f(x); descent: x ← x − α∇f(x)
Uses only the first derivative to update parameters.
Repeatedly add portions of the slope: if the slope is positive x increases, if it is negative x decreases
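A minimal sketch of the ascent loop (`gradient_ascent` and its defaults are my own choices):

```python
def gradient_ascent(grad, x, alpha=0.1, steps=100):
    """Repeatedly apply x <- x + alpha * f'(x); flip the sign
    of the update for gradient descent (minimization)."""
    for _ in range(steps):
        x = x + alpha * grad(x)
    return x
```

For example, maximizing f(x) = −(x − 3)² means following its gradient −2(x − 3) toward x = 3.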
Newtons Method
Uses the first and second derivatives (the gradient and Hessian in n dimensions) to update parameters. Near a maximum, the second derivative is negative.
x ← x − α ( ∇f(x) / f″(x) )
In one dimension: xₙ₊₁ = xₙ − f′(xₙ) / f″(xₙ)
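A one-step sketch of the 1-d update (function name is mine). On a quadratic like f(x) = (x − 3)², where f′(x) = 2(x − 3) and f″(x) = 2, Newton's method lands on the optimum in a single step, which contrasts with the many small steps gradient descent needs:

```python
def newton_step(x, d1, d2):
    """x_{n+1} = x_n - f'(x_n) / f''(x_n); d1 and d2 are the
    first- and second-derivative functions of f."""
    return x - d1(x) / d2(x)
```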
Newton’s Method vs Gradient Descent / Ascent
Newton's method converges more rapidly near an optimum and overshoots less, because it adjusts the step size based on curvature
How would you perform genetic mutation on a variable length list, how can you change the length of the list? What crossover algo can do the same thing? How to do a mutation based on gene duplication
Delete a random subsequence of the list. Or, inspired by gene duplication, select a subsequence of the list, make a COPY of it, and insert the copy somewhere else in the list. For
crossover, you might pick a random index N, where N is at most the minimum length of the two lists, and do one-point crossover at that point, trading the tails of the lists.
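The three operations above can be sketched as follows (function names are my own; subsequence bounds are picked uniformly at random):

```python
import random

def delete_mutation(genome, rng):
    """Remove a random subsequence, shrinking the list."""
    i = rng.randrange(len(genome))
    j = rng.randrange(i, len(genome)) + 1
    return genome[:i] + genome[j:]

def duplicate_mutation(genome, rng):
    """Gene duplication: copy a random subsequence and insert
    the copy at a random position, growing the list."""
    i = rng.randrange(len(genome))
    j = rng.randrange(i, len(genome)) + 1
    copy = genome[i:j]
    pos = rng.randrange(len(genome) + 1)
    return genome[:pos] + copy + genome[pos:]

def var_len_one_point(p1, p2, rng):
    """One-point crossover at an index no greater than the shorter
    length; trading tails lets offspring lengths differ from parents'."""
    cut = rng.randrange(1, min(len(p1), len(p2)) + 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
```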
How would you do tournament selection with REAL VALUED tournament
sizes >= 1.0?
Since t is not an integer, use its fractional part t − Floor[t] as a probability, then run a tournament of the ceiling or floor size accordingly:
for a real-valued tournament size T >= 1.0,
with probability T - Floor[T]
do a tournament of size Ceiling[T]
else
do a tournament of size Floor[T]
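A sketch of tournament selection with a real-valued size (function name is mine; entrants are drawn with replacement):

```python
import math
import random

def tournament(pop, fitness, t, rng):
    """Tournament selection with real-valued size t >= 1.0: with
    probability t - floor(t) use size ceil(t), otherwise floor(t)."""
    lo, hi = math.floor(t), math.ceil(t)
    size = hi if rng.random() < t - lo else lo
    entrants = [rng.choice(pop) for _ in range(size)]  # with replacement
    return max(entrants, key=fitness)
```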
Simulated Annealing
Uses the annealing schedule (temperature starts high and lowers over time). At high temperatures, the algorithm accepts both better and worse moves, allowing movement out of local optima. As the temperature decreases, it mostly accepts only improvements.
If better always replace, else sometimes replace with:
Prob(t, R, S) = e ^ ( (Quality(R) − Quality(S)) / t ), where R is the child, S the original, and t the current temperature
Starts with high exploration, then moves toward pure exploitation. Knob = the temperature schedule. Can find the global optimum if the temperature starts high enough and cools slowly enough
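The whole card can be sketched as follows (the toy quality function, Gaussian tweak, starting temperature, and geometric cooling schedule used in the example are my own illustrative choices):

```python
import math
import random

def anneal(quality, tweak, s, temp=10.0, cooling=0.95, steps=200, rng=None):
    """Simulated annealing: always keep a better child; keep a worse
    one with probability e^((Quality(R) - Quality(S)) / temp)."""
    rng = rng or random.Random(0)
    best = s
    for _ in range(steps):
        r = tweak(s, rng)
        dq = quality(r) - quality(s)
        if dq > 0 or rng.random() < math.exp(dq / temp):
            s = r
        if quality(s) > quality(best):
            best = s                       # remember the best ever seen
        temp = max(temp * cooling, 1e-9)   # cool down each step
    return best
```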
Hill Climbing with random restarts
Hill climbing gets stuck in local optima a lot; random restarts overcome this by restarting the climb from a new random starting point.
Hill Climbing:
Start with a candidate solution and compute its fitness. Make a small tweak and compute the new fitness; if it is higher, the tweaked version replaces the original, and repeat. Very low exploration and high exploitation. Knob = the replacement period (how long to climb before restarting).
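The two cards above combine into a short sketch (function names, the toy quality function, and the restart/step counts are my own choices):

```python
import random

def hill_climb_restarts(quality, tweak, random_start, restarts=5,
                        steps=100, rng=None):
    """Plain hill climbing, restarted from fresh random points to
    escape local optima; returns the best solution seen overall."""
    rng = rng or random.Random(0)
    best = None
    for _ in range(restarts):
        s = random_start(rng)
        for _ in range(steps):
            r = tweak(s, rng)
            if quality(r) > quality(s):   # accept only improvements
                s = r
        if best is None or quality(s) > quality(best):
            best = s
    return best
```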
(1+1) Evolution Strategy; how is the mutation strength usually adjusted?
Uses single candidate solution and uses mutation to make 1 offspring at a time.
1. Start with a candidate solution S and set an initial mutation strength for the Gaussian convolution operator
2. Mutate one offspring R by applying Gaussian convolution to S
3. Compare fitness R to S, update S only if R is more fit
4. Optionally adjust the mutation strength using the 1/5th rule
5. Repeat
Knob is the mutation strength; because Gaussian mutation can produce any value, any point can eventually be reached.
Similar to hill climbing, and in the limit guaranteed to be a global optimizer.
(μ+λ) Evolution Strategy (ES). (1+λ).
A population of μ parents generates λ offspring. Both parents and children compete to form the next generation.
1. Generate an initial population (can be random); larger λ means more exploration
2. Compute fitness for all individuals
3. Combine parents with children and select the μ best individuals
4. The parents each make an equal number of children, using Gaussian convolution for mutation
5. Loop
Knobs are μ and λ
Similar to steepest-ascent hill climbing, but uses Gaussian convolution and is in the limit guaranteed to be a global optimizer
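A compact sketch of the loop (function name and the toy defaults in the test are my own; truncation selection keeps the μ best from parents plus children):

```python
import random

def mu_plus_lambda(quality, mu, lam, init, sigma=0.5, gens=50, rng=None):
    """(mu+lambda) ES: each generation, the mu best of the combined
    parent+child pool survive, and each makes lam/mu Gaussian-mutated
    children."""
    rng = rng or random.Random(0)
    pop = [init(rng) for _ in range(lam)]
    for _ in range(gens):
        # truncation selection: keep the mu fittest
        parents = sorted(pop, key=quality, reverse=True)[:mu]
        kids = [[v + rng.gauss(0, sigma) for v in p]
                for p in parents for _ in range(lam // mu)]
        pop = parents + kids   # parents and children compete next round
    return max(pop, key=quality)
```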
Gaussian Convolution
With probability p, choose a random number n from a normal distribution with mean 0 and variance σ²
Add n to the respective element of the vector being convolved, resampling n as needed so the new element is greater than min and less than max
Do this for every v in V
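The operator can be sketched as follows (function name is mine; `random.gauss` draws from a normal distribution with the given mean and standard deviation, so σ is passed directly):

```python
import random

def gaussian_convolution(v, p, sigma, lo, hi, rng):
    """With probability p per slot, add N(0, sigma^2) noise,
    resampling the noise until the new value stays in [lo, hi]."""
    out = []
    for x in v:
        if rng.random() < p:
            while True:
                n = rng.gauss(0, sigma)
                if lo <= x + n <= hi:   # resample out-of-bounds noise
                    break
            x = x + n
        out.append(x)
    return out
```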