Week 8 - reinforcement learning Flashcards
(7 cards)
formula for qt(A)
the estimate of the average reward of action A in a k-armed bandit:
Qt(A) = (sum of rewards received when A was taken before t) / (number of times A was taken before t)
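A minimal sketch of this sample-average estimate (the list name and default value of 0 for an untried action are assumptions, not from the card):

```python
# Sample-average estimate Qt(A).
# rewards_for_A holds the rewards received each time A was taken before step t.
def q_estimate(rewards_for_A):
    if not rewards_for_A:            # A never taken yet: fall back to a default of 0
        return 0.0
    return sum(rewards_for_A) / len(rewards_for_A)

print(q_estimate([1.0, 0.0, 2.0]))   # -> 1.0
```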
exploitation vs exploration
Exploitation means:
“I know a₂ gives good rewards, so I’ll keep using it.”
Exploration means:
“Maybe a₁ just had bad luck before. If I try it again, it might actually be better than a₂.”
just read
The challenge is balancing the two:
If you exploit too early, you might miss better options.
If you explore too much, you waste time on suboptimal choices.
how greedy method works
so essentially the greedy method:
at the beginning it initialises every action's value estimate to 0
if there's a tie for the maximum estimate, it randomly chooses between the tied actions
since every estimate is 0 at the beginning (everything ties), it randomly picks one of the actions
if the reward pushes that action's estimate above 0, it keeps exploiting it and won't stop picking that action until (by chance) its estimate drops back below the estimates of the other actions (which are still 0)
Therefore greedy has a bad balance - it almost always exploits and does little to no exploration (a code sketch follows below)
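A short sketch of the greedy method as described above; the reward function `bandit(a)` and the incremental update are assumptions for illustration, not part of the card:

```python
import random

def greedy_run(bandit, n_actions, steps):
    """Greedy sketch: estimates start at 0, pick the argmax, break ties randomly."""
    Q = [0.0] * n_actions              # value estimates, all initialised to 0
    N = [0] * n_actions                # how many times each action was taken
    for _ in range(steps):
        best = max(Q)
        a = random.choice([i for i, q in enumerate(Q) if q == best])  # random tie-break
        r = bandit(a)                  # assumed: returns a reward for action a
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental sample-average update
    return Q, N
```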
how epsilon greedy works
how epsilon greedy works:
it has the same initial setup as greedy, but this time:
with probability ε (epsilon) it chooses uniformly at random from among all the actions (exploration)
with probability 1 - ε it behaves greedily (exploitation)
There is now a balance between exploration and exploitation, meaning you are eventually likely to find the optimal action (sketch below)
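A minimal epsilon-greedy sketch under the same assumptions as the greedy sketch (hypothetical `bandit(a)` reward function):

```python
import random

def epsilon_greedy_run(bandit, n_actions, steps, epsilon=0.1):
    """Epsilon-greedy sketch: explore with probability epsilon, exploit otherwise."""
    Q = [0.0] * n_actions
    N = [0] * n_actions
    for _ in range(steps):
        if random.random() < epsilon:
            a = random.randrange(n_actions)                                # explore
        else:
            best = max(Q)
            a = random.choice([i for i, q in enumerate(Q) if q == best])  # exploit
        r = bandit(a)                  # assumed reward function
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]      # incremental sample-average update
    return Q, N
```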
UCB formula
At = arg max_a [ Qt(a) + c * sqrt( ln t / Nt(a) ) ]
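A minimal sketch of this selection rule (the function name, the choice of c, and taking untried actions first are assumptions for illustration):

```python
import math

def ucb_select(Q, N, t, c=2.0):
    """Pick argmax_a [ Q[a] + c * sqrt(ln t / N[a]) ]; t is the current time step."""
    untried = [a for a, n in enumerate(N) if n == 0]
    if untried:
        return untried[0]              # N[a] = 0 makes the bound infinite, so try it
    scores = [Q[a] + c * math.sqrt(math.log(t) / N[a]) for a in range(len(Q))]
    return scores.index(max(scores))
```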
just read
oh so essentially the thing with UCB: if an action looked bad, i.e. its estimate was low and it was ignored for a long time, its confidence bound would keep rising (ln t grows while Nt(a) stays fixed) until it looked worth trying again and we would test it
this way both UCB and ε-greedy give us exploration