A few days ago I gave a talk on reinforcement learning, with a focus on Q-learning, at Cookpad's Tokyo office: https://www.meetup.com/tokyo-machine-learning-kitchen/events/242060161/

The main slides for the talk are here: https://github.com/ashleysmart/mlgym/blob/master/qlearning/main_slides.html

I have been neglecting my blog lately, so I figured I would convert the slides into a post. Let's get started.
The Q function is an estimate of the system's potential value. It is assessed based on:
- The environment or state that the system is in
- The actions that can be taken from that state
- The rewards that can be acquired by performing the action
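To make those three ingredients concrete, here is a minimal sketch of the agent/environment interaction loop. The `Corridor` environment is a made-up toy (not from the talk), mimicking the usual gym-style `reset`/`step` interface:

```python
import random

# Toy 1-D corridor: states 0..4, reward 1 for reaching state 4.
class Corridor:
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward, self.state == 4

env = Corridor()
state, done = env.reset(), False
while not done:                         # one episode: state -> action -> reward
    action = random.choice([0, 1])      # no learning yet, just a random policy
    state, reward, done = env.step(action)
```

Q-learning replaces that random policy with one learned from the rewards. The core update rule is: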
$$Q'(s_t, a_t)=Q(s_t,a_t)+\alpha\Big(r_t+\gamma\max_{a}Q(s_{t+1},a)-Q(s_t,a_t)\Big)$$

where:
- $Q$: the function that estimates the total 'value' of rewards
- $Q'$: the new iteration of the 'value'
- $s_t$: the state of the environment at time $t$
- $a_t$: the action performed at time $t$
- $r_t$: the reward received for the action at time $t$
- $s_{t+1}$: the state of the environment after the action at time $t$
- $a$: a possible action performed from state $s_{t+1}$
- $\alpha$: the learning rate, i.e. how quickly to adjust when wrong; limited between 0 and 1
- $\gamma$: the discount rate, i.e. how important/trusted future rewards are; limited between 0 and 1, with an effect that can be thought of as an EMA (exponential moving average)
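In code, the whole update is one line applied to a table of values. Here is a minimal tabular sketch reusing the toy `Corridor` from above; the hyperparameter values are illustrative, and the epsilon-greedy exploration is my addition (the formula itself says nothing about how actions are chosen):

```python
import random
import numpy as np

alpha, gamma, epsilon = 0.1, 0.9, 0.2   # learning rate, discount, exploration
Q = np.zeros((5, 2))                    # Q[s, a]: 5 states x 2 actions

env = Corridor()
for episode in range(500):
    s, done = env.reset(), False
    while not done:
        # epsilon-greedy: usually exploit the table, sometimes explore
        if random.random() < epsilon:
            a = random.randrange(2)
        else:
            a = int(np.argmax(Q[s]))
        s_next, r, done = env.step(a)
        # the Q-learning update from the formula above
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))  # learned policy: action 1 (right) in states 0..3
```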
Looking at the update rule again:

$$Q'(s_t, a_t)=Q(s_t,a_t)+\alpha\Big(r_t+\gamma\max_{a}Q(s_{t+1},a)-Q(s_t,a_t)\Big)$$

$Q(s_t, a_t)$ appears in several places, so we can group it together:

$$Q'(s_t, a_t)=(1-\alpha)Q(s_t, a_t)+\alpha\Big(r_t+\gamma\max_{a}Q(s_{t+1}, a)\Big)$$
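A quick numerical check (with made-up values) that the grouped form computes exactly the same update:

```python
alpha, gamma = 0.1, 0.9
q_sa, r, max_q_next = 2.0, 1.0, 3.0  # made-up current value, reward, best next value

original = q_sa + alpha * (r + gamma * max_q_next - q_sa)
grouped = (1 - alpha) * q_sa + alpha * (r + gamma * max_q_next)
assert abs(original - grouped) < 1e-12  # both give 2.17
```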
Now define the target value:

$$Q_{target}=r_t+\gamma\max_{a}Q(s_{t+1}, a)$$

so the update is a weighted average between the current estimate and the target:

$$Q_{new}=(1-\alpha)Q_{current}+\alpha Q_{target}$$

As learning converges the estimates stop moving, so:

$$Q_{target} \approx Q_{current} \approx Q_{new}$$

which means the final Q function approximately satisfies:

$$Q_{final} \approx Q_{target} = r_t+\gamma\max_{a} Q(s_{t+1},a)$$
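To see that fixed point concretely, take a degenerate problem: a single state that loops back to itself with reward 1 every step. Repeated updates drive the estimate to $r/(1-\gamma)$, the value at which $Q_{target}$ and $Q_{current}$ coincide (a toy demonstration, not from the slides):

```python
alpha, gamma, r = 0.1, 0.9, 1.0
q = 0.0
for _ in range(2000):                                 # same transition, over and over
    q = (1 - alpha) * q + alpha * (r + gamma * q)     # the weighted-average update
print(round(q, 3))                                    # ~10.0 == r / (1 - gamma)
```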
"Q_{new}=(1-\alpha)Q_{current}+\alpha Q_{update}
ReferenceError: katex is not defined
- The forumla is iterative
- The is top down
Q_{update}=r_t+\gamma\max_{a}Q(s_{t+1}, a)
ReferenceError: katex is not defined
- This is the *local* best not the *global*
- It is a heuristic know in computer science as Greedy Optimization." },
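The greediness lives in the $\max_{a}$ term: each update trusts whichever next action currently looks best, a one-step local choice rather than a search over whole action sequences. As a tiny sketch (hypothetical Q values):

```python
import numpy as np

Q_next = np.array([0.2, 0.7, 0.1])   # hypothetical values for actions in s_{t+1}
best = np.max(Q_next)                # greedy pick: 0.7, the local best
# Nothing guarantees the action behind 0.7 leads to the globally best
# trajectory -- we only trust the current estimates one step ahead.
```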