The Coding Slim Jim: 2017

Thursday, August 17, 2017

Q learning - and its core flaws

A few days ago I gave a talk on reinforcement learning with a focus on Q-learning at Cookpad's Tokyo office. https://www.meetup.com/tokyo-machine-learning-kitchen/events/242060161/

The main slides for the talk are here https://github.com/ashleysmart/mlgym/blob/master/qlearning/main_slides.html

I have been neglecting my blog lately, so I figured I would convert the slides into a post. Let's get started.

The Q function is an estimate of the system's potential value. It is assessed based on:

The environment or state that the system is in
The actions that can be taken from that state
The rewards that can be acquired by performing the action

The Q function is the target function that we are trying to learn. It can be implemented as a neural network, a table, or some other standard machine learning regression system or function. The textbook formula is:

Q'(s_t, a_t)=Q(s_t,a_t)+\alpha\Big(r_t+\gamma\max_{a}Q(s_{t+1},a)-Q(s_t,a_t)\Big)

Where:

Q: The function that guesses the total 'value' of rewards
Q': The new iteration of the 'value'
s_t: The “state” of the environment at time 't'
a_t: The “action” performed at time 't'
r_t: The “reward” received for the action at time 't'
s_{t+1}: The “state” of the environment after the action at time 't'
a: A possible action performed from the state at time 't+1'
\alpha: The learning rate: how quickly to adjust when wrong. This is limited to between 0 and 1
\gamma: The discount rate: how important/trusted future rewards are. This is limited to between 0 and 1, and its effect can be thought of as an EMA (exponential moving average)
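To make the symbols concrete, here is a minimal sketch of one update step on a table-based Q function. The names `q_update`, `q` and `actions`, the state labels and the default values for alpha and gamma are all made up for illustration; they are not from the talk.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one textbook Q-learning update to the table `q` in place."""
    # max_a Q(s_{t+1}, a): the best value the current table predicts from the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # r_t + gamma * max_a Q(s_{t+1}, a) - Q(s_t, a_t)
    error = reward + gamma * best_next - q[(state, action)]
    # Q'(s_t, a_t) = Q(s_t, a_t) + alpha * error
    q[(state, action)] += alpha * error
    return q[(state, action)]


# Hypothetical two-action world; unseen (state, action) pairs start at 0.0.
q = defaultdict(float)
actions = ["left", "right"]
new_value = q_update(q, "s0", "right", reward=1.0, next_state="s1", actions=actions)
print(new_value)
```

A table keyed by (state, action) is only one of the options listed above; a neural network would replace the dictionary lookup and the in-place add with a prediction and a training step.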

Of course everyone understood that... If we manipulate the formula a bit, how it works becomes much clearer. Starting with the textbook form:

Q'(s_t, a_t)=Q(s_t,a_t)+\alpha\Big(r_t+\gamma\max_{a}Q(s_{t+1},a)-Q(s_t,a_t)\Big)

We notice that

Q(s_t, a_t)

is in several places, so we can group it together:

Q'(s_t, a_t)=(1-\alpha)Q(s_t, a_t)+\alpha\Big(r_t+\gamma\max_{a}Q(s_{t+1}, a)\Big)

Then we can group the remaining terms (everything except the current Q(s_t, a_t)) into a common item:

Q_{target}=r_t+\gamma\max_{a}Q(s_{t+1}, a)

And finally we have something very clear:

Q_{new}=(1-\alpha)Q_{current}+\alpha Q_{target}

As you can see, alpha acts as a ratio that merges the current value of Q with the target value of Q (the new information) to produce the value for the next iteration. Also, given that Q-learning mixes two numbers by percentage to produce a third, when learning is complete and stable all three parts will match, i.e.:

Q_{target} \approx Q_{current} \approx Q_{new}
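As a quick sanity check of the regrouping, the sketch below compares the textbook form with the blended form on made-up numbers, then repeats the blend against a fixed target to show the three values settling together. The function names and the numbers are illustrative assumptions only.

```python
def textbook_form(q_current, q_target, alpha):
    # Q + alpha * (target - Q)
    return q_current + alpha * (q_target - q_current)


def blended_form(q_current, q_target, alpha):
    # (1 - alpha) * Q + alpha * target
    return (1 - alpha) * q_current + alpha * q_target


q_current, q_target, alpha = 2.0, 10.0, 0.25

# Both forms are the same update, up to floating point rounding.
difference = textbook_form(q_current, q_target, alpha) - blended_form(q_current, q_target, alpha)
print(abs(difference) < 1e-12)

# Repeatedly blending toward a fixed target shrinks the gap by (1 - alpha) each step.
q = q_current
for _ in range(50):
    q = blended_form(q, q_target, alpha)
print(q, q_target)
```

In real Q-learning the target itself moves as the other entries are updated, so this fixed-target loop only shows the mixing behaviour, not full convergence.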

So the core of what it learns is:

Q_{final} \approx Q_{target} = r_t+\gamma\max_{a} Q(s_{t+1},a)

This is just a recursive formula, and computer science people will often instantly associate dynamic programming and tree diagrams with it.

OK, so now let's have a look at how this works out solutions to problems. Each circle represents a world state; the number on the left is the instant reward for that state and the number on the right is the current Q value for that state.
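The recursion can be sketched directly as a memoised function over a small tree of states. The tree, the rewards and the discount below are invented for illustration, and the reward is attached to the state (as in the circle diagrams) rather than to the action, which is a simplification of the formula.

```python
from functools import lru_cache

GAMMA = 0.5  # assumed discount rate for this toy example

# Hypothetical world: instant reward per state and the states reachable from it.
REWARD = {"root": 0.0, "left": 1.0, "right": 0.0, "left_leaf": 0.0, "right_leaf": 10.0}
CHILDREN = {
    "root": ("left", "right"),
    "left": ("left_leaf",),
    "right": ("right_leaf",),
    "left_leaf": (),
    "right_leaf": (),
}


@lru_cache(maxsize=None)
def q_final(state):
    """Q_final = reward + gamma * best Q_final among the next states."""
    # A state with no children has no future value.
    future = max((q_final(child) for child in CHILDREN[state]), default=0.0)
    return REWARD[state] + GAMMA * future


for state in CHILDREN:
    print(state, q_final(state))
```

Note that "left" pays more immediately, but "right" ends up with the larger value because the recursion sees the reward waiting behind it.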

But there is of course a problem. Pay very close attention to the formulas:

Q_{new}=(1-\alpha)Q_{current}+\alpha Q_{update}

Note that:

The formula is iterative
It is top down

Also:

Q_{update}=r_t+\gamma\max_{a}Q(s_{t+1}, a)

Note carefully the effect and scope of the “max”:

This is the *local* best, not the *global*
It is a heuristic known in computer science as greedy optimization.
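One way to see the cost of an iterative, top-down update is a chain of states where the only reward sits at the far end. In the sketch below (a made-up setup with a single "step forward" action, so the max is trivial) each sweep visits the states from the start toward the end, and the reward information only creeps back one state per sweep.

```python
def sweeps_until_start_learns(n_states, alpha=0.5, gamma=0.9, max_sweeps=1000):
    """Count the sweeps needed before the first state's Q value becomes non-zero."""
    # q[s] is the value of stepping forward from state s; the last state is terminal.
    q = [0.0] * n_states
    for sweep in range(1, max_sweeps + 1):
        for s in range(n_states - 1):  # top down: start state first
            # The only reward is for stepping into the terminal state.
            reward = 1.0 if s + 1 == n_states - 1 else 0.0
            q_update = reward + gamma * q[s + 1]
            q[s] = (1 - alpha) * q[s] + alpha * q_update
        if q[0] > 0.0:
            return sweep
    return None  # the start state never learned anything within max_sweeps


for n in (2, 5, 10):
    print(n, sweeps_until_start_learns(n))
```

Sweeping from the end of the chain backward would need a single pass here, which is the contrast with bottom-up dynamic programming.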

These are the key flaws in this algorithm. So what really happens is:

Saturday, March 18, 2017

Setting up basic Machine Learning rig - Ubuntu

For Ubuntu:
https://www.tensorflow.org/install/install_linux

First install python and pip

sudo apt-get install python python-pip python-dev
sudo pip install --upgrade pip virtualenv

--- WITH A GPU ---

Set up the GPU, if not already done, as described here:
https://www.pugetsystems.com/labs/hpc/NVIDIA-CUDA-with-Ubuntu-16-04-beta-on-a-laptop-if-you-just-cannot-wait-775/

Verify that you even have a usable GPU (it must be one compatible with CUDA) with:

lspci | grep -i nvidia

Install CUDA drivers

Remove prior installs (if you have a problem with them)

sudo apt-get purge nvidia-cuda* 
sudo apt-get install cuda

Download the recent CUDA drivers from
https://developer.nvidia.com/cuda-downloads

Install the drivers

chmod 755 cuda_7.5.18_linux.run
sudo ./cuda_7.5.18_linux.run --override

Confirm setup

which nvcc 
nvcc --version
nvidia-smi

Output should be something like

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2015 NVIDIA Corporation
Built on Tue_Aug_11_14:27:32_CDT_2015
Cuda compilation tools, release 7.5, V7.5.17


Sat Mar 18 14:16:58 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.57                 Driver Version: 367.57                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 970M    Off  | 0000:01:00.0     Off |                  N/A |
| N/A   55C    P0    22W /  N/A |    586MiB /  3016MiB |      8%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1144    G   /usr/lib/xorg/Xorg                             366MiB |
|    0      1922    G   compiz                                         111MiB |
|    0      2302    G   ...bled/ExtensionDeveloperModeWarning/Defaul   107MiB |
+-----------------------------------------------------------------------------+
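If you would rather script the check, a small Python sketch like the one below reports whether the two tools are on the PATH. The helper name `find_tools` is made up, and it assumes Python 3; it only looks for the executables and does not verify versions or that the driver actually works.

```python
import shutil

# Command names taken from the "Confirm setup" step above.
TOOLS = ["nvcc", "nvidia-smi"]


def find_tools(names):
    """Map each command name to its full path, or None if it is not on the PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools(TOOLS).items():
    print(name, "->", path if path else "NOT FOUND")
```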

Install the cuDNN library
http://askubuntu.com/questions/767269/how-can-i-install-cudnn-on-ubuntu-16-04

Download the drivers
https://developer.nvidia.com/cudnn

Locate where your CUDA installation is. It is in /usr/lib/... and /usr/include, or in /usr/local/cuda/.

which nvcc 
ldconfig -p | grep cuda

Copy the files:

cd extracted_driver/
sudo cp -P include/cudnn.h /usr/include
sudo cp -P lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
sudo chmod a+r /usr/lib/x86_64-linux-gnu/libcudnn*

Confirm setup

ldconfig -p | grep cudnn

Output should be something like:

libcudnn.so.5 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn.so.5
libcudnn.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcudnn.so
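The same check can be approximated from Python with the standard library's `ctypes.util.find_library`, which asks the system where a shared library lives. This is only a rough sketch assuming Python 3: the helper name `find_cudnn` is made up, and a hit here does not prove that TensorFlow will accept the installed cuDNN version.

```python
import ctypes.util


def find_cudnn():
    """Return the library name the system resolves for cuDNN, or None if not found."""
    return ctypes.util.find_library("cudnn")


found = find_cudnn()
print(found if found else "cuDNN not found by the system library lookup")
```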

--- INSTALL KEY TOOLS ---

Install Machine learning essentials

sudo pip install numpy
sudo pip install pandas
sudo pip install scikit-learn
sudo pip install jupyter
sudo pip install xgboost

Now you can install tensorflow with GPU as follows

sudo pip install tensorflow-gpu
sudo pip install keras

Or without:

sudo pip install tensorflow
sudo pip install keras
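To confirm that the packages landed in the Python you are actually running, a sketch like this lists which modules can be found without importing them. It assumes Python 3 and that the packages expose the module names shown (scikit-learn imports as `sklearn`); the helper name `installed_modules` is made up.

```python
import importlib.util

# Import names assumed for the pip packages installed above.
MODULES = ["numpy", "pandas", "sklearn", "jupyter", "xgboost", "tensorflow", "keras"]


def installed_modules(names):
    """Map each module name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in installed_modules(MODULES).items():
    print("{:12s} {}".format(name, "ok" if ok else "MISSING"))
```

This only checks that the modules can be located; it does not tell you whether TensorFlow can see the GPU.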