Wednesday, October 26, 2016

analytic regression in python using a parameterized family of functions

Im have been studying ML(machine learning). So here is the deal.. ML is just applied statics. You take a bunch of data and you fit a model to it. The data affords you a certain upper limit of accuracy compared to the real world. Your chosen model then subtracts a certain percentage off the top. The cost of abstraction is an imperfect fit. Simple.

So one of the primary rules of thumb when working with this stuff is start simple. You need to digest the dataset and then make incremental steps forward... its easy however to be making steps backwards without knowing it.

Now regression is just curve fitting. The simplest way to do this is just a direct analytic solution. no gradient descent, nothing more complex, but this approach has limits that quickly become clear when dealing with larger datasets.

So lets solve analytic regression for a parameterized family of functions and then codify it... you'll note i have some quirks to the way i do things.. i try to avoid matrix maths since i think it hides too much and fails to readily when you get beyond 2d axis of data

Define the model:
\hat y_i := \sum_k w_k f(x_i;k)

Define the loss:
MSE = \sum_i(\hat y_i - y_i)^2

Set target (when mse gradient is 0 the error is at its min for w of 0 to n):
\frac{\partial MSE}{\partial w} = 0

Sub the model into the target equation for each "n" derviate term:
= \frac{\partial}{\partial w_n} \Big( \sum_i( \sum_k w_k f(x_i;k) - y_i)^2 \Big)

Expand square (note not expanding the sum of k treat it as a single term):
= \frac{\partial}{\partial w_n} \Big( \sum_i\big( (\sum_k w_k f(x_i;k))^2 -2 y_i \sum_k w_k f(x_i;k) + y_i^2 \big) \Big)

Transfer outer sum to terms:
= \frac{\partial}{\partial w_n} \Big( \sum_i(\sum_k w_k f(x_i;k))^2 - \sum_i 2 y_i \sum_k w_k f(x_i;k) + \sum_iy_i^2 \Big)

Transfer derivative to terms and expand/rebase squares:
= \frac{\partial}{\partial w_n} \sum_i(\sum_k w_k f(x_i;k))(\sum_j w_j f(x_i;j)) - \frac{\partial}{\partial w_n} \sum_i \sum_k 2 y_i w_k f(x_i;k) + \frac{\partial}{\partial w_n} \sum_iy_i^2

Do simple derivatives to the 2 right most terms:
* Note that the partial derivative of a sum drops all terms unless n=k
* Note that sum of y's looks like a constant and the derivate of a constant is 0
= \sum_i\frac{\partial}{\partial w_n} (\sum_k w_k f(x_i;k))(\sum_j w_j f(x_i;j)) - \sum_i 2 y_i f(x_i;n)

Start applying the derivative product rule:
= \sum_i\sum_k w_k f(x_i;k) \frac{\partial}{\partial w_n}(\sum_j w_j f(x_j;j)) + \sum_i\sum_j w_j f(x_j;j) \frac{\partial}{\partial w_n}(\sum_k w_k f(x_i;k)) - \sum_i 2 y_i f(x_i;n)

Continue the derivative product rules:
= \sum_i\sum_k w_k f(x_i;k) f(x_i;n) + \sum_i\sum_j w_j f(x_i;j) f(x_i;n)) - \sum_i 2 y_i f(x_i;n)

Rebase vars into common ones:
= \sum_i\sum_j w_j f(x_i;j) f(x_i;n) + \sum_i\sum_j w_j f(x_i;j) f(x_i;n)) - \sum_i 2 y_i f(x_i;n)

Simply:
= 2 \sum_i\sum_j w_j f(x_i;j) f(x_i;n) - 2 \sum_i y_i f(x_i;n)

Equate for a linear solver implementation (recall there are now "n" of these equations):
\sum_i\sum_j w_j f(x_i;j) f(x_i;n) = \sum_i y_i f(x_i;n)

Now the above maths when codified produces the following code:

#!/usr/bin/env python

# analytic regression for a general linear combination of functions

import random
import numpy as np
import matplotlib.pyplot as plt

# convert the maths into a function
# note this is likely weak in respect to:
#  - overflow (because we are summing numbers)
#  - large amounts of data (no batching etc)
#   etc
def analytic_regress(x,y,params,func):
    size = len(params)
    left  = np.zeros((size,size))
    right = np.zeros(size)

    for i in range(size):
        right[i] = np.sum(y*func(x,params[i]))

        for j in range(size):
            left[i,j] = np.sum(func(x,params[i])*func(x,params[j]))

    w = np.linalg.solve(left, right)

    model = lambda x: (np.sum(w * np.array([func(x,i) for i in params])))
    return w, model

### testing ###

def generate(func,
             x_min, x_max,
             yerr,
             count):
    x = np.random.uniform(x_min, x_max, count)
    y = np.array([func(xi) for xi in x]) + np.random.uniform(0,yerr,count)

    return x,y

# basis functions to trial
def ploy(x,n):
    return x**n

def cossin(x,n):
    if (n < 0):
        return np.sin(-n * x)
    return np.cos(n * x)

def sigmoid(x,n):
    return 1.0/(1.0 + np.exp(n-x))

def guassian(x,n):
    return np.exp(-0.5*(x-n)**2)

def square(x,n,w):
    return (1.0*np.logical_and(x >= n, x < n+w))

# a general function to trial a few regressions
def test_set(func,
             x_min, x_max, y_error, count,
             basis, params,
             x_test):

    x,y = generate(func, x_min, x_max, y_error, count)

    models = {}
    for p in params:
        w, models[p]  = analytic_regress(x,y,range(p),basis)
        print "model basis up to:", p, " has w:"
        print w

    y_test = {}
    for p in params:
        model = models[p]
        y_test[p]  = np.array([model(i) for i in x_test])

    plt.plot(x , y , "o")
    for p in params:
        plt.plot(x_test, y_test[p])

    plt.show()

# a general funtion to trial a single regression
def test_one(func,
             x_min, x_max, y_error, count,
             basis, params,
             x_test):
    x,y = generate(func, x_min, x_max, y_error, count)

    w, model = analytic_regress(x,y, params, basis)
    print w

    y_test = np.array([model(i) for i in x_test])

    plt.plot(x , y , "o")
    plt.plot(x_test, y_test)
    plt.show()

print " ploy regress for function y = 78 - 2x"
test_set(lambda x: (78.0 - 2.0*x),
         0.0, 6.0, 6, 100,
         ploy, [2,3,5,9,11],
         np.array(range(-1, 70))/10.0)

print " ploy regress for function y = 78 + 60x - 13x^2"
test_set(lambda x: (78.0 + 60.0*x - 13.0*x**2),
         0.0, 6.0, 6, 100,
         ploy, [2,3,5,9,11],
         np.array(range(-2, 72))/10.0)

print "square regress for function y = 1 iff 1 < x < 6 othewise 0"
test_one(lambda x: (0 if (x < 1 or x > 6) else 1),
         0.0, 7.0, 0.4, 100,
         lambda x,n: (square(x,n,0.5)), np.array(range(14))/2.0,
         np.array(range(-50, 150))/10.0)

print "square regress for function y = 10*cos(2x)"
test_one(lambda x: 10*np.cos(2.0*x),
         0.0, 7.0, 0.4, 100,
         lambda x,n: (square(x,n,0.5)), np.array(range(14))/2.0,
         np.array(range(-50, 150))/10.0)

print "sigmod regress for function y = 10*cos(2x)"
test_one(lambda x: 10*np.cos(2.0*x),
         0.0, 7.0, 0.4, 100,
         sigmoid, np.array(range(14))/2.0,
         np.array(range(-50, 150))/10.0)

print "guassian regress for function y = 10*cos(2x)"
test_one(lambda x: 10*np.cos(2.0*x),
         0.0, 7.0, 0.4, 100,
         guassian, np.array(range(14))/2.0,
         np.array(range(-50, 150))/10.0)

print "cos/sin regress for function y = 10*cos(2x)"
test_one(lambda x: 10*np.cos(2.0*x),
         0.0, 7.0, 0.4, 100,
         cossin, np.array(range(-6,8))/2.0,
         np.array(range(-50, 150))/10.0)

And then the various outputs...

The regression using using a family of polynomials on a linear equation:

The regression using using a family of polynomials on a quadratic:

The regression using square bars on a square bar:

The regression using square bars on a cos:


The regression using guassians on a cos:

The regression using sigmoids on a cos:

The regression using sin and cos on a cos:

Find the n shortest paths from a to b

// compile with: g++ --std=c++11 multipath.cpp 
//
// Problem Statement:
//  find the n shortest paths from a to b

// Answer:
//  The proposer of the question seemed to think this was hard
//  and even stated many more limitations to the question(no cycles etc)..
//  but i took it as dead easy. Since this inst a challenge of efficiency
//  ill stop at the first pass naive recursive answer..

#include <utility>
#include <stdint.h>
#include <vector>
#include <map>
#include <memory>
#include <iostream>

struct Node
{
    uint32_t id_;
    std::vector<std::pair<Node*,uint32_t> > kids_;

    Node(uint32_t id) :
        id_(id)
    {}
};

struct Graph
{
    // meh should be better..
    std::map<uint32_t, std::shared_ptr<Node> > nodes_;

    Node* get(uint32_t id)
    {
        Node* n = nodes_[id].get();

        if (n == NULL)
        {
            nodes_[id] = std::make_shared<Node>(id);
            n = nodes_[id].get();
        }

        return n;
    }

    void link(uint32_t fromId,
              uint32_t toId,
              uint32_t weight)
    {
        Node* from = nodes_[fromId].get();
        if (from == NULL)
        {
            nodes_[fromId] = std::make_shared<Node>(fromId);
            from = nodes_[fromId].get();
        }

        Node* to = nodes_[toId].get();
        if (to == NULL)
        {
            nodes_[toId] = std::make_shared<Node>(toId);
            to = nodes_[toId].get();
        }

        from->kids_.emplace_back(std::make_pair(to,weight));
    }
};

struct PathNode
{
    std::shared_ptr<PathNode> parent_;
    Node*      node_;
    uint32_t   cost_;

    PathNode(std::shared_ptr<PathNode> parent,
             Node*      node,
             uint32_t   cost) :
        parent_(parent),
        node_(node),
        cost_(cost)
    {}
};

std::ostream& operator<<(std::ostream& os,
                         const PathNode& path)
{
    // inverted.. but it meets my needs
    if (path.parent_.get() != NULL)
    {
        const PathNode& p = *(path.parent_);
        os << p;
        os << "-";
    }

    if (path.node_ != NULL)
        os << path.node_->id_;

    return os;
}


// ************ NAIVE APPROACH ***********
// WARNING do be careful here.. there are some large weaknesses
// in the Naive version:
//  - negative weights will break the shortest first ordering
//  - cycles that trap the search in disconnected areas will never terminate
//
// A stronger algo would fix this by checking reachability from the node and
// pruning the impossible branches from the tree (in fact that would be my next
// step in optimization as it also saves cycles.. but im here for a different
// target so not going there.. a more simple fix would be to add a cycle
// limit and terminate on that..)

std::vector<std::shared_ptr<PathNode> > findPaths(Node*    from,
                                                  Node*    to,
                                                  uint32_t limit)
{
    std::vector<std::shared_ptr<PathNode> > results;
    auto comp = [] (const std::shared_ptr<PathNode>& a,
                    const std::shared_ptr<PathNode>& b) -> bool
        { return a->cost_ > b->cost_; };

    // I was using a std::priority_queue.. problem is i
    // cant print it to demonstrate the stack action to
    // the question proposer
    std::vector<std::shared_ptr<PathNode> > stack;

    // ok start the algo by adding the *from* node into a path.. and stacking it
    stack.push_back(std::make_shared<PathNode>(nullptr,
                                               from,
                                               0));

    while (results.size() < limit and not stack.empty())
    {
        // show the stack...
        // std::cout << "  stack: \n";
        // for (const std::shared_ptr<PathNode>& path : stack)
        //     std::cout << " -- c:" << path->cost_ << " p:" << *path << "\n";

        std::pop_heap(stack.begin(), stack.end(), comp);
        std::shared_ptr<PathNode> current = stack.back();
        stack.pop_back();

        // check if node was at the terminal
        if (current->node_ == to)
        {
            // register it as the next result
            results.push_back(current);

            // and end if we have our "limit" of solutions
            if (results.size() > limit)
                return results;
        }

        // and then expand search tree using the current node
        for (const std::pair<Node*,uint32_t>& edge : current->node_->kids_)
        {
            std::shared_ptr<PathNode> step =
                std::make_shared<PathNode>(current,
                                           edge.first,
                                           current->cost_ + edge.second);

            // now here is a tricky part.. the next shortest route may
            // include the current one even if it is a *terminal* (ie self loop)
            // hence we *always* push even if its the *end*!
            stack.push_back(step);
            std::push_heap(stack.begin(), stack.end(), comp);
        }
    }

    // ok we fell of the end of the stack that means there are not "limit" solutions
    return results;
}

void printPaths(std::ostream& os,
                const std::vector<std::shared_ptr<PathNode> >& paths)
{
    os << "Results...\n";
    for (const std::shared_ptr<PathNode>& path : paths)
    {
        os << "d:" << path->cost_ << " p:" << *path << "\n";
    }
}

void test()
{
    {
        // warning naive algo death - dont do this with a cycle it will never terminate
        Graph disconnected;

        Node* to   = disconnected.get(0);
        Node* from = disconnected.get(1);

        printPaths(std::cout, findPaths(from,to,3));
    }

    {
        // self
        Graph self;
        self.link(0,0,1);

        Node* zero = self.get(0);

        printPaths(std::cout, findPaths(zero,zero,3));
    }

    {
        // cycle
        Graph cycle;
        cycle.link(0,1,1);
        cycle.link(1,0,1);

        Node* from = cycle.get(0);
        Node* to   = cycle.get(1);

        printPaths(std::cout, findPaths(from,to,3));
    }

    {
        // a line of nodes ( asking for 3 anwers.. impossible)
        Graph oneLine;
        oneLine.link(0,1,1);
        oneLine.link(1,2,1);
        oneLine.link(2,3,1);

        Node* from = oneLine.get(0);
        Node* to   = oneLine.get(3);

        printPaths(std::cout, findPaths(from,to,3));
    }

    {
        // 2 lines of nodes ( asking for 3 anwers.. impossible)
        Graph twoLine;
        twoLine.link(0,1,1);
        twoLine.link(1,2,1);
        twoLine.link(2,3,1);
        twoLine.link(0,4,2);
        twoLine.link(4,3,2);

        Node* from = twoLine.get(0);
        Node* to   = twoLine.get(3);

        printPaths(std::cout, findPaths(from,to,3));
    }

    {
        // the questioners challenge graph
        Graph triangle;
        triangle.link(1,2,1);
        triangle.link(1,3,2);

        triangle.link(2,1,1);
        triangle.link(2,3,3);
        triangle.link(2,4,2);
        triangle.link(2,5,1);

        triangle.link(3,1,2);
        triangle.link(3,2,3);
        triangle.link(3,5,2);
        triangle.link(3,6,1);

        triangle.link(4,2,2);
        triangle.link(4,5,3);

        triangle.link(5,2,1);
        triangle.link(5,3,2);
        triangle.link(5,4,3);
        triangle.link(5,6,3);

        triangle.link(6,5,3);
        triangle.link(6,3,1);

        Node* from = triangle.get(1);
        Node* to   = triangle.get(6);

        printPaths(std::cout, findPaths(from,to,10));
    }
}

int main()
{
    test();
}




And output will look like this:
Results...
Results...
d:0 p:0
d:1 p:0-0
d:2 p:0-0-0
Results...
d:1 p:0-1
d:3 p:0-1-0-1
d:5 p:0-1-0-1-0-1
Results...
d:3 p:0-1-2-3
Results...
d:3 p:0-1-2-3
d:4 p:0-4-3
Results...
d:3 p:1-3-6
d:5 p:1-2-3-6
d:5 p:1-2-1-3-6
d:5 p:1-2-5-6
d:5 p:1-3-6-3-6
d:5 p:1-2-5-3-6
d:7 p:1-2-1-2-5-6
d:7 p:1-2-1-2-1-3-6
d:7 p:1-2-3-6-3-6
d:7 p:1-2-5-2-1-3-6

Wednesday, October 5, 2016

Adding math rendering via katex to blogger

First edit the Html version of the blogs template and add the following code before the end of the body

<!--KATEX RENDER BEGIN -->
<link href="https://khan.github.io/KaTeX/bower_components/katex/dist/katex.min.css" type="text/css" rel="stylesheet"/>
<script src="https://khan.github.io/KaTeX/bower_components/katex/dist/katex.min.js" type="text/javascript"></script>
<script language='javascript'>
//<![CDATA[ 
function renderLatex()
{
    var locs = document.getElementsByTagName("pre");
    for (var x = 0; x < locs.length; x++)
    {
        if (locs[x].className == "render:latex")
        {
            var eq =  locs[x].innerHTML;
            var div=document.createElement("div");
            div.className = "latexRendered";
            locs[x].parentNode.insertBefore(div,locs[x].nextSibling);

            try
            {
                katex.render("\\displaystyle{" + eq + "}", div);
                locs[x].className = "done:latex";
                locs[x].style = "display:none";
            }
            catch (err)
            {
                div.innerHTML = err;
                locs[x].className = "error:latex";
            }
        }
    }
}

renderLatex();
//]]>
</script>
<!-- KATEX RENDER END -->

You then blog using html tags like the following

<pre class="render:latex">y_i = \sum_iw_ix_i</pre>

And it should render like this

y_i = \sum_iw_ix_i