So one of the primary rules of thumb when working with this stuff is to start simple. You need to digest the dataset and then make incremental steps forward... it's easy, however, to be taking steps backwards without knowing it.
Now regression is just curve fitting. The simplest way to do it is a direct analytic solution: no gradient descent, nothing more complex. This approach has limits that quickly become clear when dealing with larger datasets.
So let's solve analytic regression for a parameterized family of functions and then codify it... you'll note I have some quirks in the way I do things: I try to avoid matrix maths, since I think it hides too much and fails too readily once you get beyond two axes of data.
Define the model:
\hat y_i := \sum_k w_k f(x_i;k)
Define the loss (written here as a sum of squared errors; the 1/N of a true MSE is a constant factor and doesn't move the minimum):
MSE = \sum_i(\hat y_i - y_i)^2
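To make the notation concrete, here is a minimal sketch of the model and the loss using a polynomial basis f(x;k) = x^k; the names (basis, w, predict) and the numbers are illustrative only and aren't used in the listing further down.

```python
import numpy as np

# minimal sketch: y_hat = sum_k w_k * f(x;k), with a polynomial basis f(x;k) = x**k
def basis(x, k):
    return x**k

w = np.array([78.0, -2.0, 0.5])                   # illustrative weights w_0..w_2
predict = lambda x: sum(w[k] * basis(x, k) for k in range(len(w)))

x = np.array([0.0, 1.0, 2.0])                     # a few sample points
y = np.array([78.0, 76.0, 74.0])                  # their (made-up) targets
loss = np.sum((np.array([predict(xi) for xi in x]) - y)**2)
print(loss)                                       # sum of squared errors
```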
Set the target (when the MSE gradient is 0 the error is at its minimum; there is one of these equations for each weight w_n, n = 0..N):
\frac{\partial MSE}{\partial w_n} = 0
Sub the model into the target equation for each of the "n" derivative terms:
= \frac{\partial}{\partial w_n} \Big( \sum_i( \sum_k w_k f(x_i;k) - y_i)^2 \Big)
Expand the square (note: don't expand the sum over k; treat it as a single term):
= \frac{\partial}{\partial w_n} \Big( \sum_i\big( (\sum_k w_k f(x_i;k))^2 -2 y_i \sum_k w_k f(x_i;k) + y_i^2 \big) \Big)
Distribute the outer sum over the terms:
= \frac{\partial}{\partial w_n} \Big( \sum_i(\sum_k w_k f(x_i;k))^2 - \sum_i 2 y_i \sum_k w_k f(x_i;k) + \sum_iy_i^2 \Big)
Distribute the derivative over the terms and rewrite the square as a product of two sums with independent indices (k and j):
= \frac{\partial}{\partial w_n} \sum_i(\sum_k w_k f(x_i;k))(\sum_j w_j f(x_i;j)) - \frac{\partial}{\partial w_n} \sum_i \sum_k 2 y_i w_k f(x_i;k) + \frac{\partial}{\partial w_n} \sum_iy_i^2
Take the simple derivatives of the two right-most terms:
* Note that the partial derivative with respect to w_n drops every term of the sum over k except the one where k = n (a one-line example follows these notes)
* Note that the sum of y_i^2 is a constant with respect to w_n, and the derivative of a constant is 0
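As a one-line example of the first note, with three weights, n = 2, and a_k standing for whatever multiplies w_k:
\frac{\partial}{\partial w_2}\big(w_1 a_1 + w_2 a_2 + w_3 a_3\big) = a_2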
= \sum_i\frac{\partial}{\partial w_n} (\sum_k w_k f(x_i;k))(\sum_j w_j f(x_i;j)) - \sum_i 2 y_i f(x_i;n)
Start applying the derivative product rule:
= \sum_i\sum_k w_k f(x_i;k) \frac{\partial}{\partial w_n}\Big(\sum_j w_j f(x_i;j)\Big) + \sum_i\sum_j w_j f(x_i;j) \frac{\partial}{\partial w_n}\Big(\sum_k w_k f(x_i;k)\Big) - \sum_i 2 y_i f(x_i;n)
Finish applying the product rule (each remaining derivative picks out the k = n or j = n term):
= \sum_i\sum_k w_k f(x_i;k) f(x_i;n) + \sum_i\sum_j w_j f(x_i;j) f(x_i;n) - \sum_i 2 y_i f(x_i;n)
Rename the dummy summation indices so both terms use the same one (k → j):
= \sum_i\sum_j w_j f(x_i;j) f(x_i;n) + \sum_i\sum_j w_j f(x_i;j) f(x_i;n) - \sum_i 2 y_i f(x_i;n)
Simplify:
= 2 \sum_i\sum_j w_j f(x_i;j) f(x_i;n) - 2 \sum_i y_i f(x_i;n)
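As a quick check of the algebra, the gradient formula can be compared against a finite difference of the loss. The sketch below uses made-up data and a polynomial basis; none of its names appear in the listing further down.

```python
import numpy as np

# numerical check of dMSE/dw_n = 2*sum_i sum_j w_j f(x_i;j) f(x_i;n) - 2*sum_i y_i f(x_i;n)
def f(x, k):                                      # polynomial basis, just for the check
    return x**k

x = np.random.uniform(0.0, 5.0, 50)               # made-up data
y = 3.0 - 2.0 * x + np.random.normal(0.0, 0.1, 50)
w = np.random.normal(size=3)                      # arbitrary weights w_0..w_2
params = range(len(w))

def sse(w):                                       # the loss defined above
    y_hat = sum(w[k] * f(x, k) for k in params)
    return np.sum((y_hat - y)**2)

n, eps = 1, 1e-6
analytic = 2*sum(np.sum(w[j]*f(x, j)*f(x, n)) for j in params) - 2*np.sum(y*f(x, n))
w_step = w.copy(); w_step[n] += eps
numeric = (sse(w_step) - sse(w)) / eps
print(analytic, numeric)                          # the two should agree closely
```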
Set the gradient to zero, cancel the common factor of 2, and rearrange for a linear solver implementation (recall there are "n" of these equations, one per weight):
\sum_i\sum_j w_j f(x_i;j) f(x_i;n) = \sum_i y_i f(x_i;n)
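For the implementation it helps to give the two sides of these "n" equations names: a square array of coefficients on the left and a vector on the right (these become the "left" and "right" arrays that the code below hands to a linear solver),
A_{n,j} = \sum_i f(x_i;j) f(x_i;n) \qquad b_n = \sum_i y_i f(x_i;n) \qquad \sum_j A_{n,j} w_j = b_n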
Now, codified, the above maths produces the following code:
```python
#!/usr/bin/env python
# analytic regression for a general linear combination of functions
import numpy as np
import matplotlib.pyplot as plt

# convert the maths into a function
# note this is likely weak in respect to:
# - overflow (because we are summing numbers)
# - large amounts of data (no batching etc)
# etc
def analytic_regress(x, y, params, func):
    size = len(params)
    left = np.zeros((size, size))
    right = np.zeros(size)
    for i in range(size):
        right[i] = np.sum(y * func(x, params[i]))
        for j in range(size):
            left[i, j] = np.sum(func(x, params[i]) * func(x, params[j]))
    w = np.linalg.solve(left, right)
    model = lambda x: np.sum(w * np.array([func(x, i) for i in params]))
    return w, model

### testing ###

def generate(func, x_min, x_max, yerr, count):
    x = np.random.uniform(x_min, x_max, count)
    y = np.array([func(xi) for xi in x]) + np.random.uniform(0, yerr, count)
    return x, y

# basis functions to trial
def poly(x, n):
    return x**n

def cossin(x, n):
    if n < 0:
        return np.sin(-n * x)
    return np.cos(n * x)

def sigmoid(x, n):
    return 1.0 / (1.0 + np.exp(n - x))

def gaussian(x, n):
    return np.exp(-0.5 * (x - n)**2)

def square(x, n, w):
    return 1.0 * np.logical_and(x >= n, x < n + w)

# a general function to trial a few regressions
def test_set(func, x_min, x_max, y_error, count, basis, params, x_test):
    x, y = generate(func, x_min, x_max, y_error, count)
    models = {}
    for p in params:
        w, models[p] = analytic_regress(x, y, range(p), basis)
        print("model basis up to:", p, " has w:")
        print(w)
    y_test = {}
    for p in params:
        model = models[p]
        y_test[p] = np.array([model(i) for i in x_test])
    plt.plot(x, y, "o")
    for p in params:
        plt.plot(x_test, y_test[p])
    plt.show()

# a general function to trial a single regression
def test_one(func, x_min, x_max, y_error, count, basis, params, x_test):
    x, y = generate(func, x_min, x_max, y_error, count)
    w, model = analytic_regress(x, y, params, basis)
    print(w)
    y_test = np.array([model(i) for i in x_test])
    plt.plot(x, y, "o")
    plt.plot(x_test, y_test)
    plt.show()

print("poly regress for function y = 78 - 2x")
test_set(lambda x: (78.0 - 2.0*x), 0.0, 6.0, 6, 100, poly, [2, 3, 5, 9, 11], np.array(range(-1, 70)) / 10.0)

print("poly regress for function y = 78 + 60x - 13x^2")
test_set(lambda x: (78.0 + 60.0*x - 13.0*x**2), 0.0, 6.0, 6, 100, poly, [2, 3, 5, 9, 11], np.array(range(-2, 72)) / 10.0)

print("square regress for function y = 1 iff 1 < x < 6 otherwise 0")
test_one(lambda x: (0 if (x < 1 or x > 6) else 1), 0.0, 7.0, 0.4, 100, lambda x, n: square(x, n, 0.5), np.array(range(14)) / 2.0, np.array(range(-50, 150)) / 10.0)

print("square regress for function y = 10*cos(2x)")
test_one(lambda x: 10 * np.cos(2.0*x), 0.0, 7.0, 0.4, 100, lambda x, n: square(x, n, 0.5), np.array(range(14)) / 2.0, np.array(range(-50, 150)) / 10.0)

print("sigmoid regress for function y = 10*cos(2x)")
test_one(lambda x: 10 * np.cos(2.0*x), 0.0, 7.0, 0.4, 100, sigmoid, np.array(range(14)) / 2.0, np.array(range(-50, 150)) / 10.0)

print("gaussian regress for function y = 10*cos(2x)")
test_one(lambda x: 10 * np.cos(2.0*x), 0.0, 7.0, 0.4, 100, gaussian, np.array(range(14)) / 2.0, np.array(range(-50, 150)) / 10.0)

print("cos/sin regress for function y = 10*cos(2x)")
test_one(lambda x: 10 * np.cos(2.0*x), 0.0, 7.0, 0.4, 100, cossin, np.array(range(-6, 8)) / 2.0, np.array(range(-50, 150)) / 10.0)
```
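As a quick sanity check that isn't part of the listing above: for the polynomial basis, the weights from analytic_regress should agree with numpy's own least-squares polynomial fit. This sketch assumes the functions from the listing are already in scope.

```python
# sanity check: the analytic solution for a polynomial basis should match np.polyfit
x, y = generate(lambda x: 78.0 - 2.0*x, 0.0, 6.0, 6, 100)
w, model = analytic_regress(x, y, range(2), poly)
print(w)                            # [w_0, w_1] from the derivation above
print(np.polyfit(x, y, 1)[::-1])    # numpy's fit, reordered to [w_0, w_1]
```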
And then the various outputs...
The regression using a family of polynomials on a linear equation:
The regression using a family of polynomials on a quadratic:
The regression using square bars on a square bar:
The regression using square bars on a cos:
The regression using Gaussians on a cos:
The regression using sigmoids on a cos:
The regression using sin and cos on a cos: