The Coding Slim Jim: May 2010

Monday, May 31, 2010

random number generator

Problem: Using a function(rand5) which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.

Solution:
You cant add add random numbers it creates a bias(think 2 dice the most common number is 7) Instead treat it as a base X number(in this case 5), scale it upto to sufficient granularity. (5*5 => 25 and 3*7 = 21)
Then drop the top end that is biasing the range. (in this case 21,22,23,24)

Of course in normal rand generation the base generator puts out a massive range so the low number bias is very minor and generally ignored.

//code untested
int rand7()
{
  int scaleRand   = 25;
  while(scaleRand >= 21)
     scaleRand = 5*(rand5()-1) + (rand5() - 1);   

  //rescale to the new base
  return (scaleRand % 7) + 1;
}

Semaphores, Mutex and other threading terms

A Semaphore is;
- A counter designed to restrict access to a resource to a specific count
- Typically a semaphore is using to signal the presence and count of of messages in a producer-consumer multithreaded system.

A Mutex is;
- MUTually EXclusive
- A counter with the count 1.
- Designed to allow single access to a piece of code.
- Typically used to enforce concurrent access to a re-entrant piece of code or critical region

A critical region is;
- A piece of code that cant not be executed simultaneously by 2 or more threads

An Atomic operation is;
- An operation that once execution begins is uninterruptable until it completes.
- A thread that executes such an operation doesnt need an mutex to guard it.
- The operation can be used to implement a mutex in threaded systems.

A Wait command is;
- The wait command is used to obtain a mutex or semaphore and will suspensed the thread until it

Signal command is;
- The command used to release of a single semaphore or mutex

Broadcast command is:
- A command used to release all threads waiting for a condition variable..

Producer-Consumer:
- The fundamental design pattern of how to communicate meaningful data from 1 tread to another.

Saturday, May 29, 2010

Shell Sort

Shell sort

Basically the idea of shell sort is to move numbers in a large step first, and slow reduce this step down to 1. The idea summarizes as simulated annealing. The whole reason for the additionally outer loop is to reduce the overall randomness in inital sort the data by quickly and roughly grouping low value items to the left and high value items to the right.

The inner core of it is insertion sort which will correct the sorting of the data irrespective of what your choice of outer steps are so long as the last step is 1.

How it Works;
1) choose a step by which to do your insertion sort
2) preform insertion sort as normal but using the step to determine which data is side by side.

The understandable version of the algorithm works as follows:
1) break the list of data into rows and columns
2) preform insertion sort on each column
3) repeat with more less columns until we have a single column.

In my code example I have have created an easy to understand implementation of shell sort it explicitly names out the columns and rows

Properties;
Worst case speed is O(n^2) or O(nlog^2n) if you use the special sequence of steps.
Memory is 1
It is Unstable (due the interleaving of the data several neigbour items are skipped)

//compile with g++
#include <iostream>
#include <iomanip>
using namespace std;

void print(int data[], int size);

void shellSort(int data[], int size)
{
  int tmp;
  int i; 
   // int step = 1; //comment the for and uncomment this to see that shell is just insertation sort!
   for(int step = size/2; step > 0; step = step/2)
      for(int j = step; j < size; j++)
        {
          tmp = data[j];
          i = j;
          while(
                (i >= step) &&
                (data[i-step] > tmp)
                )
            {
              data[i] = data[i-step];
              i -=step;
            }

          if(i != j)
            data[i] = tmp;
        }
}

void shellSortUnderstandable(int data[], int size)
{
  int tmp;
  int srow;
  for(int step = size/2; step > 0; step = step/2)
    {
      for(int column= 0; column < step; column++)
        {
          int rows = (int)((size-column)/step) + 1;
          for(int row = 1; row < rows-1; row++)
            {
              tmp = data[row*step + column];

              for(srow = row-1; srow >= 0; srow--)
                if(data[srow*step + column] > tmp)
                  data[(srow+1)*step + column] = data[(srow*step) + column];
                else
                  break;
              
              if(srow+1 != row)
                data[(srow+1)*step + column] = tmp;
            }
        }
    }
}

//test
#define SCRAMBLE(x, y) ((0xa57b & ~y) + ((0x3829 & x) << 1))

bool check(int data[], int size)
{
  for(int i = 1; i < size; i++)
    if(data[i] < data[i-1])
      {
        cout << "FAIL!" << endl;
        return false;
      }
  cout << "PASS" << endl;
  return true;
}

void print(int data[], int size)
{
  for(int i = 0; i < size; i++)
    cout << setw(5) << data[i] << " ";
  cout << endl;
}

bool test(int data[], int size)
{
  print(data, size);
  shellSort(data,size);
  print(data, size);
  return check(data, size);
}

#define SIZE 300
int main()
{
  int data[SIZE];
  bool pass = true;

  //easy data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SIZE - i;
  pass &= test(data, SIZE);

  //semi repeated data
  data[0] = 1;
  for(int i = 1; i < SIZE; i++)
    data[i] = SCRAMBLE(i, data[i-1]);
  pass &= test(data, SIZE);

  //the sort killer!
  for(int i = 0; i < SIZE; i++)
    data[i] = 5;
  pass &= test(data, SIZE);

  //and some randoms to catch anything i missed
  srand ( time(NULL) );

  for(int j = 1; j < 100; j++)
    {
      for(int i = 0; i < 3*j; i++)
        data[i] = (int)((float)(j+1)*((float)rand()/(float)RAND_MAX));
      pass &= test(data, 3*j);
    }

  if(pass)
    cout << "ALL PASSED" << endl;
  else
    cout << "FAILED" << endl;
}

Insertion Sort

How it Works;
0) Sorted data will be at the front of the array, starting 1 place after the start of the data,
1) take your current value out of the array, creating an empty point
2) set your insertion point to the tail of the sorted data.
3a) if the current item is lower then the insertion point item move the insertion points item up to the empty place in the array repeat with the next possible insertion point
3b) if the current item is equal to or greater than the insertion point place the current item back into the arrays empty place and repeat from 1 with the next unsorted data.

Properties;
Worst and average case speed is O(n^2)
Memory is 1
It is Stable (unless you implement it badly)

//compile with g++
#include <iostream>
#include <iomanip>
using namespace std;

void print(int data[], int size);

void insertionSort(int data[], int size)
{
  int tmp;
  int i;
  for(int j = 1; j < size; j++)
    {
      tmp = data[j];
      i = j;
      while(
            (i > 0) &&
            (data[i-1] > tmp)
            )
        {
          data[i] = data[i-1];
          i--;
        }

      if(i != j)
        data[i] = tmp;
    }
}

//test
#define SIZE 20
#define SCRAMBLE(x, y) ((0xa57b & ~y) + ((0x3829 & x) << 1))

bool check(int data[], int size)
{
  for(int i = 1; i < size; i++)
    if(data[i] < data[i-1])
      {
        cout << "FAIL!" << endl;
        return false;
      }
  cout << "PASS" << endl;
  return true;
}

void print(int data[], int size)
{
  for(int i = 0; i < size; i++)
    cout << setw(5) << data[i] << " ";
  cout << endl;
}

bool test(int data[], int size)
{
  print(data, SIZE);
  insertionSort(data,SIZE);
  print(data, SIZE);
  return check(data, SIZE);
}

int main()
{
  int data[SIZE];
  bool pass = true;

  //easy data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SIZE - i;
  pass &= test(data, SIZE);

  //semi repeated data
  data[0] = 1;
  for(int i = 1; i < SIZE; i++)
    data[i] = SCRAMBLE(i, data[i-1]);
  pass &= test(data, SIZE);

  //the sort killer!
  for(int i = 0; i < SIZE; i++)
    data[i] = 5;
  pass &= test(data, SIZE);

  //and some randoms to catch anything i missed
  srand ( time(NULL) );

  for(int j = 0; j < 100; j++)
    {
      for(int i = 0; i < SIZE; i++)
        data[i] = (int)((float)(j+1)*((float)rand()/(float)RAND_MAX));
      pass &= test(data, SIZE);
    }

  if(pass)
    cout << "ALL PASSED" << endl;
  else
    cout << "FAILED" << endl;
}

Selection Sort

The algorithm works as follows:
How it Works;
1) Find the minimum value in the list
2) Swap the minimum item with first item
3) Repeat for all remain positions

Properties;
Worst and average case speed is O(n^2)
Memory is 1
It is Stable (unless you implement it badly)

//compile with g++
#include <iostream>
#include <iomanip>
using namespace std;

void print(int data[], int size);

void selectionSort(int data[], int size)
{
  int min;
  int tmp;
  for(int j = 0; j < size; j++)
    {
      min = j;
      for(int i = j;i < size;i++)
        if(data[i] < data[min])
          min = i;

      if(min != j)
        {
          tmp = data[min];
          data[min] = data[j];
          data[j] = tmp;
        }
    }
}

//test
#define SIZE 20
#define SCRAMBLE(x, y) ((0xa57b & ~y) + ((0x3829 & x) << 1))

bool check(int data[], int size)
{
  for(int i = 1; i < size; i++)
    if(data[i] < data[i-1])
      {
        cout << "FAIL!" << endl;
        return false;
      }
  cout << "PASS" << endl;
  return true;
}

void print(int data[], int size)
{
  for(int i = 0; i < size; i++)
    cout << setw(5) << data[i] << " ";
  cout << endl;
}

bool test(int data[], int size)
{
  print(data, SIZE);
  selectionSort(data,SIZE);
  print(data, SIZE);
  return check(data, SIZE);
}

int main()
{
  int data[SIZE];
  bool pass = true;

  //easy data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SIZE - i;
  pass &= test(data, SIZE);

  //semi repeated data
  data[0] = 1;
  for(int i = 1; i < SIZE; i++)
    data[i] = SCRAMBLE(i, data[i-1]);
  pass &= test(data, SIZE);

  //the sort killer!
  for(int i = 0; i < SIZE; i++)
    data[i] = 5;
  pass &= test(data, SIZE);

  //and some randoms to catch anything i missed
  srand ( time(NULL) );

  for(int j = 0; j < 100; j++)
    {
      for(int i = 0; i < SIZE; i++)
        data[i] = (int)((float)(j+1)*((float)rand()/(float)RAND_MAX));
      pass &= test(data, SIZE);
    }

  if(pass)
    cout << "ALL PASSED" << endl;
  else
    cout << "FAILED" << endl;
}

Testing sorts

Basically in my last 2 posts I have been using a macro called SCAMBLE to test my sorts

#define SCRAMBLE(x, y) ((0xa57b & ~y) + ((0x3829 & x) << 1))

The average sort algorithm starts to have issues when there is repetition in the data. This SCAMBLE macro is a broken Psedo-Random number generator core. In short since its broken it gets stuck into a loop of numbers which makes it useless as a random number generator but is exactly the input I want to test the sorts.

Bubble Sort

Bubble sort

How it works;
Basically bubble sort works by stepping through the data a swapping the current and next data items if they are out of order.

Properties;
Best case and worst case speed are both O(n^2)
Memory is 1 (the temp swap location)

//compile with g++
#include <iostream>
#include <iomanip>
using namespace std;

void print(int data[], int size);

void bubblesort(int data[], int size)
{
  bool change = true;
  int tmp;
  while(change)
    {
      change = false;
      for(int i = 0;i < size-1;i++)
        if(data[i] > data[i+1])
          {
            tmp = data[i];
            data[i] = data[i+1];
            data[i+1] = tmp;
            change = true;
          }
    }
}

//test
#define SIZE 20
#define SCRAMBLE(x, y) ((0xa57b & ~y) + ((0x3829 & x) << 1))

bool check(int data[], int size)
{
  for(int i = 1; i < size; i++)
    if(data[i] < data[i-1])
      {
        cout << "FAIL!" << endl;
        return false;
      }
  cout << "PASS" << endl;
  return true;
}

void print(int data[], int size)
{
  for(int i = 0; i < size; i++)
    cout << setw(5) << data[i] << " ";
  cout << endl;
}

bool test(int data[], int size)
{
  print(data, SIZE);
  bubblesort(data,SIZE);
  print(data, SIZE);
  return check(data, SIZE);
}

int main()
{
  int data[SIZE];
  bool pass = true;

  //easy data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SIZE - i;
  pass &= test(data, SIZE);

  //semi repeated data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SCRAMBLE(i, data[i-1]);
  pass &= test(data, SIZE);

  //the sort killer!
  for(int i = 0; i < SIZE; i++)
    data[i] = 5;
  pass &= test(data, SIZE);

  //and some randoms to catch anything i missed
  srand ( time(NULL) );

  for(int j = 0; j < 100; j++)
    {
      for(int i = 0; i < SIZE; i++)
        data[i] = (int)((float)(j+1)*((float)rand()/(float)RAND_MAX));
      pass &= test(data, SIZE);
    }

  if(pass)
    cout << "ALL PASSED" << endl;
  else
    cout << "FAILED" << endl;
}

Friday, May 28, 2010

Quick sort

UPDATE: I noticed a lot o traffic coming to this entry.. since it seems popular ill clean up and simplify the code some time later this week(currently its 2010/06/17). Then repost it in a new entry and note the new link here..

Quick sort..

How it works.
1) First choose a pivot value.
2) Then divide the list into 2 sections, do the division by finding the left most item the exceeds the pivot and swapping it with the right most item that is less than the pivot.
3) Then recurse on the 2 halves.

There are several hidden fail points
1) if a bad pivot value is chosen then the list wont divide into 2
2) if the data becomes to similar, or even completely identical then the chances of a bad pivot increase.

As a result of the fail point care needs to be taken to correctly find the right and left swap items and where that the the list is be split.

Speed:
Quick sort is O(nlog(n)) on average. and O(n^2) in the worst case.

//compile with g++
#include <iostream>
#include <iomanip>
using namespace std;

void print(int data[], int size);

void quicksort(int data[], int sectionLow, int sectionHigh)
{
  if(sectionLow >= sectionHigh-1)
    return;

  int low  = sectionLow;
  int high = sectionHigh-1;
  int mid  = (low+high)/2;

  int pivotValue = data[mid];

  while(low < high)
    {
      while(
            (low < high) &&
            (data[low]  < pivotValue) &&
            (data[high] > pivotValue)
            )
        {
          //O(N^2) avoidance.. move the pointers inorder or 1 will start to dominate 
          // when the data starts to get too similar
          low++;
          high--;
        }
      
      //one or both of these wont run.. the other will find its end...
      while((low < high) && (data[low]  < pivotValue)) low++;
      while((low < high) && (data[high] > pivotValue)) high--;

      cout << low << "<>" << high << " ";

      //do the swap
      if(low < high)
        {
          int temp = data[low];
          data[low]  = data[high];
          data[high] = temp;

          //step away from it otherwise it will get stuck in case that these are both pivotValues
          low++;
          high--;
        }
    }

  cout << endl;

  cout << low << " ";
  //quick sort boundary conditions are problem matic
  //make certian of where the end of the data was
  if(data[low] < pivotValue)   mid = low+1;
  else                         mid = low;

  cout << mid << " ";

  //make certain that this isnt the start or end edge or we will end in an infinite loop
  if(mid == sectionHigh) mid--;
  if(mid == sectionLow)  mid++;

  cout << mid << endl;

  cout << "stage result: " << sectionLow
       << "<->" << mid
       << "<->" << sectionHigh << " pivot:" << pivotValue << endl;
  print(&(data[sectionLow]), sectionHigh-sectionLow);

  quicksort(data, sectionLow, mid);
  quicksort(data, mid, sectionHigh);
}

//test
#define SIZE 20
#define SCRAMBLE(x, y) ((0xa57b & ~y) + ((0x3829 & x) << 1))

bool check(int data[], int size)
{
  for(int i = 1; i < size; i++)
    if(data[i] < data[i-1])
      {
        cout << "FAIL!" << endl;
        return false;
      }
  cout << "PASS" << endl;
  return true;
}

void print(int data[], int size)
{
  for(int i = 0; i < size; i++)
    cout << setw(5) << data[i] << " ";
  cout << endl;
}

bool test(int data[], int size)
{
  print(data, SIZE);
  quicksort(data,0, SIZE);
  print(data, SIZE);
  return check(data, SIZE);
}

int main()
{
  int data[SIZE];
  bool pass = true;

  //easy data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SIZE - i;
  pass &= test(data, SIZE);

  //semi repeated data
  data[0] = 1;
  for(int i = 0; i < SIZE; i++)
    data[i] = SCRAMBLE(i, data[i-1]);
  pass &= test(data, SIZE);

  //the sort killer!
  for(int i = 0; i < SIZE; i++)
    data[i] = 5;
  pass &= test(data, SIZE);

  //and some randoms to catch anything i missed
  srand ( time(NULL) );

  for(int j = 0; j < 100; j++)
    {
      for(int i = 0; i < SIZE; i++)
        data[i] = (int)((float)(j+1)*((float)rand()/(float)RAND_MAX));
      pass &= test(data, SIZE);
    }
  
  if(pass)
    cout << "ALL PASSED" << endl;
  else
    cout << "FAILED" << endl;
}

Thursday, May 27, 2010

How to reverse a linked list

//compile with g++ 
#include <iostream>
using namespace std;

struct Node 
{
  Node(Node* _next, int _value) 
  { 
    next = _next; 
    value = _value; 
  }

  ~Node()
  { if(next != NULL) delete next; }

  Node* next;
  int value;
};

Node* reverse(Node* head)
{
  Node* new_head = NULL; 
  Node* current  = NULL; 
  
  while(head != NULL)
    {
      current = head;
      head = current->next;
      current->next = new_head;
      new_head = current;
    }

  return new_head;
}

//test
void print(Node* current)
{
  while(current != NULL)
    {
      cout << current->value << " ";
      current = current->next;
    }
  cout << endl;
}

int main()
{
  Node* head = NULL;

  for(int i = 0; i < 10; i++)
    head = new Node(head, i);

  print(head);
  head = reverse(head);
  print(head);

  delete head;
}

Japanese biz negotiation

For japanese biz negotiations
RULE: USE THE PASSIVE!
using the non-passive
1) tends to imply blame on the other party. (ie it sounds like; You did this causing the problem)
2) imply arrogrance on you behalf (ie it sounds like; I did this for up anit i great!)
CORRECT: The change X was rejected.
INCORRECT: You have rejected the change
アップロードができました
CORRECT: It was possible to upload the new site.
アップロードをしましました
INCORRECT: I have uploaded the site
Japanese are very thank-you-tastic and apolitic-for-no-reason-at-all and 80% of email gets intrupted as the negitive meaning so its better the say thanks at start and end of the email to pep up readers feelings.

Wednesday, May 19, 2010

apache2 black magic

Apache2 is like most open source projects. Started with nice clean code base that got hacked into spegitti by good intention "upgraders" and "fixes"
The apache2 manuals reflect its insanity.
Here is what I think its really doing(after hacking with allow denys for a while)
file loading precedence appears to be:

apache2.conf
mods-enabled directory (not certain could be after conf.d but suspects its before)
conf.d file
global doc roots .htaccess
virtual hosts and .htaccess

It appears that each DocRoot is searched for .htaccess files to be loaded when the vhost block is closed.

NOTE .htaccess files are NOT loaded 100%.. Their parsing is crippled to what the AllowOverride directive says. Non-allowed lines in the .htaccess files are basically ignored!
directive operating precedence (from weakest to strongest)

Directory
File
Location

So the full loading order on a per-file/directive basis becomes.

Directory (conf)
Directory (.htaccess from default docroot)
Directory (vhost)
Directory (.htaccess)
File      (conf)
File      (.htaccess from default docroot)
File      (vhost)
File      (.htaccess)
Location  (conf)
Location  (.htaccess from default docroot)
Location  (vhost)
Location  (.htaccess)

This means that to generically block all accesses to a test server with test versions of web sites u need to use the weakest most overloadable type. So in the conf.d area add a "block_all" file with these contents;

<Directory />
  Order deny, allow
  Deny form all
  Allow form <IP_OR_RANGE>
<Directory>

Tuesday, May 18, 2010

.htaccess blocking for upgrades

A quick and dirty .htaccess or virtualhost to block a site while its being upgraded

<Location />
  Order deny,allow
  Deny from all
  Allow from <YOUR_IP>
  ErrorDocument 403 "Under maintance, please be back back soon"
</Location>

finding the size of tables in mysql

show table status

Monday, May 17, 2010

Japanese - antonyms synonyms

http://hanntaigo.main.jp/

Rails - wiping sessions from the ActiveRecordSession store

Rails doesn't clean up it session. This will wipe the active record sessions in the newer version of rails without wasting as much memory as a destroy all.

rows = ActiveRecord::SessionStore::Session.delete_all ["updated_at < ?", time]