Tuesday, November 18, 2014

How to create a parsing/printing system with 1 line of code per field

Recently a friend claimed that you can't parse and print a blob of data without cutting and pasting stuff everywhere, or without using the new c++11 compilers and techniques.

I have in the past (2012) shown techniques to do this kind of parsing and generation in c11-tuples-and-schema-generation.html. So, after a few short hours of hacking, here is the proof of concept. The solution is meta-programming black magic, but hey, you can define the parser and printer (and whatever else you want to add) for "SomeMessage" in basically 1 line of code per field.

// compile with:
// g++ -I"c:\tools\boost_1_49_0" -L"c:\tools\boost_1_49_0\stage\lib" -static self_print_struct.cpp -o self_print_struct.exe

#include <iostream>
#include <boost/mpl/for_each.hpp>
#include <boost/mpl/list.hpp>
#include <boost/mpl/string.hpp>
#include <boost/mpl/range_c.hpp>

template <typename T,
          typename N>
struct Field
{
    typedef T Type;
    typedef N Name;

    static unsigned char* print(std::ostream& os, unsigned char* ptr)
    {
        os << boost::mpl::c_str<N>::value
           << " : " << *(static_cast<T*>(static_cast<void*>(ptr)));
        return ptr + sizeof(Type);
    }
};

template <typename Base>
struct PrintMixin
{
    struct DoPrint
    {
        std::ostream&  os_;
        unsigned char* cursor_;

        DoPrint(std::ostream& os, unsigned char* cursor) :
           os_(os),
           cursor_(cursor)
        {}

        template< typename U > void operator()(U)
        {
            os_ << " + ";
            cursor_ = U::print(os_, cursor_);  // print() returns the advanced cursor
            os_ << '\n';
        }
    };

    static void print(std::ostream& os, unsigned char* ptr)
    {
        // walk the compile-time field list, decoding one field at a time
        boost::mpl::for_each< typename Base::Type >( DoPrint(os, ptr) );
    }
};

struct SomeMessage : PrintMixin<SomeMessage>
{
    typedef boost::mpl::list<Field<int,  boost::mpl::string<'fiel','d 1'> >,
                             Field<char, boost::mpl::string<'fiel','d 2'> > > Type;
};

struct AnotherMessage : PrintMixin<AnotherMessage>
{
    typedef boost::mpl::list<Field<int,   boost::mpl::string<'this'> >,
                             Field<char,  boost::mpl::string<'that'> >,
                             Field<float, boost::mpl::string<'the ','othe','r'> >
                             > Type;
};

int main()
{
    unsigned char message1[] =
    {
        0x01,0x02,0x03,0x04,
        'a'
    };

    std::cout << "message 1\n";
    SomeMessage::print(std::cout, message1);

    unsigned char message2[] =
    {
        0x15, 0xCD, 0x5B, 0x07,
        'b',
        0x19, 0x04, 0x9e, 0x3f
    };

    std::cout << "message 2\n";
    AnotherMessage::print(std::cout, message2);
}


And the output looks like:
$ self_print_struct.exe
message 1
 + field 1 : 67305985
 + field 2 : a
message 2
 + this : 123456789
 + that : b
 + the other : 1.2345
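
And since the field list drives everything, the "whatever else you want to add" part really is just another mixin. Here is a minimal sketch (not from the original code; SizeMixin and AddFieldSize are illustrative names) of a second mixin that folds over the same field list to compute the packed size of a message:

#include <cstddef>
#include <boost/mpl/fold.hpp>
#include <boost/mpl/int.hpp>
#include <boost/mpl/placeholders.hpp>

// Metafunction: add sizeof(Field::Type) to the running total.
template <typename Sum, typename F>
struct AddFieldSize
    : boost::mpl::int_<Sum::value + static_cast<int>(sizeof(typename F::Type))>
{};

template <typename Base>
struct SizeMixin
{
    // The fold is done inside the function body so that Base::Type can be
    // defined after the mixin -- the same CRTP trick PrintMixin relies on.
    static std::size_t size()
    {
        typedef typename boost::mpl::fold<
            typename Base::Type,
            boost::mpl::int_<0>,
            AddFieldSize<boost::mpl::_1, boost::mpl::_2>
        >::type Total;
        return Total::value;
    }
};

Hooking it up is just one more base class, e.g. struct SomeMessage : PrintMixin<SomeMessage>, SizeMixin<SomeMessage> { ... };, after which SomeMessage::size() yields sizeof(int) + sizeof(char) = 5.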

Monday, July 14, 2014

Runtime value to Compile time value/action conversion via variadic selection tree

With c++11, some of the older macro- or table-based generator techniques can be tossed out. One of these is the runtime to compile-time translation of a value.

For example, let's say you have a table of idents and related actions. Two possible solutions are:
* Populate an array with function pointers.
* Generate a switch statement with macros.

But c++11 has opened a new and seemingly more flexible possibility: the ability to use a variadic template to generate a reduction tree. This seems to have some very interesting possibilities:
* The selection tree can have its action templated (the dependent call just needs the "template" keyword, as the code below shows).
* It's not locked into a single function signature; a "Params..." argument pack gives it the flexibility to adapt on both entry and exit of the selection tree.
* Good use of inlining lets the tree flatten out, so the compiler gets to hit it with the same optimization routines as the macro-generated switch version.


#include <iostream>
#include <cstring>

// -----------------------------------------------
//                         IDs
// -----------------------------------------------

enum {
    UNKNOWN = 0,
    MIN_TAG = UNKNOWN,
    INT,
    FLOAT,
    MAX_TAG
};

// -----------------------------------------------
//            ID -> Type translation trait
// -----------------------------------------------

template <int Tag> struct Trait {};

template <> struct Trait<UNKNOWN>
{
    typedef void Type;
    static const char* id() { return "unknown";   }
};

template <> struct Trait<INT>
{
    typedef int Type;
    static const char* id() { return "int";   }
};

template <> struct Trait<FLOAT>
{
    typedef float Type;
    static const char* id() { return "float"; }
};

// -----------------------------------------------
//                    Execution point
// -----------------------------------------------

template <int I>
void function(char* data)
{
    typename Trait<I>::Type value =
        (*static_cast<typename Trait<I>::Type*>(
            static_cast<void*>(data)));

    std::cout
        << " type:"  << Trait<I>::id()
        << " value:" << value
        << "\n";
}

template <>
void function<UNKNOWN>(char* data)
{
    std::cout
        << " unknown type number given!"
        << "\n";
}

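// MAX_TAG is a sentinel: the selection tree funnels every out-of-range
// high value into it, so forward it to the UNKNOWN handler (values below
// MIN_TAG funnel into the UNKNOWN leaf directly).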
template <>
void function<MAX_TAG>(char* data) { function<UNKNOWN>(data); }

struct DoAction
{
    template <int I, typename... Params>
    static void call(Params... params)
    {
        function<I>(params...);
    }
};

// -----------------------------------------------
//  SelectTree: runtime -> compile time selector
// -----------------------------------------------

template <typename Action, int min, int max>
struct SelectTree
{
    enum { mid = (max+min)/2 };

    template <typename... Params>
    static void call(int i, Params... params)
    {
        if (i <= mid) SelectTree<Action,min  ,mid>::call(i, params...);
        else          SelectTree<Action,mid+1,max>::call(i, params...);
    }
};

template <typename Action, int value>
struct SelectTree<Action, value, value>
{
    template <typename... Params>
    static void call(int i, Params... params)
    {
        // "call" is a dependent name here, so the "template" keyword is
        // required to disambiguate it -- without it the line fails to parse.
        Action::template call<value>(params...);
    }
};

// -----------------------------------------------
//                        MAIN
// -----------------------------------------------

int main()
{
    float x = 1.0;

    char data[4];
    std::memcpy(data, &x, sizeof(x));

    int sel = 3;
    std::cin >> sel;

    SelectTree<DoAction,MIN_TAG,MAX_TAG>::call(sel, data);
}


The output looks like:
$ a.exe
1
 type:int value:1065353216
$ a.exe
2
 type:float value:1
$ a.exe
-2
 unknown type number given!
$ a.exe
0
 unknown type number given!
$ a.exe
123213
 unknown type number given!
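
With the "template" keyword fix in place, swapping in a different action is trivial. Here is a minimal sketch (illustrative names, not from the original code) of a second action reusing the same tree, with a different call signature thanks to the "Params..." pack:

// Tag -> size lookup; the sentinels are pinned to 0 because
// Trait<UNKNOWN>::Type is void and Trait<MAX_TAG> is unspecialized.
template <int I> struct SizeOf
{
    enum { value = sizeof(typename Trait<I>::Type) };
};
template <> struct SizeOf<UNKNOWN> { enum { value = 0 }; };
template <> struct SizeOf<MAX_TAG> { enum { value = 0 }; };

struct SizeOfAction
{
    template <int I, typename... Params>
    static void call(Params...)
    {
        std::cout << " sizeof tag " << I
                  << " = " << SizeOf<I>::value << "\n";
    }
};

// Usage -- note there is no data pointer this time; the argument pack
// simply collapses to nothing:
//   SelectTree<SizeOfAction, MIN_TAG, MAX_TAG>::call(sel);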

Saturday, July 5, 2014

Async usage of futures and promises in ASIO

One of the new c++11 additions with great potential is futures and promises. However, I'm a heavy asio-based async programmer. Most existing examples on the web deal directly with std::thread and hand over the std::promise via std::move. But the asio design basically means you need to use the binding approach to coding. Unfortunately, std::bind is great until you need to use an uncopyable, move-only class like std::promise.

Now you can do something crazy like rewrite the bind so that it can handle movable items, as I did in my post movable-stdbind-with-placeholders. Which is all well and good in theory, BUT there is something to be said for keeping to the standard and well-used libs (like boost). Why? Well, compiler developers and lib devs like the boost guys are working very hard to improve their stuff. So if you keep to the standard tools and libs you get all the bonuses from their hard work for free when you upgrade. Compiler designers research, profile and add better optimizations, and well-used libs get more and more eyes on them, which in turn find and contribute better code.

And besides all that, there is a really trivial and obvious solution to the problem: std::shared_ptr...

So here is how you use std::promise and std::future in lambdas, binds, asio and anything else you can't move into. I also tossed in an example of how to use non-blocking checks on the futures, which is another key tool for using futures in asio code.

#include <iostream>
#include <chrono>
#include <thread>
#include <future>
#include <boost/asio.hpp>

void asyncRun()
{
    std::cout << "Async..." << std::flush;

    boost::asio::io_service io_service;

    // The shared_ptr is copyable, so the lambda can capture it by value
    // even though std::promise itself is move-only.
    std::shared_ptr< std::promise<int> > promise(new std::promise<int>());
    std::future<int> future = promise->get_future();

    io_service.post(
                    [promise]()
                    {
                        std::chrono::milliseconds dura( 2000 );
                        std::this_thread::sleep_for( dura );
                        promise->set_value(9);
                    }
                    );

    std::thread t1( [&io_service]{ io_service.run(); });

    std::cout << "Waiting..." << std::flush;
    future.wait();
    std::cout << "Done!\nResults are: "
              << future.get() << '\n';

    // join rather than detach: the io_service on the stack must outlive run()
    t1.join();
}


void nonBlockingRun()
{
    std::cout << "Non Blocking..." << std::flush;

    std::promise<int> promise;
    std::future<int> future = promise.get_future();
    std::thread t1( [](std::promise<int> p)
                    {
                        std::chrono::milliseconds dura( 2000 );
                        std::this_thread::sleep_for( dura );
                        p.set_value(9);
                    },
                    std::move(promise) );
    t1.detach();

    std::cout << "Waiting...\n" << std::flush;
    std::future_status status;
    do {
        status = future.wait_for(std::chrono::seconds(0));

        if (status == std::future_status::deferred) {
            std::cout << "+";
        } else if (status == std::future_status::timeout) {
            std::cout << ".";
        }
    } while (status != std::future_status::ready);
    std::cout << "Done!\nResults are: "
              << future.get() << '\n';
}

void blockingRun()
{
    std::cout << "Blocking..." << std::flush;

    std::promise<int> promise;
    std::future<int> future = promise.get_future();
    std::thread t1( [](std::promise<int> p)
                    {
                        std::chrono::milliseconds dura( 2000 );
                        std::this_thread::sleep_for( dura );
                        p.set_value(9);
                    },
                    std::move(promise) );
    t1.detach();

    std::cout << "Waiting..." << std::flush;
    future.wait();
    std::cout << "Done!\nResults are: "
              << future.get() << '\n';
}

int main()
{
    nonBlockingRun();
    blockingRun();
    asyncRun();
}
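
For completeness, the same shared_ptr trick drops straight into std::bind as well. Here is a minimal sketch (a standalone toy program, not part of the code above) where the copyable shared_ptr smuggles the move-only promise through the bind:

#include <functional>
#include <future>
#include <iostream>
#include <memory>

void produce(std::shared_ptr< std::promise<int> > promise)
{
    promise->set_value(9);
}

int main()
{
    auto promise = std::make_shared< std::promise<int> >();
    std::future<int> future = promise->get_future();

    // std::bind copies the shared_ptr; the promise itself is never copied.
    std::function<void()> task = std::bind(&produce, promise);
    task();

    std::cout << "Results are: " << future.get() << '\n';
}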

Wednesday, April 23, 2014

Async vs Sync

For most web server designs it is recognized that asynchronous multithreaded implementations are the fastest you can choose. Here's why:

Let's assume we have 3 processing blocks: A, B and C, where each block consumes the former's output. Block A takes on average 20ms, and blocks B and C average 10ms each (as shown in the diagram below titled "The Pipeline").

For "Single threaded"
In a single-threaded system, event 1 must be processed by block A, then B, then C. This means the system takes 40ms to output one event and can handle one event every 40ms.

For "Multi threaded"
In the multithreaded system, let's give A, B and C each a thread, and assume threading and message-passing overheads are zero. In this system the first event, event 1, is processed by block A in 20ms. After that, block A is free to take the next event, event 2. While block A starts processing event 2, block B gets event 1 and handles it in 10ms; it then passes it to block C, which also handles it in 10ms.

So as you can see, the system still takes 40ms to handle event 1, HOWEVER it can work on 2 events at the same time. So the latency for each event is still 40ms, BUT the throughput is now 2 events every 40ms, which is one event every 20ms.

For "Async Multi threaded"
In the async multithreaded world, we dispatch a "worker", which is a portable thread, to the next waiting job in the async event queue. First, event 1 arrives and job 1 (event 1 in unit A) is created. The dispatcher sends job 1 to worker 1. Job 1 completes after 20ms and creates job 2 (event 1 and unit B), and the next event arrives, creating job 3 (event 2 and unit A). Worker 2 is dispatched job 2 (event 1 and unit B) and worker 1 is dispatched job 3 (event 2 and unit A). 10ms later worker 2 finishes and job 4 (event 1 and unit C) is created. Worker 2 is free, so it is dispatched job 4. 10ms later both workers complete and job 5 (event 2 and unit B) is queued.

As you can see, this system also has a latency of 40ms and a throughput of one event every 20ms, BUT the async implementation only required 2 thread resources versus the multithreaded version's 3 threads.

The WHY
You will note that in the multithreaded example there was a period of "dead time" while threads were waiting for data: a thread is not portable and must stay with its assigned processing unit. Now you might be inclined to say, BUT why didn't you bundle units B and C into one so that the system needs only 2 threads? Well, yes, that is true... BUT only if your units work at the PERFECT operating speed.

As human designers we can only guess at the real operating speeds of processing units, so sooner or later the division of threads will result in wasted dead time and spin cycles for the thread. Also, the speed of operation is never a single perfect number; it is likely to follow a bell curve or some other distribution, and as a result more dead time gets introduced and the threads have to twiddle their fingers waiting for the next event. Async handling removes this human guesswork and the statistical bumps in runtimes, and stops this dead time by assigning the next waiting job directly to the next waiting thread.
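
To make that concrete, here is a minimal sketch (not from the original post; unitA/unitB/unitC mirror the pipeline example) of the async model in asio: each unit posts the next stage back onto a shared queue, and two workers take whichever job is waiting next:

#include <boost/asio.hpp>
#include <chrono>
#include <iostream>
#include <thread>

boost::asio::io_service io;

// Each unit sleeps to simulate its processing cost, then posts the
// next stage as a fresh job on the shared queue.
void unitC(int event)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    std::cout << "event " << event << " done\n";
}

void unitB(int event)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    io.post([event]{ unitC(event); });
}

void unitA(int event)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    io.post([event]{ unitB(event); });
}

int main()
{
    io.post([]{ unitA(1); });
    io.post([]{ unitA(2); });

    // Two "portable" workers: the dispatcher hands the next waiting job
    // to the next waiting thread, whichever stage that job belongs to.
    std::thread w1([]{ io.run(); });
    std::thread w2([]{ io.run(); });
    w1.join();
    w2.join();
}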

The scalability and overall performance of the designs from least to greatest is therefore:
  • single threaded
  • multi threaded
  • async multi threaded
[Diagram: "The Pipeline" -- events flowing through blocks A (20ms), B (10ms) and C (10ms) under the single threaded, multi threaded and async multi threaded models]

The reason why we use stderr and stdout

The usage of stdout and stderr is rather important in programs. However, this is often done wrong.

The objective is:
  • stdout is the main program output.
  • stderr is for things that you want the user to see and pay attention to.
This is because, in calls to sub-processes, what you are aiming for is that ONLY the good output is captured and handled by the main script; when there is an error, you want it to bypass the normal path. Capturing the error contents is a problem because:
  • The user doesn't see it... unless
  • your main program has special code to detect and deal with error conditions from the sub-program and then toss them upwards to the user, which is a total waste of your effort in most script systems, where a simple return code will get you miles down the road much quicker.
For example:
#first example toss an error from the remote
> A=`ssh ugly 'echo "error" 1>&2' ` 
error 
> echo $A 

#second example generate data from a remote and capture it
> A=`ssh ugly 'echo "data" 2>&1'` 
> echo $A 
data

#third example generate data from a remote and capture it. BUT also toss an error
> A=`ssh ugly 'echo "data" 2>&1; echo "error" 1>&2'` 
error 
> echo $A 
data
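
On the program side the convention looks like this; a minimal sketch (an illustrative toy program, not from the original post): payload on stdout, diagnostics on stderr, and a return code for the calling script to branch on.

#include <cstdlib>
#include <iostream>

int main()
{
    std::cout << "data\n";                        // captured by A=`prog`
    std::cerr << "error: something went wrong\n"; // bypasses the capture, user sees it
    return EXIT_FAILURE;                          // the cheap signal scripts should test
}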