Wednesday, April 23, 2014

Async vs Sync

For most web server designs, it is recognized that asynchronous multithreaded implementations are the fastest you can choose. Here's why:

Let's assume we have 3 processing blocks: A, B and C, where each block consumes the previous block's output. Block A takes on average 20ms, blocks B and C average 10ms (as shown in the diagram below titled "The Pipeline").

For "Single threaded"
In a single threaded system, event 1 must be processed by block A, then B, then C. This means the system takes 40ms to output one event and can handle one event every 40ms.
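As a rough sketch (Python, with the block times above simulated by sleeps; the block names are just placeholders), the single threaded version is one loop, so the gap between outputs equals the full 40ms latency:

import time

def block_A(event):   # ~20ms
    time.sleep(0.020)
    return f"A({event})"

def block_B(data):    # ~10ms
    time.sleep(0.010)
    return f"B({data})"

def block_C(data):    # ~10ms
    time.sleep(0.010)
    return f"C({data})"

# Single threaded: each event walks A -> B -> C before the next event starts,
# so both the latency and the gap between outputs are ~40ms.
start = time.monotonic()
for event in (1, 2, 3):
    print(block_C(block_B(block_A(event))),
          f"at {(time.monotonic() - start) * 1000:.0f}ms")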

For "Multi threaded"
In the multi threaded system, let's give A, B and C each their own thread, and assume threading and message passing overheads are zero. In this system the first event 1 is processed by block A in 20ms. After that, block A is free to take the next event 2. While block A starts processing event 2, block B gets event 1 and handles it in 10ms, then passes it to block C, which also handles it in 10ms.

So as you can see the system still takes 40ms to handle event 1, HOWEVER it can work on 2 events at the same time. So the latency for each event is still 40ms, BUT the throughput is now 2 events every 40ms, i.e. one event every 20ms.
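A minimal sketch of the same idea (Python, one thread pinned to each block, queues standing in for the message passing; names and times are the assumptions from above):

import queue, threading, time

def stage(name, delay, inbox, outbox):
    # One thread per processing block; the thread is tied to its block and can
    # only ever wait on that block's inbox.
    while True:
        event = inbox.get()
        if event is None:                 # shutdown marker, pass it along
            if outbox is not None:
                outbox.put(None)
            return
        time.sleep(delay)                 # simulate the block's processing time
        if outbox is not None:
            outbox.put(f"{name}({event})")
        else:
            print(f"{name}({event}) finished at "
                  f"{(time.monotonic() - t0) * 1000:.0f}ms")

a_in, b_in, c_in = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=("A", 0.020, a_in, b_in)),
    threading.Thread(target=stage, args=("B", 0.010, b_in, c_in)),
    threading.Thread(target=stage, args=("C", 0.010, c_in, None)),
]
t0 = time.monotonic()
for t in threads:
    t.start()
for event in (1, 2, 3):
    a_in.put(event)
a_in.put(None)
for t in threads:
    t.join()

Running this, the completions land at roughly 40ms, 60ms and 80ms: the output interval drops to ~20ms (block A is the bottleneck) even though each event still takes ~40ms end to end.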

For "Async Multi threaded"
In the async multithreaded world, we dispatch a "worker" (a portable thread) to the next waiting job in the async event queue. First, event 1 arrives and job1 is created (event 1 in unit A). The dispatcher sends job1 to worker1. Job1 completes after 20ms, creating job2 (event 1 in unit B), and at the same moment the next event arrives, creating job3 (event 2 in unit A). Worker2 is dispatched job2 and worker1 is dispatched job3. 10ms later worker2 finishes and job4 (event 1 in unit C) is created; worker2 is free, so it is dispatched job4. 10ms later both workers complete and job5 (event 2 in unit B) is queued.

As you can see, this system also has a latency of 40ms and a throughput of one event every 20ms. BUT the async implementation only required 2 thread resources versus the multithreaded design's 3 threads.
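A sketch of that dispatch model (Python; a shared queue holds (event, unit) jobs, completing one unit enqueues a job for the next unit, and 2 free-floating workers take whatever job is waiting; the timings are the assumed ones from above):

import queue, threading, time

STAGES = [("A", 0.020), ("B", 0.010), ("C", 0.010)]   # (unit, processing time)
jobs = queue.Queue()                                  # the async job queue
t0 = time.monotonic()

def worker(worker_id):
    # Workers are portable: they are not tied to any unit, they simply take
    # the next waiting job from the shared queue.
    while True:
        job = jobs.get()
        if job is None:                    # shutdown marker
            jobs.task_done()
            return
        event, unit = job
        name, delay = STAGES[unit]
        time.sleep(delay)                  # simulate the unit's processing time
        if unit + 1 < len(STAGES):
            jobs.put((event, unit + 1))    # completing a job creates the next one
        else:
            print(f"event {event} finished at "
                  f"{(time.monotonic() - t0) * 1000:.0f}ms on worker {worker_id}")
        jobs.task_done()

workers = [threading.Thread(target=worker, args=(i,)) for i in (1, 2)]
for w in workers:
    w.start()
jobs.put((1, 0))        # event 1 arrives and starts at unit A
time.sleep(0.020)
jobs.put((2, 0))        # event 2 arrives ~20ms later, as in the walk-through
jobs.join()             # wait until every (event, unit) job has been processed
for _ in workers:
    jobs.put(None)
for w in workers:
    w.join()

Only 2 workers are ever created, yet neither one ever sits idle waiting on a particular unit: whichever worker is free takes the next job in the queue.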

The WHY
You will note that in the multithreaded example there were periods of "dead time" while threads waited for data; a thread is not portable and must stay with its assigned processing unit. Now you might be inclined to say: BUT why didn't you bundle units B and C into one, so the system needs only 2 threads? Well yes, that is true... BUT only if your units work at the PERFECT operating speed.

As human designers we can only guess at the real operating speeds of the processing units, so sooner or later the division of threads will result in wasted dead time and spin cycles. Also, the speed of operation is never a single perfect number; it is likely to follow a bell curve or some other distribution, and as a result more dead time gets introduced and the threads have to twiddle their thumbs waiting for the next event. Async handling removes this human guesswork and the statistical bumps in runtimes, and eliminates the dead time by assigning the next waiting job directly to the next waiting thread.

The scalability and overall performance of the designs, from least to greatest, is therefore:
  • single threaded
  • multi threaded
  • async multi threaded
[Diagram: "The Pipeline" - processing blocks A (20ms), B (10ms) and C (10ms), as described above]

The reason why we use stderr and stdout

The usage of stdout and stderr is rather important in programs. However, this is often done wrong.

The objective is:
  • stdout is the main program output.
  • stderr is for things that you want the user to see and pay attention to.
This is because in calls to sub-processes, what you are aiming to do is get ONLY the good output captured and handled by the main script; when there is an error you want it to bypass the normal path. Capturing the error contents is a problem because:
  • The user doesn't see it... unless
  • your main program has special code to detect and deal with error conditions from the sub-program and then toss them upwards to the user, which is a total waste of your effort in most script systems where a simple return code will get you miles down the road much quicker.
For example:
#first example toss an error from the remote
> A=`ssh ugly 'echo "error" 1>&2' ` 
error 
> echo $A 

#second example generate data from a remote and capture it
> A=`ssh ugly 'echo "data" 2>&1'` 
> echo $A 
data

#third example generate data from a remote and capture it. BUT also toss an error
> A=`ssh ugly 'echo "data" 2>&1; echo "error" 1>&2'` 
error 
> echo $A 
data
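Seen from the calling side, the same separation can be expressed as a minimal Python sketch (using the same placeholder host "ugly" as above): only stdout is captured, stderr falls straight through to the user, and the return code is what the script actually branches on.

import subprocess

# Capture ONLY stdout; stderr is not redirected, so anything the remote writes
# to it goes straight to the user's terminal instead of polluting the data.
result = subprocess.run(
    ["ssh", "ugly", 'echo "data"; echo "error" 1>&2'],
    stdout=subprocess.PIPE,
    text=True,
)

data = result.stdout.strip()          # only the good output: "data"

# Branch on the return code rather than trying to parse the error text.
if result.returncode != 0:
    raise SystemExit(f"remote command failed with status {result.returncode}")

print(data)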