Let's assume we have three processing blocks: A, B, and C, where each block consumes the previous block's output. Block A takes 20 ms on average; blocks B and C each average 10 ms (as shown in the diagram below, titled "The Pipeline").
For "Single threaded"
In a single-threaded system, event 1 must be processed by block A, then B, then C. This means the system takes 40 ms to output one event and can handle one event every 40 ms.
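As a sketch of that flow (the stage function names are inventions for this example, and `time.sleep` stands in for the real block work):

```python
import time

# Hypothetical stage functions; the sleeps stand in for the average
# processing times from the example (20 ms, 10 ms, 10 ms).
def block_a(event):
    time.sleep(0.020)
    return event

def block_b(event):
    time.sleep(0.010)
    return event

def block_c(event):
    time.sleep(0.010)
    return event

def process(event):
    # One event traverses A -> B -> C back to back: ~40 ms total.
    return block_c(block_b(block_a(event)))

start = time.perf_counter()
for event in range(2):
    process(event)
elapsed = time.perf_counter() - start
print(f"2 events in {elapsed * 1000:.0f} ms")  # roughly 80 ms: one event per 40 ms
```

Because nothing overlaps, both latency and inter-event spacing are the full 40 ms.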
For "Multi threaded"
In the multi-threaded system, let's give A, B, and C a thread each, and assume the threading and message-passing overheads are zero. In this system, the first event, event 1, is processed by block A in 20 ms. After that, block A is free to take the next event, event 2. While block A starts processing event 2, block B gets event 1 and handles it in 10 ms, then passes it to block C, which also handles it in 10 ms.
So, as you can see, the system still takes 40 ms to handle event 1; HOWEVER, it can work on two events at the same time. The latency for each event is still 40 ms, BUT the throughput is now two events every 40 ms, i.e. one event every 20 ms.
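A minimal sketch of this design, assuming one Python thread per block connected by unbounded `queue.Queue` hand-offs (the stage timings come from the example; `None` as a shutdown sentinel is a choice made for the sketch):

```python
import queue
import threading
import time

def stage(work_ms, inbox, outbox):
    # A dedicated thread per block: pull an event, "process" it,
    # then forward it downstream. None is a shutdown sentinel.
    while True:
        event = inbox.get()
        if event is None:
            outbox.put(None)
            return
        time.sleep(work_ms / 1000)
        outbox.put(event)

qa, qb, qc, done = (queue.Queue() for _ in range(4))
threads = [
    threading.Thread(target=stage, args=(20, qa, qb)),    # block A
    threading.Thread(target=stage, args=(10, qb, qc)),    # block B
    threading.Thread(target=stage, args=(10, qc, done)),  # block C
]
for t in threads:
    t.start()

start = time.perf_counter()
for event in range(4):
    qa.put(event)
qa.put(None)

results = []
while (event := done.get()) is not None:
    results.append(event)
elapsed = time.perf_counter() - start
for t in threads:
    t.join()

# Latency per event is still ~40 ms, but in steady state one event
# emerges every ~20 ms (the slowest stage), so 4 events take ~100 ms.
print(f"4 events in {elapsed * 1000:.0f} ms")
```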
For "Async Multi threaded"
In the async multi-threaded world, we dispatch a "worker", which is a portable thread, to the next waiting job in the async event queue. So first of all, event 1 arrives and job 1 is created from event 1 and unit A. The dispatcher sends job 1 (event 1 in unit A) to worker 1. Job 1 completes after 20 ms and creates job 2 (event 1 and unit B), and the next event arrives, creating job 3 (event 2 and unit A). Worker 2 is dispatched job 2 (event 1 and unit B) and worker 1 is dispatched job 3 (event 2 and unit A). 10 ms later, worker 2 finishes and job 4 (event 1 and unit C) is created; worker 2 is free, so it is dispatched job 4 (event 1 and unit C). 10 ms later, both workers complete and job 5 (event 2 and unit B) is queued.
As you can see, this system also has a latency of 40 ms and a throughput of one event every 20 ms. BUT the async implementation required only 2 thread resources versus the multi-threaded design's 3 threads.
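This dispatch model can be sketched as a shared job queue feeding two portable workers (the `STAGES` table and the `(event, unit)` job tuples are inventions for the sketch, not part of the original design):

```python
import queue
import threading
import time

# (work in ms, next unit) for each processing unit, per the example.
STAGES = {"A": (20, "B"), "B": (10, "C"), "C": (10, None)}

jobs = queue.Queue()            # the async event/job queue
results = []
results_lock = threading.Lock()

def worker():
    # A portable worker: it takes whatever (event, unit) job is
    # waiting next, regardless of which unit the job belongs to.
    while True:
        job = jobs.get()
        if job is None:         # shutdown sentinel
            return
        event, unit = job
        work_ms, next_unit = STAGES[unit]
        time.sleep(work_ms / 1000)
        if next_unit is not None:
            jobs.put((event, next_unit))  # create the follow-on job
        else:
            with results_lock:
                results.append(event)

workers = [threading.Thread(target=worker) for _ in range(2)]
for w in workers:
    w.start()

start = time.perf_counter()
jobs.put((1, "A"))              # event 1 arrives
time.sleep(0.020)
jobs.put((2, "A"))              # event 2 arrives 20 ms later
while len(results) < 2:
    time.sleep(0.001)
elapsed = time.perf_counter() - start

for _ in workers:
    jobs.put(None)
for w in workers:
    w.join()

# Each event finishes ~40 ms after it arrived, using only 2 threads.
print(f"2 events in {elapsed * 1000:.0f} ms with 2 workers")
```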
You will note that in the multi-threaded example there were periods of "dead time" while threads waited for data: a thread is not portable and must stay with its assigned processing unit. Now you might be inclined to say, "BUT why didn't you bundle units B and C into one, so the system needs only 2 threads?" Well, yes, that is true... BUT only if your units work at the PERFECT operating speed.
As human designers, we can only guess at the real operating speeds of processing units, so sooner or later the division of threads will result in wasted dead time and spin cycles for the threads. Also, the speed of an operation is never a single perfect number; it is more likely a bell curve or some other distribution, and as a result more dead time gets introduced while the threads twiddle their fingers waiting for the next event. Async handling removes this human guesswork and the statistical bumps in runtimes, and stops this dead time by assigning the next waiting job directly to the next waiting thread.
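The dead-time claim can be checked with back-of-the-envelope arithmetic using the 20/10/10 ms figures from the example (assuming, as above, that the static pipeline emits one event per slowest stage):

```python
# Per-event work (ms) in each block, from the example above.
work = {"A": 20, "B": 10, "C": 10}
period = max(work.values())  # static pipeline emits one event per slowest stage

# Utilization of each dedicated thread in the static design:
for name, ms in work.items():
    print(f"thread {name}: {ms / period:.0%} busy, {1 - ms / period:.0%} dead")

# Total work is 40 ms per event, so at one event per 20 ms a pool of
# just 2 fully-busy portable workers sustains the same throughput.
pool_threads = sum(work.values()) / period
print(f"equivalent pool size: {pool_threads:.0f} workers")
```

The dedicated B and C threads each sit idle half the time; any mis-guess or jitter in the real stage speeds only widens those idle gaps, while a pool absorbs them.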
The scalability and overall performance of the designs, from least to greatest, is therefore:
- single threaded
- multi threaded
- async multi threaded