(This post has been moved from my other blog on the grounds that it is quite general, and not all that technical).
This is sort of an intro to how easy it is to write multi-threaded programs these days. Most people don't bother parallelizing their applications because the prospect of managing threads and synchronization is too daunting. This is where OpenMP steps in. It is a simple system to use, based on pre-processor directives, and it works portably on both Windows and Linux. Best of all, if you do not invoke the OpenMP option while compiling the code, the parallelization directives in your code are ignored, and it works as a single-threaded app. Below, you will find a quite pointless program, but it serves well to test my dual-core, dual-socket setup :)
It computes the sum of [sin(i/128) + cos(i/128)], where i ranges from 0 to 98765432... don't ask me why :-/
#include <cmath>
#include <cstdio>
using namespace std;
#define MAXNUM 98765432

int main() {
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < MAXNUM; i++)
        sum += sin(i / 128.0) + cos(i / 128.0); // divide as doubles, not ints
    printf("sum = %f\n", sum);
    return 0;
}
The #pragma directive automatically parallelizes the for loop. And since sum is updated on every iteration of the loop, synchronization is ensured by the reduction(+:sum) clause. This tells the compiler that the for loop is performing a reduction operation (like a sum of n numbers), and that the operator used is addition (+).
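To see what the reduction clause is saving us from, here is a hedged sketch of the manual equivalent: each thread keeps a private partial sum and merges it into the shared total inside a critical section. (The function name manual_reduction is my own, for illustration; it is not from the program above.)

```cpp
#include <cmath>
#include <omp.h>

// Manual equivalent of reduction(+:sum): each thread accumulates a
// private partial sum, then adds it to the shared total inside a
// critical section so that the updates do not race.
double manual_reduction(int n) {
    double sum = 0.0;
    #pragma omp parallel
    {
        double partial = 0.0;                 // private to each thread
        #pragma omp for
        for (int i = 0; i < n; i++)
            partial += sin(i / 128.0) + cos(i / 128.0);
        #pragma omp critical
        sum += partial;                       // one thread at a time
    }
    return sum;
}
```

The reduction clause generates essentially this pattern for you, which is why the one-line version above stays so clean.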
Pretty neat! And very very simple to use!
To compile this, all I needed to do was:
g++ -fopenmp main.cpp
In case I wanted a version without any of the parallel threading, a single-threaded build is easily generated by leaving out the -fopenmp compiler flag:
g++ main.cpp
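If you ever need to know at compile time which build you got, the compiler defines the _OPENMP macro only when OpenMP is enabled, so the same source can adapt to either mode. A small sketch (thread_count is my own illustrative name):

```cpp
#include <cstdio>
#ifdef _OPENMP
#include <omp.h>
#endif

// Reports how many threads the program may use. The _OPENMP macro is
// defined by the compiler only when -fopenmp is passed, so this builds
// cleanly both with and without the flag.
int thread_count() {
#ifdef _OPENMP
    return omp_get_max_threads();   // OpenMP build: query the runtime
#else
    return 1;                       // serial build: just one thread
#endif
}
```

Calling printf("up to %d thread(s)\n", thread_count()); then prints 1 in the serial build and the core count (here, 4) in the OpenMP build.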
Amazing simplicity! And the same trick works for all the more complicated constructs that OpenMP provides.
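As one example of those other constructs, the sections directive runs independent blocks of work on different threads. A minimal sketch (run_tasks and the two toy loops are my own illustration, not from the program above):

```cpp
#include <omp.h>

// Two independent tasks run concurrently via the sections construct;
// each section is executed exactly once, by whichever thread picks it up.
void run_tasks(long &a, long &b) {
    #pragma omp parallel sections
    {
        #pragma omp section
        {
            a = 0;                                      // task 1
            for (long i = 0; i < 1000000; i++) a += i;
        }
        #pragma omp section
        {
            b = 0;                                      // task 2
            for (long i = 0; i < 1000000; i++) b += 2 * i;
        }
    }
}
```

Just like the parallel for, this compiles to a plain sequential version when -fopenmp is omitted.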
So, how well did we fare in this parallelization endeavour? Let's find out:
Single-threaded run (1 core): 16 seconds
Multi-threaded run (4 cores): 4.7 seconds
That's a speedup of about 3.4x, almost linear scaling from one core to four. Now I can't wait to see what more OpenMP can do for me :)
Learn more about OpenMP at openmp.org or at this comprehensive tutorial.