Synchronous and Asynchronous Jacobi Iterative Solvers in Serial, Threaded, MPI, and OpenMP Implementations
From csi702
Synchronous and Asynchronous Jacobi Iterative Solvers in Serial, Threaded, MPI, and OpenMP Implementations
by Bob Sorensen
A system of 512 linear equations was solved using the Jacobi iteration method in various parallel implementations. Specifically, performance speed–up for the 6 parallel implementations tested were as follows, with the non-optimized serial version normalized to one.
Method Speed-Up Config Comment Sync/Serial 1 -O0 No Optimization Sync/Serial 1.22 -O3 Best Intel C Optimization Sync/pthreads 1.54 8 threads 18 loops to converge Sync/MPI 0.27 2 cpus Degrades rapidly as cpu count increases Sync/openmp 0.54 2 threads Constant for 2-32 thread, 18 loops to converge Async/pthreads 1.45 2 threads Errors for thread counts > 2 Async/MPI 0.0003 4 cpus 2728 times longer than baseline (7m. 22s.) Async/openmp 0.40 2 threads 8 loops to converge
- In the best case, a synchronous pthreads version realized a 1.54 speed up over the baseline serial case.
- The slowest implementation, the asynchronous MPI version, ran gloriously slow—over 2700 times slower than the baseline test case—clearly indicting that MPI and the shared memory scheme needed for this case do not mix well.
- All asynchronous versions converged with about 1/3 the iterations than their synchronous counterparts, but any gains in computational performance were ultimately masked by additional communication overhead.
- Asynchronous pthreads present an interesting case. For one and two threads, the process iterated to the right answer. But for increased thread counts, the process iterated to a stable, albeit wrong solution. Initial analysis suggests that the problem emerges where threads did not give up control within relatively small enough increments in time to ensure 'random' asynchronous behavior across all threads.
