Synchronous and Asynchronous Jacobi Iterative Solvers in Serial, Threaded, MPI, and OpenMP Implementations

From csi702

Jump to: navigation, search

Synchronous and Asynchronous Jacobi Iterative Solvers in Serial, Threaded, MPI, and OpenMP Implementations

by Bob Sorensen


A system of 512 linear equations was solved using the Jacobi iteration method in various parallel implementations. Specifically, performance speed–up for the 6 parallel implementations tested were as follows, with the non-optimized serial version normalized to one.

Method       Speed-Up    Config      Comment
Sync/Serial        1     -O0         No Optimization
Sync/Serial     1.22     -O3         Best Intel C Optimization
Sync/pthreads   1.54     8 threads   18 loops to converge
Sync/MPI        0.27     2 cpus      Degrades rapidly as cpu count increases
Sync/openmp     0.54     2 threads   Constant for 2-32 thread, 18 loops to converge
Async/pthreads  1.45     2 threads   Errors for thread counts > 2
Async/MPI     0.0003     4 cpus      2728 times longer than baseline (7m. 22s.)
Async/openmp    0.40     2 threads   8 loops to converge
  • In the best case, a synchronous pthreads version realized a 1.54 speed up over the baseline serial case.
  • The slowest implementation, the asynchronous MPI version, ran gloriously slow—over 2700 times slower than the baseline test case—clearly indicting that MPI and the shared memory scheme needed for this case do not mix well.
  • All asynchronous versions converged with about 1/3 the iterations than their synchronous counterparts, but any gains in computational performance were ultimately masked by additional communication overhead.
  • Asynchronous pthreads present an interesting case. For one and two threads, the process iterated to the right answer. But for increased thread counts, the process iterated to a stable, albeit wrong solution. Initial analysis suggests that the problem emerges where threads did not give up control within relatively small enough increments in time to ensure 'random' asynchronous behavior across all threads.


Personal tools