Basic Parallel Programming - A Few Paradigms
From csi702
Contents |
1 Parallel Computing Overview
Parallel computing is the process of performing multiple calculations concurrently. Many parallel computing algorithms operate via the divide and conquer paradigm. By dividing a large problem among many nodes (processes), problems that would be infeasible on a serial program become tractable. However, parallel programs are almost universally more difficult to program than a serial ones. This arises from the fact that now, instead of optimizing a single node for speed, one needs to concern themselves with the efficiency of multiple nodes.[1]
Load balancing and communication are the two largest problems that effect a parallel programs efficiency. The basic idea is to minimize communication between nodes while maximizing the time each node is being used. To address these problems the programmer needs to concern themselves with both data and task partitioning. Data partitioning involves assigning different parts of the data set to different nodes. Task partitioning involves assigning each node a different task. The specifics of how data and task partitioning are implemented is largely problem dependent. Fortunately, problems can be grouped into a taxonomy depending on their geometry.
2 Geometric Taxonomy
Problems in parallel computing can be described in terms of the geometry of the problem. Three main areas need to be considered to understand the complexity of the problem when it is to be implemented in parallel
2.1 Structured Versus Unstructured Grids
Problems are often grouped based on how the solution space is discretized. In general there are two basic types of grids; structured and unstructured.
Structured grids are setup to simplify the identification of neighboring nodes. In a structured grid all nodes are equally spaced, as a result, the grid size and indexing scheme are simple calculations. For example all grids in the image below are H / m distance apart in the y direction and L / n in the x direction. Furthermore, node(i,j) in the has adjacent nodes (i,j+1), (i,j-1), (i+1,j) and (i-1,j).
Unstructured grids do not have a simple calculation in order to determine neighboring nodes. Instead each node needs to have a list of neighboring nodes. The number of nodes in this list can vary from node to node. Another complication of unstructured grids is that the numbering of the node is no longer sufficient to determine is location. As a result, each node needs to store its location within the grid. Below is a sample unstructured grid. Its important to note that the location of the node in memory is dependent on its number. Thus, the choice of numbering is extremely important to limit the number cache misses that occur when adjacent nodes need to be referenced.
2.2 Dynamic Versus Static Grids
In some simulations the grid is static throughout. In this case the grids would be identical to those seen above.
In simulations it is sometimes necessary to alter structures within the solution space a the simulation progresses. As a result at each time step the position of the nodes in the system must be reevaluated. Below is an example of a dynamic grid system.
2.3 Implicit Versus Explicit Solvers
To illustrate the difference between explicit and implicit methods an example case will be used. The one dimensional heat equation is defined by the analytic function below.
This function can rewritten discretely using the explicit equation
Simplifying the notation yields
solving for
Notice that given initial conditions, the above equation can be solved. If
is taken at descreet points 1 to n, then the equation can be written in matrix form.
The above system can be computed using a simple matrix multiplication.
The implicit version of the same equation is
Simplifying the notation yields
Notice that in this case the right side of the equation contains
as well as
and
. Because the solution to
depends on variable we are trying to solve for, it is more illustrative to solve the equation for
. Doing so yields
Notice in this case the equation is still solvable given initial conditions. If uj + 1 is taken at descreet points 1 to n, then the equation can be written in matrix form.
Notice that in this case the matrix cannot be computed by a simple matrix multiplication. Instead, a linear system needs to be solved in order to find the resulting values
3 Problem Taxonomy
3.1 Structured - Explicit - Static
A structured, regular grid that doesn’t change between time-steps with simple neighbor to neighbor communications. Examples include:
- A diffusion problem solved on a regular rectangular grid with static connections
- Euler equation solution to fluids on regular grid
- Many finite difference codes, including point Jacobi method
3.2 Structured - Explicit - Dynamic
A structured, regular grid that changes between time-steps with simple neighbor to neighbor communications. Examples include:
- Finite difference Lagrangian codes
3.3 Structured - Implicit - Static
A structured grid that does not change between time-steps with all-to-all communications needed. Examples include:
- Radiative transfer problems with photon scattering
- Gravity calculations on a uniform grid
- Elliptical PDE’s and Poisson problems
3.4 Structured- Implicit - Dynamic
A structured grid that does not change between time-steps with all-to-all communications needed. Examples include:
- Radiative transfer on moving grids
3.5 Unstructured - Explicit - Static
A unstructured grid that doesn’t change between time-steps with simple neighbor to neighbor communications. Examples include:
- A diffusion problem solved using finite elements
- Euler equation solution to fluids solved with finite elements
3.6 Unstructured - Explicit - Dynamic
A unstructured grid that changes between time-steps with simple neighbor to neighbor communications. Examples include:
- SPH codes
3.7 Unstructured - Implicit - Static
A unstructured grid that doesn’t change between time-steps with simple neighbor to neighbor communications. Examples include:
- Finite element radiation and elliptical solvers
3.8 Unstructured- Implicit - Dynamic
An unstructured grid that changes between time-steps with all-to-all communications needed. Examples include:
- Gravitational particle codes
- Adaptive Mesh Refinement using Poisson Equation
4 Finding Concurrency in Algorithms
4.1 Organize by Task
- Task Parallelism
- multiple phyical effects, ray tracing
- each ray or event is processed separately
- Divide and Conquer
- sorts, trees, fast Fourier transforms
- create recursive algorithms to reduce workload
- Example--Photon Tracking
- Each photon in ghandled independently of evey other photon
- Each incoming photon can be broekn into a different task
4.2 Organize by Data Decomposition
- Geometric Decomposition
- Break grids up into 'equal" sets
- Scatter decompostion
- Each geometric region is processed separately
- Recursive Data
- Understanding and mapping relationships within complex data sructures
- Performing parallel operations on recursive data structures.
- An Example
- Irregular Decomposition
- The amount of computation is rougly equal to the area of the enclosed region
- The amount of communication to other processors is related to the surface area
- Highly irregular spaces have a large surface areas compared with its associated volume
- Unequal areas demand different amounts of computing and will likely lead to unbalanced loads
- An example of scattered decomposition on a processor by processor basis
4.3 Organize by Flow of Data
- Pipelining
- Signal Prrcessing
- Breal processing into different assembly stages
- The Henry Ford factory model
- Event-based Coordination
- Tasks are mostly indepement of each other, working in a coordinated fashion
- manging tasks becomes critical
5 Program Supporting Structures
- Different programming paradigms have different types of structures to support concurrency. Mattson et al. break these into four catagories
- SPMD--single program, multiple data
- Loop paralelism
- Master/Worker
- Fork/Join
- These support structure can be found to some degree in various languages
- Supporting Structures vs Algorithmic Structures
5.1 Simple Project Example
- The role of the manager is to keep track of parameters that have been completed and compiles results.
- The manager starts by forking a single worker.
- The manager initiates a server with sockets.
- When queried, the manager sends an initial value to be computed.
- The server waits for the response from the worker.
- Upon message delivery from worker, manager will store results on the local file system.
- If the is more data, the manager goes back to query-response state.
- The role of the worker is to execute a set of runs.
- The worker begins as a process forked from the main program.
- It sends a query to the main program for the first set of data.
- If the data is not an "end signal", then process it.
- When done, send the results to the server.
- Repeat until an "end signal" is received.
6 Underlying Technologies
- Threads
- Processes and Forks
- Check out
- http:/www.erlenstar.demon.co.uk/unix/faq 2.html
- http://www.osix.net/modules/article/?id=641
- http://www.yolinux.com/TUTORIALS/ForkExecProcesses.html
- Check out
- Sockets
- Check out


