This kernel code implements dynamic load balancing among neighbouring MPI ranks.
The kernel starts with a uniform load on all ranks and adds load to rank 0 after each iteration of the computation. The resulting load imbalance is handled by shifting work packages from ranks with more work to ranks with less work.
This kernel assumes that the work per item is equal for all items and all ranks. This is probably not true for real-world applications. Load, i.e. work items, is transferred only between neighbouring ranks, whereas in a real application load could be transferred between arbitrary ranks.
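The idea of shifting work between neighbours can be illustrated with a small serial sketch. It is not taken from the kernel's source: the function name `balance_step`, the non-periodic 1D chain of ranks, and the rule of moving half of the difference between each neighbouring pair are all assumptions made for illustration, and no MPI calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial model of one balancing sweep over a chain of ranks.
// load[i] is the number of work items held by rank i. For each neighbouring
// pair (i, i+1), visited left to right, the rank holding more items hands
// half of the difference (rounded toward zero) to the other one.
std::vector<long> balance_step(std::vector<long> load) {
    assert(!load.empty());
    for (std::size_t i = 0; i + 1 < load.size(); ++i) {
        const long diff = load[i] - load[i + 1];
        const long shift = diff / 2;  // positive: i gives, negative: i receives
        load[i] -= shift;
        load[i + 1] += shift;
    }
    return load;
}
```

Starting from `{16, 4, 4, 4}`, i.e. rank 0 carrying the extra load, one sweep yields `{10, 7, 6, 5}`. The total number of items is unchanged and repeated sweeps spread the surplus further along the chain.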
# Getting started
## Prerequisites
To build and run this kernel you will need
* an MPI library
* a C++ compiler
## Building and running the kernel
The kernel can be compiled with the provided Makefile via
```
make
```
The resulting executable `rankdlb` can be run using mpirun. It takes three arguments:
The kernel continuously reports the rate at which it processes work items. A higher value indicates better load balance. The rate drops while rank 0 creates new work items and increases again once load balancing has been performed.
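Why the rate follows the load balance can be made concrete with a common imbalance metric. The sketch below is an illustration, not the kernel's own reporting code: the name `balance_efficiency` is hypothetical, and it assumes that every item costs the same and that an iteration finishes only when the most heavily loaded rank is done. Under those assumptions the achievable rate is proportional to the mean load divided by the maximum load.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical metric: mean load divided by maximum load, in (0, 1].
// A value of 1.0 means perfectly balanced; smaller values mean that ranks
// idle while waiting for the most heavily loaded rank.
double balance_efficiency(const std::vector<long>& load) {
    assert(!load.empty());
    const long total = std::accumulate(load.begin(), load.end(), 0L);
    const long max_load = *std::max_element(load.begin(), load.end());
    if (max_load == 0) {
        return 1.0;  // no work anywhere counts as balanced
    }
    return static_cast<double>(total) /
           (static_cast<double>(max_load) * static_cast<double>(load.size()));
}
```

For a uniform load such as `{4, 4, 4, 4}` the value is 1.0. After rank 0 has accumulated extra items, e.g. `{16, 4, 4, 4}`, it falls to 28 / 64 = 0.4375, which mirrors the drop in the reported rate.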