A Nontrivial Case Of Code Parallelization For Beginners
Imagine a streamflow network, a river with several arms but no cycles. Small streams rise from springs, flow together and gather in a big stream. Then it rains locally and water rises at certain places. In the next time step that water will flow downstream and hence the water height will rise somewhere else. Imagine, we create a model and try to calculate the traversing water down the river network from the sources to a sink. This is our example model we would like to parallelize, meaning we want to calculate the outcome with the help of several computing nodes to decrease computing time. In lots of cases when it comes to parallelization, we have a data set and perform an independent operation on each entry in the same way. In that case we can divide our data set in several subdatasets and order different computing nodes to perform this operations. There has to be (almost) no communication and we can practically divide the sequential running time for the program for the whole data set by the number of computing nodes to get the new parallel running time. In our case some of the calculations are dependent on others, but there are still independent parts. Parallelization can achieve lower running time but has its limits.
In this talk I will
– explain, what a tree is in mathematical sense and describe, why a streamflow network and a tree is basically the same
– give a short overview over an implentation of that data structure
– explain the main idea on how to parallelize tree data structures
– describe how I cut down trees
– explain how we can create a schedule
– give a short introduction to openMP and MPI and the main differences
– discuss the limits of the parallelization
– give some insight into the implementation.
I will work with lots of images and vizual representations of the data structures.