Accelerating the Simulation of Cellular Automata with GPU Computing

Master's thesis by Bernhard Klein, September 2018

In my master's thesis I explore different types of synchronous cellular automata due to their implementability with GPU computing.

Deterministic Cellular Automata

The first type of cellular automata which was implemented in this thesis are deterministic ones. They match to the strict formulation of cellular automata which defines that a futuring cell state only depends on the cell state of the own cell and their neighboring cells.

Conway's Game of Life was implemented as an example for a synchronous deterministic cellular automaton. Fixed and periodic boundary conditions were tested together with different optimizations. The main optimization was to use extended boundaries; such that no boundary condition has to be checked and all cells can be updated with the same update rule. All boundary stuff was computed in a second step, only accessing the border cells.

With a simple global memory implementation using local work group sizes which match to the cache line size of the global memory, speeds which are up to 2000 times faster than the single core CPU implementation were achieved.

Stochastic Cellular Automata

Stochastic cellular automata were the next type which was examined due to their GPU computing ability. In contrast to deterministic ones, random numbers are needed and therefore the generation of pseudo random numbers is the main aspect.

The Drossel and Schwabl Forest Fire Model was used as example model. In a first approach the random numbers are generated by a Mersenne-Twister running on the CPU and then copied to the GPU. However, this approach was very slow since the GPU idles most of the time waiting for random numbers. In a second approach thousands of Xorshift+ pseudo random number generators were executed in parallel on the GPU. Before the normal cell upate rule gets executed all random numbers were generated. Therefore, the amount of pseudo random number generators which run in parallel can be set to the total count of available GPU cores. With this GPU based random number generation speedups up to 700 compared to a single core CPU reference were achieved.

Weak Formulated Cellular Automata

In the academic landscape many models are common which are simulated by so called cellular automata, but they do not fulfill the strict definition of a cellular automaton. The most common violation is a explicit write access to a neighboring cell. However, in the case of synchronously updated cells, conflicts occur and race conditions decide the final state of some cells. To avoid this the model description has to declare explicitly how such cases should be treated. In addition, the implementation needs to be adapted accordingly. In the case of a partial parallel execution on the GPU this is not trivial. With the Flow Field Approach a method was invented which allows to reformulate such a weak formulated cellular automaton to a strict fomulated one with multiple substeps.

Herbivore Plant Model as example of a population dynamics simulation which uses such a weak formulated cellular automaton was implemented with the flow field approach. With this speeds which are up to 1700 times faster than the single core CPU implementation were achieved.