Design and Implementation of Benchmarks for Systematic Fault Injection

Fault tolerance mechanisms are commonly tested by performing extensive fault injection experiments on a system. These experiments mimic the effects of physical causes such as radiation, for example soft errors (bit flips), and then observe the system’s behaviour. There are many possible injections: in principle, every bit can be flipped in every cycle. Together these injection points span the so-called fault space. One of the first steps is therefore to determine equivalence sets of injection points that lead to the same system behaviour, which reduces the number of injections needed to test the functional reliability of the system.
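
The idea of a fault space and its reduction can be illustrated with a minimal sketch. It assumes a single-bit-flip fault model and uses def/use-style pruning over a memory access trace as one possible way to form equivalence sets; the text above does not prescribe this particular method. All names (flip_bit, fault_space_size, equivalence_classes) and the trace format are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

def flip_bit(value: int, bit: int) -> int:
    """Return value with the given bit inverted (single-bit-flip fault model)."""
    return value ^ (1 << bit)

def fault_space_size(cycles: int, bits: int) -> int:
    """Size of the raw fault space: every bit in every cycle."""
    return cycles * bits

def equivalence_classes(
    trace: List[Tuple[int, str]], total_cycles: int
) -> List[Tuple[int, int, str]]:
    """Group the injection times for ONE memory location into intervals.

    Assumptions (hypothetical model, not taken from the text):
    - trace holds (cycle, kind) pairs sorted by strictly increasing cycle,
      with kind "R" (read) or "W" (write)
    - a fault injected in cycle t becomes effective before the access in t

    All injection times in an interval are first observed by the same access.
    If that access is a read, one representative injection is enough
    ("inject"). If it is a write, or if no access follows at all, the fault
    is overwritten or never used ("masked") and needs no experiment.
    """
    classes = []
    start = 0
    for cycle, kind in trace:
        outcome = "inject" if kind == "R" else "masked"
        classes.append((start, cycle, outcome))
        start = cycle + 1
    if start < total_cycles:
        # Tail after the last access: the value is never read again.
        classes.append((start, total_cycles - 1, "masked"))
    return classes
```

For a hypothetical run of 12 cycles over one 8-bit memory location with the trace [(2, "W"), (5, "R"), (9, "W")], the raw fault space contains 12 * 8 = 96 injection points, while only one interval (cycles 3 to 5) ends in a read. Under the assumptions above, 8 experiments (one per bit) would therefore cover the whole fault space of that location.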

Quantifying the functional reliability of a system requires benchmarks that are subjected to a bit flip at runtime in the context of the target system's hardware. The difficulty lies in finding the right mix of input size, general workload, semantic relevance and overall runtime of the fault injection campaign.

The main goal of this thesis is to find out which benchmarks are suitable for quantifying the functional reliability of a hardware system during the execution of a program, and which of them are already known or actively used for this purpose. This includes a detailed analysis of these benchmarks and the integration of individual benchmarks into an existing benchmark suite. The design and implementation of new benchmarks is also part of the work.