Experimental Evaluation of "Proving Termination and Memory Safety for Programs with Pointer Arithmetic"

While automated verification of imperative programs has been studied intensively, proving termination of programs with explicit pointer arithmetic is still an open problem. To close this gap, we introduce a novel abstract domain that can track allocated memory in detail. We use it to automatically construct a symbolic execution graph that represents all possible runs of the program. This graph is then transformed into an integer transition system, whose termination can be proved by standard techniques. We implemented this approach in the automated termination prover AProVE. On this webpage, we describe extensive experimental results which demonstrate the power of our approach compared to existing tools. In particular, we demonstrate its capability of analyzing C programs with pointer arithmetic that existing termination tools cannot handle.

Literature

Here, we provide links to our paper which describes our approach.


Implementation in AProVE

We integrated the analysis of C and LLVM programs into our termination tool AProVE. The tool is available both via a web interface and for download. An additional variant of AProVE which only handles C and LLVM programs is also available via a web interface and as a ZIP archive (also containing installation instructions for external software used by AProVE, see the README file in the archive). In particular, this variant of AProVE allows to repeat the experiments below. When using AProVE's web interface, please keep in mind that the computer running the web interface is considerably slower than the one used for the experiments. Moreover, the computer used for the web interface is used by several other applications as well, so the runtimes may vary.

As in the termination category of the SV-COMP 2014, AProVE analyzes termination of the main() function of a C program. To this end, AProVE uses the Clang compiler to translate the C program into a program in the intermediate representation of LLVM (without compiler optimizations as these are not sound for termination proving, i.e., the compiler may remove non-terminating code if this code does not contribute to the return value of a function). To compile C programs to LLVM programs, we use the following command:

clang program.c -S -emit-llvm -o program.llvm

This LLVM program is then analyzed for memory safety and termination. So in contrast to several other termination tools, AProVE only considers a program as "terminating" if it can also prove its memory safety.

If one only wants to analyze termination of a specific C function instead of main(), this is possible by adding a query as a comment in the first line. This query comment must be of the form /*query: <function_name>(<list_of_arguments>)*/. The function_name is the name of the function to analyze. The list_of_arguments is a (possibly empty) comma-separated list of types explained at the bottom of this page.


Limitations

Our implementation currently has the following restrictions and limitations:

Examples

For our experiments, we used a collection of 208 examples. These examples have been taken from the following sources: Note that these examples are all memory safe. This is needed to allow a meaningful comparison with the other termination tools, since most termination tools do not check memory safety of the analyzed algorithms. You can download the examples, sorted according to their origin, here.

Experiments for Termination

We compared AProVE to all tools that participated in the termination category of
SV-COMP 2014 and to the tool KITTeL, which also analyzes LLVM programs. While we used Clang in the compilation for AProVE (without any compiler optimizations), for KITTeL we used GCC instead (as suggested in KITTeL's instructions). Our experiments were performed on a computer with an Intel Core i7-950 processor and 6 GB of memory using a time-out of 300 seconds for each example.

The tables below summarize our experiments. They show that our approach is slightly more powerful than the other tools for integer programs and clearly the most powerful one for pointer programs. Here, T gives the number of examples where termination was proven, N gives the number of examples where non-termination was proven, F means that the proof failed in less than 300 seconds, TO gives the number of examples where the tool took longer than 300 seconds, and RT gives the average runtime of those examples where termination could either be proven or disproven. By clicking on the name of the example set, you can inspect the details of the respective experiments.

Example Set Size AProVE FuncTion KITTeL T2 TAN Ultimate
T N F TO RT T N F TO RT T N F TO RT T N F TO RT T N F TO RT T N F TO RT
Integer 79 67 0 11 1 19.6 11 0 66 2 23.1 58 0 12 9 0.2 55 0 23 1 1.8 31 0 37 11 2.4 57 4 12 6 3.2
Pointer 129 91 0 19 19 58.6 - - - - - 9 0 1 119 0.2 6 0 123 0 3.6 3 0 124 2 10.6 - - - - -


Experiments for Memory Safety

Although our approach is targeted toward termination and only analyzes memory safety as a prerequisite for termination, it turned out that on our collection, the power of AProVE is comparable to the power of the leading tools for proving memory safety. This is shown by the following experiments where we compared AProVE with the tools CPAchecker and Predator which reached the first and the third place in the category for memory safety at SV-COMP 2014. (The second place in this category was reached by the bounded model checker LLBMC. However, in general such tools only disprove, but cannot verify memory safety.)

To allow a sensible comparison, in this experiment, we used a variant of AProVE which only checks memory safety, but does not prove termination afterwards. For CPAchecker, we used the version that participated in the SV-COMP, but we called it in an "unbounded" configuration such that it only returns "true" if it has really proved memory-safety (in the SV-COMP, it was called in a configuration which stops with "true" whenever it has explored 5000 states of the search space without finding a violation of the analyzed property). Since CPAchecker does not handle allocation of memory via alloca, we created variants of our examples where we replaced alloca by malloc. Since Predator is able to handle alloca as well as malloc, we called this tool on both variants of the examples. Moreover, all three tools used the assumption that allocation of memory always succeeds. So in the table below, T (for "true") indicates that the tool proved memory safety and F indicates that it failed in proving memory safety. We did not distinguish between the results "unknown" and "false" here, since CPAchecker and Predator do not support the allocation of variable amounts of memory and therefore, they also often return "false" for such programs although they are in fact memory safe. Our experiments were again performed on a computer with an Intel Core i7-950 processor and 6 GB of memory using a time-out of 300 seconds for each example. By clicking on the name of the example set, you can inspect the details of the respective experiments.

Example Set Size AProVE CPAchecker Predator
T F TO RT T F TO RT T F TO RT
Alloca 129 102 15 12 47.9 - - - - 79 49 1 2.1
Malloc 129 - - - - 77 52 0 1.5 79 49 1 11.6

Argument Types for Queries

Our implementation currently supports the following argument types for queries: