Experimental Evaluation of "Proving Termination of Programs with Bitvector Arithmetic by Symbolic Execution"

In earlier work, we developed an approach for automated termination analysis of C programs with explicit pointer arithmetic, which is based on symbolic execution. However, similar to many other termination techniques, this approach assumed the program variables to range over mathematical integers instead of bitvectors. This eases mathematical reasoning but is unsound in general. In this paper, we extend our approach in order to handle fixed-width bitvector integers. Thus, we present the first technique for termination analysis of C programs that covers both byte-accurate pointer arithmetic and bit-precise modeling of integers. We implemented our approach in the automated termination prover AProVE and evaluate its power by extensive experiments.

Literature

Here, we provide links to our papers which describe our approach.


Implementation in AProVE

We integrated the analysis of C and LLVM programs into our termination tool AProVE. A custom version of AProVE which only handles C and LLVM programs is available via a web interface. In particular, this variant of AProVE allows to repeat the experiments below. When using AProVE's web interface, please keep in mind that the computer running the web interface is considerably slower than the one used for the experiments. Moreover, the computer used for the web interface is used by several other applications as well, so the runtimes may vary.

As in the termination category of the SV-COMP, AProVE analyzes termination of the main() function of a C program. To this end, AProVE uses the Clang compiler to translate the C program into a program in the intermediate representation of LLVM (without compiler optimizations as these are not sound for termination proving, i.e., the compiler may remove non-terminating code if this code does not contribute to the return value of a function). To compile C programs to LLVM programs, we use the following command:

clang program.c -S -emit-llvm -o program.llvm

This LLVM program is then analyzed for memory safety and termination. So in contrast to several other termination tools, AProVE only considers a program as "terminating" if it can also prove its memory safety.

If one only wants to analyze termination and memory safety of a specific C function instead of main(), this is possible by adding a query as a comment in the first line. This query comment must be of the form /*query: <function_name>(<list_of_arguments>)*/. The function_name is the name of the function to analyze. The list_of_arguments is a (possibly empty) comma-separated list of types explained at the bottom of this page.


Limitations

Our implementation currently has the following restrictions and limitations:

Examples

For our experiments, we used a collection of 118 C programs available here. This collection was obtained as follows: From these 137 examples, we excluded 19 examples which are known to be non-terminating.

Experiments for Termination

We compared AProVE to KITTeL, 2LS, Juggernaut, and Ultimate. For KITTeL and AProVE, we used Clang 3.5 in the compilation to LLVM IR. Our experiments were performed on a computer with an Intel Core i7-950 with 8 cores clocked at 3.07 GHz each and 6 GB of memory, using a time-out of 300 seconds for each example.

The tables below summarize our experiments. They show that on our collection (which mainly consists of the examples from the evaluations of the other tools), AProVE is most powerful. Here, T gives the number of examples where termination was proven, F means that the proof failed in less than 300 seconds, TO stands for "time-out" and means that the proof took longer than 300 seconds, and RT gives the average runtime of those examples where termination could be proven. By clicking on the name of the example set, you can inspect the details of the respective experiments. Since Ultimate does not support bitvector arithmetic for signed integers yet, we divided the examples such that programs for which the termination proof depends on the correct handling of signed integers only occur in the first row of the table.

Example Set Size AProVE 2LS KITTeL Juggernaut Ultimate
T F TO RT T F TO RT T F TO RT T F TO RT T F TO RT
Possibly Signed Overflows 52 34 9 9 10.23 23 29 0 0.37 27 4 21 1.81 10 19 23 34.12 -- -- -- --
No Signed Overflows 66 61 3 2 5.55 45 21 0 0.33 33 3 30 14.17 22 26 18 6.22 11 54 1 12.77

To evaluate the impact of our approach to represent both unsigned and signed values, we also ran AProVE in a mode where all values are represented as signed integers. Here, we performed equally good on programs with signed integers but lost 11 termination proofs for programs with unsigned integers. Among these 11 programs, 2LS was able to find termination proofs for 6 programs, Juggernaut proved termination for 5 programs, KITTeL (which treats all integers as signed) only found a single termination proof, and Ultimate could not prove any program to be terminating.

To evaluate the use of case analysis vs. modulo relations, we also tested a version of AProVE where we did not use overflow refinements. Here, we failed on 13 more examples. KITTeL (which also uses refinements) was able to prove termination for 9 of these 13 programs, whereas 2LS resp. Juggernaut only found termination proofs for 2 resp. 5 programs and Ultimate was not able to prove termination for any of these programs.

For our four programs that combine pointer and bitvector arithmetic, Ultimate was the only tool that was able to prove termination for three of them. The other tools did not find any proof.


Argument Types for Queries

Our implementation currently supports the following argument types for queries: