Experimental Evaluation of "Termination and Complexity Analysis for Programs with Bitvector Arithmetic by Symbolic Execution"

In earlier work, we developed an approach for automated termination analysis of C programs with explicit pointer arithmetic, which is based on symbolic execution. However, similar to many other termination techniques, this approach assumed the program variables to range over mathematical integers instead of bitvectors. This eases mathematical reasoning but is unsound in general. In this paper, we extend our approach in order to handle fixed-width bitvector integers. Thus, we present the first technique for termination analysis of C programs that covers both byte-accurate pointer arithmetic and bit-precise modeling of integers. Moreover, we show that our approach can also be used to analyze the runtime complexity of bitvector programs. We implemented our approach in the automated termination prover AProVE and evaluate its power by extensive experiments.

Literature

Here, we provide a link to our previous work on termination analysis of C programs with mathematical integers instead of bitvectors. This paper contains symbolic evaluation rules to handle pointer arithmetic, etc.

Thomas Ströder, Jürgen Giesl, Marc Brockschmidt, Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and Cornelius Aschermann
Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic
Journal of Automated Reasoning, 58(1):33-65, 2017. ©Springer-Verlag
The final publication is available at Springer via the DOI 10.1007/s10817-016-9389-x.
Preliminary Version


Implementation in AProVE

We integrated the analysis of C and LLVM programs into our termination tool AProVE. These variants of AProVE allow to repeat the experiments below. When using AProVE's web interface, please keep in mind that the computer running the web interface is considerably slower than the one used for the experiments. Moreover, the computer used for the web interface is used by several other applications as well, so the runtimes may vary.

Moreover, recent releases of AProVE are available for download here. Release 457e9b0 is able to prove termination for both bitvector programs and programs with unbounded mathematical integers. Use the flag -b to select bitvector semantics. Note that AProVE is constantly developed further and therefore, the results obtained by recent releases may differ from the experimental results below.

As in the termination category of the SV-COMP, AProVE analyzes termination of the main() function of a C program. To this end, AProVE uses the Clang compiler to translate the C program into a program in the intermediate representation of LLVM (without compiler optimizations as these are not sound for termination proving, i.e., the compiler may remove non-terminating code if this code does not contribute to the return value of a function). To compile C programs to LLVM programs, we use the following command:

clang program.c -S -emit-llvm -o program.llvm

This LLVM program is then analyzed for memory safety and termination. So in contrast to several other termination tools, AProVE only considers a program as "terminating" if it can also prove its memory safety.

If one only wants to analyze termination and memory safety of a specific C function instead of main(), this is possible by adding a query as a comment in the first line. This query comment must be of the form /*query: <function_name>(<list_of_arguments>)*/. The function_name is the name of the function to analyze. The list_of_arguments is a (possibly empty) comma-separated list of types explained at the bottom of this page.


Limitations

Our implementation currently has the following restrictions and limitations:

Examples

For our experiments, we used a collection of 118 C programs available here. This collection was obtained as follows: From these 137 examples, we excluded 19 examples which are known to be non-terminating.

Experiments for Termination

We compared AProVE to KITTeL, 2LS, Juggernaut, and Ultimate. For KITTeL and AProVE, we used Clang 3.5 in the compilation to LLVM IR. Our experiments were performed on a computer with an Intel Core i7-950 with 8 cores clocked at 3.07 GHz each and 6 GB of memory, using a time-out of 300 seconds for each example.

The tables below summarize our experiments. They show that on our collection (which mainly consists of the examples from the evaluations of the other tools), AProVE is most powerful. Here, T gives the number of examples where termination was proven, F means that the proof failed in less than 300 seconds, TO stands for "time-out" and means that the proof took longer than 300 seconds, and RT gives the average runtime of those examples where termination could be proven. By clicking on the name of the example set, you can inspect the details of the respective experiments. Since Ultimate does not support bitvector arithmetic for signed integers yet, we divided the examples such that programs for which the termination proof depends on the correct handling of signed integers only occur in the first row of the table.

Example Set Size AProVE 2LS KITTeL Juggernaut Ultimate
T F TO RT T F TO RT T F TO RT T F TO RT T F TO RT
Possibly Signed Overflows 52 34 9 9 10.23 23 29 0 0.37 27 4 21 1.81 10 19 23 34.12 -- -- -- --
No Signed Overflows 66 61 3 2 5.55 45 21 0 0.33 33 3 30 14.17 22 26 18 6.22 11 54 1 12.77

To evaluate the impact of our approach to represent both unsigned and signed values, we also ran AProVE in a mode where all values are represented as signed integers. Here, we performed equally good on programs with signed integers but lost 11 termination proofs for programs with unsigned integers. Among these 11 programs, 2LS was able to find termination proofs for 6 programs, Juggernaut proved termination for 5 programs, KITTeL (which treats all integers as signed) only found a single termination proof, and Ultimate could not prove any program to be terminating.

To evaluate the use of case analysis vs. modulo relations, we also tested a version of AProVE where we did not use overflow refinements. Here, we failed on 13 more examples. KITTeL (which also uses refinements) was able to prove termination for 9 of these 13 programs, whereas 2LS resp. Juggernaut only found termination proofs for 2 resp. 5 programs and Ultimate was not able to prove termination for any of these programs.

AProVE succeeds on all of our four programs that combine pointer and bitvector arithmetic, and Ultimate can prove termination for three of them. The other tools did not find any proof.


Experiments for Complexity

To evaluate our approach for runtime complexity, we ran an adapted version of AProVE that searches for upper runtime bounds using the tools KoAT and CoFloCo. The programs in our example collection were formulated according to the regulations of the SV-COMP, where there is a function main() that may call other functions and all input is modelled by non-deterministic values. To obtain meaningful results for runtime analysis, we modified the example programs in such a way that all non-deterministically initialized variables were moved to the parameter list of the respective functions. The modified C programs are available here.

Out of the 95 programs where termination could be shown, AProVE infers an upper bound for 60 programs within 300 seconds. For 7 of these programs, AProVE finds a small constant bound. For these 7 programs, the runtime indeed does not depend on the input variables or on the sizes of the types. For 38 programs, an upper bound is found that depends linearly on the input variable(s) and for 3 more programs, a quadratic upper bound is obtained. Thus, the runtime of these 41 programs is independent of the sizes of the integer types. For 4 programs, AProVE generates an upper bound that only depends on size bound variables. For the remaining 8 programs, the inferred runtime bound depends on both size bound variables and input variables of the function.

The exact concrete runtime bounds are listed separately in a table for programs without signed overflows and a table for programs possibly containing signed overflows. We always ran the back-end tools KoAT and CoFloCo in parallel and took the minimum of the bounds obtained by the two tools. In the tables, nat(x) is defined as x, if x is greater than zero, and zero otherwise. Note that KoAT always generates bounds in terms of the "sizes" (i.e., of the absolute values) of integers. Of course, one can replace |x| by x if x is a variable of an unsigned integer type or if x is a size bound variable that is guaranteed to be non-negative. Note also that to ease readability, in the tables, we wrote the program variables x instead of the corresponding symbolic variables vx that are assigned to them initially.


Argument Types for Queries

Our implementation currently supports the following argument types for queries: