Experimental Evaluation of "Termination and Complexity Analysis for Programs with Bitvector Arithmetic by Symbolic Execution"

In earlier work, we developed an approach for automated termination analysis of C programs with explicit pointer arithmetic, which is based on symbolic execution. However, similar to many other termination techniques, this approach assumed the program variables to range over mathematical integers instead of bitvectors. This eases mathematical reasoning but is unsound in general. In this paper, we extend our approach in order to handle fixed-width bitvector integers. Thus, we present the first technique for termination analysis of C programs that covers both byte-accurate pointer arithmetic and bit-precise modeling of integers. Moreover, we show that our approach can also be used to analyze the runtime complexity of bitvector programs. We implemented our approach in the automated termination prover AProVE and evaluate its power by extensive experiments.

Literature

Here, we provide a link to our previous work on termination analysis of C programs with mathematical integers instead of bitvectors. This paper contains symbolic evaluation rules to handle pointer arithmetic, etc.

Thomas Ströder, Jürgen Giesl, Marc Brockschmidt, Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and Cornelius Aschermann
Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic
Journal of Automated Reasoning, 58(1):33-65, 2017. ©Springer-Verlag
The final publication is available at Springer via the DOI 10.1007/s10817-016-9389-x.
Preliminary Version

Implementation in AProVE

We integrated the analysis of C and LLVM programs into our termination tool AProVE.

A custom version of AProVE which only proves termination of C and LLVM programs is available via a web interface for termination.
A version of AProVE which only searches for upper runtime bounds of C and LLVM programs is available via a web interface for complexity.

These variants of AProVE allow to repeat the experiments below. When using AProVE's web interface, please keep in mind that the computer running the web interface is considerably slower than the one used for the experiments. Moreover, the computer used for the web interface is used by several other applications as well, so the runtimes may vary.

Moreover, recent releases of AProVE are available for download here. Release 457e9b0 is able to prove termination for both bitvector programs and programs with unbounded mathematical integers. Use the flag -b to select bitvector semantics. Note that AProVE is constantly developed further and therefore, the results obtained by recent releases may differ from the experimental results below.

As in the termination category of the SV-COMP, AProVE analyzes termination of the main() function of a C program. To this end, AProVE uses the Clang compiler to translate the C program into a program in the intermediate representation of LLVM (without compiler optimizations as these are not sound for termination proving, i.e., the compiler may remove non-terminating code if this code does not contribute to the return value of a function). To compile C programs to LLVM programs, we use the following command:

clang program.c -S -emit-llvm -o program.llvm

This LLVM program is then analyzed for memory safety and termination. So in contrast to several other termination tools, AProVE only considers a program as "terminating" if it can also prove its memory safety.

If one only wants to analyze termination and memory safety of a specific C function instead of main(), this is possible by adding a query as a comment in the first line. This query comment must be of the form /*query: <function_name>(<list_of_arguments>)*/. The function_name is the name of the function to analyze. The list_of_arguments is a (possibly empty) comma-separated list of types explained at the bottom of this page.

Limitations

Our implementation currently has the following restrictions and limitations:

We cannot deal with types for floating point numbers at the moment.
So far, we did not present any methods to treat recursive programs or to prove non-termination of programs. We are currently working on extending our approach accordingly and our implementation already contains some support for recursion and non-termination.
Except for malloc and free, we cannot handle external functions at the moment. However, we allow to just declare functions and use them to produce non-deterministic values. We assume that functions which are only declared, but not defined, have no side effects (in particular, they do not modify their arguments).
We also do not deal with data structures (defined by struct) yet. Moreover, the handling of arrays is restricted to programs where arrays are compiled to pointers in LLVM. This means that statements like int* array = alloca(4*sizeof(int)) are supported, but statements like int array[4] are not, because these statements are compiled to an LLVM-specific array data type.
We currently support LLVM 3.5. Analysis of later versions of LLVM will probably not yet work, since LLVM is very actively developed. Moreover, the following LLVM instructions are not yet supported (most of them occur rarely if one does not use compiler optimizations, can be replaced by other instructions that we support, or deal with floats or complex data structures):
- switch
- indirectbr
- invoke
- unwind
- va_arg
- all vector and aggregate instructions
- all instructions on floats

Examples

For our experiments, we used a collection of 118 C programs available here. This collection was obtained as follows:

We took the 61 Windows Driver Development Kit examples that are used for the experimental results in related work (see here and here). We split them into c/signed/wdk-signed-overflow and c/unsigned/wdk-no-signed-overflow so that signed overflows, which are handled differently by different tools, only occur in the latter.
We included 61 of the 62 examples from the repository of Juggernaut where we excluded one example containing a float. These examples can be found in c/signed/juggernaut and c/unsigned/juggernaut.
We took 7 of the 9 examples that are introduced here where we excluded two examples containing floats. These examples can be found in c/unsigned/juggernaut-paper.
We added 4 own examples where the termination proof depends on the correct modeling of overflows during multiplication (c/unsigned/mult).
Moreover, we added 4 own examples which combine pointer and bitvector arithmetic (c/unsigned/pointer).

From these 137 examples, we excluded 19 examples which are known to be non-terminating.

Experiments for Termination

We compared AProVE to KITTeL, 2LS, Juggernaut, and Ultimate. For KITTeL and AProVE, we used Clang 3.5 in the compilation to LLVM IR. Our experiments were performed on a computer with an Intel Core i7-950 with 8 cores clocked at 3.07 GHz each and 6 GB of memory, using a time-out of 300 seconds for each example.

The tables below summarize our experiments. They show that on our collection (which mainly consists of the examples from the evaluations of the other tools), AProVE is most powerful. Here, T gives the number of examples where termination was proven, F means that the proof failed in less than 300 seconds, TO stands for "time-out" and means that the proof took longer than 300 seconds, and RT gives the average runtime of those examples where termination could be proven. By clicking on the name of the example set, you can inspect the details of the respective experiments. Since Ultimate does not support bitvector arithmetic for signed integers yet, we divided the examples such that programs for which the termination proof depends on the correct handling of signed integers only occur in the first row of the table.

Example Set Size AProVE 2LS KITTeL Juggernaut Ultimate

T F TO RT T F TO RT T F TO RT T F TO RT T F TO RT

Possibly Signed Overflows 52 34 9 9 10.23 23 29 0 0.37 27 4 21 1.81 10 19 23 34.12 -- -- -- --

No Signed Overflows 66 61 3 2 5.55 45 21 0 0.33 33 3 30 14.17 22 26 18 6.22 11 54 1 12.77

Example Set	Size	AProVE	2LS	KITTeL	Juggernaut	Ultimate
T	F	TO	RT	T	F	TO	RT	T	F	TO	RT	T	F	TO	RT	T	F	TO	RT
Possibly Signed Overflows	52	34	9	9	10.23	23	29	0	0.37	27	4	21	1.81	10	19	23	34.12	--	--	--	--
No Signed Overflows	66	61	3	2	5.55	45	21	0	0.33	33	3	30	14.17	22	26	18	6.22	11	54	1	12.77

To evaluate the impact of our approach to represent both unsigned and signed values, we also ran AProVE in a mode where all values are represented as signed integers. Here, we performed equally good on programs with signed integers but lost 11 termination proofs for programs with unsigned integers. Among these 11 programs, 2LS was able to find termination proofs for 6 programs, Juggernaut proved termination for 5 programs, KITTeL (which treats all integers as signed) only found a single termination proof, and Ultimate could not prove any program to be terminating.

To evaluate the use of case analysis vs. modulo relations, we also tested a version of AProVE where we did not use overflow refinements. Here, we failed on 13 more examples. KITTeL (which also uses refinements) was able to prove termination for 9 of these 13 programs, whereas 2LS resp. Juggernaut only found termination proofs for 2 resp. 5 programs and Ultimate was not able to prove termination for any of these programs.

AProVE succeeds on all of our four programs that combine pointer and bitvector arithmetic, and Ultimate can prove termination for three of them. The other tools did not find any proof.

Experiments for Complexity

To evaluate our approach for runtime complexity, we ran an adapted version of AProVE that searches for upper runtime bounds using the tools KoAT and CoFloCo. The programs in our example collection were formulated according to the regulations of the SV-COMP, where there is a function main() that may call other functions and all input is modelled by non-deterministic values. To obtain meaningful results for runtime analysis, we modified the example programs in such a way that all non-deterministically initialized variables were moved to the parameter list of the respective functions. The modified C programs are available here.

Out of the 95 programs where termination could be shown, AProVE infers an upper bound for 60 programs within 300 seconds. For 7 of these programs, AProVE finds a small constant bound. For these 7 programs, the runtime indeed does not depend on the input variables or on the sizes of the types. For 38 programs, an upper bound is found that depends linearly on the input variable(s) and for 3 more programs, a quadratic upper bound is obtained. Thus, the runtime of these 41 programs is independent of the sizes of the integer types. For 4 programs, AProVE generates an upper bound that only depends on size bound variables. For the remaining 8 programs, the inferred runtime bound depends on both size bound variables and input variables of the function.

The exact concrete runtime bounds are listed separately in a table for programs without signed overflows and a table for programs possibly containing signed overflows. We always ran the back-end tools KoAT and CoFloCo in parallel and took the minimum of the bounds obtained by the two tools. In the tables, nat(x) is defined as x, if x is greater than zero, and zero otherwise. Note that KoAT always generates bounds in terms of the "sizes" (i.e., of the absolute values) of integers. Of course, one can replace |x| by x if x is a variable of an unsigned integer type or if x is a size bound variable that is guaranteed to be non-negative. Note also that to ease readability, in the tables, we wrote the program variables x instead of the corresponding symbolic variables v_x that are assigned to them initially.

Argument Types for Queries

Our implementation currently supports the following argument types for queries:

Int
The argument is an integer.

Alloc
The argument is a pointer of type i8* to some allocated area. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w.

Alloc(c)
The argument is a pointer of type i8* to an allocated area of at least c bytes. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL and w ≥ v + c -1 to the knowledge base KB for a fresh symbolic variable w.

Alloc(#n)
The argument is a pointer of type i8* to an allocated area of at least as many bytes as indicated by the argument with index n (the first argument has index 0). This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w. Moreover, we add an arithmetic formula to the knowledge base KB of that initial state. The formula we add depends on the type of the argument with index n.

If the argument with index n is just a number whose associated symbolic variable is u, then we add the formula w ≥ v + u -1 to KB. So the size of the memory block described by alloc(v,w) is at least as large as the number u.

Otherwise, the argument with index n should be a pointer to some allocated memory area. So if u is the symbolic variable associated to the argument with index n, then in the initial state we should have alloc(u,u') in the allocation list AL. In this case, we add the formula w ≥ v + u' - u to KB. So the size of the memory block described by alloc(v,w) is at least as large as the length of the memory block described by alloc(u,u').

If none of the two cases applies, then the query is not valid.

String
The argument is a pointer of type i8* to a zero-terminated string. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL, w → z to the points-to list PT, and z = 0 to the knowledge base KB for fresh symbolic variables w and z.

String(c)
The argument is a pointer of type i8* to a zero-terminated string, where the allocated area after the string contains at least c - 1 more bytes. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL, w' → z to the points-to list PT, and the formulas z = 0, v ≤ w', w' ≤ w, and w ≥ w' + c to the knowledge base KB for fresh symbolic variables w, w', and z.

String(#n)
The argument is a pointer of type i8* to a zero-terminated string, where the allocated area after the string contains at least as many bytes as indicated by the argument with index n (the first argument has index 0). This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL, w' → z to the points-to list PT, and the formulas z = 0, v ≤ w', and w' ≤ w to the knowledge base KB of that initial state. Moreover, we add another arithmetic formula to KB which depends on the type of the argument with index n.

If the argument with index n is just a number whose associated symbolic variable is u, then we add the formula w ≥ w' + u to KB. So the memory block described by alloc(v,w) has at least u more bytes after the string.

Otherwise, the argument with index n should be a pointer to some allocated memory area. So if u is the symbolic variable associated to the argument with index n, then in the initial state we should have alloc(u,u') in the allocation list AL). In this case, we add the formula w ≥ w' + u' - u to KB. So the memory block described by alloc(v,w) consists of a string and a subsequent memory sub-block that is at least as large as the memory block described by alloc(u,u').

If none of the two cases applies, then the query is not valid.

<ty>[]
The argument is a pointer of type ty* to some allocated area. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w.

<ty>[c]
The argument is a pointer of type ty* to an allocated area of at least c entries of type ty. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL and w ≥ v + c * size(ty) -1 to the knowledge base KB for a fresh symbolic variable w.

<ty>[#n]
The argument is a pointer of type ty* to an allocated area of at least as many bytes as indicated by the argument with index n (the first argument has index 0). This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w. Moreover, we add an arithmetic formula to the knowledge base KB of that initial state. The formula we add depends on the type of the argument with index n.

If the argument with index n is just a number whose associated symbolic variable is u, then we add the formula w ≥ v + u* size(ty) -1 to KB. So the size of the memory block described by alloc(v,w) is large enough to contain u entries of type ty.

Otherwise, the argument with index n should be a pointer to some allocated memory area. So if u is the symbolic variable associated to the argument with index n, then in the initial state we should have alloc(u,u') in the allocation list AL). In this case, we add the formula w ≥ v + (u' - u + 1) * size(ty) - 1 to KB. So the size of the memory block described by alloc(v,w) is large enough to contain as many entries as the length of the memory block described by alloc(u,u').

If none of the two cases applies, then the query is not valid.