- Jera Hensel, Jürgen Giesl,
Florian Frohn, and Thomas Ströder

Proving Termination of Programs with Bitvector Arithmetic by Symbolic Execution, 26 pages.

(Extended version containing**symbolic evaluation rules for all interesting bitvector instructions**.) - Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and
Cornelius Aschermann

Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic, 39 pages.

(Extended version of the IJCAR 2014 paper submitted to the*Journal of Automated Reasoning*, containing**symbolic evaluation rules to handle pointer arithmetic, etc.**) - Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, and Peter Schneider-Kamp

Proving Termination and Memory Safety for Programs with Pointer Arithmetic.

In*Proceedings of the 7th International Joint Conference on Automated Reasoning (IJCAR '14)*, Vienna, Austria, Lecture Notes in Artificial Intelligence 8562, pages 208-223, 2014. ©Springer-Verlag

As in the termination category of the
SV-COMP,
AProVE
analyzes termination of the `main()`

function of a **C**
program. To this end, AProVE uses the
Clang compiler to translate the
**C** program into a program in the intermediate representation of
**LLVM** (without compiler optimizations
as these are not sound for termination proving, i.e., the compiler may
remove non-terminating code if this code does not contribute to the return
value of a function). To compile **C** programs to **LLVM**
programs, we use the following command:

`clang program.c -S -emit-llvm -o program.llvm`

This **LLVM** program is then analyzed for memory safety and
termination. So in contrast to several other termination tools, AProVE
only considers a program as "terminating" if it can also prove its memory
safety.

If one only wants to analyze termination and memory safety of a specific
**C** function instead of `main()`

, this is possible by adding a **query**
as a comment in the first line. This query comment must be of the form
`/*query: <function_name>(<list_of_arguments>)*/`

.
The `function_name`

is the name of the function to analyze. The
`list_of_arguments`

is a (possibly empty) comma-separated list
of types explained at the bottom of this page.

- We cannot deal with types for floating point numbers at the moment.
- So far, we did not present any methods to treat recursive programs or to prove non-termination of programs. We are currently working on extending our approach accordingly and our implementation already contains some support for recursion and non-termination.
- Except for
`malloc`and`free`, we cannot handle external functions at the moment. However, we allow to just declare functions and use them to produce non-deterministic values. We assume that functions which are only declared, but not defined, have no side effects (in particular, they do not modify their arguments). - We also do not deal with data structures (defined by
`struct`

) yet. Moreover, the handling of arrays is restricted to programs where arrays are compiled to pointers in**LLVM**. This means that statements like`int* array = alloca(4*sizeof(int))`

are supported, but statements like`int array[4]`

are not, because these statements are compiled to an**LLVM**-specific array data type. - We currently support
**LLVM**3.5. Analysis of later versions of**LLVM**will probably not yet work, since**LLVM**is very actively developed. Moreover, the following**LLVM**instructions are not yet supported (most of them occur rarely if one does not use compiler optimizations, can be replaced by other instructions that we support, or deal with floats or complex data structures):`switch`

`indirectbr`

`invoke`

`unwind`

`va_arg`

- all vector and aggregate instructions
- all instructions on floats

- We took the 61 Windows Driver Development Kit examples that are
used for the experimental results in related work (see here and here). We split them into
`c/signed/wdk-signed-overflow`and`c/unsigned/wdk-no-signed-overflow`so that signed overflows, which are handled differently by different tools, only occur in the latter. - We included 61 of the 62 examples
from the repository of Juggernaut where we excluded one example containing a float. These examples can be found in
`c/signed/juggernaut`and`c/unsigned/juggernaut`. - We took
7 of the 9 examples that are introduced here where we excluded
two examples containing floats. These examples can be found in
`c/unsigned/juggernaut-paper`. - We added 4 own examples where the termination proof
depends on the correct modeling of overflows during multiplication (
`c/unsigned/mult`). - Moreover, we added 4 own examples which combine
pointer and bitvector arithmetic (
`c/unsigned/pointer`).

The tables below summarize our experiments. They show that on our collection (which mainly consists
of the examples from the evaluations of the other tools), AProVE is most powerful.
Here,
**T** gives the number of examples where termination was proven,
**F** means that the proof failed in less than 300 seconds,
**TO** stands for "time-out" and means that the proof took longer than 300 seconds, and
**RT** gives the average runtime of those examples where termination could be
proven. By clicking on the name of the example set, you can inspect the
details of the respective experiments.
Since Ultimate does not support bitvector arithmetic
for signed integers yet, we divided the examples such that programs for which the termination proof depends on the correct handling of signed integers only
occur in the first row of the table.

Example Set | Size | AProVE | 2LS | KITTeL | Juggernaut | Ultimate | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

T | F | TO | RT | T | F | TO | RT | T | F | TO | RT | T | F | TO | RT | T | F | TO | RT | ||

Possibly Signed Overflows | 52 | 34 | 9 | 9 | 10.23 | 23 | 29 | 0 | 0.37 | 27 | 4 | 21 | 1.81 | 10 | 19 | 23 | 34.12 | -- | -- | -- | -- |

No Signed Overflows | 66 | 61 | 3 | 2 | 5.55 | 45 | 21 | 0 | 0.33 | 33 | 3 | 30 | 14.17 | 22 | 26 | 18 | 6.22 | 11 | 54 | 1 | 12.77 |

To evaluate the impact of our approach to represent both unsigned and signed values, we also ran AProVE in a mode where all values are represented as signed integers. Here, we performed equally good on programs with signed integers but lost 11 termination proofs for programs with unsigned integers. Among these 11 programs, 2LS was able to find termination proofs for 6 programs, Juggernaut proved termination for 5 programs, KITTeL (which treats all integers as signed) only found a single termination proof, and Ultimate could not prove any program to be terminating.

To evaluate the use of case analysis vs. modulo relations, we also tested a version of AProVE where we did not use overflow refinements. Here, we failed on 13 more examples. KITTeL (which also uses refinements) was able to prove termination for 9 of these 13 programs, whereas 2LS resp. Juggernaut only found termination proofs for 2 resp. 5 programs and Ultimate was not able to prove termination for any of these programs.

For our four programs that combine pointer and bitvector arithmetic, Ultimate was the only tool that was able to prove termination for three of them. The other tools did not find any proof.

`Int`

The argument is an integer.`Alloc`

The argument is a pointer of type`i8*`

to some allocated area. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*.`Alloc(c)`

The argument is a pointer of type`i8*`

to an allocated area of at least`c`

bytes. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*and*w ≥ v + c -1*to the knowledge base*KB*for a fresh symbolic variable*w*.`Alloc(#n)`

The argument is a pointer of type`i8*`

to an allocated area of at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*. Moreover, we add an arithmetic formula to the knowledge base*KB*of that initial state. The formula we add depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ v + u -1*to*KB*. So the size of the memory block described by*alloc(v,w)*is at least as large as the number*u*.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*. In this case, we add the formula*w ≥ v + u' - u*to*KB*. So the size of the memory block described by*alloc(v,w)*is at least as large as the length of the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.`String`

The argument is a pointer of type`i8*`

to a zero-terminated string. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w → z*to the points-to list*PT*, and*z = 0*to the knowledge base*KB*for fresh symbolic variables*w*and*z*.`String(c)`

The argument is a pointer of type`i8*`

to a zero-terminated string, where the allocated area after the string contains at least`c`

- 1 more bytes. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w' → z*to the points-to list*PT*, and the formulas*z = 0*,*v ≤ w'*,*w' ≤ w*, and*w ≥ w' + c*to the knowledge base*KB*for fresh symbolic variables*w*,*w'*, and*z*.`String(#n)`

The argument is a pointer of type`i8*`

to a zero-terminated string, where the allocated area after the string contains at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w' → z*to the points-to list*PT*, and the formulas*z = 0*,*v ≤ w'*, and*w' ≤ w*to the knowledge base*KB*of that initial state. Moreover, we add another arithmetic formula to*KB*which depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ w' + u*to*KB*. So the memory block described by*alloc(v,w)*has at least*u*more bytes after the string.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*). In this case, we add the formula*w ≥ w' + u' - u*to*KB*. So the memory block described by*alloc(v,w)*consists of a string and a subsequent memory sub-block that is at least as large as the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.`<ty>[]`

The argument is a pointer of type`ty*`

to some allocated area. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*.`<ty>[c]`

The argument is a pointer of type`ty*`

to an allocated area of at least`c`

entries of type`ty`

. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*and*w ≥ v + c * size(ty) -1*to the knowledge base*KB*for a fresh symbolic variable*w*.`<ty>[#n]`

The argument is a pointer of type`ty*`

to an allocated area of at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*. Moreover, we add an arithmetic formula to the knowledge base*KB*of that initial state. The formula we add depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ v + u* size(ty) -1*to*KB*. So the size of the memory block described by*alloc(v,w)*is large enough to contain*u*entries of type*ty*.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*). In this case, we add the formula*w ≥ v + (u' - u + 1) * size(ty) - 1*to*KB*. So the size of the memory block described by*alloc(v,w)*is large enough to contain as many entries as the length of the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.