- Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, and Peter Schneider-Kamp

Automated Termination Analysis for Programs with Pointer Arithmetic, 15 pages

- Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, and Peter Schneider-Kamp

Automated Termination Analysis for Programs with Pointer Arithmetic (Extended Version), Technical Report AIB 2014-05, RWTH Aachen, 27 pages.

This version**includes all proofs of the theorems**.

As in the termination category of the
SV-COMP 2014, AProVE
analyzes termination of the `main()`

function of a **C**
program. To this end, AProVE uses the
Clang compiler to translate the
**C** program into a program in the intermediate representation of
**LLVM** (without compiler optimizations
as these are not sound for termination proving, i.e., the compiler may
remove non-terminating code if this code does not contribute to the return
value of a function). To compile **C** programs to **LLVM**
programs, we use the following command:

`clang program.c -S -emit-llvm -o program.llvm`

This **LLVM** program is then analyzed for memory safety and
termination. So in contrast to several other termination tools, AProVE
only considers a program as "terminating" if it can also prove its memory
safety.

If one only wants to analyze termination of a specific **C** function
instead of `main()`

, this is possible by adding a **query**
as a comment in the first line. This query comment must be of the form
`/*query: <function_name>(<list_of_arguments>)*/`

.
The `function_name`

is the name of the function to analyze. The
`list_of_arguments`

is a (possibly empty) comma-separated list
of types explained at the bottom of this page.

- The implementation can handle programs with several functions, but it cannot deal with recursion yet.
- We cannot deal with types for floating point numbers at the moment.
- Integers are treated as the infinite set ℤ. More precisely, we disregard integer overflows and assume that variables are only instantiated with signed integers appropriate for their type.
- We cannot handle external functions at the moment. This means that the
`alloca`function (which is also available as an instruction in**LLVM**) is supported by AProVE, but functions like`malloc`and`free`are currently not supported (since they are not available as**LLVM**instructions). However, we allow to just declare functions and use them to produce non-deterministic values. We assume that functions which are only declared, but not defined, have no side effects (in particular, they do not modify their arguments). - We also do not deal with data structures (defined by
`struct`

) yet. Moreover, the handling of arrays is restricted to programs where arrays are compiled to pointers in**LLVM**. This means that statements like`int* array = alloca(4*sizeof(int))`

are supported, but statements like`int array[4]`

are not, because these statements are compiled to an**LLVM**-specific array data type. - We do not treat certain special features that were used in the "ext" library of the
benchmarks for the SV-COMP termination
category (but which are not part of the usual
**C**language): More precisely, our implementation does not interpret assert- or assume-functions as intended in this library, but treats them like ordinary**C**functions or functions producing non-deterministic values as described above. Moreover, we have no special treatment for the label "ERROR", but consider loops like "ERROR: goto ERROR" as non-terminating. For that reason, a meaningful comparison of the termination tools on the examples of this library is not possible. Therefore, we restricted ourselves to the "termination-crafted" library of the SV-COMP in our experiments (where these special features do not occur). - Up to now, our parser reliably supports
**LLVM**2.9. Later versions of**LLVM**might work, but we cannot guarantee this as**LLVM**is very actively developed. Moreover, the following**LLVM**instructions are not yet supported (most of them occur rarely if one uses no compiler optimizations, can be replaced by other ones which we support, or deal with floats or complex data structures):`switch`

`indirectbr`

`invoke`

`unwind`

`va_arg`

- all vector and aggregate instructions
- all instructions on floats

- 46 integer programs provided to us by Stephan Falke, the developer of the termination analyzer KITTeL.
- 7 pointer programs provided to us by Stephan Falke.
- 46 examples which are variations of the integer programs by Stephan Falke.
We modified those examples such that they work with pointers.

The motivation for this modification is that when compiling integer programs to**LLVM**without optimizations, the resulting**LLVM**code allocates memory and uses pointers for all local variables. Thus, even for integer programs, proving memory safety in**LLVM**is non-trivial. To take this effect into account in our comparison, we also considered examples where we performed this compilation of local variables to pointers already on the**C**code level. Note that using a compilation with optimizations would not be sound for termination analysis, as the compiler might remove code which does not contribute to the result, including possibly non-terminating code. - 33 integer programs from the library used in the demonstration section for termination at the SV-COMP 2014. We included all integer programs from the "termination-crafted" library (revision 428) of the SV-COMP except those 7 examples that use recursion.
- 33 examples which are variations of the integer programs from the SV-COMP 2014. We modified those examples such that they work with pointers as for the integer programs by Stephan Falke above.
- 11 programs for string processing taken from the Wikibooks article C Programming/Strings. 7 of them have also been part of the "termination-crafted" library of the SV-COMP 2014. Together with the integer programs above, we have included all examples from this library in our example set except those 7 examples using recursion.
- 15 programs for string processing from the OpenBSD standard library for strings.
- 3 further programs from the OpenBSD standard library for strings manipulating plain memory (i.e., these functions work on allocated memory areas that are not zero-terminated).
- 14 new pointer examples containing features like accessing the same addresses with pointers of different types (2), array algorithms where termination depends on the contents of the arrays (6), further string and array algorithms (4), one example performing arithmetic operations on values stored in the memory, and one example deleting contents of the memory where memory safety is only ensured by comparisons of pointer addresses (and not by computing the addresses from the base address of the allocated memory area).

The tables below summarize our experiments. They show that our approach is slightly more powerful
than the other tools
for integer programs and clearly the most powerful one for
pointer programs. Here,
**T** gives the number of examples where termination was proven,
**N** gives the number of examples where non-termination was proven,
**F** means that the proof failed in less than 300 seconds,
**TO** gives the number of examples where the tool took longer than 300 seconds, and
**RT** gives the average runtime of those examples where termination could either be proven or disproven. By clicking on the name of the example set, you can inspect the details of the respective experiments.

Example Set | Size | AProVE | FuncTion | KITTeL | T2 | TAN | Ultimate | ||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

T | N | F | TO | RT | T | N | F | TO | RT | T | N | F | TO | RT | T | N | F | TO | RT | T | N | F | TO | RT | T | N | F | TO | RT | ||

Integer | 79 | 67 | 0 | 11 | 1 | 19.6 | 11 | 0 | 66 | 2 | 23.1 | 58 | 0 | 12 | 9 | 0.2 | 55 | 0 | 23 | 1 | 1.8 | 31 | 0 | 37 | 11 | 2.4 | 57 | 4 | 12 | 6 | 3.2 |

Pointer | 129 | 91 | 0 | 19 | 19 | 58.6 | - | - | - | - | - | 9 | 0 | 1 | 119 | 0.2 | 6 | 0 | 123 | 0 | 3.6 | 3 | 0 | 124 | 2 | 10.6 | - | - | - | - | - |

To allow a sensible comparison, in this experiment, we used a variant of AProVE which only checks memory safety, but does not prove termination afterwards. For CPAchecker, we used the version that participated in the SV-COMP, but we called it in an "unbounded" configuration such that it only returns "

Example Set | Size | AProVE | CPAchecker | Predator | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

T | F | TO | RT | T | F | TO | RT | T | F | TO | RT | ||

Alloca | 129 | 102 | 15 | 12 | 47.9 | - | - | - | - | 79 | 49 | 1 | 2.1 |

Malloc | 129 | - | - | - | - | 77 | 52 | 0 | 1.5 | 79 | 49 | 1 | 11.6 |

`Int`

The argument is an integer.`Pos`

The argument is a positive integer.`Neg`

The argument is a negative integer.`NonPos`

The argument is a non-positive integer.`NonNeg`

The argument is a non-negative integer.`Alloc`

The argument is a pointer of type`i8*`

to some allocated area. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*.`Alloc(c)`

The argument is a pointer of type`i8*`

to an allocated area of at least`c`

bytes. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*and*w ≥ v + c -1*to the knowledge base*KB*for a fresh symbolic variable*w*.`Alloc(#n)`

The argument is a pointer of type`i8*`

to an allocated area of at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*. Moreover, we add an arithmetic formula to the knowledge base*KB*of that initial state. The formula we add depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ v + u -1*to*KB*. So the size of the memory block described by*alloc(v,w)*is at least as large as the number*u*.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*. In this case, we add the formula*w ≥ v + u' - u*to*KB*. So the size of the memory block described by*alloc(v,w)*is at least as large as the length of the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.`String`

The argument is a pointer of type`i8*`

to a zero-terminated string. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w → z*to the points-to list*PT*, and*z = 0*to the knowledge base*KB*for fresh symbolic variables*w*and*z*.`String(c)`

The argument is a pointer of type`i8*`

to a zero-terminated string, where the allocated area after the string contains at least`c`

- 1 more bytes. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w' → z*to the points-to list*PT*, and the formulas*z = 0*,*v ≤ w'*,*w' ≤ w*, and*w ≥ w' + c*to the knowledge base*KB*for fresh symbolic variables*w*,*w'*, and*z*.`String(#n)`

The argument is a pointer of type`i8*`

to a zero-terminated string, where the allocated area after the string contains at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w' → z*to the points-to list*PT*, and the formulas*z = 0*,*v ≤ w'*, and*w' ≤ w*to the knowledge base*KB*of that initial state. Moreover, we add another arithmetic formula to*KB*which depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ w' + u*to*KB*. So the memory block described by*alloc(v,w)*has at least*u*more bytes after the string.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*). In this case, we add the formula*w ≥ w' + u' - u*to*KB*. So the memory block described by*alloc(v,w)*consists of a string and a subsequent memory sub-block that is at least as large as the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.`<ty>[]`

The argument is a pointer of type`ty*`

to some allocated area. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*.`<ty>[c]`

The argument is a pointer of type`ty*`

to an allocated area of at least`c`

entries of type`ty`

. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*and*w ≥ v + c * size(ty) -1*to the knowledge base*KB*for a fresh symbolic variable*w*.`<ty>[#n]`

The argument is a pointer of type`ty*`

to an allocated area of at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*. Moreover, we add an arithmetic formula to the knowledge base*KB*of that initial state. The formula we add depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ v + u* size(ty) -1*to*KB*. So the size of the memory block described by*alloc(v,w)*is large enough to contain*u*entries of type*ty*.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*). In this case, we add the formula*w ≥ v + (u' - u + 1) * size(ty) - 1*to*KB*. So the size of the memory block described by*alloc(v,w)*is large enough to contain as many entries as the length of the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.