- Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and
Cornelius Aschermann

Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic,*Journal of Automated Reasoning*, to appear. - Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and
Cornelius Aschermann

Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic, Technical Report AIB-2016-09, RWTH Aachen, Germany, 39 pages.

(Extended version containing**the correctness proofs of the concrete evaluation rules**) - Thomas Ströder, Jürgen Giesl, Marc Brockschmidt,
Florian Frohn, Carsten Fuhs, Jera Hensel, and Peter Schneider-Kamp

Proving Termination and Memory Safety for Programs with Pointer Arithmetic, 15 pages

(Conference version)

As in the termination category of the
SV-COMP 2016 and
TermComp 2015, AProVE
analyzes termination of the `main()`

function of a **C**
program. To this end, AProVE uses the
Clang compiler to translate the
**C** program into a program in the intermediate representation of
**LLVM** (without compiler optimizations
as these are not sound for termination proving, i.e., the compiler may
remove non-terminating code if this code does not contribute to the return
value of a function). To compile **C** programs to **LLVM**
programs, we use the following command:

`clang program.c -S -emit-llvm -o program.llvm`

This **LLVM** program is then analyzed for memory safety and
termination. So in contrast to several other termination tools, AProVE
only considers a program as "terminating" if it can also prove its memory
safety.

If one only wants to analyze termination and memory safety of a specific
**C** function instead of `main()`

, this is possible by adding a **query**
as a comment in the first line. This query comment must be of the form
`/*query: <function_name>(<list_of_arguments>)*/`

.
The `function_name`

is the name of the function to analyze. The
`list_of_arguments`

is a (possibly empty) comma-separated list
of types explained at the bottom of this page.

- We cannot deal with types for floating point numbers at the moment.
- Integers are treated as the infinite set ℤ. More precisely, we disregard integer overflows and assume that variables are only instantiated with signed integers appropriate for their type.
- So far, we did not present any methods to treat recursive programs or to prove non-termination of programs. We are currently working on extending our approach accordingly and our implementation already contains some support for recursion and non-termination.
- Except for
`malloc`and`free`, we cannot handle external functions at the moment. However, we allow to just declare functions and use them to produce non-deterministic values. We assume that functions which are only declared, but not defined, have no side effects (in particular, they do not modify their arguments). - We also do not deal with data structures (defined by
`struct`

) yet. Moreover, the handling of arrays is restricted to programs where arrays are compiled to pointers in**LLVM**. This means that statements like`int* array = alloca(4*sizeof(int))`

are supported, but statements like`int array[4]`

are not, because these statements are compiled to an**LLVM**-specific array data type. - Wie currently support
**LLVM**3.5. Later versions of**LLVM**might work, but we cannot guarantee this as**LLVM**is very actively developed. Moreover, the following**LLVM**instructions are not yet supported (most of them occur rarely if one uses no compiler optimizations, can be replaced by other ones which we support, or deal with floats or complex data structures):`switch`

`indirectbr`

`invoke`

`unwind`

`va_arg`

- all vector and aggregate instructions
- all instructions on floats

The tables below summarize our experiments. They show that our approach is more
powerful than the other tools for proving termination of the selected programs.
The second row of the table shows the results for only those 498 programs
of the example set which are not known to be non-terminating.
Here,
**T** gives the number of examples where termination was proven,
**N** gives the number of examples where non-termination was proven,
**F** means that the proof failed in less than 900 seconds or took longer than 900 seconds, and
**RT** gives the average runtime of those examples where termination could either be
proven or disproven. By clicking on the name of the example set, you can inspect the
details of the respective experiments.

Example Set | Size | AProVE | Ultimate | HipTNT+ | SeaHorn | KITTeL | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|

T | N | F | RT | T | N | F | RT | T | N | F | RT | T | N | F | RT | T | N | F | RT | ||

All Programs | 631 | 409 | 91 | 131 | 34.4 | 392 | 111 | 128 | 23.1 | 312 | 107 | 212 | 1.4 | 247 | 78 | 306 | 12.7 | 221 | 0 | 410 | 0.2 |

Terminating Programs | 498 | 409 | - | 89 | 39.9 | 392 | - | 106 | 24.6 | 310 | 1 | 187 | 1.3 | 245 | - | 253 | 10.9 | 220 | - | 278 | 0.2 |

In this experiment, we used a variant of AProVE which only checks memory safety, but does not prove termination afterwards. All three tools used the assumption that allocation of memory always succeeds. So in the table below,

Example Set | Size | AProVE | Predator | Thor | |||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

T | N | F | RT | T | N | F | RT | T | N | F | RT | ||

Memory Safety | 631 | 547 | 0 | 84 | 12.5 | 431 | 154 | 46 | 2.0 | 336 | 0 | 295 | 0.7 |

The table shows that for our examples, AProVE is the most powerful tool to prove memory safety. However, this comparison is only meant to give a rough impression of AProVE's power for analyzing memory safety. The above experiments cannot be used for a detailed comparison of the tools, since Thor does not handle explicit byte-accurate pointer arithmetic and Predator considers bounded integers, whereas AProVE assumes integers to be unbounded. For that reason, the resulting notions of memory safety are incomparable. Moreover, Predator is able to disprove memory safety, while AProVE currently can only prove but not disprove memory safety.

`Int`

The argument is an integer.`Pos`

The argument is a positive integer.`Neg`

The argument is a negative integer.`NonPos`

The argument is a non-positive integer.`NonNeg`

The argument is a non-negative integer.`Alloc`

The argument is a pointer of type`i8*`

to some allocated area. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*.`Alloc(c)`

The argument is a pointer of type`i8*`

to an allocated area of at least`c`

bytes. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*and*w ≥ v + c -1*to the knowledge base*KB*for a fresh symbolic variable*w*.`Alloc(#n)`

The argument is a pointer of type`i8*`

to an allocated area of at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*. Moreover, we add an arithmetic formula to the knowledge base*KB*of that initial state. The formula we add depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ v + u -1*to*KB*. So the size of the memory block described by*alloc(v,w)*is at least as large as the number*u*.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*. In this case, we add the formula*w ≥ v + u' - u*to*KB*. So the size of the memory block described by*alloc(v,w)*is at least as large as the length of the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.`String`

The argument is a pointer of type`i8*`

to a zero-terminated string. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w → z*to the points-to list*PT*, and*z = 0*to the knowledge base*KB*for fresh symbolic variables*w*and*z*.`String(c)`

The argument is a pointer of type`i8*`

to a zero-terminated string, where the allocated area after the string contains at least`c`

- 1 more bytes. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w' → z*to the points-to list*PT*, and the formulas*z = 0*,*v ≤ w'*,*w' ≤ w*, and*w ≥ w' + c*to the knowledge base*KB*for fresh symbolic variables*w*,*w'*, and*z*.`String(#n)`

The argument is a pointer of type`i8*`

to a zero-terminated string, where the allocated area after the string contains at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*,*w' → z*to the points-to list*PT*, and the formulas*z = 0*,*v ≤ w'*, and*w' ≤ w*to the knowledge base*KB*of that initial state. Moreover, we add another arithmetic formula to*KB*which depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ w' + u*to*KB*. So the memory block described by*alloc(v,w)*has at least*u*more bytes after the string.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*). In this case, we add the formula*w ≥ w' + u' - u*to*KB*. So the memory block described by*alloc(v,w)*consists of a string and a subsequent memory sub-block that is at least as large as the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.`<ty>[]`

The argument is a pointer of type`ty*`

to some allocated area. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*.`<ty>[c]`

The argument is a pointer of type`ty*`

to an allocated area of at least`c`

entries of type`ty`

. This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*and*w ≥ v + c * size(ty) -1*to the knowledge base*KB*for a fresh symbolic variable*w*.`<ty>[#n]`

The argument is a pointer of type`ty*`

to an allocated area of at least as many bytes as indicated by the argument with index`n`

(the first argument has index 0). This means that if*v*is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add*alloc(v,w)*to the allocation list*AL*for a fresh symbolic variable*w*. Moreover, we add an arithmetic formula to the knowledge base*KB*of that initial state. The formula we add depends on the type of the argument with index`n`

.

If the argument with index`n`

is just a number whose associated symbolic variable is*u*, then we add the formula*w ≥ v + u* size(ty) -1*to*KB*. So the size of the memory block described by*alloc(v,w)*is large enough to contain*u*entries of type*ty*.

Otherwise, the argument with index`n`

should be a pointer to some allocated memory area. So if*u*is the symbolic variable associated to the argument with index`n`

, then in the initial state we should have*alloc(u,u')*in the allocation list*AL*). In this case, we add the formula*w ≥ v + (u' - u + 1) * size(ty) - 1*to*KB*. So the size of the memory block described by*alloc(v,w)*is large enough to contain as many entries as the length of the memory block described by*alloc(u,u')*.

If none of the two cases applies, then the query is not valid.