Experimental Evaluation of "Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic"

While automated verification of imperative programs has been studied intensively, proving termination of programs with explicit pointer arithmetic fully automatically was still an open problem. To close this gap, we introduce a novel abstract domain that can track allocated memory in detail. We use it to automatically construct a symbolic execution graph that over-approximates all possible runs of the program and that can be used to prove memory safety. This graph is then transformed into an integer transition system, whose termination can be proved by standard techniques. We implemented this approach in the automated termination prover AProVE. On this webpage, we describe extensive experimental results which demonstrate the power of our approach compared to existing tools. In particular, we demonstrate its capability of analyzing C programs with pointer arithmetic that existing termination tools cannot handle.

Literature

Here, we provide links to our papers which describe our approach.

Thomas Ströder, Jürgen Giesl, Marc Brockschmidt, Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and Cornelius Aschermann
Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic, Journal of Automated Reasoning, to appear.

Thomas Ströder, Jürgen Giesl, Marc Brockschmidt, Florian Frohn, Carsten Fuhs, Jera Hensel, Peter Schneider-Kamp, and Cornelius Aschermann
Automatically Proving Termination and Memory Safety for Programs with Pointer Arithmetic, Technical Report AIB-2016-09, RWTH Aachen, Germany, 39 pages.
(Extended version containing the correctness proofs of the concrete evaluation rules)

Thomas Ströder, Jürgen Giesl, Marc Brockschmidt, Florian Frohn, Carsten Fuhs, Jera Hensel, and Peter Schneider-Kamp
Proving Termination and Memory Safety for Programs with Pointer Arithmetic, 15 pages
(Conference version)

Implementation in AProVE

We integrated the analysis of C and LLVM programs into our termination tool AProVE. The tool is available both via a web interface and for download. An additional variant of AProVE which only handles C and LLVM programs is also available via a web interface and as a ZIP archive (also containing installation instructions for external software used by AProVE, see the README file in the archive). In particular, this variant of AProVE allows to repeat the experiments below. When using AProVE's web interface, please keep in mind that the computer running the web interface is considerably slower than the one used for the experiments. Moreover, the computer used for the web interface is used by several other applications as well, so the runtimes may vary.

As in the termination category of the SV-COMP 2016 and TermComp 2015, AProVE analyzes termination of the main() function of a C program. To this end, AProVE uses the Clang compiler to translate the C program into a program in the intermediate representation of LLVM (without compiler optimizations as these are not sound for termination proving, i.e., the compiler may remove non-terminating code if this code does not contribute to the return value of a function). To compile C programs to LLVM programs, we use the following command:

clang program.c -S -emit-llvm -o program.llvm

This LLVM program is then analyzed for memory safety and termination. So in contrast to several other termination tools, AProVE only considers a program as "terminating" if it can also prove its memory safety.

If one only wants to analyze termination and memory safety of a specific C function instead of main(), this is possible by adding a query as a comment in the first line. This query comment must be of the form /*query: <function_name>(<list_of_arguments>)*/. The function_name is the name of the function to analyze. The list_of_arguments is a (possibly empty) comma-separated list of types explained at the bottom of this page.

Limitations

Our implementation currently has the following restrictions and limitations:

We cannot deal with types for floating point numbers at the moment.
Integers are treated as the infinite set ℤ. More precisely, we disregard integer overflows and assume that variables are only instantiated with signed integers appropriate for their type.
So far, we did not present any methods to treat recursive programs or to prove non-termination of programs. We are currently working on extending our approach accordingly and our implementation already contains some support for recursion and non-termination.
Except for malloc and free, we cannot handle external functions at the moment. However, we allow to just declare functions and use them to produce non-deterministic values. We assume that functions which are only declared, but not defined, have no side effects (in particular, they do not modify their arguments).
We also do not deal with data structures (defined by struct) yet. Moreover, the handling of arrays is restricted to programs where arrays are compiled to pointers in LLVM. This means that statements like int* array = alloca(4*sizeof(int)) are supported, but statements like int array[4] are not, because these statements are compiled to an LLVM-specific array data type.
Wie currently support LLVM 3.5. Later versions of LLVM might work, but we cannot guarantee this as LLVM is very actively developed. Moreover, the following LLVM instructions are not yet supported (most of them occur rarely if one uses no compiler optimizations, can be replaced by other ones which we support, or deal with floats or complex data structures):
- switch
- indirectbr
- invoke
- unwind
- va_arg
- all vector and aggregate instructions
- all instructions on floats

Examples

For our experiments, we used a collection of 631 examples. These examples have been taken from the termination category of SV-COMP 2016. You can download the examples here .

Experiments for Termination

We compared AProVE to the most powerful other tools from the termination category of SV-COMP 2016 and TermComp 2015. In addition, we included the tool KITTeL, which also analyzes LLVM programs. For the participants of the termination category of SV-COMP 2016, we used the final competition results. For HipTNT+ and KITTeL, we performed experiments on the same set of programs using a computer with an Intel Xeon with 4 cores clocked at 2.33 GHz each and 16 GB of RAM. For both, a time-out of 900 seconds for each example was used.

The tables below summarize our experiments. They show that our approach is more powerful than the other tools for proving termination of the selected programs. The second row of the table shows the results for only those 498 programs of the example set which are not known to be non-terminating. Here, T gives the number of examples where termination was proven, N gives the number of examples where non-termination was proven, F means that the proof failed in less than 900 seconds or took longer than 900 seconds, and RT gives the average runtime of those examples where termination could either be proven or disproven. By clicking on the name of the example set, you can inspect the details of the respective experiments.

Example Set Size AProVE Ultimate HipTNT+ SeaHorn KITTeL

T N F RT T N F RT T N F RT T N F RT T N F RT

All Programs 631 409 91 131 34.4 392 111 128 23.1 312 107 212 1.4 247 78 306 12.7 221 0 410 0.2

Terminating Programs 498 409 - 89 39.9 392 - 106 24.6 310 1 187 1.3 245 - 253 10.9 220 - 278 0.2

Example Set	Size	AProVE	Ultimate	HipTNT+	SeaHorn	KITTeL
T	N	F	RT	T	N	F	RT	T	N	F	RT	T	N	F	RT	T	N	F	RT
All Programs	631	409	91	131	34.4	392	111	128	23.1	312	107	212	1.4	247	78	306	12.7	221	0	410	0.2
Terminating Programs	498	409	-	89	39.9	392	-	106	24.6	310	1	187	1.3	245	-	253	10.9	220	-	278	0.2

Experiments for Memory Safety

Although our approach is targeted toward termination and only analyzes memory safety as a prerequisite for termination, it turned out that on our collection, the power of AProVE is comparable to the power of the leading tools for proving memory safety. This is shown by the following experiments, where we compared AProVE with the tools Predator and Thor. We did not run experiments with SLAyer due to the differences in the assumed C semantics. The focus of SLAyer is reasoning about the shape of mutable linked data structures. Therefore, SLAyer treats memory safety at a per-object granularity and does not handle byte-accurate pointer arithmetic. We also excluded the bounded model checker LLBMC from our experiments, as bounded model checkers only disprove, but cannot verify memory safety.

In this experiment, we used a variant of AProVE which only checks memory safety, but does not prove termination afterwards. All three tools used the assumption that allocation of memory always succeeds. So in the table below, T indicates that the tool proved memory safety, N gives the number of examples where memory safety was disproven, F means that the proof failed in less than 300 seconds or took longer than 300 seconds, and RT gives the average runtime of those examples where memory safety could either be proven or disproven. Our experiments were again performed on a computer with an Intel Xeon with 4 cores clocked at 2.33 GHz each and 16 GB of RAM, using a time-out of 300 seconds for each example. By clicking on the name of the example set, you can inspect the details of the respective experiments.

Example Set	Size	AProVE				Predator				Thor
Example Set	Size	T	N	F	RT	T	N	F	RT	T	N	F	RT
Memory Safety	631	547	0	84	12.5	431	154	46	2.0	336	0	295	0.7

The table shows that for our examples, AProVE is the most powerful tool to prove memory safety. However, this comparison is only meant to give a rough impression of AProVE's power for analyzing memory safety. The above experiments cannot be used for a detailed comparison of the tools, since Thor does not handle explicit byte-accurate pointer arithmetic and Predator considers bounded integers, whereas AProVE assumes integers to be unbounded. For that reason, the resulting notions of memory safety are incomparable. Moreover, Predator is able to disprove memory safety, while AProVE currently can only prove but not disprove memory safety.

Argument Types for Queries

Our implementation currently supports the following argument types for queries:

Int
The argument is an integer.

Pos
The argument is a positive integer.

Neg
The argument is a negative integer.

NonPos
The argument is a non-positive integer.

NonNeg
The argument is a non-negative integer.

Alloc
The argument is a pointer of type i8* to some allocated area. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w.

Alloc(c)
The argument is a pointer of type i8* to an allocated area of at least c bytes. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL and w ≥ v + c -1 to the knowledge base KB for a fresh symbolic variable w.

Alloc(#n)
The argument is a pointer of type i8* to an allocated area of at least as many bytes as indicated by the argument with index n (the first argument has index 0). This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w. Moreover, we add an arithmetic formula to the knowledge base KB of that initial state. The formula we add depends on the type of the argument with index n.

If the argument with index n is just a number whose associated symbolic variable is u, then we add the formula w ≥ v + u -1 to KB. So the size of the memory block described by alloc(v,w) is at least as large as the number u.

Otherwise, the argument with index n should be a pointer to some allocated memory area. So if u is the symbolic variable associated to the argument with index n, then in the initial state we should have alloc(u,u') in the allocation list AL. In this case, we add the formula w ≥ v + u' - u to KB. So the size of the memory block described by alloc(v,w) is at least as large as the length of the memory block described by alloc(u,u').

If none of the two cases applies, then the query is not valid.

String
The argument is a pointer of type i8* to a zero-terminated string. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL, w → z to the points-to list PT, and z = 0 to the knowledge base KB for fresh symbolic variables w and z.

String(c)
The argument is a pointer of type i8* to a zero-terminated string, where the allocated area after the string contains at least c - 1 more bytes. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL, w' → z to the points-to list PT, and the formulas z = 0, v ≤ w', w' ≤ w, and w ≥ w' + c to the knowledge base KB for fresh symbolic variables w, w', and z.

String(#n)
The argument is a pointer of type i8* to a zero-terminated string, where the allocated area after the string contains at least as many bytes as indicated by the argument with index n (the first argument has index 0). This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL, w' → z to the points-to list PT, and the formulas z = 0, v ≤ w', and w' ≤ w to the knowledge base KB of that initial state. Moreover, we add another arithmetic formula to KB which depends on the type of the argument with index n.

If the argument with index n is just a number whose associated symbolic variable is u, then we add the formula w ≥ w' + u to KB. So the memory block described by alloc(v,w) has at least u more bytes after the string.

Otherwise, the argument with index n should be a pointer to some allocated memory area. So if u is the symbolic variable associated to the argument with index n, then in the initial state we should have alloc(u,u') in the allocation list AL). In this case, we add the formula w ≥ w' + u' - u to KB. So the memory block described by alloc(v,w) consists of a string and a subsequent memory sub-block that is at least as large as the memory block described by alloc(u,u').

If none of the two cases applies, then the query is not valid.

<ty>[]
The argument is a pointer of type ty* to some allocated area. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w.

<ty>[c]
The argument is a pointer of type ty* to an allocated area of at least c entries of type ty. This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL and w ≥ v + c * size(ty) -1 to the knowledge base KB for a fresh symbolic variable w.

<ty>[#n]
The argument is a pointer of type ty* to an allocated area of at least as many bytes as indicated by the argument with index n (the first argument has index 0). This means that if v is the symbolic variable associated to the corresponding parameter, then in the initial state of the symbolic execution graph, we add alloc(v,w) to the allocation list AL for a fresh symbolic variable w. Moreover, we add an arithmetic formula to the knowledge base KB of that initial state. The formula we add depends on the type of the argument with index n.

If the argument with index n is just a number whose associated symbolic variable is u, then we add the formula w ≥ v + u* size(ty) -1 to KB. So the size of the memory block described by alloc(v,w) is large enough to contain u entries of type ty.

Otherwise, the argument with index n should be a pointer to some allocated memory area. So if u is the symbolic variable associated to the argument with index n, then in the initial state we should have alloc(u,u') in the allocation list AL). In this case, we add the formula w ≥ v + (u' - u + 1) * size(ty) - 1 to KB. So the size of the memory block described by alloc(v,w) is large enough to contain as many entries as the length of the memory block described by alloc(u,u').

If none of the two cases applies, then the query is not valid.