When using rewrite techniques
for termination analysis of programs,
a main problem are pre-defined data types like integers.
In the paper "Proving Termination of Integer Term Rewriting" we extend term rewriting by built-in integers and adapt the dependency pair framework to prove termination of integer term
rewrite systems (ITRSs) automatically.
In our experiments, we want to assess the power of our new termination method
by running it on large collections of examples from the literature and from applications.
Moreover, we compare our method
with the naive approach
which simply translates integers into a term representation using the function symbols 0, s, pos, and neg.
Implementation in AProVE
We integrated our approach into a new version of the
termination tool AProVE. In particular, this
new version of AProVE allows to repeat the experiments
below. It can be accessed via the following web
interface.
Tools
In our experiments we use two versions of AProVE (with our
new method and with a simple translation).
- AProVE 08
This version of our AProVE tool incorporates
all the techniques used by AProVE in the International Competition of Termination Provers 2008. With these techniques,
AProVE was by far the most powerful termination analyzer for term rewriting in this competition. (It succeeded to prove
termination of 995 examples,
whereas the second best tool TTT2 only succeeded on 792 examples.) To apply this version of AProVE to ITRSs, integers are automatically translated into terms using the function symbols
pos and neg for the sign
and 0 and s for zero and the successor function.
- AProVE Integer:
This version contains all the techniques of AProVE 08 and in addition,
it implements all contributions of the paper "Proving Termination of Integer Term Rewriting".
Moreover, we also compare with the tool TTT2, which showed
termination of 792 examples at the competition. Here,
we use the same translation as for AProVE 08.
Examples and Settings
In our experiments, we tested the tools on 117 ITRSs collected from the literature and from applications.
Of these 117 examples, at least 2 examples (CADE07/Ex04.itrs, new/countUpNo.itrs)
are non-terminating and one example (classic/collatz.itrs) corresponds to an
open termination problem. Thus, a correct analyzer should be able to prove termination of at most
115 examples and to be realistic, we can
expect it to solve at most 114. It turns out that the new version of AProVE solves 104 examples, where we use a timeout of 60 seconds for each example. This is the same timeout that is usually used in the International Competitions of Termination Provers.
For our experiments, the tools were run on a machine with an Intel Core 2 Quad CPU Q9450 @ 2.66GHz.
Our collection of 117 ITRSs is organized in 17 directories that correspond to the respective sources of the examples.
The following list give a short description of these sources.
- Beerendonk:
This collection is due to Chantal Beerendonk who collected 24 term rewrite systems corresponding
to typical loops in imperative programs. These 24 TRSs are part of the Termination Problem Data Base (TPDB) that is used in the International Competition of Termination Provers. We translated the TRSs in order to obtain ITRSs operating on integers.
Note that the loops corresponding to the TRSs
12.trs, 13.trs and 14.trs perform the same computations when adapted to integers. Therefore,
our translation yielded only 22 ITRSs.
All of these 22 systems are terminating.
- CADE07:
These are the 17 examples from the experimental evaluation of
Giesl, Thiemann, Swiderski, and Schneider-Kamp's paper Proving Termination by Bounded Increase as well as the 2 further examples mentioned in that paper. Originally, these TRSs operated on natural numbers. We translated the TRSs in order to obtain ITRSs operating on integers. Of these 19 ITRSs, one example (CADE07/Ex04.itrs) is known to be non-terminating.
- CAV02:
These 3 examples are from Colon and Sipma's paper Practical Methods for Proving Program Termination. Similar to that paper,
we ignored the array access in these
examples.
Here, practical1.itrs is Bubblesort, practical2.itrs is derived from McCarthy's 91 function but uses loops, and
practical3.itrs is Mergesort.
- CAV05:
This terminating example is from Bradley, Manna, and Sipma's paper Linear Ranking with Reachability.
- classic:
These are 3 classic examples:
- collatz.itrs is an open termination problem (also known as the "3*x+1" or Syracuse problem).
- f91.itrs is McCarthy's famous 91 function (whose result is 91 if the input n is at most 101 and n - 10 otherwise).
- maxsort.itrs is an implementation of the maximum sort algorithm. To show its termination,
one has to prove that each non-empty list contains its maximum.
- csharp:
These 3 terminating
ITRSs are the results of 3 different translations of an imperative procedure found in a C# program.
- ESOP08:
This terminating ITRS is from an imperative example in Chawdhary, Cook, Gulwani, Sagiv, and Yang's paper Ranking Abstractions.
- LICS04:
These 2 terminating examples are from imperative programs in Podelski and Rybalchenko's paper Transition Invariants. We omitted the other 2 examples from that paper,
since they contain parallel processes and only terminate under the
assumption of fairness.
- new:
This collection of 13 ITRSs was developed by the authors of this paper to evaluate their implementation.
It contains arithmetic functions as well as some test cases. Of these 13 ITRSs, 12 are terminating and 1 is non-terminating.
- paper:
These 7 terminating ITRSs are from our new paper.
- patrs:
These 29 ITRSs result from collections of Falke and Kapur
accompanying their paper Dependency Pairs for Rewriting with Built-in Numbers and Semantic Data Structures at RTA08. They were developed to evaluate their method for termination analysis
with Presburger arithmetic. We translated their original examples from natural to integer numbers.
In these examples, (multi)sets
were replaced by lists and those examples where the use of (multi)sets was
essential were omitted.
- PLDI06:
These 2 terminating ITRSs are from imperative examples in Cook, Podelski and Rybalchenko's paper Termination Proofs for Systems Code. We omitted the other example from that paper, since it contains pointers.
- RTA08:
These 2 terminating examples are from
Fuhs, Giesl, Middeldorp, Schneider-Kamp, Thiemann, and Zankl's
paper Maximal Termination. The original TRSs which operated on natural numbers were adapted to integers.
- SAS05:
This terminating example is from an imperative procedure in Cook, Podelski, and Rybalchenko's paper Abstraction Refinement for Termination.
- TACAS01:
This terminating ITRS is from an imperative example in Colon and Sipma's paper Synthesis of Linear Ranking Functions.
- VMCAI04:
These 4 terminating examples are from imperative programs in Podelski and Rybalchenko's paper A Complete Method for the Synthesis of Linear Ranking Functions.
- VMCAI05:
These 4 examples are from imperative examples in Bradley, Manna, and Sipma's paper Termination of Polynomial Programs.
Experiments
The table below summarizes our experiments and clearly shows the need for our contribution
when analyzing termination of integer term rewrite systems.
With our new contributions, AProVE Integer can prove termination
of 104 examples (i.e., of 88.9 %). In particular, it succeeds on all ITRSs mentioned
in the paper. In contrast, AProVE 08, which won the last
International Competition of Termination Provers 2008 for term rewriting,
performed very poorly on the examples and could only prove termination of 24
of them (20.5 %).
Similarly, also the tool TTT2, which scored second at the Competition
of 2008, could only show termination of 6 examples (i.e., of 5.1 %).
This clearly shows the enormous benefits of built-in integers in
term rewriting.
In the following tables, each row shows the behavior of the two implementations on one example. The entry "YES" means that (innermost) termination of that example could
be proved by the corresponding tool while "MAYBE" states that the tool gave up
without success. Finally, "KILLED" indicates that the tool exceeded the given time limit and was therefore stopped. By clicking on the respective runtime, one can inspect the output produced by the tool. To load an example into our web interface,
just click on the corresponding button in the first column. Then you can run the two versions of AProVE on the respective example yourself.
In the above table we can see that without our new contributions, even with the technique of Proving Termination By Bounded Increase, AProVE 08 only manages to show termination for 6 out of the 22 examples. The reason is that these examples operate on integers.
For the original formulation of these examples that operated on natural numbers,
AProVE 08 was able to show termination for all 24 examples. On the other hand, AProVE Integer is able
to handle all 22 examples on integers easily (and faster where applicable). This demonstrates that moving from natural numbers to integers makes the problems much harder for previous termination techniques in term rewriting.
The advantages of our contribution are even more pronounced in the table for the CADE07 collection. AProVE08 cannot
show termination of any example within the time limit while AProVE Integer shows termination for 16 out of 19.
Note that CADE07/Ex04.itrs is non-terminating.
For the examples in the CAV02 collection, AProVE 08 fails to show termination even of Bubblesort (practical1.itrs).
AProVE Integer can show termination of Bubblesort, but not of the loop variant of McCarthy's 91 function (practical2.itrs -- this would probably require interval considerations) and not of Mergesort (practical3.itrs -- this example contains
many function symbols of high arity and, thus, presents a challenge to the efficiency of the search for max-polynomials).
The CAV05 example is challenging to AProVE 08, but easily handled by AProVE Integer. This once more demonstrates the
need for our contributions.
As explained above, we did not expect to be able to solve any of these 3 classic
termination problems.
Once more, AProVE Integer can easily handle all 3 examples from C# while AProVE 08 fails to prove
termination.
On the ESOP08 example, AProVE 08 has to give up because of the extra variable on the
right-hand side while AProVE Integer can solve it easily.
For LICS04, AProVE 08 can handle one example while AProVE Integer handles both.
AProVE08 can handle 7 of the 13 new examples while AProVE Integer can handle all but the non-terminating
new/countUpNo.itrs.
Of the 7 examples from the paper, AProVE 08 can only prove termination of the one example where
termination does not depend on integer arguments but on a list argument. AProVE Integer can
easily show termination of all examples including the one which uses the Ackermann function
for a gigantic bound.
For the patrs collection, AProVE08 can show termination for 9 out 29 examples while AProVE Integer
shows termination for 28 of the 29 examples.
All the examples from PLDI06, RTA08, SAS05, TACAS01, and VMCAI04 can be shown terminating by AProVE Integer
while AProVE 08 fails.
AProVE08 cannot show termination of any of the 4 examples from VMCAI05 while
AProVE Integer at least is able to show termination of VMCAI05/poly4.itrs.