Empirical Evaluation of "Proving Termination of Integer Term Rewriting"

When using rewrite techniques for termination analysis of programs, a main problem are pre-defined data types like integers. In the paper "Proving Termination of Integer Term Rewriting" we extend term rewriting by built-in integers and adapt the dependency pair framework to prove termination of integer term rewrite systems (ITRSs) automatically.

In our experiments, we want to assess the power of our new termination method by running it on large collections of examples from the literature and from applications. Moreover, we compare our method with the naive approach which simply translates integers into a term representation using the function symbols 0, s, pos, and neg.

Implementation in AProVE

We integrated our approach into a new version of the termination tool AProVE. In particular, this new version of AProVE allows to repeat the experiments below. It can be accessed via the following web interface.

Tools

In our experiments we use two versions of AProVE (with our new method and with a simple translation).

AProVE 08
This version of our AProVE tool incorporates all the techniques used by AProVE in the International Competition of Termination Provers 2008. With these techniques, AProVE was by far the most powerful termination analyzer for term rewriting in this competition. (It succeeded to prove termination of 995 examples, whereas the second best tool TTT2 only succeeded on 792 examples.) To apply this version of AProVE to ITRSs, integers are automatically translated into terms using the function symbols pos and neg for the sign and 0 and s for zero and the successor function.
AProVE Integer:
This version contains all the techniques of AProVE 08 and in addition, it implements all contributions of the paper "Proving Termination of Integer Term Rewriting".

Moreover, we also compare with the tool TTT2, which showed termination of 792 examples at the competition. Here, we use the same translation as for AProVE 08.

Examples and Settings

In our experiments, we tested the tools on 117 ITRSs collected from the literature and from applications. Of these 117 examples, at least 2 examples (CADE07/Ex04.itrs, new/countUpNo.itrs) are non-terminating and one example (classic/collatz.itrs) corresponds to an open termination problem. Thus, a correct analyzer should be able to prove termination of at most 115 examples and to be realistic, we can expect it to solve at most 114. It turns out that the new version of AProVE solves 104 examples, where we use a timeout of 60 seconds for each example. This is the same timeout that is usually used in the International Competitions of Termination Provers. For our experiments, the tools were run on a machine with an Intel Core 2 Quad CPU Q9450 @ 2.66GHz.

Our collection of 117 ITRSs is organized in 17 directories that correspond to the respective sources of the examples. The following list give a short description of these sources.

Beerendonk:
This collection is due to Chantal Beerendonk who collected 24 term rewrite systems corresponding to typical loops in imperative programs. These 24 TRSs are part of the Termination Problem Data Base (TPDB) that is used in the International Competition of Termination Provers. We translated the TRSs in order to obtain ITRSs operating on integers. Note that the loops corresponding to the TRSs 12.trs, 13.trs and 14.trs perform the same computations when adapted to integers. Therefore, our translation yielded only 22 ITRSs. All of these 22 systems are terminating.
CADE07:
These are the 17 examples from the experimental evaluation of Giesl, Thiemann, Swiderski, and Schneider-Kamp's paper Proving Termination by Bounded Increase as well as the 2 further examples mentioned in that paper. Originally, these TRSs operated on natural numbers. We translated the TRSs in order to obtain ITRSs operating on integers. Of these 19 ITRSs, one example (CADE07/Ex04.itrs) is known to be non-terminating.
CAV02:
These 3 examples are from Colon and Sipma's paper Practical Methods for Proving Program Termination. Similar to that paper, we ignored the array access in these examples. Here, practical1.itrs is Bubblesort, practical2.itrs is derived from McCarthy's 91 function but uses loops, and practical3.itrs is Mergesort.
CAV05:
This terminating example is from Bradley, Manna, and Sipma's paper Linear Ranking with Reachability.
classic:
These are 3 classic examples:
- collatz.itrs is an open termination problem (also known as the "3*x+1" or Syracuse problem).
- f91.itrs is McCarthy's famous 91 function (whose result is 91 if the input n is at most 101 and n - 10 otherwise).
- maxsort.itrs is an implementation of the maximum sort algorithm. To show its termination, one has to prove that each non-empty list contains its maximum.
csharp:
These 3 terminating ITRSs are the results of 3 different translations of an imperative procedure found in a C# program.
ESOP08:
This terminating ITRS is from an imperative example in Chawdhary, Cook, Gulwani, Sagiv, and Yang's paper Ranking Abstractions.
LICS04:
These 2 terminating examples are from imperative programs in Podelski and Rybalchenko's paper Transition Invariants. We omitted the other 2 examples from that paper, since they contain parallel processes and only terminate under the assumption of fairness.
new:
This collection of 13 ITRSs was developed by the authors of this paper to evaluate their implementation. It contains arithmetic functions as well as some test cases. Of these 13 ITRSs, 12 are terminating and 1 is non-terminating.
paper:
These 7 terminating ITRSs are from our new paper.
patrs:
These 29 ITRSs result from collections of Falke and Kapur accompanying their paper Dependency Pairs for Rewriting with Built-in Numbers and Semantic Data Structures at RTA08. They were developed to evaluate their method for termination analysis with Presburger arithmetic. We translated their original examples from natural to integer numbers. In these examples, (multi)sets were replaced by lists and those examples where the use of (multi)sets was essential were omitted.
PLDI06:
These 2 terminating ITRSs are from imperative examples in Cook, Podelski and Rybalchenko's paper Termination Proofs for Systems Code. We omitted the other example from that paper, since it contains pointers.
RTA08:
These 2 terminating examples are from Fuhs, Giesl, Middeldorp, Schneider-Kamp, Thiemann, and Zankl's paper Maximal Termination. The original TRSs which operated on natural numbers were adapted to integers.
SAS05:
This terminating example is from an imperative procedure in Cook, Podelski, and Rybalchenko's paper Abstraction Refinement for Termination.
TACAS01:
This terminating ITRS is from an imperative example in Colon and Sipma's paper Synthesis of Linear Ranking Functions.
VMCAI04:
These 4 terminating examples are from imperative programs in Podelski and Rybalchenko's paper A Complete Method for the Synthesis of Linear Ranking Functions.
VMCAI05:
These 4 examples are from imperative examples in Bradley, Manna, and Sipma's paper Termination of Polynomial Programs.

Experiments

The table below summarizes our experiments and clearly shows the need for our contribution when analyzing termination of integer term rewrite systems. With our new contributions, AProVE Integer can prove termination of 104 examples (i.e., of 88.9 %). In particular, it succeeds on all ITRSs mentioned in the paper. In contrast, AProVE 08, which won the last International Competition of Termination Provers 2008 for term rewriting, performed very poorly on the examples and could only prove termination of 24 of them (20.5 %). Similarly, also the tool TTT2, which scored second at the Competition of 2008, could only show termination of 6 examples (i.e., of 5.1 %). This clearly shows the enormous benefits of built-in integers in term rewriting.

Tool YES MAYBE TIMEOUT

AProVE Integer 104 0 13 Full Table

AProVE 08 24 6 87 Full Table

TTT2 6 110 1 Full Table

Tool	YES	MAYBE	TIMEOUT
AProVE Integer	104	0	13	Full Table
AProVE 08	24	6	87	Full Table
TTT2	6	110	1	Full Table

In the following tables, each row shows the behavior of the two implementations on one example. The entry "YES" means that (innermost) termination of that example could be proved by the corresponding tool while "MAYBE" states that the tool gave up without success. Finally, "KILLED" indicates that the tool exceeded the given time limit and was therefore stopped. By clicking on the respective runtime, one can inspect the output produced by the tool. To load an example into our web interface, just click on the corresponding button in the first column. Then you can run the two versions of AProVE on the respective example yourself.

AProVE Integer		AProVE 08		TTT2
YES	1.36	KILLED	60	MAYBE	5.19
YES	1.38	YES	13.12	MAYBE	5.13
YES	1.48	KILLED	60	MAYBE	5.06
YES	1.37	KILLED	60	MAYBE	5.03
YES	1.42	YES	10.43	MAYBE	5.17
YES	1.36	YES	2.54	YES	3.68
YES	1.39	YES	2.27	YES	3.78
YES	1.4	YES	8.18	MAYBE	5.08
YES	1.53	KILLED	60	MAYBE	5.2
YES	2.58	KILLED	60	MAYBE	5.7
YES	2.73	KILLED	60	MAYBE	5.27
YES	2.44	KILLED	60	MAYBE	5.26
YES	5.78	KILLED	60	MAYBE	5.14
YES	5.03	KILLED	60	MAYBE	5.2
YES	8.18	KILLED	60	MAYBE	5.3
YES	11.18	KILLED	60	MAYBE	5.19
YES	13.04	KILLED	60	MAYBE	5.22
YES	2.63	KILLED	60	MAYBE	5.33
YES	3.5	KILLED	60	MAYBE	5.27
YES	2.31	YES	10.0	MAYBE	5.18
YES	3.1	KILLED	60	MAYBE	5.18
YES	5.75	KILLED	60	MAYBE	5.27

In the above table we can see that without our new contributions, even with the technique of Proving Termination By Bounded Increase, AProVE 08 only manages to show termination for 6 out of the 22 examples. The reason is that these examples operate on integers. For the original formulation of these examples that operated on natural numbers, AProVE 08 was able to show termination for all 24 examples. On the other hand, AProVE Integer is able to handle all 22 examples on integers easily (and faster where applicable). This demonstrates that moving from natural numbers to integers makes the problems much harder for previous termination techniques in term rewriting.

AProVE Integer		AProVE 08		TTT2
YES	1.34	KILLED	60	MAYBE	5.04
YES	1.39	KILLED	60	MAYBE	5.08
YES	1.41	KILLED	60	MAYBE	5.04
YES	1.62	KILLED	60	MAYBE	5.03
YES	1.58	KILLED	60	MAYBE	5.08
YES	5.86	KILLED	60	MAYBE	5.05
YES	2.2	KILLED	60	MAYBE	5.03
YES	4.4	KILLED	60	MAYBE	5.03
YES	5.83	KILLED	60	MAYBE	5.22
YES	5.76	KILLED	60	MAYBE	5.23
YES	1.51	KILLED	60	MAYBE	5.12
YES	1.42	KILLED	60	MAYBE	5.08
YES	1.74	KILLED	60	MAYBE	5.17
YES	10.26	KILLED	60	MAYBE	5.1
KILLED	60	KILLED	60	MAYBE	5.2
KILLED	60	KILLED	60	MAYBE	5.22
YES	1.95	KILLED	60	MAYBE	5.14
KILLED	60	KILLED	60	MAYBE	5.1
YES	6.51	KILLED	60	MAYBE	5.17

The advantages of our contribution are even more pronounced in the table for the CADE07 collection. AProVE08 cannot show termination of any example within the time limit while AProVE Integer shows termination for 16 out of 19. Note that CADE07/Ex04.itrs is non-terminating.

AProVE Integer	AProVE 08	TTT2
YES	2.22	KILLED	60	MAYBE	5.18
KILLED	60	KILLED	60	MAYBE	5.24
KILLED	60	KILLED	60	MAYBE	5.87

For the examples in the CAV02 collection, AProVE 08 fails to show termination even of Bubblesort (practical1.itrs). AProVE Integer can show termination of Bubblesort, but not of the loop variant of McCarthy's 91 function (practical2.itrs -- this would probably require interval considerations) and not of Mergesort (practical3.itrs -- this example contains many function symbols of high arity and, thus, presents a challenge to the efficiency of the search for max-polynomials).

CAV05	AProVE Integer	AProVE 08	TTT2
	YES	4.96	KILLED	60	MAYBE	5.11

The CAV05 example is challenging to AProVE 08, but easily handled by AProVE Integer. This once more demonstrates the need for our contributions.

AProVE Integer	AProVE 08	TTT2
KILLED	60	KILLED	60	MAYBE	5.24
KILLED	60	KILLED	60	MAYBE	5.16
KILLED	60	KILLED	60	MAYBE	5.1

As explained above, we did not expect to be able to solve any of these 3 classic termination problems.

AProVE Integer	AProVE 08	TTT2
YES	1.7	KILLED	60	MAYBE	5.14
YES	1.63	KILLED	60	MAYBE	5.12
YES	1.58	KILLED	60	MAYBE	5.19

Once more, AProVE Integer can easily handle all 3 examples from C# while AProVE 08 fails to prove termination.

ESOP08	AProVE Integer	AProVE 08	TTT2
	YES	2.52	MAYBE	0.89	MAYBE	0

On the ESOP08 example, AProVE 08 has to give up because of the extra variable on the right-hand side while AProVE Integer can solve it easily.

LICS04	AProVE Integer	AProVE 08	TTT2
	YES	2.28	KILLED	60	MAYBE	5.19
	YES	39.34	YES	15.37	MAYBE	5.1

For LICS04, AProVE 08 can handle one example while AProVE Integer handles both.

AProVE Integer		AProVE 08		TTT2
YES	1.6	YES	8.61	MAYBE	5.22
YES	1.4	YES	2.24	MAYBE	5.09
YES	1.57	KILLED	60	MAYBE	5.16
YES	1.36	MAYBE	0.95	KILLED	60
KILLED	60	KILLED	60	MAYBE	5.27
YES	1.38	YES	9.23	MAYBE	5.11
YES	1.5	KILLED	60	MAYBE	5.06
YES	2.11	YES	4.24	YES	5.21
YES	1.39	YES	7.45	MAYBE	5.19
YES	31.66	YES	8.62	MAYBE	5.26
YES	2.14	KILLED	60	MAYBE	5.13
YES	1.27	KILLED	60	MAYBE	5.02
YES	1.3	YES	1.71	MAYBE	5.14

AProVE08 can handle 7 of the 13 new examples while AProVE Integer can handle all but the non-terminating new/countUpNo.itrs.

AProVE Integer		AProVE 08		TTT2
YES	11.38	KILLED	60	MAYBE	5.17
YES	1.93	MAYBE	0.89	MAYBE	0
YES	1.46	MAYBE	0.88	MAYBE	0
YES	1.27	YES	1.32	YES	0.22
YES	1.71	KILLED	60	MAYBE	5.17
YES	1.5	KILLED	60	MAYBE	5.1
YES	1.39	KILLED	60	MAYBE	5.86

Of the 7 examples from the paper, AProVE 08 can only prove termination of the one example where termination does not depend on integer arguments but on a list argument. AProVE Integer can easily show termination of all examples including the one which uses the Ackermann function for a gigantic bound.

AProVE Integer		AProVE 08		TTT2
YES	1.46	YES	9.64	MAYBE	5.11
YES	10.49	KILLED	60	MAYBE	5.12
YES	3.61	KILLED	60	MAYBE	5.17
YES	2.45	YES	11.88	MAYBE	5.26
YES	1.41	KILLED	60	MAYBE	5.19
YES	2.78	KILLED	60	MAYBE	5.18
YES	3.02	KILLED	60	MAYBE	5.25
YES	1.41	KILLED	60	MAYBE	5.16
YES	1.98	YES	2.41	YES	3.67
YES	2.92	YES	7.49	MAYBE	5.14
YES	25.81	YES	2.7	YES	4.9
YES	29.92	YES	8.36	MAYBE	5.08
YES	30.14	YES	8.41	MAYBE	5.16
YES	1.42	YES	8.51	MAYBE	5.11
YES	1.39	YES	8.3	MAYBE	5.08
YES	1.57	KILLED	60	MAYBE	5.2
YES	1.34	KILLED	60	MAYBE	5.13
YES	1.41	KILLED	60	MAYBE	5.11
YES	2.5	KILLED	60	MAYBE	5.11
KILLED	60	KILLED	60	MAYBE	5.16
YES	31.48	KILLED	60	MAYBE	5.17
YES	1.41	KILLED	60	MAYBE	5.08
YES	1.41	KILLED	60	MAYBE	5.16
YES	1.57	KILLED	60	MAYBE	5.14
YES	1.57	KILLED	60	MAYBE	5.26
YES	1.41	KILLED	60	MAYBE	5.1
YES	1.48	KILLED	60	MAYBE	5.09
YES	27.96	KILLED	60	MAYBE	5.08
YES	3.54	KILLED	60	MAYBE	5.28

For the patrs collection, AProVE08 can show termination for 9 out 29 examples while AProVE Integer shows termination for 28 of the 29 examples.

PLDI06	AProVE Integer	AProVE 08	TTT2
	YES	2.74	KILLED	60	MAYBE	5.3
	YES	3.15	KILLED	60	MAYBE	5.15

RTA08	AProVE Integer	AProVE 08	TTT2
	YES	2.21	KILLED	60	MAYBE	5.17
	YES	1.42	KILLED	60	MAYBE	5.14

SAS05	AProVE Integer	AProVE 08	TTT2
	YES	2.52	KILLED	60	MAYBE	5.18

TACAS01	AProVE Integer	AProVE 08	TTT2
	YES	1.72	KILLED	60	MAYBE	5.45

AProVE Integer	AProVE 08	TTT2
YES	1.58	MAYBE	0.89	MAYBE	0
YES	11.17	KILLED	60	MAYBE	5.2
YES	2.07	KILLED	60	MAYBE	5.14
YES	1.66	MAYBE	0.86	MAYBE	0

All the examples from PLDI06, RTA08, SAS05, TACAS01, and VMCAI04 can be shown terminating by AProVE Integer while AProVE 08 fails.

AProVE Integer	AProVE 08	TTT2
KILLED	60	KILLED	60	MAYBE	5.13
KILLED	60	KILLED	60	MAYBE	5.25
KILLED	60	KILLED	60	MAYBE	5.19
YES	16.11	KILLED	60	MAYBE	5.31

AProVE08 cannot show termination of any of the 4 examples from VMCAI05 while AProVE Integer at least is able to show termination of VMCAI05/poly4.itrs.