Empirical Evaluation of "Proving Termination of Integer Term Rewriting"

When using rewrite techniques for termination analysis of programs, a main problem are pre-defined data types like integers. In the paper "Proving Termination of Integer Term Rewriting" we extend term rewriting by built-in integers and adapt the dependency pair framework to prove termination of integer term rewrite systems (ITRSs) automatically.

In our experiments, we want to assess the power of our new termination method by running it on large collections of examples from the literature and from applications. Moreover, we compare our method with the naive approach which simply translates integers into a term representation using the function symbols 0, s, pos, and neg.


Implementation in AProVE

We integrated our approach into a new version of the termination tool AProVE. In particular, this new version of AProVE allows to repeat the experiments below. It can be accessed via the following web interface.


Tools

In our experiments we use two versions of AProVE (with our new method and with a simple translation). Moreover, we also compare with the tool TTT2, which showed termination of 792 examples at the competition. Here, we use the same translation as for AProVE 08.

Examples and Settings

In our experiments, we tested the tools on 117 ITRSs collected from the literature and from applications. Of these 117 examples, at least 2 examples (CADE07/Ex04.itrs, new/countUpNo.itrs) are non-terminating and one example (classic/collatz.itrs) corresponds to an open termination problem. Thus, a correct analyzer should be able to prove termination of at most 115 examples and to be realistic, we can expect it to solve at most 114. It turns out that the new version of AProVE solves 104 examples, where we use a timeout of 60 seconds for each example. This is the same timeout that is usually used in the International Competitions of Termination Provers. For our experiments, the tools were run on a machine with an Intel Core 2 Quad CPU Q9450 @ 2.66GHz.

Our collection of 117 ITRSs is organized in 17 directories that correspond to the respective sources of the examples. The following list give a short description of these sources.




Experiments

The table below summarizes our experiments and clearly shows the need for our contribution when analyzing termination of integer term rewrite systems. With our new contributions, AProVE Integer can prove termination of 104 examples (i.e., of 88.9 %). In particular, it succeeds on all ITRSs mentioned in the paper. In contrast, AProVE 08, which won the last International Competition of Termination Provers 2008 for term rewriting, performed very poorly on the examples and could only prove termination of 24 of them (20.5 %). Similarly, also the tool TTT2, which scored second at the Competition of 2008, could only show termination of 6 examples (i.e., of 5.1 %). This clearly shows the enormous benefits of built-in integers in term rewriting.

ToolYESMAYBETIMEOUT
AProVE Integer1040 13 Full Table
AProVE 0824687 Full Table
TTT26 110 1 Full Table

In the following tables, each row shows the behavior of the two implementations on one example. The entry "YES" means that (innermost) termination of that example could be proved by the corresponding tool while "MAYBE" states that the tool gave up without success. Finally, "KILLED" indicates that the tool exceeded the given time limit and was therefore stopped. By clicking on the respective runtime, one can inspect the output produced by the tool. To load an example into our web interface, just click on the corresponding button in the first column. Then you can run the two versions of AProVE on the respective example yourself.

BeerendonkAProVE IntegerAProVE 08TTT2
YES1.36KILLED60MAYBE5.19
YES1.38YES13.12MAYBE5.13
YES1.48KILLED60MAYBE5.06
YES1.37KILLED60MAYBE5.03
YES1.42YES10.43MAYBE5.17
YES1.36YES2.54YES3.68
YES1.39YES2.27YES3.78
YES1.4YES8.18MAYBE5.08
YES1.53KILLED60MAYBE5.2
YES2.58KILLED60MAYBE5.7
YES2.73KILLED60MAYBE5.27
YES2.44KILLED60MAYBE5.26
YES5.78KILLED60MAYBE5.14
YES5.03KILLED60MAYBE5.2
YES8.18KILLED60MAYBE5.3
YES11.18KILLED60MAYBE5.19
YES13.04KILLED60MAYBE5.22
YES2.63KILLED60MAYBE5.33
YES3.5KILLED60MAYBE5.27
YES2.31YES10.0MAYBE5.18
YES3.1KILLED60MAYBE5.18
YES5.75KILLED60MAYBE5.27

In the above table we can see that without our new contributions, even with the technique of Proving Termination By Bounded Increase, AProVE 08 only manages to show termination for 6 out of the 22 examples. The reason is that these examples operate on integers. For the original formulation of these examples that operated on natural numbers, AProVE 08 was able to show termination for all 24 examples. On the other hand, AProVE Integer is able to handle all 22 examples on integers easily (and faster where applicable). This demonstrates that moving from natural numbers to integers makes the problems much harder for previous termination techniques in term rewriting.

CADE07AProVE IntegerAProVE 08TTT2
YES1.34KILLED60MAYBE5.04
YES1.39KILLED60MAYBE5.08
YES1.41KILLED60MAYBE5.04
YES1.62KILLED60MAYBE5.03
YES1.58KILLED60MAYBE5.08
YES5.86KILLED60MAYBE5.05
YES2.2KILLED60MAYBE5.03
YES4.4KILLED60MAYBE5.03
YES5.83KILLED60MAYBE5.22
YES5.76KILLED60MAYBE5.23
YES1.51KILLED60MAYBE5.12
YES1.42KILLED60MAYBE5.08
YES1.74KILLED60MAYBE5.17
YES10.26KILLED60MAYBE5.1
KILLED60KILLED60MAYBE5.2
KILLED60KILLED60MAYBE5.22
YES1.95KILLED60MAYBE5.14
KILLED60KILLED60MAYBE5.1
YES6.51KILLED60MAYBE5.17

The advantages of our contribution are even more pronounced in the table for the CADE07 collection. AProVE08 cannot show termination of any example within the time limit while AProVE Integer shows termination for 16 out of 19. Note that CADE07/Ex04.itrs is non-terminating.

CAV02AProVE IntegerAProVE 08TTT2
YES2.22KILLED60MAYBE5.18
KILLED60KILLED60MAYBE5.24
KILLED60KILLED60MAYBE5.87

For the examples in the CAV02 collection, AProVE 08 fails to show termination even of Bubblesort (practical1.itrs). AProVE Integer can show termination of Bubblesort, but not of the loop variant of McCarthy's 91 function (practical2.itrs -- this would probably require interval considerations) and not of Mergesort (practical3.itrs -- this example contains many function symbols of high arity and, thus, presents a challenge to the efficiency of the search for max-polynomials).

CAV05AProVE IntegerAProVE 08TTT2
YES4.96KILLED60MAYBE5.11

The CAV05 example is challenging to AProVE 08, but easily handled by AProVE Integer. This once more demonstrates the need for our contributions.

classicAProVE IntegerAProVE 08TTT2
KILLED60KILLED60MAYBE5.24
KILLED60KILLED60MAYBE5.16
KILLED60KILLED60MAYBE5.1

As explained above, we did not expect to be able to solve any of these 3 classic termination problems.

csharpAProVE IntegerAProVE 08TTT2
YES1.7KILLED60MAYBE5.14
YES1.63KILLED60MAYBE5.12
YES1.58KILLED60MAYBE5.19

Once more, AProVE Integer can easily handle all 3 examples from C# while AProVE 08 fails to prove termination.

ESOP08AProVE IntegerAProVE 08TTT2
YES2.52MAYBE0.89MAYBE0

On the ESOP08 example, AProVE 08 has to give up because of the extra variable on the right-hand side while AProVE Integer can solve it easily.

LICS04AProVE IntegerAProVE 08TTT2
YES2.28KILLED60MAYBE5.19
YES39.34YES15.37MAYBE5.1

For LICS04, AProVE 08 can handle one example while AProVE Integer handles both.

newAProVE IntegerAProVE 08TTT2
YES1.6YES8.61MAYBE5.22
YES1.4YES2.24MAYBE5.09
YES1.57KILLED60MAYBE5.16
YES1.36MAYBE0.95KILLED60
KILLED60KILLED60MAYBE5.27
YES1.38YES9.23MAYBE5.11
YES1.5KILLED60MAYBE5.06
YES2.11YES4.24YES5.21
YES1.39YES7.45MAYBE5.19
YES31.66YES8.62MAYBE5.26
YES2.14KILLED60MAYBE5.13
YES1.27KILLED60MAYBE5.02
YES1.3YES1.71MAYBE5.14

AProVE08 can handle 7 of the 13 new examples while AProVE Integer can handle all but the non-terminating new/countUpNo.itrs.

paperAProVE IntegerAProVE 08TTT2
YES11.38KILLED60MAYBE5.17
YES1.93MAYBE0.89MAYBE0
YES1.46MAYBE0.88MAYBE0
YES1.27YES1.32YES0.22
YES1.71KILLED60MAYBE5.17
YES1.5KILLED60MAYBE5.1
YES1.39KILLED60MAYBE5.86

Of the 7 examples from the paper, AProVE 08 can only prove termination of the one example where termination does not depend on integer arguments but on a list argument. AProVE Integer can easily show termination of all examples including the one which uses the Ackermann function for a gigantic bound.

patrsAProVE IntegerAProVE 08TTT2
YES1.46YES9.64MAYBE5.11
YES10.49KILLED60MAYBE5.12
YES3.61KILLED60MAYBE5.17
YES2.45YES11.88MAYBE5.26
YES1.41KILLED60MAYBE5.19
YES2.78KILLED60MAYBE5.18
YES3.02KILLED60MAYBE5.25
YES1.41KILLED60MAYBE5.16
YES1.98YES2.41YES3.67
YES2.92YES7.49MAYBE5.14
YES25.81YES2.7YES4.9
YES29.92YES8.36MAYBE5.08
YES30.14YES8.41MAYBE5.16
YES1.42YES8.51MAYBE5.11
YES1.39YES8.3MAYBE5.08
YES1.57KILLED60MAYBE5.2
YES1.34KILLED60MAYBE5.13
YES1.41KILLED60MAYBE5.11
YES2.5KILLED60MAYBE5.11
KILLED60KILLED60MAYBE5.16
YES31.48KILLED60MAYBE5.17
YES1.41KILLED60MAYBE5.08
YES1.41KILLED60MAYBE5.16
YES1.57KILLED60MAYBE5.14
YES1.57KILLED60MAYBE5.26
YES1.41KILLED60MAYBE5.1
YES1.48KILLED60MAYBE5.09
YES27.96KILLED60MAYBE5.08
YES3.54KILLED60MAYBE5.28

For the patrs collection, AProVE08 can show termination for 9 out 29 examples while AProVE Integer shows termination for 28 of the 29 examples.

PLDI06AProVE IntegerAProVE 08TTT2
YES2.74KILLED60MAYBE5.3
YES3.15KILLED60MAYBE5.15

RTA08AProVE IntegerAProVE 08TTT2
YES2.21KILLED60MAYBE5.17
YES1.42KILLED60MAYBE5.14

SAS05AProVE IntegerAProVE 08TTT2
YES2.52KILLED60MAYBE5.18

TACAS01AProVE IntegerAProVE 08TTT2
YES1.72KILLED60MAYBE5.45

VMCAI04AProVE IntegerAProVE 08TTT2
YES1.58MAYBE0.89MAYBE0
YES11.17KILLED60MAYBE5.2
YES2.07KILLED60MAYBE5.14
YES1.66MAYBE0.86MAYBE0

All the examples from PLDI06, RTA08, SAS05, TACAS01, and VMCAI04 can be shown terminating by AProVE Integer while AProVE 08 fails.

VMCAI05AProVE IntegerAProVE 08TTT2
KILLED60KILLED60MAYBE5.13
KILLED60KILLED60MAYBE5.25
KILLED60KILLED60MAYBE5.19
YES16.11KILLED60MAYBE5.31

AProVE08 cannot show termination of any of the 4 examples from VMCAI05 while AProVE Integer at least is able to show termination of VMCAI05/poly4.itrs.