Empirical Evaluation of "Proving Termination by Dependency Pairs and Inductive Theorem Proving"

Current techniques and tools for automated termination analysis of term rewrite systems (TRSs) are already very powerful. However, they fail for algorithms whose termination is essentially due to an inductive argument. Therefore, in the paper "Proving Termination by Dependency Pairs and Inductive Theorem Proving" we show how to couple the dependency pair method for termination of TRSs with inductive theorem proving. As confirmed by the implementation of our new approach in the tool AProVE, now TRS termination techniques are also successful on this important class of algorithms.

In our experiments, we want to assess the power of our new termination method by running it on a collection of typical TRSs where an inductive argument is needed for the termination proof.

Implementation in AProVE

We integrated our approach into a new version of the termination tool AProVE. More precisely, in our experiments we use two new versions of AProVE (AProVE Induction Internal and AProVE Induction ACL2) that both feature our new technique of combining dependency pairs with inductive theorem proving. The version AProVE Induction Internal uses the built-in theorem prover of AProVE while AProVE Induction ACL2 uses the well-known ACL2 prover. The version AProVE Induction Internal can be accessed via the following web interface. It allows to repeat the experiments below.

Note that the computer behind the web interface is slower than the machine used for our experiments and that it is used concurrently by multiple users.

Examples and Settings

In our experiments, we used a collection of 19 typical TRSs where an inductive argument ist needed to prove termination. This collection contains several TRSs computing classical arithmetical algorithms as well as many TRSs with standard algorithms for list manipulation like sorting, reversing, etc. The tools were run on a machine with a 2.67 GHz Intel Core i7 running Linux x86_64. We used ACL2 3.6 with SBCL 1.0.29.11 as the Lisp backend. The previous version of AProVE was already the most powerful tool for termination of term rewriting at the international Termination Competition. Nevertheless, this previous AProVE version as well as all other tools in the competition failed on all 19 examples. In contrast, with a time limit of 60 seconds per example, both AProVE Induction Internal and AProVE Induction ACL2 automatically prove termination of 16 of them. At the same time, the new versions of AProVE are as successful as the previous one on the remaining examples of the Termination Problem Data Base, which is the collection of examples used in the termination competition.

Experiments

The table below summarizes our experiments. It clearly shows the need for our contribution when analyzing termination of TRSs that terminate due to an inductive argument.

Tool YES TIMEOUT

AProVE Induction Internal 16 3

AProVE Induction ACL2 16 3

Tool	YES	TIMEOUT
AProVE Induction Internal	16	3
AProVE Induction ACL2	16	3

In the following tables, each row shows the behavior of AProVE Induction Internal and AProVE Induction ACL2 on one example. The entry "YES" means that (innermost) termination of that example could be proved while "KILLED" indicates that the tool exceeded the given time limit and was therefore stopped. By clicking on the respective runtime, one can inspect the output produced by the tool (including the proof of the theorem prover). To obtain the inputs given to ACL2, click on "acl" and for ACL2's output on "out". To load an example into our web interface, just click on the corresponding button in the first column. Then you can run AProVE Induction Internal on the respective example yourself.

Arithmetical algorithms

AProVE Induction Internal		AProVE Induction ACL2
YES	12.45	YES	13.31	acl out
KILLED	70	KILLED	70
KILLED	70	KILLED	70
YES	11.53	YES	12.82	acl out
YES	14.38	YES	15.75	acl out
YES	30.09	YES	32.41	acl out	acl out
KILLED	70	KILLED	70

The algorithm div is used to compute division. The main termination argument is that if ge(x,y) and gt(y,0) hold, then minus(x,y) is smaller than x. While this (inductive) argument can be discovered and proved by AProVE, our tool fails on divhard. Here, the additional difficulty is that the minus-rules are defined via a selector function p. This could be solved if narrowing were applicable on rules instead of just dependency pairs.

For a similar reason, AProVE also fails on the algorithm log where the termination argument would be that half(s(x)) is always smaller than s(x). Again, the failure is due to the fact that half is defined via the selector p and a narrowing on rules instead of dependency pairs would be needed here.

The algorithm mod from [Walther 94] computes the modulo operation and the algorithm gcd computes the greatest common divisor. Their main termination arguments are similar to that of div. However, for gcd, we have the additional difficulty that gcd exchanges its arguments if the second argument is greater than the first one. The algorithm gcd2 is a variant from [Walther 94] where instead, one always subtracts the smaller argument from the larger argument. The algorithm gcdhard uses the mod-function instead of the minus-function in the recursive arguments of gcd. This last TRS cannot be proved terminating by AProVE, because essentially, one would have to find a suitable reduction pair which can conclude that mod is less than or equal to both of its arguments, which is hardly possible with the orderings available in AProVE.

Algorithms on lists

AProVE Induction Internal		AProVE Induction ACL2
YES	6.74	YES	6.78	acl out
YES	6.74	YES	6.67	acl out
YES	7.09	YES	6.29	acl out
YES	5.03	YES	3.99	acl out
YES	7.67	YES	7.09	acl out
YES	6.67	YES	8.92	acl out
YES	13.92	YES	5.22	acl out	acl out
YES	14.37	YES	8.97	acl out	acl out
YES	59.69	YES	53.12	acl out	acl out	acl out	acl out
YES	6.31	YES	4.34	acl out
YES	7.68	YES	3.35	acl out
YES	2.38	YES	2.65	acl out

The Termination Problem Data Base (TPDB) that is used in the International Competition of Termination Tools already featured five examples where an inductive argument is needed for the termination proof. These examples are parting01_reverse, ..., parting05_maxsort. The algorithm maxsort is the leading example from our paper. Its main termination argument is that every non-empty list contains its maximum. It is almost the same as parting05_maxsort (it only differs in the ge-rules). The algorithm minsort is analogous to maxsort, but it works via the minimum function instead of the maximum function. So here, the main argument for termination is that every non-empty list contains its minimum. Moreover, in contrast to maxsort, minsort also removes duplicates from the list (i.e., instead of delete, one uses a function remove which removes all occurrences of an element from a list). Here, minsort is the formulation of [Walther 94] and [Giesl 95], whereas parting03_minsort is from the TPDB (and this variant does not remove duplicates).

We also investigated the behavior of AProVE on two variants of maxsort. In the TRS parting04_maxsort_h, the termination proof is more complicated, since it uses a recursive identity function h in the recursive argument. The algorithm maxsortcondition uses a more natural formulation of the maxsort algorithm with conditions (which however makes the termination proof a bit harder).

Our collection also contains several variants of the quicksort algorithm. In the TRS qsort, the first element of the list is used as pivot element. Hence, here the main termination argument is that every non-empty list contains its first argument. In contrast, qsortlast takes the last element of the list as pivot element. Its main termination argument is that every non-empty list contains its last argument. Finally, qsortmiddle takes the middle element of the list as pivot element and its termination argument is that every non-empty list contains its middle argument. The proof of all these quicksort variants is possible with AProVE.

The algorithm parting01_reverse reverses a list by repeatedly adding the last element of the list in front. It terminates because every non-empty list contains its last argument. The TRS parting02_doublelist from the TPDB doubles every number in a list and terminates, since every non-empty list contains its first argument. The algorithm zerolist terminates because of the inductive property that the subtraction of x and x always yields 0.