Research Group Computer Science II  Department of Computer Science RWTH

Experiments on Proving and Disproving Termination of Higher-Order Functions

LuFG Informatik II

In the following, we describe our experiments to evaluate the results of the paper "Proving and Disproving Termination of Higher-Order Functions". In this paper, there are two main contributions:



Overview on the Experiments

These two contributions are implemented in the most recent version of AProVE (AProVE 1.2) which can be downloaded here. Moreover, AProVE 1.2 can also be accessed via its web interface. To evaluate these contributions, we performed four experiments. Each of these experiments was performed both with AProVE 1.2 and with AProVE 1.1d-gamma.

AProVE 1.1d-gamma participated in the Annual International Competition of Termination Tools 2004 at the International Workshop on Termination (WST '04) and it was the most powerful termination tool for TRSs in this competition. In this version of AProVE, almost all of our new results on non-termination proving were already integrated, but our contributions on handling higher-order functions were missing.

The newest version of our system, AProVE 1.2, participated in the Annual International Competition of Termination Tools 2005 and again proved to be the most powerful tool for both termination and non-termination proofs of TRSs. The collection of examples used for the competition was the Termination Problem Data Base (TPDB). It represents the current standard benchmark for termination problems in the field of term rewriting.

We performed the following four experiments:

For each of the experiments, we provide tables summarizing the results and tables with the detailed runtimes for each example. In these detailed ("full version") tables, one can:

Computers used

All empirical data was obtained by running the command line interface of AProVE with a timeout of 60 seconds for each file. This corresponds to the way that termination tools were evaluated in the competitions for termination tools in 2004 and 2005. The computer used was an AMD Athlon 64 at 2.2 GHz running Sun's J2SE 1.5.0 under GNU/Linux 2.6.10 which is similar in speed to the computer used in the 2005 competition. Indeed, the results of AProVE 1.2 in our experiments correspond to the ones obtained during the competition 2005. The web interface runs on a different computer which might result in different runtimes.

Note that the experiments were performed in a setting which is particularly unfortunate for a system like AProVE which is written in Java. The reason is that AProVE was re-started on each example. Therefore, the runtimes given in the tables include the overhead for starting the Java virtual machine as well as for compiling the required techniques by the just-in-time compiler again and again for each example. Nevertheless, we chose this method for evaluating our contributions, since it corresponds to the setting of the competitions for termination tools. There is a much more efficient batch mode built into the graphical user interface of AProVE 1.2 which can run the system on directories of examples without re-starting AProVE on each example.




Performance on Higher-Order Functions


To evaluate the performance of our contributions on termination of higher-order functions, we tested AProVE on 61 such examples from the literature.


AProVE 1.2AProVE1.1d-gamma
YES4313
NO87
MAYBE1041
TOTAL6161
For details on individual examples and runtimes please see the full version.


So due to our new contributions AProVE 1.2 can prove or disprove termination of 51 examples, while it only fails on 10 examples from the collection. In contrast, AProVE 1.1d-gamma, which lacks our contributions for handling higher-order functions performs very poorly on this collection. (For non-termination, the two versions are almost equally powerful, since most of our contributions on disproving termination were already implemented in AProVE 1.1d-gamma.) The experiments also show that our component for disproving termination can, of course, also be used for higher-order functions.

Our experiments also illustrate the differences between our approach and related work on termination proofs of higher-order functions represented in applicative first-order form. One difference is that our approach is at least as powerful as the original dependency pair technique on first-order functions (as demonstrated by the experiments on the whole TPDB). The other main difference is that we handle untyped TRSs, whereas most other work on termination of applicative TRSs deals with monomorphically typed TRSs. Obviously, if an untyped TRS terminates, then it also terminates if it is typed, but not vice versa. This indicates that proving termination of untyped TRSs is much harder than proving termination of typed TRSs. For this reason, other techniques can sometimes handle monomorphically typed versions of the functions in this collection where we fail. However, such termination proofs cannot be used to verify termination of the functions in a setting with polymorphic or no types. In other words, we regard a completely different problem, which is however very relevant, since most modern functional languages either have polymorphic or no types.

The difference between termination proofs of untyped and (monomorphically) typed TRSs is illustrated by the following two examples from this collection:

Untyped termination is stronger than (monomorphically) typed termination

The following example shows that termination of a monomorphically typed TRS does not imply that the TRS is terminating if one uses polymorphic (or no) types. Here, x, y, and z are variables:
f ' x -> x
g ' x ' y ' (s ' z) -> g ' x ' y ' (x ' y ' 0)
For the following simple types, the above TRS is terminating:
f :: N -> N
g :: (N -> N -> N) -> N -> N -> N
s :: N -> N
0 :: N
h :: N -> N -> N
In its untyped (or polymorphically typed) version, though, there is the following infinite reduction which is easily detected by AProVE's non-termination component (c.f. the example termMonTypes in the full version of the above table):
g ' f ' s ' (f ' s ' 0) ->rule1 g ' f ' s' (s ' 0) ->rule2 g ' f ' s ' (f ' s ' 0) ->rule1 ...


Untyped termination is harder to prove than (monomorphically) typed termination

To see why termination proofs for untyped functions are harder, consider the well-known higher-order function foldr together with a terminating function ff (where g, h, x, and xs are variables):
ff ' x ' x -> x ' (ff ' x) ' (cons ' x ' nil)
foldr ' g ' h ' nil -> h
foldr ' g ' h ' (cons ' x ' xs) -> g ' x ' (foldr ' g ' h ' xs)
Although ff terminates on its own (cf. the example TypeEx1), the combination of foldr with ff does not terminate (cf. the example nonTerminatingFold). In other words, combining foldr with a terminating function that only shares "constructors" with foldr can introduce non-termination. (Here, "constructors" are regarded w.r.t. the higher-order formulation of foldr and ff where there is no application symbol '.) This indicates why proving termination of foldr in an untyped setting is extremely difficult, while it is very easy in a setting with monomorphic types. An example for a cyclic derivation with the above TRS is the following:
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil) ->
ff ' foldr ' foldr ' (foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' nil) ->
ff ' foldr ' foldr ' (cons ' foldr ' nil) ->
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil)




Performance when Proving Non-Termination

To evaluate our contributions for proving non-termination of TRSs we performed the following test. Many of the TRSs in the TPDB have an evaluation strategy (e.g., "innermost") associated with them and most of them terminate w.r.t.\ their associated strategy. To evaluate the non-termination component of AProVE, we therefore also ran the system on these TRSs where we deleted the strategy information (i.e., now one had to prove or disprove full termination).

The non-termination component is indeed very powerful: for 61 of the 65 examples where AProVE cannot show full termination, it proves non-termination.


AProVE 1.2AProVE1.1d-gamma
YES2524
NO6160
MAYBE46
TOTAL9090
For details on individual examples and runtimes please see the full version.




Performance on First-Order Functions in Applicative and Functional Representation


With our contributions, it does not matter anymore for the termination proof whether first-order functions are represented in applicative or ordinary functional form. To demonstrate this, we tested AProVE on standard first-order TRSs from the TPDB, both in applicative and ordinary functional representation.

The first table summarizes our results for the original formulation of these TRSs. Here, AProVE 1.2 which contains our contributions for higher-order functions is more or less equally successful as AProVE 1.1d-gamma.


AProVE 1.2AProVE 1.1d-gamma
YES5251
NO00
MAYBE23
TOTAL5454
For details on individual examples and runtimes please see the full version.


Then we translated all examples to fully curried form using a binary application function. While the AProVE 1.1d-gamma performs poorly on the applicative versions of these TRSs, AProVE 1.2 is as successful as before. Thus, using our improvements, applicative TRSs are as easy to handle as TRSs in functional form.


AProVE 1.2AProVE 1.1d-gamma
YES5221
NO00
MAYBE233
TOTAL138138
For details on individual examples and runtimes please see the full version.


Note that the performance of AProVE 1.2 is not affected if we add the well-known higher-order function map to each of the examples. This is not at all true for AProVE 1.1d-gamma which lacks our new contributions.



Performance on the Whole Termination Problem Data Base


Our extension which can also handle higher-order functions is at least as powerful as the original DP framework. To demonstrate this empirically, we tested AProVE on all TRSs from the TPDB where one has to prove full or innermost termination. Indeed, the following table shows that AProVE 1.2 performs significantly better than AProVE 1.1d-gamma.


AProVE 1.2AProVE1.1d-gamma
YES639486
NO9592
MAYBE104260
TOTAL838838
For details on individual examples and runtimes please see the full version.


Our collections from the previous sections were integrated into the most recent version of the TPDB. To measure the effect of our improvements also for ordinary (non-applicative) TRSs, in the following table we only regard those TRSs of the TPDB which do not come from the collections of the previous sections, i.e., the TPDB without our new examples for higher-order functions, non-termination analysis, and curried first-order functions.


AProVE 1.2AProVE 1.1d-gamma
YES519428
NO2625
MAYBE88180
TOTAL633633
For details on individual examples and runtimes please see the full version.


The above table demonstrates that our improvements for handling higher-order functions also have a very positive impact for ordinary first-order TRSs. This is in contrast to most previous extensions of dependency pairs to the higher-order setting.

Disclaimer Last modified: Fri Mar 04 17:52:18 CET 2005