Experiments on Proving and Disproving Termination of Higher-Order Functions

In the following, we describe our experiments to evaluate the results of the paper "Proving and Disproving Termination of Higher-Order Functions". In this paper, there are two main contributions:

We developed a technique to handle higher-order functions in the DP framework, where these functions are represented as untyped applicative first-order TRSs.
We presented a method to prove non-termination within the DP framework.

Overview on the Experiments

These two contributions are implemented in the most recent version of AProVE (AProVE 1.2) which can be downloaded here. Moreover, AProVE 1.2 can also be accessed via its web interface. To evaluate these contributions, we performed four experiments. Each of these experiments was performed both with AProVE 1.2 and with AProVE 1.1d-gamma.

AProVE 1.1d-gamma participated in the Annual International Competition of Termination Tools 2004 at the International Workshop on Termination (WST '04) and it was the most powerful termination tool for TRSs in this competition. In this version of AProVE, almost all of our new results on non-termination proving were already integrated, but our contributions on handling higher-order functions were missing.

The newest version of our system, AProVE 1.2, participated in the Annual International Competition of Termination Tools 2005 and again proved to be the most powerful tool for both termination and non-termination proofs of TRSs. The collection of examples used for the competition was the Termination Problem Data Base (TPDB). It represents the current standard benchmark for termination problems in the field of term rewriting.

We performed the following four experiments:

Performance on Higher-Order Functions
To evaluate the performance of our contributions on termination of higher-order functions, we tested AProVE on many examples from the literature.

Performance when Proving Non-Termination
To evaluate the non-termination component of AProVE, we ran the system on TRSs from the Termination Problem Data Base (TPDB) where we deleted the strategy information (i.e., now one had to prove or disprove full termination instead of "innermost"' or "context-sensitive" termination).

Performance on First-Order Functions in Applicative and Functional Representation
To evaluate our performance on first-order functions in applicative form, we tested AProVE on standard first-order TRSs from the Termination Problem Data Base (TPDB), both in applicative and ordinary functional representation.

Performance on the Whole Termination Problem Data Base
Our extension which can also handle higher-order functions is at least as powerful as the original DP framework on ordinary first-order TRSs. To demonstrate this, we tested AProVE on all termination and innermost termination problems for TRSs from the Termination Problem Data Base (TPDB).

For each of the experiments, we provide tables summarizing the results and tables with the detailed runtimes for each example. In these detailed ("full version") tables, one can:

download all examples as a tar.gz-file
click and inspect each example individually
click on the stars in the table to see the proof generated by AProVE automatically
submit each example to the web frontend of AProVE and let AProVE run on it online
edit each example in the web interface of AProVE and let AProVE run on it online afterwards

Computers used

All empirical data was obtained by running the command line interface of AProVE with a timeout of 60 seconds for each file. This corresponds to the way that termination tools were evaluated in the competitions for termination tools in 2004 and 2005. The computer used was an AMD Athlon 64 at 2.2 GHz running Sun's J2SE 1.5.0 under GNU/Linux 2.6.10 which is similar in speed to the computer used in the 2005 competition. Indeed, the results of AProVE 1.2 in our experiments correspond to the ones obtained during the competition 2005. The web interface runs on a different computer which might result in different runtimes.

Note that the experiments were performed in a setting which is particularly unfortunate for a system like AProVE which is written in Java. The reason is that AProVE was re-started on each example. Therefore, the runtimes given in the tables include the overhead for starting the Java virtual machine as well as for compiling the required techniques by the just-in-time compiler again and again for each example. Nevertheless, we chose this method for evaluating our contributions, since it corresponds to the setting of the competitions for termination tools. There is a much more efficient batch mode built into the graphical user interface of AProVE 1.2 which can run the system on directories of examples without re-starting AProVE on each example.

Performance on Higher-Order Functions

To evaluate the performance of our contributions on termination of higher-order functions, we tested AProVE on 61 such examples from the literature.

AProVE 1.2 AProVE1.1d-gamma

YES 43 13

NO 8 7

MAYBE 10 41

TOTAL 61 61

For details on individual examples and runtimes please see the full version.

	AProVE 1.2	AProVE1.1d-gamma
YES	43	13
NO	8	7
MAYBE	10	41

TOTAL	61	61

For details on individual examples and runtimes please see the full version.

So due to our new contributions AProVE 1.2 can prove or disprove termination of 51 examples, while it only fails on 10 examples from the collection. In contrast, AProVE 1.1d-gamma, which lacks our contributions for handling higher-order functions performs very poorly on this collection. (For non-termination, the two versions are almost equally powerful, since most of our contributions on disproving termination were already implemented in AProVE 1.1d-gamma.) The experiments also show that our component for disproving termination can, of course, also be used for higher-order functions.

Our experiments also illustrate the differences between our approach and related work on termination proofs of higher-order functions represented in applicative first-order form. One difference is that our approach is at least as powerful as the original dependency pair technique on first-order functions (as demonstrated by the experiments on the whole TPDB). The other main difference is that we handle untyped TRSs, whereas most other work on termination of applicative TRSs deals with monomorphically typed TRSs. Obviously, if an untyped TRS terminates, then it also terminates if it is typed, but not vice versa. This indicates that proving termination of untyped TRSs is much harder than proving termination of typed TRSs. For this reason, other techniques can sometimes handle monomorphically typed versions of the functions in this collection where we fail. However, such termination proofs cannot be used to verify termination of the functions in a setting with polymorphic or no types. In other words, we regard a completely different problem, which is however very relevant, since most modern functional languages either have polymorphic or no types.

The difference between termination proofs of untyped and (monomorphically) typed TRSs is illustrated by the following two examples from this collection:

Untyped termination is stronger than (monomorphically) typed termination

The following example shows that termination of a monomorphically typed TRS does not imply that the TRS is terminating if one uses polymorphic (or no) types. Here, x, y, and z are variables:

f ' x -> x
g ' x ' y ' (s ' z) -> g ' x ' y ' (x ' y ' 0)

For the following simple types, the above TRS is terminating:

f :: N -> N
g :: (N -> N -> N) -> N -> N -> N
s :: N -> N
0 :: N
h :: N -> N -> N

In its untyped (or polymorphically typed) version, though, there is the following infinite reduction which is easily detected by AProVE's non-termination component (c.f. the example termMonTypes in the full version of the above table):

g ' f ' s ' (f ' s ' 0) ->_rule1 g ' f ' s' (s ' 0) ->_rule2 g ' f ' s ' (f ' s ' 0) ->_rule1 ...

Untyped termination is harder to prove than (monomorphically) typed termination

To see why termination proofs for untyped functions are harder, consider the well-known higher-order function foldr together with a terminating function ff (where g, h, x, and xs are variables):

ff ' x ' x -> x ' (ff ' x) ' (cons ' x ' nil)
foldr ' g ' h ' nil -> h
foldr ' g ' h ' (cons ' x ' xs) -> g ' x ' (foldr ' g ' h ' xs)

Although ff terminates on its own (cf. the example TypeEx1), the combination of foldr with ff does not terminate (cf. the example nonTerminatingFold). In other words, combining foldr with a terminating function that only shares "constructors" with foldr can introduce non-termination. (Here, "constructors" are regarded w.r.t. the higher-order formulation of foldr and ff where there is no application symbol '.) This indicates why proving termination of foldr in an untyped setting is extremely difficult, while it is very easy in a setting with monomorphic types. An example for a cyclic derivation with the above TRS is the following:

foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil) ->
ff ' foldr ' foldr ' (foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' nil) ->
ff ' foldr ' foldr ' (cons ' foldr ' nil) ->
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil)

Performance when Proving Non-Termination

To evaluate our contributions for proving non-termination of TRSs we performed the following test. Many of the TRSs in the TPDB have an evaluation strategy (e.g., "innermost") associated with them and most of them terminate w.r.t.\ their associated strategy. To evaluate the non-termination component of AProVE, we therefore also ran the system on these TRSs where we deleted the strategy information (i.e., now one had to prove or disprove full termination).

The non-termination component is indeed very powerful: for 61 of the 65 examples where AProVE cannot show full termination, it proves non-termination.

AProVE 1.2 AProVE1.1d-gamma

YES 25 24

NO 61 60

MAYBE 4 6

TOTAL 90 90

For details on individual examples and runtimes please see the full version.

	AProVE 1.2	AProVE1.1d-gamma
YES	25	24
NO	61	60
MAYBE	4	6

TOTAL	90	90

For details on individual examples and runtimes please see the full version.

Performance on First-Order Functions in Applicative and Functional Representation

With our contributions, it does not matter anymore for the termination proof whether first-order functions are represented in applicative or ordinary functional form. To demonstrate this, we tested AProVE on standard first-order TRSs from the TPDB, both in applicative and ordinary functional representation.

The first table summarizes our results for the original formulation of these TRSs. Here, AProVE 1.2 which contains our contributions for higher-order functions is more or less equally successful as AProVE 1.1d-gamma.

AProVE 1.2 AProVE 1.1d-gamma

YES 52 51

NO 0 0

MAYBE 2 3

TOTAL 54 54

For details on individual examples and runtimes please see the full version.

	AProVE 1.2	AProVE 1.1d-gamma
YES	52	51
NO	0	0
MAYBE	2	3

TOTAL	54	54

For details on individual examples and runtimes please see the full version.

Then we translated all examples to fully curried form using a binary application function. While the AProVE 1.1d-gamma performs poorly on the applicative versions of these TRSs, AProVE 1.2 is as successful as before. Thus, using our improvements, applicative TRSs are as easy to handle as TRSs in functional form.

AProVE 1.2 AProVE 1.1d-gamma

YES 52 21

NO 0 0

MAYBE 2 33

TOTAL 138 138

For details on individual examples and runtimes please see the full version.

	AProVE 1.2	AProVE 1.1d-gamma
YES	52	21
NO	0	0
MAYBE	2	33

TOTAL	138	138

For details on individual examples and runtimes please see the full version.

Note that the performance of AProVE 1.2 is not affected if we add the well-known higher-order function map to each of the examples. This is not at all true for AProVE 1.1d-gamma which lacks our new contributions.

Performance on the Whole Termination Problem Data Base

Our extension which can also handle higher-order functions is at least as powerful as the original DP framework. To demonstrate this empirically, we tested AProVE on all TRSs from the TPDB where one has to prove full or innermost termination. Indeed, the following table shows that AProVE 1.2 performs significantly better than AProVE 1.1d-gamma.

AProVE 1.2 AProVE1.1d-gamma

YES 639 486

NO 95 92

MAYBE 104 260

TOTAL 838 838

For details on individual examples and runtimes please see the full version.

	AProVE 1.2	AProVE1.1d-gamma
YES	639	486
NO	95	92
MAYBE	104	260

TOTAL	838	838

For details on individual examples and runtimes please see the full version.

Our collections from the previous sections were integrated into the most recent version of the TPDB. To measure the effect of our improvements also for ordinary (non-applicative) TRSs, in the following table we only regard those TRSs of the TPDB which do not come from the collections of the previous sections, i.e., the TPDB without our new examples for higher-order functions, non-termination analysis, and curried first-order functions.

AProVE 1.2 AProVE 1.1d-gamma

YES 519 428

NO 26 25

MAYBE 88 180

TOTAL 633 633

For details on individual examples and runtimes please see the full version.

	AProVE 1.2	AProVE 1.1d-gamma
YES	519	428
NO	26	25
MAYBE	88	180

TOTAL	633	633

For details on individual examples and runtimes please see the full version.

The above table demonstrates that our improvements for handling higher-order functions also have a very positive impact for ordinary first-order TRSs. This is in contrast to most previous extensions of dependency pairs to the higher-order setting.

Disclaimer Last modified: Fri Mar 04 17:52:18 CET 2005