Research Group Computer Science II  Department of Computer Science 
Experiments on Proving and Disproving Termination of HigherOrder Functions

In the following, we describe our experiments to evaluate the results of the paper "Proving and Disproving Termination of HigherOrder Functions". In this paper, there are two main contributions:
AProVE 1.1dgamma participated in the Annual International Competition of Termination Tools 2004 at the International Workshop on Termination (WST '04) and it was the most powerful termination tool for TRSs in this competition. In this version of AProVE, almost all of our new results on nontermination proving were already integrated, but our contributions on handling higherorder functions were missing.
The newest version of our system, AProVE 1.2, participated in the Annual International Competition of Termination Tools 2005 and again proved to be the most powerful tool for both termination and nontermination proofs of TRSs. The collection of examples used for the competition was the Termination Problem Data Base (TPDB). It represents the current standard benchmark for termination problems in the field of term rewriting.
We performed the following four experiments:
Note that the experiments were performed in a setting which is
particularly unfortunate for a system like AProVE which is written in Java. The
reason is that AProVE was restarted on each
example. Therefore, the runtimes given in the tables
include the overhead for starting the Java virtual machine as well as for compiling
the required techniques by the justintime compiler again and again for each
example.
Nevertheless, we chose this method for evaluating our contributions, since it corresponds
to the setting of the competitions for termination tools.
There is a much more efficient batch mode built into the graphical user
interface of AProVE 1.2 which can
run the system on directories of examples without restarting AProVE on each example.
AProVE 1.2  AProVE1.1dgamma  

YES  43  13 
NO  8  7 
MAYBE  10  41 
TOTAL  61  61 
For details on individual examples and runtimes please see the full version. 
So due to our new contributions AProVE 1.2 can prove or disprove termination
of 51 examples, while it only fails on 10 examples from the
collection. In contrast, AProVE 1.1dgamma,
which lacks our contributions for
handling higherorder functions performs very poorly on this
collection. (For nontermination, the two versions are almost equally
powerful, since most of our contributions on disproving termination
were already implemented in AProVE 1.1dgamma.) The experiments also
show that our component for disproving termination can, of course, also be
used for higherorder functions.
Our experiments also illustrate the differences between our approach and related work on termination proofs of higherorder functions represented in applicative firstorder form. One difference is that our approach is at least as powerful as the original dependency pair technique on firstorder functions (as demonstrated by the experiments on the whole TPDB). The other main difference is that we handle untyped TRSs, whereas most other work on termination of applicative TRSs deals with monomorphically typed TRSs. Obviously, if an untyped TRS terminates, then it also terminates if it is typed, but not vice versa. This indicates that proving termination of untyped TRSs is much harder than proving termination of typed TRSs. For this reason, other techniques can sometimes handle monomorphically typed versions of the functions in this collection where we fail. However, such termination proofs cannot be used to verify termination of the functions in a setting with polymorphic or no types. In other words, we regard a completely different problem, which is however very relevant, since most modern functional languages either have polymorphic or no types.
The difference between termination proofs of untyped and (monomorphically)
typed TRSs is illustrated by the following two examples
from this collection:
f ' x > xFor the following simple types, the above TRS is terminating:
g ' x ' y ' (s ' z) > g ' x ' y ' (x ' y ' 0)
f :: N > NIn its untyped (or polymorphically typed) version, though, there is the following infinite reduction which is easily detected by AProVE's nontermination component (c.f. the example termMonTypes in the full version of the above table):
g :: (N > N > N) > N > N > N
s :: N > N
0 :: N
h :: N > N > N
g ' f ' s ' (f ' s ' 0) >_{rule1} g ' f ' s' (s ' 0) >_{rule2} g ' f ' s ' (f ' s ' 0) >_{rule1} ...
ff ' x ' x > x ' (ff ' x) ' (cons ' x ' nil)Although ff terminates on its own (cf. the example TypeEx1), the combination of foldr with ff does not terminate (cf. the example nonTerminatingFold). In other words, combining foldr with a terminating function that only shares "constructors" with foldr can introduce nontermination. (Here, "constructors" are regarded w.r.t. the higherorder formulation of foldr and ff where there is no application symbol '.) This indicates why proving termination of foldr in an untyped setting is extremely difficult, while it is very easy in a setting with monomorphic types. An example for a cyclic derivation with the above TRS is the following:
foldr ' g ' h ' nil > h
foldr ' g ' h ' (cons ' x ' xs) > g ' x ' (foldr ' g ' h ' xs)
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil) >
ff ' foldr ' foldr ' (foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' nil) >
ff ' foldr ' foldr ' (cons ' foldr ' nil) >
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil)
The nontermination component is indeed very powerful: for 61 of the 65 examples where AProVE cannot show full termination, it proves nontermination.
AProVE 1.2  AProVE1.1dgamma  

YES  25  24 
NO  61  60 
MAYBE  4  6 
TOTAL  90  90 
For details on individual examples and runtimes please see the full version. 
The first table summarizes our results for the original formulation of these TRSs. Here, AProVE 1.2 which contains our contributions for higherorder functions is more or less equally successful as AProVE 1.1dgamma.
AProVE 1.2  AProVE 1.1dgamma  

YES  52  51 
NO  0  0 
MAYBE  2  3 
TOTAL  54  54 
For details on individual examples and runtimes please see the full version. 
Then we translated all examples
to fully curried form using a binary application function.
While the AProVE 1.1dgamma
performs poorly on the applicative versions of these TRSs, AProVE 1.2
is as successful as before. Thus, using our improvements, applicative TRSs
are as
easy to handle as TRSs in functional form.
AProVE 1.2  AProVE 1.1dgamma  

YES  52  21 
NO  0  0 
MAYBE  2  33 
TOTAL  138  138 
For details on individual examples and runtimes please see the full version. 
Note that the performance of AProVE
1.2
is not affected
if we add the wellknown higherorder function map to each of the examples.
This is not at all true for AProVE
1.1dgamma which lacks our new contributions.
AProVE 1.2  AProVE1.1dgamma  

YES  639  486 
NO  95  92 
MAYBE  104  260 
TOTAL  838  838 
For details on individual examples and runtimes please see the full version. 
Our collections from the previous sections
were integrated into the most recent version of the TPDB. To
measure the effect of our improvements also for ordinary (nonapplicative) TRSs,
in the following table we only regard those TRSs of the TPDB
which do not come from the collections of the previous sections, i.e., the
TPDB
without our
new examples for higherorder functions, nontermination analysis, and curried
firstorder functions.
AProVE 1.2  AProVE 1.1dgamma  

YES  519  428 
NO  26  25 
MAYBE  88  180 
TOTAL  633  633 
For details on individual examples and runtimes please see the full version. 
The above table
demonstrates that our improvements for handling higherorder
functions also have a very positive impact for ordinary
firstorder
TRSs. This is in contrast to most previous extensions of
dependency pairs to the higherorder setting.
Disclaimer  Last modified: Fri Mar 04 17:52:18 CET 2005 