Research Group Computer Science II | Department of Computer Science |
Experiments on Proving and Disproving Termination of Higher-Order Functions
|
In the following, we describe our experiments to evaluate the results of the paper "Proving and Disproving Termination of Higher-Order Functions". In this paper, there are two main contributions:
AProVE 1.1d-gamma participated in the Annual International Competition of Termination Tools 2004 at the International Workshop on Termination (WST '04) and it was the most powerful termination tool for TRSs in this competition. In this version of AProVE, almost all of our new results on non-termination proving were already integrated, but our contributions on handling higher-order functions were missing.
The newest version of our system, AProVE 1.2, participated in the Annual International Competition of Termination Tools 2005 and again proved to be the most powerful tool for both termination and non-termination proofs of TRSs. The collection of examples used for the competition was the Termination Problem Data Base (TPDB). It represents the current standard benchmark for termination problems in the field of term rewriting.
We performed the following four experiments:
Note that the experiments were performed in a setting which is
particularly unfortunate for a system like AProVE which is written in Java. The
reason is that AProVE was re-started on each
example. Therefore, the runtimes given in the tables
include the overhead for starting the Java virtual machine as well as for compiling
the required techniques by the just-in-time compiler again and again for each
example.
Nevertheless, we chose this method for evaluating our contributions, since it corresponds
to the setting of the competitions for termination tools.
There is a much more efficient batch mode built into the graphical user
interface of AProVE 1.2 which can
run the system on directories of examples without re-starting AProVE on each example.
AProVE 1.2 | AProVE1.1d-gamma | |
---|---|---|
YES | 43 | 13 |
NO | 8 | 7 |
MAYBE | 10 | 41 |
TOTAL | 61 | 61 |
For details on individual examples and runtimes please see the full version. |
So due to our new contributions AProVE 1.2 can prove or disprove termination
of 51 examples, while it only fails on 10 examples from the
collection. In contrast, AProVE 1.1d-gamma,
which lacks our contributions for
handling higher-order functions performs very poorly on this
collection. (For non-termination, the two versions are almost equally
powerful, since most of our contributions on disproving termination
were already implemented in AProVE 1.1d-gamma.) The experiments also
show that our component for disproving termination can, of course, also be
used for higher-order functions.
Our experiments also illustrate the differences between our approach and related work on termination proofs of higher-order functions represented in applicative first-order form. One difference is that our approach is at least as powerful as the original dependency pair technique on first-order functions (as demonstrated by the experiments on the whole TPDB). The other main difference is that we handle untyped TRSs, whereas most other work on termination of applicative TRSs deals with monomorphically typed TRSs. Obviously, if an untyped TRS terminates, then it also terminates if it is typed, but not vice versa. This indicates that proving termination of untyped TRSs is much harder than proving termination of typed TRSs. For this reason, other techniques can sometimes handle monomorphically typed versions of the functions in this collection where we fail. However, such termination proofs cannot be used to verify termination of the functions in a setting with polymorphic or no types. In other words, we regard a completely different problem, which is however very relevant, since most modern functional languages either have polymorphic or no types.
The difference between termination proofs of untyped and (monomorphically)
typed TRSs is illustrated by the following two examples
from this collection:
f ' x -> xFor the following simple types, the above TRS is terminating:
g ' x ' y ' (s ' z) -> g ' x ' y ' (x ' y ' 0)
f :: N -> NIn its untyped (or polymorphically typed) version, though, there is the following infinite reduction which is easily detected by AProVE's non-termination component (c.f. the example termMonTypes in the full version of the above table):
g :: (N -> N -> N) -> N -> N -> N
s :: N -> N
0 :: N
h :: N -> N -> N
g ' f ' s ' (f ' s ' 0) ->rule1 g ' f ' s' (s ' 0) ->rule2 g ' f ' s ' (f ' s ' 0) ->rule1 ...
ff ' x ' x -> x ' (ff ' x) ' (cons ' x ' nil)Although ff terminates on its own (cf. the example TypeEx1), the combination of foldr with ff does not terminate (cf. the example nonTerminatingFold). In other words, combining foldr with a terminating function that only shares "constructors" with foldr can introduce non-termination. (Here, "constructors" are regarded w.r.t. the higher-order formulation of foldr and ff where there is no application symbol '.) This indicates why proving termination of foldr in an untyped setting is extremely difficult, while it is very easy in a setting with monomorphic types. An example for a cyclic derivation with the above TRS is the following:
foldr ' g ' h ' nil -> h
foldr ' g ' h ' (cons ' x ' xs) -> g ' x ' (foldr ' g ' h ' xs)
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil) ->
ff ' foldr ' foldr ' (foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' nil) ->
ff ' foldr ' foldr ' (cons ' foldr ' nil) ->
foldr ' (ff ' foldr) ' (cons ' foldr ' nil) ' (cons ' foldr ' nil)
The non-termination component is indeed very powerful: for 61 of the 65 examples where AProVE cannot show full termination, it proves non-termination.
AProVE 1.2 | AProVE1.1d-gamma | |
---|---|---|
YES | 25 | 24 |
NO | 61 | 60 |
MAYBE | 4 | 6 |
TOTAL | 90 | 90 |
For details on individual examples and runtimes please see the full version. |
The first table summarizes our results for the original formulation of these TRSs. Here, AProVE 1.2 which contains our contributions for higher-order functions is more or less equally successful as AProVE 1.1d-gamma.
AProVE 1.2 | AProVE 1.1d-gamma | |
---|---|---|
YES | 52 | 51 |
NO | 0 | 0 |
MAYBE | 2 | 3 |
TOTAL | 54 | 54 |
For details on individual examples and runtimes please see the full version. |
Then we translated all examples
to fully curried form using a binary application function.
While the AProVE 1.1d-gamma
performs poorly on the applicative versions of these TRSs, AProVE 1.2
is as successful as before. Thus, using our improvements, applicative TRSs
are as
easy to handle as TRSs in functional form.
AProVE 1.2 | AProVE 1.1d-gamma | |
---|---|---|
YES | 52 | 21 |
NO | 0 | 0 |
MAYBE | 2 | 33 |
TOTAL | 138 | 138 |
For details on individual examples and runtimes please see the full version. |
Note that the performance of AProVE
1.2
is not affected
if we add the well-known higher-order function map to each of the examples.
This is not at all true for AProVE
1.1d-gamma which lacks our new contributions.
AProVE 1.2 | AProVE1.1d-gamma | |
---|---|---|
YES | 639 | 486 |
NO | 95 | 92 |
MAYBE | 104 | 260 |
TOTAL | 838 | 838 |
For details on individual examples and runtimes please see the full version. |
Our collections from the previous sections
were integrated into the most recent version of the TPDB. To
measure the effect of our improvements also for ordinary (non-applicative) TRSs,
in the following table we only regard those TRSs of the TPDB
which do not come from the collections of the previous sections, i.e., the
TPDB
without our
new examples for higher-order functions, non-termination analysis, and curried
first-order functions.
AProVE 1.2 | AProVE 1.1d-gamma | |
---|---|---|
YES | 519 | 428 |
NO | 26 | 25 |
MAYBE | 88 | 180 |
TOTAL | 633 | 633 |
For details on individual examples and runtimes please see the full version. |
The above table
demonstrates that our improvements for handling higher-order
functions also have a very positive impact for ordinary
first-order
TRSs. This is in contrast to most previous extensions of
dependency pairs to the higher-order setting.
Disclaimer | Last modified: Fri Mar 04 17:52:18 CET 2005 |