Empirical Evaluation of "Automated Termination Proofs for Logic Programs by Term Rewriting"

In the paper Automated Termination Proofs for Logic Programs by Term Rewriting we present a novel non-termination preserving transformation from logic programs to term rewriting systems. In our experiments, we want to assess the power of our new transformational method compared to direct approaches and to the classical transformational method for well-moded logic programs.

Implementation in AProVE

We integrated our approach in the termination tool AProVE which implements the DP framework. It can be accessed via its web interface.

Tools

We compare our implementation of the new method with four other representative tools for termination analysis of logic programs. Thus, in our experiments we use the following five tools:

AProVE 07:
This is the version of AProVE which implements all results of this paper and was used in the termination competition 2007. It is also available through our web interface.
Polytool:
The Polytool termination analyser implements a direct approach. We ran Polytool in the configuration of the termination competition 2007.
TerminWeb 1.1:
The TerminWeb termination analyser also implements a direct approach. Thanks to Samir Genaim and Michael Codish we were able to run TerminWeb locally without interaction using a combination of the termsize (edges) and the list-length norm.
cTI 1.1pre1:
The cTI Termination Inference Engine also implements a direct approach. Thanks to Roberto Bagnara we were able to use cTI 1.1pre1 in combination with the Parma Polyhedra library 0.8. We allowed 5 iterations before widening (-n 5).
TALP:
The TALP system is a frontend to CiME implementing the classical transformational approach for well-moded logic programs. Enno Ohlebusch kindly provided us with the frontend. As a backend we used the most current version of CiME, CiME 2.02.

Examples

We ran the tools on a set of 296 examples in fully automatic mode. This set includes all logic programming examples from the Termination Problem Data Base 4.0 (TPDB 2007) which is used in the annual international termination competition. It contains collections provided by the developers of several different tools including all examples from the experimental evaluation of the paper Termination Analysis of Logic Programs through Combination of Type-Based Norms by Maurice Bruynooghe, Michael Codish, John Gallagher, Samir Genaim and Wim Vanhoof (directory "BCGGV05"). In order to evaluate the behavior of our new transformation as well as of our implementation we collected and adapted 59 further examples which are now also part of the TPDB (directory "SGST06").

To eliminate the influence of the translation from Prolog to logic programs, we removed all examples from the collections that use non-trivial built-in predicates or that are not definite logic programs after ignoring the cut operator. This was necessary as all four tools handle unknown predicates differently. TALP just removes atoms containing unknown predicates, thereby assuming that they always terminate and succeed. cTI assumes finite failure for all unknown predicates. TerminWeb throws an error when encountering unknown predicates. AProVE handles some built-ins by translation, others by removal. Where possible, atoms containing trivially terminating predicates like write and nl were simply removed from the bodies of the clauses.

The examples from the paper can be experimented with by clicking on one of the four buttons below:

Experiments & Discussion

All tools were run locally on an AMD Athlon 64 at 2.2 GHz under GNU/Linux 2.6. For each example we used a time limit of 60 seconds. This is similar to the way tools are evaluated in the annual termination competitions.

In the tables, we give the number of examples where termination could be proved ("YES"), where the tools could not prove termination within the time limit of 60 seconds ("MAYBE"), and where the proof failed due to a time-out ("KILLED"). In square brackets, we give the average runtime (in seconds) needed where the tool could prove termination and where it failed in proving termination within the time limit. Clicking on "Full Table" gives the full table with detailed proof output of the respective tools. The full table also allows to load an individual example into our web interface.

Tool YES MAYBE TIMEOUT

AProVE 232 [6.3] 57 [19.1] 7 Full Table

Polytool 204 [3.1] 82 [3.0] 10 Full Table

TerminWeb 177 [0.5] 118 [0.6] 1 Full Table

cTI 167 [0.1] 129 [0.0] 0 Full Table

TALP 163 [2.5] 112 [1.4] 21 Full Table

Tool	YES	MAYBE	TIMEOUT
AProVE	232 [6.3]	57 [19.1]	7	Full Table
Polytool	204 [3.1]	82 [3.0]	10	Full Table
TerminWeb	177 [0.5]	118 [0.6]	1	Full Table
cTI	167 [0.1]	129 [0.0]	0	Full Table
TALP	163 [2.5]	112 [1.4]	21	Full Table

As shown in the table above, AProVE succeeds on more examples than any other tool. The comparison of AProVE and TALP shows that our approach improves significantly upon the previous transformational method that TALP is based on. In particular, TALP fails for all non-well-moded programs.

The comparison with Polytool, TerminWeb, and cTI demonstrates that our new transformational approach is not only comparable in power, but usually more powerful than direct approaches. In fact, there is only a single example where one of the other tools (namely Polytool) succeeds and AProVE fails. This is a rather contrived example from the paper which we developed to demonstrate the limitations of our method. Polytool is only able to handle this example via a pre-processing step based on partial evaluation. In this example, this pre-processing step results in a trivially terminating logic program. Thus, if one combined this pre-processing with any of the other tools, then they would also be able to prove termination of this particular example. Integrating some form of partial evaluation into AProVE might be an interesting possibility for further improvement. For all other examples, AProVE can show termination whenever at least one of the other tools succeeds. Moreover, there are several examples where AProVE succeeds whereas no other tool shows the termination. These include examples where the termination proof requires a more complex orderings. For instance, termination of the example SGST06/hbal_tree.pl can be proved using a recursive path order with status and termination of talp/apt/mergesort_ap.pl is shown using matrix orders.

Note that 52 examples in this collection are known to be non-terminating, i.e., there are at most 244 terminating examples. In other words, there are only at most 12 terminating examples where AProVE did not manage to prove termination. With this performance, AProVE won the termination competition with Polytool being the second most powerful tool. The best tool for non-termination analysis of logic programs was NTI.

However, from the experiments above one should not draw the conclusion that the transformational approach is always better than the direct approach to termination analysis of logic programs. There are several extensions (e.g., termination inference, non-termination analysis, handling numerical data structures) that can currently only be handled by direct techniques and tools.

Regarding the use of term rewriting techniques for termination analysis of logic programs, it is interesting to note that the currently most powerful tool for direct termination analysis of logic programs (Polytool) implements a framework for applying techniques from term rewriting (most notably polynomial interpretations) to logic programs directly. This framework forms the basis for further extensions to other TRS-termination techniques. For example, it can be extended further by adapting also basic results of the dependency pair method to the logic programming setting. Preliminary investigations with a prototypical implementation indicate that in this way, one can prove termination of several examples where the transformational approach with AProVE currently fails.

So transformational and direct approaches both have their advantages and the most powerful solution might be to combine direct tools like Polytool with a transformational prover like AProVE which is based on the contributions of this paper. But it is clear that it is indeed beneficial to use termination techniques from TRSs for logic programs, both for direct and for transformational approaches.

In addition to the experiments described above (which compare different termination provers), we also performed experiments with several versions of AProVE in order to evaluate the different heuristics and algorithms for the computation of argument filters from the paper. The following table shows that indeed our improved type-based refinement heuristic (tb') significantly outperforms the simple improved outermost (om') and innermost (im) heuristics. In fact, all examples that could be proved terminating by any of the simple heuristics can also be proved terminating by the type-based heuristic.

Tool YES MAYBE TIMEOUT

AProVE tb' 232 [6.3] 57 [19.1] 7 Full Table

AProVE om' 219 [6.1] 68 [18.4] 9 Full Table

AProVE im 196 [6.0] 89 [19.1] 11 Full Table

Tool	YES	MAYBE	TIMEOUT
AProVE tb'	232 [6.3]	57 [19.1]	7	Full Table
AProVE om'	219 [6.1]	68 [18.4]	9	Full Table
AProVE im	196 [6.0]	89 [19.1]	11	Full Table

So far, for all experiments we used Algorithm 2 in order to compute a refined argument filter from the initial one. To evaluate the advantage of this improved algorithm over Algorithm 1, we performed experiments with the two algorithms (again using the type-based refinement heuristic tb'. The following table shows that Algorithm 2 is indeed significantly more powerful than Algorithm 1.

Tool YES MAYBE TIMEOUT

AProVE Algorithm 2 232 [6.3] 57 [19.1] 7 Full Table

AProVE Algorithm 1 212 [5.6] 74 [17.8] 0 Full Table

Tool	YES	MAYBE	TIMEOUT
AProVE Algorithm 2	232 [6.3]	57 [19.1]	7	Full Table
AProVE Algorithm 1	212 [5.6]	74 [17.8]	0	Full Table

Preliminary versions of parts of this paper appeared in the proceedings of LOPSTR '06. However, the table below clearly shows that the new results of this paper improve the power of termination analysis substantially. To this end, we compare our new implementation that uses the improved type-based refinement heuristic (tb') and the improved refinement algorithm (Algorithm 2) with the version of AProVE from the termination competition 2006 that only contains the results of the preliminary paper. To find argument filters, it uses a simple ad-hoc heuristic which turns out to be clearly disadvantageous to the new sophisticated techniques from this paper.

Tool YES MAYBE TIMEOUT

AProVE tb' 232 [6.3] 57 [19.1] 7 Full Table

AProVE [LOPSTR '06] 208 [3.9] 69 [15.4] 19 Full Table