Empirical Evaluation of "Lazy Abstraction for Size-Change Termination"

Size-change termination is a widely used means of proving termination where source programs are first abstracted to size-change graphs which are then analyzed to determine if they satisfy the size-change termination property. Here, the choice of the abstraction is crucial to the success of the method, and it is an open problem how to choose an abstraction such that no critical loss of precision occurs. In the paper "Lazy Abstraction for Size-Change Termination" we show how to couple the search for a suitable abstraction and the test for size-change termination via an encoding to a single SAT instance. In this way, the problem of choosing the right abstraction is solved en passant by a SAT solver. We show that for the setting of term rewriting, the integration of this approach into the dependency pair framework works smoothly and gives rise to a new class of size-change reduction pairs.

In our experiments, we want to assess the power of size-change reduction pairs by extensive experiments on a large collection of examples from the literature and from applications. Moreover, we compare our new approach of size-change termination in the term rewriting context to the previous work of Thiemann and Giesl.

Implementation in AProVE

We have implemented our contributions in the automated termination prover AProVE. To experiment with this new version of AProVE, please use the special web interface.

Tools

To assess the impact of our contributions, in our experiments we compare three configurations of AProVE.

AProVE SCNP
In the first, we use SCNP reduction pairs in the reduction pair processor of the DP framework. This configuration is parameterized by the choice whether we allow just max comparisons of multisets or all four multiset extensions from Def. 7 of our paper. Moreover, the configuration is also parameterized by the choice whether we use classical size-change graphs or extended size-change graphs. In an extended size-change graph, to compare s = F(s₁,...,s_n) with t = G(t₁,...,t_m), the source and target vertices {s₁, ..., s_n} and {t₁, ..., t_m} are extended by additional vertices s and t, respectively. Now an edge from s to t_j indicates that the whole term s is greater (or equal) to t_j, etc. So these additional vertices also allow us to compare the whole terms s and t. By adding these vertices, size-change termination incorporates the standard comparison of terms as well.
AProVE reduction pairs:
In the second configuration, we use the base orders directly in the reduction pair processor (i.e., here we disregard SCNP reduction pairs).
AProVE SCT:
In the third configuration, we use the implementation of the SCT method as described in previous work of Thiemann and Giesl. For a fair comparison, we updated that old implementation from the DP approach to the modular DP framework and used SAT encodings for the base orders. While this approach only uses the embedding order and argument filters as the base order for the construction of size-change graphs, it uses more complex orders (containing the base order) to weakly orient the rules from the TRS.

Examples and Settings

In our experiments, we considered all 1381 examples from the standard TRS category of the Termination Problem Data Base (TPDB version 7.0.2) as used in the International Termination Competition 2009.

The experiments were run on a 2.66 GHz Intel Core 2 Quad and as in the termination competition, we used a time limit of 60 seconds per example. We applied SAT4J to transform propositional formulas to conjunctive normal form and the SAT solver MiniSAT2 to check the satisfiability of the resulting formulas.

Experiments

The experiments in the table below compare the power and runtimes of the three configurations depending on the choice of the base order. The column "order" indicates the base order: embedding order with argument filters (EMB), lexicographic path order with arbitrary permutations and argument filters (LPO), recursive path order with argument filters (RPO), and linear polynomial interpretations with coefficients from {0,1} (POLO). For the first configuration, we used three different settings: full SCNP reduction pairs with extended size-change graphs ("SCNP all"), SCNP reduction pairs restricted to max comparisons with extended size-change graphs ("SCNP max"), and SCNP reduction pairs restricted to max comparisons and non-extended size-change graphs ("SCNP fast"). The second and third configuration are called "reduction pairs" and "SCT", respectively. For each experiment, we give the number of TRSs which could be proved terminating ("proved") and the analysis time in seconds for running AProVE on all 1381 TRSs ("runtime"). The "best" numbers are always printed in bold. Click on the column headers to view a detailed table with the individual results of the column. Click on "[Details]" to the right of a row to view a detailed table with the individual results of the row.

order SCNP fast SCNP max SCNP all reduction pairs SCT

EMB proved
runtime 346
2882.6 346
3306.4 347
3628.5 325
2891.3 341
10065.4 [Details]

LPO proved
runtime 500
3093.7 530
5985.5 527
7739.2 505
3698.4 385
10015.5 [Details]

RPO proved
runtime 501
3222.2 531
6384.1 531
8118.0 527
4027.5 385
10053.4 [Details]

POLO proved
runtime 477
3153.6 514
5273.6 514
7124.4 511
2941.7 378
9974.0 [Details]

order		SCNP fast	SCNP max	SCNP all	reduction pairs	SCT
EMB	proved runtime	346 2882.6	346 3306.4	347 3628.5	325 2891.3	341 10065.4	[Details]
LPO	proved runtime	500 3093.7	530 5985.5	527 7739.2	505 3698.4	385 10015.5	[Details]
RPO	proved runtime	501 3222.2	531 6384.1	531 8118.0	527 4027.5	385 10053.4	[Details]
POLO	proved runtime	477 3153.6	514 5273.6	514 7124.4	511 2941.7	378 9974.0	[Details]

The table allows for the following main observations:

Our SCNP reduction pairs are much more powerful and significantly faster than the implementation of Thiemann and Giesl's approach (column "SCT"). By integrating the search for the base order with SCNP, our new implementation can use a much larger class of base orders and thus, SCNP reduction pairs can prove significantly more examples. The reason for the relatively low speed of "SCT" is that this approach iterates through argument filters and then generates and analyzes size-change graphs for each of these argument filters. (So the low speed is not due to the repeated composition of size-change graphs in the SCT criterion.)
Our new implementation of SCNP reduction pairs is more powerful than using the reduction pairs directly. Note that when using extended size-change graphs, every reduction pair can be simulated by an SCNP reduction pair.
SCNP reduction pairs add significant power when used for simple orders like EMB and LPO. The difference is less dramatic for RPO and POLO. Intuitively, the reason is that SCNP allows for multiset comparisons which are lacking in EMB and LPO, while RPO contains multiset comparisons and POLO can often simulate them. Nevertheless, SCNP also adds some power to RPO and POLO, e.g., by extending them by a concept like "maximum". This even holds for more powerful base orders like matrix orders. See Appendix B of the paper for a TRS where all existing termination tools fail, but where termination can easily be proved automatically by an SCNP reduction pair with a matrix base order.

To summarize, the implementation in AProVE confirms the usefulness of our contribution. Our experiments indicate that the automation of our technique is more powerful than both the direct use of reduction pairs and the SCT adaptation from previous work.