Empirical Evaluation of "SAT Solving for Argument Filterings"

In the paper SAT Solving for Argument Filterings we present a novel encoding from LPO with argument filterings to propositional logic. This extends the results of Solving Partial Order Constraints for LPO Termination from pure LPO to the more practically relevant setting of LPO with argument filterings as found in the widely used DP Framework. In our experiments, we want to assess the power of our new SAT-based implementation of LPO with argument filterings. Therefore, we compare our implementation to existing non-SAT-based implementations.

Implementation in AProVE

We integrated our approach in the termination tool AProVE which implements the DP framework. A special version of AProVE that allows to repeat the experiments below can be accessed via a special web interface.

Tools

We compare our SAT-based implementation with the only two other implementations of LPO with argument filterings that we are aware of:

AProVE 2.0-prealpha2:
This is a pre-alpha version of AProVE 2.0 which is available through our web interface.
TTT:
The Tyrolean Termination Tool is a powerful and very efficient termination analyzer. It features two algorithms for LPO with argument filterings, one called "enumeration" and another termed "divide-and-conquer". We ran the experiments below with both algorithms. The divide-and-conquer-algorithm turned out to be virtually always more efficient and, thus, we abstain from presenting both. The presented results generalize to both algorithms.

Examples

In our experiments, we tested the three implementations on the examples of the Termination Problem Data Base (TPDB). This is the collection of problems used in the annual termination competition. It contains 773 TRSs from different sources as a benchmark for termination analysis of term rewriting.

Experiments

Apart from the reduction pair processor, we also used the dependency graph processor, which is the other main processor of the dependency pair framework. This processor is used to split up dependency pair problems into smaller ones. As AProVE and TTT use slightly different techniques for estimating dependency graphs in the dependency graph processor and they run on different machines, their runtimes are not directly comparable.

The following tables summarize the results using the DP processors based on Theorem 5 and Theorem 12 respectively. The tools are indicated as: TTT, APR (AProVE) and SAT (AProVE with our SAT-based encoding).

For each of the experiments we consider reduction pairs based on strict- and quasi-LPO. Each of the experiments was performed with a time-out of 60 seconds (corresponding to the way tools are evaluated in the annual competition) and with a time-out of 10 minutes. We indicate by "Yes", "Fail", and "RL" the number of TRSs for which proving termination with the given technique succeeds, fails, or encounters a resource limit (time-out or exhausts memory). Finally, we give the total time in seconds for analyzing all 773 examples. Individual runtimes and proof details are available by clicking on "full version".

The first table compares all three implementations for strict--LPO using the processor of Theorem 5.

1 LPO - 60 seconds timeout LPO - 10 minutes timeout

Tool Yes Fail RL Time Yes Fail RL Time

TTT 268 448 57 4201.8 269 465 39 28030.4

APR 310 358 105 6936.1 310 365 98 60402.3

SAT 327 446 0 81.7 327 446 0 81.7

For details on individual examples and runtimes please see the full version.

1	LPO - 60 seconds timeout		LPO - 10 minutes timeout
Tool	Yes	Fail	RL	Time	Yes	Fail	RL	Time
TTT	268	448	57	4201.8	269	465	39	28030.4
APR	310	358	105	6936.1	310	365	98	60402.3
SAT	327	446	0	81.7	327	446	0	81.7
	For details on individual examples and runtimes please see the full version.

The next table compares all three implementations for quasi--LPO using the processor of Theorem 5.

2 QLPO - 60 seconds timeout QLPO - 10 minutes timeout

Tool Yes Fail RL Time Yes Fail RL Time

TTT 297 395 81 6240.8 297 408 68 43539.5

APR 320 331 122 7913.1 326 341 106 67764.1

SAT 359 414 0 182.6 359 414 0 182.6

For details on individual examples and runtimes please see the full version.

2	QLPO - 60 seconds timeout		QLPO - 10 minutes timeout
Tool	Yes	Fail	RL	Time	Yes	Fail	RL	Time
TTT	297	395	81	6240.8	297	408	68	43539.5
APR	320	331	122	7913.1	326	341	106	67764.1
SAT	359	414	0	182.6	359	414	0	182.6
	For details on individual examples and runtimes please see the full version.

The comparison of the corresponding SAT-based and non-SAT-based configurations in the above tables shows that the analyzers based on SAT solving with our proposed encoding are faster by orders of magnitude. Moreover, the power (i.e., the number of examples where termination can be proved) also increases substantially in the SAT based configurations. It is also interesting to note that there are almost no time-outs in the SAT-based configurations, whereas the non-SAT-based configurations have many time-outs.

To evaluate the optimizations on page 12, we also tested the SAT-based configuration with strict-LPO and the 10-minute time-out in a version where optimizations (2) and (3) are switched off. Here, the total runtime increases from 81.7 to 1967.5. Thus, optimizations (2) and (3) already decrease the total runtime by a factor of more than 20.

The following two tables provide results using the improved reduction pair processor of Theorem 12. Again, the SAT-based configuration is much faster than the corresponding non-SAT-based one.

Table 3 compares AProVE and the new SAT-based implementation for strict-LPO using the processor of Theorem 12.

3 LPO - 60 seconds timeout LPO - 10 minutes timeout

Tool Yes Fail RL Time Yes Fail RL Time

APR 338 368 67 4776.8 341 383 49 33328.7

SAT 348 425 0 82.3 348 425 0 82.3

For details on individual examples and runtimes please see the full version.

3	LPO - 60 seconds timeout		LPO - 10 minutes timeout
Tool	Yes	Fail	RL	Time	Yes	Fail	RL	Time
APR	338	368	67	4776.8	341	383	49	33328.7
SAT	348	425	0	82.3	348	425	0	82.3
	For details on individual examples and runtimes please see the full version.

The fourth table compares AProVE and the new SAT-based implementation for quasi-LPO using the processor of Theorem 12.

4 QLPO - 60 seconds timeout QLPO - 10 minutes timeout

Tool Yes Fail RL Time Yes Fail RL Time

APR 357 323 93 6100.3 359 336 78 49934.1

SAT 380 393 0 192.8 380 393 0 192.8

For details on individual examples and runtimes please see the full version.

4	QLPO - 60 seconds timeout		QLPO - 10 minutes timeout
Tool	Yes	Fail	RL	Time	Yes	Fail	RL	Time
APR	357	323	93	6100.3	359	336	78	49934.1
SAT	380	393	0	192.8	380	393	0	192.8
	For details on individual examples and runtimes please see the full version.

The comparison with the first two tables shows that, replacing the processor of Theorem 5 by the one of Theorem 12 increases power significantly and for SAT-based analyses has no negative influence on runtimes.

Comparing Tables 1 and 3 to Tables 2 and 4 respectively shows that quasi-LPO is more powerful but also slower than strict-LPO. However, for the SAT-based analyses, the overall runtimes are still extremely fast in comparison to the non-SAT-based configurations.

Table 5 highlights 5 examples which could not be solved by any tool in the International Termination Competition 2005, whereas the SAT-based configuration proves termination for all 5 in a total of 4.3 seconds. In fact, except for the second example, neither TTT nor AProVE are able to prove termination in their fully automatic mode within 10 minutes. The third example could have been proven terminating by TTT or AProVE if they had used a different argument filtering algorithm respectively employed a strategy which applies LPO earlier. Both tools did not do this in their fully automatic mode, though, due to efficiency considerations. This demonstrates that our encoding advances the state of the art of automated termination analysis. The columns labeled TTT, APR, and SAT indicate for the three tools analysis times in seconds (including parsing, producing proofs, computing dependency graphs, etc.) and "t/o" indicates a 10 minute timeout. For each of the examples and tools the time indicated is for the fastest configuration from those described in Tables 1 to 4. For the second and third example, TTT's "divide-and-conquer"-algorithm times out, but its "enumeration"-algorithm (which is usually less efficient) finds a solution within 10 minutes. Therefore, here the runtimes are given in brackets.

The last four columns give details for the largest CNF which occurred during the termination proof with SAT (ranging over all dependency pair problems encountered). Columns 4 and 5 indicate the number of clauses and the number of literals of this CNF while Columns 6 and 7 indicate the time (in milliseconds) for encoding to propositional logic and for SAT solving.

5 TTT APR SAT # clauses # literals encod. time SAT time

Ex26_Luc03b_Z t/o t/o 1.15 12462 32027 90 48

Ex2_Luc02a_C (476.8) t/o 0.69 8478 21200 137 20

Ex49_GM04_C ( 25.8) 44.4 0.81 7040 17638 212 16

ExSec11_1_Luc02a_C t/o t/o 0.78 10968 28265 145 12

ExSec11_1_Luc02a_GM t/o t/o 0.87 19782 50608 155 72

5	TTT	APR	SAT	# clauses	# literals	encod. time	SAT time
Ex26_Luc03b_Z	t/o	t/o	1.15	12462	32027	90	48
Ex2_Luc02a_C	(476.8)	t/o	0.69	8478	21200	137	20
Ex49_GM04_C	( 25.8)	44.4	0.81	7040	17638	212	16
ExSec11_1_Luc02a_C	t/o	t/o	0.78	10968	28265	145	12
ExSec11_1_Luc02a_GM	t/o	t/o	0.87	19782	50608	155	72