Empirical Evaluation of "SAT Solving for Argument Filterings"

In the paper SAT Solving for Argument Filterings we present a novel encoding from LPO with argument filterings to propositional logic. This extends the results of Solving Partial Order Constraints for LPO Termination from pure LPO to the more practically relevant setting of LPO with argument filterings as found in the widely used DP Framework. In our experiments, we want to assess the power of our new SAT-based implementation of LPO with argument filterings. Therefore, we compare our implementation to existing non-SAT-based implementations.


Implementation in AProVE

We integrated our approach in the termination tool AProVE which implements the DP framework. A special version of AProVE that allows to repeat the experiments below can be accessed via a special web interface.


Tools

We compare our SAT-based implementation with the only two other implementations of LPO with argument filterings that we are aware of:

Examples

In our experiments, we tested the three implementations on the examples of the Termination Problem Data Base (TPDB). This is the collection of problems used in the annual termination competition. It contains 773 TRSs from different sources as a benchmark for termination analysis of term rewriting.


Experiments

Apart from the reduction pair processor, we also used the dependency graph processor, which is the other main processor of the dependency pair framework. This processor is used to split up dependency pair problems into smaller ones. As AProVE and TTT use slightly different techniques for estimating dependency graphs in the dependency graph processor and they run on different machines, their runtimes are not directly comparable.

The following tables summarize the results using the DP processors based on Theorem 5 and Theorem 12 respectively. The tools are indicated as: TTT, APR (AProVE) and SAT (AProVE with our SAT-based encoding).

For each of the experiments we consider reduction pairs based on strict- and quasi-LPO. Each of the experiments was performed with a time-out of 60 seconds (corresponding to the way tools are evaluated in the annual competition) and with a time-out of 10 minutes. We indicate by "Yes", "Fail", and "RL" the number of TRSs for which proving termination with the given technique succeeds, fails, or encounters a resource limit (time-out or exhausts memory). Finally, we give the total time in seconds for analyzing all 773 examples. Individual runtimes and proof details are available by clicking on "full version".

The first table compares all three implementations for strict--LPO using the processor of Theorem 5.

1LPO - 60 seconds timeoutLPO - 10 minutes timeout
ToolYesFailRLTimeYesFailRLTime
TTT268 448 574201.8269 465 3928030.4
APR310 358 1056936.1310 365 9860402.3
SAT327 446 081.7327 446 081.7
For details on individual examples and runtimes please see the full version.

The next table compares all three implementations for quasi--LPO using the processor of Theorem 5.

2QLPO - 60 seconds timeoutQLPO - 10 minutes timeout
ToolYesFailRLTimeYesFailRLTime
TTT297 395 816240.8297 408 6843539.5
APR320 331 1227913.1326 341 10667764.1
SAT359 414 0182.6359 414 0182.6
For details on individual examples and runtimes please see the full version.

The comparison of the corresponding SAT-based and non-SAT-based configurations in the above tables shows that the analyzers based on SAT solving with our proposed encoding are faster by orders of magnitude. Moreover, the power (i.e., the number of examples where termination can be proved) also increases substantially in the SAT based configurations. It is also interesting to note that there are almost no time-outs in the SAT-based configurations, whereas the non-SAT-based configurations have many time-outs.

To evaluate the optimizations on page 12, we also tested the SAT-based configuration with strict-LPO and the 10-minute time-out in a version where optimizations (2) and (3) are switched off. Here, the total runtime increases from 81.7 to 1967.5. Thus, optimizations (2) and (3) already decrease the total runtime by a factor of more than 20.

The following two tables provide results using the improved reduction pair processor of Theorem 12. Again, the SAT-based configuration is much faster than the corresponding non-SAT-based one.

Table 3 compares AProVE and the new SAT-based implementation for strict-LPO using the processor of Theorem 12.

3LPO - 60 seconds timeoutLPO - 10 minutes timeout
ToolYesFailRLTimeYesFailRLTime
APR338 368 674776.8341 383 4933328.7
SAT348 425 082.3348 425 082.3
For details on individual examples and runtimes please see the full version.

The fourth table compares AProVE and the new SAT-based implementation for quasi-LPO using the processor of Theorem 12.

4QLPO - 60 seconds timeoutQLPO - 10 minutes timeout
ToolYesFailRLTimeYesFailRLTime
APR357 323 936100.3359 336 7849934.1
SAT380 393 0192.8380 393 0192.8
For details on individual examples and runtimes please see the full version.

The comparison with the first two tables shows that, replacing the processor of Theorem 5 by the one of Theorem 12 increases power significantly and for SAT-based analyses has no negative influence on runtimes.

Comparing Tables 1 and 3 to Tables 2 and 4 respectively shows that quasi-LPO is more powerful but also slower than strict-LPO. However, for the SAT-based analyses, the overall runtimes are still extremely fast in comparison to the non-SAT-based configurations.

Table 5 highlights 5 examples which could not be solved by any tool in the International Termination Competition 2005, whereas the SAT-based configuration proves termination for all 5 in a total of 4.3 seconds. In fact, except for the second example, neither TTT nor AProVE are able to prove termination in their fully automatic mode within 10 minutes. The third example could have been proven terminating by TTT or AProVE if they had used a different argument filtering algorithm respectively employed a strategy which applies LPO earlier. Both tools did not do this in their fully automatic mode, though, due to efficiency considerations. This demonstrates that our encoding advances the state of the art of automated termination analysis. The columns labeled TTT, APR, and SAT indicate for the three tools analysis times in seconds (including parsing, producing proofs, computing dependency graphs, etc.) and "t/o" indicates a 10 minute timeout. For each of the examples and tools the time indicated is for the fastest configuration from those described in Tables 1 to 4. For the second and third example, TTT's "divide-and-conquer"-algorithm times out, but its "enumeration"-algorithm (which is usually less efficient) finds a solution within 10 minutes. Therefore, here the runtimes are given in brackets.

The last four columns give details for the largest CNF which occurred during the termination proof with SAT (ranging over all dependency pair problems encountered). Columns 4 and 5 indicate the number of clauses and the number of literals of this CNF while Columns 6 and 7 indicate the time (in milliseconds) for encoding to propositional logic and for SAT solving.

5TTTAPRSAT# clauses# literalsencod. timeSAT time
Ex26_Luc03b_Zt/ot/o 1.1512462320279048
Ex2_Luc02a_C (476.8)t/o 0.69 84782120013720
Ex49_GM04_C ( 25.8) 44.4 0.81 70401763821216
ExSec11_1_Luc02a_C t/o t/o 0.78109682826514512
ExSec11_1_Luc02a_GMt/o t/o 0.87197825060815572