Empirical Evaluation of "SAT Solving for Termination Analysis with Polynomial Interpretations"

Polynomial interpretations are one of the most popular techniques for automated termination analysis. The automated search for such interpretations is a two-stage process. First, termination problems are expressed as constraints over terms and translated into Diophantine constraints over the naturals. We refer to this stage as the front-end. Second, a search algorithm is used to find a satisfying solution for these Diophantine constraints. We call this the back-end.

Finding a solution for Diophantine constraints is in general undecidable. Even if one restricts the variables in these constraints to a finite domain (e.g., all naturals less than 4), the problem is still NP-complete. Therefore, the search for such interpretations is a main bottleneck in most termination provers. In the paper SAT Solving for Termination Analysis with Polynomial Interpretations we present a novel encoding of this NP-complete problem to the satisfiability problem of propositional formulas. By applying modern SAT solvers we obtain speedups in orders of magnitude compared to dedicated search algorithms.

In our experiments, we want to assess the power and efficiency of our new SAT-based back-ends. Therefore, we compare them to existing non-SAT-based back-ends.


Implementation in AProVE

We integrated our approach in the termination tool AProVE. A special version of AProVE that allows to repeat the experiments below can be accessed via a special web interface.


Tools

We compare our two SAT-based implementations (AProVE-SAT and AProVE-PB) with four existing non-SAT-based implementations of polynomial orderings. For three of these (AProVE 1.2, AProVE-CiME, and AProVE-CLP) we use the same front-end as for the SAT-based implementation, but the back-ends are different. The fourth uses both the front-end and the back-end of the TTT termination analyzer. All variants of AProVE described above are available at this special web interface.

Examples and Settings

In our experiments, we tested the tools on the examples of the Termination Problem Data Base (TPDB) 2006. This is the collection of problems used in the International Termination Competition 2006. It contains 865 TRSs from different sources as a benchmark for termination analysis of term rewriting.

We configured all tools to use a basic version of the DP method, where in addition to polynomial interpretations, the tools also apply the dependency graph refinement to split up sets of dependency pairs. Moreover, do not build constraints for all rules, but just for those that are "usable". For more information on these enhancements of the DP method, the reader is referred to The Dependency Pair Framework: Combining Techniques for Automated Termination Proofs. Thus, in total the strategy applied by all tools is:

  1. Compute the dependency pairs from the TRS.
  2. Estimate the dependency graph.
  3. For each maximal strongly connected component (SCC) of the graph:
To measure the effect of the different implementations for polynomial interpretations, we did not use any other termination techniques available in the tools. Especially, we did not use any results that allow to show innermost termination instead of full termination. AProVE and TTT use slightly different techniques for estimating dependency graphs. They also have slightly different definitions of "usable rules", e.g., AProVE uses dynamic usable rules that depend on the polynomial interpretation. Thus, their performance is not directly comparable. The experiments are meant to show that there is a substantial difference between the SAT-based implementations on the one side and all other implementations on the other side. They are not meant to compare the non-SAT-based implementations against each other.

For our experiments, the tools were run on an AMD Athlon 64 at 2.2 GHz. For each example, we imposed a time limit of 60 seconds (corresponding to the way tools are evaluated in the annual competition) or of 10 minutes, indicated by the "Limit" column in the following tables.


Experiments

In the following tables, the columns "Yes" and "TO" show the number of TRSs for which proving termination with the given configuration succeeds or times out. The column "Time" gives the total time in seconds needed for analyzing all 865 examples. The column "Range" specifies the range of the coefficients of polynomials (i.e., if the "Range" is n, then we only searched for coefficients from {0, ..., n}). The column "Degree" gives the degree of the polynomials. If the "Degree" is 1, then we used linear polynomials and "sm" means that we used simple-mixed polynomials (which are not available in TTT). Here, a non-unary polynomial is simple-mixed if we have no monomials with an exponent of more than 1 while a unary polynomial is simple-mixed if it has the form a + b x1 + c x12.

AProVE-SAT (MiniSAT) AProVE-SAT (SAT4J) AProVE-PB (Pueblo) AProVE-PB (MiniSAT+)
LimitRangeDegree YesTOTime YesTOTime YesTOTime YesTOTime
60s 1 1 4210 45.5 4210 27.4 4210 61.6 4210 55.4 [Details]
60s 2 1 4310 91.8 4310 89.8 4310158.5 4311 170.5 [Details]
60s 3 1 4340 118.6 4340 158.6 4341222.1 4342 237.5 [Details]
60s 3 sm 440515585.9 414124 10344.4 427827280.3 43482 7254.5 [Details]
10m 1 1 4210 45.5 4210 27.4 4210 61.6 4210 55.4 [Details]
10m 2 1 4310 91.8 4310 89.8 4310158.5 4311 710.5 [Details]
10m 3 1 4340 118.6 4340 158.6 4340689.6 4341 817.5 [Details]
AProVE 1.2 AProVE-CLP AProVE-CiME TTT
LimitRangeDegree YesTOTime YesTOTime YesTOTime YesTOTime
60s 1 1 4211 151.8 420161357.8 4081 168.3 32632 2568.5 [Details]
60s 2 1 414483633.2 420373558.3 408433201.0 33583 5677.6 [Details]
60s 3 1 408815793.2 407916459.5 402675324.1 338110 7426.9 [Details]
60s 3 sm 40417111608.1 36714510357.4 36114710107.7 n/an/an/a [Details]
10m 1 1 4211 691.8 421117852.2 4080 332.7 32816 14007.8 [Details]
10m 2 1 4184127888.4 4232518795.6 4123322190.4 33768 45046.6 [Details]
10m 3 1 4155338286.4 4205141493.8 4074633873.6 3409161209.2 [Details]
For details on individual examples and runtimes please click on [Details] in the right-most column.

The comparison of the SAT-based configurations AProVE-SAT and AProVE-PB with the non-SAT-based configurations shows that the provers based on SAT solving with our proposed encoding are faster by orders of magnitude. This holds in particular if one considers a higher time limit or polynomials with higher coefficients or degrees (which are needed to increase the number of "Yes"-results, i.e., to increase the power of automated termination proving). Note that if "Degree" is 1, there are no time-outs in the configuration AProVE-SAT, whereas the non-SAT-based configurations have many time-outs. Due to the increased efficiency, the number of examples where termination can be proved within the time limit is considerably higher in the SAT-based configurations. Comparing the SAT-based configurations AProVE-SAT and AProVE-PB shows that the approach of converting termination problems to propositional formulas is currently preferable to the related approach of converting them to pseudo-boolean constraints.

For efficiency, our implementation uses several optimizations:

(a) Simplification: In addition to standard simplifications for Diophantine constraints and for propositional formulas, we developed a new graph-based approach to detect possible simplifications of Diophantine constraints quickly.

(b) Sharing: We use sharing for common subexpressions, both on the level of Diophantine constraints and on the level of propositional formulas.

(c) Tracking maximum values: By taking into account that Diophantine variables are only instantiated by values from a certain set (e.g., {0, ..., 2k-1}), one can keep track of the maximum possible values for all polynomials occurring in the Diophantine constraint. This can help to improve the conversion from Diophantine constraints to tuples of propositional formulas. The reason is that we can detect cases where the most significant bits are equivalent to 0.

The next table shows the effect of our optimizations (with linear polynomials and a 60 s time limit). While the configuration AProVE-SAT uses all optimizations (a) - (c), we also give the results obtained if one omits any one of these optimizations. The table demonstrates that each optimization has a considerable positive effect, in particular if one regards higher ranges for the coefficients.

AProVE-SAT (MiniSAT) no optimization (a) no optimization (b) no optimization (c)
Range YesTOTime YesTOTime YesTOTime YesTOTime
1 421045.5 421056.6 421049.7 421050.1 [Details]
2 431091.8 4310107.5 431093.9 4310114.7 [Details]
3 4340118.6 4341159.4 4340202.8 4340138.7 [Details]
For details on individual examples and runtimes please click on [Details] in the right-most column.

The last table demonstrates the use of SAT solving when handling negative polynomials with a time limit of 60 seconds. Here, if the "Range" is n, then the constant coefficient is allowed to take values from {-n, ..., n}.

AProVE-SAT (MiniSAT) AProVE 1.2 TTT
Range YesTOTime YesTOTime YesTOTime
1 4400 98.0 441221863.7 3411067307.3 [Details]
2 4791 305.4 4601268918.3 36018112337.3 [Details]
3 4834 1092.4 43422115570.9 36124716927.7 [Details]
For details on individual examples and runtimes please click on [Details] in the right-most column.

Again, the SAT-based configuration is much faster and substantially more powerful than the non-SAT-based ones. Compared to the results for non-negative polynomials, a few timeouts occur for larger ranges, but negative polynomials increase the power significantly whereas the runtimes only increase moderately.

The SAT-based implementation of polynomial interpretations was used by AProVE in the International Competition of Termination Tools 2006. Here, AProVE was configured to use several other termination techniques in addition to polynomial interpretations. Due to the speed of our new SAT-based approach, AProVE could try polynomial interpretations (also with higher ranges) as one of the first termination techniques. In case of failure, there was still enough time to try other termination techniques afterwards. With a time limit of 60s for each example, AProVE could prove termination of 633 TRSs and thereby it was the winner of the competition. In this way, the contributions of this paper increase not only the efficiency, but also the power of automated termination proving.

To summarize, automated termination analysis is indeed a field where SAT solving has turned out to be extremely useful. At the same time, this field also poses new challenges for SAT solving, since for higher ranges and higher degrees of the polynomials, one sometimes obtains SAT problems which are hard for current SAT solvers (e.g., problems with less than 9000 variables but several days of runtime). We have therefore submitted some of these problems to the SAT competition 2007.