Empirical Evaluation of "SAT Solving for Termination Analysis with Polynomial Interpretations"

Polynomial interpretations are one of the most popular techniques for automated termination analysis. The automated search for such interpretations is a two-stage process. First, termination problems are expressed as constraints over terms and translated into Diophantine constraints over the naturals. We refer to this stage as the front-end. Second, a search algorithm is used to find a satisfying solution for these Diophantine constraints. We call this the back-end.

Finding a solution for Diophantine constraints is in general undecidable. Even if one restricts the variables in these constraints to a finite domain (e.g., all naturals less than 4), the problem is still NP-complete. Therefore, the search for such interpretations is a main bottleneck in most termination provers. In the paper SAT Solving for Termination Analysis with Polynomial Interpretations we present a novel encoding of this NP-complete problem to the satisfiability problem of propositional formulas. By applying modern SAT solvers we obtain speedups in orders of magnitude compared to dedicated search algorithms.

In our experiments, we want to assess the power and efficiency of our new SAT-based back-ends. Therefore, we compare them to existing non-SAT-based back-ends.

Implementation in AProVE

We integrated our approach in the termination tool AProVE. A special version of AProVE that allows to repeat the experiments below can be accessed via a special web interface.

Tools

We compare our two SAT-based implementations (AProVE-SAT and AProVE-PB) with four existing non-SAT-based implementations of polynomial orderings. For three of these (AProVE 1.2, AProVE-CiME, and AProVE-CLP) we use the same front-end as for the SAT-based implementation, but the back-ends are different. The fourth uses both the front-end and the back-end of the TTT termination analyzer.

AProVE-SAT 2.0-prealpha3:
The Diophantine inequality constraints are transformed into propositional SAT problems as described in the paper. We use either MiniSAT or SAT4J for the SAT solving. The latter is always used to transform general propositional formulas into CNF using Tseitin's algorithm. In principle, any SAT solver which can test satisfiability of CNFs given in DIMACS format can be used by AProVE.
AProVE-PB 2.0-prealpha3:
The Diophantine inequality constraints are transformed into pseudo-Boolean constraints and then linearized using Fortet's linearization scheme. The linear PB constraints are solved by either the Pueblo PB-SAT solver or by MiniSAT+.
AProVE 1.2:
The Diophantine inequality constraints are solved by a state-of-the-art finite domain constraint solver. This is based on the specialized finite domain constraint satisfaction procedure of Contejean et al. For negative polynomial interpretations, a specialized trial-and-error search is performed.
AProVE-CiME 2.0-prealpha3:
The Diophantine inequality constraints are solved by the popular Diophantine solver of the CiME tool. This solver is also based on the Contejean algorithm for finite domain constraint solving.
AProVE-CLP 2.0-prealpha3:
The Diophantine inequality constraints are transformed into a constraint logic program. This is then handled by the CLP(fd) engine of SICStus Prolog.
TTT:
The Tyrolean Termination Tool is a powerful and very efficient termination analyzer. Its implementation of Diophantine constraint solving is based on a generate-and-test algorithm.

All variants of AProVE described above are available at this special web interface.

Examples and Settings

In our experiments, we tested the tools on the examples of the Termination Problem Data Base (TPDB) 2006. This is the collection of problems used in the International Termination Competition 2006. It contains 865 TRSs from different sources as a benchmark for termination analysis of term rewriting.

We configured all tools to use a basic version of the DP method, where in addition to polynomial interpretations, the tools also apply the dependency graph refinement to split up sets of dependency pairs. Moreover, do not build constraints for all rules, but just for those that are "usable". For more information on these enhancements of the DP method, the reader is referred to The Dependency Pair Framework: Combining Techniques for Automated Termination Proofs. Thus, in total the strategy applied by all tools is:

Compute the dependency pairs from the TRS.
Estimate the dependency graph.
For each maximal strongly connected component (SCC) of the graph:
- Generate constraints for the dependency pairs and the usable rules.
- Generate Diophantine inequalities over naturals.
- Call a solver to find a solution for the constraints.
- If there is a solution, delete the strict dependency pairs and continue with 2.

To measure the effect of the different implementations for polynomial interpretations, we did not use any other termination techniques available in the tools. Especially, we did not use any results that allow to show innermost termination instead of full termination. AProVE and TTT use slightly different techniques for estimating dependency graphs. They also have slightly different definitions of "usable rules", e.g., AProVE uses dynamic usable rules that depend on the polynomial interpretation. Thus, their performance is not directly comparable. The experiments are meant to show that there is a substantial difference between the SAT-based implementations on the one side and all other implementations on the other side. They are not meant to compare the non-SAT-based implementations against each other.

For our experiments, the tools were run on an AMD Athlon 64 at 2.2 GHz. For each example, we imposed a time limit of 60 seconds (corresponding to the way tools are evaluated in the annual competition) or of 10 minutes, indicated by the "Limit" column in the following tables.

Experiments

In the following tables, the columns "Yes" and "TO" show the number of TRSs for which proving termination with the given configuration succeeds or times out. The column "Time" gives the total time in seconds needed for analyzing all 865 examples. The column "Range" specifies the range of the coefficients of polynomials (i.e., if the "Range" is n, then we only searched for coefficients from {0, ..., n}). The column "Degree" gives the degree of the polynomials. If the "Degree" is 1, then we used linear polynomials and "sm" means that we used simple-mixed polynomials (which are not available in TTT). Here, a non-unary polynomial is simple-mixed if we have no monomials with an exponent of more than 1 while a unary polynomial is simple-mixed if it has the form a + b x₁ + c x₁².

AProVE-SAT (MiniSAT) AProVE-SAT (SAT4J) AProVE-PB (Pueblo) AProVE-PB (MiniSAT+)

Limit Range Degree Yes TO Time Yes TO Time Yes TO Time Yes TO Time

60s 1 1 421 0 45.5 421 0 27.4 421 0 61.6 421 0 55.4 [Details]

60s 2 1 431 0 91.8 431 0 89.8 431 0 158.5 431 1 170.5 [Details]

60s 3 1 434 0 118.6 434 0 158.6 434 1 222.1 434 2 237.5 [Details]

60s 3 sm 440 51 5585.9 414 124 10344.4 427 82 7280.3 434 82 7254.5 [Details]

10m 1 1 421 0 45.5 421 0 27.4 421 0 61.6 421 0 55.4 [Details]

10m 2 1 431 0 91.8 431 0 89.8 431 0 158.5 431 1 710.5 [Details]

10m 3 1 434 0 118.6 434 0 158.6 434 0 689.6 434 1 817.5 [Details]

AProVE 1.2 AProVE-CLP AProVE-CiME TTT

Limit Range Degree Yes TO Time Yes TO Time Yes TO Time Yes TO Time

60s 1 1 421 1 151.8 420 16 1357.8 408 1 168.3 326 32 2568.5 [Details]

60s 2 1 414 48 3633.2 420 37 3558.3 408 43 3201.0 335 83 5677.6 [Details]

60s 3 1 408 81 5793.2 407 91 6459.5 402 67 5324.1 338 110 7426.9 [Details]

60s 3 sm 404 171 11608.1 367 145 10357.4 361 147 10107.7 n/a n/a n/a [Details]

10m 1 1 421 1 691.8 421 11 7852.2 408 0 332.7 328 16 14007.8 [Details]

10m 2 1 418 41 27888.4 423 25 18795.6 412 33 22190.4 337 68 45046.6 [Details]

10m 3 1 415 53 38286.4 420 51 41493.8 407 46 33873.6 340 91 61209.2 [Details]

For details on individual examples and runtimes please click on [Details] in the right-most column.

	AProVE-SAT (MiniSAT)	AProVE-SAT (SAT4J)	AProVE-PB (Pueblo)	AProVE-PB (MiniSAT+)
Limit	Range	Degree	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time
60s	1	1	421	0	45.5	421	0	27.4	421	0	61.6	421	0	55.4	[Details]
60s	2	1	431	0	91.8	431	0	89.8	431	0	158.5	431	1	170.5	[Details]
60s	3	1	434	0	118.6	434	0	158.6	434	1	222.1	434	2	237.5	[Details]
60s	3	sm	440	51	5585.9	414	124	10344.4	427	82	7280.3	434	82	7254.5	[Details]
10m	1	1	421	0	45.5	421	0	27.4	421	0	61.6	421	0	55.4	[Details]
10m	2	1	431	0	91.8	431	0	89.8	431	0	158.5	431	1	710.5	[Details]
10m	3	1	434	0	118.6	434	0	158.6	434	0	689.6	434	1	817.5	[Details]
	AProVE 1.2	AProVE-CLP	AProVE-CiME	TTT
Limit	Range	Degree	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time
60s	1	1	421	1	151.8	420	16	1357.8	408	1	168.3	326	32	2568.5	[Details]
60s	2	1	414	48	3633.2	420	37	3558.3	408	43	3201.0	335	83	5677.6	[Details]
60s	3	1	408	81	5793.2	407	91	6459.5	402	67	5324.1	338	110	7426.9	[Details]
60s	3	sm	404	171	11608.1	367	145	10357.4	361	147	10107.7	n/a	n/a	n/a	[Details]
10m	1	1	421	1	691.8	421	11	7852.2	408	0	332.7	328	16	14007.8	[Details]
10m	2	1	418	41	27888.4	423	25	18795.6	412	33	22190.4	337	68	45046.6	[Details]
10m	3	1	415	53	38286.4	420	51	41493.8	407	46	33873.6	340	91	61209.2	[Details]
	For details on individual examples and runtimes please click on [Details] in the right-most column.

The comparison of the SAT-based configurations AProVE-SAT and AProVE-PB with the non-SAT-based configurations shows that the provers based on SAT solving with our proposed encoding are faster by orders of magnitude. This holds in particular if one considers a higher time limit or polynomials with higher coefficients or degrees (which are needed to increase the number of "Yes"-results, i.e., to increase the power of automated termination proving). Note that if "Degree" is 1, there are no time-outs in the configuration AProVE-SAT, whereas the non-SAT-based configurations have many time-outs. Due to the increased efficiency, the number of examples where termination can be proved within the time limit is considerably higher in the SAT-based configurations. Comparing the SAT-based configurations AProVE-SAT and AProVE-PB shows that the approach of converting termination problems to propositional formulas is currently preferable to the related approach of converting them to pseudo-boolean constraints.

For efficiency, our implementation uses several optimizations:

(a) Simplification: In addition to standard simplifications for Diophantine constraints and for propositional formulas, we developed a new graph-based approach to detect possible simplifications of Diophantine constraints quickly.

(b) Sharing: We use sharing for common subexpressions, both on the level of Diophantine constraints and on the level of propositional formulas.

(c) Tracking maximum values: By taking into account that Diophantine variables are only instantiated by values from a certain set (e.g., {0, ..., 2^k-1}), one can keep track of the maximum possible values for all polynomials occurring in the Diophantine constraint. This can help to improve the conversion from Diophantine constraints to tuples of propositional formulas. The reason is that we can detect cases where the most significant bits are equivalent to 0.

The next table shows the effect of our optimizations (with linear polynomials and a 60 s time limit). While the configuration AProVE-SAT uses all optimizations (a) - (c), we also give the results obtained if one omits any one of these optimizations. The table demonstrates that each optimization has a considerable positive effect, in particular if one regards higher ranges for the coefficients.

AProVE-SAT (MiniSAT) no optimization (a) no optimization (b) no optimization (c)

Range Yes TO Time Yes TO Time Yes TO Time Yes TO Time

1 421 0 45.5 421 0 56.6 421 0 49.7 421 0 50.1 [Details]

2 431 0 91.8 431 0 107.5 431 0 93.9 431 0 114.7 [Details]

3 434 0 118.6 434 1 159.4 434 0 202.8 434 0 138.7 [Details]

For details on individual examples and runtimes please click on [Details] in the right-most column.

	AProVE-SAT (MiniSAT)	no optimization (a)	no optimization (b)	no optimization (c)
Range	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time
1	421	0	45.5	421	0	56.6	421	0	49.7	421	0	50.1	[Details]
2	431	0	91.8	431	0	107.5	431	0	93.9	431	0	114.7	[Details]
3	434	0	118.6	434	1	159.4	434	0	202.8	434	0	138.7	[Details]
	For details on individual examples and runtimes please click on [Details] in the right-most column.

The last table demonstrates the use of SAT solving when handling negative polynomials with a time limit of 60 seconds. Here, if the "Range" is n, then the constant coefficient is allowed to take values from {-n, ..., n}.

AProVE-SAT (MiniSAT) AProVE 1.2 TTT

Range Yes TO Time Yes TO Time Yes TO Time

1 440 0 98.0 441 22 1863.7 341 106 7307.3 [Details]

2 479 1 305.4 460 126 8918.3 360 181 12337.3 [Details]

3 483 4 1092.4 434 221 15570.9 361 247 16927.7 [Details]

For details on individual examples and runtimes please click on [Details] in the right-most column.

	AProVE-SAT (MiniSAT)	AProVE 1.2	TTT
Range	Yes	TO	Time	Yes	TO	Time	Yes	TO	Time
1	440	0	98.0	441	22	1863.7	341	106	7307.3	[Details]
2	479	1	305.4	460	126	8918.3	360	181	12337.3	[Details]
3	483	4	1092.4	434	221	15570.9	361	247	16927.7	[Details]
	For details on individual examples and runtimes please click on [Details] in the right-most column.

Again, the SAT-based configuration is much faster and substantially more powerful than the non-SAT-based ones. Compared to the results for non-negative polynomials, a few timeouts occur for larger ranges, but negative polynomials increase the power significantly whereas the runtimes only increase moderately.

The SAT-based implementation of polynomial interpretations was used by AProVE in the International Competition of Termination Tools 2006. Here, AProVE was configured to use several other termination techniques in addition to polynomial interpretations. Due to the speed of our new SAT-based approach, AProVE could try polynomial interpretations (also with higher ranges) as one of the first termination techniques. In case of failure, there was still enough time to try other termination techniques afterwards. With a time limit of 60s for each example, AProVE could prove termination of 633 TRSs and thereby it was the winner of the competition. In this way, the contributions of this paper increase not only the efficiency, but also the power of automated termination proving.

To summarize, automated termination analysis is indeed a field where SAT solving has turned out to be extremely useful. At the same time, this field also poses new challenges for SAT solving, since for higher ranges and higher degrees of the polynomials, one sometimes obtains SAT problems which are hard for current SAT solvers (e.g., problems with less than 9000 variables but several days of runtime). We have therefore submitted some of these problems to the SAT competition 2007.