Empirical Evaluation of "SAT-Based Termination Analysis Using Monotonicity Constraints over the Integers"

In the paper "SAT-Based Termination Analysis Using Monotonicity Constraints over the Integers" we describe an algorithm for proving termination of programs abstracted to systems of monotonicity constraints in the integer domain. Monotonicity constraints are a non-trivial extension of the well-known size-change termination method. While deciding termination for systems of monotonicity constraints is PSPACE complete, we focus on a well-defined and significant subset, which we call MCNP, designed to be amenable to a SAT-based solution. Our technique is based on the search for a global ranking function defined in terms of bounded differences between multisets of integer values. We describe the application of our approach as the back-end for the termination analysis of Java Bytecode (JBC). At the front-end, systems of monotonicity constraints are obtained by abstracting information, using two different termination analyzers: AProVE and COSTA. Preliminary results reveal that our approach provides a good trade-off between precision and cost of analysis.

Tools

In our experiments, we consider precision and performance of four termination prover configurations that work on various flavors of programs on the integer domain.

MCNP prototype
We implemented a prototype termination analyzer based on our SAT encoding for MCNP and tested it on three benchmark suites. It uses the SAT4J v2.1.0 solver as back-end for checking satisfiability of propositional formulas.
COSTA
COSTA is a cost and termination analyzer for Java Bytecode based on path-length abstraction, developed at the Politécnica University of Madrid and the Complutense University of Madrid, Spain. The results for the experiments with COSTA were kindly provided by the COSTA team.
AProVE
AProVE is a termination prover developed at RWTH Aachen University which uses Integer Term Rewrite Systems (ITRSs) as target language for translation from programming languages that work on integers, like Java Bytecode. ITRSs are term-rewrite systems where standard operations on integers are pre-defined. After this translation to ITRSs, rewrite-based techniques are then applied to prove termination of the resulting ITRSs and conclude termination of the input program.
AProVE fixed abstraction
A main feature of AProVE in its standard configuration is that it does not use a fixed abstraction, but searches for the abstraction to be used during the termination proof. To get a more direct comparison between MCNP and the rewite-based termination proving methods applied in AProVE to show termination of ITRSs, in the configuration "AProVE fixed abstraction" we switch off this search for an abstraction and only allow AProVE to use the same abstraction as the one used to generate instances for MCNP.

Benchmark Suites

To assess the impact of our contributions, we have created three benchmark suites: Suite 1, Suite 2, Suite 3

Suite 1
The first suite, of 81 instances, consists of abstractions to MCSs for a collection of textbook style C programs and a collection of MCSs abstracted manually obtained from various research papers on termination.
- ITRS
  These 32 instances are adapted from a benchmark suite collected for the empirical evaluation of Fuhs, Giesl, Plücker, Schneider-Kamp, and Falke's paper Proving Termination of Integer Term Rewriting
- MCFromC
  Based on several textbook style C programs, the abstraction to 6 MCSs is performed using a rudimentary translator developed by Avishay Ben-Shabtai and Zahi Mann at the Tel-Aviv Academic College.
- SPEED
  These 10 examples are taken from Gulwani, Mehra, and Chilimbi's article SPEED: precise and efficient static estimation of program computational complexity.
- TermInt
  This collection of 4 examples is taken from Ben-Amram's paper Monotonicity Constraints for Termination in the Integer Domain.
- WTC
  These 29 examples are adapted from the WTC Toolsuite Benchmarks web page.

The second and the third suite both originate from Java Bytecode (JBC) programs from the termination problem data base (TPDB) used in the International Termination Competition of 2010. In this competition, the three termination analyzers AProVE, COSTA, and Julia competed against each other.

Suite 2
For the second suite, we consider both the JBC programs from the (non-recursive) Java Bytecode category and those from the Java Bytecode Recursive category of the competition. Here 165 MCS instances are obtained by first applying the preprocessor from the termination analyzer COSTA, resulting in a representation consisting of (binary clause) constraint logic programs with linear constraints (CLPQ). After minor processing, these are abstracted to MCSs (applying SWI Prolog with its CLPQ library).
Suite 3
For the third suite, the termination analyzer AProVE is applied to preprocess and abstract JBC programs from the (non-recursive) Java Bytecode category of the competition to ITRSs where standard operations on integers are pre-defined. To further transform ITRSs into MCSs, we apply an abstraction which maps terms to their size and replaces non-linear arithmetic sub-expressions by fresh variables. This results in a CLPQ representation which is further abstracted to MCSs, as performed for the COSTA examples in Suite 2. The result is a collection of 127 MCSs. Note that the translation from JBC to ITRSs that we apply here only works for programs without recursion. This is also the reason why for Suite 3 we do not consider programs from the Java Bytecode Recursive category.

Settings and Experiments

Experiment 1
We apply our MCNP prototype on the examples from this suite of 81 examples. This and also the other experiments with our MCNP implementation are conducted on a machine with an Intel Core i3 530 CPU with 2 CPU cores at 2.93 GHz each, having 2 GB RAM. We obtain the following results (all times are in seconds):

MCNP

Number of termination proofs 66

Average runtime 0.55

Maximal runtime 5.15

[Details]

	MCNP
Number of termination proofs	66
Average runtime	0.55
Maximal runtime	5.15
	[Details]

Moreover, we also compare with the results of the earlier experiments with AProVE on the 32 original problem instances of the ITRS sub-suite. Here AProVE was applied on the ITRSs and was run on a machine with an Intel Core 2 Quad CPU Q9450 with 4 cores at 2.66GHz each, having 8 GB RAM.

MCNP AProVE

Number of termination proofs 27 28

Number of proofs on examples not solved by the other tool 0 1

Average runtime (without timeouts) 0.22 5.30

Average runtime (including timeouts) 0.22 12.14

Maximal runtime 4.22 > 60.00

Timeouts (> 60.00) 0 4

[Details] [Details]

	MCNP	AProVE
Number of termination proofs	27	28
Number of proofs on examples not solved by the other tool	0	1
Average runtime (without timeouts)	0.22	5.30
Average runtime (including timeouts)	0.22	12.14
Maximal runtime	4.22	> 60.00
Timeouts (> 60.00)	0	4
	[Details]	[Details]

These experiments show that an abstraction to MCSs is sufficiently powerful for representative programs on integers and they also demonstrate the efficiency of our SAT-based implementation of MCNP. The comparison with AProVE on the ITRS examples indicates that MCNP has about the same precision as AProVE on these examples, and MCNP is significantly faster.

Experiment 2
In the second experiment, we apply MCNP on the 165 MCS instances obtained from COSTA's abstraction to CLPQ programs. To compare, COSTA is applied on the CLPQ programs. The COSTA runs are performed on a machine with an Intel Core i5 650 CPU with 2 CPU cores at 3.2 GHz each, having 3 GB RAM.

MCNP COSTA

Number of termination proofs 92 102

Number of proofs on examples not solved by the other tool 5 15

Average runtime (without timeouts) 0.662 0.076

Average runtime (including timeouts) 0.662 3.709

Maximal runtime 16.31 > 120.00

Timeouts (> 120.00) 0 5

[Details] [Details]

	MCNP	COSTA
Number of termination proofs	92	102
Number of proofs on examples not solved by the other tool	5	15
Average runtime (without timeouts)	0.662	0.076
Average runtime (including timeouts)	0.662	3.709
Maximal runtime	16.31	> 120.00
Timeouts (> 120.00)	0	5
	[Details]	[Details]

The detailed results show that MCNP finds 5 termination proofs where COSTA does not (in 4 of these instances, COSTA reaches a timeout).

From these experiments we see that although MCNP uses a very simple domain of ranking functions, it is able to provide many of the proofs, and it does not encounter timeouts. Moreover, there are 5 programs where MCNP provides a proof and COSTA does not.

Experiment 3
In the third experiment, we apply three different tool configurations on the suite of 127 instances obtained from AProVE's abstraction to ITRSs. First of all, we apply MCNP on the abstracted MCSs. Then we use two different configurations of AProVE on the corresponding ITRSs: In the configuration "AProVE fixed abstraction" we use the same fixed abstraction on user-defined function symbols as for MCNP.

MCNP AProVE fixed abstraction

Number of termination proofs 63 73

Number of proofs on examples not solved by the other tool 5 15

Average runtime (without timeouts) 2.12 11.08

Average runtime (including timeouts) 5.76 14.16

Maximal runtime > 60.00 > 60.00

Timeouts (> 60.00) 8 8

[Details] [Details]

	MCNP	AProVE fixed abstraction
Number of termination proofs	63	73
Number of proofs on examples not solved by the other tool	5	15
Average runtime (without timeouts)	2.12	11.08
Average runtime (including timeouts)	5.76	14.16
Maximal runtime	> 60.00	> 60.00
Timeouts (> 60.00)	8	8
	[Details]	[Details]

In contrast, the configuration "AProVE" searches for an abstraction in the process of finding a termination proof. This is also the setting in which AProVE is normally applied. Here the two different configurations of the tool AProVE are run on a machine with an Intel Xeon 5140 with 4 CPU cores at 2.33 GHz each, having 16 GB RAM.

MCNP AProVE

Number of termination proofs 63 95

Number of proofs on examples not solved by the other tool 3 35

Average runtime (without timeouts) 2.12 9.58

Average runtime (including timeouts) 5.76 17.12

Maximal runtime > 60.00 > 60.00

Timeouts (> 60.00) 8 19

[Details] [Details]

	MCNP	AProVE
Number of termination proofs	63	95
Number of proofs on examples not solved by the other tool	3	35
Average runtime (without timeouts)	2.12	9.58
Average runtime (including timeouts)	5.76	17.12
Maximal runtime	> 60.00	> 60.00
Timeouts (> 60.00)	8	19
	[Details]	[Details]

In the detailed results one can observe that MCNP finds 5 termination proofs on examples where "AProVE fixed abstraction" failed and 3 termination proofs on (partly other) examples where AProVE failed.

This shows that the additional proving power in AProVE comes primarily from the costly search for the right abstraction. Once fixing the abstraction, MCNP is of similar precision and much faster. Thus, it could be fruitful to combine rewrite-based tools like AProVE with an MCNP-based analyzer such that the MCNP-analyzer is tried first (to solve the "easier" examples fast) and the rewrite-based analyzer is only applied for the remaining "hard" examples. Since MCNP shows termination in 3 cases where AProVE does not, we can also expect a raise in overall precision from such a combination.

In summary, the results of our experiments clearly demonstrate the good trade-off between power and speed of MCNP. We conclude that MCNP is a valuable contribution to the battery of techniques applied in state-of-the-art termination analyzers.