Empirical Evaluation of "Proving Termination by Bounded Increase"

Most methods and tools for termination analysis of term rewrite systems (TRSs) essentially try to find arguments of functions that decrease in recursive calls. However, they fail if the reason for termination is that an argument is increased in recursive calls repeatedly until it reaches a bound. In the paper "Proving Termination by Bounded Increase", we solve that problem and present a method to prove termination of TRSs with bounded increase automatically.
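To make the notion of bounded increase concrete, the following is a minimal sketch in Python rather than as a TRS. The function name `minus` and its formulation are chosen here for illustration and are not claimed to be one of the examples from the paper. The second argument grows in every recursive call, and the recursion stops only because it eventually reaches the bound given by the first argument.

```python
def minus(x: int, y: int) -> int:
    """Compute max(x - y, 0) by counting y upwards until it reaches x."""
    if x > y:
        # No argument decreases here: y is increased towards the bound x.
        return 1 + minus(x, y + 1)
    # The bound is reached (or was exceeded from the start), so recursion stops.
    return 0


if __name__ == "__main__":
    print(minus(5, 2))
    print(minus(2, 5))
```

No single argument gets smaller in the recursive call; what decreases is the distance `x - y` between the argument and its bound, which is the kind of argument that techniques looking only for decreasing arguments miss.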

In our experiments, we want to assess the power of our new termination method by running it on prototypical examples for bounded increase. We also show that these examples cannot be handled by any of the tools in the International Competition of Termination Tools 2006.

Implementation in AProVE

We integrated our approach into the termination tool AProVE. A special version of AProVE that allows the experiments below to be repeated can be accessed via a special web interface.

Tools

In our experiments we use two versions of AProVE (with and without our new method) as well as all other tools that took part in the International Competition of Termination Tools 2006:

AProVE Increasing:
Our new method yields constraints over integer polynomials. These are encoded into propositional SAT problems (cf. "SAT Solving for Termination Analysis with Polynomial Interpretations") which, in turn, are solved by the SAT solver MiniSAT.
AProVE WST06:
The 2006 competition version of our AProVE tool.
CiME 2.0.4:
The 2006 competition version of the CiME tool from Paris.
Jambox 2.0e:
The 2006 competition version of the Jambox tool from Amsterdam.
Matchbox/SatELite:
The 2006 competition version of the Matchbox tool from Leipzig.
MU-TERM 4.3:
The 2006 competition version of the MU-TERM tool from Valencia.
TEPARLA:
The 2006 competition version of the TEPARLA tool from Eindhoven.
TPA:
The 2006 competition version of the TPA tool from Eindhoven.
TTTbox:
The 2006 competition version of the TTTbox tool from Innsbruck.

Examples and Settings

In our experiments, we tested the tools on the 17 prototypical examples for bounded increase from the appendix of the paper "Proving Termination by Bounded Increase". These examples include the challenge example posed at last year's termination competition. They also include examples with non-boolean (possibly nested) functions in the bound, examples with combinations of bounds, examples containing increasing or decreasing defined symbols, examples with bounds on lists, examples with different increases in different arguments, increasing TRSs that go beyond the shape of functional programs, etc.

We use all tools in their fully automated competition versions, so they require no manual settings. The AProVE version containing our new contribution applies the same strategy as the AProVE version used in the competition; the only change is the addition of our new method for bounded increase at a suitable position. Since Jambox, TEPARLA, TPA, and TTTbox do not handle examples with an "innermost" strategy, we tried to prove full termination of the examples with these tools.

For our experiments, the tools were run on an AMD 64-bit machine with 2.2 GHz core frequency. For each example, we imposed a time limit of 60 seconds (corresponding to the way tools are evaluated in the annual competition).
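The timing regime can be sketched as a small harness. This is only an illustration under stated assumptions, not the script used for the experiments: the function name `run_tool`, its parameters, and the convention that a tool prints its answer ("YES" or "MAYBE") on the first line of its output are all assumptions made here. The five-second grace period mirrors the rule that a tool is killed once it exceeds the time limit by more than five seconds.

```python
import subprocess
import sys
import time


def run_tool(cmd, limit=60.0, grace=5.0):
    """Run one tool invocation and classify the outcome.

    Returns (status, seconds), where status is "YES", "MAYBE", or "KILLED".
    Assumption: the tool prints its answer on the first line of stdout.
    """
    start = time.monotonic()
    try:
        # subprocess.run kills the child process once the timeout expires.
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=limit + grace
        )
    except subprocess.TimeoutExpired:
        # A killed run is reported with the hard deadline as its runtime.
        return "KILLED", limit + grace
    elapsed = time.monotonic() - start
    lines = proc.stdout.splitlines()
    first = lines[0].strip() if lines else ""
    # Anything other than an explicit YES counts as giving up.
    status = "YES" if first == "YES" else "MAYBE"
    return status, round(elapsed, 2)


if __name__ == "__main__":
    # Stand-in for a real prover: a Python one-liner that answers immediately.
    print(run_tool([sys.executable, "-c", "print('YES')"]))
```

With the default values, a run that is killed is reported with a runtime of 65 seconds, that is, the 60-second limit plus the five-second grace period.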

Experiments

In the following table, each row shows the behavior of the different tools on one example. For every tool, the result is followed by the runtime in seconds. The entry "YES" means that innermost termination of that example could be proved by the corresponding tool, while "MAYBE" states that the tool gave up without success. Finally, "KILLED" indicates that the tool exceeded the given time limit by more than five seconds and was therefore killed. By clicking on the respective runtime, one can inspect the output produced by the tool. To load an example into our web interface, just click on the corresponding button in the first column. Then you can run the new version of AProVE on the respective example yourself.

AProVE-Increasing	time (s)	AProVE WST06	time (s)	CiME 2.04	time (s)	Jambox 2.0e	time (s)	Matchbox-SatELite	time (s)	MU-TERM 4.3	time (s)	TEPARLA	time (s)	TPA	time (s)	TTTbox	time (s)
YES	2.48	MAYBE	8.05	MAYBE	37.52	MAYBE	15.37	MAYBE	49.92	MAYBE	20.13	MAYBE	45.92	MAYBE	56.03	MAYBE	60.02
YES	2.25	MAYBE	6.14	MAYBE	19.73	MAYBE	12.27	MAYBE	53.84	MAYBE	0.41	MAYBE	33.29	MAYBE	50.64	MAYBE	60.02
YES	2.36	MAYBE	7.24	MAYBE	37.89	KILLED	65.00	MAYBE	50.70	MAYBE	20.15	MAYBE	44.66	MAYBE	56.12	MAYBE	60.02
YES	2.81	MAYBE	2.99	MAYBE	18.38	MAYBE	5.79	MAYBE	52.41	MAYBE	20.22	KILLED	65.00	MAYBE	56.87	MAYBE	58.33
YES	3.16	MAYBE	8.11	MAYBE	43.99	KILLED	65.00	MAYBE	57.78	MAYBE	20.36	KILLED	65.00	MAYBE	61.30	MAYBE	60.03
YES	3.88	KILLED	65.00	MAYBE	54.91	MAYBE	21.52	MAYBE	63.98	MAYBE	20.22	KILLED	65.00	MAYBE	61.30	MAYBE	60.02
YES	3.89	KILLED	65.00	KILLED	65.00	MAYBE	21.22	MAYBE	51.17	MAYBE	40.37	KILLED	65.00	MAYBE	62.33	MAYBE	60.02
YES	8.64	KILLED	65.00	MAYBE	40.97	MAYBE	14.90	MAYBE	60.07	MAYBE	20.23	KILLED	65.00	MAYBE	54.71	MAYBE	60.02
YES	2.60	MAYBE	8.96	MAYBE	27.70	KILLED	65.00	MAYBE	50.52	MAYBE	10.06	KILLED	65.00	MAYBE	52.48	MAYBE	60.02
YES	2.86	MAYBE	7.96	MAYBE	34.02	KILLED	65.00	MAYBE	49.34	MAYBE	10.06	KILLED	65.00	MAYBE	52.32	MAYBE	60.02
YES	2.58	MAYBE	34.43	MAYBE	21.79	KILLED	65.00	MAYBE	50.76	MAYBE	10.06	MAYBE	25.84	MAYBE	50.93	MAYBE	60.02
YES	13.82	KILLED	65.00	MAYBE	37.32	KILLED	65.00	MAYBE	58.36	MAYBE	10.12	KILLED	65.00	MAYBE	60.75	MAYBE	60.02
YES	3.14	MAYBE	5.38	MAYBE	54.80	MAYBE	23.18	MAYBE	53.49	MAYBE	20.44	KILLED	65.00	MAYBE	53.76	MAYBE	60.38
YES	32.06	MAYBE	36.91	KILLED	65.00	MAYBE	51.19	MAYBE	50.06	MAYBE	60.62	KILLED	65.00	MAYBE	57.97	MAYBE	60.02
YES	5.79	KILLED	65.00	KILLED	65.00	KILLED	65.00	MAYBE	52.46	MAYBE	50.64	KILLED	65.00	MAYBE	31.26	MAYBE	59.99
YES	3.31	KILLED	65.00	KILLED	65.00	MAYBE	53.51	MAYBE	55.16	MAYBE	20.31	MAYBE	37.93	MAYBE	60.05	MAYBE	59.99
YES	7.98	KILLED	65.00	MAYBE	54.98	KILLED	65.00	MAYBE	50.43	MAYBE	20.23	MAYBE	39.10	MAYBE	64.77	MAYBE	59.99

Except for the AProVE version containing our new contributions, none of the tools is able to solve any of these examples. To determine if the time limit was an issue for the other tools, we re-ran the experiment with a time limit of 10 minutes. This led to the same results.
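The summary above can be checked mechanically by tallying the result column of each tool. The sketch below is an illustration only: the helper name `tally` is hypothetical, and for brevity `ROWS` contains just the first and third rows of the table, so the full table would have to be pasted in to reproduce the overall counts.

```python
from collections import Counter

# Tool names in the column order of the table.
TOOLS = [
    "AProVE-Increasing", "AProVE WST06", "CiME 2.04", "Jambox 2.0e",
    "Matchbox-SatELite", "MU-TERM 4.3", "TEPARLA", "TPA", "TTTbox",
]

# Two sample rows copied from the table (result and runtime per tool).
ROWS = [
    "YES 2.48 MAYBE 8.05 MAYBE 37.52 MAYBE 15.37 MAYBE 49.92 "
    "MAYBE 20.13 MAYBE 45.92 MAYBE 56.03 MAYBE 60.02",
    "YES 2.36 MAYBE 7.24 MAYBE 37.89 KILLED 65.00 MAYBE 50.70 "
    "MAYBE 20.15 MAYBE 44.66 MAYBE 56.12 MAYBE 60.02",
]


def tally(rows):
    """Count YES / MAYBE / KILLED entries per tool."""
    counts = {tool: Counter() for tool in TOOLS}
    for row in rows:
        cells = row.split()
        # Cells alternate between a result and its runtime.
        for i, tool in enumerate(TOOLS):
            counts[tool][cells[2 * i]] += 1
    return counts


if __name__ == "__main__":
    counts = tally(ROWS)
    solvers = [tool for tool in TOOLS if counts[tool]["YES"] > 0]
    print(solvers)
```

A tool counts as having solved an example only if its result is "YES"; both "MAYBE" and "KILLED" are failures, which is why they are counted separately but treated alike in the summary.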