Empirical Evaluation of "Proving Termination by Bounded Increase"

Most methods and tools for termination analysis of term rewrite systems (TRSs) essentially try to find arguments of functions that decrease in recursive calls. However, they fail if the reason for termination is that an argument is increased in recursive calls repeatedly until it reaches a bound. In the paper "Proving Termination by Bounded Increase", we solve that problem and present a method to prove termination of TRSs with bounded increase automatically.

In our experiments, we want to assess the power of our new termination method by running it on prototypical examples for bounded increase. We also show that these examples cannot be handled by any of the tools in the International Competition of Termination Tools 2006.


Implementation in AProVE

We integrated our approach in the termination tool AProVE. A special version of AProVE that allows to repeat the experiments below can be accessed via a special web interface.


Tools

In our experiments we use two versions of AProVE (with and without our new method) as well as all other tools that took part in the International Competition of Termination Tools 2006:

Examples and Settings

In our experiments, we tested the tools on the 17 prototypical examples for bounded increase from the appendix of the paper "Proving Termination by Bounded Increase". These examples include the challenge example posed at last years termination competition. They also include examples with non-boolean (possibly nested) functions in the bound, examples with combinations of bounds, examples containing increasing or decreasing defined symbols, examples with bounds on lists, examples with different increases in different arguments, increasing TRSs that go beyond the shape of functional programs, etc.

We use all tools in their fully automated competition version (thus, they require no manual settings). The AProVE version containing our new contribution applies the same strategy as the AProVE version used in the competition with only the addition of our new method for bounded increase at a suitable position. Since Jambox, Teparla, TPA, and TTTbox do not handle examples with an "innermost" strategy, we tried to prove full termination of the examples with these tools.

For our experiments, the tools were run on an AMD 64-bit machine with 2.2 GHz core frequency. For each example, we imposed a time limit of 60 seconds (corresponding to the way tools are evaluated in the annual competition).


Experiments

In the following table, each row shows the behavior of the different tools on one example. The entry "YES" means that innermost termination of that example could be proved by the corresponding tool while "MAYBE" states that the tool gave up without success. Finally, "KILLED" indicates that the tool exceeded the given time limit by more than five seconds and was therefore killed. By clicking on the respective runtime, one can inspect the output produced by the tool. To load an example into our web interface, just click on the corresponding button in the first column. Then you can run the new version of AProVE on the respective example yourself.

ExampleAProVE-IncreasingAProVE WST06CiME 2.04Jambox 2.0eMatchbox-SatELiteMU-TERM 4.3TEPARLATPATTTbox
YES2.48MAYBE8.05MAYBE37.52MAYBE15.37MAYBE49.92MAYBE20.13MAYBE45.92MAYBE56.03MAYBE60.02
YES2.25MAYBE6.14MAYBE19.73MAYBE12.27MAYBE53.84MAYBE0.41MAYBE33.29MAYBE50.64MAYBE60.02
YES2.36MAYBE7.24MAYBE37.89KILLED65.00MAYBE50.70MAYBE20.15MAYBE44.66MAYBE56.12MAYBE60.02
YES2.81MAYBE2.99MAYBE18.38MAYBE5.79MAYBE52.41MAYBE20.22KILLED65.00MAYBE56.87MAYBE58.33
YES3.16MAYBE8.11MAYBE43.99KILLED65.00MAYBE57.78MAYBE20.36KILLED65.00MAYBE61.30MAYBE60.03
YES3.88KILLED65.00MAYBE54.91MAYBE21.52MAYBE63.98MAYBE20.22KILLED65.00MAYBE61.30MAYBE60.02
YES3.89KILLED65.00KILLED65.00MAYBE21.22MAYBE51.17MAYBE40.37KILLED65.00MAYBE62.33MAYBE60.02
YES8.64KILLED65.00MAYBE40.97MAYBE14.90MAYBE60.07MAYBE20.23KILLED65.00MAYBE54.71MAYBE60.02
YES2.60MAYBE8.96MAYBE27.70KILLED65.00MAYBE50.52MAYBE10.06KILLED65.00MAYBE52.48MAYBE60.02
YES2.86MAYBE7.96MAYBE34.02KILLED65.00MAYBE49.34MAYBE10.06KILLED65.00MAYBE52.32MAYBE60.02
YES2.58MAYBE34.43MAYBE21.79KILLED65.00MAYBE50.76MAYBE10.06MAYBE25.84MAYBE50.93MAYBE60.02
YES13.82KILLED65.00MAYBE37.32KILLED65.00MAYBE58.36MAYBE10.12KILLED65.00MAYBE60.75MAYBE60.02
YES3.14MAYBE5.38MAYBE54.80MAYBE23.18MAYBE53.49MAYBE20.44KILLED65.00MAYBE53.76MAYBE60.38
YES32.06MAYBE36.91KILLED65.00MAYBE51.19MAYBE50.06MAYBE60.62KILLED65.00MAYBE57.97MAYBE60.02
YES5.79KILLED65.00KILLED65.00KILLED65.00MAYBE52.46MAYBE50.64KILLED65.00MAYBE31.26MAYBE59.99
YES3.31KILLED65.00KILLED65.00MAYBE53.51MAYBE55.16MAYBE20.31MAYBE37.93MAYBE60.05MAYBE59.99
YES7.98KILLED65.00MAYBE54.98KILLED65.00MAYBE50.43MAYBE20.23MAYBE39.10MAYBE64.77MAYBE59.99

Except for the AProVE version containing our new contributions, none of the tools is able to solve any of these examples. To determine if the time limit was an issue for the other tools, we re-ran the experiment with a time limit of 10 minutes. This led to the same results.