In our paper "Analyzing Innermost Runtime Complexity of Term Rewriting by
Dependency Pairs",
we present a modular framework to analyze the innermost runtime complexity of term
rewrite systems (TRSs) automatically. Our method
is based on the dependency pair (DP) framework for termination analysis.
In contrast to previous work, we developed a direct adaptation
of successful termination techniques from the DP framework
in order to use them for complexity analysis.
On this webpage, we describe extensive experimental results which
demonstrate the power of our method compared to
existing techniques.
Implementation in AProVE
We integrated our approach into a new version of the
software verification tool AProVE. In particular, this
new version of AProVE allows to repeat the experiments
below. It can be accessed via the following web
interface .
Note that this is the same web interface that is also used for termination proofs. To use it for innermost runtime
complexity analysis, the TRS should start with the following lines:
(GOAL COMPLEXITY)
(STARTTERM CONSTRUCTOR-BASED)
(STRATEGY INNERMOST)
The result of the tool is of the form YES(O(f),O(g)) where O(f) is a lower bound and O(g) is an upper bound for the innermost runtime complexity of the TRS.
Tools
In our experiments we compare AProVE to the tools TCT and
CaT.
- AProVE 2012
In this version of the AProVE tool, we implemented all contributions of our paper "Analyzing Innermost Runtime Complexity of Term Rewriting by
Dependency Pairs"". In addition, some further related DP processors were adapted to the complexity setting (e.g., processors for rewriting, instantiation, and forward instantiation of DPs). Moreover, we also used match-bounds to infer complexity bounds. On this website, we provide a complete list of all techniques used in AProVE 2012 and for each technique, we also provide links to all those examples where the respective technique was used for the complexity analysis.
The version of AProVE used in our experiments is a slight improvement of the AProVE versions used in the International Termination Competition in 2010 - 2012. In this competition, AProVE was the most successful tool for innermost runtime complexity analysis.
- TCT
TCT is a powerful
tool for automatically proving polynomial upper bounds on the derivational complexity and runtime complexity of term rewriting systems. It won the categories "innermost derivational complexity" and "runtime complexity" in the International Termination Competition 2010, 2011, and 2012. As suggested by the authors of TCT, we used the development version id ccf74e291a with a special strategy provided by the authors of TCT.
- CaT 1.5
CaT is a powerful analyzer for the complexity of TRSs. It is based on the fast and powerful termination analyzer TTT2. CaT analyzes both runtime and derivational complexity.
Among others, CaT won the category of derivational complexity in 2008-2012 in the International Termination Competition.
Examples and Settings
For the evaluation, we used the
1249 TRSs from the Termination Problem Data Base (TPDB)
that were used in the full run of the International
Termination Competition in December 2011. We omitted the 60 TRSs which contain rules with extra
variables on the right-hand side, since they are trivially non-terminating.
We ran all three tools on a quad-core Intel(R) Xeon(R) CPU 5140, with
2.33 GHz and 16 GB of memory. As in the competition, each tool had at most 60 seconds to give an
upper bound on each of the examples.
Experiments
Comparison of AProVE with CaT and TCT
The first table compares the performance of CaT against AProVE. For instance, the first row means that AProVE showed constant complexity for 343
examples. On those examples, CaT proved linear complexity in 199 cases and failed in 144
cases. Here, "O(1)" stands for Pol0, "O(n^1)" stands for Pol1, etc. So in the green part of the table, AProVE gave more precise results than CaT. In
the yellow part, both tools obtained equal results. In the red part, CaT was
more precise than AProVE. Similarly, the second table compares TCT against AProVE.
CaT 1.5 |
---|
| | O(1) | O(n^1) | O(n^2) | O(n^3) | no
result | Sum |
---|
O(1) | - | 199 | - | - | 144 | 343 |
---|
O(n^1) | - | 96 | 10 | - | 95 | 201 |
---|
O(n^2) | - | 15 | - | - | 67 | 82 |
---|
O(n^3) | - | - | - | - | 13 | 13 |
---|
no result | - | 6 | 2 | 1 | 601 | 610 |
---|
Sum | 0 | 316 | 12 | 1 | 920 | 1249 |
---|
TCT 1.9 |
---|
| | O(1) | O(n^1) | O(n^2) | O(n^3) | no
result | Sum |
---|
O(1) | 1 | 209 | 8 | - | 125 | 343 |
---|
O(n^1) | - | 128 | 19 | - | 54 | 201 |
---|
O(n^2) | - | 24 | 17 | - | 41 | 82 |
---|
O(n^3) | - | - | - | - | 13 | 13 |
---|
no result | - | 29 | 13 | - | 568 | 610 |
---|
Sum | 1 | 390 | 57 | 0 | 801 | 1249 |
---|
Evaluation of New Processors
To evaluate the usefulness of the rhs simplification processor and the unreachable DT
processor (which are new compared to the preliminary version of the current paper from the proceedings of CADE 2011),
we also tested the full version of AProVE against a variant where we disabled these two
processors. As shown in the table below, this variant is substantially weaker. (That there are also a
few examples where the restricted variant obtained more precise complexity bounds is due
to the heuristics and internal time-limits of AProVE, which determine its strategy to apply
the different processors.)
AProVE without rhs simplification and unreachable DT
removal processor |
|
| O(1) | O(n^1) | O(n^2) | O(n^3) | no
result | Sum |
O(1) | 131 | 117 | 15 | 1 | 79 | 343 |
---|
O(n^1) | 2 | 171 | 12 | - | 16 | 201 |
---|
O(n^2) | - | 7 | 60 | 1 | 14 | 82 |
---|
O(n^3) | - | - | 1 | 7 | 5 | 13 |
---|
no result | - | - | 1 | - | 609 | 610 |
---|
Sum | 133 | 295 | 89 | 9 | 723 | 1249 |
---|
Comparison of power
The following table summarizes the above results.
AProVE could show that 639 of the 1249 examples (51 %) have polynomial innermost runtime.
(Note that the collection also contains many examples whose innermost runtime complexity is not polynomial.)
In contrast,
CaT could show polynomial runtime for 329 examples (26 %)
and TCT shows polynomial runtime for 448 (36 %) examples. Even a "combined tool" of CaT and TCT (which always returns the better result of these two tools) would only show polynomial runtime for 501 examples (40 %).
Hence, the contributions of our paper indeed represent a significant advance.
Moreover, while the
full version of AProVE showed polynomial innermost runtime for 639 examples (52 %),
the restricted variant without the two new processors only succeeded for 526 TRSs (42 %). In other words, this
demonstrates the significance and usefulness of the new processors. (In the table, "AProVE restricted" refers to
AProVE without the rhs simplification and the unreachable DT removal processor.)
Tool | YES | MAYBE | TIMEOUT |
AProVE 2012 | 639 [6.34] | 67 [8.56] | 543 |
AProVE restricted | 526 [12.79] | 60 [7.73] | 663 |
TCT | 448 [6.69] | 260 [25.81] | 329 |
CaT | 329 [1.20] | 905 [3.16] | 15 |
Details on the experiments
A detailed table of all experiments can be found here.
This table gives the detailed results and runtimes of all four tools on every example. Moreover,
by clicking on the respective runtimes of successful proofs, it is possible to inspect the proof output generated by the tool.
Moreover, by clicking on the name of the example in the first column, one can run AProVE on this example in the web interface.
Note that the computer of the web interface is significantly slower than the computer used in our experiments. Hence, the proofs
in the web interface may take substantially longer.
Performance on the examples of our paper
The following table shows how the tools perform on the examples in our paper.
Again,
by clicking on the respective runtimes of successful proofs, it is possible to inspect the proof output generated by the tool.
Moreover, by clicking on the name of the example in the first column, one can run AProVE on this example in the web interface.