In our paper "The Dependency Pair Framework for Automated Complexity Analysis of Term Rewrite Systems",
we present a modular framework to analyze the innermost runtime complexity of term
rewrite systems (TRSs) automatically. Our method
is based on the dependency pair (DP) framework for termination analysis.
In contrast to previous work, we directly adapt successful termination techniques
from the DP framework so that they can be used for complexity analysis.
On this webpage, we describe extensive experimental results which
demonstrate the power of our method compared to
existing techniques.
Implementation in AProVE
We integrated our approach into a new version of the
software verification tool AProVE. In particular, this
new version of AProVE makes it possible to repeat the experiments
below. It can be accessed via the following web
interface.
Note that this is the same web interface that is also used for termination proofs. To use it for innermost runtime
complexity analysis, the TRS should start with the following lines:
(GOAL COMPLEXITY)
(STARTTERM CONSTRUCTOR-BASED)
(STRATEGY INNERMOST)
The result of the tool has the form BOUNDS(O(f), O(g)), where O(f) is a lower bound and O(g) is an upper bound on the innermost runtime complexity of the TRS.
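As a concrete illustration, a complete input file for the web interface could look as follows. The plus rules below are only a hypothetical example (addition on natural numbers) and are not taken from our benchmark collection:
(GOAL COMPLEXITY)
(STARTTERM CONSTRUCTOR-BASED)
(STRATEGY INNERMOST)
(VAR x y)
(RULES
  plus(0, y) -> y
  plus(s(x), y) -> s(plus(x, y))
)
For this TRS, one would expect a result like BOUNDS(O(n^1), O(n^1)): an innermost evaluation of plus(s^n(0), y) takes exactly n + 1 rewrite steps, so the innermost runtime complexity is linear in the size of the start term.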
Tools
In our experiments we compare AProVE to the tools TCT and
CaT.
- AProVE 2011
In this version of the AProVE tool, we implemented
all contributions of our paper "The Dependency Pair Framework for Automated Complexity Analysis of Term Rewrite Systems".
In addition, we adapted some further related DP processors to the complexity setting (e.g., processors for rewriting, instantiation, and forward instantiation of DPs). Moreover, we also use match-bounds
to infer complexity bounds. On this website, we provide a complete list of all techniques used in AProVE 2011 and,
for each technique, links to all those examples where the respective technique was used in the complexity analysis.
The version of AProVE used in our experiments slightly improves upon the version that participated in
the International Termination Competition 2010, where AProVE was the most successful tool for innermost runtime complexity analysis.
- TCT 1.6
TCT is a powerful
tool for automatically proving polynomial upper bounds on the derivational complexity and runtime complexity of term rewriting systems. It won the categories "innermost derivational complexity" and "runtime complexity" in the International Termination Competition 2010.
- CaT 1.5
CaT is a powerful analyzer for the complexity of TRSs. It is based on the fast and powerful termination analyzer TTT2. CaT analyzes both runtime and derivational complexity.
Among others, CaT won the derivational complexity category of the International Termination Competition in 2008, 2009, and 2010.
Examples and Settings
The collection of examples used for this evaluation is derived from the TRSs in the
Termination Problem Data Base (TPDB) Version 8, which was used in
the International Competition of
Termination Provers 2010. This database contains examples from
many different origins, but it is generally targeted at termination analysis of
different kinds of rewriting problems. Hence, the following steps were taken to
transform it into a collection of examples suitable for runtime complexity
analysis:
- Examples of several specific variants of rewriting had to be excluded,
because none of the tools could analyze their complexity. These variants were
"Relative Rewriting", "Higher Order Rewriting",
"Rewriting modulo Theory (Associative, Commutative)", and
"Conditional Rewriting".
- For all TRSs with "full", "outermost", or "context-sensitive" evaluation strategy, we changed the strategy to "innermost".
Afterwards, we removed TRSs which became duplicates after this change of strategy.
- We removed 165 examples which could be shown to have infinite rewrite sequences
starting from basic terms. The reason is that the runtime complexity of those examples cannot be
bounded by any finite function (see the illustration after this list). To find these examples, the non-termination
checker in AProVE was modified to check for sequences starting with basic
terms.
- Furthermore, examples which could be solved by our program Oops were removed. This program
checks a TRS for several criteria under which runtime complexity analysis
of a system does not make sense (for details, see here).
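To illustrate the removal of examples with infinite rewrite sequences from basic terms, consider the following hypothetical TRS (not taken from the collection), where f is a defined symbol and s is a constructor:
(VAR x)
(RULES
  f(x) -> f(s(x))
)
Every basic term f(t) with a constructor term t starts the infinite rewrite sequence f(t) -> f(s(t)) -> f(s(s(t))) -> ..., so the runtime complexity of this TRS cannot be bounded by any finite function.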
After this filtering, 1323 examples suitable for runtime
complexity analysis remained.
We ran all three tools on a quad-core Intel(R) Xeon(R) 5140 CPU at
2.33 GHz with 16 GB of memory. Each tool had at most 60 seconds to compute an
upper bound for each of the examples.
Experiments
Comparison of results
The first table compares the performance of CaT against AProVE. For instance, the first row means that there were 209
examples where AProVE proved constant complexity. On those examples, CaT showed linear complexity in 182 cases, failed before the timeout
in 25 cases, and reached the timeout in 2 cases. Hence, the entries to the right of the diagonal are those where AProVE gave more precise results
than CaT, on the diagonal the two tools obtained equal results, and to the left of the diagonal CaT was more precise than AProVE.
Similarly, the second table compares TCT against AProVE. Here, "O(1)" stands for TCT's output Pol_0, "O(n^1)" stands for Pol_1, etc. Finally, the last table compares AProVE against a "combination of CaT and TCT"
(which always returns the better result of these two tools).
| AProVE 2011 \ CaT 1.5 | O(1) | O(n^1) | O(n^2) | O(n^3) | MAYBE | TIMEOUT | ∑ |
|---|---|---|---|---|---|---|---|
| O(1) | - | 182 | - | - | 25 | 2 | 209 |
| O(n^1) | - | 187 | 7 | - | 73 | 3 | 270 |
| O(n^2) | - | 32 | 2 | - | 82 | 1 | 117 |
| O(n^3) | - | 6 | - | - | 16 | - | 22 |
| MAYBE | - | 6 | - | - | 21 | - | 27 |
| TIMEOUT | - | 21 | 3 | 1 | 633 | 20 | 678 |
| ∑ | 0 | 434 | 12 | 1 | 850 | 26 | 1323 |
| AProVE 2011 \ TCT 1.6 | O(1) | O(n^1) | O(n^2) | O(n^3) | MAYBE | TIMEOUT | ∑ |
|---|---|---|---|---|---|---|---|
| O(1) | 10 | 157 | - | - | - | 42 | 209 |
| O(n^1) | - | 152 | 1 | - | - | 117 | 270 |
| O(n^2) | - | 35 | - | - | - | 82 | 117 |
| O(n^3) | - | 5 | - | - | - | 17 | 22 |
| MAYBE | - | 6 | - | - | - | 21 | 27 |
| TIMEOUT | - | 16 | 3 | - | - | 659 | 678 |
| ∑ | 10 | 371 | 4 | 0 | 0 | 938 | 1323 |
| AProVE 2011 \ Best of CaT and TCT | O(1) | O(n^1) | O(n^2) | O(n^3) | no result | ∑ |
|---|---|---|---|---|---|---|
| O(1) | 10 | 177 | - | - | 22 | 209 |
| O(n^1) | - | 201 | 1 | - | 68 | 270 |
| O(n^2) | - | 36 | - | - | 81 | 117 |
| O(n^3) | - | 6 | - | - | 16 | 22 |
| no result | - | 29 | 3 | 1 | 672 | 705 |
| ∑ | 10 | 449 | 4 | 1 | 859 | 1323 |
Comparison of power
The following table summarizes the above results.
AProVE could show that 618 of the 1323 examples (47 %) have polynomial innermost runtime complexity.
(Note that the collection also contains many examples whose innermost runtime complexity is not polynomial.)
In contrast,
CaT could show polynomial runtime for 447 examples (33 %),
and TCT showed polynomial runtime for 385 examples (29 %). Even a "combined tool" of CaT and TCT (which always returns the better result of these two tools) would only show polynomial runtime for 464 examples (35 %).
Hence, the contributions of our paper indeed represent a significant advance. The last column (TOTAL RUNTIME) gives the overall time (in seconds) that each tool needed to analyze all 1323 examples. Since AProVE and TCT had many more timeouts than CaT, their total runtimes were significantly higher.
| Tool | YES | MAYBE | TIMEOUT | TOTAL RUNTIME (s) |
|---|---|---|---|---|
| AProVE 2011 | 618 | 27 | 678 | 49094 |
| CaT 1.5 | 447 | 850 | 26 | 8163 |
| TCT 1.6 | 385 | 0 | 938 | 57484 |
Details on the experiments
A detailed table of all experiments can be found here.
This table gives the detailed results and runtimes of all three tools on every example. Moreover,
by clicking on the runtime of a successful proof, one can inspect the proof output generated by the tool,
and by clicking on the name of an example in the first column, one can run AProVE on this example in the web interface.
Note that the computer of the web interface is significantly slower than the computer used in our experiments. Hence, the proofs
in the web interface may take substantially longer.
Performance on the examples of our paper
The following table shows how the three tools perform on the examples in our paper.
Again, by clicking on the runtime of a successful proof, one can inspect the proof output generated by the tool,
and by clicking on the name of an example in the first column, one can run AProVE on this example in the web interface.