"Lower Bounds for Runtime Complexity of Term Rewriting" - Experiments

There exist numerous methods to infer upper bounds for the runtime complexity of term rewrite systems (TRSs). In the paper "Lower Bounds for Runtime Complexity of Term Rewriting", we present the first automatic technique to infer lower bounds for the runtime complexity of TRSs. While upper complexity bounds help to prove the absence of bugs that worsen the performance of programs, lower bounds can be used to find such bugs. Moreover, in combination with existing methods to deduce upper bounds, our technique can prove tight complexity results. For each lower bound found by our implementation, it also returns a witness, i.e., a family of terms that leads to the reported runtime.

The approach presented in the paper "Lower Bounds for Runtime Complexity of Term Rewriting" applies two techniques: an induction technique and loop detection. A preliminary version of the induction technique for innermost rewriting was published in our paper "Inferring Lower Bounds for Runtime Complexity", which appeared at RTA 2015. The loop detection technique is completely new.


Implementation in AProVE

We integrated our new technique into the termination and complexity analyzer AProVE, which is one of the most powerful termination and complexity tools for TRSs. It can be accessed via a custom web interface. Please note that the server running the web interface is considerably slower than the computer used for the benchmarks.


Since there are not yet any other tools that automatically infer lower bounds for the runtime complexity of TRSs, we evaluated the power of our approach by comparing the results of our implementation with the upper bounds for runtime complexity computed by the tool TcT, which was the most powerful tool for inferring upper runtime complexity bounds for TRSs at the Termination Competition 2015. Like the lower bounds inferred by our implementation, the upper bounds computed by TcT are asymptotic. As suggested by the authors of TcT, we used its version 2.0.2 (which is available here) and version 3.0.0 from the Termination Competition 2015. All tools were run on a 3.07 GHz 8-core Intel i7 950 Debian Linux machine with a timeout of 300 seconds per example.


We ran the tools on 865 TRSs from the category "Runtime Complexity - Full Rewriting" of the Termination Problem Data Base (TPDB 10.3), which is the collection of examples used at the Termination Competition 2015. Here, we omitted 60 TRSs which contain rules with extra variables on the right-hand side, since such rules are not allowed in classical term rewriting. Moreover, we also disregarded 34 examples where TcT was able to prove constant runtime complexity. The disregarded examples are marked in grey in the tables below.

Experiments & Discussion

In the following tables, rc denotes the runtime complexity function, which maps a natural number n to the length of the longest evaluation sequence starting with a basic term of size at most n.
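To make this definition concrete, here is a minimal Python sketch (our own illustration, not AProVE's implementation) for the toy TRS with the rules double(0) → 0 and double(s(x)) → s(s(double(x))); it enumerates rewrite sequences by brute force to compute rc.

```python
# Illustrative sketch (not AProVE's code): terms are nested tuples like
# ('double', ('s', ('0',))), and pattern variables are plain strings.

RULES = [
    (('double', ('0',)), ('0',)),                              # double(0)    -> 0
    (('double', ('s', 'X')), ('s', ('s', ('double', 'X')))),   # double(s(x)) -> s(s(double(x)))
]

def match(pat, term, sub):
    """Extend substitution `sub` so that pat instantiated by sub equals term."""
    if isinstance(pat, str):                  # a pattern variable
        if pat in sub:
            return sub if sub[pat] == term else None
        return {**sub, pat: term}
    if not isinstance(term, tuple) or pat[0] != term[0] or len(pat) != len(term):
        return None
    for p, t in zip(pat[1:], term[1:]):
        sub = match(p, t, sub)
        if sub is None:
            return None
    return sub

def apply_sub(sub, t):
    """Instantiate the pattern variables in t."""
    if isinstance(t, str):
        return sub[t]
    return (t[0],) + tuple(apply_sub(sub, a) for a in t[1:])

def successors(t):
    """All terms reachable from t in one rewrite step (at any position)."""
    for lhs, rhs in RULES:
        sub = match(lhs, t, {})
        if sub is not None:
            yield apply_sub(sub, rhs)
    for i, arg in enumerate(t[1:], start=1):  # rewrite below the root
        for arg2 in successors(arg):
            yield t[:i] + (arg2,) + t[i + 1:]

def dh(t):
    """Derivation height: length of the longest rewrite sequence from t."""
    return max((1 + dh(u) for u in successors(t)), default=0)

def size(t):
    return 1 + sum(size(a) for a in t[1:])

def num(k):
    """The constructor term s^k(0)."""
    t = ('0',)
    for _ in range(k):
        t = ('s', t)
    return t

def rc(n):
    """Longest evaluation from a basic term of size <= n; in this toy
    system, every basic term has the form double(s^k(0))."""
    basic = [('double', num(k)) for k in range(n)]
    return max((dh(t) for t in basic if size(t) <= n), default=0)
```

For this system, rc(n) = n − 1 for n ≥ 2 (the basic term double(s^k(0)) of size k + 2 admits exactly k + 1 rewrite steps), so rc(n) ∈ Ω(n) — this is the kind of witness family our implementation reports.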

Table: Induction Technique

Table 1 shows the results of the induction technique. The average runtime per example was 25.6 s. The induction technique is especially suitable for polynomial bounds (it proves 353 linear and 78 non-linear polynomial bounds). In particular, it is powerful for TRSs that implement realistic non-linear algorithms, e.g., it shows rc(n) ∈ Ω(n²) for many implementations of sorting algorithms like Quicksort, Maxsort, Minsort, and Selectionsort from the TPDB.

Table: Induction Technique + Improvements

Table 2 shows that the induction technique yields significantly better results with argument filtering and indefinite lemmas (see "Inferring Lower Bounds for Runtime Complexity"). In this setting, 673 non-trivial lower bounds were proved. The average runtime was 24.5 s. Hence, even the efficiency of the induction technique benefits a little from the improvements.

Click here for more detailed results of the evaluation of the induction technique.

Table: Loop Detection

Table 3 presents the results of loop detection. The average runtime was 3.4 s, i.e., it is much more efficient than the induction technique. Loop detection infers non-trivial bounds for almost all analyzed TRSs: There are only 28 cases where loop detection fails. Note that for some of these examples, there might not even exist a non-trivial lower bound. Furthermore, loop detection is also powerful for non-polynomial lower bounds (it finds 144 exponential and 90 infinite lower bounds).
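To give an intuition for the witnesses that loop detection finds, consider the toy TRS with the rules f(0) → 0 and f(s(x)) → c(f(x), f(x)) (our own example, not taken from the TPDB). The left-hand side f(s(x)) reappears twice, as f(x), in the right-hand side; unfolding this self-embedding n times yields exponentially many rewrite steps. A short sketch that computes the derivation height of f(s^n(0)) directly from this recurrence:

```python
def dh_f(n):
    """Derivation height of f(s^n(0)) for the toy TRS
    f(0) -> 0,  f(s(x)) -> c(f(x), f(x)):
    one root step, then two independent copies of f(s^(n-1)(0))."""
    if n == 0:
        return 1                    # the single step f(0) -> 0
    return 1 + 2 * dh_f(n - 1)      # closed form: dh_f(n) = 2^(n+1) - 1
```

Hence rc(n) ∈ Ω(2^n) for this system. If a loop reproduces the entire start term instead of a smaller instance, the system is non-terminating and the lower bound is even infinite (ω).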

Table: Loop Detection + Argument Filtering

Table 4 shows that loop detection does not benefit from argument filtering: it proves 13 fewer exponential bounds than loop detection without argument filtering.

Click here for more detailed results of the evaluation of loop detection.

Table: Induction Technique + Loop Detection

In Table 5, we apply loop detection and the induction technique one after the other. The average runtime was 24.6 s. Like loop detection on its own, this configuration proves non-trivial lower bounds for almost all analyzed TRSs (compared to Table 3, it fails for one more TRS due to a timeout). There are fewer non-linear polynomial lower bounds than in Table 1, since loop detection infers a better exponential or infinite lower bound for some TRSs where the induction technique can only prove a non-linear polynomial bound.

Table: Induction Technique + Loop Detection + Improvements

Finally, in Table 6 we combine loop detection and the induction technique with argument filtering and indefinite lemmas. Here, argument filtering was applied prior to the induction technique, but not before loop detection, since, as Table 4 shows, loop detection does not benefit from argument filtering. The average runtime was 24.4 s, but the analysis usually finished much faster: the median runtime was 2.6 s. This is the most powerful combination of our techniques to infer lower bounds with AProVE. However, comparing Tables 5 and 6 shows that argument filtering and indefinite lemmas have little impact on the results if both loop detection and the induction technique are used.

According to Table 6, our implementation inferred non-trivial lower bounds for 836 (97%) of the 865 TRSs. Upper bounds were only obtained for 179 (21%) TRSs, although upper bounds smaller than ω exist for at least the 647 TRSs where AProVE shows termination. So although this is the first technique for lower runtime bounds, its applicability exceeds that of the techniques for upper bounds, which have been developed for years. Of course, the task of finding worst-case upper bounds is very different from the task of inferring worst-case lower bounds. Tight bounds (where the lower and upper bounds coincide) were proven for the 234 TRSs on the diagonal of the table.

Click here for more detailed results of the evaluation of the combination of the induction technique and loop detection.

Here you can find a detailed evaluation of our techniques for innermost term rewriting.

More Information on the Examples Used in the Evaluation

The examples in the TPDB contain many natural functional programs. For example, here is a list of examples contained in the TPDB that implement typical algorithms. This demonstrates that our experiments are indeed useful to evaluate the performance of our approach on realistic algorithms. For example, the Quicksort implementation of the leading example in our paper corresponds to Rubio_04/quick.xml from the TPDB (which in turn is the same as AG01/#3.55.xml without the rules for subtraction and division).

The category "Runtime Complexity - Full Rewriting" of the TPDB contains 3 Quicksort implementations where AProVE detects a quadratic lower bound:

In addition, the category "Runtime Complexity - Innermost Rewriting" contains the Quicksort implementation raML/quicksort.raml, where AProVE also proves a quadratic lower bound.

Moreover, AProVE is able to show quadratic lower bounds for the following implementations of sorting algorithms:

However, there are also implementations of sorting algorithms where AProVE is only able to prove a linear lower bound, like the following Quicksort implementations:

Here, the filtering functions (corresponding to the functions low and high from our leading example) also delete duplicates. Therefore, for homogeneous lists, the worst-case runtime is linear, not quadratic. Since our generator functions only represent homogeneous data objects (e.g., lists or trees where all elements have the same value), the induction technique cannot detect a non-linear lower bound here.
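The effect of duplicate-deleting filter functions can be seen in a small Python model (our own sketch, mirroring the structure of these TPDB examples rather than their exact rewrite rules), where the number of comparisons with the pivot serves as the cost measure:

```python
def quicksort(xs):
    """Quicksort whose filters drop duplicates of the pivot (so the result
    is the sorted list of *distinct* elements). Returns (result, cost),
    where cost counts the comparisons with the pivot."""
    if not xs:
        return [], 0
    pivot, rest = xs[0], xs[1:]
    low = [y for y in rest if y < pivot]    # duplicates of the pivot are
    high = [y for y in rest if y > pivot]   # deleted: they land in neither list
    sorted_low, cost_low = quicksort(low)
    sorted_high, cost_high = quicksort(high)
    return sorted_low + [pivot] + sorted_high, 2 * len(rest) + cost_low + cost_high
```

On a homogeneous list [x, ..., x] of length n, both filters return the empty list, so the recursion stops after one level and the cost 2(n − 1) is linear. On a strictly decreasing list, one filter always receives all remaining elements, giving the quadratic cost n(n − 1).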