"Lower Bounds for Runtime Complexity of Term Rewriting" - Experiments for Innermost Rewriting

The approach presented in the paper "Lower Bounds for Runtime Complexity of Term Rewriting" applies two techniques: an induction technique and loop detection. In our paper "Inferring Lower Bounds for Runtime Complexity", we explain how to adapt the induction technique for innermost rewriting. How to adapt loop detection for innermost rewriting is explained in Appendix A of the paper "Lower Bounds for Runtime Complexity of Term Rewriting".

Settings

Since there are no other tools yet that infer lower bounds for the innermost runtime complexity of TRSs automatically, to evaluate the power of our approach, we compared the results of our implementation with the upper bounds for innermost runtime complexity that are computed by the tools AProVE and TcT. These were the most powerful tools for inferring upper innermost runtime complexity bounds for TRSs at the Termination Competition 2015. Similar to the implementation of our technique for lower bounds, the upper bounds computed by these tools are also asymptotic bounds. As suggested by the authors of TcT, we used its version 2.0.2 (which is available here) and the version 3.0.0 from the termination competition 2015. All tools were run on a 3.07 GHz 8-Core Intel i7 950 Debian Linux machine with a timeout of 300 seconds for each example.

Examples

We ran the tools on 970 TRSs from the category "Runtime Complexity - Innermost Rewriting" of the Termination Problem Data Base (TPDB 10.3), which is the collection of examples used at the Termination Competition 2015. Here, we omitted 60 non-standard TRSs which contain rules with extra variables on the right-hand side, since such rules are not allowed in classical term rewriting. Moreover, we also disregarded 52 examples where AProVE or TcT were able to prove constant innermost runtime complexity. The disregarded examples are marked in grey in the tables below.

Experiments & Discussion

To assess the quality of the inferred lower bounds, the table below compares them with the minimum upper bound computed by AProVE or TcT. Here, irc stands for the innermost runtime complexity function, mapping a natural number n to the length of the longest innermost evaluation sequence starting with a basic term of at most size n.

Table 1 shows the results of the induction technique. The average runtime per example was 29.6 s. The induction technique is especially suitable for polynomial bounds (it proves 426 linear and 85 non-linear polynomial bounds). In particular, it is powerful for TRSs that implement realistic non-linear algorithms, e.g., it shows irc_R(n) ∈ Ω(n²) for many implementations of sorting algorithms like Quicksort, Maxsort, Minsort, and Selectionsort from the TPDB.

Table 2 shows that the induction technique yields significantly better results with argument filtering and indefinite lemmas (see "Inferring Lower Bounds for Runtime Complexity"). In this setting, 772 non-trivial lower bounds were proved. The average runtime was 26.8 s. Hence, even the efficiency of the induction technique benefits a little from the improvements.

Click here for more detailed results of the evaluation of the induction technique.

Table 3 presents the results of loop detection. The average runtime was 2.8 s, i.e., it is much more efficient than the induction technique. Loop detection infers non-trivial bounds for almost all analyzed TRSs: There are only 13 cases where loop detection fails. Note that for some of these examples, there might not even exist a non-trivial lower bound. Furthermore, loop detection is also powerful for non-polynomial lower bounds (it finds 41 exponential and 100 infinite lower bounds).

Table 4 shows that loop detection hardly benefits from argument filtering. Compared to Table 3, we just get one additional linear lower bound, where loop detection without argument filtering times out.

Click here for more detailed results of the evaluation of loop detection.

In Table 5, we combine loop detection and the induction technique. The average runtime was 28.2 s. Like loop detection on its own, this configuration proves non-trivial lower bounds for almost all analyzed TRSs. Compared to Table 3, it fails for two more TRSs due to timeouts and it succeds for one more TRS. While the latter has a decreasing loop, our implementation does not find it since our heuristics to generate the rewrite sequences to analyze are not optimal for this particular example. In contrast, the induction technique successfully proves a linear lower bound. There are less non-linear polynomial lower bounds than in Table 1, since loop detection infers a better exponential or infinite lower bound for some TRSs where the induction technique can only prove a non-linear polynomial bound.

Finally, in Table 6 we combine loop detection and the induction technique with argument filtering and indefinite lemmas. Here, argument filtering was applied prior to the induction technique, but not before loop detection, since, as Table 4 shows, loop detection hardly benefits from argument filtering. The average runtime was 26.5 s, but the analysis usually finished much faster. The median of the runtime was 2.8 s. This is the most powerful combination of our techniques to infer lower bounds with AProVE . However, the comparison of Table 5 and 6 shows that argument filtering and indefinite lemmas have little impact on the results if we use both loop detection and the induction technique.

According to Table 6, our implementation inferred non-trivial lower bounds for 956 (99%) of the 970 TRSs. Upper bounds were only obtained for 424 (44%) TRSs. So although this is the first technique for lower innermost runtime bounds, its applicability exceeds the applicability of the techniques for upper bounds which were developed for years. Of course, the task of finding worst-case upper bounds is very different from the task of inferring worst-case lower bounds. Tight bounds (where the lower and upper bounds are equal) were proven for the 384 TRSs on the diagonal of the table.

Click here for more detailed results of the evaluation of the combination of the induction technique and loop detection.

Here you find a detailed evaluation of our techniques for full term rewriting.