"Symbolic Evaluation Graphs and Term Rewriting —
A General Methodology for Analyzing Logic Programs"
- Experiments and Proofs

In very recent work (see "A Linear Operational Semantics for Termination and Complexity Analysis of ISO Prolog" published at LOPSTR '11 and its corresponding full version), we introduced a novel operational semantics for ISO Prolog. This semantics is linear, i.e., each query gives rise to a linear evaluation sequence instead of a tree, and, thus, lends itself well to various analysis tasks such as termination, complexity, or determinacy analysis.

In the paper "Symbolic Evaluation Graphs and Term Rewriting — A General Methodology for Analyzing Logic Programs" we present a transformational approach for automatically proving termination, finding asymptotic upper bounds on the runtime, and proving determinacy for Prolog queries, based on automated analysis techniques developed for term rewriting. First, we show how to obtain symbolic evaluation graphs for classes of queries. Then we present a method to synthesize term rewrite systems (TRSs) from a symbolic evaluation graph such that the resulting TRSs can simulate all evaluations of the original Prolog program w.r.t. the given class of queries. Thus, termination of the resulting TRS implies termination of the original Prolog program. We continue by showing that the generated TRSs can also be used to prove upper bounds for the runtime complexity of the original Prolog program. Here, the runtime of a Prolog program is measured as the number of unification attempts during the evaluation of a query. Finally, we introduce a syntactic criterion based on symbolic evaluation graphs which is sufficient to prove determinacy of a class of queries.

A full version of our paper including all proofs is available here. For details on the heuristics used to construct symbolic evaluation graphs, we refer to the diploma thesis of Thomas Ströder.


Implementation in AProVE

We integrated our new transformational approach in the termination, complexity, and determinacy analyzer AProVE which is one of the most powerful termination tools for logic programs. It can be accessed via a custom web interface. Here, an answer like "BOUNDS(O(1), O(n^2))" means that O(1) is a lower bound and O(n2) is an upper bound for the runtime of the logic program. Currently, AProVE does not support techniques to prove lower bounds for Prolog programs other than "O(1)", i.e., up to now we only focus on upper bounds.


Settings

We compare our termination, complexity, and determinacy analysis in AProVE to other existing automated analyzers for logic programs:

Examples

We ran the tools on all 477 examples for logic programs from the Termination Problem Data Base (8.0.6) used in the international Termination Competition 2012. These examples have been collected from a large number of sources including several papers, entries for programming contests, and freely available Prolog programs.

Experiments & Discussion

All three tools were run on 2.2 GHz Quad-Opteron 848 Linux machines with a timeout of 60 seconds per example, as in the competitions on automated termination and complexity analysis (which are held annually as part of the Termination Competition).

The results for termination, complexity and determinacy analysis are presented separately. Each section concludes with a link to a table presenting all individual examples and the results of the tools on them. Clicking on the times when a proof has been found opens the corresponding proof. Moreover, by clicking on the name of the example in the first column, one can run AProVE on this example in the web interface.

Termination

Here, we tested four different techniques implemented in AProVE against each other. Since AProVE is already the most powerful tool for termination analysis of logic programs (as indicated in the annual Termination Competitions), there is no repeated comparison to other termination provers for logic programs.
The four techniques (named according to the list of references in the current paper) are:

In the table, "YES" indicates the number of examples where termination could be proved, "MAYBE" the number of examples where the corresponding tool returned without an answer in 60 seconds and "Timeout" the number of times a tool exceeded the 60 seconds timeout. We use one row for each version of AProVE and highlight the best version for each column using bold font.

ToolYESMAYBETimeoutAvg. Runtime
AProVE-[31]265197177.1s
AProVE-[32]287176167.6s
AProVE-[35]340127125.7s
AProVE-New342124136.5s

Detailed results for termination

Note that the full version of AProVE combines several techniques for termination analysis of logic programs and is, thus, even more powerful than any of the presented techniques on its own (the full version can show termination of 358 out of these 477 example programs). However, it is hard to determine the exact power of one particular technique in such a combined system. Hence, we compared the techniques separately.

Complexity

The table below shows the results of our experiments on complexity analysis with one row for each tool. The first four columns give the number of logic programs that could be shown to have a constant bound ("O(1)"), a linear bound ("O(n)"), a quadratic bound ("O(n2)"), or an exponential bound ("O(n⋅2n)"). In Rows 5 and 6 we give the total number of upper bounds that could be found by the tool and its average runtime for each example, respectively. We highlight the best tool for each column using bold font.

This table shows that AProVE can find upper bounds for a much larger subset (42.8%) of the programs than any of the other tools (6.1%). However, there are also 6 examples where CASLOG or CiaoPP can prove constant (1), linear (1), quadratic (1), or exponential bounds (3), whereas AProVE fails. In summary, the experiments clearly demonstrate that our transformational approach for determining upper bounds advances the state of the art in automated complexity analysis of logic programs significantly.

ToolO(1)O(n)O(n2)O(n⋅2n)BoundsAvg. Runtime
CASLOG121432914.8s
CiaoPP-o319432911.7s
CiaoPP-r318432812.5s
AProVE5411737020810.6s

Detailed results for complexity

Determinacy

In the following tables, "YES" indicates the number of examples where determinacy could be proved, "MAYBE" the number of examples where the corresponding tool returned without an answer in 60 seconds and "Timeout" the number of times a tool exceeded the 60 seconds timeout. We present the results on definite logic programs and Prolog programs using built-in predicates separately to show the advantages and disadvantages of the different tools more clearly. We see that CiaoPP is much more powerful on definite logic programs than AProVE. However, on Prolog programs using built-in predicates like cuts, AProVE's determinacy analysis can prove determinacy of 14 examples more than CiaoPP. So while CiaoPP is more powerful in total, there are 58 examples where AProVE can prove determinacy while CiaoPP fails. Thus, our new determinacy analysis adds substantial power to existing approaches for automated determinacy analysis.

ToolYESMAYBETimeoutAvg. Runtime
AProVE1927833.4s
CiaoPP-det13216356.9s

Detailed results for determinacy on definite logic programs

ToolYESMAYBETimeoutAvg. Runtime
AProVE7510023.3s
CiaoPP-det6110978.2s

Detailed results for determinacy on Prolog programs using built-in predicates