"Symbolic Evaluation Graphs and Term Rewriting —
A General Methodology for Analyzing Logic Programs" - Experiments and Proofs

In very recent work (see "A Linear Operational Semantics for Termination and Complexity Analysis of ISO Prolog" published at LOPSTR '11 and its corresponding full version), we introduced a novel operational semantics for ISO Prolog. This semantics is linear, i.e., each query gives rise to a linear evaluation sequence instead of a tree, and, thus, lends itself well to various analysis tasks such as termination, complexity, or determinacy analysis.

In the paper "Symbolic Evaluation Graphs and Term Rewriting — A General Methodology for Analyzing Logic Programs" we present a transformational approach for automatically proving termination, finding asymptotic upper bounds on the runtime, and proving determinacy for Prolog queries, based on automated analysis techniques developed for term rewriting. First, we show how to obtain symbolic evaluation graphs for classes of queries. Then we present a method to synthesize term rewrite systems (TRSs) from a symbolic evaluation graph such that the resulting TRSs can simulate all evaluations of the original Prolog program w.r.t. the given class of queries. Thus, termination of the resulting TRS implies termination of the original Prolog program. We continue by showing that the generated TRSs can also be used to prove upper bounds for the runtime complexity of the original Prolog program. Here, the runtime of a Prolog program is measured as the number of unification attempts during the evaluation of a query. Finally, we introduce a syntactic criterion based on symbolic evaluation graphs which is sufficient to prove determinacy of a class of queries.

A full version of our paper including all proofs is available here. For details on the heuristics used to construct symbolic evaluation graphs, we refer to the diploma thesis of Thomas Ströder.

Implementation in AProVE

We integrated our new transformational approach in the termination, complexity, and determinacy analyzer AProVE which is one of the most powerful termination tools for logic programs. It can be accessed via a custom web interface. Here, an answer like "BOUNDS(O(1), O(n^2))" means that O(1) is a lower bound and O(n²) is an upper bound for the runtime of the logic program. Currently, AProVE does not support techniques to prove lower bounds for Prolog programs other than "O(1)", i.e., up to now we only focus on upper bounds.

Settings

We compare our termination, complexity, and determinacy analysis in AProVE to other existing automated analyzers for logic programs:

Automated Program Verification Environment (AProVE):
AProVE is one of the most powerful automated analyzers for termination of programs in several programming languages including term rewriting, Haskell, Prolog, and Java. This new experimental version of AProVE contains an implementation of symbolic evaluation graphs and of our transformation to term rewriting. Then it uses its existing backend for termination and complexity analysis of TRSs and concludes termination of the original Prolog program or translates the bounds found for the TRSs to upper bounds for the Prolog program. For determinacy analysis, the construction of a symbolic evaluation graph and the check of a syntactic criterion on this graph are already sufficient to prove determinacy of a class of queries.
Complexity Analysis System for LOGic (CASLOG):
This system was the first automated complexity analyzer for logic programs and its original version from 1992 is still available. As this version requires an outdated version of Sicstus Prolog, we adapted it such that it can be compiled using Sicstus 4 or Ciao Prolog. For repeatability of the experiments, the adapted version is available for download with the permission of the CASLOG author Nai-Wei Lin. In the tables below, we used a version that was compiled using Ciao Prolog. Results using Sicstus Prolog were identical up to minor differences in runtime.
CASLOG performs the complexity analysis in a fully automatic way, but it requires that the user annotates moding and measure information, i.e., the user has to indicate which arguments are to be considered as input arguments and which norms should be used to measure their size. To obtain these annotations automatically, in the tables below, we used CiaoPP as a pre-processor for CASLOG.
Ciao Preprocessor (CiaoPP):
The Ciao System contains a pre-processor that offers a wealth of (static) analysis techniques, including two techniques to determine upper bounds for logic programs and one technique to prove determinacy. For complexity analysis, we used both the original cost analysis (CiaoPP-o) and CiaoPP's new resource framework which allows to measure different forms of costs (CiaoPP-r). Here, we chose the cost measure "res_steps" which approximates the number of resolution steps needed in evaluations. CiaoPP-det is the determinacy analysis implemented in CiaoPP.

Examples

We ran the tools on all 477 examples for logic programs from the Termination Problem Data Base (8.0.6) used in the international Termination Competition 2012. These examples have been collected from a large number of sources including several papers, entries for programming contests, and freely available Prolog programs.

Experiments & Discussion

All three tools were run on 2.2 GHz Quad-Opteron 848 Linux machines with a timeout of 60 seconds per example, as in the competitions on automated termination and complexity analysis (which are held annually as part of the Termination Competition).

The results for termination, complexity and determinacy analysis are presented separately. Each section concludes with a link to a table presenting all individual examples and the results of the tools on them. Clicking on the times when a proof has been found opens the corresponding proof. Moreover, by clicking on the name of the example in the first column, one can run AProVE on this example in the web interface.

Termination

Here, we tested four different techniques implemented in AProVE against each other. Since AProVE is already the most powerful tool for termination analysis of logic programs (as indicated in the annual Termination Competitions), there is no repeated comparison to other termination provers for logic programs.
The four techniques (named according to the list of references in the current paper) are:

AProVE-[31]: A direct transformation from definite logic programs to term rewrite systems presented in ACM TOCL '09. If a logic program contains cuts, they are just ignored.
AProVE-[32]: The first method for termination analysis of logic programs with cut based on symbolic evaluation graphs presented at ICLP '10. Here, we synthesize definite logic programs from such graphs which can then be analyzed further (e.g., by using the technique of TOCL '09).
AProVE-[35]: At LOPSTR '10, we improved the method of ICLP '10 by synthesizing so-called dependency triples instead of definite logic programs from symbolic evaluation graphs. This approach is much more powerful than the preceding ones, but also much more complicated.
AProVE-New: This version uses the technique from the current paper. While being as powerful as AProVE-[35], it is considerably simpler. Moreover, it directly synthesizes term rewrite systems instead of dependency triples. Thus, any tool for termination analysis of term rewrite systems can be employed afterwards while one needs a tool for the (non-standard) notion of dependency triples for the approach of AProVE-[35].

In the table, "YES" indicates the number of examples where termination could be proved, "MAYBE" the number of examples where the corresponding tool returned without an answer in 60 seconds and "Timeout" the number of times a tool exceeded the 60 seconds timeout. We use one row for each version of AProVE and highlight the best version for each column using bold font.

Tool	YES	MAYBE	Timeout	Avg. Runtime
AProVE-[31]	265	197	17	7.1s
AProVE-[32]	287	176	16	7.6s
AProVE-[35]	340	127	12	5.7s
AProVE-New	342	124	13	6.5s

Detailed results for termination

Note that the full version of AProVE combines several techniques for termination analysis of logic programs and is, thus, even more powerful than any of the presented techniques on its own (the full version can show termination of 358 out of these 477 example programs). However, it is hard to determine the exact power of one particular technique in such a combined system. Hence, we compared the techniques separately.

Complexity

The table below shows the results of our experiments on complexity analysis with one row for each tool. The first four columns give the number of logic programs that could be shown to have a constant bound ("O(1)"), a linear bound ("O(n)"), a quadratic bound ("O(n²)"), or an exponential bound ("O(n⋅2ⁿ)"). In Rows 5 and 6 we give the total number of upper bounds that could be found by the tool and its average runtime for each example, respectively. We highlight the best tool for each column using bold font.

This table shows that AProVE can find upper bounds for a much larger subset (42.8%) of the programs than any of the other tools (6.1%). However, there are also 6 examples where CASLOG or CiaoPP can prove constant (1), linear (1), quadratic (1), or exponential bounds (3), whereas AProVE fails. In summary, the experiments clearly demonstrate that our transformational approach for determining upper bounds advances the state of the art in automated complexity analysis of logic programs significantly.

Tool	O(1)	O(n)	O(n²)	O(n⋅2ⁿ)	Bounds	Avg. Runtime
CASLOG	1	21	4	3	29	14.8s
CiaoPP-o	3	19	4	3	29	11.7s
CiaoPP-r	3	18	4	3	28	12.5s
AProVE	54	117	37	0	208	10.6s

Detailed results for complexity

Determinacy

In the following tables, "YES" indicates the number of examples where determinacy could be proved, "MAYBE" the number of examples where the corresponding tool returned without an answer in 60 seconds and "Timeout" the number of times a tool exceeded the 60 seconds timeout. We present the results on definite logic programs and Prolog programs using built-in predicates separately to show the advantages and disadvantages of the different tools more clearly. We see that CiaoPP is much more powerful on definite logic programs than AProVE. However, on Prolog programs using built-in predicates like cuts, AProVE's determinacy analysis can prove determinacy of 14 examples more than CiaoPP. So while CiaoPP is more powerful in total, there are 58 examples where AProVE can prove determinacy while CiaoPP fails. Thus, our new determinacy analysis adds substantial power to existing approaches for automated determinacy analysis.

Tool	YES	MAYBE	Timeout	Avg. Runtime
AProVE	19	278	3	3.4s
CiaoPP-det	132	163	5	6.9s

Detailed results for determinacy on definite logic programs

Tool	YES	MAYBE	Timeout	Avg. Runtime
AProVE	75	100	2	3.3s
CiaoPP-det	61	109	7	8.2s

Detailed results for determinacy on Prolog programs using built-in predicates

"Symbolic Evaluation Graphs and Term Rewriting —A General Methodology for Analyzing Logic Programs" - Experiments and Proofs