In very recent work (see "A Linear Operational Semantics for
Termination and Complexity Analysis of ISO Prolog" published at LOPSTR '11 and its corresponding full version), we introduced a novel operational semantics for ISO Prolog. This semantics
is linear, i.e., each query gives rise to a linear evaluation sequence instead of a tree, and, thus, lends itself well to various analysis
tasks such as termination, complexity, or determinacy analysis.
In the paper "Symbolic Evaluation Graphs and Term Rewriting — A General Methodology for Analyzing Logic Programs"
we present a transformational approach for automatically proving termination, finding asymptotic
upper bounds on the runtime, and proving determinacy for Prolog queries, based
on automated analysis techniques developed for term rewriting. First, we show how to obtain
symbolic evaluation graphs for
classes of queries. Then we present a method to synthesize term rewrite systems (TRSs)
from a symbolic evaluation graph such that the resulting TRSs can simulate all evaluations of
the original Prolog program w.r.t. the given class of queries. Thus, termination of the resulting TRS
implies termination of the original Prolog program. We continue by showing that the generated
TRSs can also be used to prove upper bounds for the runtime complexity of the original Prolog
program. Here, the runtime of a Prolog program is measured as the number of unification attempts
during the evaluation of a query. Finally, we introduce a syntactic criterion based on symbolic
evaluation graphs which is sufficient to prove determinacy of a class of queries.
A full version of our paper including all proofs is available
here. For details on the heuristics used to construct symbolic evaluation graphs, we
refer to the
diploma thesis of Thomas Ströder.
Implementation in AProVE
We integrated our new transformational approach in the
termination, complexity, and determinacy analyzer AProVE which is
one of the most powerful termination tools for logic programs.
It can be accessed via a custom web
interface. Here, an answer like "BOUNDS(O(1), O(n^2))"
means that O(1) is a lower bound and O(n2) is an upper bound for the runtime of the
logic program. Currently, AProVE does not support techniques to prove lower bounds for Prolog
programs other than "O(1)", i.e., up to now we only focus on upper bounds.
Settings
We compare our termination, complexity, and determinacy analysis in AProVE to other existing automated analyzers for logic programs:
- Automated Program Verification Environment (AProVE):
AProVE is one of the most powerful automated analyzers for termination of programs in several
programming languages including term rewriting, Haskell, Prolog, and Java.
This new experimental version of AProVE contains an implementation of symbolic evaluation graphs
and of our transformation
to term rewriting. Then it uses its existing backend for termination and complexity analysis of TRSs
and concludes termination of the original Prolog program or translates the bounds found for the
TRSs to upper bounds for the Prolog program. For determinacy analysis, the construction of a
symbolic evaluation graph and the check of a syntactic criterion on this graph are already sufficient to
prove determinacy of a class of queries.
- Complexity Analysis System for LOGic (CASLOG):
This system was the first automated complexity analyzer for logic programs and its original version from 1992
is still available. As this version requires an outdated version of Sicstus Prolog, we adapted it such that it can be compiled using Sicstus 4 or Ciao Prolog.
For repeatability of the experiments, the adapted version is available for download with the permission of the CASLOG author Nai-Wei Lin.
In the tables below, we used a version that was compiled using Ciao Prolog. Results using Sicstus Prolog were identical up to minor differences in runtime.
CASLOG performs the complexity analysis in a fully automatic way, but it requires that the user annotates moding and measure information, i.e., the user has to indicate which arguments
are to be considered as input arguments and which norms should be used to measure their size. To obtain these annotations automatically, in the tables below,
we used CiaoPP as a pre-processor for CASLOG.
- Ciao Preprocessor (CiaoPP):
The Ciao System contains a pre-processor
that offers a wealth of (static) analysis techniques, including two techniques
to determine upper bounds for logic programs and one technique to prove determinacy.
For complexity analysis, we used both the original cost analysis
(CiaoPP-o) and CiaoPP's new resource framework which allows to measure
different forms of costs (CiaoPP-r). Here, we chose the cost measure
"res_steps" which approximates the number of resolution steps needed in
evaluations. CiaoPP-det is the determinacy analysis implemented in CiaoPP.
Examples
We ran the tools on all 477 examples for logic programs from the Termination Problem Data Base (8.0.6)
used in the international Termination Competition 2012. These examples have been collected
from a large number of sources including several papers, entries for programming contests, and freely available Prolog programs.
Experiments & Discussion
All three tools were run on 2.2 GHz Quad-Opteron 848 Linux machines with a timeout of 60 seconds
per example, as in the competitions on automated termination and complexity analysis (which are
held annually as part of the Termination Competition).
The results for termination, complexity and determinacy analysis are
presented separately. Each section concludes with a link to a table presenting
all individual examples and the results of the tools on them. Clicking on the
times when a proof has been found opens the corresponding proof. Moreover, by
clicking on the name of the example in the first column, one can run AProVE on
this example in the web interface.
Termination
Here, we tested four different techniques implemented in AProVE against each
other. Since AProVE is already the most powerful tool for termination analysis
of logic programs (as indicated in the annual Termination Competitions), there is
no repeated comparison to other termination provers for logic programs.
The four techniques (named according to the list of references in the current
paper) are:
- AProVE-[31]: A direct transformation from definite logic programs to term
rewrite systems presented in
ACM TOCL '09.
If a logic program contains cuts, they are just ignored.
- AProVE-[32]: The first method for termination analysis of logic programs with
cut based on symbolic evaluation graphs presented at
ICLP '10.
Here, we synthesize
definite logic programs from such graphs which can then be
analyzed further (e.g., by using the technique of TOCL '09).
- AProVE-[35]: At LOPSTR '10,
we improved the method of ICLP '10 by synthesizing so-called
dependency triples instead of definite logic
programs from symbolic evaluation graphs. This approach is much more
powerful than the preceding ones, but also much more complicated.
- AProVE-New: This version uses the technique from the current paper. While being as
powerful as AProVE-[35], it is considerably simpler. Moreover, it
directly synthesizes term rewrite systems instead of dependency triples.
Thus, any tool for termination analysis of term rewrite systems can
be employed afterwards while one needs a tool for the (non-standard)
notion of dependency triples for the approach of AProVE-[35].
In the table, "YES" indicates the number of examples where termination could be
proved, "MAYBE" the number of examples where the corresponding tool returned
without an answer in 60 seconds and "Timeout" the number of times a tool
exceeded the 60 seconds timeout. We use one row for each version of AProVE and
highlight the best version for each column using bold font.
Tool | YES | MAYBE | Timeout | Avg. Runtime |
AProVE-[31] | 265 | 197 | 17 | 7.1s |
AProVE-[32] | 287 | 176 | 16 | 7.6s |
AProVE-[35] | 340 | 127 | 12 | 5.7s |
AProVE-New | 342 | 124 | 13 | 6.5s |
Detailed results for termination
Note that the full version of AProVE combines several techniques for termination
analysis of logic programs and is, thus, even more powerful than any of the presented
techniques on its own (the full version can show termination of 358 out of these 477
example programs). However, it is hard to determine the exact power of one particular
technique in such a combined system. Hence, we compared the techniques separately.
Complexity
The table below shows the results of our experiments on complexity analysis with one
row for each tool. The first four
columns give the number of logic programs that could be shown to have a constant
bound ("O(1)"), a linear bound ("O(n)"), a quadratic bound ("O(n2)"), or
an exponential bound ("O(n⋅2n)").
In Rows 5 and 6 we give the total number of upper bounds that could be found by the
tool and its average runtime for each example, respectively. We highlight the best
tool for each column using bold font.
This table shows that AProVE can find upper bounds for a much larger subset (42.8%) of
the programs than any of the other tools (6.1%).
However, there are also 6 examples where CASLOG or CiaoPP can prove constant (1),
linear (1), quadratic (1), or exponential bounds (3), whereas AProVE fails. In summary,
the experiments clearly demonstrate that our transformational approach for determining
upper bounds advances the state of the art in automated complexity analysis of logic
programs significantly.
Tool | O(1) | O(n) | O(n2) | O(n⋅2n) | Bounds | Avg. Runtime |
CASLOG | 1 | 21 | 4 | 3 | 29 | 14.8s |
CiaoPP-o | 3 | 19 | 4 | 3 | 29 | 11.7s |
CiaoPP-r | 3 | 18 | 4 | 3 | 28 | 12.5s |
AProVE | 54 | 117 | 37 | 0 | 208 | 10.6s |
Detailed results for complexity
Determinacy
In the following tables, "YES" indicates the number of examples where determinacy could be
proved, "MAYBE" the number of examples where the corresponding tool returned
without an answer in 60 seconds and "Timeout" the number of times a tool
exceeded the 60 seconds timeout. We present the results on definite logic programs and
Prolog programs using built-in predicates separately to show the advantages and
disadvantages of the different tools more clearly. We see that CiaoPP is much more
powerful on definite logic programs than AProVE. However, on Prolog programs using built-in
predicates like cuts, AProVE's determinacy analysis can prove determinacy of 14 examples
more than CiaoPP. So while CiaoPP is more powerful in total, there are 58 examples where
AProVE can prove determinacy while CiaoPP fails. Thus, our new determinacy analysis adds
substantial power to existing approaches for automated determinacy analysis.
Tool | YES | MAYBE | Timeout | Avg. Runtime |
AProVE | 19 | 278 | 3 | 3.4s |
CiaoPP-det | 132 | 163 | 5 | 6.9s |
Detailed results for determinacy on
definite logic programs
Tool | YES | MAYBE | Timeout | Avg. Runtime |
AProVE | 75 | 100 | 2 | 3.3s |
CiaoPP-det | 61 | 109 | 7 | 8.2s |
Detailed results for determinacy on Prolog
programs using built-in predicates