Empirical Evaluation of Automated Termination Proofs for Haskell by Term Rewriting

In the paper Automated Termination Proofs for Haskell by Term Rewriting we present an approach for analyzing the termination of Haskell programs using term rewriting techniques.

Our approach proves termination of a given start term. The use of a start term is necessary, since a term like "take 10 [(0::Int)..]" terminates in Haskell, although it contains non-terminating functions. Here, the term "[(0::Int)..]" evaluates to an infinite list of integers and, thus, it obviously does not terminate. This behavior is a fundamental consequence of the lazy evaluation strategy of Haskell. In the following we present an overview of the implementation of our approach as well as an extensive experimental evaluation.


Implementation in AProVE

Haskell Evaluation Web Interface

We integrated our approach in the termination tool AProVE which implements the DP framework. A special version of AProVE allows to repeat the experiments below: Here, the user first selects which module to use as the context of his/her Haskell program. In most cases, the Prelude module is the obvious choice. Then one can enter a program and give a start term where unbound variables are interpreted as arbitrary terminating terms. By activating "Show Termination Graph" the (potentially huge) Termination Graph will be rendered for the proof output (requires a browser capable of showing inline PNGs).

Analysis Workflow

Our goal is to analyze the termination behavior of full Haskell as specified by the Haskell 98 Report. Thus, as a first step all function declarations used by the start term are transformed automatically into a simplified form suitable for analysis with the techniques presented in our paper. A Haskell function declaration is in simplified form if and only if it contains only rules with basic patterns on left-hand sides and basic terms on right-hand sides. A basic pattern contains only constructors and variables; a basic term is built using only functions, variables, and constructors. To this end, lambda abstractions, case-, let-, and if-expressions, advanced patterns, and similar constructions are eliminated by introducing new auxiliary functions. For example, lambda abstractions are removed by replacing every Haskell-term "\ t1 ... tn -> t" with the free variables x1, ..., xm by "f x1 ... xm". Here, f is a new function symbol with the defining equation "f x1 ... xm t1 ... xn = t".

In the second step, the termination graph is constructed. Its nodes are terms and its edges essentially correspond to evaluation steps in Haskell. When a term cannot be evaluated further, this is either due to a constructor expected in a rule pattern, a type class instance that must be selected, or because a function does not have enough arguments. In the former cases, all possible instances are generated and the evaluation is continued. In the latter case, the term is expanded by applying it to fresh variables until the number of required arguments is reached. When the current term is an instance of a term already present in the graph, an instantiation edge is drawn between these two, where the substituted terms remain to be evaluated.

Each strongly connected component of the graph corresponds to a DP problem (P,R) where P is built from the instantiation edges and R contains all rules for the functions used in the matching substitution. The finiteness of the resulting DP problems is analyzed by the DP Framework implemented in the AProVE system. If finiteness can be shown, all instances of the start term are terminating using the Haskell evaluation strategy.

For a more detailed description including proofs for all theorems and lemmas we refer to our paper. For details on the simplification and on strategies for constructing termination graphs consult the diploma thesis of Stephan Swiderski (in German) and the diploma thesis of Matthias Raffelsieper.


Experiments and Discussion

Examples

In our experiments, we tested the termination of the pre-defined functions exported by the modules Prelude, FiniteMap, List, Monad, and Queue of the Haskell Interpreter Hugs 98 resulting in a total of 1281 start terms.

We used the modules from Hugs 98-distribution exactly as they were with only two minor adjustments. First, to accommodate for the differences between the module system of the Haskell implementation in AProVE and the module system of Hugs 98, we made trivial adjustments to the import statements. Second, we supply implementations for the primitive functions declared but not implemented in the Hugs 98 Prelude, e.g., primPlusInt :: Int -> Int -> Int, primMinusInt :: Int -> Int -> Int, and so on. The source files that we used for our analysis can be found here:

We first try to prove termination of a start term in its most general form. If the type of the start term has a class context, then afterwards we also try to prove termination of all variants of the start term where type variables with class constraints are replaced by actual type instances. The reason is that by regarding all instances we get a finer analysis for those cases where the most general form of the start term does not terminate (or cannot be shown terminating).

To avoid redundancy, for equivalent type instances we only analyzed one instance. For the class Integral we only considered Int, as in our implementation Int is unbounded and therefore equivalent to Integer, the other instance of Integral. The instances Float and Double are also equivalent in our implementation. Thus, we only considered the Float instances. Additionally, in our implementation we do not yet support derived instances of the classes Read and Ix.

In order to obtain self-contained proofs, AProVE did not re-use previously computed termination properties of functions from libraries. This implies that for all functions used by the start term, termination has to be shown anew.

For example the proof for the start term elem :: Eq a => a -> [a] -> Bool indirectly contains also the proof for the start term (==) :: Eq a => a -> a -> Bool , because the function elem depends on the function (==).

Complexity of real programs

When evaluating the outcome of our experiments, one should keep in mind that we analyze termination of real Hugs 98 code. Click
here to compare the complexity of proving termination for self-defined take- and from-functions and the complexity of proving termination for the built-in take- and from-functions from the Prelude of Hugs 98.

Results

The following table summarizes the results of the experimental evaluation of our approach. Each of the experiments was performed with a time-out of 5 minutes. We indicate by "Yes", "Maybe", and "Timeout" the number of start terms for which proving termination succeeded, could not be shown, or encountered the time limit. Moreover, our implementation also has some limited means of disproving termination, and the number of start terms where termination could be disproved is indicated in the row "No". Indeed, at least 49 of the considered start terms are not terminating with respect to Haskell's evaluation. So, the attempt to prove their termination should fail (Maybe, Timeout, or No). The start terms with individual runtimes and proof details are available by clicking on the name of the module in the table heading. Moreover, there we also point out the 49 start terms that are known to be non-terminating.

Status FiniteMapListMonadPrelude Queue Total
YES 25880.37% 16896.55%6986.25% 49972.10% 5100.00% 99978.54%
MAYBE 0 0% 4 2.29% 1113.75% 42 6.06% 00.00% 57 4.48%
TIMEOUT 63 19.62% 2 1.14% 0 0.00% 14020.23% 00.00% 205 16.12%
NO 0 0.00% 0 0.00% 0 0.00% 11 1.58% 00.00% 11 0.85%
TOTAL 321100.00%174100.00%80100.00%692100.00%5100.00% 1272100.00%

Overall, we can conclude that our approach allows for a powerful termination analysis of Haskell programs. When subtracting the 49 start terms that are known to be non-terminating from the total of 1272, termination can be shown for 81.68 percent of all start terms.


Erratum

AProVE analyzes Haskell programs with a given start term for H-termination. However, the original definition of H-termination in Def. 2.4 needs to be slightly modified. We thank Cynthia Kop and Carsten Fuhs for their help in fixing this definition. The original definition was the following one.

Definition 2.4 (H-Termination). The set of H-terminating ground terms is the smallest set of ground terms t such that t is H-terminating if the following three conditions hold for t:
  • t does not start an infinite evaluation w.r.t. →H
  • if tH* (f t1 ... tn) for a defined function symbol f, n < arity(f), and the term s is H-terminating, then (f t1 ... tn s) is also H-terminating.
  • if tH* (c t1 ... tn) for a constructor c, then t1, ..., tn are also H-terminating.
A term t  is H-terminating iff t σ is H-terminating for all substitutions σ with H-terminating ground terms.

To see why an adaption of this definition is necessary, consider the following Haskell program, which encodes the untyped λ-calculus:

data Foo = Bar | L (Foo -> Foo)

app :: Foo -> Foo -> Foo
app (L f) x = f x

omega = L (\x -> app x x)

AProVE can show H-termination of the following start terms:

app
omega
app omega
app omega x

This would also imply H-termination of the term app omega omega since both app omega x and omega are H-terminating and a term t  is considered to be H-terminating iff t σ is H-terminating for all substitutions σ with H-terminating ground terms. But AProVE can actually prove its non-termination (AProVE also features a prover of non-termination for Haskell, which however is not very powerful). Indeed, the term app omega omega is non-terminating in Haskell, as witnessed by the following infinite evaluation:

app omega omega →H app (L (\x -> app x x)) omega →H (\x -> app x x) omega →H app omega omega →H ...

The problem is that the definition of H-termination is currently contradictory. More precisely, it is unclear whether omega should be regarded as being H-terminating or not. Both possibilities lead to a contradiction: The problem results from constructors with arguments of non-base type, like L in this example. Thus, the approach of the paper (and its implementation in AProVE) should be restricted to programs where the argument types of all constructors are base types. Analogously, type variables in the argument types of constructors must also be restricted such that they can only be instantiated with base types. Base types and types of higher order can be formally defined as follows: Now H-termination can be defined as follows by induction on the order of the types. This makes clear that now the set of H-terminating terms is well defined.

Definition 2.4 (H-Termination). The set of H-terminating ground terms is the smallest set of ground terms t such that:
  • For base types, the set of H-terminating ground terms is the smallest set of terms t that do not start an infinite evaluation w.r.t. →H ∪ { (c x1 ... xn) → xi | 1 ≤ i ≤ n, c is a constructor}. Note that due to the restriction to constructors with base type arguments, all terms resulting from →H ∪ { (c x1,...,xn) → xi | 1 ≤ i ≤ n, c is a constructor} have base type.
  • Let i ≥ 1. Then for ground terms whose type has order i+1, the set of H-terminating terms is the smallest set of terms t such that t does not start an infinite evaluation w.r.t. →H and such that for all H-terminating terms s, (t s) is also H-terminating. Note that the types of s and of (t s) have an order less than i+1, i.e., this is well defined.
A term t  is H-terminating iff t σ is H-terminating for all substitutions σ with H-terminating ground terms.