Experimental Evaluation of "Automated Termination Analysis of Java Bytecode by Term Rewriting"

We present an automated approach to prove termination of Java Bytecode (JBC) programs by automatically transforming them to term rewrite systems (TRSs). In this way, the numerous techniques and tools developed for TRS termination can now also be used for imperative object-oriented languages like Java, which can be compiled into JBC. A full version of our paper including all proofs is available here.

Implementation in AProVE

A new version of the termination tool AProVE implementing our approach is available here. In particular, this new version of AProVE allows to repeat the experiments below. It can be accessed via a web interface.

The web interface takes an arbitrary jar-file as input. As in the International Termination Competition, it then tries to prove termination of the main method of the class that is indicated in the file META-INF/MANIFEST.MF. For details, please see the definition of JBC termination problems. In order to prove termination of a specific method, one therefore has to add an appropriate main method calling the specific method with random inputs.

Tools

We compare our implementation with two other tools for termination analysis of Java Bytecode. Thus, in our experiments we use the following three tools:

AProVE
This is a new version of AProVE which is available through our web interface. For AProVE, we ran the experiments using an Intel Core i7 920 CPU with four CPU cores and 2.66 GHz each.
Julia
Julia is a nullness and termination analyzer for Java Bytecode based on path-length abstraction, developed at the University of Verona, Italy and the University of Réunion, France. The Julia team kindly provided us with access to a web interface running the current version of Julia. We performed the experiments on 21st of January, 2010. The Julia web interface was powered by an Intel Xeon CPU with four CPU cores and 2.66GHz each.
COSTA
COSTA is a cost and termination analyzer for Java Bytecode based on path-length abstraction, developed at the Politécnica University of Madrid and the Complutense University of Madrid, Spain. The results for the experiments with COSTA were kindly provided by the COSTA team. The computer used for these experiments was an Intel Core 2 CPU with two CPU cores and 1.86 GHz each.

Examples

In our experiments, we tested the tools on the 106 non-recursive JBC examples from the termination problem data base (TPDB) used in the International Termination Competition. We removed one controversial example ("overflow") from the TPDB whose termination depends on the treatment of integer overflows. Furthermore, we added the two examples "count" and "flatten" from our paper. In the experiments, we used a timeout of 60 seconds for each example. This is the same timeout that is used in the International Termination Competition.

The tables below summarize our experiments. They show that for the problems in the current example collection, our rewriting-based approach in AProVE currently yields the most precise results. The main reason is that we do not use a fixed abstraction from data objects to integers, but represent objects as terms. On the other hand, this also explains the longer runtimes of AProVE compared to Julia and COSTA. Still, our approach is efficient enough to solve most examples in reasonable time. Our method benefits substantially from the representation of objects as terms, since afterwards arbitrary TRS termination techniques can be used to prove termination of the algorithms. Of course, while the examples in the TPDB are challenging, they are still quite small. Future work will be concerned with the application and adaption of our approach in order to use it also for large examples and Java libraries. Moreover, in the current paper, we restricted ourselves to JBC programs without recursion. Of course, an extension of our method to recursive programs is another main point for future work.

In the following table Success gives the number of examples where termination was proved, Failure means that the proof failed in less than 60 seconds, Timeout gives the number of examples where the tool took longer than 60 seconds, and Runtime is the average time needed per example.

Tool Success Failure Timeout Runtime

AProVE 89 5 12 14.3 sec

Julia 74 32 0 2.6 sec

Costa 60 46 0 3.4 sec

Tool	Success	Failure	Timeout	Runtime
AProVE	89	5	12	14.3 sec
Julia	74	32	0	2.6 sec
Costa	60	46	0	3.4 sec

In the following table, each row shows the behavior of the tools on one example. The entry "YES" means that termination of that example could be proved by the corresponding tool while "MAYBE" states that the tool gave up without success. Finally, "TIMEOUT" indicates that the tool exceeded the given time limit and was therefore stopped. By clicking on the respective runtime, one can inspect the output produced by the tool. (For COSTA, we did not receive the proofs.) To load an example into the web interface of AProVE, just click on the corresponding button in the first column. Then you can run AProVE on the respective example yourself. Of those examples, 10 are known to be non-terminating, which can be easily proved manually. The corresponding lines contain the remark "(non-term.)".

When using the AProVE web interface, please keep in mind that the computer running the web interface is considerably slower (four AMD Opteron CPU cores with 2.2GHz each) than the one used for the experiments. Therefore, a higher timeout of up to 300 seconds is needed to solve all examples that AProVE could solve in the table below. Please also keep in mind that the computer used for the web interface is used by several other applications as well, so the runtimes may vary.

Example	AProVE		Julia		COSTA
	YES	19.26	YES	2.7	YES	0.54
	TIMEOUT	60	MAYBE	2.24	MAYBE	0.72
	YES	8.43	YES	2.97	MAYBE	3.64
	YES	39.74	MAYBE	5.04	MAYBE	38.0
	YES	6.42	MAYBE	4.78	MAYBE	11.08
	YES	3.97	YES	2.04	YES	0.26
	YES	3.99	YES	2.08	YES	0.42
	YES	2.44	YES	2.26	YES	0.25
	YES	4.67	YES	2.54	YES	1.18
	YES	2.34	YES	2.08	YES	0.18
	YES	27.32	YES	4.95	MAYBE	32.12
	YES	20.33	MAYBE	5.56	MAYBE	23.29
	YES	11.73	MAYBE	4.7	MAYBE	36.93
	YES	36.82	MAYBE	2.24	MAYBE	1.12
	YES	10.43	MAYBE	2.13	MAYBE	1.15
	YES	9.33	MAYBE	2.44	MAYBE	1.09
	YES	13.0	YES	2.39	MAYBE	0.9
	YES	49.51	MAYBE	2.59	MAYBE	1.31
	YES	18.34	MAYBE	4.16	MAYBE	10.8
	YES	5.3	YES	2.96	YES	0.91
	YES	3.17	YES	2.55	YES	1.0
	YES	19.29	YES	2.96	YES	1.09
	YES	3.92	YES	3.7	MAYBE	8.45
	YES	25.76	MAYBE	2.31	MAYBE	0.78
	YES	2.18	YES	2.29	MAYBE	0.24
	YES	3.22	YES	2.61	MAYBE	0.93
	YES	20.52	MAYBE	2.35	MAYBE	0.76
	TIMEOUT	60	YES	2.14	MAYBE	0.62
	TIMEOUT	60	MAYBE	2.12	MAYBE	0.98
	YES	2.51	YES	1.97	YES	0.2
	YES	3.53	YES	2.19	MAYBE	0.73
	TIMEOUT	60	MAYBE	2.13	MAYBE	0.95
	YES	26.16	MAYBE	4.61	MAYBE	25.22
	YES	4.34	YES	2.15	YES	0.34
	YES	30.39	YES	2.23	YES	0.29
	YES	4.92	YES	2.28	YES	0.18
	YES	2.35	YES	2.03	YES	0.18
	YES	2.23	YES	1.82	YES	0.18
	YES	2.78	YES	2.13	YES	0.23
	YES	2.42	YES	2.18	YES	0.25
	YES	2.0	YES	2.13	YES	0.19
	YES	2.93	YES	2.07	YES	0.22
	YES	4.48	YES	1.98	YES	0.24
	YES	3.83	YES	2.04	YES	0.26
	YES	2.83	YES	2.23	YES	0.34
	YES	35.67	YES	2.34	YES	0.53
	YES	4.46	YES	1.78	YES	0.26
	YES	5.95	YES	2.23	YES	0.31
	YES	4.8	YES	2.18	YES	0.19
	YES	12.53	YES	2.34	MAYBE	0.27
	YES	9.22	YES	2.29	YES	0.26
	YES	2.03	YES	2.13	YES	0.17
	YES	2.04	YES	2.18	YES	0.19
	YES	2.64	YES	2.18	MAYBE	0.16
	YES	2.22	YES	2.09	YES	0.19
	YES	2.13	YES	1.93	YES	0.17
	YES	2.09	YES	2.13	YES	0.2
	YES	2.56	YES	2.14	MAYBE	0.24
	YES	3.63	YES	2.39	YES	0.19
	YES	9.2	YES	2.28	YES	0.34
	YES	9.25	YES	2.2	YES	0.36
	YES	3.72	YES	2.28	YES	0.16
	YES	5.39	MAYBE	2.08	YES	0.19
	YES	7.31	YES	2.26	YES	0.62
	TIMEOUT	60	MAYBE	2.14	YES	0.19
	YES	3.1	YES	2.19	YES	0.27
	YES	6.04	YES	2.24	YES	0.43
	YES	3.12	YES	2.34	MAYBE	0.85
	YES	13.53	MAYBE	1.93	MAYBE	0.52
	YES	2.76	YES	3.12	YES	1.28
	YES	23.78	MAYBE	4.76	MAYBE	25.73
	YES	4.54	YES	3.72	MAYBE	4.03
	TIMEOUT	60	MAYBE	3.86	MAYBE	4.33
	YES	2.04	YES	1.96	YES	0.16
	YES	2.65	YES	2.45	YES	1.5
	YES	25.19	MAYBE	7.82	MAYBE	28.3
	YES	1.33	MAYBE	1.71	MAYBE	0.72
	YES	1.24	YES	1.87	YES	0.05
	YES	8.42	YES	2.61	YES	1.43
	YES	1.59	YES	2.04	YES	0.06
(non-term.)	MAYBE	17.37	MAYBE	1.92	MAYBE	0.2
(non-term.)	MAYBE	18.68	MAYBE	2.51	MAYBE	6.13
	YES	1.6	YES	4.64	YES	2.15
	YES	3.35	YES	2.44	YES	0.91
	YES	2.32	YES	1.99	YES	0.23
	YES	1.28	YES	2.13	YES	0.2
	YES	1.46	YES	2.62	YES	2.27
(non-term.)	TIMEOUT	60	MAYBE	1.75	MAYBE	0.94
(non-term.)	TIMEOUT	60	MAYBE	1.88	MAYBE	0.35
(non-term.)	TIMEOUT	60	MAYBE	1.97	MAYBE	0.19
	YES	1.42	YES	1.97	YES	0.09
(non-term.)	TIMEOUT	60	MAYBE	1.92	MAYBE	0.2
	YES	1.36	YES	2.13	YES	0.09
	YES	1.3	YES	1.97	YES	0.19
(non-term.)	MAYBE	17.14	MAYBE	2.02	MAYBE	0.52
	TIMEOUT	60	YES	4.24	YES	21.47
	YES	6.04	YES	4.58	MAYBE	22.2
	YES	1.97	YES	1.71	YES	0.11
	YES	1.55	YES	2.19	YES	0.08
	YES	1.5	YES	1.77	YES	0.06
	YES	1.28	YES	2.71	YES	1.45
(non-term.)	TIMEOUT	60	MAYBE	2.03	MAYBE	0.91
(non-term.)	MAYBE	17.08	MAYBE	2.14	MAYBE	0.12
(non-term.)	TIMEOUT	60	MAYBE	2.01	MAYBE	1.8
	YES	1.39	YES	2.13	MAYBE	0.5
	YES	1.34	YES	4.08	MAYBE	12.42