The Java Virtual Machine (JVM) is a key part underlying the definition and portability of Java programs. The JVM is a relatively clean and simple abstract machine, which has been realized in software (as an interpreter) and is being realized in hardware as well. The JVM has a bytecoded instruction set designed to be compact and easily interpreted in either hardware or software.
The JVM is an object-oriented machine, manipulating objects as well as numeric data. The JVM also supports multithreaded programs, although much of the semantics of concurrent programs are defined by standard classes, rather than by the JVM itself.
The JVM was designed to allow efficient, safe execution of (bytecode compiled) Java programs. One innovative technique employed to promote efficiency is the use of the bytecode verifier. The JVM does not assure type-safe execution by itself. Rather, JVM execution is intended to be type-safe when running programs that have been ``accepted'' by the bytecode verifier. The bytecode verifier is intended only to accept programs whose execution on the JVM will be type-safe. The standard JVM is defined to elide some type-safety checks based on these intended guarantees, using these guarantees to justify optimizations in the JVM definition that would not be type-safe without those guarantees.
Thus, while the JVM execution is intended to be type-safe when running programs accepted by the bytecode verifier, the type-safety is not obvious. The JVM is not guaranteed to be type-safe when running arbitrary programs.
The defensive JVM (dJVM) augments the JVM definition with additional run-time checks to assure type-safe execution. The definition is intended to be obviously type-safe when run on any program. There is no need for a bytecode verifier to screen programs.
The dJVM is obviously type-safe. At least the dJVM is intended to be obviously type-safe. In contrast, the JVM is obviously not type-safe by itself. The behavior of the JVM is only specified for programs that are (mostly) type-safe. For example, the behavior of the iadd instruction is not defined unless the top two words on the operand stack are int values. In contrast, the behavior of iadd on the dJVM is completely defined.At least the behavior of iadd is defined in all contexts in which such a question makes sense. For example, it does not make sense to ask about the effect of executing an iadd instruction if the call stack is empty, because that means no method is active to execute the instruction.
The effect of every instruction in the dJVM is defined for all possible dJVM states. For every state in which the effect of a JVM instruction is defined, the corresponding dJVM instruction has the same effect on the the corresponding dJVM state. In addition, in the states for which a JVM instruction is undefined, the effect of the corresponding dJVM instruction in the corresponding dJVM states is to halt with an error indication.If some constraint (a ``must'' or ``must not'') in an instruction description is not satisfied at run time, the behavior of the Java Virtual Machine is undefined [emphasis added -- RMC].
[ 6.1, p. 151 in JVMS]
Some things we can do with a formal dJVM model:
The current model is the first phase of modeling the JVM in ACL2. It was intended to demonstrate that such a model could reasonably capture the key properties of the JVM, but to be a complete JVM model. It is a rough draft.
This model includes the basic object-oriented instructions and data structures. It leaves out concurrency, interfaces, arrays, and garbage collection. It leaves out floating point data. It only includes 103 of the 202 standard JVM instructions.
These are the essential features of the JVM, and were included in the dJVM 0.5 model so that this first model would address these key aspects of the JVM. These features also addresses the key aspects of object creation and manipulation in this initial phase of the model.
A minimal subset of standard Java classes needed to support class definition, object creation, and method execution are provided. This includes portions of the standard classes: Object and Class.
Required instructions: new, aconst_null
Required instructions: invoke virtual, invoke special, invoke static, return, areturn, ireturn.
Required instructions: getfield, putfield, getstatic, putstatic.
Some of the instructions that manipulate int or long values are included, except those related to excluded types (e.g., the i2f instruction, which converts an int integer value to a floating point value.)
i2l, i2s, iadd, iand, idiv, iinc, imul, ineg, ior, isub, ixor, l2i, ladd, land, ldiv, bipush, sipush
Such instructions include:
aload, astore, iload, istore, lload, lstore
pop, pop2, dup, dup2, dup_x1, dup_x2, dup2_x1, dup2_x2, swap
goto, if_acmpeq, if_acmpne, if_icmpeq, if_icmpne, if_icmplt, if_icmpge, if_icmpgt, if_icmple, ifeq, ifne, iflt, ifle, ifge, ifnull, ifnonnull
A number of important features of the JVM have not been included in the dJVM 0.5 model. Some of these are candidates for the first extensions to dJVM 0.5.
Technically the class-file verifier and bytecode verifier are not part of the JVM itself. However, they do form an important component of most practical Java implementations. As such they would be worthy extensions of the dJVM.
They are an implicit part of the JVM, because they are invoked implicitly when a class is created; they are never invoked explicitly.
dJVM presumes that all class objects have been constructed and initialized prior to the start of execution (i.e., effectively constructed by some primordial loading process).
Including the instructions jsr, jsr_w, ret, and athrow.
The instructions monitorenter and monitorexit have been be left out, as well as the special-case of the method-invocation instructions that deals with synchronized methods.
Garbage collection is logically a no-op, and so can reasonably be left out of the model.
This section describes features of the JVM that do not pose problems for modeling or formalization. They have been left out of this initial dJVM model merely to reduce the level of effort required.
ACL2 does not support floating point numbers.ACL2 does support arbitrary-precision integers (a.k.a., bignums) and rational numbers. The axiomatic description of rational arithmetic is much more tractable than that of floating point arithmetic. Further, unlike floating point numbers, rational numbers obey the normal laws of arithmetic! So it is unlikely that the dJVM model will ever be extended to define floating point operations in the JVM. However, since the JVM is defined to perform floating point calculations according to the IEEE Floating Point Standard, there is little advantage to such an extension.
Native methods are really extensions to the JVM. They are effectively new instructions that are executed via the method invocation mechanism, rather than via normal instruction dispatch.
Extending the dJVM to handle native methods would require two steps:
Eventually we want to augment our model to describe interactions with external resources (e.g., file systems and networks).
The dJVM model is built in ACL2. ACL2 is a mathematical logic based on the applicative subset of Common Lisp [CommonLisp]. As such it provides a means for describing a formal specification of the dJVM. ACL2 also has a theorem prover for the logic. This theorem prover is the latest in a line of theorem provers built by Bob Boyer and J Moore. [ACL,ACLH,ACL2,acl2-industrial] Thus we can (attempt to) prove properties of these specifications.
ACL2 allows specifications to be built using function definitions, in the manner of pure LISP and other functional programming languages. Since ACL2 is based on a subset of Common Lisp, specifications may be executed in an underlying implementation of Common Lisp. Thus we can build executable specifications.
We prefer to use the terms formal model and executable model, rather than formal specification and executable specification, because these formal artifacts are always models of the real computing systems they are meant to describe. Often they are abstract models, leaving out some level of detail in order allow formal analysis and proofs about the behavior of the model. If it is a good model of the real computing system, then the analysis of the behavior of the model should also be predictive about the behavior of the real system.
The text of this report is mechanically derived from the dJVM definition accepted and compiled by the ACL2 system. In the tradition of Knuth's literate programming, the formal program and its description were woven together, and subsequently derived portions can be processed by either a compiler, a theorem prover, or a document formatter. [literate-programming,weave]
Thus, the program text that appears in this report is derived directly from the ASCII files used for compilation and execution of the model. This increases our confidence in the fidelity of this presentation to the executable model.
There are two versions of this report. One includes only the executable portion of the dJVM definition, and is intended to be more usable as straightforward (and somewhat more compact) explanation of the behavior of the dJVM.
A second version of this report includes the theorems and rewrite rules given to the ACL2 theorem prover as part of the model development and to support ACL2 guard verification for the model. This is not a complete rendition of the ACL2 proof script because some ACL2 directives are suppressed in the text.All of the definitions and theorems of the ACL2 proof script are included. Numerous directives enabling and disabling rewrite rules are suppressed, as are the directives for including ACL2 ``books'' (i.e., libraries). Of course, all of the details are available in the ACL2 source files, from which the sources for this document were derived.
ACL2 requires that programs be defined in a bottom-up fashion. Each function must be defined (or axiomatized) before it can be used. The presentation in this report follows this same basic order. However, some portions of the definition that don't contribute to the exposition appear at the end of this document as appendices. These portions include the description of bounded-integer arithmetic and the definitions of macros used to build the actual instruction definition. The macros are described in the main text, but the details of their definitions are relegated to the appendices.
The File Structure of the Model
The ``source code'' for the ACL2 model (and this report) are divided into a number of separate files. The files form a dependency hierarchy with most files depending upon definitions and rules introduced by other files. Here's a list of the files, and roughly bottom-up order. Several low-level files that do not contribute to the exposition appear as appendices.
Here's a brief overview of the remainder of this report.
This report documents a proof-of-concept for building a formal ACL2 model of the defensive Java Virtual Machine. Due to the fast pace of the development of this initial model, I have not been entirely consistent in the use of naming conventions. In fact, I employed several in different sections of the model. Common Lisp and ACL2 are normally not case sensitive in resolving function and variable names. The name Calling-Frame and the name calling-frame are considered to be the same. In some places I have taken advantage of that, and have used capitalization in an attempt to improve the readability of long function-names.I'd like to use mixed-case function names throughout the presentation, but don't yet.
When function definitions are presented, the name of the function is typeset in a larger and different type face to make it stand out. The same style is used for ACL2 events that define theorems, data structures, and so on.
The text notes about ACL2, future extensions of the dJVM model, the Java language, and the Java Virtual Machine. These notes are set out from the regular text in the following style.
Note on the
Java
Language
Specification{
The first public version of the Java language manual was
approximately 35 pages.
The second version was about 70 pages. The third (and current)
version is over 800 pages. As you ask more and more specific
questions, the explanation of the details just seem to multiply.
}
The margin note identifies the topic of the note.
The ACL2 definitions that comprise the dJVM 0.5 model appear in the text in a fixed width typeface. To facilitate perusal of the definitions cross-references to functions appear in the margins. For example, the definition of the function local-method? from page [local-method?] appears below.
(defun local-method? (method-name method-sig class-decl) (declare (xargs :guard (and (stringp method-name) (stringp method-sig) (class-decl-p class-decl)))) (java-method-bound? method-name method-sig (class-decl-methods class-decl)))The boxes appearing in the margin are cross references for a function that appears on the corresponding line of the definition. In this case indicating that the definition of class-decl-p can be found on page [class-decl-p], and the definition of java-method-bound? can be found on page [java-method-bound?]. The function to which the cross-reference refers is printed in small capital letters (e.g., as java-method-bound? and class-decl-methods above).
is supposed to put margin notes in the outer margin of the page. However, it doesn't always get it right, as you may be able to observe on this page and elsewhere in this document. Apparently the page-fill algorithm doesn't reliably communicate with the margin note placement mechanism when text is moved to start a new page. I'm not sure how to fix this.