Overview of the dJVM Specification

The dJVM specification is a formal description of a defensive version of the JVM. This description of the dJVM 0.5 is complete, in the sense that all questions about the behavior of the dJVM can be answered strictly from the ACL2 definitions in this formal description.Of course, the description of ACL2 and the public libraries are also necessary for a full formal definition of the dJVM. Also, this report only describes the initial dJVM 0.5 model, so not all of the ultimate dJVM 1.0 has been modeled. However, this report does not by itself present a full explanation of the JVM or the Java language. This report is more akin to the first draft of a formal reference manual for the dJVM. It serves to answer questions about the virtual machine's behavior in detail, but does not attempt to serve as an introduction either to the Java Virtual Machine or the Java programming language. The reader should already be familiar with them, or be prepared to assimilate them from the formal definitions with the occasional accompanying brief explanations.

We recommend that the reader not already familiar with these topics have ready access to The Java Virtual Machine Specification [JVMS]. Java Virtual Machine [meyer97] provides an informal, but less authoritative, introduction to the JVM. Access to The Java Language Specification [JLS] may also be useful. The Java Language [arnold96] provides a good introduction to the Java language.

The dJVM as a State Machine

The dJVM state consists of a set of class declarations, a set of objects (instances of classes), a call stack, and a status flag.

Future Extensions: To support multiple threads, the dJVM state would need to be extended to include multiple call-stacks with new call-stacks being created each time a new thread was started.

The call stack contains stack frames representing each of the method invocations whose execution has not yet completed.

dJVM execution proceeds by taking the next step (i.e., executing the next instruction) for the top-most (most recently created) stack frame. If that executes a JVM instruction (other than one of the ``return'' instructions), then the state of that frame is updated (e.g., adjusting the local data stack, local variables, and local PC appropriately), and execution continues. If the instruction was one of the ``return'' instructions, then that stack frame is popped from the evaluation stack, and any returned-value is pushed onto the data stack of the new top-most stack frame. If an error occurred during execution of the step, then the dJVM 0.5 halts with status indicating what error occurred.

Future Extensions: The dJVM 0.5 model does not support throwing or handling exceptions. When an exception would be thrown by the JVM, the dJVM halts.

The dJVM 0.5 should be extended to support throwing and catching exceptions.

Overview of dJVM State

The dJVM state includes four basic components:

a heap of objects,
a call stack, which records information about currently executing methods.
a class table mapping class names to class definitions, and
a status flag, indicating whether the dJVM is halted.

The call stack contains call frames. Each call frame corresponds to a method invocation that has not yet completed.

Future Extensions: The dJVM 0.5 state can be extended to support multi-thread computations simply by adding multiple call stacks.

A call frame contains:

an operand stack
a vector of local variables (called ``registers'' in the early JVM specification)
a method

Instructions as State Transitions

An Executable Interpreter

Naming conventions

Instruction definitions are usually named in the style of djvm-execute-XXX with XXX replaced with the instruction name.

Standard Pieces of Instruction Definitions

Instruction definitions are broken into several parts. For each instruction we will (normally) define one function to correspond to each of these parts. By convention, each of these functions have standard names and take the standard argument lists:

(xxx-wff-inst? inst)
This function tests whether its argument is a well-formed dJVM instruction, where xxx stands for the name of the instruction whose form is to be checked.
(xxx-proper-arg-types? inst frame)
This function tests whether the types of the instruction arguments and operands in the frame are properly typed. Instruction arguments are the values that appear within an instruction. In addition to instruction arguments, most instructions also take additional operands from the operand stack or from local variables.
Note that this test is made with respect to the current frame, but without respect to objects in the heap. This is meant to correspond to information that could be checked by a bytecode verifier at load time.
(xxx-resolve-args inst djvm)
This function is a place-holder for potential resolution of instruction arguments or class initialization. In all of the current instruction definitions it is a no-op.
(xxx-proper-arg-values? inst resolved-djvm)
This function tests whether the values of the instruction arguments and operands are appropriate. For example, this function checks that operands that reference objects in the heap identify instances of the appropriate reference types (or null). In contrast, the xxx-proper-arg-types? test could only check that the operand was some reference type, since resolving the actual reference type requires accessing the object in the heap.
This function may accept values that are defined to trigger run-time exceptions. For example, it checks that the stack operands of the integer division instruction idiv are integers, but does not check that the denominator is non-zero. The idiv instruction is defined to signal a run-time exception if the denominator is zero.
These tests may include tests normally considered part of class, field, and method resolution.
(djvm-execute-xxx-optimized instruction djvm)
This function defines the behavior of the xxx instruction assuming that all of the tests just mentioned succeeded. Thus, we know that the instruction is well-formed, that its arguments and operands all have proper types and values, and proper reference values (for operands that are of reference types).
(xxx-proper-result-type-? inst new-djvm)
For instructions that leave a result on the operand stack, this function checks that the result is of an appropriate type. For example, idiv-proper-result-type? checks that after normal completion of the idiv instruction the top-most operand on the stack is of type int.
(djvm-execute-xxx-inst inst djvm-with-pc-incremented)
This is the defensive definition of the instruction. After making the tests mentioned above it applies the optimized version, and checks the result type afterward.

Appendix [ch:Define-JVM-Instruction-Macro] describes the forms and macros used for defining instructions in more detail.

Instruction Formats

Some instructions include small integer arguments. When an JVM instruction takes an argument that is an index into the constant pool, the corresponding dJVM instruction takes the corresponding value or values from the constant pool. For example, the JVM instruction getfield takes as its single argument an index identifying a CONSTANT_Fieldref in the constant pool. The dJVM instruction getfield takes three arguments, the three string constant items (class name, field name, and type signature) identified in the constant pool. Here is an example of the dJVM instruction:


    (getfield "Point" "x" "I")

where ``Point'' is the name of a class of the instance to be accessed, ``x'' is the name of the field to be accessed, and ``I'' is the type signature of that field. As well as distributing the constant-pool entries into the instructions, the dJVM uses symbolic opcodes, rather than the numeric bytecode values. Both of these choices were made to make the dJVM states easier to examine and construct. They also help to simplify the dJVM model somewhat, allowing us to concentrate on the richer parts of the definition.