[Table of Contents]

state.lisp

JVM Data Types

Primitive Data Types


Remark: Integrate this section with Chapter [ch:primitive-values]

In addition to the primitive numerical types discussed earlier, the JVM also supports several non-simple types:

  1. object-reference
  2. array-reference (essentially a subtype of object-reference)
  3. address (created by jsr and jsr_w instructions, used by iret \& astore-class instructions).

The JVM does not fully support byte, char, and short as data types. They are only partially supported. Their support is limited to:

Return Addresses


Remark: Return addresses are not part of the dJVM 0.5. This section is here as a place-holder for a future version.


Future Extensions: The dJVM 0.5 does not support ``return addresses'' or the JVM's mechanism for subroutine call/return within a method. This JVM mechanism is used to support exception handlers and Java's try/finally construct. So neither of these constructs are yet supported.

Program addresses can only be manipulated in a very restricted fashion. They appear in three different contexts in the dJVM:

  1. Return addresses can be generated by the jsr and jsr_w instructions, which push a return address on the stack. They can be moved from the stack into a local variable via an astore instruction.

    And a local variable containing an address may be used by the ret instruction. No other JVM instructions can manipulate an address value.

  2. Addresses appear in call frames as the PC component.

In the dJVM, numeric program addresses appear as labels in the body of bytecoded methods. These numeric addresses appear as simple integers, rather than as tagged values. Everywhere else in the dJVM that program addresses appear, they must be represented appropriately as tagged values.


Future Extensions: Program addresses can also appear in various attribute tables associated with method declarations, such as line-number tables, exception tables, and local variable name tables. These are not supported in the dJVM 0.5.

Some preliminary lemmas


Remark: This should be moved to Appendix [ch:preliminaries].

This section contains lemmas related to ACL2's initial theories or those introduced by the libraries. These lemmas relate to those basic theories, rather than to the theory of the dJVM itself. They may eventually have been included in the base system or in the appropriate books.


(defthm binding-of-unbound-key
  (implies (and (alistp alist)
                (not (bound? key alist)))
           (equal (binding key alist) nil))
  :hints (("Goal" :in-theory (enable bound? binding))))

Objects and the Heap

There are two kinds of objects on the heap: instances and classes (really, class surrogates). The reason that classes are distinguished from instances is that classes are not merely instances of the class Class. A class must also include meta-information from the class declaration (e.g., method and field declarations, type information about static variables, etc.) that is used by the interpreter or to identify a type-correct state. Classes are not ``first-class citizens'' of the heap, because this meta-information is not directly accessible or manipulatable by user programs.The addition of the reflection API in JDK 1.1 extends the amount of meta-information for which user programs can have ``surrogate objects,'' but still does not allow user programs direct access to this information. We have not yet considered the implications of the reflection API to the dJVM, but do not anticipate any conceptual problems. For this reason we sometimes refer to these class objects in the heap as class surrogates. The surrogate object satisfies the language requirement that there be a user-visible object corresponding to each class. The surrogate class objects also contain the user-accessible class static variables and the class monitor (when the dJVM is extended to support concurrency).

The meta-information of a class declaration (as recorded in a Java class file) is represented in a separate structure, called a class-decl structure. These class-decl structures are kept in the class table of the dJVM state, which is distinct from the standard heap.

The class table entries are indexed by (fully qualified) class name.


Remark: Fix this up to describe how we handle the null class loader.


Future Extensions: Class declarations in the class table are really indexed by the pair class name and class loader. However, in dJVM 0.5 only the null class loader is supported.

Instances and Class Surrogates

Class instances and class surrogates are the objects in the heap.

The Java Virtual Machine does not require any particular internal structure for objects.
[ 3.7, p. 69 in JVMS]

There are several invariants that should hold on the JVM state. Each class-decl's superclass must refer to another class-decl in the class table. Each object in the heap is either an instance of a class-decl in the class table, or it is a class surrogate. The class table segregates the class-decl structures from the heap of user-manipulatable objects. This will be convenient in describing these invariants. The class-decl structures could be included directly in the class surrogates; it proves convenient in our model to segregate them (e.g., asserting that the class-decl structures do not change at run time).

var-bindings are used to record variable bindings within an object. Methods refer to instance variables by symbolic name. The values of the variable bindings must be stack values as recognized by sv-p. The use of ``weak'' structure predicates is described in section [defstructure], [defstructure].


(defalist var-binding-p (l)
  "Define a theory of alists mapping variable names to JVM tagged-values.
   This will be used for maps of field names to field values."
  (stringp . fv-p)
  (:options (:domain-type string-listp)
            (:range-type  fv-listp)
            ))

(defalist weak-var-binding-p (l) "Define a weak theory of alists mapping variable names to JVM tagged-values. This will be used for maps of field names to field values." (stringp . weak-tv-p) (:options (:domain-type string-listp) (:range-type weak-tv-listp) ))

(defthm string-alistp-var-binding-p (implies (var-binding-p x) (string-alistp x)) :hints (("Goal" :in-theory (enable var-binding-p))) :rule-classes (:forward-chaining))

(deflist var-binding-listp (l) var-binding-p)

(deflist weak-var-binding-listp (l) weak-var-binding-p)

(in-theory (disable var-binding-p))

An instance contains the instance variables declared by its class and those declared by all of its superclasses. This is reflected in an instance by a two-level variable-binding list. The upper level maps class names to the bindings of the instance variables declared by that name.

As an example, assume the following two class declarations:


      class Point extends Object {
        int x, y;
      }

class ThreeDPoint extends Point { int z; }

The instance variables of ThreeDPoint could be represented as

      (("ThreeDPoint" ("z" . (:int 15)))
       ("Point"       ("x" . (:int 5))
                      ("y" . (:int 10))))
This structure will fit easily with the JVM instructions getfield and putfield, because those instructions identify instance variables by specifying the instance variable name and the class name. The predicate instance-data-p recognizes such a two-level variable-binding.

(defalist instance-data-p (l)
  "Define a theory for alists mapping class names to variable-bindings."
  (stringp . var-binding-p)
  (:options (:domain-type string-listp)
            (:range-type  var-binding-listp)
            ))

(defalist weak-instance-data-p (l) "Define a weak theory for alists mapping class names to variable-bindings." (stringp . weak-var-binding-p) (:options (:domain-type string-listp) (:range-type weak-var-binding-listp) )) (defthm string-alistp-binding-instance-data (implies (instance-data-p instance-data) (string-alistp (binding name instance-data))) :hints (("Goal" :in-theory (enable instance-data-p binding))))

(in-theory (disable instance-data-p))

Class instances will be represented in the heap by instance structures. The field instance-class must be a reference to a class (surrogate) object. The field instance-data will always satisfy the predicate instance-data-p (defined above).


Note on JVM Semantics: JVMS only describes how uninitialized instances are treated at the level of the method creating them or within an instance-initialization method. There are no constraints placed on the actions within the method after an appropriate instance-initialization method has been invoked (and completedly normally).


Future Extensions: The field instance-lock is introduced in anticipation of modeling threads and concurrency in a future version of the dJVM model. In the current model instance-lock will not be used.

We name the constructor for instances make-an-instance, because Common Lisp defines a function named make-instance as part of CLOS, and ACL2 does not allow us to (re)define any function that is part of standard Common Lisp.


(defstructure  instance
  (class  (:assert (tv-ref-p class) :rewrite))
  (data   (:assert (instance-data-p data) :rewrite
                   (:type-prescription (true-listp data))))
  (lock   (:assert (null lock) :rewrite)
          (:default nil))
  (:options :guards
            (:verify-guards t)
            (:keyword-constructor make-an-instance)
            ))

(in-theory (disable instance-p))


Remark: Explain the alist/mapping theory and bound?, bind,and binding.

Instance data is recorded as a two-level nested alist. We will access the alist entries using the function binding. The function binding requires that its first argument satisfy alistp.

The following rule notes that the binding of a name in the instance-data always satisfied alistp. This follows because if the name is bound in the instance-data (i.e., is a key in that alist), then by the definitions above the corresponding value satisfies instance-data-p. If the name is not bound in the instance data, then binding returns nil, which also satisfies alistp.


(defthm alistp-binding-instance-data
  (implies (instance-p object)
           (alistp (binding class-name (instance-data object))))
  :hints (("Goal" :cases ((bound? class-name (instance-data object))))))

Class Surrogate Objects


Remark: Not sure which of these class-status values will show up in the early dJVM models. Eventually we'll model dynamic loading of classes and have to model the whole process described in JLS chapter 12.

Figure out what class status values we really need. Do we require that our classes are initialized in dJVM 0.5?



(def-enumeration  class-status 
  (loaded
   unresolved
   resolved
   prepared
   initialized))

The information about class declarations will be stored in the class table (see page [classTableIntro]). The surrogate class object on the regular heap and the class meta-information in the class table are always paired. The class table has an entry for each class that has been loaded into the JVM state. Entries are indexed by (fully qualified) class name. There are no direct references to class declarations. Class surrogates in the heap are paired with their associated class-table entry by class name.

The structure a-class will be used represent the class objects in the heap (i.e., as the surrogate class object).This structure is called a-class, rather than class because the Common Lisp defines classes as part of the Common Lisp Object System, and so functions such as class-name are already defined in standard Common Lisp, and ACL2 does not allow us to redefine functions in standard Common Lisp. Naming the struction a-class means that the accessors are named a-class-name, etc. The data field of a class surrogate contains the class variables (i.e., the static variables of the class declaration). The class declaration information is stored in the class table, which is represented outside of the heap. The class table is indexed by (fully qualified) class name. The class name is present in the structure a-class, so computations requiring information from the class declaration can be resolved.


Future Extensions: For the moment both class-loader and lock are always null.


(defstructure  a-class
  (name   (:assert (stringp name) :type-prescription ))
  (data   (:assert (var-binding-p data) :rewrite
                   (:type-prescription (true-listp data))))
  (status (:assert (class-status-p status) :rewrite
                   (:type-prescription (symbolp status))))
  (lock   (:assert (null lock) :rewrite)
          (:default nil))
  (loader (:assert (or (tv-ref-p loader)
                       (null loader)))
          (:default nil)
          :read-only)
  (:options :guards
            (:verify-guards t)))

(in-theory (disable a-class-p))

The Heap

The predicate object-p recognizes instances and class surrogates. The predicate object-listp recognizes a sequence of objects.


(defun object-p (x)
  (or (instance-p x)
      (a-class-p x)))

(deflist object-listp (l) object-p)

The heap contains a set of objectsActually, the heap contains a bag or multi-set of objects, since multiple equal objects may reside in the heap.. The heap associates each object with a unique natural number; this number is analogous to an object address in a conventional implementation. We define a heap as an alist. Thus we can use the predicates and accessors on alists (e.g., bound? and binding)to access objects in the heap.


(defalist heap-p (l)
  "Define a theory for alists mapping heap-addresses to objects."
  (naturalp . object-p)
  (:options (:domain-type naturalp-listp)
            (:range-type  object-listp)
            ))

(in-theory (disable heap-p))

Every object belongs to some particular class: the class that was mentioned in the creation expression that produced the object, the class whose object was used to invoke the newInstance method to produce the object. This class is called the class of the object.
[ 4.5.5, p. 47 in JLS]

Class Declarations

The class table contains the run-time representation of class declarations. Class declarations may include field declarations and method declarations.

The structure of class objects is known to the JVM.

String Values


Remark: We don't support the class String yet.

The only strings that appear in the dJVM so far are those that are instruction arguments. For the moment we can represent them as ACL2 strings. Strings never appear as values of instance fields.

Ultimately we'll probably have to represent them as instances of class String.


Java defines string values to be immutable.The class StringBuffer is used to provide mutable strings. Further, Java requires that string constants be unique. That is, two string constants that contain the same characters (in number and order) will be represented at run-time by references to the same string constant. String values computed at run-time will be represented by new ACL2 string values. The method String.intern is just the identity function, because we can only compare ACL2 strings with equal, which denotes abstract string equality.

The Code Body of a Method

dJVM Opcodes and Instructions

The dJVM instruction opcodes ending with ``_wide'' are not opcodes defined in the JVM. They correspond to instruction variants of the JVM wide instruction. For example, a dJVM instruction with the opcode aload_wide corresponds to the JVM with the opcode wide and the opcode ``being widened'' within the wide instruction being aload.

The dJVM does not implement the wide opcode. Rather, we define new opcodes corresponding to the variants of the JVM wide instruction. The function bytecode-opcode-p recognizes the symbolic op-codes defined as part of the dJVM 0.5 model.


(defun bytecode-opcode-p (x)
  (memberp x '(
               ;; Integer instructions

ICONST_M1 ; (bytecode 2) ICONST_0 ; (bytecode 3) ICONST_1 ; (bytecode 4) ICONST_2 ; (bytecode 5) ICONST_3 ; (bytecode 6) ICONST_4 ; (bytecode 7) ICONST_5 ; (bytecode 8) BIPUSH ; (bytecode 16) SIPUSH ; (bytecode 17) ILOAD ; (bytecode 21) ILOAD_wide ILOAD_0 ; (bytecode 26) ILOAD_1 ; (bytecode 27) ILOAD_2 ; (bytecode 28) ILOAD_3 ; (bytecode 29) ISTORE ; (bytecode 54)

ISTORE_0 ; (bytecode 59) ISTORE_1 ; (bytecode 60) ISTORE_2 ; (bytecode 61) ISTORE_3 ; (bytecode 62) IADD ; (bytecode 96) ISUB ; (bytecode 100) IMUL ; (bytecode 104) IDIV ; (bytecode 108) INEG ; (bytecode 116) IAND ; (bytecode 126) IOR ; (bytecode 128) IXOR ; (bytecode 130) IINC ; (bytecode 132) I2L ; (bytecode 133) I2S ; (bytecode 147)

;; Long integer instructions

LCONST_0 ; (bytecode 9) LCONST_1 ; (bytecode 10) LLOAD ; (bytecode 22) LLOAD_WIDE LLOAD_0 ; (bytecode 30) LLOAD_1 ; (bytecode 31) LLOAD_2 ; (bytecode 32) LLOAD_3 ; (bytecode 33) LSTORE ; (bytecode 55) LSTORE_0 ; (bytecode 63) LSTORE_1 ; (bytecode 64) LSTORE_2 ; (bytecode 65) LSTORE_3 ; (bytecode 66) LADD ; (bytecode 97) LSUB ; (bytecode 101) LDIV ; (bytecode 109) L2I ; (bytecode 136) LCMP ; (bytecode 148)

;; Object-related instructions

ACONST_NULL ; (bytecode 1) ALOAD ; (bytecode 25) ALOAD_WIDE ALOAD_0 ; (bytecode 42) ALOAD_1 ; (bytecode 43) ALOAD_2 ; (bytecode 44) ALOAD_3 ; (bytecode 45) ASTORE ; (bytecode 58) ASTORE_WIDE ASTORE_0 ; (bytecode 75) ASTORE_1 ; (bytecode 76) ASTORE_2 ; (bytecode 77) ASTORE_3 ; (bytecode 78) GETSTATIC ; (bytecode 178) PUTSTATIC ; (bytecode 179) GETFIELD ; (bytecode 180) PUTFIELD ; (bytecode 181) NEW ; (bytecode 187) ;; CHECKCAST ; (bytecode 192) -- not yet supported ;; INSTANCEOF ; (bytecode 193) -- not yet supported

;; Exception Handling: ;; ;; (Not included in JVM 0.5)

;; ATHROW ; (bytecode 191) ;; JSR ; (bytecode 168) ;; RET ; (bytecode 169) ;; JSR_W ; (bytecode 201)

;; Control-flow instructions

NOP ; (bytecode 0) GOTO ; (bytecode 167) GOTO_W ; (bytecode 200) IFEQ ; (bytecode 153) IFNE ; (bytecode 154) IFLT ; (bytecode 155) IFGE ; (bytecode 156) IFGT ; (bytecode 157) IFLE ; (bytecode 158) IF_ICMPEQ ; (bytecode 159) IF_ICMPNE ; (bytecode 160) IF_ICMPLT ; (bytecode 161) IF_ICMPGE ; (bytecode 162) IF_ICMPGT ; (bytecode 163) IF_ICMPLE ; (bytecode 164) IF_ACMPEQ ; (bytecode 165) IF_ACMPNE ; (bytecode 166) IFNULL ; (bytecode 198) IFNONNULL ; (bytecode 199) TABLESWITCH ; (bytecode 170) LOOKUPSWITCH ; (bytecode 171)

;; Stack manipulation instructions

POP ; (bytecode 87) POP2 ; (bytecode 88) DUP ; (bytecode 89) DUP_X1 ; (bytecode 90) DUP_X2 ; (bytecode 91) DUP2 ; (bytecode 92) DUP2_X1 ; (bytecode 93) ;; DUP2_X2 ; (bytecode 94) -- not yet supported SWAP ; (bytecode 95)

;; Method invocation instructions

INVOKEVIRTUAL ; (bytecode 182) INVOKESPECIAL ; (bytecode 183) INVOKESTATIC ; (bytecode 184)

IRETURN ; (bytecode 172) ;; LRETURN ; (bytecode 173) -- not yet supported ARETURN ; (bytecode 176) RETURN ; (bytecode 177)

;; Load values from the constant pool

;;LDC ; (bytecode 18) -- not yet supported ;;LDC2_W ; (bytecode 20) -- not yet supported )))

(defun instruction-p (x) (and (true-listp x) (not (null x)) (symbolp (car x))))

(defthm true-list-if-instruction-p-forward (implies (instruction-p x) (true-listp x)) :rule-classes (:forward-chaining))

(defthm consp-if-instruction-p-forward (implies (instruction-p x) (consp x)) :rule-classes (:forward-chaining))

(in-theory (disable instruction-p))

(defun op-code (ins) (declare (xargs :guard (instruction-p ins))) (car ins))

(defun op-args (x) (declare (xargs :guard (instruction-p x))) (cdr x))

(defun arg1 (ins) (declare (xargs :guard (instruction-p ins))) (cadr ins))

(defun arg2 (ins) (declare (xargs :guard (instruction-p ins))) (caddr ins))

(defun arg3 (ins) (declare (xargs :guard (instruction-p ins))) (cadddr ins))

(deflist instruction-list (l) instruction-p)

(in-theory (disable instruction-list-true-listp))

(defalist bytecode-body-p (l) "Define an alist mapping bytecode addresses to bytecode instructions." (naturalp . instruction-p) (:options (:domain-type naturalp-listp) (:range-type instruction-list) ))

(defthm alistp-if-java-method-body-p (implies (bytecode-body-p x) (alistp x)))

(in-theory (disable bytecode-body-p))

Type Signature Encoding

Java class files encode field and method type signatures using the following abbreviations:

array "[" (not supported on the dJVM 0.5)
byte "B"
char "C"
class "L"
float "F" (not supported on the dJVM 0.5)
double "D" (not supported on the dJVM 0.5)
int "I"
long "J"
short "S"
void "V"
boolean "Z"


Note on the dJVM Model: Although the dJVM 0.5 model provides partial support for the types byte and short, these types are not yet supported in field and method signatures. Neither are char nor boolean yet supported.


Remark: What would it take to support byte, boolean, and short in field and method signatures?

At least type-list-for-field-type-sig would have to translate B, C, S, and Z to '(:int).

Does the JVM distinguish these narrow integer types in any context, except arrays?

How does a method with signature (S)S differ from one declared (I)I or one declared (S)I? Obviously their signatures are distinct, and so method invocation will consider them distinctly named methods.

All methods whose signature indicates an one-word integer return value use the ireturn instruction. So there appears to be no requirement that a method with signature (I)S return a short value, only that it return an int value.


Field Signatures

Each field member of a class has an associated field signature that encodes the declared type of the field. A field signature can represent any single primitive type or reference type. Currently the dJVM 0.5 model only supports the int and long primitive types as field values, along with reference types. Thus, the following are acceptable field signatures: I for int values, J for long values, and Lxyz; for reference values of the class xyz. In a type signature a long value is treated as a single value, rather than as two single-word values.

We start by defining a recognizer for reference-type signatures, and then one for the int and long primitive types.


Remark: The functions in this section are really superseded by those in the following sections.

But I haven't yet replaced their use in the description of state or of the new instruction. I think getfield and putfield use the newer mechanism.



(defun class-sig-p (sig)
  "Recognize a reference-type field-signature."
  (and (stringp sig)
       (let ((n (length sig)))
         (and (>= n 3)
              (equal (char sig 0) #L)
              (equal (char sig (1- n)) #;)))))

(defun field-sig-p (sig) "Recognize an INT or LONG field-signature." (or (member-equal sig '("I" ; :int "J" ; :long )) (class-sig-p sig)))

(defthm stringp-if-class-sig-p (implies (class-sig-p x) (stringp x)) :rule-classes (:forward-chaining))

(defthm stringp-if-field-sig-p (implies (field-sig-p x) (stringp x)) :rule-classes (:forward-chaining))

(defun sig-class-name (sig) "Extract the class name from a reference-type field-signature." (declare (xargs :guard (class-sig-p sig))) (subseq sig 1 (1- (length sig))))

Field Type Signatures

We define a recognizer for well-formed type-signature strings. The recognizer actually returns the ending index of the signature if it succeeds, and nil if it fails. Returning the ending index allows us to recognize and step over each field signature in the parameter-list signatures for methods. It also allows us to check that type signatures that include class names don't have extraneous characters after the class name.


Note on the dJVM Model: We constrain class names to the ASCII subset of UNICODE.


(defun field-type-sig-classname-p (sig i)
  "Test whether the substring of SIG starting at I is a
  (possibly-empty) name followed by a semicolon."
  (declare (xargs :guard (and (stringp sig)
                              (naturalp i)
                              (<= i (length sig)))
                  :measure (max 0 (- (length sig) (acl2-count i)))))
  (if (naturalp i)
      (if (>= i (length sig))
          nil
        (if (equal (char sig i) #;)
            ;; Return the next string index
              (1+ i)
          ;; Not at the end of the class-name yet.
          (field-type-sig-classname-p sig (1+ i))))
    nil))

(defun field-type-sig-class-name (sig i) "Return the name in SIG starting at position I ending with a semicolon." (declare (xargs :guard (and (stringp sig) (naturalp i) (<= i (length sig))) :measure (max 0 (- (length sig) (acl2-count i))))) (let ((end (field-type-sig-classname-p sig i))) (if (integerp end) (subseq sig i (1- end)) ':error-in-type-sig)))

(defun field-type-sig-class-p (sig i) "Test whether the substring of SIG starting at I is a non-empty name followed by a semicolon." (declare (xargs :guard (and (stringp sig) (naturalp i) (<= i (length sig))) :measure (max 0 (- (length sig) (acl2-count i))))) (if (naturalp i) (if (>= i (length sig)) nil (if (equal (char sig i) #;) ;; The first char must not be a semicolon, or the class ;; name was empty! nil ;; Otherwise look for the semicolon. (field-type-sig-classname-p sig (1+ i)))) nil))

We need to know that the type-signature recognizers always increase the index when they succeed. This fact is needed to show that the recursive function method-type-sig-p-internal terminates.

(defthm field-type-sig-classname-p-increases
  (implies (and (integerp i)
                (not (null (field-type-sig-classname-p sig i))))
           (< i (field-type-sig-classname-p sig i)))
  :rule-classes :linear)

(defthm field-type-sig-classname-p-bound (implies (and (integerp i) (not (null (field-type-sig-classname-p sig i)))) (<= (field-type-sig-classname-p sig i) (length sig))) :rule-classes :linear)

(defthm field-type-sig-class-p-increases (implies (and (integerp i) (not (null (field-type-sig-class-p sig i)))) (< i (field-type-sig-class-p sig i))))

(defthm field-type-sig-class-p-bound (implies (and (integerp i) (not (null (field-type-sig-class-p sig i)))) (<= (field-type-sig-class-p sig i) (length sig))) :rule-classes :linear)

The function field-type-sig-internal-p recognizes the type signature for a field (i.e., for one value). Primitive type signatures are a single character. The signature of a reference type (i.e., a class) has variable length.


Future Extensions: The dJVM 0.5 does not support arrays.




(defun field-type-sig-internal-p (sig i)
  "Test whether the string SIG contains a field-signature starting at index I.
   If so return the index after the last character in the field signature."
  (declare (xargs :guard (and (stringp sig)
                              (naturalp i)
                              (<= i (length sig)))))
  (if (>= i (length sig))
      nil
    (case (char sig i)

;; One-word types (#B (1+ i)) ; byte (#I (1+ i)) ; int (#S (1+ i)) ; short (#Z (1+ i)) ; boolean

;; One-word types not yet supported (#C nil) ; char (#F nil) ; float

;; Two-word types (#J (1+ i)) ; long (#D nil) ; double

;; Class names (#L (field-type-sig-class-p sig (1+ i)))

;; Arrays are not handled in dJVM 0.5 (#[ nil) (otherwise nil))) )

The function field-type-sig-p checks the length of the string to make sure that the entire string forms a single field-type signature, without any extraneous characters at the end.

(defun field-type-sig-p (sig)
  (declare (xargs :guard (stringp sig)))
  (equal (field-type-sig-internal-p sig 0)
         (length sig)))

Method Signatures

Method-signature strings are composed of an open parenthesis followed by zero or more field signatures followed by a close parenthesis and a return-type signature. For example, the method signature of a method taking an int and a long parameter and returning an int value would be (IJ)I.

We first define a recognizer the return-type of a method signature.


(defun return-type-sig-p (sig i)
  "Recognize a method return-type signature.  This is either
   a field-type sigature or the letter V."
  (declare (xargs :guard (and (stringp sig)
                              (naturalp i)
                              (<= i (length sig)))))
  (if (< i (length sig))
      (if (equal (char sig i) #V)
          (1+ i)
        (field-type-sig-internal-p sig i))
    nil))

(defthm field-type-sig-internal-p-increases (implies (and (integerp i) (not (null (field-type-sig-internal-p sig i)))) (< i (field-type-sig-internal-p sig i))) :rule-classes :linear)

(defthm field-type-sig-internal-p-bound (implies (and (integerp i) (not (null (field-type-sig-internal-p sig i)))) (<= (field-type-sig-internal-p sig i) (length sig))) :rule-classes :linear)

The following two functions recognize complete type-signatures for methods. The complete type signature of a method is a string consisting of
  1. an open parenthesis,
  2. zero or more parameter types,
  3. a close parenthesis,
  4. a return type.


(defun method-type-sig-p-internal (sig i)
  "Parse sequential field-types in a signature
   until we hit the end of the string or a right parenthesis."
  (declare (xargs :guard (and (stringp sig)
                              (naturalp i)
                              (<= i (length sig)))
                  :measure (max 0 (- (length sig) (acl2-count i)))
                  :guard-hints (("Goal" :in-theory (disable length)))))
  (if (and (naturalp i)
           (< i (length sig)))
      (if (equal (char sig i) #))
          (return-type-sig-p sig (1+ i))
        (let ((next-field-index (field-type-sig-internal-p sig i)))
          (and (not (null next-field-index))
               (method-type-sig-p-internal sig next-field-index))))
    nil))

(defun method-type-sig-p (sig) "Recognize a method type-signature." (declare (xargs :guard t)) (and (stringp sig) (> (length sig) 0) (equal (char sig 0) #() (method-type-sig-p-internal sig 1)))

(defthm stringp-if-method-type-sig-p (implies (method-type-sig-p x) (stringp x)) :rule-classes (:forward-chaining))

Type-Tag Lists

Now we define a translator from a type-signature string to a list of type tags, which is more easily used for run-time type-checking.

Note that the type-signature of a single field may translate into a list of one or two type tags, depending on whether the type-signature represents a one-word type (e.g., int) or a two-word type (e.g., long).

Note also that type-list-for-field-type-sig does not return a list of symbols. Reference types are represented as the cons-pair :ref and the class name (as a string). We define the predicate extended-jvm-type-listp, which is similar to the predicate jvm-type-listp, to recognize these extended type-lists.


(defun type-list-for-field-type-sig-internal (sig i)
  (declare (xargs :guard (and (stringp sig)
                              (naturalp i)
                              (<= i (length sig)))))
  (if (< i (length sig))
      (case (char sig i)
        (#I '(:int))
        (#J '(:long-top-half :long-bot-half))
        (#L (list (list ':ref (field-type-sig-class-name sig (1+ i)))))
        (otherwise '(:error)))
    '(:error)))

The function type-list-for-field-type-sig translates a field type-signature into a list of type tags as described above. The tag list is used for run-time type-checking when accessing instance fields and when invoking or return from methods.

(defun type-list-for-field-type-sig (sig)
  "Translate a field type-signature to a list of extended-type tags."
  (declare (xargs :guard (and (stringp sig)
                              (field-type-sig-p sig))))
  (type-list-for-field-type-sig-internal sig 0))

(defun type-list-for-method-parameters-internal (sig i) (declare (xargs :guard (and (stringp sig) (naturalp i) (<= i (length sig))) :measure (max 0 (- (length sig) (acl2-count i))) )) (if (or (not (naturalp i)) ;;(not (stringp sig)) (>= i (length sig))) nil (let ((next-field-index (field-type-sig-internal-p sig i))) (if (not (null next-field-index)) (append (type-list-for-field-type-sig-internal sig i) (type-list-for-method-parameters-internal sig next-field-index)) (if (equal (char sig i) #)) nil '(:error))))))

The function type-list-for-method-parameters returns a list of extended-type tags for a list of stack values to be a well-typed parameter list satisfying the given type-signature.


(defun type-list-for-method-parameters (sig)
  (declare (xargs :guard (and (stringp sig)
                              (method-type-sig-p sig))))
  (if (and (> (length sig) 0)
           (equal (char sig 0) #())
      (type-list-for-method-parameters-internal sig 1)
    '(:error)))

(in-theory (disable field-type-sig-p method-type-sig-p type-list-for-method-parameters type-list-for-field-type-sig))

Methods


(defun protection-modifiers ()
  '(:public :private :protected :default-protection))

(defun protection-modifier-p (x) (memberp x (protection-modifiers)))

(defun method-access-tags () '( :static :final :synchronized :native :abstract))

(defun method-access-flag-p (x) (memberp x (method-access-tags)))

(deflist method-access-flag-listp (l) method-access-flag-p)

(defun method-access-flags-p (x) (method-access-flag-listp x))

We define a separate recognizer for access-control modifiers. We break them out in the field and method declaration structures, so that they are easy to access and so it is easy to require that only one of them is specified for a given field or method.

Native methods


Remark: I haven't thought this out yet. It is not yet supported.

The body of a native method is a tagged-value with the tag :native-method. The value should be a symbol, which the interpreter will use to dispatch to the defined ACL2 ``native methods.''

Native methods will be implemented as ACL2 functions. Each native method takes the dJVM state as its argument, and returns a new dJVM state as its value. We expect that each native method will preserve well-formedness of dJVM states. (The implementor should use the ACL2 theorem prover to establish this, but the interpreter can be used without such a proof.)

The dJVM interpreter contains a case statement indexed by the native-method name (i.e., the value in a :native-method tagged-value) that calls the ACL2 function that implements that native method.


(def-tv  native-method-body-p (:native-method symbolp))

Method Bodies

A method body is either a bytecode-body (a sequence of bytecoded instructions) or a native method (implemented as an extension to the JVM) or is null. An unimplemented, abstract method has a null body. Since a method body is meta-data (i.e., information used directly by the interpreter itself, rather than manipulated by the JVM program being interpreted), we use the Lisp value nil, rather than a reference to the Java null.


(defun method-body-p (x)
  (or (bytecode-body-p x)
      (native-method-body-p x)
      (null x)))

Method Declarations

In addition to the method attributes taken from the class file representation of a method, we explicitly include the class name in which the method was declared. This will be used when we invoke the method, since the call frame associated with the method invocation must note that class as the ``current class'' during execution of the method.

The method structure includes both the method name and the method signature. These are used when resolving methods and selecting applicable methods as part of method invocation (i.e., in the invokevirtual, invokespecial, and invokestatic instructions).

We also use the method name in the description of the return instruction to determine if we are returning from an instance-initialization method. Instance-initialization methods always have the name <init>. When an instance-initialization method returns normally (i.e., via the return instruction), then the instance being initialized is marked as having been initialized in the context of the calling frame. See section [sec:uninitialized-instances], page [sec:uninitialized-instances], for a discussion of uninitialized instances. The structure of call frames is described in section [sec:call-frames], page [sec:call-frames]. The return instruction is described in section [inst:return], page [inst:return].

Note that we require the method signatures to be well-formed. The description of method invocation and return get much more complicated if we require run-time testing for well-formed signatures. In this sense the dJVM is not as fully-defensive as it could be. However, it is a simple matter to test method signatures for well-formedness when class definitions are loaded into the dJVM state.


Remark: Rename this to be method-descriptor or m-descriptor.

In standard class files the access flags of a method include its protection attribute (public, protected, private, or default). For convenience we have put the protection attribute in a separate field in java-method structure. So that it is easier to access, since it plays an important role in visibility and access rules.

All of the slots are read-only. They can be given initial values when a method structure is created, but cannot be altered thereafter. The :read-only option in the structure slots indicate this.Actually, the :read-only option merely suppresses the generation of the setter function (e.g., set-java-method-xxx) function for the slot. Since ACL2 is an applicative language, data structures are never updated destructively. The setter functions actually copies a sufficient portion of the data structure to create the new value without altering the old value. However, if ACL2 can deduce that no references to the old value still exist, the structure may be updated destructively. This is used to implement arrays efficiently.


Note on the dJVM Model: The dJVM 0.5 does not support any method attributes except the Code attribute, which defines the bytecode body for a method.The JVM Specification defines additional attributes to record debugging information, such as a line number table and a local variable symbol table.

It does not support the Exceptions attribute, which is required of all JVM implementations by the JVM Specification.
[ 4.6, p. 106 in JVMS]



Remark: Explain each field below.


Remark: Use the word slot to describe components of an ACL2 structure, so as to avoid confusion with Java's instance fields.


(defstructure  java-method
  (name            (:assert (stringp name)         :type-prescription)
                   :read-only)
  (class-name      (:assert (stringp class-name)   :type-prescription)
                   :read-only)
  (sig             (:assert (method-type-sig-p sig):rewrite
                            (:type-prescription    (stringp sig )))
                   :read-only)
  (protection      (:assert (protection-modifier-p protection) :rewrite
                            (:type-prescription  
                                          (non-nil-non-t-symbolp protection)))
                   (:default ':default-protection)
                   :read-only)
  (access-flags    (:assert (method-access-flags-p access-flags) :rewrite)
                   (:default nil)
                   :read-only)
  (body            (:assert (method-body-p body)   :rewrite)
                   :read-only)
  (max-stack       (:assert (naturalp max-stack)   :type-prescription)
                   :read-only)
  (max-locals      (:assert (naturalp max-locals)  :type-prescription)
                   :read-only)
  (exception-table (:assert (null exception-table) :type-prescription)
                   (:default nil)
                   :read-only)
  (attrs           (:assert (alistp attrs)         :rewrite
                            (:type-prescription    (true-listp attrs)))
                   (:default nil)
                   :read-only)
  (:options :guards
            (:verify-guards t)))

(defthm natural-p-java-max-stack
  (implies (java-method-p method)
           (naturalp (java-method-max-stack method)))
  :rule-classes (:rewrite :type-prescription))

(defthm naturalp-java-method-max-locals (implies (java-method-p method) (naturalp (java-method-max-locals method))) :rule-classes (:rewrite :type-prescription))

(defthm weak-java-method-p-if-java-method-p (implies (java-method-p x) (weak-java-method-p x)))

(in-theory (disable java-method-p))

(deflist java-method-listp (l) java-method-p)

The use of ``weak'' structure predicates is described in section [defstructure], [defstructure].


(deflist weak-java-method-listp (l)
  weak-java-method-p)

(defthm weak-java-method-listp-if-java-method-listp (implies (java-method-listp x) (weak-java-method-listp x)) :rule-classes (:forward-chaining))

(defun java-method-bound? (name sig method-list) (declare (xargs :guard (and (stringp name) (stringp sig) (java-method-listp method-list)))) (if (endp method-list) nil (if (and (equal name (java-method-name (car method-list))) (equal sig (java-method-sig (car method-list)))) t (java-method-bound? name sig (cdr method-list)))))

(defun java-method-binding (name sig method-list) (declare (xargs :guard (and (stringp name) (stringp sig) (java-method-listp method-list)))) (if (endp method-list) nil (if (and (equal name (java-method-name (car method-list))) (equal sig (java-method-sig (car method-list)))) (car method-list) (java-method-binding name sig (cdr method-list)))))

(defthm java-method-p-binding-member-method-listp (implies (and (java-method-listp l) (java-method-bound? name sig l)) (member (java-method-binding name sig l) l)))

(defthm java-method-p-binding-method-listp (implies (and (java-method-listp l) (java-method-bound? name sig l)) (java-method-p (java-method-binding name sig l))))

(defthm java-method-listp-type-bound? (implies (and (java-method-listp l) (or (not (stringp name)) (not (stringp sig)))) (not (java-method-bound? name sig l))))

(defun java-bytecode-method-p (x) (and (java-method-p x) (bytecode-body-p (java-method-body x))))

(defun bytecoded-method-p (x) (declare (xargs :guard (java-method-p x))) (bytecode-body-p (java-method-body x)))

(defun native-method-p (x) (declare (xargs :guard (java-method-p x))) (native-method-body-p (java-method-body x)))

Fields

Field Access Flags

The access flags describe several different aspects of a field. They describe its accessibility or protection (public, protected, or private). They describe whether it is a class field (i.e., static) or an instance field (i.e., not static). The volatile and transient flags relate to concurrent computations and data serialization, which are not addressed in dJVM 0.5.

When we describe the representation of a field declaration, we will represent the protection attributes of a field separately from the other access flags. This will allow us to easily state that only of one of the protection attributes may be set.


(defun field-access-tags ()
  '( :static
     :final
     :volatile
     :transient
     ))

(defun field-access-flag-p (x) (memberp x (field-access-tags)))

(deflist field-access-flag-listp (l) field-access-flag-p)

(defun field-access-flags-p (x) (field-access-flag-listp x))


Remark: Rename this to be field-descriptor or {f-descriptor}.

As with the java-method the structure the slots in the field structure are read-only. We require the field signatures to be well-formed.


(defstructure  field
  (name         (:assert (stringp name) :type-prescription)
                :read-only)
  (sig          (:assert (field-sig-p sig) :rewrite
                         (:type-prescription  (stringp sig)))
                :read-only)
  (protection   (:assert (protection-modifier-p protection) :rewrite
                         (:type-prescription  (non-nil-non-t-symbolp protection)))
                (:default ':default-protection)
                :read-only)
  (access-flags (:assert (field-access-flags-p access-flags) :rewrite)
                (:default nil)
                :read-only)
  (attrs        (:assert (alistp attrs) :rewrite
                         (:type-prescription  (true-listp attrs)))
                (:default nil)
                :read-only)
  (:options :guards
            (:verify-guards t)))

(defthm weak-field-p-if-field-p
  (implies (field-p x)
           (weak-field-p x)))

(in-theory (disable field-p))

(deflist field-listp (l) field-p)

(deflist weak-field-listp (l) weak-field-p)

(defthm weak-field-listp-if-field-listp (implies (field-listp x) (weak-field-listp x)) :rule-classes (:forward-chaining))

We now define a little theory about lists of field declarations that mirrors a portion of the alist theory, but uses a different lookup mechanism. Note that field lists are not alists, but merely lists of field declarations. Since each field declaration includes the field name, there is no need to build an alist on top of the field list in order to access the fields. We merely define field-bound? and field-binding, analogous to the functions bound? and binding for alists. We also define some of the theorems that defalist includes automatically.


(defun field-bound? (name field-list)
  (declare (xargs :guard (field-listp field-list)))
  (if (endp field-list)
      nil
    (if (equal name (field-name (car field-list)))
        t
      (field-bound? name (cdr field-list)))))

(defun field-binding (name field-list) (declare (xargs :guard (field-listp field-list))) (if (endp field-list) nil (if (equal name (field-name (car field-list))) (car field-list) (field-binding name (cdr field-list)))))

(defthm field-p-binding-field-listp (implies (and (field-listp l) (field-bound? x0 l)) (field-p (field-binding x0 l))))

(defthm field-listp-type-bound? (implies (and (field-listp l) (not (stringp x0))) (not (field-bound? x0 l))))


Remark: Does this explanation belong earlier?

Use a consistent terminology for slot readers/writers. Bishop uses ``readers'' and ``writers'' in the description of defstructure. Bill Bevier uses ``accessors'' and ``update functions.'' In the frame-operation-macro description I use getters and setters.


In general we don't want to try to prove that something satisfies the weak recognizer weak-field-p by attempting to prove that it satisfies the stronger predicate field-p. The motivation for having the weak form of the recognizer is to attempt to allow the ACL2 rewriter to rewrite formulas involving the structure accessors and setters without having to prove the semantic constraints on the slots.

When we compose structures (i.e., define a slot in one structure to contain another structure), we will often compose a reader for the lower-level structure with a reader for the higher-level structure. The slot assertion for the higher-level structure will normally use the strong recognizer. But the guard for the lower-level reader uses the weak-recognizer. Thus we are faced with precisely the problem described above: we must prove the guard assertion (i.e., the weak predicate) from the slot assertion (i.e., the strong assertion).

However, we wish to avoid having this rewrite rule be used in general when trying to prove the weak recognizer is satisfied. So, we restrict the rewrite rule to apply only when we're trying to prove that some variable binding satisfies the weak recognizer by writing the rules as


      (implies (field-p (binding x y))
               (weak-field-p (binding x y)))
rather than the more general case

      (implies (field-p x)
               (weak-field-p x))
This more restricted rule will only apply when we extract a field declaration using binding.

In that case we back-chain to attempt to prove the binding satisfies the strong recognizer. This pattern will be repeated each time we define a structure that will be composed in this fashion. Another way to avoid this is to define a forward-chaining rule that adds the strong assertion as a hypothesis whenever the weak assertion is a hypothesis. This avoids the cost of back-chaining while still making the strong assertion available when needed.


Remark: This lemma is no longer needed (or useful) now that we're using the :struct option in defstructure.


(defthm weak-field-p-binding-if-field-p-binding
  (implies (field-p (binding x y))
           (weak-field-p (binding x y))))

Class Declarations


(defun class-access-flag-bits ()
  '( :public
     :final
     :interface
     :abstract
     :super
     ))

(defun class-access-flag-bit-p (x) (memberp x (class-access-flag-bits)))

(deflist class-access-flags-bit-listp (l) class-access-flag-bit-p)


Remark: We should add the restrictions on mutually-exclusive bits.


(defun class-access-flags-p (x)
  (class-access-flags-bit-listp x))

The class-decl structure represents the dJVM image of the class definition contained in a standard Java class file. This includes the class name, access flags, field and method definitions.

The class-decl structure records all superclasses of a class, rather than just recording its immediate superclass. This makes it easier to check that all the class-decls in the class-table for a well-formed and form a consistent tree-structure of classes. Having the list of superclasses allows us to use a simpler recursion pattern in the wff-predicate and simplifies the justification that the recursive predicate terminates.


Remark: Explain that the constant pool data is distributed to the portions of the class declaration that refer to it. So the constant pool itself (as a distinct entity) vanishes.


Remark: Shouldn't most of these slots be :read-only?


(defstructure  class-decl
  (name          (:assert (stringp name)            :type-prescription))
  (status        (:assert (class-status-p status)   :rewrite
                          (:type-prescription       (non-nil-non-t-symbolp status)))
                 (:default 'initialized))
  (surrogate     (:assert (tv-ref-p surrogate)      :rewrite))
  (access-flags  (:assert (class-access-flags-p access-flags) :rewrite))
  (superclass    (:assert (stringp superclass)      :type-prescription))
  (superclasses  (:assert (string-listp superclasses) :rewrite
                          (:type-prescription       (true-listp superclasses))))
  (interfaces    (:assert (string-listp interfaces) :rewrite
                          (:type-prescription       (true-listp interfaces))))
  (fields        (:assert (field-listp fields)    :rewrite
                          (:type-prescription       (true-listp fields)))
                 (:struct (weak-field-listp fields)))
  (methods       (:assert (java-method-listp methods)  :rewrite
                          (:type-prescription       (true-listp methods)))
                 (:struct (weak-java-method-listp methods)))
  (attrs         (:assert (string-alistp attrs)     :rewrite
                          (:type-prescription       (true-listp attrs)))
                 (:default nil))
  (:options :guards
            (:verify-guards t)
            ))

(defthm weak-class-decl-p-if-class-decl-p
  (implies (class-decl-p x)
           (weak-class-decl-p x)))

(in-theory (disable class-decl-p))

(defun interface-p (x) (and (class-decl-p x) (memberp ':interface (class-decl-access-flags x))))

(defun abstract-class-p (x) (and (class-decl-p x) (memberp ':abstract (class-decl-access-flags x))))

(defun class-decl-protection (class-decl) "Return the protection flag of the class-decl." (declare (xargs :guard (class-decl-p class-decl))) (if (member ':public (class-decl-access-flags class-decl)) ':public ':default-protection))

(defthm class-decl-protection-fact (member (class-decl-protection x) '(:default-protection :public)))

Some Requirements of Classes

The Java language makes some requirements of well-formed classes. Many of these requirements should appear in the JVM specification as requirements on well-formed class files. For example:

It is a compile-time error for the body of a class declaration to contain declarations of two fields with the same name.
[ 8.3, p. 144 in JLS]


Remark: Is the JVM supposed to check for this sort of error when loading class declarations?

None of the class-file constraints in the JVM Specification appear to prohibit this!



Future Extensions: Someday these requirements on well-formed classes will be expressed as a requirements on the class-table in well-formed dJVM states, and proven as invariants during dJVM execution.

The Full dJVM State

We have now described almost all the major pieces of the dJVM state. We have descried the structure of class declarations and heap objects. We shall now describe the first two classes needed (the class Class and the class Object) and the dJVM call stack, after which we can define the full dJVM state.

Class Class and Class Object


Remark: We want to recognize syntactically well-formed class-like objects.

Then we'll impose some additional restrictions that every object points to a class object, whose class is Class. Class is an instance of Object, but I think it's also magic/built-in.


The first few objects are treated specially. The class Object is the superclass of every other class, but has no superclass itself. All classes are instances of the class Class, including the class Class itself. These two most basic objects form a little clique: class Object and class Class are both instances of class Class, and class Class has class Object as its superclass. This is the only case in which class objects may form a cycle.

In fact, classes are pretty special objects. Classes have several special properties:

The JVM heap will initially contain a few standard classes. In dJVM version 0.5 we include the class Object and the class Class. We will include additional standard classes in later revisions of the model. The initial class declarations are given in Chapter [ch:top-level].


Remark: Explain about the class table versus class surrogates in the heap. See chapter [sec:objects], page [sec:objects], and especially section [sec:class-surrogates], page [sec:class-surrogates].


(deflist class-decl-listp (l)
  class-decl-p)

(deflist weak-class-decl-listp (l) weak-class-decl-p)

(defthm weak-class-decl-listp-if-class-decl-listp (implies (class-decl-listp x) (weak-class-decl-listp x)) :rule-classes (:forward-chaining))

Currently the class table is indexed using only the the fully-qualified class name. No account is taken of the class loader when resolving a class name in the class table.
Future Extensions: The class table will have to be indexed by class name (a string) and class loader (an instance reference) when the dJVM is extended to include multiple class loaders.


(defalist class-table-p (l)
  "Define an alist mapping a class name to the corresponding class declaration."
  (stringp . class-decl-p)
  (:options (:domain-type string-listp)
            (:range-type  class-decl-listp)
            ))

(defalist weak-class-table-p (l) "Define an alist mapping a class name to the corresponding class declaration." (stringp . weak-class-decl-p) (:options (:domain-type string-listp) (:range-type weak-class-decl-listp) ))

(defthm weak-class-table-p-if-class-table-p (implies (class-table-p x) (weak-class-table-p x)) :rule-classes (:forward-chaining))

(in-theory (disable class-table-p))

(defmacro class-found? (class-name class-table) `(bound? ,class-name ,class-table))

(defun all-classes-found? (class-names class-table) (declare (xargs :guard (and (string-listp class-names) (class-table-p class-table)))) (if (endp class-names) t (and (class-found? (car class-names) class-table) (all-classes-found? (cdr class-names) class-table))))

(defun find-class-decls (class-names class-table) (declare (xargs :guard (and (string-listp class-names) (class-table-p class-table)))) (if (endp class-names) nil (cons (binding (car class-names) class-table) (find-class-decls (cdr class-names) class-table))))

(defthm class-decl-p-listp-find-class-decls (implies (and (string-listp names) (class-table-p class-table) (all-classes-found? names class-table)) (class-decl-listp (find-class-decls names class-table))) :hints (("Goal" :induct (find-class-decls names class-table))))

Method Lookup

Methods are identified by name and signature. Instance methods are inherited, and so are first looked up in the instance's class declaration, and then in successive superclass declarations, until an applicable method is found or until the class Object has been searched.

Class constructors in the Java language correspond to instance-initialization methods (i.e., methods named <init>) at the JVM level.

Note that:

  1. Class initialization methods (i.e., methods named <clinit>) are only invoked implicitly when the JVM initializes a class. They are never invoked explicitly.The dJVM 0.5 model does not include dynamic class loading. In dJVM 0.5 Class initialization methods can be invoked when a new class is added to the dJVM state, but this is not the behavior specified either by the Java Language Specification or by the Java Virtual Machine Specification. (See the definition of djvm-load-class-decl in section [sec:clinit], page [sec:clinit], for more details.)

  2. Instance initialization methods (i.e., methods named <init>) are only invoked via the invokespecial instruction. They cannot be invoked via invokevirtual, as normal instance methods are. Instance-initialization methods and private methods are always looked up directly in a specified class declaration. They are not searched for by dynamically looking up the class' inheritance hierarchy, as other instance methods are.

  3. The invokespecial instruction is also used to implement the semantics associated with dispatching an overridden method not normally visible in a given class (i.e., the semantics associated with super in Java).


Remark: Where should we check that method declarations are compatible with method declarations in each class' superclasses?

What are the ways in which a method declaration can be incompatible with that of a superclass?


If a non-public class overrides a public method of a public superclass, then the overriding method must be declared public, even though it is declared in a class that is not public.
[ 8.2.1.4, p. 142 in JLS]

A compile-time error occurs if an instance method overrides a static method.
[ 8.4.6, p. 165 in JLS]


Remark: Put this here and under class loading.

As part of the loading and linking process the virtual machine checks that an overriding method is at least as accessible as the overridden method; an IncompatibleClassChangeError occurs if this is not the case.
[ 15.11.4, p. 336 in JLS]

The following compile-time error is probably ignored at load-time and execution-time:

If a method declaration overrides or hides the declaration of another method, then a compile-time error occurs if they have different return types or if one has a return type and the other is void.
[ 8.4.6.3, p. 166 in JLS]

Note that the run-time behavior simply performs dynamic look-up on the method descriptor (i.e., the method signature and result type). So an overriding method with a different result type will not match the method descriptor given in the invokevirtual instruction.

It can be ignored because the compile-time error is not relevant to the dynamic execution. Method look-up at execution time uses the complete method signature (including the return type). So the method look-up will simpy fail to find a method if the method's return type does not match the return-type signature specified in the invocation.

However, the following error condition may have to be checked at load time.

A method declaration must not have a throws clause that conflicts with that of any method it overrides or hides.
[ 8.4.6.3, p. 166 in JLS]


Remark: Does the throws clause even appear in the class file??

A private method is never accessible to subclasses and so cannot be hidden or overridden. This means that [private methods never participate in method inheritance.]
[ 8.4.6.3, p. 166 in JLS]


Remark: But this statement would suggest that a subclass could inherit an accessible method through a superclass that had a private method with the same name and signature. But I don't believe that's true.


(defthm java-method-listp-class-decl-methods
  (implies (class-decl-p class-decl)
           (java-method-listp (class-decl-methods class-decl))))

(defun local-method? (method-name method-sig class-decl) (declare (xargs :guard (and (stringp method-name) (stringp method-sig) (class-decl-p class-decl)))) (java-method-bound? method-name method-sig (class-decl-methods class-decl)))

(defun lookup-local-method (method-name method-sig class-decl) (declare (xargs :guard (and (stringp method-name) (stringp method-sig) (class-decl-p class-decl)))) (java-method-binding method-name method-sig (class-decl-methods class-decl)))

(defun lookup-method-in-superclasses (method-name method-sig class-decls) (declare (xargs :guard (and (stringp method-name) (stringp method-sig) (class-decl-listp class-decls)))) (if (endp class-decls) nil (or (lookup-local-method method-name method-sig (car class-decls)) (lookup-method-in-superclasses method-name method-sig (cdr class-decls)))))

(defthm weak-class-decl-p-binding-in-class-table (implies (and (class-table-p class-table) (stringp class-name) (class-found? class-name class-table)) (weak-class-decl-p (binding class-name class-table))))

(defthm lookup-method-guard-helper (implies (and (class-table-p class-table) (stringp class-name) (bound? class-name class-table)) (string-listp (class-decl-superclasses (binding class-name class-table)))))

(defthm lookup-method-guard-helper-1 (implies (and (class-table-p class-table) (stringp class-name) (bound? class-name class-table) (all-classes-found? (class-decl-superclasses (binding class-name class-table)) class-table)) (class-decl-listp (find-class-decls (class-decl-superclasses (binding class-name class-table)) class-table))))

(defun lookup-method (method-name method-sig class-name class-table) (declare (xargs :guard (and (stringp method-name) (stringp method-sig) (stringp class-name) (class-table-p class-table)) :guard-hints (("Subgoal 2" :by (:instance lookup-method-guard-helper)) ("Subgoal 1" :by (:instance lookup-method-guard-helper-1)) ))) (if (class-found? class-name class-table) (let* ((class-decl (binding class-name class-table)) (superclasses (class-decl-superclasses class-decl))) (if (local-method? method-name method-sig class-decl) (lookup-local-method method-name method-sig class-decl) (if (all-classes-found? superclasses class-table) (let ((superclass-decls (find-class-decls superclasses class-table))) (lookup-method-in-superclasses method-name method-sig superclass-decls)) nil))) nil))

Field Lookup

In instructions that access class fields, the class fields are always referenced by the simple name of the field and the fully-qualified name of the class in which they are declared. So resolution of field names always looks up the field name in the class in which that field was declared, using that information to determine which instance field to access at run time. Thus, the field-accessing instructions never need to consider the hiding of fields in subclasses. These concerns have been handled by the compiler, and need not be handled at link time or execution time.

Unlike methods, field declarations are not overridden. Their names are just hidden. There are no consistency constraints between field declarations in a class declaration and those of its superclasses (if any), as there are with method inheritance and overriding.

It is permissible for an instance variable to hide a static variable.
[ 8.4.6, p. 165 in JLS]

It is permissible for a field to hide a field of another type.
[ 8.4.6.3, p. 166 in JLS]


Remark: It is permissible for a field to hide a field with a different protection. Do they mention that explicitly?


Remark: This looks up the field-declaration.

It doesn't look up the field value.



(defun lookup-local-field (field-name class-decl)
  (declare (xargs :guard (and (stringp field-name)
                              (class-decl-p class-decl))))
  (field-binding field-name
                 (class-decl-fields class-decl)))

(defun lookup-field-in-superclasses (field-name class-decls) (declare (xargs :guard (and (stringp field-name) (class-decl-listp class-decls)))) (if (endp class-decls) nil (or (lookup-local-field field-name (car class-decls)) (lookup-field-in-superclasses field-name (cdr class-decls)))))

(defun lookup-field (field-name class-name class-table) (declare (xargs :guard (and (stringp field-name) (stringp class-name) (class-table-p class-table)))) (if (class-found? class-name class-table) (let ((class-decl (binding class-name class-table))) (lookup-local-field field-name class-decl)) nil))

(defmacro object-p-deref (ref memory) `(object-p (deref ,ref ,memory)))

(defthm ref-p-object-class (implies (instance-p thing) (tv-ref-p (instance-class thing))))

(defthm alistp-object-data (implies (instance-p x) (alistp (instance-data x))))

Extract a list of field names from a list of field declarations.

(defun field-names (field-list)
  (declare (xargs :guard (field-listp field-list)))
  (if (endp field-list)
      nil
    (cons (field-name  (car field-list))
          (field-names (cdr field-list)))))

Call Frames

A frame on the Java stack contains the:


Remark: The JVMS says that a call frame must contain... In addition the dJVM call frame includes...

Briefly explain why we keep the uninitialized list.


A Java Virtual Machine frame may be extended with additional implementation-specific information.
[ 3.6.6, p. 68 in JVMS]


(defalist local-vars-listp (l)
  "Define an alist mapping addresses of local-variables to tagged-values."
  (naturalp . sv-p)
  (:options (:domain-type naturalp-listp)
            (:range-type  sv-listp)
            ))

(in-theory (disable local-vars-listp))

(defalist weak-local-vars-listp (l) "Define an alist mapping addresses of local-variables to weak-tv values." (naturalp . weak-tv-p) (:options (:domain-type naturalp-listp) (:range-type weak-tv-listp) ))

(verify-guards weak-local-vars-listp)

(in-theory (disable weak-local-vars-listp))

The field pc contains the address of the next instruction to execute. This is incremented before each instruction is executed. This is true for all fixed-length dJVM instructions.There are two exceptions to this rule: the variable-length instructions tableswitch and lookupswitch. These are both variable-length, unconditional branch instructions. During their execution the pc contains the address of the tableswitch or lookupswitch instruction being executed. Thus, during a branch instruction, the pc contains the address of the instruction following the branch instruction.

The field cia contains the current instruction address. This is the byte-address of the opcode of the instruction being executed. It is updated before each instruction is executed. All JVM branch instructions are relative branches, and the branch offset is relative to the address of the branch instruction. Thus the cia field acts as the base when computing the target address of a branch instruction.


Note on JVM Semantics: JVM Specification does not define when the program counter is updated. Branches are not defined relative to the program counter, but always relative to the address of the branch instruction.

The choice of when to update the program counter is left as an implementation decision. In order to make the dJVM definition executable, we update the pc field as described above, and add the cia field to allow branch targets to be described in the style of the JVM Specification.



Remark: Actually, the JVMS does specify the meaning of the pc field for bytecoded methods.

[If the current] method is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed.
[ 3.5.1, p. 62 in JVMS]

So the role of the pc and cia registers in the dJVM model should be reversed. Note that this description implicitly requires an additional register to hold the branch target until the ``instruction currently being executed'' completes, since the pc register may not change while it is executing.

Uninitialized Instances

The frame component method is used to hold the method associated with this frame.


Remark: For the moment the dJVM only supports bytecode methods. When the model is extended to support "native" methods (i.e., methods implemented in Lisp or ACL2), all of the dJVM instructions will have to check that they are executing within the frame for a bytecode method. It doesn't make sense to execute a bytecode instruction unless the dJVM is executing a bytecode method body. But the defensive machine must check.

Perhaps the only place the guard is necessary is in the macro define-defensive-instruction. It should halt the machine if the current frame's method is not a bytecode method.


The frame component object-ref is used for frames denoting the invocation of an instance-initialization method. When an instance-initialization method competes normally, the instance being initialized is considered properly initialized for the class of the method. Since a method completes normally when it exits via a return instruction, the definition of the return instruction handles marking instances as initialized. In order for the return instruction to be able to mark the proper object, we record the object reference in the call frame when invoking an instance-initialization method. The object-ref field is left empty when invoking any other method.

The field new-refs will contain a list of newly created objects that have not yet been initialized. When an instance is created via the new instruction, a reference to it is added to new-refs. When an instance-initialization method is invoked, the object reference on which it is to be dispatched must be a reference in new-refs. When the frame corresponding to the invocation of the instance-initialization method is created, the initial value of new-refs has a single entry, the object-ref on which the method was invoked.

The reference is removed from new-refs when an instance-initialization method invoked on it returns to this frame. Removing the reference from new-refs any earlier would allow an object to be considered initialized even if the instance-initialization method did not complete normally.

The value of the object-ref field is used when an instance-initialization method returns to identify which reference should be removed from the calling frame's new-refs field. In a call frame for a normal instance method (i.e., not an instance initialization method) the object-ref field should be a reference to null.


(in-theory (enable sv-listp-true-listp
                   fv-listp-true-listp))

(defstructure frame (class (:assert (stringp class) :type-prescription) :read-only) (method (:assert (java-method-p method) :rewrite) (:struct (weak-java-method-p method)) :read-only) (pc (:assert (naturalp pc) :type-prescription)) (cia (:assert (naturalp cia) :type-prescription) (:default 0)) (locals (:assert (local-vars-listp locals) :rewrite (:type-prescription (true-listp locals))) (:struct (weak-local-vars-listp locals))) (stack (:assert (sv-listp stack) :rewrite (:type-prescription (true-listp stack))) (:struct (weak-tv-listp stack))) (object-ref (:assert (tv-ref-p object-ref)) (:default (ref-to-null)) :read-only) (new-refs (:assert (tv-ref-listp new-refs) :rewrite (:type-prescription (true-listp new-refs))) (:default nil)) (:options :guards (:verify-guards t)))


(defthm weak-frame-p-if-frame-p
  (implies (frame-p x)
           (weak-frame-p x)))

(defthm naturalp-frame-pc (implies (frame-p frame) (naturalp (frame-pc frame))))

(defthm frame-pc-linear (implies (frame-p frame) (<= 0 (frame-pc frame))) :rule-classes (:linear))

(defthm naturalp-frame-cia (implies (frame-p frame) (naturalp (frame-cia frame))))

(defthm frame-cia-linear (implies (frame-p frame) (<= 0 (frame-cia frame))) :rule-classes (:linear))

(defun frame-max-locals (frame) (declare (xargs :guard (weak-frame-p frame))) (java-method-max-locals (frame-method frame)))

(defthm naturalp-frame-max-locals (implies (frame-p frame) (naturalp (frame-max-locals frame))) :rule-classes (:rewrite :type-prescription))

(defthm rationalp-frame-max-locals (implies (frame-p frame) (rationalp (frame-max-locals frame))) :rule-classes (:rewrite))

(in-theory (disable frame-p))

(defmacro op-stack (frame) `(frame-stack ,frame))

(defthm stringp-frame-class (implies (frame-p frame) (stringp (frame-class frame))))

Here we see a slight variation on the theorems relating the strong and weak recognizers. The structure frame is not used directly as the assertion on a slot in a higher-level structure. Rather, a frame-listp is used. Since we expect to extract elements from the list of frames via car and cdr, we define the analogous lemmas using car and cdr, rather than binding.


(defthm weak-frame-p-car-if-frame-p-car
  (implies (frame-p (car x))
           (weak-frame-p (car x))))

(defthm weak-frame-p-cadr-if-frame-p-cadr (implies (frame-p (cadr x)) (weak-frame-p (cadr x))))

The main internal consistency requirement for frames is that the pc should refer to an instruction in the method, and that the method should be some method in the current class. Recall that a method is an alist from natural numbers to instructions, and that only the addresses corresponding to the beginning of an instruction are bound in the alist. Thus we can use bound? to check that the pc value is consistent with the method body. (We will see this done later in section [sec:standard-parts].)

In order to get good-frame-p accepted, we need to know that the bytecoded body is an alist. So it would be convenient to make the bytecoded body an explicit component in the java-method structure. We can then attach an assertion to that component.


(deflist call-stack-p (l)
  frame-p)

(deflist weak-call-stack-p (l) weak-frame-p)

(defthm weak-call-stack-p-if-call-stack-p (implies (call-stack-p x) (weak-call-stack-p x)) :rule-classes (:forward-chaining))

The dJVM State

Now we can define the overall dJVM state, and begin to specify the invariants that define a good dJVM state.


Future Extensions: The dJVM 0.5 gives only the simple, relatively local invariants on the dJVM state. Many additional constraints imposed by the JVM and the Java Language are yet to be. For example, we have not yet defined consistency requirements between class declarations, nor does the dJVM require that the current method is a member of the current class.

The field status reflects whether the dJVM is running normally or has encountered an error. dJVM execution halts if the status is anything except the symbol :running. The status value can be used to indicate why the dJVM halted.


(defstructure  djvm
  (heap        (:assert (heap-p heap) :rewrite
                        (:type-prescription (true-listp heap))))
  (class-table (:assert (class-table-p class-table) :rewrite
                        (:type-prescription (true-listp class-table))))
  (stack       (:assert (call-stack-p stack) :rewrite
                        (:type-prescription (true-listp stack)))
               (:struct (weak-call-stack-p stack)))
  status
  (:options :guards
            (:verify-guards t)
            ))
The top-most call-frame is the current frame. The class of the current frame is the current class, and the method of the current frame is the current method, and


(defun current-frame (djvm)
  (declare (xargs :guard (and (weak-djvm-p djvm)
                              (non-empty (djvm-stack djvm)))))
  (car (djvm-stack djvm)))

(defun current-method (djvm) (declare (xargs :guard (and (weak-djvm-p djvm) (non-empty (djvm-stack djvm))))) (frame-method (current-frame djvm)))

(defthm weak-djvm-p-if-djvm-p (implies (djvm-p x) (weak-djvm-p x)))

(in-theory (disable djvm))

Define abbreviated accessors for the program counter (pc) and the current instruction address (cia) fields of the current frame.


(defun djvm-pc (djvm)
  (declare (xargs :guard (and (djvm-p djvm)
                              (non-empty (djvm-stack djvm)))))
  (frame-pc (car (djvm-stack djvm))))

(defun djvm-cia (djvm) (declare (xargs :guard (and (djvm-p djvm) (non-empty (djvm-stack djvm))))) (frame-cia (car (djvm-stack djvm))))

(defthm naturalp-djvm-pc (implies (and (djvm-p djvm) (non-empty (djvm-stack djvm))) (naturalp (djvm-pc djvm))) :rule-classes (:rewrite :type-prescription))

Accessing the Heap

An alist whose domain is the natural numbers can be called a memory. The heap and several other data structures have this form. We define a few common operations on memories.


(defalist a-memory-p (l)
  (naturalp . t)
  (:options (:domain-type naturalp-listp)
            ))

(defthm a-memory-p-any-of (implies (or (heap-p x) (bytecode-body-p x) (local-vars-listp x) ) (a-memory-p x)) :hints (("Goal" :in-theory (enable heap-p bytecode-body-p local-vars-listp))))

(defthm a-memory-p-if-bytecode-body-p (implies (bytecode-body-p x) (a-memory-p x)) :hints (("Goal" :in-theory (enable bytecode-body-p))))

(in-theory (disable a-memory-p))

(defun memory-ref-p (x) (tv-ref-p x))

(defun good-memory-ref-p (ref memory) (declare (xargs :guard (and (memory-ref-p ref) (a-memory-p memory)))) (bound? (tv-val ref) memory))

(defun deref (ref memory) (declare (xargs :guard (and (memory-ref-p ref) (a-memory-p memory) (good-memory-ref-p ref memory)))) (binding (tv-val ref) memory))

(observe state-2790 (implies (heap-p heap) (naturalp-listp (domain heap))))

(observe state-2791 (implies (heap-p heap) (object-listp (range heap))))

(defthm deref-in-heap-is-object (implies (and (tv-ref-p ref) (heap-p memory) (good-memory-ref-p ref memory)) (object-p (deref ref memory))) :hints (("Goal" :in-theory (disable object-p))))

(in-theory (disable object-listp-true-listp instruction-list-true-listp method-access-flag-listp-true-listp java-method-listp-true-listp weak-java-method-listp-true-listp field-access-flag-listp-true-listp field-listp-true-listp class-access-flags-bit-listp-true-listp class-decl-listp-true-listp call-stack-p-true-listp))

Facts about the dJVM State

The theorems in this section are expressions of facts we expect to hold about any dJVM state. For example, the heap component of the state should always satisfy the predicate a-memory-p.


(defthm a-memory-p-djvm-heap
  (implies (force (djvm-p djvm))
           (a-memory-p (djvm-heap djvm))))

Some facts about frames and the call stack.


(defthm weak-frame-p-car-djvm-stack
  (implies (force (and (weak-djvm-p djvm)
                       (djvm-stack djvm)))
           (weak-frame-p (car (djvm-stack djvm)))))

(defthm frame-p-car-djvm-stack (implies (force (and (djvm-p djvm) (djvm-stack djvm))) (frame-p (car (djvm-stack djvm)))))

(defthm stringp-frame-class-car-djvm-stack (implies (force (and (djvm-p djvm) (non-empty (djvm-stack djvm)))) (stringp (frame-class (car (djvm-stack djvm))))))

(defthm java-method-p-frame-method (implies (frame-p frame) (java-method-p (frame-method frame))))

(defthm naturalp-java-method-max-locals-frame-method (implies (force (and (djvm-p djvm) (consp (djvm-stack djvm)))) (naturalp (java-method-max-locals (frame-method (car (djvm-stack djvm)))))) :rule-classes (:type-prescription :rewrite))

Some facts about methods.


(defthm stringp-java-method-class-name
  (implies (force (java-method-p method))
           (stringp (java-method-class-name method)))
  :rule-classes (:rewrite :type-prescription))

(defthm stringp-java-method-sig (implies (force (java-method-p method)) (stringp (java-method-sig method))) :rule-classes (:rewrite :type-prescription))

(defthm stringp-java-method-sig-frame-method (implies (force (frame-p frame)) (stringp (java-method-sig (frame-method frame)))) :rule-classes (:rewrite :type-prescription))

(defthm java-method-p-java-method-binding (implies (and (force (class-decl-p class-decl)) (java-method-bound? name sig (class-decl-methods class-decl))) (java-method-p (java-method-binding name sig (class-decl-methods class-decl)))))

(in-theory (disable java-method-bound? java-method-binding))

(defthm java-method-p-if-java-bytecode-method-p (implies (java-bytecode-method-p method) (java-method-p method)) :hints (("Goal" :in-theory (enable java-bytecode-method-p))))


This page is URL http://www.computationallogic.com/software/djvm/html-0.5/state.html