Need Information on how to create bytecode



Yeah, I had that problem some time ago. The best you seem to be able
to do is look at the opcode sets of other VMs and work out what
subset you want. You also have to decide how your VM should operate:
either stack based or register based. Most VMs seem to be stack based
these days (CLR, JVM, ...), though a few are register based.

As far as your examples go, they should end up looking fairly similar
to what they'd look like on a real (non-virtual) machine, e.g.
$someValue = 1 + 2
would become something along the lines of (on a stack based VM)
lit 1 ; load 1 onto the top of stack (TOS)
lit 2 ; load 2 onto the TOS
add ; pull top two elements off stack add them together and push result
sto someValue ; pop TOS and place into location for someValue

or (on a register vm)
lit r1, 1 ; load 1 into register r1
lit r2, 2 ; load 2 into register r2
add r3, r1, r2 ; r3 <- r1+r2
mov someValue, r3
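
To make the stack based listing above a bit more concrete, here is a minimal sketch of an interpreter loop for just those three ops. The numeric encoding, the OP_* names, the HALT op and the idea of giving someValue variable slot 0 are all my own assumptions for illustration, not something any particular VM prescribes.

```c
#include <assert.h>

/* Hypothetical opcode numbering; the mnemonics follow the listing above.
   OP_HALT is an assumed extra op so the loop knows where to stop. */
enum { OP_LIT, OP_ADD, OP_STO, OP_HALT };

#define STACK_MAX 64
#define VARS_MAX   8

/* Run a flat int-encoded program; vars[] stands in for named storage. */
void run(const int *code, int *vars)
{
    int stack[STACK_MAX];
    int sp = 0;                 /* index of the next free stack slot */
    int pc = 0;                 /* index of the next code word */

    for (;;) {
        switch (code[pc++]) {
        case OP_LIT:            /* push the literal that follows the opcode */
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];
            break;
        case OP_ADD: {          /* pop two values, push their sum */
            assert(sp >= 2);
            int b = stack[--sp];
            int a = stack[--sp];
            stack[sp++] = a + b;
            break;
        }
        case OP_STO: {          /* pop TOS into the variable slot that follows */
            int slot = code[pc++];
            assert(sp >= 1 && slot >= 0 && slot < VARS_MAX);
            vars[slot] = stack[--sp];
            break;
        }
        case OP_HALT:
            return;
        default:                /* unknown opcode: give up */
            assert(!"unknown opcode");
            return;
        }
    }
}

/* $someValue = 1 + 2, with someValue assumed to live in slot 0. */
int eval_one_plus_two(void)
{
    const int program[] = { OP_LIT, 1, OP_LIT, 2, OP_ADD, OP_STO, 0, OP_HALT };
    int vars[VARS_MAX] = { 0 };
    run(program, vars);
    return vars[0];
}
```

Calling eval_one_plus_two() returns 3. A register version would look much the same, except each instruction would carry register numbers as operands instead of relying on an implicit stack.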

and your more complex example:
$i = true;
if($i)
$otherValue = someFunction(arg0, arg1);
else
$otherValue = "Not true";

would become something like this (stack based). The labels are symbolic;
your compiler would need to turn them into actual addresses.
lit 1 ; most VMs I've seen don't actually recognise bool as being
      ; distinct from an int
sto i ; store TOS to i
lod i ; load value of i to TOS
jpz else_label ; pop TOS, if value is 0 jump to else_label
lod arg1 ; the exact semantics for passing args are up to you
lod arg0
call someFunction
sto otherValue ; assuming your calling sequence leaves the return
               ; value on the stack, but there are other options
; your calling sequence may require you to explicitly clear the args
; off the stack,
; or your VM may do it as part of its call sequence
jmp end
else_label: ; start of else block
newstring "Not true" ; VMs like the CLR and JVM have a dedicated op for
                     ; loading a string (ldstr on the CLR, ldc on the
                     ; JVM) as well as ops for creating objects; the
                     ; resulting pointer is placed on TOS
sto otherValue
end:
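
Turning the symbolic labels in the listing above into addresses is usually done in two passes: first record where each label lands, then patch every jump. A rough sketch of that, assuming a made-up line_t layout where an "address" is simply an instruction index (line_t, resolve and MAX_LABELS are names I invented for this example):

```c
#include <assert.h>
#include <string.h>

/* One line of symbolic assembly: either a label definition or an
   instruction that may name a label. Hypothetical layout. */
typedef struct {
    const char *label;   /* non-NULL: this line defines a label */
    const char *op;      /* non-NULL: mnemonic, e.g. "jpz" */
    const char *target;  /* non-NULL: label this instruction jumps to */
    int address;         /* set by resolve(): jump target, or -1 if none */
} line_t;

#define MAX_LABELS 32

/* Two-pass label resolution. Returns the number of instructions,
   or -1 on an undefined label or too many labels. */
int resolve(line_t *lines, int n)
{
    const char *names[MAX_LABELS];
    int addrs[MAX_LABELS];
    int nlabels = 0, pc = 0;

    assert(lines != NULL && n >= 0);

    /* Pass 1: a label's address is the index of the next instruction. */
    for (int i = 0; i < n; i++) {
        if (lines[i].label) {
            if (nlabels == MAX_LABELS) return -1;
            names[nlabels] = lines[i].label;
            addrs[nlabels++] = pc;
        } else {
            pc++;
        }
    }

    /* Pass 2: replace each symbolic target with its address. */
    for (int i = 0; i < n; i++) {
        lines[i].address = -1;
        if (!lines[i].target) continue;
        for (int k = 0; k < nlabels; k++)
            if (strcmp(names[k], lines[i].target) == 0)
                lines[i].address = addrs[k];
        if (lines[i].address < 0) return -1;   /* undefined label */
    }
    return pc;
}

/* The if/else listing above, operands other than jump targets omitted. */
static line_t example[] = {
    { .op = "lit" }, { .op = "sto" }, { .op = "lod" },
    { .op = "jpz", .target = "else_label" },
    { .op = "lod" }, { .op = "lod" }, { .op = "call" }, { .op = "sto" },
    { .op = "jmp", .target = "end" },
    { .label = "else_label" },
    { .op = "newstring" }, { .op = "sto" },
    { .label = "end" },
};
```

Running resolve(example, 13) counts 11 instructions, points the jpz at instruction 9 and the jmp at instruction 11 (one past the last instruction, i.e. "end"). If you emit variable-length bytecode rather than fixed-size structs, the same idea applies with byte offsets in place of indices.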

The register VM version is similar, although function args and return
values are likely to be passed through registers.
A register based VM is much more complicated to implement, but can be
faster than a stack machine, especially on machines that actually have
registers. Most register based approaches I've seen assume an infinite
number of registers and rely on the JIT to perform register allocation,
which is exceedingly non-trivial.

Given this is likely your first VM, I'd strongly recommend that you
use a stack machine. It will make your life decidedly less painful.

As far as data structures go, a simple struct a la:
typedef enum {lit, lod, lodarg, sto, jpz, jmp, call, newstring, ... }
optype_t;
typedef struct {
    optype_t opcode;
    union {
        int litvalue;       // for lit
        some_type target;   // for jmp, jpz: the address will obviously be
                            // different in memory than in file
        int offset;         // for lod and lodarg
        some_type function; // for call: in a file it should be a string,
                            // in memory a pointer to the function info struct
        ...
    } args;
} opcode_t;

That is a fairly simple approach but it would work.
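
For what it's worth, here is roughly how a dispatch loop over that kind of struct might look. This is a cut-down sketch under my own assumptions: the upper-case enum names, instr_t, the HALT op, and using plain int indices for jump targets and variable slots are all invented for the example, and the call and string ops are left out (the branches just store 10 or 20 instead).

```c
#include <assert.h>

typedef enum { LIT, LOD, STO, JPZ, JMP, HALT } optype_t;

typedef struct {
    optype_t opcode;
    union {
        int litvalue;   /* for LIT */
        int target;     /* for JMP, JPZ: index into the code array */
        int offset;     /* for LOD, STO: variable slot */
    } args;
} instr_t;

#define STACK_MAX 64

/* Execute code[] until HALT; vars[] holds the variable slots. */
void execute(const instr_t *code, int *vars)
{
    int stack[STACK_MAX];
    int sp = 0, pc = 0;

    for (;;) {
        const instr_t *in = &code[pc++];
        switch (in->opcode) {
        case LIT: assert(sp < STACK_MAX); stack[sp++] = in->args.litvalue; break;
        case LOD: assert(sp < STACK_MAX); stack[sp++] = vars[in->args.offset]; break;
        case STO: assert(sp > 0); vars[in->args.offset] = stack[--sp]; break;
        case JPZ: assert(sp > 0); if (stack[--sp] == 0) pc = in->args.target; break;
        case JMP: pc = in->args.target; break;
        case HALT: return;
        default:  return;   /* unknown opcode: stop */
        }
    }
}

/* i = i_value; if (i) otherValue = 10; else otherValue = 20;
   with i in slot 0 and otherValue in slot 1. */
int run_branch(int i_value)
{
    const instr_t code[] = {
        { .opcode = LIT,  .args.litvalue = i_value }, /* 0 */
        { .opcode = STO,  .args.offset = 0 },         /* 1: sto i */
        { .opcode = LOD,  .args.offset = 0 },         /* 2: lod i */
        { .opcode = JPZ,  .args.target = 7 },         /* 3: jpz else_label */
        { .opcode = LIT,  .args.litvalue = 10 },      /* 4 */
        { .opcode = STO,  .args.offset = 1 },         /* 5: sto otherValue */
        { .opcode = JMP,  .args.target = 9 },         /* 6: jmp end */
        { .opcode = LIT,  .args.litvalue = 20 },      /* 7: else_label */
        { .opcode = STO,  .args.offset = 1 },         /* 8: sto otherValue */
        { .opcode = HALT },                           /* 9: end */
    };
    int vars[2] = { 0, 0 };
    execute(code, vars);
    return vars[1];
}
```

run_branch(1) returns 10 and run_branch(0) returns 20. A switch per instruction is the simplest dispatch scheme; faster ones exist, but I'd get this working first.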

As far as jitting goes, it is tremendously complicated. I recommend
you first see if you can go from whatever your opcode format is to
assembly that you can pass into an assembler. If you can get that
going, then you know the native code you are generating is correct, and
you can set about generating binary opcodes, but that is also
decidedly non-fun.
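
As a rough idea of the "opcodes to assembly text" step, the sketch below translates lit/add/sto into x86-64 text by mapping the VM stack directly onto the hardware stack. The instr_t layout and emit_asm are invented for this example, the output is meant to be roughly NASM-style, and I haven't run it through an assembler, so treat the exact syntax as an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { LIT, ADD, STO } optype_t;

typedef struct {
    optype_t opcode;
    int litvalue;        /* used by LIT */
    const char *name;    /* used by STO: symbol of the variable */
} instr_t;

/* Append assembly text for each instruction to out.
   Returns 0 on success, -1 if the buffer is too small. */
int emit_asm(const instr_t *code, int n, char *out, size_t cap)
{
    size_t used = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    for (int i = 0; i < n; i++) {
        int w = 0;
        switch (code[i].opcode) {
        case LIT:   /* push the literal onto the hardware stack */
            w = snprintf(out + used, cap - used,
                         "    push %d\n", code[i].litvalue);
            break;
        case ADD:   /* pop two operands, push their sum */
            w = snprintf(out + used, cap - used,
                         "    pop rcx\n    pop rax\n"
                         "    add rax, rcx\n    push rax\n");
            break;
        case STO:   /* pop TOS into the named variable */
            w = snprintf(out + used, cap - used,
                         "    pop rax\n    mov [%s], rax\n", code[i].name);
            break;
        }
        if (w < 0 || (size_t)w >= cap - used) return -1;  /* truncated */
        used += (size_t)w;
    }
    return 0;
}
```

Feeding it lit 1, lit 2, add, sto someValue gives two pushes, the pop/pop/add/push sequence, and a final pop plus store. The result is naive (the push rax / pop rax pair is redundant), but naive and checkable is the point at this stage.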

Hope this helps,
Oliver