Re: A novice in compiler construction wants feedback



On Sep 5, 9:35 am, "Marcus Johansson" <program...@xxxxxxxxx> wrote:

My intepreter is written in C. It has got 8 registers and a stack. They're
of a union type, something like:
typedef union {
int i;
float f;
DString *s;
Array *a;
}AnyType;

The compiled instructions are loaded into an array of, let's say:
typedef struct {
int command;
int dst;
int src;
} Instruction;

I've set up an array with function pointers matching the command values of
the instructions:
void CMD_MOVE_RR(void);
void CMD_MOVE_RC(void);
...
gCommands[MOVE_RR] = CMD_MOVE_RR;
gCommands[MOVE_RC] = CMD_MOVE_RC;
...

I then interpret the program with:

while (gProgramRunning) {
gCommands[gInstructions[gCurrentInstruction].command]();
gCurrentInstruction++;
}

Is this the usual way of doing it? And, above all, is there a smarter and
faster way of interpreting the instructions? Can I somehow get rid of all
those function calls above?

That's quite a tidy way of doing things, if you're using C.

However I couldn't manage to get things fast enough using C, or even
my own compiled language. The core (the main dispatch 'loop') of my
interpreter is in x86 assembler. I've put 'loop' in quotes because
actually it uses threaded code (jumps directly to the next command
handler).

I've managed to get down to just a 2 instruction overhead per command
(add ebx,offset; jmp [ebx]), although this relies on one register
being the 'program counter' (your gcurrentinstruction).

But overall, as the moderator says, this is very kludgy.

With C, you might look at Gcc which has a few helpful extensions, such
label variables (allowing threaded code using goto *command, although
I could never get this to work to my satisfaction.

A couple of things about your outline:

You're using the .command field of your gInstruction to index
gCommands; just put the gCommand address directly into the
gInstruction! (If you need the command index for error reports, you
can easily obtain this). And pointer step a pointer instead of using
an index (although your C compiler might take care of this).

You're also testing gProgramRunning on every single command, and of
course it is only true on the very last one. So perhaps use an
infinite loop, and detect a stop condition in the Stop command (I
assume you have a Stop command). Then you just have a small problem,
in C, of jumping out of the interpreter loop.

Perhaps use setjmp/longjmp (I've never used them), or use the
moderator's idea of putting all the code in one function; then it
might just be a goto (unless you've got stuff on the stack to take
care of).

--
Bartc
[Another thought is to make your VM instructions higher level so they do more
work between dispatches. I once did a VM for a version of the matlab math language,
and when a singler operator is "multiply large matrices" the interpreter overhead
is lost in the noise. -John]
.



Relevant Pages

  • Re: A novice in compiler construction wants feedback
    ... The compiler can be downloaded ... The compiled instructions are loaded into an array of, ... int command; ... There are faster ways of doing interpreter dispatch, ...
    (comp.compilers)
  • Re: Tcl_SetVar crash
    ... You've got to pass the interpreter to the command as an argument. ... int set_counter ... has a pointer to an initialized interpreter) ...
    (comp.lang.tcl)
  • Re: [MFC REQUEST] Filename completion in sh(1)
    ... pointer ckrealloc; ... void stunalloc; ... int unaliascmd(int, char **); ... Exit immediately if any untested command fails in non-interactive mode. ...
    (freebsd-current)
  • [PATCH] trivial: fix common spelling mistakes
    ... * well as any reset of TSC during the boot process. ... registers based on the command. ... * register value directly. ... int count; ...
    (Linux-Kernel)
  • [RFC: 2.6 patch] remove the broken SCSI_ACORNSCSI_3 driver
    ... * Abandoned using the Select and Transfer command since there were ... Once debugged, remove the #undef, otherwise to debug, ... -unsigned int dmac_address ... * Purpose: differentiate between commands that have a DATA IN phase ...
    (Linux-Kernel)