misc: lang effort, performance
- From: "cr88192" <cr88192@xxxxxxxxxxxxxxxxxx>
- Date: Wed, 5 Jul 2006 15:31:20 +1000
well, yesterday, I was able to tune the speed of my vm (on a very small set of
examples, but I figure they are at least an ok test of the effectiveness of
core aspects of the vm):
int fib(int x)if(x>2)fib(x-1)+fib(x-2) else 1;
then again, other code would probably run faster or slower, but this is a
guesstimate.
I added a define that allowed me to enable/disable ref counting to see what
would happen.
disabling ref counting causes most of the run-time to shift over to the vm
step (and a few closely related functions), implying that the only real way
to get much more speed in this case would be either static or jit
compilation.
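a define like this could be as simple as the following sketch. the names `VM_REFCOUNT`, `OBJ_INCREF`, `OBJ_DECREF` and the `Obj` layout are hypothetical, not the vm's actual ones. with the define off, the macros compile to nothing, which is where the saved time comes from (and something else, e.g. a tracing gc, then has to reclaim memory).

```c
#include <stdlib.h>

/* Hypothetical object header; the real VM layout is not given in the post. */
typedef struct Obj {
    int refcount;
} Obj;

static int live_objects = 0;  /* counts objects not yet freed */

static Obj *obj_new(void) {
    Obj *o = malloc(sizeof *o);
    if (!o) abort();
    o->refcount = 1;
    live_objects++;
    return o;
}

static void obj_free(Obj *o) {
    live_objects--;
    free(o);
}

/* Build-time switch (hypothetical name): define VM_REFCOUNT as 0 to compile counting out. */
#ifndef VM_REFCOUNT
#define VM_REFCOUNT 1
#endif

#if VM_REFCOUNT
#define OBJ_INCREF(o) ((o)->refcount++)
#define OBJ_DECREF(o) do { if (--(o)->refcount == 0) obj_free(o); } while (0)
#else
/* No-ops: objects are never freed here, so a tracing GC would have to reclaim them. */
#define OBJ_INCREF(o) ((void)(o))
#define OBJ_DECREF(o) ((void)(o))
#endif
```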
note that normally my interpreter is compiled with full debug/profiler
options.
a C implementation of the algo takes about 110ms (generic compiler options).
calculating fib(32) with ref counting enabled presently takes about 13s
(118x slower than C), and with it disabled about 5s (45x slower).
turning on compiler optimization, with ref counting enabled it takes about
3s (27x slower than C); with ref counting disabled it takes about 1.3s
(about 12x slower).
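for comparison, a minimal C baseline for the same function could look like this. this is a sketch: `time_fib_ms` is a made-up helper, and the ~110ms figure depends on the machine and compiler flags, so expect different numbers. the slowdown factors are just interpreter time divided by this baseline, e.g. 13s / 0.11s ≈ 118.

```c
#include <time.h>

/* Same definition as the script version: fib(x-1) + fib(x-2) for x > 2, else 1. */
static int fib(int x) {
    return x > 2 ? fib(x - 1) + fib(x - 2) : 1;
}

/* Times one call of fib(n) with clock(); returns elapsed CPU time in milliseconds. */
static double time_fib_ms(int n, int *result) {
    clock_t start = clock();
    *result = fib(n);
    clock_t end = clock();
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}
```

calling `time_fib_ms(32, &r)` from a small `main` gives the baseline to divide the vm timings by.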
note that the runtime is not much worse if I use floats instead (about
1.8s in the last case). this is partly because floats get many of the same
optimizations as integers (they can be represented directly as tagged
values, there are lots of type-specific bytecodes, ...).
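to illustrate what a type-specific bytecode buys, here is a tiny sketch of a stack interpreter with a generic add and a fixnum-only add. the tagging (fixnum x stored as (x << 3) | 1) is an assumption inferred from the jit listing further down, and the opcode names are hypothetical. the generic op has to check tags, while the specialized one just adds and fixes up the tag.

```c
#include <stdint.h>

/* Assumed tagging, inferred from the JIT listing below (push_1 -> 9, push_2 -> 17):
   a fixnum x is stored as (x << 3) | 1. Small values only; no overflow checks. */
typedef int32_t Value;
#define FIX_TAG 1
#define MAKE_FIX(x) ((Value)(((uint32_t)(x) << 3) | FIX_TAG))
#define FIX_VAL(v)  ((int32_t)(v) >> 3)   /* assumes arithmetic right shift */
#define IS_FIX(v)   (((v) & 7) == FIX_TAG)

/* Hypothetical opcodes: a generic add and a fixnum-only add. */
enum { OP_PUSH, OP_ADD, OP_ADD_FN, OP_RET };

/* Runs a tiny stack program; returns the value left by OP_RET, or 0
   (never a valid fixnum) on error. */
static Value run(const int *code) {
    Value stack[16];
    int sp = 0;
    for (;;) {
        switch (*code++) {
        case OP_PUSH:
            stack[sp++] = MAKE_FIX(*code++);
            break;
        case OP_ADD: {   /* generic: must check both tags first */
            Value b = stack[--sp], a = stack[--sp];
            if (!IS_FIX(a) || !IS_FIX(b))
                return 0;  /* a real VM would dispatch to float/heap cases here */
            stack[sp++] = MAKE_FIX(FIX_VAL(a) + FIX_VAL(b));
            break;
        }
        case OP_ADD_FN: {   /* fixnum-only: the compiler already knows both are fixnums */
            Value b = stack[--sp], a = stack[--sp];
            stack[sp++] = a + b - FIX_TAG;  /* (a<<3|1) + (b<<3|1) - 1 == ((a+b)<<3)|1 */
            break;
        }
        case OP_RET:
            return stack[sp - 1];
        default:
            return 0;
        }
    }
}
```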
note that floats are slightly less accurate in my vm, as the tag is stored
in the low-order mantissa bits (when converting to the tagged form, the
value is rounded slightly to try to even things out).
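a rough sketch of that float scheme follows. it assumes 32-bit values, IEEE single-precision floats, a 3-bit tag and a float tag value of 2. all of these are guesses, since the post does not give the actual layout. the idea is to round off the low 3 mantissa bits, then store the tag there.

```c
#include <stdint.h>
#include <string.h>

/* Assumptions (not from the post): 32-bit values, IEEE-754 single floats,
   a 3-bit tag in the low mantissa bits, and tag value 2 meaning "float". */
#define TAG_BITS  3u
#define TAG_MASK  ((1u << TAG_BITS) - 1u)
#define FLOAT_TAG 2u

/* Rounds away the low mantissa bits (nearest, ties away from zero) and stores
   the tag there. Finite inputs only; values near FLT_MAX may round to infinity. */
static uint32_t float_to_tagged(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits += 1u << (TAG_BITS - 1);  /* add half of the dropped range: rounds instead of truncating */
    bits &= ~TAG_MASK;
    return bits | FLOAT_TAG;
}

static float tagged_to_float(uint32_t t) {
    uint32_t bits = t & ~TAG_MASK;  /* clear the tag; 20 of the 23 mantissa bits remain */
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```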
since the intended use of the language consists of a lot of working with
numbers/vectors/matrices/... I put at least some effort into optimizing
these things (note that vectors and matrices, however, are still represented
on the heap, but oh well...).
note that I intend to leave ref counting on for the most part (but may try
to eliminate it wherever possible), mostly because it helps control memory
use (in particular, avoiding invoking garbage collection...), and imo it
does well enough there to pay for its slowness.
conceptually, with ref counting on, most code should run in near-constant
memory (well, at least when not creating new data).
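as a small illustration of the near-constant memory point, here is a sketch with a hypothetical refcounted heap vector type. each loop iteration creates a new vector and releases the old one right away, so the number of live vectors stays bounded no matter how many iterations run.

```c
#include <stdlib.h>

/* Hypothetical refcounted heap vector (3 floats); not the VM's actual layout. */
typedef struct Vec3 {
    int refcount;
    float v[3];
} Vec3;

static int live = 0, peak = 0;  /* bookkeeping to observe memory use */

static Vec3 *vec_new(float x, float y, float z) {
    Vec3 *p = malloc(sizeof *p);
    if (!p) abort();
    p->refcount = 1;
    p->v[0] = x; p->v[1] = y; p->v[2] = z;
    if (++live > peak) peak = live;
    return p;
}

static void vec_release(Vec3 *p) {
    if (--p->refcount == 0) { live--; free(p); }
}

static Vec3 *vec_add(const Vec3 *a, const Vec3 *b) {
    return vec_new(a->v[0] + b->v[0], a->v[1] + b->v[1], a->v[2] + b->v[2]);
}

/* acc = acc + step, n times; the old acc is released as soon as it is replaced. */
static Vec3 *accumulate(Vec3 *step, int n) {
    Vec3 *acc = vec_new(0, 0, 0);
    for (int i = 0; i < n; i++) {
        Vec3 *next = vec_add(acc, step);
        vec_release(acc);  /* freed immediately, no GC pass needed */
        acc = next;
    }
    return acc;
}
```

with ref counting, peak memory here is three vectors (the step, the old sum, the new sum), whether n is 10 or 10 million.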
probably good enough for now as most of the "heavy lifting" will be done in
C anyways, but the lang may have at least some uses which demand speed
(manipulating geometry or similar...).
I guess this would depend a lot on how nice the interface is between the
lang and the host app.
I had started to consider the possibility of jit compilation. the main
difficulty then became how to approach the assembler portion of the process.
at first I had considered writing a conventional (textual) assembler: I
would first generate a string, and then assemble it. but then I realized
that, first off, I would need to generate the string, and secondly, the
assembler would need to handle all manner of possible inputs.
in retrospect, this may not be the best option. I am now considering a
procedural assembler, which basically has a state or context and represents
a lot of common instructions as function calls (less common ones could be
strings, with more generic encode functions).
    ASM_MovRegImm(ctx, ASM_EAX, BS1_MM_FALSE);
    /* ... */
    ASM_Instr(ctx, ASM_MOV_EAX_EDX);
    ASM_InstrImm32(ctx, ASM_ADD_EAX_IMM, 12<<3);
    /* ... */
quite possibly, the (named) opcodes will be macros referring to strings
giving the opcodes in question.
this is crude, but should be workable (and I can skip out on a lot of the
cases I will probably never need anyways).
or such...
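here is a minimal sketch of such a procedural assembler. only the names `ASM_Instr`, `ASM_InstrImm32`, `ASM_MovRegImm`, `ASM_EAX`, `ASM_MOV_EAX_EDX` and `ASM_ADD_EAX_IMM` come from the snippet above. the context struct, the helpers (`ASM_Byte`, `ASM_Imm32`, `hexval`) and all function bodies are assumptions. the named opcodes are hex strings, and the encodings used are standard x86: 89 D0 for mov eax, edx; 05 id for add eax, imm32; B8+r id for mov r32, imm32.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical context: a fixed output buffer plus a write position. */
typedef struct {
    uint8_t buf[256];
    size_t pos;
} ASM_Context;

/* Named opcodes as hex strings. */
#define ASM_MOV_EAX_EDX "89D0"  /* mov eax, edx (89 /r, ModRM 0xD0) */
#define ASM_ADD_EAX_IMM "05"    /* add eax, imm32 */

/* x86 register numbers as used in opcode/ModRM encodings. */
enum { ASM_EAX = 0, ASM_ECX, ASM_EDX, ASM_EBX, ASM_ESP, ASM_EBP, ASM_ESI, ASM_EDI };

static void ASM_Byte(ASM_Context *ctx, uint8_t b) {
    if (ctx->pos < sizeof ctx->buf) ctx->buf[ctx->pos++] = b;  /* silently drops on overflow */
}

static int hexval(char c) {
    return c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;  /* accepts 0-9, a-f, A-F */
}

/* Emits the bytes spelled out by a hex opcode string. */
static void ASM_Instr(ASM_Context *ctx, const char *hex) {
    for (; hex[0] && hex[1]; hex += 2)
        ASM_Byte(ctx, (uint8_t)(hexval(hex[0]) << 4 | hexval(hex[1])));
}

/* Little-endian 32-bit immediate, as x86 expects. */
static void ASM_Imm32(ASM_Context *ctx, uint32_t v) {
    for (int i = 0; i < 4; i++) ASM_Byte(ctx, (uint8_t)(v >> (8 * i)));
}

static void ASM_InstrImm32(ASM_Context *ctx, const char *hex, uint32_t imm) {
    ASM_Instr(ctx, hex);
    ASM_Imm32(ctx, imm);
}

/* mov r32, imm32 is encoded as B8+r followed by the immediate. */
static void ASM_MovRegImm(ASM_Context *ctx, int reg, uint32_t imm) {
    ASM_Byte(ctx, (uint8_t)(0xB8 + reg));
    ASM_Imm32(ctx, imm);
}
```

the rarer instructions can then just be hex strings passed to `ASM_Instr`, with no parser needed.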
now, for the functions that can be effectively jit compiled, the jit
compiler will take the bytecoded blocks and transform them into jitted
blocks...
now, this is likely to only apply to a small subset of the function pool,
in particular functions that:
- do not capture the environment, and do not generate any closures;
- only accept a fixed number of arguments;
- are bound by static typing rules;
- ...
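the conditions above could be checked with something like this sketch. it assumes, hypothetically, that the bytecode compiler records a few flags per function. `FuncInfo` and `jit_eligible` are made-up names.

```c
#include <stdbool.h>

/* Hypothetical per-function metadata recorded by the bytecode compiler. */
typedef struct {
    bool captures_env;      /* references variables from an enclosing scope */
    bool makes_closures;    /* creates closures in its body */
    bool varargs;           /* accepts a variable number of arguments */
    bool statically_typed;  /* argument and return types are all declared */
} FuncInfo;

/* Mirrors the conditions in the list; anything failing them stays interpreted. */
static bool jit_eligible(const FuncInfo *f) {
    return !f->captures_env && !f->makes_closures && !f->varargs && f->statically_typed;
}
```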
but, even if never fully implemented, the vm is still probably fast enough
imo...
eg, the fib example (assuming a sane jit compiler):
fib:
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]    ;load x
    cmp eax, 17         ;push_2 cmp_g_fn jmp_false(l0)
    jle l0              ; <implicit>
                        ;mark <absorbed, jit context>
    mov eax, [ebp+8]    ;load x
    sub eax, 8          ;dec_fn
                        ;call_cf
    push eax            ; <implicit>
    call fib            ; <implicit>
    pop edx             ; <implicit>
    push eax            ;mark <absorbed, implicit>
    mov eax, [ebp+8]    ;load x
    sub eax, 16         ;dec2_fn
                        ;call_cf
    push eax            ; <implicit>
    call fib            ; <implicit>
    pop edx             ; <implicit>
    mov edx, eax        ;add_fn
    pop eax             ; <implicit>
    add eax, edx        ; <implicit>
    dec eax             ; <implicit> both operands carry tag 1, drop one
    pop ebp             ;ret
    ret                 ;
l0:                     ;<implicit>
    mov eax, 9          ;push_1
    pop ebp             ;ret
    ret                 ;
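to sanity-check the tag arithmetic in the listing, here is a C model of the same steps over tagged values. it assumes, as the constants suggest, that a fixnum x is stored as (x << 3) | 1, so 1 is 9 and 2 is 17. note that adding two tagged values leaves tag 2 in the low bits, so the sum needs one tag subtracted (a + b - 1) to stay a valid fixnum. `tag_int`, `untag_int` and `fib_tagged` are hypothetical names.

```c
#include <stdint.h>

/* Assumed encoding, read off the listing's constants: fixnum x is (x << 3) | 1. */
static int32_t tag_int(int32_t x)   { return x * 8 + 1; }
static int32_t untag_int(int32_t t) { return (t - 1) / 8; }

/* Same steps as the listing, written in C over tagged values. */
static int32_t fib_tagged(int32_t x) {
    if (x <= 17)                       /* cmp eax, 17 / jle l0: x <= 2 */
        return 9;                      /* mov eax, 9: tagged 1 */
    int32_t a = fib_tagged(x - 8);     /* sub eax, 8: x - 1 */
    int32_t b = fib_tagged(x - 16);    /* sub eax, 16: x - 2 */
    return a + b - 1;                  /* add, then drop the duplicated tag bit */
}
```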
so yeah, the jit would need to maintain a state as well.
likely the jit compiler would reject any function which does something it
can't understand (forcing it to be interpreted instead).
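one simple way to do that rejection is to scan the bytecode for opcodes the jit does not handle, as in this sketch. the opcode names are borrowed from the comments in the listing, but the enum, the table and `jit_can_compile` are hypothetical, and operands are ignored for simplicity.

```c
#include <stddef.h>

/* Hypothetical opcode set; only those marked in jit_supported can be compiled. */
enum { OP_LOAD, OP_PUSH_1, OP_PUSH_2, OP_CMP_G_FN, OP_JMP_FALSE, OP_MARK,
       OP_DEC_FN, OP_DEC2_FN, OP_CALL_CF, OP_ADD_FN, OP_RET,
       OP_MAKE_CLOSURE, OP_COUNT };

static const unsigned char jit_supported[OP_COUNT] = {
    [OP_LOAD] = 1, [OP_PUSH_1] = 1, [OP_PUSH_2] = 1, [OP_CMP_G_FN] = 1,
    [OP_JMP_FALSE] = 1, [OP_MARK] = 1, [OP_DEC_FN] = 1, [OP_DEC2_FN] = 1,
    [OP_CALL_CF] = 1, [OP_ADD_FN] = 1, [OP_RET] = 1,
    /* OP_MAKE_CLOSURE stays 0: the JIT does not understand it */
};

/* Returns 1 if every opcode is supported; 0 means "keep interpreting". */
static int jit_can_compile(const int *ops, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (ops[i] < 0 || ops[i] >= OP_COUNT || !jit_supported[ops[i]])
            return 0;
    return 1;
}
```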
as can be noted, the calling convention is likely to be a modified form of
the c convention (due to bytecode reasons, the arg order will be reversed,
and tailcalls will be present).
it is as of yet unclear whether ref-counting will be used in jit compiled
functions.
for technical reasons, it may make sense to maintain a unified area for code
(allowing optimizing calls between compiled functions).
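the benefit is that calls between compiled functions can use a plain x86 `call rel32` (opcode E8). its 32-bit displacement is measured from the end of the 5-byte instruction and only reaches ±2GB, which is easy to guarantee within one area. here is a sketch with a static array standing in for what would really be executable memory (e.g. from mmap). `code_alloc` and `emit_call_rel32` are hypothetical helpers.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-in for a shared executable code area (a real JIT would map it executable). */
static uint8_t code_area[4096];
static size_t code_used = 0;

/* Reserves n bytes in the shared area; returns the offset, or -1 if full. */
static long code_alloc(size_t n) {
    if (n > sizeof code_area - code_used) return -1;
    long off = (long)code_used;
    code_used += n;
    return off;
}

/* Writes `call rel32` (E8 + disp32) at offset `at`, targeting offset `target`.
   The displacement is relative to the end of the 5-byte instruction. */
static void emit_call_rel32(size_t at, size_t target) {
    int32_t disp = (int32_t)((long)target - (long)(at + 5));
    code_area[at] = 0xE8;
    for (int i = 0; i < 4; i++)
        code_area[at + 1 + i] = (uint8_t)((uint32_t)disp >> (8 * i));  /* little-endian */
}
```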
of course, thinking of it, I may make some optimizations explicit instead
(those which I had originally intended to make implicit for declared
functions).
final int fib(int x)if(x>2)fib(x-1)+fib(x-2) else 1;
will enable some optimizations (for callers), as final is basically
equivalent to saying that this function will never be modified or overridden
(so things like inlining and direct jumps become possible).
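roughly, the difference for a caller is whether it must look up the binding on every call or may resolve it once. here is a sketch with a hypothetical binding table. `Binding`, `call_dynamic`, `resolve_static` and the sample functions are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical global binding for a named function. */
typedef int (*IntFn)(int);
typedef struct {
    IntFn fn;
    int is_final;  /* declared `final`: the binding never changes */
} Binding;

/* Non-final: fetch the target on every call, since it may be redefined. */
static int call_dynamic(const Binding *b, int x) { return b->fn(x); }

/* Final: the compiler may resolve the target once and call it directly
   (or inline it); returns NULL if the binding is not final. */
static IntFn resolve_static(const Binding *b) { return b->is_final ? b->fn : NULL; }

/* Sample functions for illustration. */
static int twice(int x)  { return 2 * x; }
static int thrice(int x) { return 3 * x; }
```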
or such...