Re: What is the fastest method of parsing scheme?



Adrian wrote:

Allocation, initialization, and deallocation of heap
storage is usually faster in Scheme than in C, C++,
or Java.


But why would that be? In the case of C++, I can think of a few
possibilities:

- Frequent, small allocation and deallocation requests to the
underlying OS heap API
- Calls to object destructors for each small deallocation

These issues can be eliminated by the use of custom memory allocators
or pools. For example, you could construct a pool that preallocated
larger chunks of memory and that only allocates objects of a specific
size (that of the syntax object for example). The pool could also
designed such that the memory chunks are only deallocated as a whole at
the end of parsing without any call to each object's destructor,
resulting in a couple of deallocations rather than tens of thousands.

This is just speculation of course since I've learned that what you
think is intuitively true about optimization and performance often
turns out to be quite different when you actually go and measure
performance.

Any other ideas why Scheme would be faster than C++ and Java for heap
allocation? I thought Java would at least be somewhat equivalent to
Scheme given that they both use garbage collection.

The cost of allocation in Scheme[*] is the cost of incrementing a
pointer plus the cost of overflow checks. Overflow checks are not
necessary at every allocation since any series of conses, or closures,
or any objects of known size require only one check. The overflow
check can almost be aliminated altogether if the operating system
provides means of preserving the state of the system on segfaults.
A protected page is placed on top of the heap and if any allocation is
prefixed with a memory-set to the highest word in the object, then an
overflow can trigger the GC via a signal handler.

For example, in my compiler, the procedure (lambda (x y) (cons x y))
is compiled to (omitting error-checks and irrelevant code):

3: (ppc:addi acr ap 8) ; increment ap by 8, store result in acr
4: (ppc:cmpw 0 acr eap) ; compare acr to eap register
5: (ppc:ble 15) ; if enough room, goto 15
6: instructions for saving state and invoking
14: the garbage collector then falls through
15: (ppc:addi p0 ap 3) ; p0 is the tagged cons
16: (ppc:addi ap ap 8) ; ap is moved up
17: (ppc:stw ac p0 -3) ; store the car
18: (ppc:stw p1 p0 1) ; store the cdr
19: (ppc:blr) ; return

Now if I were really smart, I could've shaved another instruction or
two from this code; but I'm not, so it stays.

So, ignoring the return, we have 7 instructions: 3 for the overflow
check and 4 for the cons. The code has no memory references, only
two memory sets, which is the bare minimum since you have to store
the car and the cdr.

Now I have written a decent amount of high-performance C code for my
garbage collector, and from my experience, C is not easy to optimize.
You just cannot convince a C compiler to place values (like ap) in
a registers across calls that modify it and return the updated ap in
a register. A C compiler keeps on saving it on the stack and passing
a pointer to it, then loading it back from the stack. If you place it
in a global variable, then it loses all chances of being in a register.
So, everything you do allocation-wise will involve many memory saves and
loads that are not needed in the equivalent compiled Scheme code.

I don't know what a Java compiler would do. My impression is that you
cannot allocate a 2-word cons cell in Java. A pair has to be in a class
and must point to its class descriptor so it has to be 3 to 4 words in
size at the minimum. Doubling the memory footprint for something so
commonly used in Scheme will degrade performance (I cannot guess by how
much though).

Aziz,,,

[*] I'm talking about good native-code scheme compilers; not scheme
interpreters, compilers for byte-code interpreters, or JIT compilers
for byte-code objects.
.



Relevant Pages

  • Re: Java slow ?
    ... > not comparing apples to apples. ... The things that mainly make Java performance different from C++ ... than manual dynamic memory management in C++. ... None of those problems are solved with a native compiler, ...
    (comp.lang.java.programmer)
  • Re: Apache Axis C++ on the iSeries???
    ... but Java has only got faster since then. ... A Java-compatible JVM is integrated ... on your point of view) because a JIT compiler (used by ... the GarbageCollector reclaims memory no longer in use automatically. ...
    (comp.sys.ibm.as400.misc)
  • Re: C++ Garbage Collector on VMS?
    ... debugged application needs to have a system level process running in the ... background cleaning up allocated memory and other app cleanup stuff on ... That's just the Java approach. ... the Multics PL/I compiler, ca. 1967. ...
    (comp.os.vms)
  • Re: Compile Time Garbage Collection impossible?
    ... compiler which produces assembly code for a specific target instead of ... Google for region-based memory management. ... Java creates very tangled memory (+ tends to allocate much more than ...
    (comp.compilers)
  • Re: Small complete distribution
    ... initializing the DLL. ... using the .net/com compiler. ... My compiled code does not take up 100 meg of memory. ... more without java. ...
    (comp.soft-sys.matlab)