Re: forth and virtual memory



Thomas Pornin <pornin@xxxxxxxxx> writes:
According to Anton Ertl <anton@xxxxxxxxxxxxxxxxxxxxxxxxxx>:
Writing a file with mmap() is more complex than reading a file with
mmap(); IIRC you have to specify the file length in advance with
truncate() or ftruncate(), so in some sense you do pre-allocate the
complete file (although on many OSs (f)truncate() alone creates a
sparse file, which does not need much space).

That's the problem, actually. On most Unix-like systems, ftruncate()
creates a sparse file, which means that the space is not really
allocated. When the application writes to the pages, the OS will
allocate space for those pages, but if the write are performed somewhat
randomly in the file,

Why should the application do that? And why should it then write
linearly if it does not use mmap(). If the data arrives non-linearly,
I would expect the application to seek in the file and write randomly
even with the write() interface, resulting in just the same
fragmentation (if any).

then this will lead to severe fragmentation on the
filesystem (filesystem allocation heuristics are tuned for low
fragmentation when new files are written linearly).

That depends on the file system. However, I don't know how usual file
systems deal with non-linearly written files. I expect that there
will be not much fragmentation in the sense that the file is
distributed across the disk. But if you read the file linearly, you
might have to perform some seeks; OTOH, if the data was written in
some funny order, I would expect it to be read in some funny order,
too, maybe even the same order, so ordering the blocks by allocation
time might even be a winning strategy.

There's also the effect that writes are delayed, so if the pages are
written fast enough, the file system might not even notice that they
were written in a funny order.

That's a nice dream, but in the real world I have yet to see a case
where that effect outweighs the overhead of copying garbage
collection.

The overhead of copying garbage collection can be very small, even
negative, in particular in systems with little memory.

True, on systems with too little memory (compared to live memory)
copying GC tends to fail quickly rather than thrashing around; it
fails, because there is not enough memory for the to-space.

Either kind of GC does work pretty badly in this situation, though. I
like GC, but if you believe that it works well in systems with little
memory, you are suffering from delusions.

I did write a Java-to-C translator for an embedded system; I programmed
a "mark-compact" GC which was not the fastest GC ever, but it turned out
to be appropriate. It used only a few percents of the computing time,
allowed for very fast allocation (just bump up the pointer), and it
could handle low memory situations satisfactorily (i.e. more than 90% of
available RAM used for live objects, with no extra administrative
pointers between objects, and elementary allocation still worked).

If it used only a few % of execution time in that situation, then the
application did not allocate much compared to whatever else it spent
its execution time on.

A compacting memory manager (GC or not) can be a good design for this
situation (it will fail even later, after even more thrashing:-), but
it's different from a copying collector.

In any case, that's not what you claimed earlier: you claimed that the
copying GC moves objects around to better fit the access patterns,
resulting in more cache-friendly behaviour by the program.

Yes, and it's not known as a champion of high performance.

What Java is known for, and what it actually does, are two distinct
things.

So you claim it is a champion of high performance? But then you also
claim:

According to my tests, the Java JIT compiler is somewhat equivalent (on
x86 32-bit hardware) to what "gcc -O" provides. However, if the code
uses arrays, then performance drops, because Java checks bounds on array
accesses, whereas C does not; hence, it is common to observe
computationaly intensive algorithms which run 2 or 3 times slower in
Java than in C.

Boy, Java takes a factor of 2-3 on array bound checks (after
optimization), when the typical numbers of the array bound checking
overhead I have read are 10% (and that's from papers on optimizing
them, which have every reason to inflate their significance).

Garbage collection, however, is not an issue here. Still according to my
tests, the GC itself uses an almost negligible amount of CPU.

My measurements indicate that some of the benchmarks (from SpecJVM98,
DaCapo) spend a significant amount of time in GC.

The main impediment to Java speed is that Sun engineers have figured out
that performance is overrated (especially the kind of performance which
is measured by microbenchmarks)

And yet they use a compilation strategy that excels at small
benchmarks: On small benchmarks the overhead of running an interpreter
at first and running a sophisticated compiler afterwards is relatively
small, and then they can benefit from the fast code that the
sophisticated compiler produced. On applications that strategy is not
so great.

BTW, if Sun is the impediment to Java speed, then Java should be fast
on the non-Sun JVMs. While I have seen some benchmark results that
indicate that there are faster JVMs than Sun's, the difference is not
big.

and thus concentrate on other areas
(e.g. properly scaling to machines with dozens of processors).

If performance is overrated, why would anyone use such machines?

Anyway, they certainly have to scale to these machines, or Sun won't
be able to sell them. But how is the scaling measured? With
benchmarks again.

BTW, on such big machines a GC that rearranges data for the cache
would be even more beneficial. Does Sun's JVM use such a GC? I think
they would have made big PR about it if they did.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2008: http://www.complang.tuwien.ac.at/anton/euroforth/ef08.html
.



Relevant Pages

  • Re: C++ vs Java "new" (no flame war please!)
    ... of such objects that would fit in available memory. ... memory and much more time than would the JVM for the equivalent Java ... The "allocation" ... If you have a small object with a string, ...
    (comp.lang.java)
  • Re: D gets it right
    ... >releasing memory properly using good old free, ... Java and ... (define (make-adder k) ... stack-based allocation method like that espoused by propoents of RAII, ...
    (comp.programming)
  • Re: A 21st Century Apple II?
    ... if you don't actually *read* the benchmarks and see what ... Java wins comprehensively are the only two that do either memory de- ... Don't forget to use the Intel C++ compiler if you want to compare ...
    (comp.sys.apple2)
  • Re: C++ vs Java "new" (no flame war please!)
    ... such objects that would fit in available memory. ... you have a small object with a string, ... so it would only do one allocation for the Foo object and none for the String. ... Even if one did copy the String, the time overhead of the Java allocation and copy would be much less than for the C++ code you showed. ...
    (comp.lang.java)
  • Re: forth and virtual memory
    ... filesystem (filesystem allocation heuristics are tuned for low ... in particular in systems with little memory. ... This just states that garbage collection, when properly done, is not ... the Java JIT compiler is somewhat equivalent (on ...
    (comp.lang.forth)