Re: Gforth and gcc "progress"
- From: Andrew Haley <andrew29@xxxxxxxxxxxxxxxxxxxxxxx>
- Date: Sun, 24 Jun 2007 15:06:40 -0000
Anton Ertl <anton@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Today I played around with optimizations for Gforth in the context of
various gcc versions. With 0.6.2, the situation was:
gcc-2.95: nice and fast, although a little buggy.
gcc-3.0: slower, thanks to some GCSE misbehaviour
gcc-3.x, x>1 (certainly for 3.3 and 3.4): very slow: they fixed the
GCSE problem, but on the way destroyed gforth's dynamic code
generation and the branch prediction advantage of using threaded
code (PR15242).
gcc-4.x: also very slow: thanks to anal-retentive syntax checking that
was introduced without prior warning, dynamic code generation is
turned off even though it would work in principle.
What syntax does this refer to?
(PR15242 is mostly fixed, with the exception of PR25285, and they
ignore that).
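
For readers unfamiliar with the technique under discussion, here is a minimal sketch of threaded code using GNU C "labels as values" (computed goto), which gcc and clang both accept. It is not Gforth's engine: the opcodes, the `run_threaded` function, and the `NEXT` macro as written here are hypothetical names chosen for illustration. The point it shows is that every primitive ends in its own indirect jump, so the branch predictor can keep separate history per primitive. That advantage is presumably what is lost when the compiler merges those jumps into one shared dispatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a toy VM; not Gforth's primitives. */
enum { OP_LIT, OP_ADD, OP_HALT };
enum { MAX_CODE = 64, MAX_STACK = 16 };

/* Each primitive ends with its own copy of NEXT:
   fetch the next handler address and jump to it indirectly. */
#define NEXT goto **ip++

/* Translates opcodes into handler addresses (direct threading), then
   executes them. The program must end with OP_HALT. */
intptr_t run_threaded(const intptr_t *program, size_t len)
{
    /* GNU C extension: && yields the address of a label. */
    static void *const handlers[] = { &&do_lit, &&do_add, &&do_halt };
    void *code[MAX_CODE];
    intptr_t stack[MAX_STACK];
    intptr_t *sp = stack;        /* points one past the top item */
    void **ip = code;

    assert(len <= MAX_CODE);
    for (size_t i = 0; i < len; i++) {
        intptr_t op = program[i];
        assert(op >= OP_LIT && op <= OP_HALT);
        code[i] = handlers[op];
        if (op == OP_LIT && i + 1 < len) {
            i++;
            code[i] = (void *)program[i];   /* inline literal operand */
        }
    }

    NEXT;

do_lit:
    assert(sp < stack + MAX_STACK);
    *sp++ = (intptr_t)*ip++;
    NEXT;

do_add:
    assert(sp - stack >= 2);
    sp[-2] += sp[-1];
    sp--;
    NEXT;

do_halt:
    return sp > stack ? sp[-1] : 0;
}
```

Calling `run_threaded` on the cells `OP_LIT, 2, OP_LIT, 3, OP_ADD, OP_HALT` leaves the sum of the two literals on top of the stack and returns it.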
So in the meantime we introduced workarounds for PR15242, but
performance suffered for the other compilers, too. As a result, a few
days ago the performance picture looked like this (on a
2.2GHz Athlon 64 X2):
sieve   bubble  matrix  fib
0.248   0.340   0.112   0.388   0.6.9, gcc-2.95.4 --enable-force-reg
0.188   0.292   0.128   0.308   0.6.2, gcc-2.95.1 --enable-force-reg
Well, quite a bit of slowdown compared to 0.6.2. So today I worked on
getting some of the gcc-2.95 speed back and improving the gcc-4.x
speed. The current CVS now has the following speeds (all configured
with --enable-force-reg):
sieve   bubble  matrix  fib
0.208   0.296   0.108   0.328   gcc 2.95.4 20011002 (Debian prerelease)
0.264   0.344   0.120   0.360   gcc 3.4.6 (Debian 3.4.6-5)
0.384   0.432   0.296   0.520   gcc 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
Nice progress by the gcc maintainers, eh?-(
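
To put numbers on the comparison, the following small sketch computes relative slowdowns from the times in the current-CVS table. The arrays copy the gcc 2.95.4 and gcc 4.1.2 rows from that table; the `slowdown` helper and the array names are hypothetical, introduced only for this illustration. By this arithmetic the gcc 4.1.2 build takes roughly 1.5 to 2.7 times as long as the gcc 2.95.4 build, depending on the benchmark.

```c
#include <assert.h>

/* Benchmark order: sieve, bubble, matrix, fib.
   Values copied from the current-CVS rows of the table. */
const double t_gcc295[4] = { 0.208, 0.296, 0.108, 0.328 };
const double t_gcc412[4] = { 0.384, 0.432, 0.296, 0.520 };

/* Ratio of a measured time to a baseline time; values above 1.0
   mean the measured build is slower than the baseline. */
double slowdown(double t, double baseline)
{
    assert(baseline > 0.0);
    return t / baseline;
}
```

For example, `slowdown(t_gcc412[2], t_gcc295[2])` gives the matrix ratio, which is the largest of the four.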
The slowdown between 2.95 and 3.4 can be explained with PR15242; our
workaround helps a lot, but some slowness cannot be worked around (in
particular, not the part that I got back for 2.x and 4.x today).
The slowdown of gcc-4.1 seems to come from bad register allocation and
a failure of copy propagation. I actually had to reduce what
--enable-force-reg does on this compiler; otherwise the compiler would
either fail to compile the code or produce wrong code. As an example
of the low quality of the resulting code, consider this:
0.6.9, gcc 4.1      0.6.9, gcc 2.95.4   0.6.2, gcc 2.95.1   optimal on K7,K8
Code +              Code +              Code +              Code +
mov edi, 21C [esp]  mov eax, 4 [esi]    mov eax, 4 [esi]    add ecx, 4 [esi]
mov edx, ebp        add esi, # 4        add esi, # 4        add ebx, # 4
add ebx, # 4        add ecx, eax        add ebx, # 4        add esi, # 4
mov ecx, 4 [edi]    add ebx, # 4        add ecx, eax        jmp -4 [ebx]
add edi, # 4        mov eax, -4 [ebx]   jmp -4 [ebx]        end-code
add edx, ecx        jmp eax             end-code
mov 21C [esp], edi  end-code
mov ebp, edx
mov esi, -4 [ebx]
mov eax, esi
jmp eax
end-code
The difference between 0.6.9 and 0.6.2 on gcc-2.95 is due to a
workaround for PR15242 that I have not (yet?) made
gcc-version-specific.
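
As a reading aid for the listing of `+` above, here is a C-level sketch of what the four-instruction "optimal" column appears to do. The register roles are an inference from the listing, not something stated in the post: ecx looks like the cached top of stack, esi the data stack pointer, and ebx the instruction pointer. The `VmState` struct and `prim_plus` function are hypothetical. A real engine would keep these values in local variables and jump directly instead of returning the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Hypothetical VM state; the register mapping is inferred from the listing. */
typedef struct {
    Cell tos;    /* cached top of stack  (ecx in the gcc 2.95 code) */
    Cell *sp;    /* data stack pointer   (esi); sp[1] is the second item */
    void **ip;   /* instruction pointer  (ebx) */
} VmState;

/* Forth "+" with the top of stack cached in a register.
   Returns the address the final indirect jump would go to. */
void *prim_plus(VmState *vm)
{
    assert(vm != NULL);
    vm->tos += vm->sp[1];   /* add ecx, 4 [esi] */
    vm->ip += 1;            /* add ebx, # 4     */
    vm->sp += 1;            /* add esi, # 4     */
    return vm->ip[-1];      /* jmp -4 [ebx]     */
}
```

Read against this sketch, the gcc 4.1 column does the same work but reloads and stores the stack pointer through `21C [esp]` and moves the top of stack through an extra register, which matches the register allocation and copy propagation complaint.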
There seem to be a lot of different issues here, and it's quite hard
for me to disentangle them. Do you mean that with gcc 4.1, you can't
force all the Forth system pointers you need into registers, because
the compiler runs out of registers, but you could do this with earlier
compilers?
Andrew.