Re: Gforth and gcc "progress"



anton@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl) writes:
The slowdown with gcc-4.1 seems to come from bad register allocation and
a failure of copy propagation. I actually had to reduce what
--enable-force-reg does with this compiler; otherwise the compiler would
either fail to compile or produce wrong code. As an example of the low
quality of the resulting code, consider this:

0.6.9, gcc 4.1       0.6.9, gcc 2.95.4    0.6.2, gcc 2.95.1    optimal on K7,K8
Code +               Code +               Code +               Code +
mov edi, 21C [esp]   mov eax, 4 [esi]     mov eax, 4 [esi]     add ecx, 4 [esi]
mov edx, ebp         add esi, # 4         add esi, # 4         add ebx, # 4
add ebx, # 4         add ecx, eax         add ebx, # 4         add esi, # 4
mov ecx, 4 [edi]     add ebx, # 4         add ecx, eax         jmp -4 [ebx]
add edi, # 4         mov eax, -4 [ebx]    jmp -4 [ebx]         end-code
add edx, ecx         jmp eax              end-code
mov 21C [esp], edi   end-code
mov ebp, edx
mov esi, -4 [ebx]
mov eax, esi
jmp eax
end-code

Ok, in order to get something better for this, I turned off caching
the TOS by default, resulting in the following +:

Code +
( $804C502 ) mov esi , dword ptr 8 [ebp] \ $8B $75 $8
( $804C505 ) mov eax , dword ptr 4 [ebp] \ $8B $45 $4
( $804C508 ) add esi , eax \ $1 $C6
( $804C50A ) add ebx , # 4 \ $83 $C3 $4
( $804C50D ) mov dword ptr 8 [ebp] , esi \ $89 $75 $8
( $804C510 ) add ebp , # 4 \ $83 $C5 $4
( $804C513 ) mov esi , dword ptr FC [ebx] \ $8B $73 $FC
( $804C516 ) mov edx , esi \ $89 $F2
( $804C518 ) jmp 804BAF6 \ $E9 $D9 $F5 $FF $FF
end-code
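
For readers who have not met TOS caching before, here is a minimal sketch of the idea in Python. It is only a model that counts memory accesses, not Gforth's engine (which is written in C); the names DataStack, plus_uncached and plus_cached are made up for this illustration.

```python
class DataStack:
    """In-memory part of a Forth data stack that counts its memory traffic."""

    def __init__(self, items):
        self.cells = list(items)  # last element is the topmost in-memory cell
        self.loads = 0
        self.stores = 0

    def load(self, depth):
        """Read the cell `depth` positions below the in-memory top."""
        self.loads += 1
        return self.cells[-1 - depth]

    def store(self, depth, value):
        """Overwrite the cell `depth` positions below the in-memory top."""
        self.stores += 1
        self.cells[-1 - depth] = value

    def drop(self):
        """Adjust the stack pointer only; no memory access is counted."""
        self.cells.pop()


def plus_uncached(stack):
    """+ with the whole stack in memory: two loads and one store."""
    a = stack.load(0)
    b = stack.load(1)
    stack.store(1, a + b)
    stack.drop()


def plus_cached(stack, tos):
    """+ with the top of stack held in a 'register': one load, no store."""
    second = stack.load(0)
    stack.drop()
    return second + tos


# 3 4 + with everything in memory
uncached = DataStack([3, 4])
plus_uncached(uncached)

# 3 4 + with the 4 living in the TOS "register"
cached = DataStack([3])
tos = plus_cached(cached, 4)
```

The uncached variant corresponds to the two loads and one store in the listing above; the cached variant corresponds to the single memory operand in the "optimal" sequence. The model says nothing about which variant gcc compiles better, which is the actual problem here.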

That looks better, but the timings are slower (see below). On a hunch I
checked ?BRANCH, and found:

Code ?branch
( $804BCDD ) add ebp , # 4 \ $83 $C5 $4
( $804BCE0 ) mov eax , dword ptr [ebx] \ $8B $3
( $804BCE2 ) mov edi , dword ptr 0 [ebp] \ $8B $7D $0
( $804BCE5 ) test edi , edi \ $85 $FF
( $804BCE7 ) jne 804BCF5 \ $75 $C
( $804BCE9 ) mov edx , dword ptr [eax] \ $8B $10
( $804BCEB ) lea ebx , dword ptr 4 [eax] \ $8D $58 $4
( $804BCEE ) mov esi , edx \ $89 $D6
( $804BCF0 ) jmp 804BAF6 \ $E9 $1 $FE $FF $FF
( $804BCF5 ) add ebx , # 8 \ $83 $C3 $8
( $804BCF8 ) mov esi , dword ptr FC [ebx] \ $8B $73 $FC
( $804BCFB ) mov edx , esi \ $89 $F2
( $804BCFD ) jmp 804BAF6 \ $E9 $F4 $FD $FF $FF
end-code

Yes, PR25285 strikes here (instead of the "jmp 804BAF6", a better
compiler would write "jmp esi"); that's the one the gcc maintainers
prefer to ignore. Ok, turn on the workaround for that (that's what
causes the slowdown between 2.95 and 3.4), and we get some mixed
results:

sieve  bubble  matrix  fib
0.208  0.296   0.108   0.328  gcc 2.95.4 20011002 (Debian prerelease)
0.264  0.344   0.120   0.360  gcc 3.4.6 (Debian 3.4.6-5)
0.384  0.432   0.296   0.520  gcc 4.1.2 (default configuration)
0.476  0.748   0.280   0.476  gcc 4.1.2 STACK_CACHE_DEFAULT_FAST=0
0.364  0.524   0.288   0.472  gcc 4.1.2 STACK_CACHE_DEFAULT_FAST=0 condbranch_opt=0

condbranch_opt=0 is one of the workarounds for PR15242 and PR25285.
STACK_CACHE_DEFAULT_FAST=0 turns off caching the TOS by default.
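
To make the table easier to read, the timings can be turned into slowdown factors relative to gcc 2.95.4. The following Python sketch does just that; the numbers are copied from the table above, while the shortened configuration labels and the helper name slowdowns are chosen here for illustration.

```python
BENCHMARKS = ("sieve", "bubble", "matrix", "fib")

# Timings copied from the table above (lower is better).
TIMES = {
    "gcc 2.95.4": (0.208, 0.296, 0.108, 0.328),
    "gcc 3.4.6": (0.264, 0.344, 0.120, 0.360),
    "gcc 4.1.2 default": (0.384, 0.432, 0.296, 0.520),
    "gcc 4.1.2 STACK_CACHE_DEFAULT_FAST=0": (0.476, 0.748, 0.280, 0.476),
    "gcc 4.1.2 STACK_CACHE_DEFAULT_FAST=0 condbranch_opt=0":
        (0.364, 0.524, 0.288, 0.472),
}
BASELINE = "gcc 2.95.4"


def slowdowns(times, baseline):
    """Divide every timing by the baseline timing of the same benchmark."""
    base = times[baseline]
    return {
        name: tuple(t / b for t, b in zip(row, base))
        for name, row in times.items()
    }


if __name__ == "__main__":
    print("  ".join(f"{b:>6}" for b in BENCHMARKS))
    for name, factors in slowdowns(TIMES, BASELINE).items():
        print("  ".join(f"{f:6.2f}" for f in factors), "", name)
```

With these numbers every gcc 3.4 and 4.1 configuration is slower than gcc 2.95.4 on all four benchmarks; the worst case for the default gcc 4.1.2 build is matrix, at roughly 2.7 times the gcc 2.95.4 time.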

In conclusion, no matter what we do, gcc-4.1 sucks. I have heard that
gcc-4.2.0 is similar (it does not build with 32-bit support on the
Debian boxes I have here, so I cannot check this myself).

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2007: http://www.complang.tuwien.ac.at/anton/euroforth2007/