Re: Gforth and gcc "progress"
- From: anton@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl)
- Date: Sun, 24 Jun 2007 16:30:39 GMT
anton@xxxxxxxxxxxxxxxxxxxxxxxxxx (Anton Ertl) writes:
The slowdown with gcc-4.1 seems to come from bad register allocation and
a failure of copy propagation. I actually had to reduce what
--enable-force-reg does with this compiler; otherwise gcc would either
fail to compile the engine or produce wrong code. As an example of the
low quality of the resulting code, consider the + primitive:
Gforth 0.6.9, gcc 4.1:
Code +
mov edi, 21C [esp]
mov edx, ebp
add ebx, # 4
mov ecx, 4 [edi]
add edi, # 4
add edx, ecx
mov 21C [esp], edi
mov ebp, edx
mov esi, -4 [ebx]
mov eax, esi
jmp eax
end-code

Gforth 0.6.9, gcc 2.95.4:
Code +
mov eax, 4 [esi]
add esi, # 4
add ecx, eax
add ebx, # 4
mov eax, -4 [ebx]
jmp eax
end-code

Gforth 0.6.2, gcc 2.95.1:
Code +
mov eax, 4 [esi]
add esi, # 4
add ebx, # 4
add ecx, eax
jmp -4 [ebx]
end-code

Optimal on K7, K8:
Code +
add ecx, 4 [esi]
add ebx, # 4
add esi, # 4
jmp -4 [ebx]
end-code
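To make the register roles concrete: in the "optimal on K7,K8" version, ebx appears to be the threaded-code instruction pointer, esi the data-stack pointer and ecx the cached top of stack (TOS). Below is a minimal C sketch of a direct-threaded inner interpreter with a cached TOS. It uses the GNU C labels-as-values extension (`&&label`, `goto *`), which gcc and clang accept. This is not Gforth's source. `run_sum`, `Cell` and the tiny `lit`/`plus`/`halt` instruction set are hypothetical, and the register names in the comments are only a reading of that listing. The `plus` body is the C counterpart of the four optimal instructions: an add with a memory operand, a stack-pointer increment, an instruction-pointer increment and an indirect jump.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;              /* one stack cell, wide enough for a pointer */

#define STACK_CELLS 16

/* Hypothetical sketch: runs the threaded code  lit a  lit b  plus  halt
 * and returns the result. Needs GNU C labels-as-values (gcc, clang). */
static Cell run_sum(Cell a, Cell b)
{
    Cell stack[STACK_CELLS];
    Cell *sp = stack + STACK_CELLS; /* data stack, grows downward (esi) */
    Cell tos = 0;                   /* cached top of stack (ecx); dummy at start */
    void *prog[] = {                /* each cell: a code address or a literal */
        &&lit, (void *)a,
        &&lit, (void *)b,
        &&plus,
        &&halt,
    };
    void **ip = prog;               /* threaded-code instruction pointer (ebx) */

/* NEXT: advance ip and jump to the code address it pointed to. */
#define NEXT goto **ip++

    NEXT;

lit:                                /* ( -- n ) push the inline literal */
    assert(sp > stack);             /* no data-stack overflow */
    *--sp = tos;                    /* spill the old TOS to memory */
    tos = (Cell)*ip++;              /* the literal becomes the cached TOS */
    NEXT;

plus:                               /* ( n1 n2 -- n3 ) */
    tos += *sp++;                   /* add the second item from memory, pop it */
    NEXT;

halt:
    return tos;
#undef NEXT
}
```

For example, `run_sum(2, 3)` executes `lit 2 lit 3 plus halt` and returns 5.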
Ok, to get something better here, I turned off caching the top of stack
(TOS) in a register by default, which gives the following +:
Code +
( $804C502 ) mov esi , dword ptr 8 [ebp] \ $8B $75 $8
( $804C505 ) mov eax , dword ptr 4 [ebp] \ $8B $45 $4
( $804C508 ) add esi , eax \ $1 $C6
( $804C50A ) add ebx , # 4 \ $83 $C3 $4
( $804C50D ) mov dword ptr 8 [ebp] , esi \ $89 $75 $8
( $804C510 ) add ebp , # 4 \ $83 $C5 $4
( $804C513 ) mov esi , dword ptr FC [ebx] \ $8B $73 $FC
( $804C516 ) mov edx , esi \ $89 $F2
( $804C518 ) jmp 804BAF6 \ $E9 $D9 $F5 $FF $FF
end-code
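The cost of turning the cache off can be seen in the stack effect of a single +. This is a minimal sketch that assumes a downward-growing stack whose topmost in-memory cell is `sp[0]`. That is a simplification: the listing addresses the two items at 4 [ebp] and 8 [ebp]. The function names are hypothetical. Without the cache, + loads both operands and stores the sum back, like the two loads and the store through ebp above. With the cache, one operand is already in a register, so a single load suffices and nothing is stored.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Hypothetical sketch of + ( n1 n2 -- n3 ). The stack grows downward and
 * sp[0] is the topmost cell held in memory. */

/* TOS not cached: n2 = sp[0] and n1 = sp[1] are both in memory.
 * Two loads, one add, one store, plus the stack-pointer update. */
static Cell *plus_uncached(Cell *sp)
{
    sp[1] += sp[0];                 /* n3 = n1 + n2, overwrites n1 */
    return sp + 1;                  /* pop one cell; sp[0] is now n3 */
}

/* TOS cached: n2 arrives in tos (a register in a real engine) and
 * n1 = (*spp)[0] is in memory. One load, no store. */
static Cell plus_cached(Cell tos, Cell **spp)
{
    tos += **spp;                   /* n3 = n1 + n2 */
    *spp += 1;                      /* pop n1 off the in-memory stack */
    return tos;                     /* n3 stays cached */
}
```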
That + looks better than the TOS-caching one from gcc 4.1, but the run
times are slower (see below). On a hunch, I checked ?BRANCH and found:
Code ?branch
( $804BCDD ) add ebp , # 4 \ $83 $C5 $4
( $804BCE0 ) mov eax , dword ptr [ebx] \ $8B $3
( $804BCE2 ) mov edi , dword ptr 0 [ebp] \ $8B $7D $0
( $804BCE5 ) test edi , edi \ $85 $FF
( $804BCE7 ) jne 804BCF5 \ $75 $C
( $804BCE9 ) mov edx , dword ptr [eax] \ $8B $10
( $804BCEB ) lea ebx , dword ptr 4 [eax] \ $8D $58 $4
( $804BCEE ) mov esi , edx \ $89 $D6
( $804BCF0 ) jmp 804BAF6 \ $E9 $1 $FE $FF $FF
( $804BCF5 ) add ebx , # 8 \ $83 $C3 $8
( $804BCF8 ) mov esi , dword ptr FC [ebx] \ $8B $73 $FC
( $804BCFB ) mov edx , esi \ $89 $F2
( $804BCFD ) jmp 804BAF6 \ $E9 $F4 $FD $FF $FF
end-code
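What ?branch has to do can be read off the two paths in that listing. It pops the flag. If the flag is zero, execution continues at the target address stored in the cell after the ?branch; otherwise that cell is skipped. Below is a minimal sketch in GNU C labels-as-values style with the TOS not cached, as in the listing. `run_if` and the `lit`/`halt` primitives are hypothetical. Storing the branch target as an absolute cell address is an assumption of the sketch, suggested by the `mov eax , dword ptr [ebx]` / `lea ebx , dword ptr 4 [eax]` sequence.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Hypothetical sketch: threaded code for
 *   lit flag  ?branch L  lit 1  halt  L: lit 2  halt
 * with the TOS not cached. Needs GNU C labels-as-values. */
static Cell run_if(Cell flag)
{
    Cell stack[8];
    Cell *sp = stack + 8;           /* empty data stack, grows downward */
    void *prog[10] = {
        &&lit, (void *)flag,        /* [0] [1]  push flag */
        &&qbranch, (void *)0,       /* [2] [3]  ?branch, target set below */
        &&lit, (void *)(Cell)1,     /* [4] [5]  fall-through path */
        &&halt,                     /* [6] */
        &&lit, (void *)(Cell)2,     /* [7] [8]  branch target L */
        &&halt,                     /* [9] */
    };
    void **ip = prog;
    Cell f;

    prog[3] = &prog[7];             /* inline target: address of cell [7] */

#define NEXT goto **ip++
    NEXT;

lit:                                /* ( -- n ) */
    assert(sp > stack);
    *--sp = (Cell)*ip++;
    NEXT;

qbranch:                            /* ( flag -- ) */
    f = *sp++;                      /* pop the flag from memory */
    if (f == 0)
        ip = (void **)*ip;          /* taken: continue at the inline target */
    else
        ip++;                       /* not taken: skip the target cell */
    NEXT;

halt:
    return *sp;
#undef NEXT
}
```

`run_if(0)` takes the branch and returns 2; any nonzero flag falls through and returns 1.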
Yes, PR25285 strikes in ?branch: instead of "jmp 804BAF6", a direct
jump to a shared indirect jump, a better compiler would write "jmp esi".
That's the bug the gcc maintainers prefer to ignore. Ok, turning on the
workaround for it (which is what causes the slowdown between 2.95 and
3.4) gives mixed results:
sieve  bubble matrix fib    compiler and configuration
0.208  0.296  0.108  0.328  gcc 2.95.4 20011002 (Debian prerelease)
0.264  0.344  0.120  0.360  gcc 3.4.6 (Debian 3.4.6-5)
0.384  0.432  0.296  0.520  gcc 4.1.2 (default configuration)
0.476  0.748  0.280  0.476  gcc 4.1.2, STACK_CACHE_DEFAULT_FAST=0
0.364  0.524  0.288  0.472  gcc 4.1.2, STACK_CACHE_DEFAULT_FAST=0 condbranch_opt=0
condbranch_opt=0 is one of the workarounds for PR15242 and PR25285.
STACK_CACHE_DEFAULT_FAST=0 turns off caching the TOS by default.
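PR25285 concerns the indirect jump (`goto *` in GNU C) at the end of each primitive. The sketch below shows, at the C level, the two code shapes being discussed:

- In `sum3_replicated`, every primitive ends in its own indirect jump, like the "jmp esi" a better compiler would emit.
- In `sum3_shared`, every primitive loads the next code address and then jumps directly to one common indirect jump, like the "jmp 804BAF6" in the ?branch listing.

Both compute the same result. The functions are hypothetical. gcc's optimizer chooses the final machine-code shape itself, so writing the C one way does not guarantee the corresponding machine code.

```c
#include <assert.h>
#include <stdint.h>

typedef intptr_t Cell;

/* Both functions run the threaded code  lit a  lit b  lit c  plus  plus  halt
 * (TOS not cached) and return a+b+c. Needs GNU C labels-as-values. */

/* Replicated dispatch: every primitive ends in its own indirect jump. */
static Cell sum3_replicated(Cell a, Cell b, Cell c)
{
    Cell stack[8], *sp = stack + 8;
    void *prog[] = { &&lit, (void *)a, &&lit, (void *)b, &&lit, (void *)c,
                     &&plus, &&plus, &&halt };
    void **ip = prog;

    goto **ip++;
lit:
    assert(sp > stack);
    *--sp = (Cell)*ip++;
    goto **ip++;                    /* this primitive's own indirect jump */
plus:
    sp[1] += sp[0];
    sp++;
    goto **ip++;                    /* this primitive's own indirect jump */
halt:
    return *sp;
}

/* Shared dispatch: each primitive fetches the next code address, then
 * jumps directly to one common indirect jump. */
static Cell sum3_shared(Cell a, Cell b, Cell c)
{
    Cell stack[8], *sp = stack + 8;
    void *prog[] = { &&lit, (void *)a, &&lit, (void *)b, &&lit, (void *)c,
                     &&plus, &&plus, &&halt };
    void **ip = prog;
    void *next = *ip++;

    goto dispatch;
lit:
    assert(sp > stack);
    *--sp = (Cell)*ip++;
    next = *ip++;
    goto dispatch;                  /* direct jump to the shared block */
plus:
    sp[1] += sp[0];
    sp++;
    next = *ip++;
    goto dispatch;                  /* direct jump to the shared block */
halt:
    return *sp;
dispatch:
    goto *next;                     /* the only indirect jump */
}
```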
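In conclusion, no matter what we do, gcc-4.1 sucks: even its fastest
configuration for each benchmark takes about 1.4 to 2.6 times as long
as gcc 2.95.4. I have heard that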
gcc-4.2.0 is similar (it does not build with 32-bit support on the
Debian boxes I have here, so I cannot check this myself).
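The conclusion can be checked against the table. For each benchmark, take the fastest of the three gcc 4.1.2 rows and divide it by the gcc 2.95.4 time. Below is a minimal C sketch with the numbers copied from the table; the names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Run times from the table, in benchmark order sieve, bubble, matrix, fib. */
enum { N_BENCH = 4, N_CFG = 3 };
static const double gcc295[N_BENCH] = { 0.208, 0.296, 0.108, 0.328 };
/* gcc 4.1.2 rows: default; STACK_CACHE_DEFAULT_FAST=0;
 * STACK_CACHE_DEFAULT_FAST=0 condbranch_opt=0. */
static const double gcc412[N_CFG][N_BENCH] = {
    { 0.384, 0.432, 0.296, 0.520 },
    { 0.476, 0.748, 0.280, 0.476 },
    { 0.364, 0.524, 0.288, 0.472 },
};

/* Hypothetical helper: time of the fastest gcc 4.1.2 configuration for
 * benchmark b divided by the gcc 2.95.4 time (> 1 means slower). */
static double best_slowdown(size_t b)
{
    assert(b < N_BENCH);
    double best = gcc412[0][b];
    for (size_t cfg = 1; cfg < N_CFG; cfg++)
        if (gcc412[cfg][b] < best)
            best = gcc412[cfg][b];
    return best / gcc295[b];
}
```

For sieve, for example, the best gcc 4.1.2 time is 0.364, and 0.364 / 0.208 = 1.75.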
- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2007: http://www.complang.tuwien.ac.at/anton/euroforth2007/