Re: Code density and performance?



"Nick Maclaren" <nmm1@xxxxxxxxxxxxx> wrote in message
news:db8tae$cl5$1@xxxxxxxxxxxxxxxxxxxxxxx
> In article <wra8y08e708.fsf@xxxxxxxxxxxxxxxxxxxxx>,
> Christoph Breitkopf <chris@xxxxxxxxxxxxxxxx> wrote:
>>
>>(http://research.compaq.com/wrl/projects/Database/isca98_1.pdf)
>>
>>reports rather bad instruction cache behavior for an Oracle
>>OLTP application. Since then, software has grown, and, thanks
>>to object-orientation, code that once was one big chunk is now
>>spread into methods all over the code space. Still, I am not
>>knowledgeable enough about caches to judge if this is really
>>a factor in overall performance.
>
> A quick glance at that gives a strong sense of deja vu - that sort
> of analysis dates from the 1960s and 1970s, and nothing has changed
> except for the scale. Your point was made then, too, as a reason
> that the then trendy 'structured programming' (with every little
> basic operation made into a single function) was a performance
> problem.
>
> Then, as now, the effect of Dcache has 'steps' in it, but that of
> Icache is more like doubling the cache size reduces the miss rate
> by 30%. One has to really use a HUGE cache to do anything useful
> about a 19% miss rate and, conversely, one has to compress code
> beyond reason if you use that approach.
>
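
To put rough numbers on that rule of thumb, here is a small sketch
of the arithmetic. It assumes every doubling of the cache removes the
same fixed fraction of the remaining misses, which is only the
approximation quoted above, not a measured model. The function name
doublings_needed and its parameters are made up for illustration.

```cpp
// Number of cache-size doublings needed to bring `miss_rate` down to
// `target`, ASSUMING each doubling removes a fixed fraction of the
// remaining misses (the rule of thumb quoted above: about 30%).
// Returns -1 if the inputs can never converge.
int doublings_needed(double miss_rate, double target,
                     double reduction_per_doubling)
{
    if (reduction_per_doubling <= 0.0 || reduction_per_doubling >= 1.0 ||
        target <= 0.0) {
        return -1;
    }
    int doublings = 0;
    while (miss_rate > target) {
        miss_rate *= (1.0 - reduction_per_doubling);  // one more doubling
        ++doublings;
    }
    return doublings;
}
```

With the figures from this thread, doublings_needed(0.19, 0.05, 0.30)
gives 4: the miss rate goes 19% -> 13.3% -> 9.3% -> 6.5% -> 4.6%, so
getting under 5% takes a cache 16 times larger.
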
> One solution adopted to the 'method' problem (which is the same as
> using a large, common set of basic operations) was and is inlining,
> but that leads to bloat. A better one was tools for code ordering
> (which sometimes allow replication), though that is closely related
> to overlay management and suffered the same fate. These approaches
> have the potential to tackle even 19% miss rates.
>
> Potential. They aren't easy to design or use, and are very out of
> fashion. One thing that most vendors seem to have missed is that
> they are a natural for profile-based optimisation, which some of us
> were doing in the 1970s for overlay designs. That could and should
> be reinvented. It shouldn't be patented, but probably will be.

That's exactly what Visual C++ (and, I presume, other commercial
compilers) does when compiling applications with POGO
(profile-guided optimization):

* The compiler decides which functions should be optimized for speed
and which for space; usually only ~5-15% of all functions are
optimized for speed.
* The compiler makes inlining decisions based on profile results,
not on the usual heuristics. That means that we inline
more aggressively in the very hot functions, less aggressively
in the "normal" functions, and only when it saves size in the
cold functions.
* The compiler can often speculatively inline virtual functions,
thus partially fixing the OOP-style problems.
* The compiler uses profile results to lay out functions.
* The compiler uses profile results to lay out code inside functions.

(I listed only the optimizations relevant to the current discussion;
many more optimizations benefit from profiling data.)
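
As an illustration of the speculative inlining of virtual functions
mentioned in the list, here is a hand-written sketch of the idea. It
is not what any particular compiler emits: a real compiler would
typically compare the vtable pointer, and typeid is used here only as
a source-level stand-in for that check. The types Shape, Square and
Circle and both functions are hypothetical, and the sketch assumes
the profile showed that most calls see a Square.

```cpp
#include <typeinfo>

struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square final : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

struct Circle final : Shape {
    explicit Circle(double r) : radius(r) {}
    double area() const override { return 3.141592653589793 * radius * radius; }
    double radius;
};

// What the programmer wrote: an indirect call the compiler
// normally cannot inline.
double area_plain(const Shape& s)
{
    return s.area();
}

// Roughly what speculative inlining amounts to when the profile says
// the receiver is usually a Square: guard on the dynamic type, run the
// inlined body on the hot path, keep the virtual call as the fallback.
double area_speculated(const Shape& s)
{
    if (typeid(s) == typeid(Square)) {
        const Square& sq = static_cast<const Square&>(s);
        return sq.side * sq.side;  // inlined body of Square::area
    }
    return s.area();               // cold path: ordinary virtual call
}
```

Both functions return the same result for every Shape; only the hot
path differs, which is why a wrong guess costs one extra compare
rather than correctness.
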

Large server applications benefit from POGO much more
than SPEC benchmarks do. SQL Server gains more than 30%.
You can dig that number out of the official MS document at
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dv_vstechart/html/profileguidedoptimization.asp

Thanks,
Eugene

>
> Regards,
> Nick Maclaren.

