Re: Implementation of pop count instruction



"Nick Maclaren" <nmm1@xxxxxxxxxxxxx> wrote in message
news:dhb5u7$8oh$1@xxxxxxxxxxxxxxxxxxxxxxx
>
> In article <43391010@xxxxxxxxxxxxx>,
> "Dan Koren" <dankoren@xxxxxxxxx> writes:
> |> > "Laurent Desnogues" <l-desnogues_delme@xxxxxx> wrote in message
> |> > news:dhav2r$lrf$1@xxxxxxxxxxxxxxxxxx
> |> >>
> |> >> You really should take results measured in tight loops with
> |> >> a grain of salt when using table lookups. It often happens
> |> >> that when your code is put in real code (as opposed to benchmark
> |> >> loop code) the result is very different due to cache thrashing,
> |> >> even when the table is small.
> |> >
> |> > Hhmmm.... Do you realize the
> |> > entire table is 256 bytes and
> |> > fits completely in one or two
> |> > cache lines?
> |>
> |> Not to mention that this code is
> |> executed so frequently by the
> |> library routines that rely on
> |> it that it stays in the cache.
>
> Sigh. I can confirm Laurent's comment from actual measurements.

"Actual measurements" of my code ?!?

> You have missed the point.

Not any more than you did.

> The effect of introducing such a table is effectively to reduce
> the associativity by one for the range of lines covered by the
> table. This can be harmless, or can cause meltdown. It will
> fairly rarely cause major problems with 4-way associativity, but
> will quite commonly do so with 2-way and very commonly with
> direct mapped.
>
> And God help you if you have similar problems with the TLB :-(


God help HP ;-)
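
For anyone following along, here is a minimal sketch of the kind of
256-byte table lookup under discussion, plus two small helpers for
reasoning about its cache footprint. This is not the code that was
measured in this thread. All names (bits_in_byte, popcount32_table,
lines_spanned, set_index) and the cache geometry passed to the helpers
are assumptions chosen for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 256-entry table: bits_in_byte[i] = number of set bits in i.
   Entry 0 is already 0 because the array has static storage. */
static uint8_t bits_in_byte[256];

/* Fill the table once before first use. */
static void init_popcount_table(void)
{
    for (int i = 1; i < 256; i++) {
        /* Entry i/2 is already filled; add the low bit of i. */
        bits_in_byte[i] = (uint8_t)(bits_in_byte[i >> 1] + (i & 1));
    }
}

/* Population count of a 32-bit word using four table lookups. */
static unsigned popcount32_table(uint32_t x)
{
    return (unsigned)bits_in_byte[x & 0xff]
         + (unsigned)bits_in_byte[(x >> 8) & 0xff]
         + (unsigned)bits_in_byte[(x >> 16) & 0xff]
         + (unsigned)bits_in_byte[x >> 24];
}

/* Number of cache lines touched by `size` bytes starting at `addr`.
   The line size is an assumed parameter, not a property of any
   particular machine. */
static size_t lines_spanned(uintptr_t addr, size_t size, size_t line_size)
{
    uintptr_t first = addr / line_size;
    uintptr_t last = (addr + size - 1) / line_size;
    return (size_t)(last - first + 1);
}

/* Set index of an address in a simple physically indexed,
   set-associative cache (assumed model: line number modulo set count). */
static size_t set_index(uintptr_t addr, size_t line_size, size_t num_sets)
{
    return (size_t)((addr / line_size) % num_sets);
}
```

Under these assumptions, a line-aligned 256-byte table spans two lines
with 128-byte lines and four with 64-byte lines, and any other data
whose addresses give the same set_index as a table line competes with
the table for the ways of that set.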



dk

