Re: Complex sub-block (bank) indexing



On 9/13/2010 2:47 PM, Paul A. Clayton wrote:
Given the use of banking to support multiple concurrent accesses,
would it be possible to decrease conflicts by not having every address
and way use simple bit-field-selection-based indexing?

Yes. Although in my terminology, it is the set index and the tag that is typically determined using bitfield selection.

Non-power-of-2 moduli have been extensively studied, e.g.

Using Prime Numbers for Cache Indexing to Eliminate Conflict Misses
by M Kharbutli - 2004 - Cited by 49 - Related articles
Mazen Kharbutli , Yan Solihin , Jaejin Lee, Eliminating Conflict Misses Using Prime Number-Based Cache Indexing, IEEE Transactions on Computers, v.54 n.5,

I believe it was well-known before then, the novelty perhaps being applying it to an L2 or L3.

Prime indexing an L1$ is probably too slow. Prime indexing an L2 may be able to hide the moduli. Main memory prime interleaving and banking is well known; e.g. IIRC Matrix used 17 way banking. We have discussed circuits for such moduli on comp.arch before.

There are other techniques. E.g. Alliant used a skewing function in accessing the cache shared between its parallel processors in the 80s/90s.

XORing a few upper bits into the set index is also quite common.

KEY: you probably need to have the same cache line size everywhere, so prime cache line sizes are a stretch too far given compatibility concerns. (They're a good idea; you just have to have control of more of the system than is typical.)

Oooo.... bitmask coherency, one of my latest hobbyhorses, has less need of having commensurate cache line sizes everywhere. Sure, you can use any mixture of line sizes in any WB cache system, at the cost of discarding partial lines and/or having to do RFOs, and more complicated HITM dirty probe response. But with bitmask coherency many of these problems go away. Discarding partial lines is now more of a performance optimizatin than a correctness issue.





.



Relevant Pages

  • Re: Examining the VM splay tree effectiveness
    ... Making general comparative statements on indexing methods without ... Splay trees work somewhat better as long as there is some locality ... the CPU cache line dirtying of the splay rotation is. ... the plus side not many other data structures are either. ...
    (freebsd-current)
  • Using multiple indices on a single table in a query
    ... but sometimes the columns just are worth indexing. ... cached, this would still be pretty quick. ... a single column index is much like a column cache. ... Why don't queries make use of multiple indices on a single table in a query? ...
    (microsoft.public.sqlserver.programming)
  • Re: Using multiple indices on a single table in a query
    ... but sometimes the columns just are worth indexing. ... > cached, this would still be pretty quick. ... a single column index is much like a column cache. ... Why don't queries make use of multiple indices on a single table in a query? ...
    (microsoft.public.sqlserver.programming)
  • Single Address Decode 2-way Skewed Associativity
    ... colored' addressed cache blocks in a higher level cache with directory ... limit the indexing variation, e.g., by using the control bits as ... but such might be beneficial for the tag ... accesses since a single ECC can be used (or error detection with error ...
    (comp.arch)
  • Re: Single Address Decode 2-way Skewed Associativity
    ... First, obviously to use the same index, the mapping of each ... pair mapping as the base and maintaining all the WAY 1 indexing ... to reduce some power-of-two stride conflict problems. ... for "elbow" replacement in which the best replacement candidate ...
    (comp.arch)