Re: Directory implementation
- From: "Paul A. Clayton" <paaronclayton@xxxxxxxxxxxxx>
- Date: Fri, 20 Jul 2007 17:45:17 -0700
On Jul 18, 4:26 pm, MitchAlsup <MitchAl...@xxxxxxx> wrote:
It seems to me that if one wants to build a system large enough that
a directory approach is warranted, then one should use an ECC code to
signal that the rest of the line contains the directory information
for that line. That is, you get an ECC fail on the read of the first
(or all) words of the line, with the property that only a directory
entry uses this syndrome and that the data is correct as it stands;
it's just not data, it's a directory entry.
This gives you hundreds of bits to exploit (512-ish) for unary mapping
of readers and a direct pointer to the writer, or any other crazy
scheme one wants to try out that uses lots of bits.
Mitch
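A minimal sketch of the idea quoted above, under loudly stated assumptions: a 64-byte (512-bit) line, a single reserved syndrome value per line (real ECC is usually computed per word), and an invented layout with a unary reader map in the low bits, a has-owner flag, and a writer pointer on top. None of these names or field widths come from the post; they are only there to make the bit budget concrete.

```python
from dataclasses import dataclass
from typing import Optional

LINE_BITS = 512             # assumed 64-byte line (the "512-ish" bits)
OWNER_BITS = 16             # hypothetical width of the writer pointer
SHARER_BITS = LINE_BITS - OWNER_BITS - 1   # unary reader map; 1 bit is a has-owner flag
DIRECTORY_SYNDROME = 0xFF   # hypothetical syndrome reserved for directory entries


@dataclass
class DirectoryEntry:
    sharers: set            # node ids holding a readable copy
    owner: Optional[int]    # node id of the writer, if any


def encode_entry(sharers, owner=None) -> int:
    """Pack a directory entry into the payload bits of one line."""
    line = 0
    for node in sharers:
        if not 0 <= node < SHARER_BITS:
            raise ValueError("node id does not fit in the unary map")
        line |= 1 << node
    if owner is not None:
        if not 0 <= owner < (1 << OWNER_BITS):
            raise ValueError("owner id does not fit in the pointer field")
        line |= 1 << SHARER_BITS                 # has-owner flag
        line |= owner << (SHARER_BITS + 1)       # direct pointer to the writer
    return line


def decode_entry(line: int) -> DirectoryEntry:
    """Unpack the payload bits of a line known to hold a directory entry."""
    sharers = {n for n in range(SHARER_BITS) if (line >> n) & 1}
    has_owner = (line >> SHARER_BITS) & 1
    owner = None
    if has_owner:
        owner = (line >> (SHARER_BITS + 1)) & ((1 << OWNER_BITS) - 1)
    return DirectoryEntry(sharers, owner)


def interpret_line(line: int, syndrome: int):
    """Classify a line read from memory by its ECC syndrome."""
    if syndrome == 0:
        return ("data", line)                       # ordinary, error-free data
    if syndrome == DIRECTORY_SYNDROME:
        return ("directory", decode_entry(line))    # bits are correct, just not data
    return ("error", None)                          # genuine ECC failure
```

For example, `interpret_line(encode_entry({0, 5}, owner=7), DIRECTORY_SYNDROME)` yields a `"directory"` result whose entry lists readers 0 and 5 and writer 7. Error correction for ordinary single-bit faults is left out of the sketch.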
Neat idea. Presumably one would want it to be an 'outrageous' ECC
failure, i.e., one with the maximum number of inconsistent bits,
perhaps with a near-maximal inconsistency also being interpreted as
indicating a directory entry rather than data. That would act as
something like parity protection for the ECC code in its role as
directory indicator. (Would one want to treat such a case as a
generic directory entry parity failure, broadcasting a request for
the data and only taking a data failure if no cache had valid data?)
I guess in such a system caches could not silently evict a clean
block, since main memory does not contain the current data. There
would need to be either something like a clean owner state or an
eviction buffer. With a clean owner state, an evicting clean owner
would transmit the data to the home node, which could hold it until
another sharer confirms that it is now the owner or, when there are
no sharers, transfer it to memory. With an eviction buffer, the
evicting cache would hold on to the data until it either receives a
message that at least one other cache has valid data or, after having
received a request for the data, receives a message confirming
successful transfer of the data.
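The clean-owner alternative described above might look roughly like the following toy model of a home node. Everything in it is hypothetical (class and method names, picking the lowest-numbered sharer as the next owner, the decline message); it only illustrates that the home must hold the evicted data until some sharer confirms ownership, and writes it to memory only when no sharers remain.

```python
class HomeNode:
    """Toy home node; all names and policies here are illustrative assumptions."""

    def __init__(self):
        self.memory = {}    # addr -> data (absent while the line holds a directory entry)
        self.sharers = {}   # addr -> set of node ids with a valid copy
        self.owner = {}     # addr -> node id of the current clean owner
        self.held = {}      # addr -> data held during an ownership handoff

    def evict_clean_owner(self, addr, node, data):
        """The clean owner cannot evict silently: it sends the data home."""
        if self.owner.get(addr) != node:
            raise ValueError("only the current owner sends data on eviction")
        self.sharers[addr].discard(node)
        del self.owner[addr]
        return self._hand_off(addr, data)

    def _hand_off(self, addr, data):
        remaining = self.sharers[addr]
        if remaining:
            # Keep the data at home until a sharer confirms it is the new owner.
            self.held[addr] = data
            return ("offer_ownership", min(remaining))
        # No sharers left: the data replaces the directory entry in memory.
        del self.sharers[addr]
        self.held.pop(addr, None)
        self.memory[addr] = data
        return ("written_to_memory", None)

    def confirm_ownership(self, addr, node):
        """A sharer accepts ownership, so the home can drop its held copy."""
        if addr not in self.held or node not in self.sharers[addr]:
            raise ValueError("no pending handoff to this node")
        self.owner[addr] = node
        del self.held[addr]

    def decline_ownership(self, addr, node):
        """The candidate no longer has the block; try another sharer or memory."""
        self.sharers[addr].discard(node)
        return self._hand_off(addr, self.held[addr])
```

The eviction-buffer alternative would move the `held` storage from the home node into the evicting cache, at the cost of keeping that cache involved until the transfer is confirmed.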
This would make using the ECC bits of instruction memory to hold
decode, branch, or other hints more difficult. (Software could
ensure that instruction memory is recoverable from on-disk data, but
such a slowdown in a case that is not extremely rare might not be
acceptable. Alternatively, one could require broadcast invalidations
when instruction memory is written, but that might use too much
interconnect bandwidth and even too much cache tag-check bandwidth.)
I take it that you do not think it worthwhile in such large systems
to receive the directory entry slightly earlier? That would, it
seems, be only modestly beneficial, and only when other caches hold
the data: it might allow faster transmission of the data from a
cached copy, perhaps with lower bandwidth use than providing the data
from home, or allow a write to be made visible sooner. The latency
benefit might be minuscule given the large average transmission times
(even if memory accesses usually come from nodes near the home?).
(I am beginning to see how optimizing cache coherence in larger
systems can easily become quite complex. Ugh!)
BTW, thanks for the response!
Paul A. Clayton
just a technophile