Re: A way to speed up level 1 caches
- From: "Del Cecchi" <delcecchiofthenorth@xxxxxxxxx>
- Date: Sat, 24 Feb 2007 21:50:24 -0600
"Maynard Handley" <name99@xxxxxxxxxx> wrote in message
news:name99-A565EC.16270724022007@xxxxxxxxxxxx
In article <m3irdr9m8l.fsf@xxxxxxxxxx>,
Anne & Lynn Wheeler <lynn@xxxxxxxxxx> wrote:
Maynard Handley <name99@xxxxxxxxxx> writes:
Each L1 cache is now two identical such caches.
Which one is used is gated by whether the system is in user or system
mode. This is a simple extra bit line, so it doesn't hurt your cycle time,
unlike doubling associativity to double the cache size.
Now we get the system material segregated in its world, the user
material segregated in its world, and the two aren't stumbling over
each other.
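A minimal sketch of the scheme described above, modeling tags only. It assumes LRU replacement, 64 sets of 64-byte lines (4 KiB per way) and 4 ways per copy. The names `LRUSetAssocCache`, `ModeSplitL1` and `user_hits_after_syscall` are hypothetical. The mode bit is modeled as a selector between two identical caches. The demo has a user working set touch one set, then a kernel burst hit the same set, and counts how many user lines survive.

```python
from collections import OrderedDict

class LRUSetAssocCache:
    """A set-associative cache with LRU replacement (tags only, no data)."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)      # mark most recently used
            return True
        if len(s) >= self.ways:
            s.popitem(last=False)   # evict the least recently used way
        s[tag] = None
        return False


class ModeSplitL1:
    """Two identical L1 caches; the user/system mode bit selects one of them."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.copies = {mode: LRUSetAssocCache(num_sets, ways, line_bytes)
                       for mode in ("user", "system")}

    def access(self, addr, mode):
        return self.copies[mode].access(addr)


def user_hits_after_syscall(cache_access):
    """User touches 4 lines in set 0, the kernel touches 4 conflicting lines,
    then the user touches its 4 lines again. Returns user hits on the second pass."""
    user = [i * 4096 for i in range(4)]               # all map to set 0
    kernel = [0x800000 + i * 4096 for i in range(4)]  # also map to set 0
    for a in user:
        cache_access(a, "user")
    for a in kernel:
        cache_access(a, "system")
    return sum(cache_access(a, "user") for a in user)


unified = LRUSetAssocCache(64, 4)
split = ModeSplitL1(64, 4)
print(user_hits_after_syscall(lambda a, mode: unified.access(a)))  # 0
print(user_hits_after_syscall(split.access))                       # 4
```

In the unified 4-way cache the kernel burst evicts every user line. With the mode-selected copies, the user lines are still resident when the user code resumes.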
we complained about something similar in the 168 (and follow-ons)
... most of the stuff for virtual memory ... at least with the early
virtual memory systems, started at zero and grew upwards. starting at
least with the 168, they chose the "8mbyte" virtual address bit as one of
the index bits. the issue was that the mainline batch system of the time
was MVS with 24bit (16mbyte) virtual address spaces ... with the
kernel occupying the first 8mbytes of every (application) virtual
address space ... nominally leaving 8mbytes in every application
virtual address space for application code.
Purposefully choosing the 8mbyte virtual address bit as one of the
index bits ... resulted in partitioning the cache, half for application code and
half for kernel code. there were some complaints from other
operating systems that didn't organize their code that way.
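A rough sketch of the indexing idea described above. It assumes a hypothetical geometry: 32-byte lines and 5 ordinary index bits, which is not the real 168 cache layout. The only detail taken from the post is that the 8 MB virtual address bit (bit 23) is folded into the set index. The names `set_index` and `sets_used` are hypothetical.

```python
LINE_BYTES = 32       # hypothetical line size, not the real 168 geometry
LOW_INDEX_BITS = 5    # hypothetical: 32 sets chosen by ordinary line-address bits
EIGHT_MB_BIT = 23     # 2**23 bytes = 8 MiB, the virtual address bit described above

def set_index(vaddr):
    """Set index = low line-address bits, plus the 8 MiB bit as the top index bit."""
    low = (vaddr // LINE_BYTES) % (1 << LOW_INDEX_BITS)
    high = (vaddr >> EIGHT_MB_BIT) & 1
    return (high << LOW_INDEX_BITS) | low

def sets_used(start, end):
    """All sets that a contiguous virtual address range [start, end) can occupy."""
    return {set_index(a) for a in range(start, end, LINE_BYTES)}

# 24-bit layout from the post: kernel in the first 8 MiB, application above it.
print(sorted(sets_used(0x000000, 0x800000)) == list(range(32)))      # True
print(sorted(sets_used(0x800000, 0x1000000)) == list(range(32, 64)))  # True
```

Under this model, an operating system whose address spaces start at zero and stay below 8 MiB can only ever occupy sets 0..31, i.e. half the cache. That would be consistent with the complaints from systems organized differently.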
This idea is pretty much the equivalent of what I'm suggesting, yeah.
(Or at least could be; I don't know enough about the associativity of
your TLBs at the time, etc, to know if it's exactly what I mean.)
Of course a global cache is better. No-one denies that a single 8*4KiB
cache beats two 4*4KiB caches. The point is, you can't get the 8-way
cache and meet cycle time. My suggestion is a very quick and simple way
to give a partition that is automatic, cheap, costs no cycle time, and
is useful for a range of workloads.
It's obviously not a general partitioning scheme, but it's not trying to
be. It's a very specific exploitation of the fact that OS code/data is
both usually distinct from app code/data AND frequently interleaves with
the execution of app code/data.
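To make the trade-off concrete, here is a toy single-set LRU model. The workload is hypothetical and not taken from the thread: in each round, the app touches 6 lines and the OS touches 2 lines, all mapping to the same set. The model compares a unified 8-way set, a unified 4-way set (same cycle time as the split), and two 4-way halves split by user/system mode.

```python
from collections import OrderedDict

def lru_misses(trace, ways):
    """Count misses for a trace of tags that all fall in ONE cache set (LRU)."""
    resident = OrderedDict()
    misses = 0
    for tag in trace:
        if tag in resident:
            resident.move_to_end(tag)       # refresh LRU position on a hit
        else:
            misses += 1
            if len(resident) >= ways:
                resident.popitem(last=False)  # evict least recently used
            resident[tag] = None
    return misses

# Hypothetical workload: app and OS interleave, 3 rounds.
app = [("app", i) for i in range(6)]
kernel = [("os", i) for i in range(2)]
rounds = 3
interleaved = (app + kernel) * rounds

unified_8way = lru_misses(interleaved, 8)
unified_4way = lru_misses(interleaved, 4)
split_4_4 = lru_misses(app * rounds, 4) + lru_misses(kernel * rounds, 4)
print(unified_8way, unified_4way, split_4_4)  # 8 24 20
```

The unified 8-way set wins (only cold misses), as conceded above. Among the options that meet the same cycle time, the mode split beats the unified 4-way set, because the OS lines no longer push the app lines out. A different mix, for example an app that needs all 8 ways by itself, could favor the unified 4-way set instead.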
Something like
"A cache-based system is adapted for dynamic cache partitioning. A cache
is partitioned into a plurality of cache partitions for a plurality of
entities. Each cache partition can be assigned as a private cache for a
different entity. If a first cache partition satisfying a first
predetermined cache partition condition and a second cache partition
satisfying a second predetermined cache partition condition are detected,
then the size of the first cache partition is increased by a
predetermined segment and the size of the second cache partition is
decreased by the predetermined segment. An entity can perform cacheline
replacement exclusively in its assigned cache partition, and also be
capable of reading any cache partition." US6865647: Dynamic cache
partitioning
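A minimal sketch of the abstract quoted above, for a single way-partitioned cache set. The abstract does not say what the "predetermined cache partition conditions" are. This sketch assumes a hypothetical pair of miss-count thresholds: a partition that misses a lot grows, and one that misses little shrinks. The class name, thresholds and one-way segment are all hypothetical. Reads may hit in any partition, while replacement stays inside the requesting entity's own partition.

```python
from collections import OrderedDict

class DynamicPartitionedSet:
    """One cache set whose ways are split among entities (hypothetical model)."""

    def __init__(self, ways, entities):
        base, extra = divmod(ways, len(entities))
        # Even split; any leftover ways go to the first entities listed.
        self.alloc = {e: base + (1 if i < extra else 0) for i, e in enumerate(entities)}
        self.lines = {e: OrderedDict() for e in entities}  # lines each entity filled, LRU order
        self.misses = {e: 0 for e in entities}

    def access(self, entity, tag):
        """Read: may hit in any partition. Miss: replace only in the entity's own partition."""
        for owned in self.lines.values():
            if tag in owned:
                owned.move_to_end(tag)
                return True
        self.misses[entity] += 1
        own = self.lines[entity]
        if len(own) >= self.alloc[entity]:
            own.popitem(last=False)
        own[tag] = None
        return False

    def rebalance(self, grow_threshold, shrink_threshold, segment=1):
        """Move `segment` ways from a low-miss partition to a high-miss one.

        The thresholds stand in for the 'predetermined cache partition
        conditions'; they are an assumption, not the patent's actual test.
        """
        assert grow_threshold > shrink_threshold
        grow = [e for e, m in self.misses.items() if m >= grow_threshold]
        shrink = [e for e, m in self.misses.items()
                  if m <= shrink_threshold and self.alloc[e] > segment]
        moved = bool(grow and shrink)
        if moved:
            g = max(grow, key=self.misses.get)
            s = min(shrink, key=self.misses.get)
            self.alloc[g] += segment
            self.alloc[s] -= segment
            while len(self.lines[s]) > self.alloc[s]:  # trim the shrunken partition
                self.lines[s].popitem(last=False)
        self.misses = {e: 0 for e in self.misses}      # start a new measurement interval
        return moved
```

Usage: with 8 ways split 4/4, an app cycling through 6 lines misses every time, while an OS reusing one line barely misses. One `rebalance(grow_threshold=8, shrink_threshold=2)` call then moves a way from the OS partition to the app partition, giving 3/5.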
There are a bunch more. I got 30 hits searching Delphion for "cache
partitioned", and there was a bunch of history in the patents,
disappearing into the mists of time.
Here is the first claim of another (6295580):
What is claimed is: 1. A method of operating a cache memory arranged
between a processor and a main memory of a computer, the processor being
constructed and arranged to execute a plurality of processes wherein each
process includes a sequence of instructions, the method comprising:
a. dividing the cache memory into cache partitions, each cache
partition having a plurality of addressable storage locations for holding
items in the cache memory;
b. allocating to each process a partition indicator identifying which,
if any, of said cache partitions is to be used for holding items for use
in the execution of that process; and
c. when the processor requests an item from main memory during
execution of said current process and that item is not held in the cache
memory, fetching the item from main memory and loading it into one of the
plurality of addressable storage locations in the identified cache
partition and when the processor requests an item from main memory and
that item is held in the cache memory, said item is accessed from the
cache memory regardless of the cache partition in which the item is held
in the cache memory;
d. wherein the partition indicator is included in a group identifier
for the process, the group identifier identifying an address space for
the process; and
e. wherein the processor issues addresses comprising a virtual page
number and a line-in page number and wherein a translation look-aside
buffer is provided for translating the virtual page number to a real page
number for accessing the main memory, the translation look-aside buffer
also receiving the group identifier and deriving therefrom the partition
indicator for the current process.
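A sketch of the claimed method, with several hypothetical details filled in. The claim does not fix them: 4 KiB pages, 64-byte lines, and a partition indicator held in the low 2 bits of the group identifier. The claim's "if any" case (a process with no partition) is not modeled. The TLB is keyed by (group identifier, virtual page number), and translation also yields the partition indicator. A hit is served from any partition; a miss is loaded only into the identified partition.

```python
from collections import OrderedDict

PAGE_BYTES = 4096  # hypothetical page size
LINE_BYTES = 64    # hypothetical line size
PART_BITS = 2      # hypothetical: low 2 bits of the group ID are the partition indicator

class PartitionedCacheWithTLB:
    def __init__(self, num_partitions, lines_per_partition):
        self.tlb = {}  # (group_id, virtual page number) -> real page number
        self.partitions = [OrderedDict() for _ in range(num_partitions)]
        self.capacity = lines_per_partition

    def map_page(self, group_id, vpn, rpn):
        self.tlb[(group_id, vpn)] = rpn

    def translate(self, group_id, vaddr):
        """Return (real address, partition indicator derived from the group ID)."""
        vpn, offset = divmod(vaddr, PAGE_BYTES)
        rpn = self.tlb[(group_id, vpn)]  # a KeyError here stands in for a TLB miss
        partition = group_id & ((1 << PART_BITS) - 1)
        return rpn * PAGE_BYTES + offset, partition

    def read(self, group_id, vaddr, memory):
        """memory: dict of line number -> data, standing in for main memory."""
        paddr, partition = self.translate(group_id, vaddr)
        line = paddr // LINE_BYTES
        for part in self.partitions:  # a hit may come from ANY partition
            if line in part:
                part.move_to_end(line)
                return part[line], "hit"
        data = memory.get(line, 0)    # miss: fetch from main memory
        target = self.partitions[partition]  # load only into the identified partition
        if len(target) >= self.capacity:
            target.popitem(last=False)
        target[line] = data
        return data, "miss"
```

Usage: if groups 5 and 6 both map virtual page 0 to real page 7, a read by group 5 fills partition 1. A later read of the same line by group 6 (partition 2) then hits in partition 1, and group 6's own misses can never evict it.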
Del. Just sitting here in a blizzard.