Re: How does this make you feel?



Jan Vorbrüggen wrote:
> > g) And then, these features (essentially) disappeared from new ISA
> > designs, including those for most microprocessors.
>
> One of the few exceptions was the transputer. Because of the instructions
> supporting channel communications - which for processor-internal channels
> are just a special form of memcpy - it was natural to support this as a
> separate instruction as well. For the second generation, a nice MOVE2D
> instruction was added that allowed you to, for instance, extract a column
> of a 2D matrix into a contiguous array, operate on it (e.g., perform an
> FFT), and scatter it back into the 2D matrix.

Nice. I seem to recall reading advertisements for the transputer in way
old Byte magazines.

> > + Maybe C and UNIX distorted CPU design, especially with RISCs
> > - Possible, but as I've posted various times, various RISC CPU
> > designers definitely cared about non-C languages and non-UNIX operating
> > systems.
>
> I do think this did play some role - I'm convinced it put support for
> descriptor-like data structures on a lower priority than it would other-
> wise have had.

C and UNIX originated a while ago. It's not too difficult to see that
their original designers never anticipated certain real-world problem
domains. POSIX.4 is but one example of a feature required of modern
systems that ends up being difficult to implement and use today.

> > + Maybe later designers' insistence on measuring performance impacts
> > versus implementation costs caused them to ignore potentially-wonderful
> > features whose only problem was that they needed a new OS and new
> > language to make use of them.
>
> Rather, I'd think that smart programmers showed they could use RISC
> primitives to implement, say, a memcpy just as efficiently as microcode
> could, except perhaps for some of the cache effects (see below) and, of
> course, at the expense of quite complicated code to handle all possible
> alignments etc.

But this is what computers are for: making the job easier for the
user. If it is possible to offload work from the programmer in such a
common case, why not?

> > - It is no accident that it takes 2 pages to describe MVCL.
>
> Indeed. And it increases the likelihood some implementation gets some
> corner case wrong.

Laugh. Well there are always going to be people who don't fully read
the documentation as they should!

> > *Sometimes*, with enough work on the design, the hardware can
> > indeed do better if it knows an entire address+length in one fell swoop.
> > For example, in a uniprocessor with write-back caches, something like a
> > MVCL can avoid fetching a cache line that is about to be completely
> > overwritten.
>
> ...which is why ISA designers added such things as WH64 (write hint 64
> bytes - Alpha). That is of course much more generally useful besides
> being used by memcpy and friends.

Noted above.

> Incidentally, the transputer's MOVE instruction supplies a nice lesson in
> why such designs are difficult. It was, of course, interruptible, because
> supporting interrupts efficiently was a design goal for the transputer.
> In the case of an interrupt, the current state was saved by microcode into
> defined memory locations and restored later (due to the design, there could
> only be one level of interrupt). However, the first implementations got
> the saved state wrong: when the instruction was resumed, it would re-read
> the last location that had already been processed. "Normal" memory doesn't
> care, but if you use this to read out a FIFO, you have a problem. Various
> software workarounds needed to be developed for this oversight...

Microcode updates are much more common these days, and so bugs in the
CPU are not quite the problem they once were.

This discussion has drifted a little off topic from the core of my
initial posting (which is natural for Usenet) and I feel that some of
you may have misread my intentions. Over the last couple of days I
have had some time to consider my position in light of your comments
and have decided that I obviously must make my case more explicit. So.

The idea that prompted my original post concerned a (possibly) new way
of constructing CPU registers. In my initial discussion, I suggested
that there might be a second pseudo-register that would modify the
behaviour of a conventionally conceived register. Another poster
suggested implicitly that I could be talking about using a second
register to modify another. This point is not quite moot. I am
advocating a change in the way registers are conceived from a CPU
design standpoint.

I want to view a register as something that has mutable semantics. I
want it to apply to a general instruction set on some hypothetical
architecture in a meaningful way. While it may make conventional
sense to modify the behavior of one register with another, as is seen
with various canonical addressing modes today, this is not in line
with the philosophy that I am thinking about. I want you to think
about complexifying the way a register is implemented on-chip; if this
thought experiment logically indicates certain changes to the design of
a CPU or its instruction set, that is part of the next step of
design. The specifics of its on-chip implementation are best left
unsaid at the moment.

For now, let's consider a hardware register construct and see what it
gives us in terms of its flexibility. I suggest that for the purpose
of this discussion we are not going to talk about modifying some legacy
system. This is a _de novo_ design that turns on the structure of its
registers.

So, we might suggest 'R1' as a 64-bit register that may take on
additional properties that affect its application with an instruction.
The on-chip logic that accompanies the register includes provisions to
change its behavior (according to my previous suggestion). First,
there is a way to partition its bits into an address and length
component: specified perhaps by a special instruction that sets its
parameters:

rcfg r1, 54:8

.... which gives us 54 bits of addressing and eight bits of length.
There has to be a third component, register 'chunk' size, that in
effect dictates the 'word' size of that particular register. In this
hypothetical design, different registers may have different partitions
and 'chunk' sizes. So, we might have a hypothetical register config
instruction like this:

rcfg r1, 54:8:19

.... which gives us a register configured to address 'chunks' of 2^19
bits (65536 bytes), with 54 bits of address space, and 8 bits of
'length' which allows in this case a span of 16M in 'steps' of 64k.
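The arithmetic behind that example can be checked with a minimal Python
sketch. The names RegConfig and parse_rcfg are hypothetical and exist only
for illustration; the sketch assumes the third field is the base-2 log of
the chunk size in bits, as in the 54:8:19 example, and that chunks are at
least one byte.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegConfig:
    """Hypothetical model of one register's 'rcfg' settings."""
    addr_bits: int        # width of the address field
    len_bits: int         # width of the length field
    chunk_log2_bits: int  # chunk size is 2**chunk_log2_bits bits

    @property
    def chunk_bytes(self) -> int:
        # Size of one chunk in bytes (8 bits per byte).
        return (1 << self.chunk_log2_bits) // 8

    @property
    def max_span_bytes(self) -> int:
        # Largest range the length field can describe, in bytes.
        return (1 << self.len_bits) * self.chunk_bytes


def parse_rcfg(spec: str) -> RegConfig:
    """Parse an 'addr:len:chunk' operand such as '54:8:19'."""
    addr_bits, len_bits, chunk_log2_bits = (int(x) for x in spec.split(":"))
    if chunk_log2_bits < 3:
        # Assumption of this sketch: a chunk is never smaller than a byte.
        raise ValueError("chunk must be at least 8 bits")
    return RegConfig(addr_bits, len_bits, chunk_log2_bits)


cfg = parse_rcfg("54:8:19")
print(cfg.chunk_bytes)     # 65536 bytes per chunk (64k)
print(cfg.max_span_bytes)  # 16777216 bytes (16M)
```

Under these assumptions the 8-bit length field counts up to 256 chunks of
64k each, which is where the 16M span comes from.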

I prefer this approach as it illustrates the idea of a register that
has attributes, rather than a register that is simply modified by
another. Such a register would be configured by software to work
within an arbitrary lexical scope in most practical applications. From
a compiler perspective -- at least -- this indicates an entirely
different approach to register allocation and use. This is a different
way to view memory addressing.

On the hypothetical chip, there are a bunch of differences. Instead of
simply storing a number, the value of a register now has context that
is dependent upon its configuration. The 'chunk', or effective word
size is variable. The partition of the register is variable, subject
to interdependent semantic constraints which might mandate that an
'address' register (or a register used for addressing) must be capable
of addressing the entire virtual memory space. Further, the CPU should
be able to validate addresses and addressing modes during instruction
decode in order to raise an appropriate signal.
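As a rough illustration of the kind of check meant here, the following
Python sketch validates a register partition before it is accepted. It is
only a model under stated assumptions: the 64-bit register width comes from
the R1 example, while the 48-bit virtual address space, the ConfigFault
exception (standing in for the "appropriate signal") and the validate_rcfg
name are all hypothetical. It also assumes the address field holds a plain
byte address.

```python
class ConfigFault(Exception):
    """Stands in for whatever signal the hypothetical CPU would raise."""


def validate_rcfg(addr_bits: int, len_bits: int,
                  reg_bits: int = 64, va_bits: int = 48) -> bool:
    """Check one register partition against the constraints in the text.

    reg_bits and va_bits are assumed values for illustration only.
    """
    if addr_bits <= 0 or len_bits < 0:
        raise ConfigFault("field widths must be positive")
    if addr_bits + len_bits > reg_bits:
        raise ConfigFault("fields do not fit in the register")
    if addr_bits < va_bits:
        # An addressing register must reach all of virtual memory.
        raise ConfigFault("address field cannot cover the virtual address space")
    return True


print(validate_rcfg(54, 8))  # True: 62 bits fit, and 54 >= 48

try:
    validate_rcfg(40, 8)     # 40-bit address field is too narrow
except ConfigFault as fault:
    print("fault:", fault)
```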

It was mentioned earlier in this thread that in the past many things
have been tried, and it is implied that many innovations had proven
impractical. For this case, silicon implementation issues aside for
the moment, what is gained by complexifying registers?

o We would have a way of lexically scoping memory ranges and (dare I
say it) block size to a register in a flexible way.

o The semantics of *arbitrary* traditional instructions are
potentially made much richer, and in a way that reduces code size.
There is an entire discussion waiting in the wings to be had over how
traditional addressing modes might be applied to such registers:
register indexed, indirect, pre- and post-increment/decrement, etc.

o More functionality is taken on by the CPU directly, indicating a
shift from RISC towards CISC in CPU instruction set design.

o Potential efficiency improvements at the cost of more dedicated
on-chip logic. Plus bugs, of course.

o An architectural wedge towards improving the nature of current
virtual memory protection and paging schemes. Given a rethink of the
way memory is accessed by CPU instructions, it makes sense to consider
that the MMU would benefit from a congruent redesign in order to make
its design philosophy towards memory sync with the CPU.
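To give the addressing-mode point above something concrete to argue
about, here is one possible reading of post-increment on such a register,
sketched in Python. Everything in it is an assumption made for
illustration: that post-increment advances the address by one chunk, that
it counts the length field down, and that running off the end raises an
error. The post_increment name is hypothetical.

```python
def post_increment(addr: int, remaining: int, chunk_bytes: int):
    """Return the address to use now, plus the updated (addr, remaining).

    Assumed semantics: each access steps one chunk forward and consumes
    one unit of the register's length field.
    """
    if remaining == 0:
        raise IndexError("register range exhausted")
    return addr, (addr + chunk_bytes, remaining - 1)


# Walk a register configured with 64k chunks over three chunks.
addr, remaining = 0x10000, 3
visited = []
while remaining:
    current, (addr, remaining) = post_increment(addr, remaining, 65536)
    visited.append(current)

print([hex(a) for a in visited])  # ['0x10000', '0x20000', '0x30000']
```

A single instruction using this mode would carry its own bounds, which is
the sense in which the semantics of ordinary instructions get richer.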

Although I am a complete amateur at the CPU architecture game, I ask
you to consider the potential benefits and liabilities of making
registers more intelligent. What does this hypothetical arrangement
potentially offer in terms of improvements over traditional register
and instruction set architecture?

As a software programmer and language dabbler, it seems to me that the
division of labour is set in the wrong place along the programmer/CPU
line. I believe that making the CPU smarter will make for an
environment that is more useful to the programmer, but of course I
cannot prove it just yet. At the minimum, I think that a CPU that
handles complex registers such as I describe would exhibit more
flexibility and better performance than traditional solutions. But
this is only my uninformed opinion.

I trust that this narrowing of focus clarifies my position and
philosophical approach. To those of you who have taken the time to
reply directly, you have my gratitude and appreciation.


Regards,

Steve
