Re: Geriatric Pentium



In comp.lang.java.advocacy, Roedy Green
<my_email_is_posted_on_my_website@xxxxxxxxxxxxxx>
wrote
on Wed, 15 Mar 2006 20:29:20 GMT
<njsg12hblnnuetbfrtn179s8puo75tg0bo@xxxxxxx>:
Intel sent me a free copy of the Pentium user's manual in 1993. That
was 13 years ago, a half a century in Internet years.

It seems odd we have stuck with that architecture so long.

Consider how easily Apple moved from 68000 to Power PC to Pentium.
Application code is no longer anywhere near as tied to hardware
architectures as it was.

I wonder what a new generation would be like.

Little-endian, 32-bit paragraph/compatibility mode, most likely. This
is necessitated by Windows. (Dumb, I know.)


Here are some ideas I would like to see explored.

1. a sliding window of registers with mindless little background
processor scavenging spare ram cycles to back it up to ram and restore
from ram.

I have no idea precisely what this means, especially
since most processors have on board cache anyway, of
varying sizes. Briefly put, if the processor has seen
the address before, it doesn't have to go out to the
main memory bus to fetch it again, unless the record
of its prior visit has been flushed out in the meantime.
Note that if it stores something the cache is of course
updated, as well as a memory write cycle occurring.
It is also possible to invalidate cache entries.
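
A minimal sketch of the behaviour described above, assuming a toy
direct-mapped, write-through cache sitting in front of a dict that stands
in for main memory. The class name, the four-line size and the counters
are all made up for illustration; real caches work on multi-byte lines
and are far more elaborate.

```python
class WriteThroughCache:
    """Toy direct-mapped, write-through cache (illustrative only)."""

    def __init__(self, memory, num_lines=4):
        self.memory = memory        # backing store: address -> value
        self.num_lines = num_lines  # hypothetical cache size
        self.lines = {}             # line index -> (address, value)
        self.hits = 0
        self.misses = 0
        self.memory_writes = 0

    def _index(self, address):
        return address % self.num_lines

    def read(self, address):
        entry = self.lines.get(self._index(address))
        if entry is not None and entry[0] == address:
            self.hits += 1          # seen before, not flushed: no bus access
            return entry[1]
        self.misses += 1            # fetch from main memory and remember it
        value = self.memory[address]
        self.lines[self._index(address)] = (address, value)
        return value

    def write(self, address, value):
        self.lines[self._index(address)] = (address, value)  # update cache
        self.memory[address] = value                          # and write through
        self.memory_writes += 1

    def invalidate(self, address):
        entry = self.lines.get(self._index(address))
        if entry is not None and entry[0] == address:
            del self.lines[self._index(address)]


memory = {addr: addr * 10 for addr in range(16)}
cache = WriteThroughCache(memory)
cache.read(3)        # miss: first visit
cache.read(3)        # hit
cache.read(7)        # miss; 7 maps to the same line and evicts address 3
cache.read(3)        # miss again: the record of the prior visit is gone
cache.write(3, 99)   # cache updated, memory write cycle occurs
cache.read(3)        # hit
cache.invalidate(3)
cache.read(3)        # miss; the value 99 comes back from memory
print(cache.hits, cache.misses, cache.memory_writes)   # 2 4 1
```
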

The main problem with programs such as Java is that Java
gloms onto memory and won't release it back to the system
like a good little citizen.

Also, any "mindless process" involves transistors switching
-- power dissipation, which is an issue for mobiles and
other such.


2. a stack with the top N slots cached in registers with a mindless
little background processor scavenging spare ram cycles to back it up
to ram and restore it from ram

See above.


3. dozens of CPUS simulating several CPUs each.

This capability has been in Intel chips since the 386 timeframe;
it's called V86 mode. In fact, Linux has the capability
to run as a process, emulating an entirely new machine
(User Mode Linux). Software products such as Vmware are
also available.

IBM was doing this sort of thing since my college daze
(early 80's) -- and probably earlier.


4. make the chip itself simpler. Leave optimising to software. Use
the room saved for more on chip caching.

I'll admit I for one wouldn't mind, especially since the absolute
cleanest instruction set I've ever seen is in VAX systems.
Motorola 68000/68010 wasn't too bad, either (I've not fiddled
with newer variants instruction-wise).

The x86 family is, in comparison, a steaming pile of ... well, in
any event it could be vastly improved, if not replaced altogether
with something that at least resembles a RISC processor.


5. a dark room see http://mindprod.com/jgloss/darkroom.html

The main issue here is to ensure, presumably via a covering layer, that
the darkroom can never be reverse-engineered via an optical microscope
while the chip is operating. Otherwise, this about covers it.

Admittedly, there's a flip side: this is, after all, the USA, where
people write their PINs on the backs of their ATM cards.
(Maybe they're smarter in Canada? :-) )

Also, free software aficionados are going to have problems garnering
authorization keys.


6. a low overhead short term process, so that you could use it for
unravelling loops, giving i mod 4 to hyperthread processors 0, 1, 2,
3. Most of the time each iteration of a for:each has no connection
with the others.

Most of the time. However, loops such as

for(int i = 0; i < 4; i++) callme(i);

may require some analysis to do correctly. Note that the processor
will see something more along the lines of

        MOVL #0,R3
L$1:    CMPL #4,R3
        BGE L$2
        PUSHL R3
        CALL _callme
        LEA 4(SP), SP
        INCL R3
        BR L$1
L$2:

(actually, the machine code equivalent thereof).

Since callme() can do anything at all, the processor (or the compiler)
must be extremely careful here.

(Don't try to find the processor which can handle the above code; it's
most definitely ad hoc! :-) )
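
To make the hazard concrete, here is a small sketch. It assumes two
hypothetical versions of callme(): one whose iterations touch only their
own slot, and one that reads state left behind by earlier calls. The
"shuffled" order stands in for whatever order four hardware threads
might happen to run in; it is not a model of any real scheduler.

```python
def run(order, callme):
    """Call callme(i) for each i in the given order."""
    for i in order:
        callme(i)


def make_independent():
    # Each iteration writes only its own slot, so order cannot matter.
    slots = [None] * 4

    def callme(i):
        slots[i] = i * i

    return slots, callme


def make_dependent():
    # Each iteration reads what earlier iterations left behind.
    state = {"total": 0}

    def callme(i):
        state["total"] = state["total"] * 2 + i

    return state, callme


in_order = [0, 1, 2, 3]
shuffled = [2, 0, 3, 1]   # hypothetical out-of-order execution

slots_a, f = make_independent()
run(in_order, f)
slots_b, f = make_independent()
run(shuffled, f)
print(slots_a == slots_b)                    # True: safe to split up

state_a, f = make_dependent()
run(in_order, f)
state_b, f = make_dependent()
run(shuffled, f)
print(state_a["total"], state_b["total"])    # 11 23: order changes the answer
```

Nothing in the call site distinguishes the two cases; only an analysis
of callme() itself does.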


7. hardware gc assist.

I'm not sure how best to implement this. It would be nice, admittedly.
Of course, gc is process-specific anyway.


8. big enough address space so that all file/io was done via memory
mapping. Legacy styles would be simulated on top of that. Presumably
then a unified cache of memory could be more intelligently handled
for global optimisation.

How big a file are you contemplating here? 192-bit would be more
than enough to tag every electron on this Earth
(log_2(6.022*10^23 at/mol * 1 mol/g * 5.976 * 10^27 g/Earth) = 171.27).
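
The arithmetic can be checked in a few lines. The constants are the ones
quoted above; the 1 mol/g factor is the post's own simplification
(roughly one particle per nucleon mass), so the result is an
order-of-magnitude figure rather than an exact count.

```python
import math

AVOGADRO = 6.022e23       # particles per mole, as quoted above
EARTH_MASS_G = 5.976e27   # grams, as quoted above
MOLES_PER_GRAM = 1.0      # the post's simplifying assumption

particles = AVOGADRO * MOLES_PER_GRAM * EARTH_MASS_G
bits_needed = math.log2(particles)

print(round(bits_needed, 2))        # about 171.27, matching the figure above
print(math.ceil(bits_needed) <= 192)  # True: 192 bits leaves room to spare
```
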


9. program load is done in the ordinary case by memory mapping a
snapshot of a running program all fully initialised into RAM. It might
even have its files pre opened using a fast open, giving the OS hints
so it only has to verify nothing has changed in the usual case.

Program load is close to that now. The only thing that
really needs to be seen by the kernel during initial load
is the executable header, which is on the first page of
the executable. Once the initial PC is determined (stored
in a slot in the executable header), and the fixups handled
(that's an ugly mess no matter how it's done, though if one
can guarantee an all-position-independent-code instruction
set it's far easier), the pages will just fault in upon
first use.
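
As a rough user-space illustration of the same idea, the sketch below
memory-maps a file and touches only the parts it needs. The "image"
layout (a 4-byte magic, a 4-byte entry point, then filler pages) is
invented for the example and is not any real executable format; the
operating system decides when the mapped pages are actually read in.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# Build a hypothetical image: one header page followed by three data pages.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"HDR!" + (100).to_bytes(4, "little") + b"\0" * (PAGE - 8))
    f.write(b"\xAA" * PAGE * 3)
    path = f.name

with open(path, "rb") as f:
    # Map the whole file; nothing is copied into our buffers up front.
    image = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = image[:4]                                # touches the header page
    entry_point = int.from_bytes(image[4:8], "little")
    last_byte = image[len(image) - 1]                # touches the last page only
    image.close()

os.remove(path)
print(magic, entry_point, last_byte)   # b'HDR!' 100 170
```
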

--
#191, ewill3@xxxxxxxxxxxxx
Windows Vista. Because everyone wants a really slick-looking 8-sided wheel.


