Re: 21st Century ISA goals?



Tim McCaffrey wrote:


Somebody else posted something along the lines of "what's the big deal, a read from PCI can be counted in nanoseconds, or at most microseconds?".

It is difficult to keep this in perspective. With a 3.0 GHz processor, a microsecond is 3000 clock cycles; with something like the Core 2 or Athlon 64, that is potentially 12,000 instructions that are not executed because you're waiting for a PCI read to complete.

Also, all those memory reads are going to have (at least for the first access) worst-case latency (50-200 ns, or 150-600 cycles).
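The cycle arithmetic in the two paragraphs above can be checked with a few lines of Python. This is a sketch, assuming the quoted 3.0 GHz clock and an illustrative four instructions retired per cycle (the ratio implied by 3000 cycles becoming 12,000 instructions); `stall_cycles` and `lost_instructions` are hypothetical helper names.

```python
# Back-of-envelope cost of stalling on a slow read.
# Assumptions: 3.0 GHz core clock (as quoted) and an illustrative
# sustained rate of 4 instructions per cycle.

CLOCK_GHZ = 3.0   # cycles per nanosecond
IPC = 4           # assumed instructions per cycle


def stall_cycles(latency_ns: float, clock_ghz: float = CLOCK_GHZ) -> float:
    """Clock cycles that pass while the core waits latency_ns nanoseconds."""
    return latency_ns * clock_ghz


def lost_instructions(latency_ns: float, ipc: float = IPC) -> float:
    """Instructions the core could have retired during the same wait."""
    return stall_cycles(latency_ns) * ipc


if __name__ == "__main__":
    # A 1 us PCI read, then the best and worst first-access memory latencies.
    for label, ns in [("PCI read", 1000), ("memory, best", 50), ("memory, worst", 200)]:
        print(f"{label:14s} {ns:5d} ns = {stall_cycles(ns):6.0f} cycles "
              f"~ {lost_instructions(ns):6.0f} instructions")
```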

All these latencies (and others, like interrupt overhead) add up and limit the total number of I/Os per second. Not a problem for a desktop, but a mainframe or server needs all it can get.
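To see how the latencies add up and cap the I/O rate, here is a rough sketch that treats every step of one I/O as serialized on a single CPU and inverts the total. Only the 1 us PCI read and 200 ns worst-case memory latency come from the post; the step counts and the 2 us interrupt cost are hypothetical placeholders, not measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    """One serialized component of an I/O's cost (hypothetical breakdown)."""
    name: str
    latency_ns: float
    count: int = 1


def max_ios_per_second(steps: list[Step]) -> float:
    """Upper bound on I/Os per second when each step blocks the CPU in turn."""
    total_ns = sum(s.latency_ns * s.count for s in steps)
    if total_ns <= 0:
        raise ValueError("total latency must be positive")
    return 1e9 / total_ns


# Illustrative figures only: substitute measured values for a real system.
example = [
    Step("PCI read", 1000, count=2),
    Step("cache-missing memory read", 200, count=5),
    Step("interrupt entry/exit", 2000),
]

if __name__ == "__main__":
    print(f"{max_ios_per_second(example):,.0f} I/Os per second per CPU, at best")
```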

Also note that any memory traffic the PCI devices cause is bandwidth the CPU(s) can't use (as Nick has pointed out, even with a 1333 MHz bus, bandwidth is a limitation once you get to four CPUs or more).
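The bandwidth point can be put in numbers too. This sketch assumes the "1333 MHz" bus means 1333 million transfers per second over a 64-bit (8-byte) data path, as on an Intel-style front-side bus, and simply splits whatever the device DMA traffic leaves among the CPUs. It ignores protocol overhead and contention, so real usable figures are lower. The 2 GB/s DMA figure in the usage loop is hypothetical.

```python
BUS_TRANSFERS_PER_S = 1_333_000_000   # "1333 MHz" taken as 1333 MT/s
BUS_WIDTH_BYTES = 8                   # assumed 64-bit data path


def peak_bus_bandwidth() -> int:
    """Theoretical peak bytes per second on the shared bus."""
    return BUS_TRANSFERS_PER_S * BUS_WIDTH_BYTES


def per_cpu_bandwidth(n_cpus: int, dma_bytes_per_s: int) -> int:
    """Peak bytes/s left for each CPU after device DMA traffic is subtracted."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    remaining = peak_bus_bandwidth() - dma_bytes_per_s
    if remaining < 0:
        raise ValueError("DMA traffic exceeds bus capacity")
    return remaining // n_cpus


if __name__ == "__main__":
    for cpus in (1, 2, 4, 8):
        share = per_cpu_bandwidth(cpus, dma_bytes_per_s=2_000_000_000)
        print(f"{cpus} CPU(s): {share / 1e9:.2f} GB/s each")
```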

- Tim



I may have been the one guilty of the read or write comment :-), but thinking about it more, the whole concept of a low-level I/O device bus closely coupled to the main CPU is obsolete. It was a compromise designed to reduce hardware costs at a time when micros were never meant to be serious computing devices. Channel-style I/O probably took hundreds of SSI TTL devices to implement in the old days, and the cost would have dwarfed most micro systems. Now everything is micro systems and hardware is cheap. I do a lot of programming at the hardware level, but looking at your post re PCI transactions again, the thing that stands out most of all is the amount of low-level, CPU-involved, itsy-bitsy traffic across the bus, i.e. item (5), where the device can't even get its I/O descriptor / control block until the cache has been flushed back to memory. As you imply, it's not just the read/write times, but more the knock-on effects.

The answer might be a separate processor to manage I/O, closely coupled via an optimised interface to the CPU and memory on one side: perhaps DMA transfers only, CPU microcode support, the ability to disable caching, etc. That should be doable in a single LSI device now. The I/O bus itself then becomes a simple, agnostic, bare-bones queued message interface talking to intelligent I/O devices. If bit-serial, it could run over a variety of media and protocols. The I/O processor allocates a message queue for each device, and the devices have their own queues, making the whole thing asynchronous.
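The queued-message arrangement described above might be modelled along these lines. This is a toy single-threaded sketch, not a hardware design: `Message`, `Device`, and `IOProcessor` are hypothetical names, and "asynchronous" here only means the CPU's `submit` call returns immediately while the I/O processor and the devices drain their queues later, in `pump`.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    device: str          # destination device (or source, for completions)
    op: str              # e.g. "read", "send"
    payload: bytes = b""


class Device:
    """Hypothetical intelligent device with its own internal work queue."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.work: deque[Message] = deque()

    def accept(self, msg: Message) -> None:
        self.work.append(msg)

    def run_one(self) -> Message | None:
        """Process one queued request and return a completion message."""
        if not self.work:
            return None
        msg = self.work.popleft()
        return Message(self.name, "done:" + msg.op, msg.payload)


class IOProcessor:
    """Hypothetical I/O processor: one message queue per attached device."""

    def __init__(self) -> None:
        self.devices: dict[str, Device] = {}
        self.queues: dict[str, deque[Message]] = {}
        self.completions: deque[Message] = deque()  # what the CPU eventually reads

    def attach(self, device: Device) -> None:
        self.devices[device.name] = device
        self.queues[device.name] = deque()

    def submit(self, msg: Message) -> None:
        """CPU side: enqueue a request and return at once."""
        self.queues[msg.device].append(msg)

    def pump(self) -> None:
        """IOP side: forward queued requests, then let each device do one unit of work."""
        for name, queue in self.queues.items():
            device = self.devices[name]
            while queue:
                device.accept(queue.popleft())
            done = device.run_one()
            if done is not None:
                self.completions.append(done)


if __name__ == "__main__":
    iop = IOProcessor()
    iop.attach(Device("disk"))
    iop.attach(Device("nic"))
    iop.submit(Message("disk", "read", b"block 42"))
    iop.submit(Message("nic", "send", b"packet"))
    iop.pump()
    for c in iop.completions:
        print(c)
```

A second request queued to the same device would only complete on a later `pump`, which is the decoupling between CPU and device that the post is after.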

For many types of I/O, there is no reason why data needs to appear in main memory at all. For example, if network devices include the protocol stack and the graphics card understands HTML and Java, internet data could be streamed as messages directly from network to graphics devices, with no CPU intervention other than setup and teardown. A similar scenario applies to server-style disk-to-network transfers where no intermediate processing is required. Part of this might also be organising and storing data in a format that minimises processing between devices.
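In the same toy terms, the device-to-device streaming idea reduces to the CPU installing a route once and tearing it down at the end, with every chunk in between passing straight from source to sink. `StreamRouter`, its methods, and the fallback to a host-memory buffer are hypothetical illustrations, not an existing interface.

```python
from __future__ import annotations

from typing import Callable

Sink = Callable[[bytes], None]


class StreamRouter:
    """Hypothetical IOP routing table: source device -> sink device handler."""

    def __init__(self) -> None:
        self.routes: dict[str, Sink] = {}
        self.host_buffer: list[bytes] = []  # stands in for main memory

    def setup(self, source: str, sink: Sink) -> None:
        """CPU involvement #1: install a route."""
        self.routes[source] = sink

    def teardown(self, source: str) -> None:
        """CPU involvement #2: remove the route."""
        self.routes.pop(source, None)

    def deliver(self, source: str, chunk: bytes) -> None:
        """Called per chunk by the source device; the CPU is not involved."""
        sink = self.routes.get(source)
        if sink is not None:
            sink(chunk)  # device to device, main memory untouched
        else:
            self.host_buffer.append(chunk)  # no route: stage in main memory for the CPU


if __name__ == "__main__":
    screen: list[bytes] = []
    router = StreamRouter()
    router.setup("nic", screen.append)  # network -> graphics
    for chunk in (b"<html>", b"<body>hi</body>", b"</html>"):
        router.deliver("nic", chunk)
    router.teardown("nic")
    print(screen, router.host_buffer)
```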

I'm sure none of this is new, but CPU-style buses really need a decent burial, not more stuff glued on top...

Chris



--

----------------------
Greenfield Designs Ltd
Electronic and Embedded System Design
Oxford, England
(44) 1865 750 681


