Re: RFC : SOME IDEAS FOR THE APPLE II FPGA'ers



Jorge Chamorro Bieling wrote:

<snip>

To run as fast as possible, the CPU has to be custom designed so as to
maximize the execution of *6502* instructions (which are at most 3 bytes
long), that's why I thought about a 3 bytes bus.

(later I found that a 3 bytes bus is odd to design for a number of
reasons, so I think it should better be 4 bytes wide)

It doesn't have to be a "bus", it's just a 3-byte "pull" from memory.
And 3 is a lot easier than 4, since 3 bytes can't cross two 2-byte bank
boundaries, and 4 can.

Think in terms of loading a 3-byte register from an arbitrary byte
address in a memory composed of two 16-bit banks...
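
A quick way to check the 3-versus-4 point is to enumerate every start
address. This is only a sketch under an assumed layout: the two 16-bit
banks are taken to be interleaved by word (even words in one bank, odd
words in the other), and each bank is assumed to deliver one word per
access. The function names are made up for illustration.

```python
# Hypothetical model: two 16-bit banks, interleaved by word.
# Byte address a belongs to word a // 2; word w lives in bank w % 2.

def words_touched(addr, nbytes):
    """Return the sorted 16-bit word indices covered by the fetch."""
    return sorted({(addr + i) // 2 for i in range(nbytes)})

def fits_in_one_access(addr, nbytes):
    """True if no bank is asked for more than one word."""
    banks = [w % 2 for w in words_touched(addr, nbytes)]
    return len(banks) == len(set(banks))

for nbytes in (3, 4):
    # Try every start address in a 64K space that leaves room for the fetch.
    ok = all(fits_in_one_access(a, nbytes)
             for a in range(0x10000 - nbytes + 1))
    print(nbytes, "bytes always fit in one access:", ok)
```

Under this model a 3-byte pull always touches exactly two adjacent
words, one per bank. A 4-byte pull from an odd address touches three
words, so one bank would have to be read twice.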

<snip>

And (again) I don't agree with Michael when he says that :

"And there goes Apple peripheral compatibility--I'm out."


This does not mean that.

It's the MMU's task to interface to the real Apple II in which this thing
should (hopefully, sometime) be plugged in.

Then you will have to slow down to 1MHz in the I/O space, and beyond a
relatively modest speedup (say 20x), it will not change system-level
performance much if you go faster between slowdowns--that was my point.
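
The diminishing return can be put in numbers with an Amdahl-style
estimate. This is a rough sketch, not a measurement: the 5% share of
run time spent in 1 MHz I/O space is a made-up figure, and the function
name is only illustrative.

```python
def overall_speedup(io_fraction, cpu_speedup):
    """Amdahl-style estimate of system speedup.

    io_fraction: share of the original run time spent in I/O space,
                 which stays at 1 MHz (assumed value, not measured).
    cpu_speedup: speedup of everything outside I/O space.
    """
    return 1.0 / (io_fraction + (1.0 - io_fraction) / cpu_speedup)

IO_FRACTION = 0.05  # hypothetical; depends entirely on the application

for s in (5, 20, 100, 1000):
    print(f"{s:5d}x core -> {overall_speedup(IO_FRACTION, s):5.1f}x system")
```

Whatever the core speedup, the system speedup cannot exceed
1 / io_fraction, which is 20x for the assumed 5%.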

To understand this quantitatively, you have to pick an application or
a suite of applications that are important to users.

2. Write an emulator for whatever processor you can clock at ridiculous
speeds (much higher than you can clock an FPGA) and hope that you can
optimise it enough to outpace option #1. Don't forget you need to
emulate all I/O in the system including video, which takes a large chunk
out of your headroom.


Given a certain FPGA,
Given the desire to execute *6502* code,
If you're looking for a fast *6502* instruction set processor...
Why would you put inside a *reduced* instruction set processor instead ?

....thanks, but no, thanks.


To suggest you can emulate a 6502 on a RISC processor core within an
FPGA faster than a native optimised 6502 implementation is utterly
ludicrous.


I believe what you say:

Given *the same* hardware resources (a certain FPGA, for example),
A direct implementation in hardware of a 6502 instruction set processor
has to run necessarily faster than a reduced instruction set processor
implementation emulating it in software.

It depends on the relative speed of memory. From the 1970's to the
1980's, logic was so much faster than memory (10x+) that it made
perfect sense *not* to implement the application architecture in
hardware, but to implement what was, for all the world, either a RISC
or VLIW processor in hardware and then microprogram (emulate) the
target architecture on it.

For any sufficiently memory-based architecture today, the same rules
will apply. If the logic can do 10 cycles in the time it takes to
access "main memory", then an interpretive implementation is quite
reasonable, and can perform indistinguishably from a direct hardware
implementation that is similarly paced by memory speed.

When you do things in parallel at different speeds, the slowest one
determines the time to completion. Having the faster one go faster
than that doesn't help.
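
As a toy illustration of that pacing argument, assume memory accesses
and logic work overlap completely, so the time per instruction is the
larger of the two. All the figures below (3 memory accesses, 3 logic
cycles for a direct design, 25 for an interpretive one) are invented
for the example.

```python
def time_per_instruction(mem_accesses, mem_time, logic_cycles, logic_time):
    """Memory and logic run in parallel; the slower one sets the pace."""
    return max(mem_accesses * mem_time, logic_cycles * logic_time)

LOGIC_TIME = 1     # one logic cycle, arbitrary time unit
MEM_ACCESSES = 3   # hypothetical memory accesses per 6502 instruction
DIRECT = 3         # hypothetical logic cycles, direct hardware 6502
INTERPRETED = 25   # hypothetical logic cycles, microcoded/emulated 6502

for mem_time in (10, 1):  # memory 10x slower than logic, then equal speed
    direct = time_per_instruction(MEM_ACCESSES, mem_time, DIRECT, LOGIC_TIME)
    interp = time_per_instruction(MEM_ACCESSES, mem_time, INTERPRETED, LOGIC_TIME)
    print(f"mem_time={mem_time:2d}: direct={direct}, interpreted={interp}")
```

With memory 10x slower than logic, both designs come out paced by
memory at the same time per instruction. With memory as fast as logic,
the direct design wins by the ratio of logic cycles.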

The only reason I can think of to prefer going this way is that you can
pick a built, fast, tested, reliable, off-the-shelf risc core so you
don't have to design a processor.

A practical advantage.

And although speed may not be an advantage then, there's one BIG
advantage, one that I like a lot: it is that the 6502 living inside it
is just software, and therefore can be easily debugged, modified,
upgraded, etc.

A capability advantage (of course, an FPGA "hard" implementation isn't
really *hard* either--but it may be more *difficult* ;-).

-michael

Music synthesis for 8-bit Apple II's!
Home page: http://members.aol.com/MJMahon/

"The wastebasket is our most important design
tool--and it is seriously underused."