Re: Chuck's plan



On Sat, 30 May 2009 03:41:33 +1000, Jeff Fox <fox@xxxxxxxxxxxxxxxxxxx> wrote:

On May 29, 12:26 am, Wayne <news_putmynamehere...@xxxxxxxxxxxxxxx>
wrote:

So many questions. I thought they had all been answered multiple
times in the past. It is sometimes dangerous to answer a dozen

I was not around for all those discussions and forums. I tend to answer questions for other people when I have the ability, but I have been on groups where people just ignore questions and then complain if they see one that was answered in the distant past. They don't respond much to kindness, so I have now decided that if they don't like answering questions, they should start a wiki. They probably won't be pleased, but they are not too pleasing anyway.

questions when someone asks because they may turn each answer
into a dozen more questions and other people often join in and

Scientific exploration to find the facts, as some answers might leave people scratching their heads. Then, unlike real life, there are only so many questions on these things.

ask the same questions again also. And one can get seriously
misquoted in the process.

Ask what questions again? Clarify, maybe, but I have not seen anything in the G range.

I am interested in what in particular stopped it
from having an automated memory bus, rather than the software driven bus?

A decade ago the on-chip architecture was asymmetric. F21
had a Forth CPU, a video I/O coprocessor, an analog I/O
coprocessor, a network router, realtime clock, and a memory
interface coprocessor. If you added memory you could make
a symmetric Forth multiprocessor with multiple nodes. But
the design being asymmetric meant that each part of the
design was different, had to be designed separately, had
to be tested separately and had to be programmed separately.
Furthermore the memory interface being hard-wired could only
support whatever chip interface was chosen at design time.
This required predicting which memory chips will be most
available and most inexpensive in the future which isn't
always possible.

Fairly easy to have guessed that; whether DDR-? continues is another thing. But I see what you may have meant about the memory bus controller. I was thinking of the SRAM-like interfaces (for cache chips) that Chuck originally announced in the early part of the century (similar to what low-cost 1-transistor memory chips could carry). Of course, things like DDR and other mass-market interfaces would require a much larger memory interface controller than the more straightforward SRAM interface (which is still much better than nothing).

A design decision was made a decade ago to go to symmetric
multiprocessing on-chip. Each node would be the same, so
there is only one processor to design not a half dozen
different ones for every chip designed. The idea was
that things done with dedicated hardware before would be
done with a Forth core and software this way.

That is at 10MHz (a figure that, it now turns out, came from a mistaken figure given as 10mwps) on multiple cores, compared to 1GHz on one core. I remember putting forward the idea of multiple cores being the same, but will having one that has an SRAM interface break the bank? I know that at this stage reconfigurability for different markets is prominent, so custom interfaces for sound and graphics are not important (however, an SRAM core with DACs could be a good use for an optional SRAM interface core). However, for many mid to high level markets, the functionality of an SRAM interface (or on-chip 1-transistor RAM) is a valuable option (probably more of an on-chip memory option these days).

On the memory side it means that the interface is more
flexible and can support a wider range of functions.
On the design side it means one design instead of
the thirty different designs Chuck did in the past.

Is that not a bit extreme, compared with having two cores? (And you actually have various interface options on different cores as it is.)


I would have thought that it would not be too much of a strain, as only
one core needed it?

Once a decision is made to use a symmetric design instead of
designing thirty different coprocessor systems then there is
the requirement that all cores be the same. Having one node
be completely different and be like the design from twenty
years earlier would violate design goals and be too much
of a strain by definition.

Hardly, but I understand the idealism.

It is like the group that insisted that a more powerful
router more independent and using less software for
interprocessor communication between nodes would
improve performance. People often ask, why not just
add this or that? And you can tell them that the thing
that they are describing makes each core ten times
as big and expensive and power hungry and isn't going
to give a ten times increase in overall performance.

I understand where you are coming from, but I am just thinking in terms of performance per mW: if an improvement can raise the ratio it is valuable. On the other hand, some applications require the absolute minimum power. A hybrid between master/slave and symmetrical makes sense, and the master can be very compatible with the programs on the slaves. At the same time, the current system requires very specific code on each of the cores, and the external cores carry additional functions, so you must mean the future G series is different from the IntellaSys (unless you are talking about pure instruction set symmetry). However, two cores is a lot less than 30.

However, most of the things I ask for mean less rather than more in the final design requirements of a system, but then you might not appreciate the level of simplicity I am aiming at compared to regular approaches.

It is a little like the suggestion that what needs to be
done is to go back to programmable logic like more than
twenty years ago.

There are precious few FPGAs that offer any performance advantage over the average chip, except for the complex modules/memory/IO they include.

not such a bottleneck to every other core, and can drag in data faster,
however, I am interested in how this played into the design?

I think the idea of the symmetric design has been explained before.

Am, here a number of times incidentally. I don't actually remember it being covered in any of the posts I read in times past (but then my ISP's news server did crash before I went away, I had no updates for 2 or 3 months, and I was away for 5 years doing camera projects). The information on the IntellaSys was way too sparse and spread out for everybody to keep track of (such as at the FIG Forth meetings in California), and it was not on a wiki.

I know you mean each bit takes less than 250MHz; that is a lot lower than
the core frequency, or what intra chip buses now deliver.

Not really. If you service an external 18-bit port or a neighbor port
in a loop and it has a 4ns read and 4ns write and a 2ns loop branch it
is going to transfer 18 bits every 10ns whether it is between
neighbors on-chip or to an external port or device.
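
As a rough check of the arithmetic in that paragraph, here is a minimal sketch in Python. The timings (4ns read, 4ns write, 2ns branch, 18-bit words) are the ones quoted above; the function and constant names are made up for illustration and are not from any vendor tool.

```python
# Timings quoted in the post; names here are hypothetical.
READ_NS = 4      # port read
WRITE_NS = 4     # port write
BRANCH_NS = 2    # loop branch
WORD_BITS = 18   # width of one transferred word

NS_PER_SECOND = 1_000_000_000


def loop_period_ns(read_ns=READ_NS, write_ns=WRITE_NS, branch_ns=BRANCH_NS):
    """Time for one pass of a read/write/branch transfer loop."""
    return read_ns + write_ns + branch_ns


def words_per_second(period_ns):
    """Words moved per second when one word moves per loop pass."""
    return NS_PER_SECOND // period_ns


period = loop_period_ns()
rate = words_per_second(period)
print(period, "ns per word")
print(rate // 1_000_000, "million words per second")
```

Under these assumptions a 10ns loop moves 100 million words per second, so it corresponds to 100MHz rather than 10MHz.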

Sorry, I thought we were talking serial inter-core.

It may sound low compared to a 700MHz core, but the idea is that
the core has the size, cost, and power consumption of chips that
are a hundred times slower. The 10ns 100MHz loop looks pretty fast
compared to other small and cheap designs. You can't get
everything at once, there are tradeoffs.

Yes, and there is a long way to go. However, at this stage, more performance per core means the ability to remove cores and complexity.

comp.arch.embedded the fastest chips out there can only bit-bang
protocols at 4Mbps maximum and apparently no one else can do
it at 20 to 30Mbps.

Still, if it means more cores to reach a performance level then the added functionality (when you don't see a question mark after a question I am usually being rhetorical instead of asking a question).

I would have expected that a word could have been dumped to the serial bus (link wakes up other core with first bit, core software reads port to tos, other bits automatically follow, streamed across and dumped into tos) and streamed across to the receiving core at 700MHz*18, well within the boundaries of present intra chip busses.

Are you joking? That is what happens with the serdes. You read or
write a word and it gets converted to or from a serial stream by
hardware.
But 700MHz * 18 bits would be 12.6GHz. The idea of 12.6GHz serial
connections in .18u is very unrealistic.
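
The multiplication above can be sketched as follows, assuming one 18-bit word is serialized per 700MHz core cycle over a single wire. The function names are hypothetical and only illustrate the calculation.

```python
WORD_BITS = 18  # bits per word, as in the post

PS_PER_SECOND = 1_000_000_000_000


def serial_line_rate_hz(word_rate_hz, bits_per_word=WORD_BITS):
    """Bit clock needed to move one word per word-clock over one wire."""
    return word_rate_hz * bits_per_word


def bit_time_ps(line_rate_hz):
    """Duration of a single bit on the wire, in picoseconds."""
    return PS_PER_SECOND / line_rate_hz


line_rate = serial_line_rate_hz(700_000_000)
print(line_rate, "Hz serial bit clock")
print(bit_time_ps(line_rate), "ps per bit")
```

With these numbers the serial bit clock comes to 12.6GHz, which leaves a little under 80ps per bit.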

If you insist, ask Chuck first.

10ns loop in software.

Ok, thanks for that. My goodness, that is slow, even for software, that is
10MHz.

No 10ns is 100MHz not 10MHz. ;-)

You said:
You can pump two parallel ports at ~10mwps each
I took that to mean mega words (18-bit) per second, Jeff. If it is 100 mega words per second, thanks, that is great, and much better. All my zeros were there. This abates the need for a custom SRAM interface a bit.

I know that zeros confound some programmers but you seem to have
missed one. I would remind you that we have had discussions in
c.l.f about bit-banging a square wave on a 60MHz ARM at 200KHz
with an interpreted threaded Forth in Flash, at 2MHz using
an optimizing "C" compiler, and at 4MHz using an optimizing
native code Forth compiler for ARM. The difference between
4 and 10 is big but the difference between 4 and 100 is more.

Jeff, who has bit-banged a square wave on an ARM in the last ten years? They might, but I would have thought the preference for higher speed interfaces is hardware control.

You have demoted the Forth chip down to 10MHz from 100MHz.
10MHz is still faster than most stuff but you have missed
one of those pesky zeros. You keep making public statements
off by one order of magnitude on this. Answering usenet
questions can be like a tar baby.

Actually I didn't, but you did, obviously. Quote:
You can pump two parallel ports at ~10mwps each

I am not here to be bullied into accepting that everything I say that is right is wrong, which I have been getting lately. If you want logical, objective debate, that is fine.

So, you are saying SRAM would be restricted to 10MHz, not 100 or
200MHz?

No that's not what I said at all. I said that the fastest you can
stream a parallel port is 100MHz at 18-bits or 1.8Gbps and
with two this is a maximum of 3.6Gbps or 200Mwps. I also
explained in detail that that is not the signal fed by memory
chips.
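
The port figures above can be reproduced with a small sketch, assuming each parallel port moves one 18-bit word per 10ns software loop and that ports add linearly. The names are illustrative only, and this ignores the address and control signals a real memory chip would need.

```python
NS_PER_SECOND = 1_000_000_000


def port_throughput(ports, loop_ns=10, bits_per_word=18):
    """Return (words per second, bits per second) for `ports` parallel ports.

    Assumes one word per port per loop pass and no other overhead.
    """
    words = ports * (NS_PER_SECOND // loop_ns)
    return words, words * bits_per_word


for n in (1, 2):
    words, bits = port_throughput(n)
    print(n, "port(s):", words // 1_000_000, "Mwps,", bits / 1e9, "Gbps")
```

One port comes to 100Mwps and 1.8Gbps, and two ports to 200Mwps and 3.6Gbps, matching the figures quoted.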

I was clarifying, as I thought your previous statement (quoted twice) was surprising. Now that we have cleared that up, that is good; you don't have to repeat it.

Memory chips have address bus and control bus signals in addition
to their databus signals. When you factor those things in you drop
below 100MHz. But no a 10ns loop is not running at 10MHz! :-(

Sorry, when you said 10ns, I thought you were referring to serial i/o (have a look at your original post) as 10ns did not equal the subsequent 10mwps statement.

[further clarification to previous 18bit serial suggestions, and the number of variations]

There are literally thousands of design constraints that one has to
be compatible with when making any change. The above paragraph would
need a couple of man months of discussion to flesh out into anything
real. If someone can sit down and design a circuit in a month or
two and show Chuck how it works he is happy. If someone expects him
to stop doing his work and discuss the design changes that they
think might be possible for a long time he is not so interested.

Jeff, pick what you want.

You quickly learn that he had thought through what you usually
think is a new idea to you and after a few hours of explanations
about how other things work and related constraints in other
parts of the design you realize that Chuck considered what
seemed like a new idea to you long ago.

Jeff, there is thinking through and there is better thinking through. If I had said, for example, that an SRAM interface might be desirable, that might seem a new idea to some people, but it does not mean it is one; it is just a suggestion. Chuck and his design skills are not God, and there is still room for other people. I appreciate the direction that Chuck is aiming for, and that it is a valid approach, though not necessarily the only or best one. I usually find that people who say an idea is an old idea are full of it, have not thought it through, and have less ability to do so, especially the non-designers who travel on the coat tails of other people's successes in design (most professionals and hobbyists). It usually turns out that the modification I make to an old idea makes it work, and that they lack the skill to assess the difference between the old idea and the modification. I also appreciate that you have insight I lack, and that I am not necessarily correct, but invariably when people pull the superior on me, they turn out to be wrong eventually. Even if Chuck were to examine it, I cannot guarantee that he will always be correct; that is just the nature of logic and the human condition. However, I respect your knowledge, so are you sure, Jeff?

Removed other comments on 10ns thing.

I found it funny that people used to Windows insisted that
a 4megabyte address space was not big enough to do anything
despite most applications being about 1K in size.

I have been a fan of efficiency, and have a version of Geos Ensemble, an integrated object-oriented OS and office suite, as proof of what can be done in several megabytes (it used to be a few, and less than one for the OS and word processor). To think of all the things we could do with 32 MB, and people nowadays think that 1GB is not that much.

Integrated mixed chips. Misc: Control core with large memory + array DSP cores, + custom cores and bits and pieces, anything can be turned on or
off at any time.

That sounds very conventional and like what most people do.

Just a bit of a hint, except they do it badly.

Putting RAM in a core was new to Chuck and the design has evolved
over the years. There are obvious advantages to having more memory
on-chip. One variation would be to have core sized chunks of
on-chip memory to allow two basic building blocks, core and ram.
In the future there may be chips with larger on-chip memory
or more multiply hardware or more sophisticated and higher
speed I/O circuits. But who knows.

Thanks for that.

[How cheap a mobile, these days]

$18 retail is the unsubsidised cost, the whole thing, no hidden
costs in a bundled usage plan. The processor cost was around $1
so there wasn't a large margin in reducing processor cost. The things
are incredibly complex for an $18 appliance and unless you own
the plants that make the components and are dealing with very
large volume you can't compete with that or hope to recover
development costs.

Amazing cost reduction. I thought that the previous announcements were not that crash hot (there are also many Asian manufacturers outside of the name brands).


Best Wishes

Well, thanks. Sorry if I have been a bit straightforward, but after recent posts I am getting a little tired.

Forgot to make one addition/modification, my apologies.


