Re: the 'switch' limit...
On Sun, Nov 01, 2009 at 12:59:06PM -0700, BGB / cr88192 wrote:
> but, the hash is not used for opcode lookup/decoding, rather it is used for
> grabbing already decoded instructions from a cache (which is based on memory
You could extend this to work on blocks of instructions rather than on single
instructions, where blocks start at places that are jumped to by the program.
If you do that, you will have to do the decoding of x86 code and the hash
table lookups far less often. I believe this is what QEMU does (they call
it "dynamic translation").
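
To make the idea concrete, here is a rough sketch of a block-level decode cache. It is only an illustration under some loud assumptions: the guest instruction set is a made-up three-opcode toy (not x86), the cache is a direct-mapped table indexed by pc, and all names (Vm, Block, get_block, run_demo) are hypothetical. It is not how QEMU or your emulator is actually structured.

```c
#include <stdint.h>
#include <string.h>

/* Toy guest ISA (hypothetical, NOT x86):
 *   OP_INC     acc++
 *   OP_DJNZ t  if (--counter != 0) jump to t   (ends a block)
 *   OP_HALT    stop                            (ends a block)  */
enum { OP_INC = 1, OP_DJNZ = 2, OP_HALT = 3 };

typedef struct { uint8_t op; uint8_t arg; } Insn;

#define MAX_BLOCK  16   /* max decoded instructions per block */
#define CACHE_SIZE 64   /* direct-mapped cache, indexed by pc */

typedef struct {
    int      valid;
    uint32_t start_pc;          /* guest address the block starts at */
    uint32_t next_pc;           /* fall-through address after the block */
    int      n;                 /* number of decoded instructions */
    Insn     insns[MAX_BLOCK];
} Block;

typedef struct {
    const uint8_t *mem;         /* guest code, assumed well formed */
    uint32_t pc;
    int acc, counter;
    Block cache[CACHE_SIZE];
    unsigned lookups, decodes;  /* statistics only */
} Vm;

/* One cache lookup per block. On a miss, decode forward until an
 * instruction that can transfer control, or until the block is full. */
Block *get_block(Vm *vm, uint32_t pc)
{
    Block *b = &vm->cache[pc % CACHE_SIZE];
    vm->lookups++;
    if (b->valid && b->start_pc == pc)
        return b;

    b->valid = 1;
    b->start_pc = pc;
    b->n = 0;
    while (b->n < MAX_BLOCK) {
        Insn in = { vm->mem[pc], 0 };
        pc += 1;
        if (in.op == OP_DJNZ) {         /* DJNZ carries a 1-byte target */
            in.arg = vm->mem[pc];
            pc += 1;
        }
        b->insns[b->n++] = in;
        vm->decodes++;
        if (in.op == OP_DJNZ || in.op == OP_HALT)
            break;                      /* control transfer ends the block */
    }
    b->next_pc = pc;
    return b;
}

/* Main loop: fetch a whole block, then run its decoded instructions
 * without touching the cache again until the block ends. */
void run(Vm *vm)
{
    for (;;) {
        Block *b = get_block(vm, vm->pc);
        uint32_t next = b->next_pc;
        for (int i = 0; i < b->n; i++) {
            Insn in = b->insns[i];
            switch (in.op) {
            case OP_INC:  vm->acc++; break;
            case OP_DJNZ: if (--vm->counter != 0) next = in.arg; break;
            case OP_HALT: return;
            }
        }
        vm->pc = next;
    }
}

/* Demo: a loop of three INCs executed 100 times, then HALT. */
Vm run_demo(void)
{
    static const uint8_t prog[] = {
        OP_INC, OP_INC, OP_INC, OP_DJNZ, 0, OP_HALT
    };
    Vm vm;
    memset(&vm, 0, sizeof vm);
    vm.mem = prog;
    vm.counter = 100;
    run(&vm);
    return vm;
}
```

In the demo the guest executes 401 instructions, but the cache is consulted only 101 times (once per block entry) and only 5 instructions are ever decoded. A per-instruction cache would need one lookup for each of the 401. A real implementation would also have to invalidate blocks when the guest writes to code it has already decoded, which this sketch ignores.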
Whenever people say "I thought", they were usually wrong.