Re: System Generator pcore I/O performance results



Newman,

Thanks for writing back.

I tried:
1. starting the timer
2. writing 8 samples
3. reading the timer
4. dividing the timer result by 8

This gave me an average write time of 20 clock cycles (cc's), so it did
lower it somewhat.
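
The averaging measurement described above could be sketched roughly as follows. This is only an illustration: timer_start(), timer_read() and write_sample() are hypothetical stand-ins (not the EDK timer driver or the generated findavg_sm_0_Write() call), and the timer is simulated in software so the sketch runs on a host. The 20-cycle cost is simply the figure reported in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated cycle counter. On the board this would be the hardware timer;
   these helper names are hypothetical. */
static uint32_t sim_cycles;

static void timer_start(void) { sim_cycles = 0; }
static uint32_t timer_read(void) { return sim_cycles; }

/* Stand-in for one shared-memory write. The 20-cycle cost is the average
   reported in this post, used here only to drive the simulation. */
static void write_sample(unsigned idx, uint16_t value)
{
    (void)idx;
    (void)value;
    sim_cycles += 20;
}

/* Average cycles per write over n back-to-back writes.
   On real hardware a loop adds its own overhead; unrolled calls
   (as in the original test) avoid that. */
static uint32_t avg_write_cycles(const uint16_t *samples, unsigned n)
{
    assert(n > 0);
    timer_start();
    for (unsigned i = 0; i < n; i++)
        write_sample(i, samples[i]);
    return timer_read() / n;
}
```

Timing several writes in one window spreads the fixed cost of starting and reading the timer across all of them, which is one plausible reason the per-write figure drops compared with timing a single write.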

It's interesting...I'm finding that it takes 21 cc's to read/write
data from/to external SRAM. I would think that the FSL link should be
*much* faster since it's accessing memory on-chip. In fact, the
mb_ref_guide states a latency of 2 cc's for using non-blocking "put"
and "get" operations for transferring data over FSL. Blocking
accesses stall until there is space available on the FSL. What I am
doing is a very simple design, and there shouldn't be any blocking, at
least not from the program I am implementing. There must be some way
to get better performance than what I'm seeing.
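
To make the blocking versus non-blocking distinction concrete, here is a small host-side model of a FIFO link. It is a sketch under stated assumptions, not the FSL hardware or the MicroBlaze API: fifo_t, fifo_nput(), fifo_nget() and the depth of 16 are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define FIFO_DEPTH 16u  /* assumed depth, for illustration only */

typedef struct {
    uint32_t data[FIFO_DEPTH];
    unsigned head;   /* index of the oldest word */
    unsigned count;  /* number of words currently stored */
} fifo_t;

/* Non-blocking put: fails immediately when the FIFO is full.
   A blocking put would stall at this point until space frees up. */
static bool fifo_nput(fifo_t *f, uint32_t word)
{
    assert(f);
    if (f->count == FIFO_DEPTH)
        return false;
    f->data[(f->head + f->count) % FIFO_DEPTH] = word;
    f->count++;
    return true;
}

/* Non-blocking get: fails immediately when the FIFO is empty. */
static bool fifo_nget(fifo_t *f, uint32_t *word)
{
    assert(f);
    if (f->count == 0)
        return false;
    *word = f->data[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return true;
}
```

In this model a writer only ever waits when the consumer has let the FIFO fill up, which is why a simple design that drains the link promptly should not see stalls from blocking.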

I'm not implementing cache with this design.

I looked at main.s and couldn't really make much sense of the assembly
code. I did searches for put, get, fsl and found nothing. I would be
interested to know how the compiler is translating to machine code as
well...is there some option for seeing the C code interspersed with the
related assembly? I set the compiler options to no optimization and to
create symbols for assembly.

Joel

On Apr 10, 10:42 pm, "Newman" <newman5...@xxxxxxxxx> wrote:
On Apr 10, 11:34 pm, "Newman" <newman5...@xxxxxxxxx> wrote:





On Apr 10, 12:12 pm, "eejw" <wilder_j...@xxxxxxxxxxx> wrote:

Sorry...typo

16-bit word (not "16-byte word") in passing data from MB -> pcore.

On Apr 10, 11:07 am, "eejw" <wilder_j...@xxxxxxxxxxx> wrote:

Hello all:

I have a question regarding using SysGen to create a co-processor
that's used in a microblaze design. I'm using EDK v9.1 through the
base system builder wizard to create a design used on a Xilinx ML401
dev. board.

I've already generated a simple pcore and connected that to the
microblaze proc. in EDK. Data are being passed from MB -> pcore and
pcore -> MB through shared memory (using the "from register" and "to
register" in SysGen).

Using the provided function calls for communicating from MB -> pcore,
I do the following:

findavg_sm_0_Write(FINDAVG_SM_0_D0,FINDAVG_SM_0_D0_DIN, datasamp[0]);
findavg_sm_0_Write(FINDAVG_SM_0_D1,FINDAVG_SM_0_D1_DIN, datasamp[1]);
findavg_sm_0_Write(FINDAVG_SM_0_D2,FINDAVG_SM_0_D2_DIN, datasamp[2]);
etc.

To check performance, I start timer, do function call to write shared
memory, then read value from timer.

So it's just:

//start timer
findavg_sm_0_Write(FINDAVG_SM_0_D0,FINDAVG_SM_0_D0_DIN, datasamp[0]);
//read count register

I'm seeing that it takes 28 clock cycles to pass a 16-byte word from
MB -> pcore in this way. This seems *way* too long.

To improve performance, the API documents that were generated when I
created the pcore suggest to remove this line in the xparameters.h
file:

#define FINDAVG_SM_0_SG_ENABLE_FSL_ERROR_CHECK

I did that, but it doesn't help.

I didn't do anything special regarding connecting my pcore to the MB.
Just added it through the Hardware -> Configure coprocessor... tool in
EDK which connects the pcore to MB through an FSL.

Has anyone investigated this and can share any words of wisdom?

thanks,
Joel


could start timer
do 4 writes to different locations
then read the elapsed value
divide value by 4 manually

it would be interesting to see if the value is still 28 clocks
does MB have a cache?
chipscope or simulation would highlight what's going on

Newman

findavg_sm_0_Write(FINDAVG_SM_0_D0,FINDAVG_SM_0_D0_DIN, datasamp[0]);

also, disassemble the write function to see how efficiently it
compiled the instruction
I would think that it should be around 1 assembly op
