Re: System Generator pcore I/O performance results
- From: "eejw" <wilder_joel@xxxxxxxxxxx>
- Date: 11 Apr 2007 08:54:49 -0700
Newman,
Thanks for writing back.
I tried:
1. starting the timer
2. writing 8 samples
3. reading the timer
4. dividing the timer result by 8
This gave me an average write time of 20 cc's (clock cycles), so it did
lower it some.
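For reference, the averaging measurement can be sketched in C roughly as
follows. This is only a sketch under assumptions: read_timer() and
write_sample() are hypothetical stand-ins (on the board they would be the
timer's count register read and the generated findavg_sm_0_Write() call),
the 20-cycle cost in the stub is just the figure measured above, and
timer_overhead is an assumed fixed cost of the timer read itself, which
may be zero.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the hardware timer's count register. */
static uint32_t fake_timer = 0;

static uint32_t read_timer(void) { return fake_timer; }

/* Hypothetical stand-in for one shared-memory write; the stub simply
   advances the fake timer by an assumed 20 cycles per write. */
static void write_sample(uint16_t value) {
    (void)value;
    fake_timer += 20;
}

/* Average cycles per write over n back-to-back writes.
   timer_overhead is an assumed fixed cost to subtract once. */
static uint32_t avg_cycles_per_write(const uint16_t *samples, uint32_t n,
                                     uint32_t timer_overhead) {
    assert(n > 0);
    uint32_t start = read_timer();
    for (uint32_t i = 0; i < n; i++) {
        write_sample(samples[i]);
    }
    /* Unsigned subtraction stays correct across one counter wrap. */
    uint32_t elapsed = read_timer() - start;
    if (elapsed > timer_overhead) {
        elapsed -= timer_overhead;
    } else {
        elapsed = 0;
    }
    return elapsed / n;
}
```

Spreading one timer read over several writes is what makes any fixed
measurement cost shrink in the per-write figure, so a larger n should
give a number closer to the true cost of a single write.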
It's interesting...I'm finding that it takes 21 cc's to read/write
data from/to external SRAM. I would think that the FSL link should be
*much* faster since it's accessing memory on-chip. In fact, the
mb_ref_guide states a latency of 2 cc's for using non-blocking "put"
and "get" operations for transferring data over FSL. Blocking
accesses stall until there is space available on the FSL. What I am
doing is a very simple design, and there shouldn't be any blocking, at
least not from the program I am implementing. There must be some way
to get better performance than what I'm seeing.
I'm not implementing cache with this design.
I looked at main.s and couldn't really make much sense of the assembly
code. I did searches for put, get, fsl and found nothing. I would be
interested to know how the compiler is translating to machine code as
well...is there some option for seeing the C code interspersed with the
related assembly? I set the compiler options to no optimization and to
create symbols for assembly.
Joel
On Apr 10, 10:42 pm, "Newman" <newman5...@xxxxxxxxx> wrote:
On Apr 10, 11:34 pm, "Newman" <newman5...@xxxxxxxxx> wrote:
On Apr 10, 12:12 pm, "eejw" <wilder_j...@xxxxxxxxxxx> wrote:
Sorry...typo
16-bit word (not "16-byte word") in passing data from MB -> pcore.
On Apr 10, 11:07 am, "eejw" <wilder_j...@xxxxxxxxxxx> wrote:
Hello all:
I have a question regarding using SysGen to create a co-processor
that's used in a microblaze design. I'm using EDK v9.1 through the
base system builder wizard to create a design used on a Xilinx ML401
dev. board.
I've already generated a simple pcore and connected that to the
microblaze proc. in EDK. Data are being passed from MB -> pcore and
pcore -> MB through shared memory (using the "from register" and "to
register" in SysGen).
Using the provided function calls for communicating from MB -> pcore,
I do the following:
findavg_sm_0_Write(FINDAVG_SM_0_D0,FINDAVG_SM_0_D0_DIN, datasamp[0]);
findavg_sm_0_Write(FINDAVG_SM_0_D1,FINDAVG_SM_0_D1_DIN, datasamp[1]);
findavg_sm_0_Write(FINDAVG_SM_0_D2,FINDAVG_SM_0_D2_DIN, datasamp[2]);
etc.
To check performance, I start timer, do function call to write shared
memory, then read value from timer.
So it's just:
//start timer
findavg_sm_0_Write(FINDAVG_SM_0_D0,FINDAVG_SM_0_D0_DIN, datasamp[0]);
//read count register
I'm seeing that it takes 28 clock cycles to pass a 16-byte word from
MB -> pcore in this way. This seems *way* too long.
To improve performance, the API documents that were generated when I
created the pcore suggest to remove this line in the xparameters.h
file:
#define FINDAVG_SM_0_SG_ENABLE_FSL_ERROR_CHECK
I did that, but it doesn't help.
I didn't do anything special regarding connecting my pcore to the MB.
Just added it through the Hardware -> Configure coprocessor... tool in
EDK which connects the pcore to MB through an FSL.
Has anyone investigated this and can share any words of wisdom?
thanks,
Joel
could start timer
do 4 writes to different locations
then read the elapsed value
divide value by 4 manually
it would be interesting to see if the value is still 28 clocks
does MB have a cache?
chipscope or simulation would highlight what's going on
Newman
findavg_sm_0_Write(FINDAVG_SM_0_D0,FINDAVG_SM_0_D0_DIN, datasamp[0]);
also, disassemble the write function to see how efficiently it
compiled the instruction
I would think that it should be around 1 assembly op