Re: Spartan 3E Not enough block ram.



Ken Soon wrote:

Yup I saw this
# RAMs : 24
16x64-bit dual-port distributed RAM : 6
1920x12-bit dual-port block RAM : 9
1920x12-bit registered dual-port distributed RAM : 3
4096x36-bit dual-port block RAM : 1
4096x9-bit dual-port block RAM : 2
8x64-bit dual-port distributed RAM : 3
Since then I have been playing around with my code, and I can identify
which instances in the code are using which kind of RAM.

Currently I have tried to change some of the instances to use distributed
RAM, by forcing the constraint RAM_STYLE to be pipe_distributed.
And, going down the list of instances that use block RAM, when I changed
it for my 6 horizontal and 3 vertical coefficient instances, voilà, it
immediately dropped down to 39 out of 36 block RAMs!
Hmm, strange, it dropped so much.

This is not strange: block RAM ports are 36 bits wide at most. Your very small x64 memories were previously forced into BRAMs, so each of them ended up costing two BRAMs. With three 8x64 and six 16x64 RAMs, that is 18 BRAMs recovered right there.
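To make the arithmetic explicit, here is a rough Python sketch of the estimate. It assumes the usual Spartan-3 generation block RAM aspect ratios (check the data sheet for your part) and that a memory is tiled from a single aspect ratio; the function name is made up for illustration, and the real mapping chosen by the tools may differ.

```python
import math

# Assumed block RAM aspect ratios (depth, width), parity bits included.
BRAM_SHAPES = [(16384, 1), (8192, 2), (4096, 4), (2048, 9), (1024, 18), (512, 36)]

def brams_needed(depth, width):
    """Rough estimate: best single aspect ratio, tiled in depth and width."""
    return min(math.ceil(depth / d) * math.ceil(width / w) for d, w in BRAM_SHAPES)

# (depth, width, instance count) for the small x64 memories in the report
small_rams = [(8, 64, 3), (16, 64, 6)]

# Each x64 memory needs two 36-bit-wide BRAMs no matter how shallow it is.
recovered = sum(brams_needed(d, w) * n for d, w, n in small_rams)
print(recovered)
```

Under these assumptions the nine x64 memories account for 2 BRAMs each, which is where the 18 comes from.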

Anyway, next I tried to change some more instances to use distributed
RAM. However, I have to be careful to keep a balance and not overshoot
the LUTs I have, as well as the block RAMs.
Then I worked with some line buffer modules under my top modules and
after synthesis, everything was well under the resource limits. However, the
problem came when I tried to implement it. The user constraint file that belonged
to the trial synthesis had a timing constraint and my design timing was twice over
this constraint.
Then I used the timing analyzer and cross-probed the problem and could see
that the path looked to be quite long.

Distributed RAM is slow unless you give it several output register stages for synthesis to redistribute. Each LUT provides only 16 bits, and these are patched together with muxes to build larger memories. Your address signals will also have huge fanout, which further contributes to the slowness. Since your 1920x12 distributed RAM probably absorbed only one register, the very long paths you are seeing run from the address bits part of the way through the output muxes to the absorbed FFs, and then from those FFs through the remaining muxes to the destination FFs.
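A back-of-the-envelope Python sketch shows how much logic hides behind a memory of this size. It only counts 16x1 LUT RAM cells and the depth of an idealized tree of 2:1 muxes; the function name is hypothetical, and a dual-port cell may occupy more than one LUT, so the real LUT count can be higher.

```python
import math

LUT_BITS = 16  # one 4-input LUT used as RAM holds 16 x 1 bits

def distributed_ram_cost(depth, width):
    """Return (16x1 cells, 2:1 mux levels per output bit) for a depth x width RAM."""
    cells_per_bit = math.ceil(depth / LUT_BITS)
    # Idealized binary mux tree selecting one cell output per data bit.
    mux_levels = math.ceil(math.log2(cells_per_bit)) if cells_per_bit > 1 else 0
    return cells_per_bit * width, mux_levels

cells, mux_levels = distributed_ram_cost(1920, 12)
print(cells, mux_levels)
```

For 1920x12 this gives 1440 cells and 7 mux levels per output bit, and the low address bits have to reach every one of those cells, which is where the fanout comes from.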

So now, I tried to work on this problem by using some of the optimization
options in the ISE.
Under the map properties, I selected map option level as high. The runtime
took really long, and in the end, I got this message.

The router has detected a very high timing score (5245937) for this design.

I am thinking of just trying to meet the timing, but where can I set the
option "-xe c"? I don't see any DOS command line for it anywhere...

Do not bother increasing PAR effort; it will do you no good. You need to either put that 1920x12 RAM in BRAMs or add register stages that synthesis can redistribute within the distributed memory to improve your timing score. Start by adding two register levels to the output of your 1920x12 distributed memory and your score will most likely drop from over 5M to possibly under 200k. Add extra registers until your timing is met or the improvements stall. After this, you will need to realign your processing pipeline to account for the added delay on this large distributed memory.
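The realignment step can be pictured with a small cycle-level Python model. This is only a sketch of the latency bookkeeping, not HDL: `DelayLine`, `EXTRA_STAGES` and the toy RAM contents are all made up for illustration, and in the real design synthesis would move the added registers into the mux tree rather than leave them at the output.

```python
from collections import deque

class DelayLine:
    """N register stages: a value clocked in appears N clocks later."""
    def __init__(self, stages, reset=None):
        self.regs = deque([reset] * stages, maxlen=stages)

    def clock(self, value):
        if not self.regs:          # zero stages behaves like a wire
            return value
        out = self.regs[0]         # oldest register drives the output
        self.regs.append(value)    # shift in; maxlen discards the oldest
        return out

EXTRA_STAGES = 2                      # hypothetical: registers added after the RAM
ram = [10 * a for a in range(8)]      # toy contents: data = 10 * address

ram_pipe = DelayLine(EXTRA_STAGES)    # the added output registers
side_pipe = DelayLine(EXTRA_STAGES)   # matching delay on the parallel path

aligned = []
for addr in range(8):
    data = ram_pipe.clock(ram[addr])  # RAM data now arrives EXTRA_STAGES clocks late
    side = side_pipe.clock(addr)      # so the companion signal is delayed equally
    if data is not None:              # skip the pipeline fill cycles
        aligned.append((side, data))

print(aligned)
```

Every signal that has to meet the RAM data downstream needs the same number of extra stages, otherwise the data and its companion signals arrive on different clocks.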

BTW, what was your LUT and slice-FF usage with that last attempt?