Re: speeding up my runtime on a c6713.



jleslie48 wrote:
On Apr 29, 12:18 pm, "William C Bonner <wbon...@xxxxxxxxxxxxx>"
<wimbon...@xxxxxxxxx> wrote:
On Apr 28, 5:27 pm, jleslie48 <j...@xxxxxxxxxxxxxxxxxx> wrote:



On Apr 28, 7:09 pm, "William C Bonner <wbon...@xxxxxxxxxxxxx>"
<wimbon...@xxxxxxxxx> wrote:
Are you writing a program that utilizes the TI DSP BIOS, or just runs
straight on the processor with no BIOS?
Are you using all 256k of DSP ram, or have you reserved up to 64k of
it for cache, and possibly enabled the cache controller?
How many of your variables are automatic / stack variables, vs how
many are global?
I believe the pragma to put before any variable you want to place in a
different section is #pragma DATA_SECTION("sectionname"), but I don't
have my compiler in front of me, so I can't be sure. If that's correct,
then it goes along with #pragma CODE_SECTION("othersectionname").
Take note that the C++ form does not name the variable in the pragma,
while the C form does.
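For reference, here is a minimal sketch of the C form of these pragmas, assuming the TI C6000 compiler. The section names .fastdata and .fastcode and the names coeffs and scale_coeffs are hypothetical. The pragmas only put things in internal RAM if your linker command file (or BIOS memory configuration) places those sections there. In the C++ form the pragma omits the symbol and goes immediately before the declaration, e.g. #pragma DATA_SECTION(".fastdata").

```c
/* Sketch, TI C6000 compiler, C syntax: the pragma names the symbol and
 * appears before its definition. ".fastdata" and ".fastcode" are
 * hypothetical section names; the linker command file (or BIOS memory
 * setup) must map them to internal RAM. Compilers that don't recognize
 * these pragmas normally just ignore them. */
#define NCOEFFS 64

#pragma DATA_SECTION(coeffs, ".fastdata")
float coeffs[NCOEFFS];

#pragma CODE_SECTION(scale_coeffs, ".fastcode")
void scale_coeffs(float k)
{
    int i;

    /* Scale every coefficient in place. */
    for (i = 0; i < NCOEFFS; i++)
        coeffs[i] *= k;
}
```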
On Apr 28, 1:41 pm, jleslie48 <j...@xxxxxxxxxxxxxxxxxx> wrote:
I'm running way too slow, but I know I've done a few things
inefficiently in my C program.
1) I ran out of IRAM memory so I moved all my variables to ERAM.
What did this cost me?
1A) How can I pick and choose where in memory my C++ variables
reside?
2) Instead of using 'float variablea;' I used 'double variablea;'.
What did this cost me, and what can I expect from changing all my
variables to float (32-bit vs 64-bit)?
3) How else can I affect the runtime of my program? I see there is a
clock properties setting. I know that by removing all my fprintf's I
save some 20% of the runtime. What about switching from debug to
release mode, or something else I haven't considered?
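Rather than deleting the fprintf calls outright, one common option is to route them through a macro that the release configuration compiles away. This is a minimal C89-compatible sketch. DEBUG_TRACE, DBG_FPRINTF and process_block are hypothetical names, and DEBUG_TRACE would be defined only in the debug build's preprocessor settings. The double parentheses let the macro pass a whole argument list without relying on C99 variadic macros.

```c
#include <stdio.h>

/* Define DEBUG_TRACE only in the debug build configuration. Without it,
 * DBG_FPRINTF expands to a no-op and its arguments are never evaluated. */
#ifdef DEBUG_TRACE
#define DBG_FPRINTF(args) ((void)fprintf args)
#else
#define DBG_FPRINTF(args) ((void)0)
#endif

/* Hypothetical processing loop: counts samples outside [-1, 1]. */
int process_block(const float *x, int n)
{
    int i;
    int clipped = 0;

    for (i = 0; i < n; i++) {
        if (x[i] > 1.0f || x[i] < -1.0f) {
            clipped++;
            DBG_FPRINTF((stdout, "sample %d clipped\n", i));
        }
    }
    return clipped;
}
```

Under DSP/BIOS, the LOG_printf API is also worth reading up on, since it leaves the string formatting to the host instead of doing it on the DSP.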
I'm using the TI DSP with BIOS.
"Are you using all 256k of DSP ram, or have you reserved up to 64k of
it for cache, and possibly enabled the cache controller?"
I don't know about the 256k DSP RAM or the reserved 64k cache. How
would I check it, and for that matter, what are the implications of it?
I'm new to DSP programming. I've gotten as far as getting my routines
and algorithms to run, but now I need to optimize and I'm not sure how
to proceed.
" How many of your variables are automatic / stack variables, vs how
many are global?"
I was using mostly global static variables, specifically an array of
10,000 of a structure consisting of several double values:
typedef struct {
    double dtimeindex;
    double damplitude;
    double dampfrombestfitline;
    double dsdvalue;
} itmlstrec_type;
itmlstrec_type itemlist[10000];
This is of course over the top, but in the PC world of 2 GB
machines, it's out of sight, out of mind. Now that I'm dealing with a
real machine, I have to remember my roots and build and design clean.
The above structure is clearly full of pork and needs to be trimmed:
#1) float will cut the size in half, and #2) as I understand it, the
C6713 has fast floating-point math, but only for 32-bit values; with
64-bit double-precision values, I'm on the slow boat to China...
"#pragma CODE_SECTION("othersectionname").
Take note that the C++ does not declare the variable in the pragma,
while the C version does."
Pragmas are fine with me. I've seen them used before, but I've
personally never needed one. I'm programming in straight C.
I've spent the last two years porting code to a DSP from a Windows
environment, so I have gone through many of the problems you are facing
now.

I'm not using the BIOS environment, so my build environment may be
significantly different from yours. I'm used to using a linker
command file with both a MEMORY and a SECTIONS block: MEMORY describes
the memory on my board, and SECTIONS maps the code and data sections
into those memory ranges. If you are using BIOS, I'm not sure whether
those items are configured graphically in the environment instead of
in a text .cmd file.

I'm manually including the CSL (Chip Support Library) headers and
linking to the CSL library. My memory map uses only 192k of internal
RAM, and then I explicitly call the CSL function that enables caching
of my external RAM using the other 64k of internal RAM. (I'm using the
6713 DSP, which has 256k of internal L2 RAM, up to 64k of which can be
used by the cache controller.)

Your simple structure above takes up 32 bytes, so an array of 10,000
of them takes up 320,000 bytes (0x4E200). That means it won't fit in
internal memory at all on a 6713. Converting to floats instead of
doubles would drop you to 160,000 bytes, which would at least fit in
internal RAM and leave you at least 32k for other code or variables
(or 96k if you are not using caching).
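This arithmetic can be checked with sizeof. Below is a sketch under the assumptions stated above: 256 KiB of L2, with 64 KiB of it going to the cache when caching is on. The names itmlstrec_f_type and iram_left are hypothetical, and the result ignores code, stack, and BIOS objects that also have to live in IRAM.

```c
#include <stddef.h>

/* Original record: four doubles, 8 bytes each on the C6713. */
typedef struct {
    double dtimeindex;
    double damplitude;
    double dampfrombestfitline;
    double dsdvalue;
} itmlstrec_type;

/* Hypothetical single-precision version of the same record. */
typedef struct {
    float ftimeindex;
    float famplitude;
    float fampfrombestfitline;
    float fsdvalue;
} itmlstrec_f_type;

#define NITEMS    10000
#define L2_BYTES  (256u * 1024u)  /* C6713 internal L2 RAM */
#define L2_CACHE  (64u * 1024u)   /* L2 handed to the cache in 4-way mode */

/* Bytes of internal RAM left after the array
 * (0 if it doesn't fit or fills it exactly). */
size_t iram_left(size_t array_bytes, int cache_enabled)
{
    size_t sram = L2_BYTES - (cache_enabled ? L2_CACHE : 0u);

    return array_bytes <= sram ? sram - array_bytes : 0u;
}
```

With these numbers, the float array leaves 36,608 bytes (about 35.75 KiB) with the cache enabled and 102,144 bytes (about 99.75 KiB) without it. That matches the "at least 32k" and "96k" figures above.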

On a completely different subject, I think you started asking
questions on a DSP mailing list that I follow and were flamed by one
person for asking uninformed questions. I'd recommend sticking with
that list and just fine-tuning your questions a bit more, with as much
supporting evidence as possible. I read that list much more often than
newsgroups. Usually the worst that happens there is that you get
completely ignored.

Wim.

Wim,

Thanks very much for your analysis. On Monday morning I'll see how well
I run in float mode, whether I fit back into the IRAM memory space,
and, most importantly, whether my runtime changes significantly. One
more follow-up: what exactly is the caching setting, and where do I
change it?

Jonathan



In the BIOS .tcf GUI, right-click Global Settings and choose Properties. On the 621x/671x tab you can set the cache mode you want. The 4-way cache setting corresponds to L2 being split into 64k of cache and 192k of SRAM. Be sure that you also set the MAR bitmask to 0x0001 so that your external SDRAM is cacheable. That will make a big difference in performance with external code and data. Note that if you select the 4-way cache, you must MANUALLY make sure that your IRAM section is not bigger than 192k, or else you will probably have some run-time failure.

In terms of performance, a single-precision floating-point multiply ties up a functional unit for one cycle, whereas a double-precision floating-point multiply ties up the functional unit for 4 cycles. There are also the obvious size differences, which can hurt your performance by using up more registers, memory, etc.
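Double precision can sneak back in after switching variables to float. In C, an unsuffixed constant like 0.5 is a double, so float arithmetic involving it is promoted to double. The sketch below shows this. The function names are hypothetical. Also keep in mind that float carries only about 7 significant decimal digits, so long running sums are worth sanity-checking against the old double results.

```c
#include <stddef.h>

/* Hypothetical helpers illustrating single- vs double-precision math. */

/* 0.5 is a double constant: by the C conversion rules x is promoted,
 * the multiply is (as written) done in double precision, and the
 * result is converted back to float on return. */
float half_slow(float x)
{
    return x * 0.5;
}

/* 0.5f is a float constant, so the multiply stays single precision. */
float half_fast(float x)
{
    return x * 0.5f;
}

/* Mean of n samples using a float accumulator and float constants only. */
float mean_f(const float *x, size_t n)
{
    float sum = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        sum += x[i];
    return n ? sum / (float)n : 0.0f;
}
```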

Brad