Re: Prediction of local code modifications
- From: George Neuner <gneuner2@xxxxxxxxxxx>
- Date: Fri, 04 Apr 2008 18:23:56 -0400
On 3 Apr 2008 21:22:40 GMT, Tim Frink <plfriko@xxxxxxxx> wrote:
Please attribute other posters so we can all follow the conversation.
e.g., Max Hailperin wrote:
(1) For each block, make a yes/no decision on whether to move it to
the faster memory. The address of each block within its memory (the
slow memory or the fast memory) is equal to the sum of the size of the
preceding blocks that were assigned to the same memory. That is, the
order of the blocks within each memory remains the same as in the
original layout and no padding bytes are introduced.
Sure, the order remains the same. However, the block addresses might
change due to additional jump instructions that I have to insert to
preserve correct program flow. Let's say I have this contiguous
allocation of blocks in memory:
A B C D
and b is the implicit successor, i.e. it's on the fall-through edge.
Further, it should be assumed that all blocks start at an 8byte
alignment boundary so that no penalties are incurred during fetching.
Moving B to faster memory, I must add an additional jump instruction
at the end of block A to direct control flow to B. This addition
instruction changes to increases the addresses of the following blocks
C and D ...
As long as the code blocks are in RAM and have a single entry and
exit, you can patch in the jumps dynamically as you relocate the code.
Taking your example, blocks A B C & D and relocating B -> B', you can
overwrite the beginning of B with a jump to B' and append to B' a jump
to C. No address changes are necessary in the unmoved blocks. If you
need to reuse the fast RAM code space, you restore the overwritten
words of B from B' before swapping in new code.
It's been a while since I've done any DSP programming and I don't know
what chip you're using, but if your compiler can generate position
independent code, it will simplify your task greatly.
For this problem, dynamic programming is indeed a useful technique.
Consider working sequentially through the list of blocks, deciding for
each one whether to move it to fast memory or not. The optimal decision
on whether to move block k does not depend on the specific choices you
made regarding each of blocks 1 through k-1. Instead, it just depends
on the total size in bytes of the blocks in the range from 1 through k-1
that were selected for fast memory. This total size is what determines
all three relevant facts: (1) how many bytes of the fast memory's
capacity are still available for blocks k and onward, (2) what the
address of block k would be if left in the slow memory, (3) what the
address of block k would be if moved to the fast memory.
Ok, this make sense if I decide "in advance" what blocks will be
moved. I always assumed a given memory layout where I tries to
successively move blocks without a "total" view at the entire set of
blocks.
However, I still see the problem with the additional jump instructions
that I have to add. I'm unfamiliar with dynamic programming (yes, I
will study some literature on this soon :-)), so maybe you can answer
my question. Can I express in the formulation of the knapsack problem
the fact that the movement of block N+1 to faster memory will lead to
an increased code (by the jump) of block N?
And I'm still not very sure how to integrate my misalignment problem
into the problem of dynamic programming. Let's say I have tree blocks
A B C D. I could in advance compute the execution time of each block
when it was placed in slow memory or in fast memory. The most
efficient way would be to put either the entire text section of the
code into the slow or the fast memory to get the two reference values
for that. However, these values are only true if considered as one
set. When I decide to move block B to faster memory, I can't guarantee
for block D (which was decided to stay in slow memory) that it's run
time is what was decided in advance since it might incur a fetch
misalignment penalty slowing down its execution below the values that
was determined when B was still between A and C and D was not
misaligned. Can such a situation be also modeled with dynamic
programming?
Yes. However, the best execution time will naturally be all code in
fast RAM. If your chip can do RAM -> RAM DMA (and you don't otherwise
need the DMA channel), you can arrange a close to ideal situation as
long as the fast RAM can hold each individual code block.
Place two small routines in fast RAM - one to DMA a specified code
block into fast RAM and the other to start execution of the new block
when the DMA is complete. Each code block should call the relocation
routine as early as possible to stage the next block for execution and
end with a call to the restart routine.
If you double buffer your code or if the current block is longer than
the subsequent block, you may be able to time the DMA operation so
that it overlaps execution of the current block and thus not lose any
time waiting for the DMA operation to complete.
George
--
.
- References:
- Re: Prediction of local code modifications
- From: Tim Frink
- Re: Prediction of local code modifications
- From: Max Hailperin
- Re: Prediction of local code modifications
- From: Tim Frink
- Re: Prediction of local code modifications
- Prev by Date: instruction bundling (scheduling?)
- Next by Date: Re: Prediction of local code modifications
- Previous by thread: Re: Prediction of local code modifications
- Next by thread: Re: Prediction of local code modifications
- Index(es):
Relevant Pages
|
|