Re: Interesting problems about high performance computing
- From: evansamuel@xxxxxxxxxxx
- Date: Wed, 20 Jun 2007 13:45:14 GMT
Most engineers still don't realize the processing power of the FPGA. True,
today's processors operate at high clock frequencies, but people don't truly
understand how much of that raw speed is lost to processor overhead.
I created an FPGA design that processes real-time 1280-pixel, 32-bit camera
scans to identify the leading edge of incoming documents, determine the skew
angle, and rotate the image in real time while it is still being scanned.
This process originally required 8 high-speed DSPs with significant
propagation delay and large amounts of memory.
I have also converted many software processes to the FPGA environment.
Each has shown dramatic improvements in speed that no processor could
ever come close to matching.
Here is a short list of problems your software program may be
experiencing.
1. Roughly 25% to 50% of your speed is lost to opcode fetching and memory
access. This number varies depending on CPU cache efficiency.
2. Depending on the program's size and memory access pattern, you
could be forcing excessive cache dumps and reloads, which can reduce
speed by another 10% to 50%.
3. You must then contend with program efficiency: is the program
optimized for speed?
4. Last is the efficiency of your compiler: does it make extensive
use of libraries, or does it generate inline code?
When a program is converted to hardware you eliminate items 1, 2, and 4.
You are left with program efficiency, that is, how well the program is
translated to hardware.
The first thing to do is reorder the statements in the program. Section the
program into stages and identify the loop/repetition structure. Each stage
depends on the computation of the previous stage. This will usually lead to
each stage having multiple computations. These can be executed in parallel
in the FPGA, and many equations can be executed in just one FPGA clock
cycle. After reordering the statements into the proper order for hardware
conversion, recompile the program and run it to ensure it still functions
correctly. This is now your basic template for conversion.
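As a rough illustration of the staging step, the sketch below groups statements into stages by their data dependencies. The statement names and the EXAMPLE_DEPS table are hypothetical and not taken from any real design; the only point is that statements landing in the same stage do not depend on each other, so they are candidates for parallel execution in hardware.

```python
def assign_stages(deps):
    """Return {statement: stage}, where stage = 1 + the latest stage of its inputs."""
    stages = {}

    def stage_of(name, visiting=()):
        if name in stages:
            return stages[name]
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        inputs = deps[name]
        if inputs:
            level = 1 + max(stage_of(i, visiting + (name,)) for i in inputs)
        else:
            level = 0  # no inputs: can run in the first stage
        stages[name] = level
        return level

    for name in deps:
        stage_of(name)
    return stages


def group_by_stage(deps):
    """Return a list of stages, each a sorted list of independent statements."""
    grouped = {}
    for name, level in assign_stages(deps).items():
        grouped.setdefault(level, []).append(name)
    return [sorted(grouped[level]) for level in sorted(grouped)]


# Hypothetical statements: each key is computed from the names it lists.
EXAMPLE_DEPS = {
    "a": [],                 # a = input sample
    "b": [],                 # b = input sample
    "sum": ["a", "b"],       # sum = a + b
    "diff": ["a", "b"],      # diff = a - b
    "out": ["sum", "diff"],  # out = sum * diff
}

if __name__ == "__main__":
    for index, names in enumerate(group_by_stage(EXAMPLE_DEPS)):
        print(f"stage {index}: {names}")
```

Here "sum" and "diff" fall into the same stage, which is the kind of grouping that can be computed side by side in the FPGA.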
If the program uses more data than the FPGA can hold, you will need to
create a multi-channel DMA controller with cache. This controller provides
external memory access to each stage that needs it.
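One way to picture the sharing of a single external memory port among several stages is a round-robin arbiter. The sketch below is only a software model under that assumption: the channel names and the round-robin policy are mine, the cache is left out entirely, and a real controller would be written in an HDL.

```python
from collections import deque


def round_robin_schedule(requests, cycles):
    """Model one memory port shared by several stage channels.

    requests: {channel name: number of pending transfers}
    Returns the channel granted in each cycle (None when the port is idle).
    """
    pending = dict(requests)
    order = deque(sorted(pending))
    grants = []
    for _ in range(cycles):
        granted = None
        # Look at each channel at most once per cycle, starting after the
        # channel that was examined last.
        for _ in range(len(order)):
            channel = order[0]
            order.rotate(-1)
            if pending[channel] > 0:
                pending[channel] -= 1
                granted = channel
                break
        grants.append(granted)
    return grants


if __name__ == "__main__":
    # Hypothetical workload: stage1 needs two transfers, stage2 needs one.
    print(round_robin_schedule({"stage1": 2, "stage2": 1, "stage3": 0}, 4))
```

Each stage gets a turn in order, so no single stage can starve the others of external memory.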
Second, when using fractional (decimal) calculations, determine the maximum
acceptable error. Floating point offers many advantages but slows down
computation, consumes a large number of resources, and adds to the overall
complexity of the hardware. You should use fixed-point computation if
possible.
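To make the error budget concrete, here is a minimal sketch that picks the number of fractional bits for a given maximum error, assuming round-to-nearest quantization (so the worst-case error is half of one step). The function names are hypothetical.

```python
import math


def fractional_bits_for(max_error):
    """Smallest fractional bit count whose rounding error is <= max_error."""
    # With round-to-nearest, the worst-case error is 2 ** -(bits + 1).
    return max(0, math.ceil(-math.log2(max_error) - 1))


def to_fixed(value, frac_bits):
    """Quantize a real value to a fixed-point integer."""
    return round(value * (1 << frac_bits))


def from_fixed(raw, frac_bits):
    """Convert a fixed-point integer back to a real value."""
    return raw / (1 << frac_bits)


if __name__ == "__main__":
    bits = fractional_bits_for(0.001)
    raw = to_fixed(3.14159, bits)
    print(bits, raw, from_fixed(raw, bits))
```

For a maximum error of 0.001 this gives 9 fractional bits, since 8 bits would allow an error of about 0.002.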
Fixed-point decimal accuracy (fractional portion only, not including the
integer part):
1 byte  = 2 decimal places
2 bytes = 4 decimal places
3 bytes = 7 decimal places
4 bytes = 9 decimal places
5 bytes = 12 decimal places
6 bytes = 14 decimal places
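These figures appear to follow from the resolution of the fractional part: n fractional bits resolve steps of 2^-n, which covers about n * log10(2) decimal digits, rounded down. That reading is my assumption, but the short check below reproduces the table under it.

```python
import math


def decimal_places(frac_bytes):
    """Whole decimal digits covered by a fraction of the given byte width."""
    bits = 8 * frac_bytes
    return math.floor(bits * math.log10(2))


if __name__ == "__main__":
    for n in range(1, 7):
        print(f"{n} byte(s) = {decimal_places(n)} decimal places")
```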
Xilinx devices have built-in 18x18 multipliers, which are fast and save
logic resources.
Division is possible but uses more resources.
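As a sketch of how a fixed-point multiply maps onto an 18x18 multiplier, the model below checks that both operands fit in 18 signed bits and rescales the product, which can be up to 36 bits wide. The split into 12 fractional bits is an assumption chosen only for illustration, and the function names are hypothetical.

```python
OPERAND_BITS = 18  # width of each multiplier input
FRAC_BITS = 12     # assumed format: 1 sign bit, 5 integer bits, 12 fraction bits


def check_operand(raw):
    """Raise if raw does not fit in a signed 18-bit operand."""
    low = -(1 << (OPERAND_BITS - 1))
    high = (1 << (OPERAND_BITS - 1)) - 1
    if not low <= raw <= high:
        raise OverflowError(f"{raw} does not fit in {OPERAND_BITS} signed bits")
    return raw


def fixed_mul(a_raw, b_raw):
    """Multiply two fixed-point values and return the result in the same format."""
    product = check_operand(a_raw) * check_operand(b_raw)
    # The product carries 2 * FRAC_BITS fraction bits; shift back to FRAC_BITS.
    # The shift truncates toward negative infinity and does not clamp the
    # result, which may need more than 18 bits.
    return product >> FRAC_BITS


if __name__ == "__main__":
    a = int(1.5 * (1 << FRAC_BITS))   # 6144
    b = int(2.25 * (1 << FRAC_BITS))  # 9216
    print(fixed_mul(a, b) / (1 << FRAC_BITS))  # 1.5 * 2.25
```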
A Spartan-3 can do the work. The Virtex-II and Virtex-4 are faster and offer
more memory and multipliers, with the addition of embedded CPUs to help with
other tasks.
A DSP does provide advantages over a standard processor
but does not compare to the raw power of the FPGA.
evansamuel@xxxxxxxxxxx