Re: Matlab Vectorisation Speed - How is it done in c++?
- From: "Tim Davis" <davis@xxxxxxxxxxxx>
- Date: Mon, 17 Dec 2007 13:42:47 +0000 (UTC)
sturlamolden <sturlamolden@xxxxxxxx> wrote in message
<08100346-586e-41fb-bb41-9d9342d269ed@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>...
On 17 Des, 01:05, Phil Winder<philipwin...@xxxxxxxxxxxxxx> wrote:
The test
Im currently porting some matlab algorithms to c++ code.
how fast theycode I have is testing the vector math capabilities and
I amcan go. I have found that it can be very, very fast and
matlab do it? Andstrugglling to reproduce the speed in c++. How does
hard, andhow can it be reproduced in c++?
Beating the performance of vectorized Matlab code is very
usually not worth the effort.such as blas/
Matlab makes calls to optimized C and Fortran libraries
atlas, lapack and fftw. You cannot duplicate theirefficacies in C++
for at least two reasons:makes Fortran
1. There are issues related to the language syntax that
particularly easy to optimize for compilers, such as lackof pointer
aliasing. This is particularly important for optimalallocation of
registers when the CPU goes into a tight loop.libraries as fast
2. A lot of effort have been put into making these
as possible. This includes optimal use of cache and branchprediction.
Duplicating these efforts on your own is going to take therest of
your life to complete.libraries as
My advice would be this:
If you want speed in your C++ app, link and call the same
Matlab do. Most of them are available for free. Give theC++ compiler
pointer aliasing hints wherever possible.your L1 cache.
In addition:
Use optimization level 3 on numerical code and level 2 on non-
numerical code. Process the data in chunks that fit in
Force the CPU to prefetch if you know it will help.Page-align your
arrays in RAM. Memory access is terribly slow, traverseas few times
as possible. Never use strided memory access. Manuallyunroll tight
loops. Exploit arithmetic pipelining of four subsequentoperations.
Avoid divisions, transform to a multiplcation. Exploitmultiple CPUs:
use MPI or OpenMP, forkjoin with labour-sharingthreadpools, etc. Use
inline assembly to access SIMD parallel registers.by D.
Also remember Hoare's statement about optimization, quoted
Knuth: "Premature optimization is the root of all evil incomputer
programming." Profile your code. Direct your optimizationsto the
important bottlenecks. They are likely to be few. Do asmuch as you
can with the bottlenecks, and never mind the reminding 90%of your
code.goint to be
But before you begin: ask yourself if the hard work is
worth the effort.
Regarding (1): I write in C and I haven't found (1) to be
that much of an issue (although I do worry about it and it's
well worth it for you to mention here). I think the more
recent versions of gcc are able to work around this issue.
More serious for C is the abuse of pointers (indirect
addressing, which requires lots of memory traffic). Memory
traffic is more of a problem than register allocation,
anyway (which you point out too, regarding the stride issue)..
Regarding (2): Yes, that's definitely true. One could
write an m-file script that does x=A\b without backslash, in
maybe 50 lines of M (LU factorization if square, QR with
Householder if rectangular). Backslash itself has maybe
250,000 lines of code (that's a guess, but an educated one
since I wrote about half that).
Some vector operations are trivial (a = b+c) to write in C
or Fortran. If you write a=b*c where b and c are matrices,
then there's no way you'll match performance in an optimized
BLAS library.
Rule of thumb: if the work is O(n) where n is the size of
the data, then there's a decent chance that simple C or
Fortran code can match (not beat) MATLAB. If the work is
higher than O(n) than you probably can't beat MATLAB with
simple C. Matrix add fits in the former category; matrix
multiply doesn't.
You can always call the BLAS / LAPACK yourself, in the dense
case, or use available C code for the sparse case. Lots of
the code in x=a*b, x=A\b, etc is open source.
.
- Follow-Ups:
- Re: Matlab Vectorisation Speed - How is it done in c++?
- From: Steven G. Johnson
- Re: Matlab Vectorisation Speed - How is it done in c++?
- From: Phil Winder
- Re: Matlab Vectorisation Speed - How is it done in c++?
- References:
- Matlab Vectorisation Speed - How is it done in c++?
- From: Phil Winder
- Re: Matlab Vectorisation Speed - How is it done in c++?
- From: sturlamolden
- Matlab Vectorisation Speed - How is it done in c++?
- Prev by Date: Re: play wav-files under Linux
- Next by Date: Re: speckle noise elimination
- Previous by thread: Re: Matlab Vectorisation Speed - How is it done in c++?
- Next by thread: Re: Matlab Vectorisation Speed - How is it done in c++?
- Index(es):
Relevant Pages
|